Masking particles with zeros
Hi, with Relion 3.0 beta 2, every time I say No to "Mask individual particles with zeros?" in 2D classification, 3D classification or refinement, relion crashes (see below). However, if I say Yes it runs like a charm.
in: /home/bio21em1/relion-3.0_beta/src/acc/cuda/cuda_settings.h, line 81
in: /home/bio21em1/relion-3.0_beta/src/acc/cuda/cuda_settings.h, line 81
slave 3 encountered error:
=== Backtrace ===
/usr/local/relion3/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKSsS1_l+0x41) [0x43eaf1]
/usr/local/relion3/bin/relion_refine_mpi(_Z36globalThreadExpectationSomeParticlesR14ThreadArgument+0xe9) [0x523b59]
/usr/local/relion3/bin/relion_refine_mpi(_Z11_threadMainPv+0x3f) [0x52d50f]
/lib64/libpthread.so.0() [0x375c607aa1]
/lib64/libc.so.6(clone+0x6d) [0x375bee8bcd]
==================
ERROR:
A GPU-function failed to execute.
If this occured at the start of a run, you might have GPUs which are incompatible with either the data or your installation of relion. If you
-> INSTALLED RELION YOURSELF: if you e.g. specified -DCUDA_ARCH=50
and are trying ot run on a compute 3.5 GPU (-DCUDA_ARCH=3.5),
this may happen.
-> HAVE MULTIPLE GPUS OF DIFFERNT VERSIONS: relion needs GPUS with
at least compute 3.5. You may be trying to use a GPU older than
this. If you have multiple generations, try specifying --gpu <X>
with X=0. Then try X=1 in a new run, and so on. The numbering of
GPUs may not be obvious from the driver or intuition. For a list
of GPU compute generations, see
en.wikipedia.org/wiki/CUDA#Version_features_and_specifications
-> ARE USING DOUBLE-PRECISION GPU CODE: relion was been written so
as to not require this, and may thus have unforeseen requirements
when run in this mode. If you think it is nonetheless necessary,
please consult the developers with this error.
If this occurred at the middle or end of a run, it might be that
-> YOUR DATA OR PARAMETERS WERE UNEXPECTED: execution on GPUs is
subject to many restrictions, and relion is written to work within
common restraints. If you have exotic data or settings, unexpected
configurations may occur. See also above point regarding
double precision.
If none of the above applies, please report the error to the relion developers at github.com/3dem/relion/issues
[identical backtrace and error message repeated for slaves 1, 2 and 4]
[odin:288157] 3 more processes have sent help message help-mpi-api.txt / mpi-abort
[odin:288157] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
Comments (10)
-
reporter -
Thank you very much for your report. Unfortunately, I could not reproduce your problem. I tried Class2D and Refine3D of our tutorial dataset and both are running fine. Can you provide more details?
- What is your GPU?
- What is the box size?
- What is the full command?
- Which version of CUDA and gcc are you using?
- Does it crash immediately, or after several iterations?
-
reporter The GPUs are GeForce GTX 1080 with 12GB of memory; I have 4 of them. The box size is 344, but I have tried down to 90 and it is the same. I am running from the GUI:
job_type == 8
is_continue == false
Ignore CTFs until first peak? == No
Have data been phase-flipped? == No
Do bimodal angular searches? == Yes
Combine iterations through disc? == No
Do CTF-correction? == Yes
Use fast subsets (for large data sets)? == No
Classify 2D helical segments? == No
Use parallel disc I/O? == Yes
Pre-read all particles into RAM? == Yes
Submit to queue? == No
Mask individual particles with zeros? == No
Perform image alignment? == Yes
Continue from here: == Class2D/job011/run_ct10_it011_optimiser.star
Input images STAR file: == JoinStar/job679/join_particles.star
Which GPUs to use: ==
Tube diameter (A): == 200
Limit resolution E-step to (A): == -1
Minimum dedicated cores per node: == 1
Number of classes: == 50
Number of iterations: == 25
Number of MPI procs: == 5
Number of pooled particles: == 20
Number of threads: == 1
Offset search range (pix): == 5
Offset search step (pix): == 1
Additional arguments: ==
Mask diameter (A): == 120
In-plane angular sampling: == 5
Queue submit command: == qsub
Standard submission script: == /usr/local/bin/qsub.csh
Queue name: == openmpi
Angular search range - psi (deg): == 6
Copy particles to scratch directory: ==
Regularisation parameter T: == 2
Use GPU acceleration? == Yes
I have CUDA 8.0 (v8.0.44) and gcc 4.8.2.
It crashes immediately. I don't have that issue with Relion 2.1.
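Since the only reported difference between the crashing and the working job is one GUI option, it can help to compare the two settings dumps mechanically. The following is a minimal sketch, assuming each setting has been saved as one `label == value` line as in the dump above. The function names `parse_job_settings` and `diff_settings` are hypothetical helpers, not part of relion, and the two short dumps are abbreviated illustrations.

```python
def parse_job_settings(text):
    """Parse 'label == value' lines into a dict; empty values become ''."""
    settings = {}
    for line in text.splitlines():
        if "==" not in line:
            continue  # skip blank lines and anything that is not a setting
        label, _, value = line.partition("==")
        settings[label.strip()] = value.strip()
    return settings


def diff_settings(a, b):
    """Return {label: (value_in_a, value_in_b)} for every label whose value differs."""
    labels = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in labels if a.get(k) != b.get(k)}


# Abbreviated, illustrative dumps: only the masking option differs.
crashing = parse_job_settings("""
Mask individual particles with zeros? == No
Offset search range (pix): == 5
Which GPUs to use: ==
""")
working = parse_job_settings("""
Mask individual particles with zeros? == Yes
Offset search range (pix): == 5
Which GPUs to use: ==
""")

print(diff_settings(crashing, working))
```

If the diff shows only the masking option, the remaining settings can be ruled out as the trigger.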
-
-
assigned issue to
-
assigned issue to
-
@bforsbe Can you look at this problem?
-
Is this a helical dataset?
-
reporter No just standard.
-
Perhaps I then misunderstand, but why are you using
Do bimodal angular searches? == Yes
Could I also ask you to decrease your translation search a bit? Range 5 and step 1 will give quite a lot of translations, and you might be falling under the last case in the original error message:
-> YOUR DATA OR PARAMETERS WERE UNEXPECTED: execution on GPUs is subject to many restrictions, and relion is written to work within common restraints. If you have exotic data or settings, unexpected configurations may occur.
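To get a rough feel for how many translations a given range and step produce, here is a minimal sketch. It assumes the offsets are sampled on a square grid with the given step and kept only if they fall inside a circle whose radius is the search range. That sampling scheme is an assumption for illustration; the exact number relion evaluates may differ, for example because of oversampling. `count_translations` is a hypothetical helper.

```python
import math


def count_translations(offset_range, offset_step):
    """Count grid offsets (x, y) with x^2 + y^2 <= offset_range^2.

    Assumption: a square grid with spacing offset_step, clipped to a circle.
    """
    max_steps = math.ceil(offset_range / offset_step)
    count = 0
    for ix in range(-max_steps, max_steps + 1):
        for iy in range(-max_steps, max_steps + 1):
            x, y = ix * offset_step, iy * offset_step
            if x * x + y * y <= offset_range * offset_range:
                count += 1
    return count


# Compare the reported setting (range 5, step 1) with two smaller searches.
for search_range, step in ((5, 1), (5, 2), (3, 1)):
    print(search_range, step, count_translations(search_range, step))
```

Under this assumption, range 5 with step 1 gives 81 translations per orientation, while range 3 with step 1 gives 29 and range 5 with step 2 gives 21.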
-
reporter Hi Bjoern, I am using the GUI and have no way of choosing the bimodal angular search parameters. Or is this called something else in the GUI? I will change the range and step to see what happens, but I have never had any issues with these parameters on the same dataset in Relion 2.1. Cheers, Eric
-
Bimodals had nothing to do with it, indeed. It's just set to yes by default, and presumably not used in non-helical runs.
I can't reproduce your fault, though. I tried a few datasets and noise-masking works fine for me.
I suspect it's something incompatible between the CUDA version, your runtime, your driver, and/or the GPU. As far as I know we've kept backwards compatibility, so a driver update would be my first suggestion, in case something changed in how RNGs are executed.
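One way to act on the driver suggestion is to list the driver version each GPU reports and compare it with the minimum required by the installed CUDA toolkit. The sketch below parses the output of `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader`. The sample text and the minimum version `367.48` are assumptions for illustration only; the real minimum should be taken from the NVIDIA release notes for your CUDA version. `gpus_below_driver` is a hypothetical helper.

```python
def parse_version(text):
    """Turn a dotted version such as '367.48' into a comparable tuple (367, 48)."""
    return tuple(int(part) for part in text.strip().split("."))


def gpus_below_driver(csv_text, minimum_driver):
    """Return the names of GPUs whose driver is older than minimum_driver.

    csv_text is expected to hold one 'name, driver_version' line per GPU.
    """
    too_old = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so a comma inside the GPU name is harmless.
        name, driver = (field.strip() for field in line.rsplit(",", 1))
        if parse_version(driver) < parse_version(minimum_driver):
            too_old.append(name)
    return too_old


# Made-up sample output, not taken from the reporter's machine.
sample = """
GeForce GTX 1080, 367.44
GeForce GTX 1080, 384.81
"""

# '367.48' is an assumed threshold; check the CUDA release notes for the real one.
print(gpus_below_driver(sample, "367.48"))
```

An empty list means every listed GPU meets the chosen minimum, which would point away from the driver as the cause.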