Masking particles with zeros

Issue #17 new
Eric Hanssen created an issue

Hi, with RELION 3.0 beta 2, every time I say No to "Mask individual particles with zeros?" in 2D classification, 3D classification or refinement, RELION crashes; see below. However, if I say Yes it runs like a charm.

in: /home/bio21em1/relion-3.0_beta/src/acc/cuda/cuda_settings.h, line 81
in: /home/bio21em1/relion-3.0_beta/src/acc/cuda/cuda_settings.h, line 81
slave 3 encountered error: === Backtrace ===
/usr/local/relion3/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKSsS1_l+0x41) [0x43eaf1]
/usr/local/relion3/bin/relion_refine_mpi(_Z36globalThreadExpectationSomeParticlesR14ThreadArgument+0xe9) [0x523b59]
/usr/local/relion3/bin/relion_refine_mpi(_Z11_threadMainPv+0x3f) [0x52d50f]
/lib64/libpthread.so.0() [0x375c607aa1]
/lib64/libc.so.6(clone+0x6d) [0x375bee8bcd]
==================
ERROR:

A GPU-function failed to execute.

If this occured at the start of a run, you might have GPUs which are incompatible with either the data or your installation of relion. If you

-> INSTALLED RELION YOURSELF: if you e.g. specified -DCUDA_ARCH=50
   and are trying ot run on a compute 3.5 GPU (-DCUDA_ARCH=3.5), 
   this may happen.

-> HAVE MULTIPLE GPUS OF DIFFERNT VERSIONS: relion needs GPUS with
   at least compute 3.5. You may be trying to use a GPU older than
   this. If you have multiple generations, try specifying --gpu <X>
   with X=0. Then try X=1 in a new run, and so on. The numbering of
   GPUs may not be obvious from the driver or intuition. For a list
   of GPU compute generations, see

   en.wikipedia.org/wiki/CUDA#Version_features_and_specifications

-> ARE USING DOUBLE-PRECISION GPU CODE: relion was been written so
   as to not require this, and may thus have unforeseen requirements
   when run in this mode. If you think it is nonetheless necessary,
   please consult the developers with this error.

If this occurred at the middle or end of a run, it might be that

-> YOUR DATA OR PARAMETERS WERE UNEXPECTED: execution on GPUs is 
   subject to many restrictions, and relion is written to work within
   common restraints. If you have exotic data or settings, unexpected
   configurations may occur. See also above point regarding 
   double precision.

If none of the above applies, please report the error to the relion developers at github.com/3dem/relion/issues

in: /home/bio21em1/relion-3.0_beta/src/acc/cuda/cuda_settings.h, line 81
in: /home/bio21em1/relion-3.0_beta/src/acc/cuda/cuda_settings.h, line 81
slave 1 encountered error: === Backtrace ===
/usr/local/relion3/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKSsS1_l+0x41) [0x43eaf1]
/usr/local/relion3/bin/relion_refine_mpi(_Z36globalThreadExpectationSomeParticlesR14ThreadArgument+0xe9) [0x523b59]
/usr/local/relion3/bin/relion_refine_mpi(_Z11_threadMainPv+0x3f) [0x52d50f]
/lib64/libpthread.so.0() [0x375c607aa1]
/lib64/libc.so.6(clone+0x6d) [0x375bee8bcd]
==================
ERROR:

A GPU-function failed to execute.

in: /home/bio21em1/relion-3.0_beta/src/acc/cuda/cuda_settings.h, line 81
in: /home/bio21em1/relion-3.0_beta/src/acc/cuda/cuda_settings.h, line 81
slave 2 encountered error: === Backtrace ===
/usr/local/relion3/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKSsS1_l+0x41) [0x43eaf1]
/usr/local/relion3/bin/relion_refine_mpi(_Z36globalThreadExpectationSomeParticlesR14ThreadArgument+0xe9) [0x523b59]
/usr/local/relion3/bin/relion_refine_mpi(_Z11_threadMainPv+0x3f) [0x52d50f]
/lib64/libpthread.so.0() [0x375c607aa1]
/lib64/libc.so.6(clone+0x6d) [0x375bee8bcd]
==================
ERROR:

A GPU-function failed to execute.

in: /home/bio21em1/relion-3.0_beta/src/acc/cuda/cuda_settings.h, line 81
in: /home/bio21em1/relion-3.0_beta/src/acc/cuda/cuda_settings.h, line 81
slave 4 encountered error: === Backtrace ===
/usr/local/relion3/bin/relion_refine_mpi(_ZN11RelionErrorC1ERKSsS1_l+0x41) [0x43eaf1]
/usr/local/relion3/bin/relion_refine_mpi(_Z36globalThreadExpectationSomeParticlesR14ThreadArgument+0xe9) [0x523b59]
/usr/local/relion3/bin/relion_refine_mpi(_Z11_threadMainPv+0x3f) [0x52d50f]
/lib64/libpthread.so.0() [0x375c607aa1]
/lib64/libc.so.6(clone+0x6d) [0x375bee8bcd]
==================
ERROR:

A GPU-function failed to execute.

[odin:288157] 3 more processes have sent help message help-mpi-api.txt / mpi-abort
[odin:288157] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Comments (10)

  1. Takanori Nakane

    Thank you very much for your report. Unfortunately, I could not reproduce your problem. I tried Class2D and Refine3D of our tutorial dataset and both are running fine. Can you provide more details?

    • What is your GPU?
    • What is the box size?
    • What is the full command?
    • Which version of CUDA and gcc are you using?
    • Does it crash immediately, or after several iterations?
  2. Eric Hanssen reporter

    The GPUs are GeForce GTX 1080 with 12 GB of memory; I have 4 of them. The box size is 344, but I have tried down to 90 and it is the same. I am running from the GUI with these settings:

    job_type == 8
    is_continue == false
    Ignore CTFs until first peak? == No
    Have data been phase-flipped? == No
    Do bimodal angular searches? == Yes
    Combine iterations through disc? == No
    Do CTF-correction? == Yes
    Use fast subsets (for large data sets)? == No
    Classify 2D helical segments? == No
    Use parallel disc I/O? == Yes
    Pre-read all particles into RAM? == Yes
    Submit to queue? == No
    Mask individual particles with zeros? == No
    Perform image alignment? == Yes
    Continue from here: == Class2D/job011/run_ct10_it011_optimiser.star
    Input images STAR file: == JoinStar/job679/join_particles.star
    Which GPUs to use: ==
    Tube diameter (A): == 200
    Limit resolution E-step to (A): == -1
    Minimum dedicated cores per node: == 1
    Number of classes: == 50
    Number of iterations: == 25
    Number of MPI procs: == 5
    Number of pooled particles: == 20
    Number of threads: == 1
    Offset search range (pix): == 5
    Offset search step (pix): == 1
    Additional arguments: ==
    Mask diameter (A): == 120
    In-plane angular sampling: == 5
    Queue submit command: == qsub
    Standard submission script: == /usr/local/bin/qsub.csh
    Queue name: == openmpi
    Angular search range - psi (deg): == 6
    Copy particles to scratch directory: ==
    Regularisation parameter T: == 2
    Use GPU acceleration? == Yes

    I have CUDA 8.0 (v8.0.44) and gcc 4.8.2.

    It crashes immediately. I don't have that issue with RELION 2.1.

  3. Björn Forsberg

    Perhaps I then misunderstand, but why are you using

    Do bimodal angular searches? == Yes

    Could I also ask you to decrease your translation search a bit? Range 5 and step 1 will give quite a lot of translations, and you might be falling under the last case in the original error message:

     -> YOUR DATA OR PARAMETERS WERE UNEXPECTED: execution on GPUs is 
        subject to many restrictions, and relion is written to work within
        common restraints. If you have exotic data or settings, unexpected
        configurations may occur.
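
To put a rough number on "quite a lot of translations", here is a minimal Python sketch that counts the grid of trial offsets for a given range and step. It assumes that offsets are sampled on a regular x/y grid and kept only if they lie within a circle of radius equal to the search range; that selection rule and the function name `count_translations` are illustrative assumptions, not taken from the RELION source.

```python
def count_translations(search_range, step, circular=True):
    """Count trial (x, y) offsets on a regular grid.

    Assumption: offsets run from -search_range to +search_range in
    increments of `step`, and (if circular) only points inside a circle
    of radius `search_range` are kept.
    """
    n = int(search_range / step)  # number of steps on each side of zero
    limit = search_range * search_range + 1e-9  # tolerance for float steps
    count = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x, y = i * step, j * step
            if not circular or x * x + y * y <= limit:
                count += 1
    return count


if __name__ == "__main__":
    # Settings from the report: range 5 pix, step 1 pix.
    print(count_translations(5, 1))         # 81 offsets inside the circle
    print(count_translations(5, 1, False))  # 121 offsets on the full square
    # A smaller search, e.g. range 3 pix, step 1 pix.
    print(count_translations(3, 1))         # 29 offsets inside the circle
```

Under these assumptions, reducing the range from 5 to 3 pixels cuts the number of trial translations per orientation from 81 to 29.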
    
  4. Eric Hanssen reporter

    Hi Bjoern, I am using the GUI and have no way of choosing the bimodal angular search parameters. Or is this called something else in the GUI? I will change the range and step to see what happens, but I have never had any issues with these parameters on the same dataset in RELION 2.1. Cheers, Eric

  5. Björn Forsberg

    Bimodals had nothing to do with it, indeed. It's just set to yes by default, and presumably not used in non-helical runs.

    I can't reproduce your fault though. I tried a few datasets and noise-masking works fine for me.

    I suspect it's some incompatibility between the CUDA version, your runtime, your driver, and/or the GPU. As far as I know we've kept backwards compatibility, so a driver update would be my first suggestion, in case something changed in how RNGs are executed.
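
When comparing driver and GPU details between machines, it can help to collect them in one place. The sketch below is one possible way to do that in Python by calling `nvidia-smi` with its CSV query output; the helper names `parse_gpu_csv` and `list_gpus` are hypothetical, and the script assumes `nvidia-smi` is on the PATH.

```python
import subprocess

# Query fields supported by nvidia-smi's --query-gpu option.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_csv(text):
    """Parse 'index, name, driver, memory' lines into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, driver, memory = [field.strip() for field in line.split(",")]
        gpus.append(
            {"index": int(index), "name": name, "driver": driver, "memory": memory}
        )
    return gpus


def list_gpus():
    """Run nvidia-smi and return one dict per visible GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    try:
        for gpu in list_gpus():
            print(gpu)
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"Could not query GPUs: {err}")
```

The reported driver version can then be checked against the minimum driver required by the installed CUDA toolkit, and the GPU index can be matched to the value passed to `--gpu`.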
