Allow opting out of precalculated AB matrices when onthefly_shifts = true

Issue #36 resolved
Dimitry Tegunov created an issue

I encountered this problem in v2.1, but it looks like v3 would behave the same.

For sub-tomogram averaging, onthefly_shifts defaults to true, which causes AB matrices to be pre-calculated. With fine translational sampling, even small boxes (170) lead to a very large memory footprint, which severely limits the number of MPI processes I can spawn. However, GPU calculations don't use these matrices; they calculate the sin and cos values on the fly. I commented out the pre-calculation, and everything seems to run fine with plenty of processes. Obviously, this will break the failsafe mode because CPU calculations would still need the matrices. But if a data set doesn't cause fallbacks, it's not a problem.

I think it would be great to have an option to disable AB matrices for sub-tomo averaging on GPUs in case a user is sure failsafe mode won't be encountered.
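
To make the memory argument above concrete, here is a rough C++ sketch. It is not RELION code: `precalc_table_bytes` and `phase_factor` are hypothetical helpers, and the storage layout (one double-precision complex factor per Fourier voxel of a cubic half-transform, per sampled translation) and the sign convention are assumptions made only for illustration.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>

// Hypothetical helper: bytes needed to keep one complex phase factor per
// Fourier voxel of a cubic half-transform, for every sampled translation.
// The layout is an assumption, not RELION's actual storage scheme.
std::size_t precalc_table_bytes(std::size_t box, std::size_t n_translations)
{
    // Half-transform of a real-valued cube: (box/2 + 1) * box * box voxels.
    const std::size_t voxels = (box / 2 + 1) * box * box;
    return voxels * n_translations * sizeof(std::complex<double>);
}

// On-the-fly alternative: compute the phase factor for a shift (sx, sy, sz)
// at integer frequency (kx, ky, kz) when it is needed, storing nothing.
// Sign convention assumed: shifting by s multiplies by exp(-2*pi*i*k.s/box).
std::complex<double> phase_factor(int kx, int ky, int kz,
                                  double sx, double sy, double sz, int box)
{
    const double two_pi = 6.283185307179586;
    const double arg = -two_pi * (kx * sx + ky * sy + kz * sz) / box;
    return std::complex<double>(std::cos(arg), std::sin(arg));
}
```

Under these assumptions, a box of 170 costs roughly 40 MB per stored translation in each process, so the table grows linearly with the number of sampled translations, whereas the on-the-fly variant trades that memory for extra sin/cos evaluations.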

Comments (6)

  1. Takanori Nakane

    "Obviously, this will break the failsafe mode because CPU calculations would still need the matrices. But if a data set doesn't cause fallbacks, it's not a problem."

    I don't understand this. When GPU calculation fails, the program simply dies. There is no automatic fallback to the CPU code path.

  2. Dimitry Tegunov reporter

    Something like this: 7d228da

    Sorry, that's how I imagined failsafe mode worked – fall back to CPU for FP64 calculations. Should have looked into the code.

  3. Takanori Nakane

    Actually, --onthefly_shifts is disabled when --cpu or --gpu is given.

        if (do_shifts_onthefly && (do_gpu || do_cpu))
        {   
            std::cerr << "WARNING: --onthefly_shifts cannot be combined with --cpu or --gpu, setting do_shifts_onthefly to false" << std::endl;
            do_shifts_onthefly = false;
        } 
    

    However, this is turned on again later if mymodel.data_dim == 3, which is the problem. So I'd like to fix this as follows. Could you please confirm whether this makes sense? I don't work on tomography at all, so I cannot test it myself.

    diff --git a/src/ml_optimiser.cpp b/src/ml_optimiser.cpp
    index 8b9a0b0..22f3d1a 100644
    --- a/src/ml_optimiser.cpp
    +++ b/src/ml_optimiser.cpp
    @@ -1856,7 +1856,7 @@ void MlOptimiser::initialiseGeneral(int rank)
                    // Don't do norm correction for volume averaging at this stage....
                    do_norm_correction = false;
    
    -               if (!((do_helical_refine) && (!ignore_helical_symmetry))) // For 3D helical sub-tomogram averaging, either is OK, so let the user decide
    +               if (!((do_helical_refine) && (!ignore_helical_symmetry)) && !(do_cpu || do_gpu)) // For 3D helical sub-tomogram averaging, either is OK, so let the user decide
                            do_shifts_onthefly = true; // save RAM for volume data (storing all shifted versions would take a lot!)
    
                    if (do_skip_align)
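
The interaction between the two checks quoted in this comment can be sketched as a small standalone function. This is only an illustration: `ShiftOptions` and `resolve_onthefly` are hypothetical names that do not exist in RELION, and the struct merely mirrors the handful of flags that appear in the snippets above.

```cpp
// Hypothetical stand-in for the few optimiser flags involved.
struct ShiftOptions
{
    bool do_shifts_onthefly;
    bool do_cpu;
    bool do_gpu;
    bool do_helical_refine;
    bool ignore_helical_symmetry;
    int  data_dim;
};

// Returns the final value of do_shifts_onthefly.
// patched == false mimics the original condition, true the proposed one.
bool resolve_onthefly(ShiftOptions o, bool patched)
{
    // Step 1: accelerated runs switch on-the-fly shifts off.
    if (o.do_shifts_onthefly && (o.do_gpu || o.do_cpu))
        o.do_shifts_onthefly = false;

    // Step 2: volume data switches them back on to save RAM.
    if (o.data_dim == 3)
    {
        const bool helical = o.do_helical_refine && !o.ignore_helical_symmetry;
        const bool accelerated = o.do_cpu || o.do_gpu;
        // The proposed change adds the "not accelerated" requirement.
        if (!helical && !(patched && accelerated))
            o.do_shifts_onthefly = true;
    }
    return o.do_shifts_onthefly;
}
```

For a 3D GPU run, the original condition ends with the flag set to true, while the proposed condition leaves it false; non-accelerated 3D runs are unaffected.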
    
  4. Dimitry Tegunov reporter

    Can confirm, this solves the problem!

    There is some irony in GPU calculations requiring orders of magnitude less CPU memory than the CPU code path.
