lsd_cc_spectral.py breaks when run with only 2 CPUs on laptop

Issue #12 on hold
Kasper Schmidt created an issue

When running lsd_cc_spectral.py I encountered the error:

Traceback (most recent call last):
 File "/Users/kschmidt/work/lsdcat/lsd_cc_spectral.py", line 189, in <module>
   nans_select=nans_select)
 File "/Users/kschmidt/work/lsdcat/lib/wavelength_smooth_lib.py", line 224, in filter_parallel
   result.append(t.get()) # store the workers results in list result
 File "/Users/kschmidt/ureka/Ureka/python/lib/python2.7/multiprocessing/pool.py", line 554, in get
   raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: 'array([[ 0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
         2.15167743e-09,   2.95775204e-09,   4.01545761e-09],
      [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
         1.32669936e-07,   1.83914957e-07,   2.51180392e-07],
      [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
         4.72068740e-06,   6.59109549e-06,   9.04678512e-06],
      ...,
      [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
         2.81101082e-03,  -1.03879050e-02,  -2.16530300e-02],
      [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
         2.08807770e-03,  -2.75243848e-03,  -7.15100488e-03],
      [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00, ...,
         4.09917250e-04,  -7.14648361e-04,  -1.93858773e-03]])'. Reason: 'SystemError('NULL result without error in PyObject_Call',)'

This happens in the t.get() call, which tries to store the output from a multiprocessing worker in the result list. The problem occurred when I ran the code on my laptop and asked it to use only 2 CPUs. When I allowed it to use 4 CPUs, it no longer had problems handling the large arrays. But if you only have two CPUs at your disposal, this might be a problem.

Comments (5)

  1. Christian Herenz repo owner

    Can you please give me the full command that you used in this case? Looking at the error and at the code I'm completely lost why this happens. Even for 2 cores the logic should distribute the tasks more or less equally... But I will try to figure out what is going on.

  2. Kasper Schmidt reporter

    Sorry for the late reply. The full set of commands I used for the testing (including the lsd_cc_spectral.py command) is:

    #! /bin/bash
    
    NCPU=4 #48
    
    # PSF Parameters
    field_id=15
    p0=0.836496041376
    p1=-4.42958396561e-05
    
    # PATH + filenames - note how the filenames are generated here from field_id
    field_path=/Volumes/DATABCKUP3/MUSE/candels-cdfs-${field_id}/
    input_cube=${field_path}DATACUBE_candels-cdfs-${field_id}_v1.0.fits
    effnoise_file=${field_path}EFFNOISE_5px_candels-cdfs-${field_id}_v1.0.fits
    input_cube_base=`basename ${input_cube}`
    output_dir=$field_path  #/Users/kschmidt/work/MUSE/lsdcatruns/
    
    echo ----------------------------
    echo "NB enable launch of individual commands in script by removing the '######'s"
    echo ----------------------------
    echo "Data dir :" $field_path
    echo "Field ID :" $field_id
    echo "PSF p0   :" $p0
    echo "PSF p1   :" $p1
    echo "NCPU     :" $NCPU
    echo "Out dir  :" $output_dir
    
    echo ----------------------------
    echo Median filtering:
    med_filt_output=${output_dir}median_filtered_${input_cube_base}
    med_filt_com="median-filter-cube.py ${input_cube} --signalHDU=1 --varHDU=2 --num_cpu=${NCPU} --width=151 \
         --output=${med_filt_output}"
    echo ${med_filt_com}
    ###### ${med_filt_com}
    
    echo ----------------------------
    echo Applying effective noise:
    apply_eff_noise_com="apply_eff_noise.py ${med_filt_output} ${effnoise_file} --NHDU=1 --blowup --rsexp \
        --output=${med_filt_output}_effnoised.fits"
    echo ${apply_eff_noise_com}
    ###### ${apply_eff_noise_com}
    
    echo ----------------------------
    echo Spatial cross-correlation:
    med_filt_output_base=`basename ${med_filt_output} .fits`
    spat_cced_out=${output_dir}spat_cced_${med_filt_output_base}_effnoised.fits
    lsd_cc_spat_com="lsd_cc_spatial.py --input=${med_filt_output}_effnoised.fits --SHDU=0 --NHDU=4 \
        --threads=${NCPU} --gaussian --lambda0=7050 -p0=${p0} -p1=${p1} --output=${spat_cced_out}"
    echo ${lsd_cc_spat_com}
    ###### ${lsd_cc_spat_com}
    
    echo ----------------------------
    echo Spectral cross-correlation:
    spat_cced_out_base=`basename ${spat_cced_out}`
    spec_cced_out=${output_dir}spec_cced_${spat_cced_out_base}
    lsd_cc_spec_com="lsd_cc_spectral.py --input=${spat_cced_out} --threads=${NCPU} --FWHM=250 --SHDU=0 --NHDU=1 \
        --output=${spec_cced_out}"
    echo ${lsd_cc_spec_com}
    ###### ${lsd_cc_spec_com}
    
    echo ----------------------------
    echo Creation of S/N cube:
    s2n_com="s2n-cube.py --input=${spec_cced_out} \
        --output=${output_dir}s2n_candels-cdfs-${field_id}.fits --nanmask=${input_cube} \
        --nanmaskhdu=4" # KBS nanmaskhdu changed from default 3 (EXP 401x401x3681 cube) to 4 (white 401x401 img)
    echo ${s2n_com}
    ###### ${s2n_com}
    
    echo ----------------------------
    echo "If you now want to detect objects take a look at:"
    echo "catalogscript_template160826.sh and/or forcefluxscript_template160826.sh"
    

    I hope this helps.

  3. Christian Herenz repo owner

    OK - this is very reasonable and I would use it the same way.

    The reason it breaks for 2 CPUs is something in Python, and from what I understand it will not be fixed in the 2.7 branch: http://bugs.python.org/issue17560

    Basically, the data chunks are too large for the multiprocessing call. In principle, if you had even larger data, it could break on 3 or more CPUs... mh... There are workarounds... For example, in the thread they suggest https://pythonhosted.org/joblib/ - but then maybe it would make more sense to port it to Python 3 at some point.
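
To illustrate why two workers can fail where four succeed, here is a rough size estimate. It is only a sketch under assumptions that are not stated in this thread: the cube has the 401x401x3681 shape mentioned in the script comment, it is held as float64, the work is split evenly between workers, and a single result sent back by a worker cannot exceed roughly 2 GiB in Python 2.7 (my reading of the linked bug report).

```python
# Rough estimate of the result size each worker has to send back.
# All numbers below are assumptions for illustration, not measured values.
NX, NY, NZ = 401, 401, 3681   # cube shape quoted in the script comment
BYTES_PER_VALUE = 8           # float64 (assumption)
LIMIT = 2**31 - 1             # ~2 GiB per-result limit (assumption)


def chunk_bytes(n_workers):
    """Bytes in one worker's share if the cube is split evenly."""
    total = NX * NY * NZ * BYTES_PER_VALUE
    return total // n_workers


for n in (2, 3, 4):
    size = chunk_bytes(n)
    status = "over the limit" if size > LIMIT else "under the limit"
    print("%d workers: %.2f GiB per result, %s" % (n, size / 2.0**30, status))
```

Under these assumptions the per-worker result is above the limit for 2 workers and below it for 3 or 4, which would match the behaviour reported above. A cube stored as float32 would halve all of these sizes.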

    For now, we keep this open as a Known Issue / Limitation. I will also write something about it in the README.
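
One possible workaround, sketched here only as an illustration, is to split the data into more chunks than there are workers, so that each result passed back through t.get() stays small regardless of the CPU count. This is not the code in wavelength_smooth_lib.py: the names split_ranges, smooth_block and filter_chunked are hypothetical, plain lists stand in for the NumPy arrays, and the doubling in smooth_block is a placeholder for the real filter.

```python
import multiprocessing


def split_ranges(n_items, n_chunks):
    """Return (start, stop) pairs that cover range(n_items) in n_chunks pieces."""
    n_chunks = max(1, min(n_chunks, n_items))
    base, extra = divmod(n_items, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional item.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def smooth_block(rows):
    # Placeholder for the real per-chunk filter (hypothetical).
    return [[2.0 * value for value in row] for row in rows]


def filter_chunked(rows, n_workers, chunks_per_worker=4):
    """Process rows in many small chunks instead of one large chunk per worker."""
    ranges = split_ranges(len(rows), n_workers * chunks_per_worker)
    pool = multiprocessing.Pool(processes=n_workers)
    try:
        tasks = [pool.apply_async(smooth_block, (rows[a:b],)) for a, b in ranges]
        # Each t.get() now transfers only a fraction of the per-worker share.
        parts = [t.get() for t in tasks]
    finally:
        pool.close()
        pool.join()
    result = []
    for part in parts:
        result.extend(part)
    return result
```

Called from a script guarded by `if __name__ == "__main__":`, for example as `filter_chunked(rows, n_workers=2)`, this would send eight smaller results back instead of two large ones. The chunks_per_worker value is a tuning knob, and the right setting depends on the cube size.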
