Error in Multiprocessing.py with Singularity

Issue #72 resolved
Ido Tamir created an issue

Hello,

I run presto in a Singularity 3.4.1 container on a Slurm cluster with Nextflow. When multiple instances are running, I randomly get this error:

Command output:
  clip-c2-70 98352
  IDENTIFIER: 98352
  DIRECTORY: .
  PRESTO VERSION: 0.5.13-2019.08.29

  START
     1: FilterSeq quality        17:49 03/31/20
  ERROR:
      Traceback (most recent call last):
        File "/usr/local/bin/FilterSeq.py", line 239, in <module>
          filterSeq(**args_dict)
        File "/usr/local/bin/FilterSeq.py", line 83, in filterSeq
          nproc, queue_size)
        File "/usr/local/lib/python3.7/site-packages/presto/Multiprocessing.py", line 197, in manageProcesses
          alive = mp.Value(ctypes.c_bool, True)
        File "/usr/lib64/python3.7/multiprocessing/context.py", line 135, in Value
          ctx=self.get_context())
        File "/usr/lib64/python3.7/multiprocessing/sharedctypes.py", line 74, in Value
          obj = RawValue(typecode_or_type, *args)
        File "/usr/lib64/python3.7/multiprocessing/sharedctypes.py", line 49, in RawValue
          obj = _new_value(type_)
        File "/usr/lib64/python3.7/multiprocessing/sharedctypes.py", line 41, in _new_value
          wrapper = heap.BufferWrapper(size)
        File "/usr/lib64/python3.7/multiprocessing/heap.py", line 263, in __init__
          block = BufferWrapper._heap.malloc(size)
        File "/usr/lib64/python3.7/multiprocessing/heap.py", line 242, in malloc
          (arena, start, stop) = self._malloc(size)
        File "/usr/lib64/python3.7/multiprocessing/heap.py", line 134, in _malloc
          arena = Arena(length)
        File "/usr/lib64/python3.7/multiprocessing/heap.py", line 74, in __init__
          dir=self._choose_dir(size))
        File "/usr/lib64/python3.7/tempfile.py", line 340, in mkstemp
          return _mkstemp_inner(dir, prefix, suffix, flags, output_type)
        File "/usr/lib64/python3.7/tempfile.py", line 258, in _mkstemp_inner
          fd = _os.open(file, flags, 0o600)
      PermissionError: [Errno 13] Permission denied: '/dev/shm/pym-49784-3so0rtft'

This happens more or less randomly, and I suspect it occurs when several processes land on the same node. It did not happen when I processed only one dataset. Is this possible?

It's a bit difficult to debug. Do you know what I could do?
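The failing call in the traceback is `mp.Value(ctypes.c_bool, True)`: on Linux, multiprocessing backs shared ctypes objects with temp files created (via `tempfile.mkstemp`) in `/dev/shm`. A minimal probe, not part of presto, that checks whether `/dev/shm` is actually writable before the same call is attempted might look like this:

```python
import ctypes
import multiprocessing as mp
import os
import tempfile


def shm_usable(shm_dir="/dev/shm"):
    """Check whether we can create (and remove) a file in shm_dir.

    This mimics what multiprocessing's heap allocator does when backing
    shared ctypes objects such as mp.Value: it creates temp files in
    /dev/shm via tempfile.mkstemp.
    """
    try:
        fd, path = tempfile.mkstemp(dir=shm_dir)
    except OSError:  # PermissionError, FileNotFoundError, ENOSPC, ...
        return False
    os.close(fd)
    os.remove(path)
    return True


if __name__ == "__main__":
    if shm_usable():
        # The same call that raises PermissionError in the traceback above.
        alive = mp.Value(ctypes.c_bool, True)
        print("shared memory OK, alive =", alive.value)
    else:
        print("/dev/shm is not usable in this environment")
```

Running this inside the container (e.g. via `singularity exec`) on the affected node could confirm whether the mount itself is the problem, independent of presto.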

Thank you very much,

ido

Comments (4)

  1. Jason Vander Heiden

    This one is hard to debug. We do see this on some computing clusters, more often with AssemblePairs or AlignSets. I think it’s caused by running out of allocated memory.
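    If the cause is `/dev/shm` filling up rather than a permission problem, a quick diagnostic (a sketch, not part of presto) is to report the free space on the shared-memory mount with `os.statvfs`; Python's allocator only uses `/dev/shm` when it believes enough space is free there:

    ```python
    import os


    def shm_free_bytes(shm_dir="/dev/shm"):
        """Return the free space (in bytes) on the filesystem holding shm_dir.

        multiprocessing's Arena allocator consults free space like this when
        deciding whether to place its backing file in /dev/shm.
        """
        st = os.statvfs(shm_dir)
        return st.f_bavail * st.f_frsize


    if __name__ == "__main__":
        print("free bytes on /dev/shm:", shm_free_bytes())
    ```

    Comparing this value on nodes where jobs fail versus nodes where they succeed would support or rule out the out-of-memory hypothesis.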

    @Julian Zhou , did you have any luck working around this on farnam?

  2. Julian Zhou

    I’ve actually not encountered this particular problem, or anything to do with multiprocessing or FilterSeq, really. @Jason Vander Heiden, I think you might be thinking of my getting stuck with AssemblePairs, but that was because of blastn and something to do with the file system, not because of MPI.