Error in Multiprocessing.py with Singularity
Hello,
I run presto in a Singularity 3.4.1 container on a Slurm cluster with Nextflow, and when multiple instances are running I randomly get this error:
Command output:
clip-c2-70 98352
IDENTIFIER: 98352
DIRECTORY: .
PRESTO VERSION: 0.5.13-2019.08.29
START
1: FilterSeq quality 17:49 03/31/20
ERROR:
Traceback (most recent call last):
  File "/usr/local/bin/FilterSeq.py", line 239, in <module>
    filterSeq(**args_dict)
  File "/usr/local/bin/FilterSeq.py", line 83, in filterSeq
    nproc, queue_size)
  File "/usr/local/lib/python3.7/site-packages/presto/Multiprocessing.py", line 197, in manageProcesses
    alive = mp.Value(ctypes.c_bool, True)
  File "/usr/lib64/python3.7/multiprocessing/context.py", line 135, in Value
    ctx=self.get_context())
  File "/usr/lib64/python3.7/multiprocessing/sharedctypes.py", line 74, in Value
    obj = RawValue(typecode_or_type, *args)
  File "/usr/lib64/python3.7/multiprocessing/sharedctypes.py", line 49, in RawValue
    obj = _new_value(type_)
  File "/usr/lib64/python3.7/multiprocessing/sharedctypes.py", line 41, in _new_value
    wrapper = heap.BufferWrapper(size)
  File "/usr/lib64/python3.7/multiprocessing/heap.py", line 263, in __init__
    block = BufferWrapper._heap.malloc(size)
  File "/usr/lib64/python3.7/multiprocessing/heap.py", line 242, in malloc
    (arena, start, stop) = self._malloc(size)
  File "/usr/lib64/python3.7/multiprocessing/heap.py", line 134, in _malloc
    arena = Arena(length)
  File "/usr/lib64/python3.7/multiprocessing/heap.py", line 74, in __init__
    dir=self._choose_dir(size))
  File "/usr/lib64/python3.7/tempfile.py", line 340, in mkstemp
    return _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/usr/lib64/python3.7/tempfile.py", line 258, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
PermissionError: [Errno 13] Permission denied: '/dev/shm/pym-49784-3so0rtft'
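For what it's worth, the failing call reproduces outside of presto: per the traceback, Multiprocessing.py line 197 boils down to the snippet below (a minimal sketch of that one call, nothing presto-specific, runnable inside the same container):

    import ctypes
    import multiprocessing as mp

    # Same call as presto's manageProcesses (Multiprocessing.py line 197).
    # On Linux, CPython backs the shared value with a temporary file, and
    # multiprocessing.heap prefers /dev/shm whenever it reports free space,
    # which is where the PermissionError above is raised.
    if __name__ == '__main__':
        alive = mp.Value(ctypes.c_bool, True)
        print('shared value allocated:', alive.value)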
This happens more or less randomly, and I suspect it occurs when several of the instances land on the same node. It did not happen when I processed only one dataset. Could that be the cause?
It's a bit difficult to debug. Do you know what I could do?
Thank you very much,
ido
Comments (4)
-
This one is hard to debug. We do see this on some computing clusters, more often with AssemblePairs or AlignSets. I think it’s caused by running out of allocated memory. @Julian Zhou, did you have any luck working around this on farnam?
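If someone can catch a failing node, it might help to check whether /dev/shm is actually writable inside the container there, since that is where Python 3.7's multiprocessing heap puts its backing files when it reports free space. A quick diagnostic sketch (not part of presto, just the same stdlib calls the traceback goes through):

    import os
    import tempfile

    # Report free space in /dev/shm (the check multiprocessing's
    # Arena._choose_dir performs), then try to create a file there the
    # same way multiprocessing.heap does via tempfile.mkstemp.
    st = os.statvfs('/dev/shm')
    print('free bytes in /dev/shm:', st.f_bavail * st.f_frsize)
    try:
        fd, name = tempfile.mkstemp(prefix='pym-test-', dir='/dev/shm')
        os.close(fd)
        os.unlink(name)
        print('/dev/shm is writable')
    except PermissionError as exc:
        print('cannot write to /dev/shm:', exc)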
-
I’ve actually not encountered this particular problem, or anything to do with multiprocessing or FilterSeq, really. @Jason Vander Heiden I think you might be thinking of my getting stuck with AssemblePairs, but rather because of blastn and something to do with the file system, and not because of MPI.
-
Ah, yeah, I was thinking of #65.
-
- changed status to resolved
Reopen if it reappears.
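If it does reappear, one possible workaround to experiment with (an untested sketch that relies on the private _dir_candidates attribute of multiprocessing.heap.Arena in CPython 3.7, so it may break across Python versions) is to stop multiprocessing from preferring /dev/shm, so allocations fall back to the TMPDIR-controlled temp directory:

    import multiprocessing.heap as heap

    # Untested sketch: with an empty candidate list, Arena._choose_dir()
    # skips /dev/shm and falls back to util.get_temp_dir(), which honors
    # TMPDIR. This would need to run before presto starts its workers,
    # e.g. from a sitecustomize.py inside the container.
    heap.Arena._dir_candidates = []

With TMPDIR pointed at node-local scratch, that should avoid /dev/shm entirely; bind-mounting a writable directory over /dev/shm in the Singularity call, if the cluster allows it, might be the cleaner fix.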