MPI.COMM_SELF.Spawn cannot spawn when called from a script in the background (might be a BUG)
Hi,
I was testing the mpi4py unit tests in `mpi4py/test` and found what might be a bug. What I tested is `test_spawn.py`. I have three test cases:
- It works fine when run as `python test_spawn.py`.
- It still works fine when run as `mpirun --oversubscribe -np 2 -H host1,host2 python test_spawn.py`, or in the background: `mpirun --oversubscribe -np 2 -H host1,host2 python test_spawn.py &`. I set up both `host1` and `host2` with the same environment. (Although there might be some tmp-file mismatches, it still passes a few tests.)
- Here comes the bug (I think it is one): I use two scripts, `script.sh` and `run.sh`. `script.sh` is:

  `mpirun --oversubscribe -np 2 -H host1,host2 python test_spawn.py`

  `run.sh` is just:

  `sh script.sh &`

  In this case, there is a timeout exception:
[@nmyjs_104_22 test]$ [nmyjs_104_37:34159] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 193
E[nmyjs_104_22:23226] OPAL ERROR: Timeout in file base/pmix_base_fns.c at line 193
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_dpm_dyn_init() failed
--> Returned "Timeout" (-15) instead of "Success" (0)
--------------------------------------------------------------------------
[warn] Epoll ADD(4) on fd 35 failed. Old events were 0; read change was 0 (none); write change was 1 (add): Bad file descriptor
[warn] Epoll ADD(4) on fd 51 failed. Old events were 0; read change was 0 (none); write change was 1 (add): Bad file descriptor
[warn] Epoll ADD(4) on fd 48 failed. Old events were 0; read change was 0 (none); write change was 1 (add): Bad file descriptor
[warn] Epoll ADD(4) on fd 30 failed. Old events were 0; read change was 0 (none); write change was 1 (add): Bad file descriptor
I think this is a bug, because the only difference between case 2 and case 3 is that case 3 invokes `mpirun ...` in the background from a bash script. I saw this exception in my own project too, so I'm guessing it is a bug in mpi4py or Open MPI.
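To make the failing case easy to reproduce, the setup above can be condensed into one sketch that writes out both scripts (`host1`/`host2` are placeholder hostnames from my cluster; adjust paths for your environment):

```shell
#!/bin/sh
# Reproduction sketch: create the two scripts described above.
# Running "sh run.sh" afterwards reproduces the PMIx timeout for me.

cat > script.sh <<'EOF'
#!/bin/sh
# Foreground mpirun (case 2) -- this works.
mpirun --oversubscribe -np 2 -H host1,host2 python test_spawn.py
EOF

cat > run.sh <<'EOF'
#!/bin/sh
# Backgrounding the same command from a script (case 3) -- this fails.
sh script.sh &
EOF

chmod +x script.sh run.sh
```

The only difference between the working and failing cases is the trailing `&` inside `run.sh`.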
Comments (5)
I'm almost sure this issue is not mpi4py's fault but the backend MPI implementation's. You seem to be using Open MPI, but you have not stated its version. I guess you have to ask the Open MPI folks about it; IIRC, Open MPI 2.x releases had issues with spawning.
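For reference, the installed Open MPI version can be checked with either of these standard Open MPI commands (assuming Open MPI is on your `PATH`):

```shell
# Print the Open MPI version; the fallbacks just report a missing install.
mpirun --version 2>/dev/null | head -n 1 || echo "mpirun not found"
ompi_info --version 2>/dev/null || echo "ompi_info not found"
```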
-
- changed status to invalid
I'm marking this issue as invalid. If you can provide actual evidence that this is indeed a bug in mpi4py, then I'll reopen it and work on any required fixes.
-
reporter Hi Lisandro, I'm using `openmpi 2.0.2`. I'll try other versions to see if that works. Thank you.