hang when sending Python object given particular import structure
Hi, I’ve been having a strange issue when trying to send a complicated Python object using mpi4py’s send. Trying to send such an object results in a hang, but only if worker processes have been initialized in a particular way. Here’s my best minimal example so far (where I load a certain pickled Python object from test_object.dat):
import cPickle

BIG_BAGGAGE = True
WORKER_IN_SEPARATE_FILE = True

if WORKER_IN_SEPARATE_FILE:
    from debug_simple_mpi import *
else:
    execfile("debug_simple_mpi.py")

if my_rank == 0:
    if BIG_BAGGAGE:
        with open("test_object.dat", "rb") as fin:
            baggage = cPickle.load(fin)
    else:
        baggage = "a small string"
    # send job to worker and wait for result
    comm.send(("'Test message'", baggage), dest=1)
    result = comm.recv(source=1)
    print("Master successfully got result: {}".format(result))
where debug_simple_mpi.py sets up the worker:
from mpi4py import MPI

comm = MPI.COMM_WORLD
my_rank = comm.Get_rank()

if my_rank != 0:
    # Wait for a message
    message = comm.recv(source=0)
    # Extract command from message
    command, baggage = message
    # Run command and send result
    result = eval(command)
    comm.send(result, dest=0)
This code works for all cases except when both BIG_BAGGAGE and WORKER_IN_SEPARATE_FILE are True, in which case it hangs. In particular, the message is sent fine when the worker has been initialized within the same file (and there’s no issue in any case if the message is small).
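One difference between the two modes that may be relevant: with "from debug_simple_mpi import *", the worker's blocking comm.recv runs while the module is still being imported, and Python 2 holds a global import lock for that whole time, whereas execfile does not involve the import machinery. Whether that explains this hang is not established in this thread. A minimal sketch that sidesteps the question by moving the worker logic into a function that is called only after the import has finished (run_worker and EchoComm are hypothetical names; EchoComm is a stand-in for MPI.COMM_WORLD so the sketch runs without MPI):

```python
def run_worker(comm):
    """Receive one (command, baggage) message from rank 0 and reply with the result."""
    command, baggage = comm.recv(source=0)
    # eval mirrors the original example; baggage is in scope for the command
    result = eval(command)
    comm.send(result, dest=0)
    return result


class EchoComm(object):
    """Hypothetical stand-in for MPI.COMM_WORLD, used only to exercise run_worker."""

    def __init__(self, incoming):
        self.incoming = incoming  # the message that recv() will hand back
        self.sent = []            # records (dest, obj) pairs passed to send()

    def recv(self, source=0):
        return self.incoming

    def send(self, obj, dest=0):
        self.sent.append((dest, obj))


if __name__ == "__main__":
    fake = EchoComm(("'Test message'", "a small string"))
    run_worker(fake)
    print(fake.sent)
```

With real MPI, debug_simple_mpi.py would define only comm, my_rank and run_worker, and the main script would call run_worker(comm) on ranks other than 0 after the import statement has completed.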
I realize this may be due to something in my own code from the Python object in test_object.dat. But it’s very confusing to me why it would matter “where” the worker was initialized. Do you happen to have any ideas?
I’m on macOS 10.14.6, Python 2.7.17, Open MPI 4.0.3. Thanks.
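mpi4py's lowercase send and recv pickle the object on the sending rank and unpickle it on the receiving rank, so a hang can originate in the pickle round trip rather than in the transport. A minimal sketch for checking the round trip on its own, without MPI (check_roundtrip is a hypothetical helper and the choice of protocol is an assumption; under Python 2, cPickle could be used in place of pickle):

```python
import pickle
import time


def check_roundtrip(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle and unpickle obj locally; return (size_in_bytes, seconds, copy)."""
    start = time.time()
    payload = pickle.dumps(obj, protocol)  # what the sending rank would do
    copy = pickle.loads(payload)           # what the receiving rank would do
    elapsed = time.time() - start
    return len(payload), elapsed, copy


if __name__ == "__main__":
    size, seconds, copy = check_roundtrip({"data": list(range(1000))})
    print("pickled size: {} bytes, round trip: {:.4f} s".format(size, seconds))
```

This only exercises a single process. A hang that shows up only on the receiving rank would need the same check run there, for example by sending the pickled bytes as a plain string and calling pickle.loads explicitly on the worker.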
Comments (6)
- Where is the hanging happening? On the sender side (rank 0) or the receiving side (rank 1)? What’s the size in bytes of the file test_object.dat? Does it fail if you send a big string of several megabytes, e.g. baggage = "A" * 2**25?
- reporter Thanks for the quick response.
  It seems to hang when rank 0 has sent the message (it gets past the comm.send line) and rank 1 is waiting to receive it. The size of test_object.dat is 26k.
  It fails in a different way when trying to send a large object like your big string. In that case, rank 0 never gets past the comm.send line, regardless of WORKER_IN_SEPARATE_FILE. (A slightly smaller string, "A" * 2**24, works fine in all cases.)
- Weird. I just ran your code on macOS 10.14.6 (Mojave), Python 2.7.16 (system, Apple-provided Python), and Open MPI 4.0.3 (installed with Homebrew). I used baggage = "A" * 2**25 (also tried 2**30), and everything seems to work. I get an error/warning message at the end, but that’s most likely an Open MPI bug/issue at finalization.
  At this point, I’m clueless about what’s going on in your system. I would try to run things with MPICH, just to rule out that the issue is somehow Python/mpi4py related. I did, and it worked just fine.
- reporter Okay, I solved the large string problem by reinstalling both Open MPI and mpi4py using Anaconda (before, I had installed them using Homebrew and pip). Now baggage = "A" * 2**30 always works.
  Unfortunately my original problem with the more complicated Python object still remains (hanging when BIG_BAGGAGE and WORKER_IN_SEPARATE_FILE are True). Anything else I should try?
- reporter After further debugging, it appears the hang is happening when C code that the object depends on is recompiled, which happens automatically as soon as it is unpickled (I was unaware such a thing was possible!). I dare say this is not an issue for mpi4py. Thanks for your help, and I’ll close the issue for now (I may update this thread if we solve the problem in case someone else has a similar issue).
- reporter changed status to resolved
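The finding that work can start automatically on unpickling follows from how pickle reconstructs objects: on load it calls whatever callable the class registered through __reduce__ (or runs __setstate__), so loading an object can execute arbitrary code. A minimal sketch with a hypothetical class NeedsRebuild, where appending to a list stands in for a slow step such as recompiling C code:

```python
import pickle

REBUILD_CALLS = []  # records each time the stand-in "rebuild" step runs


def rebuild(name):
    """Hypothetical stand-in for an expensive step triggered at load time."""
    REBUILD_CALLS.append(name)
    return NeedsRebuild(name)


class NeedsRebuild(object):
    def __init__(self, name):
        self.name = name

    def __reduce__(self):
        # Tell pickle to call rebuild(name) whenever this object is loaded
        return (rebuild, (self.name,))


if __name__ == "__main__":
    payload = pickle.dumps(NeedsRebuild("kernel"))
    print("rebuild calls before load: {}".format(len(REBUILD_CALLS)))
    restored = pickle.loads(payload)
    print("rebuild calls after load: {}".format(len(REBUILD_CALLS)))
```

Because comm.recv unpickles the incoming object, any such load-time step runs inside the recv call on the receiving rank, which can make a slow or blocked rebuild look like a communication hang.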