hang when sending Python object given particular import structure

Issue #163 resolved
Bryan Daniels created an issue

Hi, I’ve been having a strange issue when trying to send a complicated Python object using mpi4py’s send. Trying to send such an object results in a hang, but only if the worker processes have been initialized in a particular way. Here’s my best minimal example so far (where I load a certain pickled Python object from test_object.dat):

import cPickle

BIG_BAGGAGE = True
WORKER_IN_SEPARATE_FILE = True

if WORKER_IN_SEPARATE_FILE:
    from debug_simple_mpi import *
else:
    execfile("debug_simple_mpi.py")

if my_rank == 0:
    if BIG_BAGGAGE:
        with open("test_object.dat","rb") as fin:
            baggage = cPickle.load(fin)
    else:
        baggage = "a small string"

    # send job to worker and wait for result
    comm.send(("'Test message'",baggage),dest=1)
    result = comm.recv(source=1)

    print("Master successfully got result: {}".format(result))

where debug_simple_mpi.py sets up the worker:

from mpi4py import MPI
comm = MPI.COMM_WORLD
my_rank = comm.Get_rank()

if my_rank != 0:
    # Wait for a message
    message = comm.recv(source=0)
    # Extract command from message
    command, baggage = message
    # Run command and send result
    result = eval(command)
    comm.send(result, dest=0)

This code works for all cases except when both BIG_BAGGAGE and WORKER_IN_SEPARATE_FILE are True, in which case it hangs. In particular, the message is sent fine when the worker has been initialized within the same file (and there’s no issue in any case if the message is small).

I realize this may be due to something in my own code behind the Python object in test_object.dat. But it’s very confusing to me why it would matter “where” the worker was initialized. Do you happen to have any ideas?

I’m on macOS 10.14.6, Python 2.7.17, Open MPI 4.0.3. Thanks.
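One structural difference between the two modes described above: with `from debug_simple_mpi import *`, the worker's `comm.recv` and `eval` run while that module is still being imported, whereas `execfile` runs them as ordinary top-level code. The thread does not establish whether this is what triggers the hang, so the following is only a sketch of a worker that does nothing blocking at import time. The name `serve_one_job` and the `master_rank` parameter are hypothetical; `comm` is assumed to be an mpi4py communicator such as `MPI.COMM_WORLD`, passed in by the caller.

```python
def serve_one_job(comm, master_rank=0):
    """Receive one (command, baggage) message, evaluate it, and reply.

    `comm` is assumed to provide recv(source=...) and send(obj, dest=...),
    as an mpi4py communicator does. Nothing here runs at import time.
    """
    # Wait for a message from the master rank
    command, baggage = comm.recv(source=master_rank)
    # Run the command; like the original example, this trusts the sender
    result = eval(command)
    # Send the result back to the master rank
    comm.send(result, dest=master_rank)
    return result
```

With this layout the calling script would create the communicator itself and call `serve_one_job(comm)` on non-zero ranks only after all of its imports have finished.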

Comments (6)

  1. Lisandro Dalcin

    Where is the hang happening? On the sender side (rank 0) or the receiving side (rank 1)? What’s the size in bytes of the file test_object.dat? Does it fail if you send a big string of several megabytes, e.g. baggage = "A" * 2**25?
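One way to answer the "where does it hang" question is to have each rank dump its Python stack if a call takes too long. The sketch below uses the standard-library `faulthandler` module, which assumes Python 3.3 or newer (the reporter's Python 2.7 does not ship it). The helper name `run_with_watchdog` and its parameters are hypothetical.

```python
import faulthandler


def run_with_watchdog(func, timeout_s, log_file):
    """Call func(); if it is still running after timeout_s seconds,
    write the traceback of every thread to log_file (the call continues).

    log_file must be a real file object with a file descriptor,
    for example sys.stderr or a file opened per rank.
    """
    faulthandler.dump_traceback_later(timeout_s, file=log_file)
    try:
        return func()
    finally:
        # Disarm the watchdog once the call has returned or raised
        faulthandler.cancel_dump_traceback_later()
```

For instance, wrapping the worker's receive as `run_with_watchdog(lambda: comm.recv(source=0), 30, sys.stderr)` would show which line the worker is stuck on, assuming `comm` is the communicator from the example.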

  2. Bryan Daniels reporter

    Thanks for the quick response.

    It seems to hang when rank 0 has sent the message (it gets past the comm.send line) and rank 1 is waiting to receive it. The size of test_object.dat is 26k.

    It fails in a different way when trying to send a large object like your big string. In that case, rank 0 never gets past the comm.send line, regardless of WORKER_IN_SEPARATE_FILE. (A slightly smaller string, "A" * 2**24, works fine in all cases.)
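Since mpi4py's lowercase `send` pickles the object before transmitting it, the number that matters for the size comparisons above is the pickled size rather than the in-memory size. A minimal sketch for measuring it follows; the helper name `pickled_size` is hypothetical, and the default protocol is an assumption that may differ from the one a given mpi4py version uses.

```python
import pickle


def pickled_size(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Return the number of bytes obj occupies once pickled."""
    return len(pickle.dumps(obj, protocol))
```

Calling `pickled_size(baggage)` on rank 0 just before `comm.send` gives a rough idea of the message size for each of the cases discussed here.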

  3. Lisandro Dalcin

    Weird. I just ran your code on macOS 10.14.6 (Mojave), Python 2.7.16 (system, Apple-provided Python), and Open MPI 4.0.3 (installed with Homebrew). I used baggage = "A" * 2**25 (also tried 2**30), and everything seems to work. I get an error/warning message at the end, but that’s most likely an Open MPI bug/issue at finalization.

    At this point, I’m clueless about what’s going on in your system. I would try to run things with MPICH, just to rule out that the issue is somehow Python/mpi4py related. I did, and it worked just fine.

  4. Bryan Daniels reporter

    Okay, I solved the large string problem by reinstalling both Open MPI and mpi4py using Anaconda (previously I had installed them using Homebrew and pip). Now baggage = "A" * 2**30 always works.

    Unfortunately my original problem with the more complicated Python object still remains (hanging when BIG_BAGGAGE and WORKER_IN_SEPARATE_FILE are True). Anything else I should try?

  5. Bryan Daniels reporter

    After further debugging, it appears the hang is happening when C code that the object depends on is recompiled, which happens automatically as soon as it is unpickled (I was unaware such a thing was possible!). I dare say this is not an issue for mpi4py. Thanks for your help, and I’ll close the issue for now (I may update this thread if we solve the problem in case someone else has a similar issue).
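The behaviour described in the last comment is possible because unpickling an object imports the module that defines its class, and importing a module runs all of its top-level code. The sketch below demonstrates this with a throwaway module written to a temporary directory. The module name, its contents, and the helper `unpickle_in_fresh_state` are all hypothetical stand-ins, not the reporter's code, and Python 3 is assumed.

```python
import importlib
import os
import pickle
import shutil
import sys
import tempfile

MODULE_NAME = "baggage_demo_module"  # hypothetical module name

MODULE_SOURCE = (
    "# Top-level code runs on every fresh import of this module.\n"
    "IMPORT_LOG = ['setup ran at import time']  # stand-in for a recompile\n"
    "\n"
    "class Baggage(object):\n"
    "    def __init__(self, value):\n"
    "        self.value = value\n"
)


def unpickle_in_fresh_state():
    """Pickle an object, forget its module, then unpickle it.

    Returns (restored_value, module_was_reimported).
    """
    tmpdir = tempfile.mkdtemp()
    with open(os.path.join(tmpdir, MODULE_NAME + ".py"), "w") as f:
        f.write(MODULE_SOURCE)
    sys.path.insert(0, tmpdir)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(MODULE_NAME)
        payload = pickle.dumps(module.Baggage(42))

        # Simulate a process that has not imported the module yet
        del sys.modules[MODULE_NAME]
        restored = pickle.loads(payload)  # triggers a fresh import
        reimported = MODULE_NAME in sys.modules
        return restored.value, reimported
    finally:
        # Clean up the temporary module and import state
        sys.path.remove(tmpdir)
        sys.modules.pop(MODULE_NAME, None)
        shutil.rmtree(tmpdir, ignore_errors=True)
```

In the demo the import-time work is just a list assignment; in a real package it could be anything, including launching a compiler.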
