Example mpi4py code not working

Issue #54 resolved
jefflarson created an issue

Hello,

I am trying to run some previously-working code using mpi4py and it now just hangs. Even the example code

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

print('my rank is: ', rank)
print('the comm size is : ', comm.Get_size())

if rank == 0:
   print('about to send ', rank)
   data = {'a': 7, 'b': 3.14}
   comm.send(data, dest=1, tag=11)
   print('finished sending ', rank)
elif rank == 1:
   print('about to receive ', rank)
   data = comm.recv(source=0, tag=11)
   print('finished receiving ', rank)

just hangs (rank 1 does not complete its recv, and reports no error) when I run as follows:

$ mpiexec -np 2 python2 code.py 
('my rank is: ', 1)
('the comm size is : ', 2)
('about to receive ', 1)
('my rank is: ', 0)
('the comm size is : ', 2)
('about to send ', 0)
('finished sending ', 0)

My python/mpi4py build is as follows:

>>> sys.version
'2.7.6 (default, Jul  8 2014, 15:12:40) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]'
>>> mpi4py.get_config()
{u'mpicc': u'/soft/mvapich2/2.2b_psm/intel-16.0/bin/mpicc'}
>>> mpi4py.get_include()
'/home/jlarson/.local/lib/python2.7/site-packages/mpi4py/include'
>>> mpi4py.__version__
'2.0.1a0'

Is this a known mvapich2 issue? I am happy to provide any additional information.

Thank you for your help, Jeff

Comments (16)

  1. Lisandro Dalcin

    Impossible to tell what's going on without more information, and for sure this is not an mpi4py issue. Where exactly does the hang happen? Can you add some "print" statements after from mpi4py import MPI to make sure the import line succeeded? Do you get the expected values for comm.Get_rank() and comm.Get_size()? Maybe the send call succeeds, but the recv call hangs? Please add some manual instrumentation to your code (i.e., some "print" lines here and there) to figure out where the hang occurs, so you can provide a more complete report.
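
    For example, something along these lines (just a sketch, not the only way to do it): rank-tagged prints with explicit flushes, so buffered output does not hide how far each rank got under mpiexec:

    import sys

    from mpi4py import MPI
    print('import OK')   # confirms the import itself did not hang
    sys.stdout.flush()

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    print('[rank %d] before send/recv' % rank)
    sys.stdout.flush()
    # ... the send/recv calls from the example ...
    print('[rank %d] after send/recv' % rank)
    sys.stdout.flush()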

  2. jefflarson reporter

    Thank you for the quick response. I have updated the issue to highlight that the hang occurs in the rank 1 recv.

  3. Lisandro Dalcin

    Please try the following: add these lines at the VERY beginning of your script, right BEFORE from mpi4py import MPI, and tell us how it goes.

    import mpi4py
    mpi4py.rc.recv_mprobe = False
    from mpi4py import MPI
    
    # rest of your code ...
    
  4. jefflarson reporter

    If I do that, then it runs without issue!

    [jlarson@blogin1 ~]$ mpiexec -np 2 python2 code.py 
    ('my rank is: ', 0)
    ('the comm size is : ', 2)
    ('about to send ', 0)
    ('my rank is: ', 1)
    ('the comm size is : ', 2)
    ('about to receive ', 1)
    ('finished sending ', 0)
    ('finished receiving ', 1)
    

    But if I comment out the mpi4py.rc.recv_mprobe = False line, then it hangs without ever finishing the receive.

  5. Lisandro Dalcin

    OK, so it seems that your backend MPI has issues with matched probes. Matched probes were introduced in MPI 3.0, but some MPI 2.x implementations support them anyway. If available, mpi4py uses matched probes to implement recv(); this way the call is thread-safe without using locks.
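
    As a rough illustration (a sketch, not mpi4py's actual internals), the same matched-probe path can be exercised explicitly through the Message API; if the backend MPI mishandles MPI_Mprobe/MPI_Mrecv, this should hang the same way the plain recv() does:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    if comm.Get_rank() == 0:
        comm.send({'a': 7, 'b': 3.14}, dest=1, tag=11)
    elif comm.Get_rank() == 1:
        msg = comm.mprobe(source=0, tag=11)  # matched probe (MPI_Mprobe)
        data = msg.recv()                    # matched receive (MPI_Mrecv)
        print('received: ', data)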

    What's the output of MPI.Get_version()? I may consider enabling the use of matched probes only if the MPI implementation advertises itself as supporting MPI >= 3.0.

  6. jefflarson reporter

    There is perhaps an issue with my mpi4py build. My MPI version is reported as 3.0, but the output from get_vendor() is not what I would expect:

    >>> import mpi4py
    >>> mpi4py.__path__
    ['/home/jlarson/.local/lib/python2.7/site-packages/mpi4py']
    >>> mpi4py.__version__
    '2.0.1a0'
    >>> from mpi4py import MPI
    >>> MPI.Get_version()
    (3, 0)
    >>> MPI.get_vendor()
    ('MPICH', (3, 1, 4))
    >>> MPI.Get_library_version()
    'MVAPICH2 Version      :\t2.2b\nMVAPICH2 Release date :\tMon Nov 12 20:00:00 EST 2015\nMVAPICH2 Device       :\tch3:psm\nMVAPICH2 configure    :\tCC=icc CXX=icpc FC=ifort F77=ifort --prefix=/soft/mvapich2/2.2b_psm/intel-16.0 --enable-fortran=all --enable-cxx --enable-romio --enable-threads=multiple --enable-thread-cs=global --disable-rdma-cm --enable-shared --enable-static --with-pbs --with-device=ch3:psm\nMVAPICH2 CC           :\ticc    -DNDEBUG -DNVALGRIND -O2\nMVAPICH2 CXX          :\ticpc   -DNDEBUG -DNVALGRIND -O2\nMVAPICH2 F77          :\tifort   -O2\nMVAPICH2 FC           :\tifort   -O2\n'
    

    This is from a fresh build: $ python setup.py build --mpicc=/soft/mvapich2/2.2b_psm/intel-16.0/bin/mpicc

    The build output can be found here: http://www.mcs.anl.gov/~jlarson/mpi4py_build.txt.

    Should the reported vendor be MVAPICH2 and not MPICH?

  7. Lisandro Dalcin

    The get_vendor() function is legacy mpi4py stuff; I should update it to recognize MVAPICH. I'll take a look at that.

    You should trust MPI.Get_library_version(); it says MVAPICH2 2.2b, and that's the real version info.

    MPI.Get_version() returns the version of the MPI standard the library supports; in your case that is MPI 3.0, and that's fine.

    I think everything is OK with your mpi4py build; however, I believe your MPI has issues handling matched probes (the MPI_Mprobe/MPI_Mrecv calls), so mpi4py's default recv() does not work as expected. Matched probes are a relatively new MPI feature, and I guess they are not being used much in the wild, so bugs in MPI implementations are to be expected. While it is tempting to think that this could be an mpi4py bug, I really doubt it. mpi4py's recv() based on matched probes has been extensively tested with MPICH, Open MPI, and Intel MPI on Linux and macOS, and it seems to work just fine.

    Anyway, the mpi4py.rc.recv_mprobe = False trick seems to work for you, and there is nothing else I can do for now. I'm closing this issue as resolved.
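
    If you want to keep the workaround confined to the affected machine, one option (just a sketch; the environment variable name below is made up here, it is not an mpi4py setting) is to guard it:

    import os
    import mpi4py

    # Hypothetical opt-in switch: export DISABLE_RECV_MPROBE=1 in the job
    # environment on the MVAPICH2 system; everywhere else the default
    # matched-probe recv() path stays enabled.
    if os.environ.get('DISABLE_RECV_MPROBE'):
        mpi4py.rc.recv_mprobe = False

    from mpi4py import MPI

    # rest of your code ...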

  8. jefflarson reporter

    Thank you for your help. Since I'm not an MPI/C expert, can you provide a simple, pure C example that reproduces this issue? I will send it upstream to MVAPICH2 and the sysadmins here.

    Thank you again,

    Jeff

  9. monolithu

    Hi,

    Just wanted to say that I had the same problem. I'm really glad I found this issue, because disabling matched probes solved it. I'm using Intel MPI 5.1.

    MPI.Get_version() shows (3, 0) and MPI.get_vendor() returns ('Intel MPI', (5, 1, 3)).

    Anyways, thanks!

    David
