Example mpi4py code not working
Hello,
I am trying to run some previously-working mpi4py code, and it now just hangs. Even the following example code

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
print('my rank is: ', rank)
print('the comm size is : ', comm.Get_size())

if rank == 0:
    print('about to send ', rank)
    data = {'a': 7, 'b': 3.14}
    comm.send(data, dest=1, tag=11)
    print('finished sending ', rank)
elif rank == 1:
    print('about to receive ', rank)
    data = comm.recv(source=0, tag=11)
    print('finished receiving ', rank)
```

just hangs (rank 1 never completes its recv, and no error is reported) when I run it as follows:
```
$ mpiexec -np 2 python2 code.py
('my rank is: ', 1)
('the comm size is : ', 2)
('about to receive ', 1)
('my rank is: ', 0)
('the comm size is : ', 2)
('about to send ', 0)
('finished sending ', 0)
```
My Python/mpi4py build is as follows:

```
>>> sys.version
'2.7.6 (default, Jul 8 2014, 15:12:40) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]'
>>> mpi4py.get_config()
{u'mpicc': u'/soft/mvapich2/2.2b_psm/intel-16.0/bin/mpicc'}
>>> mpi4py.get_include()
'/home/jlarson/.local/lib/python2.7/site-packages/mpi4py/include'
>>> mpi4py.__version__
'2.0.1a0'
```
Is this a known MVAPICH2 issue? I am happy to provide any additional information.
Thank you for your help, Jeff
Comments (16)
-
It's impossible to tell what's going on without more information, and this is certainly not an mpi4py issue. Where exactly does the hang happen? Can you add some "print" statements after

```python
from mpi4py import MPI
```

to make sure the import line succeeded? Do you get the expected values for `comm.Get_rank()` and `comm.Get_size()`? Maybe the `send` call succeeds, but the `recv` call hangs? Please add some manual instrumentation to your code (i.e., some "print" lines here and there) to figure out where the hang occurs, so you can provide a more complete report.
-
reporter Thank you for the quick response. I have updated the issue to highlight that the hang occurs on the rank-1 receive.
-
Please try the following: add the lines below at the VERY beginning of your script, right BEFORE `from mpi4py import MPI`, and tell us how it goes.

```python
import mpi4py
mpi4py.rc.recv_mprobe = False

from mpi4py import MPI
# rest of your code ...
```
-
reporter If I do that, then it runs without issue!
```
[jlarson@blogin1 ~]$ mpiexec -np 2 python2 code.py
('my rank is: ', 0)
('the comm size is : ', 2)
('about to send ', 0)
('my rank is: ', 1)
('the comm size is : ', 2)
('about to receive ', 1)
('finished sending ', 0)
('finished receiving ', 1)
```
But if I comment out the `mpi4py.rc.recv_mprobe = False` line, then it hangs without ever finishing the receive.
-
OK, so it seems that your backend MPI has issues with matched probes. Matched probes were introduced in MPI 3.0, but some MPI 2.x implementations support them anyway. If available, mpi4py uses matched probes to implement `recv()`; this way the call is thread-safe without using locks.

What's the output of `MPI.Get_version()`? I may consider enabling the use of matched probes only if the MPI implementation advertises itself as supporting MPI >= 3.0.
-
Oh, I double-checked the mpi4py sources, and matched probes are enabled by default only if `MPI_VERSION >= 3`: https://bitbucket.org/mpi4py/mpi4py/src/master/src/atimport.h?fileviewer=file-view-default#atimport.h-33

Anyway, I'm inclined to say that your MPI implementation is broken regarding matched probes. Ask the support staff of your computing facility for help, or report this issue upstream to the MVAPICH team.
-
reporter There is perhaps an issue with my mpi4py build. My version is reported as 3.0, but the output from `get_vendor` is not what I would expect:

```
>>> import mpi4py
>>> mpi4py.__path__
['/home/jlarson/.local/lib/python2.7/site-packages/mpi4py']
>>> mpi4py.__version__
'2.0.1a0'
>>> from mpi4py import MPI
>>> MPI.Get_version()
(3, 0)
>>> MPI.get_vendor()
('MPICH', (3, 1, 4))
>>> MPI.Get_library_version()
'MVAPICH2 Version :\t2.2b\nMVAPICH2 Release date :\tMon Nov 12 20:00:00 EST 2015\nMVAPICH2 Device :\tch3:psm\nMVAPICH2 configure :\tCC=icc CXX=icpc FC=ifort F77=ifort --prefix=/soft/mvapich2/2.2b_psm/intel-16.0 --enable-fortran=all --enable-cxx --enable-romio --enable-threads=multiple --enable-thread-cs=global --disable-rdma-cm --enable-shared --enable-static --with-pbs --with-device=ch3:psm\nMVAPICH2 CC :\ticc -DNDEBUG -DNVALGRIND -O2\nMVAPICH2 CXX :\ticpc -DNDEBUG -DNVALGRIND -O2\nMVAPICH2 F77 :\tifort -O2\nMVAPICH2 FC :\tifort -O2\n'
```

This is from a fresh build:

```
$ python setup.py build --mpicc=/soft/mvapich2/2.2b_psm/intel-16.0/bin/mpicc
```

The build output can be found here: http://www.mcs.anl.gov/~jlarson/mpi4py_build.txt.

Should the vendor be `MVAPICH` and not `MPICH`?
The `get_vendor()` function is legacy mpi4py stuff; I should update it to recognize MVAPICH, and I'll take a look at that. You should trust `MPI.Get_library_version()`: it says MVAPICH2 2.2b, and that's the real version info. `MPI.Get_version()` returns the version of the MPI standard the library supports; it seems to be MPI 3.0, and that's fine.

I think everything is OK with your mpi4py build; however, I believe your MPI has issues handling matched probes (the `MPI_Mprobe`/`MPI_Mrecv` calls), so the default mpi4py `recv()` does not work as expected. Matched probes are a relatively new MPI feature, and I guess they are not being used much in the wild yet, so bugs in MPI implementations are to be expected. While it is tempting to think that this could be an mpi4py bug, I really doubt it. mpi4py's `recv()` based on matched probes has been extensively tested with MPICH, Open MPI, and Intel MPI on Linux and macOS, and it seems to work just fine.

Anyway, the `mpi4py.rc.recv_mprobe = False` trick seems to work for you, and there is nothing else I can do for now. I'm closing this issue as resolved.
-
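For the record, mpi4py's matched-probe receive can be illustrated with the public `mprobe`/`Message` API, which maps to the `MPI_Mprobe`/`MPI_Mrecv` calls discussed above (a two-rank sketch of the technique, not mpi4py's actual internal implementation):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    comm.send({'a': 7, 'b': 3.14}, dest=1, tag=11)
elif rank == 1:
    # mprobe() matches the incoming message AND dequeues it atomically,
    # so no other thread can intercept it between the probe and the
    # receive; this is what makes recv() thread-safe without locks.
    msg = comm.mprobe(source=0, tag=11)
    data = msg.recv()  # receive exactly the matched message
    print('received:', data)
```

Run with `mpiexec -np 2 python code.py`; on an MPI library with broken matched probes, the `mprobe()` call is where this would hang.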
- changed status to resolved
-
reporter Thank you for your help. Since I'm not an MPI/C expert, could you provide a simple, pure C example that exhibits this issue? I will send it upstream to MVAPICH2 and to the sysadmins here.
Thank you again,
Jeff
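For what it's worth, a minimal pure-C sketch exercising matched probes might look like the following (hedged: it has not been run against this MVAPICH2 build, and the hang location is an assumption based on the discussion above):

```c
/* Hypothetical minimal reproducer for a matched-probe hang.
   Requires an MPI >= 3.0 library (MPI_Mprobe/MPI_Mrecv). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, payload = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(&payload, 1, MPI_INT, 1, 11, MPI_COMM_WORLD);
        printf("rank 0: send completed\n");
    } else if (rank == 1) {
        MPI_Message msg;
        MPI_Status status;
        int recvbuf = 0;
        /* Matched probe + matched receive: the suspected hang site. */
        MPI_Mprobe(0, 11, MPI_COMM_WORLD, &msg, &status);
        MPI_Mrecv(&recvbuf, 1, MPI_INT, &msg, &status);
        printf("rank 1: received %d\n", recvbuf);
    }
    MPI_Finalize();
    return 0;
}
```

Compile with `mpicc mprobe_test.c -o mprobe_test` and run with `mpiexec -np 2 ./mprobe_test`; if matched probes are broken, rank 1 should hang in `MPI_Mprobe` while rank 0's send completes.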
-
Hi,

I just wanted to say that I had the same problem. I'm really glad I found this issue, because disabling matched probes solved it. I'm using Intel MPI 5.1. `MPI.Get_version()` shows `(3, 0)` and `MPI.get_vendor()` returns `('Intel MPI', (5, 1, 3))`.

Anyway, thanks!
David