`MPI_ERR_TRUNCATE: message truncated` when using `Comm.irecv()` with messages larger than 32kb

Issue #65 resolved
kwikwag created an issue

When I use Comm.irecv() it fails with what looks like a buffer problem, reporting "message truncated" and crashing my app. It happens when messages are larger than approximately 32 KB, and only with irecv(), not with recv().

A workaround is to send the message size with a regular send(), serialize the payload independently (using pickle/struct) into a bytearray, and on the receiving side allocate a bytearray of the given size and receive into it with Irecv(). However, this is pretty ugly.
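
In rough outline, the workaround looks like this (ranks, tags, and the payload here are illustrative, not taken from my actual code):

import pickle
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    payload = pickle.dumps(list(range(100000)))     # serialize independently
    comm.send(len(payload), dest=1, tag=1)          # 1) regular send() with the size
    comm.Send([payload, MPI.BYTE], dest=1, tag=2)   # 2) ship the raw pickle bytes
elif rank == 1:
    size = comm.recv(source=0, tag=1)               # 1) learn the size
    buf = bytearray(size)                           # 2) allocate a matching bytearray
    req = comm.Irecv([buf, MPI.BYTE], source=0, tag=2)
    req.Wait()
    data = pickle.loads(bytes(buf))                 # deserialize once it has arrived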

The mpi4py version is 2.0.0. The output is identical for Python 3.5 and Python 2.7. I'm using Open MPI 2.0.2 on a 64-bit, custom Debian-based Linux at our university.

$ mpirun python test_irecv.py --size 32758 --irecv > /dev/null || echo error
[host:66595] 19 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[host:66595] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
$ mpirun python test_irecv.py --size 32759 > /dev/null || echo error
[host:66706] 19 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[host:66706] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
$ mpirun python test_irecv.py --size 32759 --irecv > /dev/null || echo error
Traceback (most recent call last):
  File "test_irecv.py", line 22, in <module>
    success, data = result_object.test()
  File "MPI/Request.pyx", line 242, in mpi4py.MPI.Request.test (src/mpi4py.MPI.c:75941)
  File "MPI/msgpickle.pxi", line 411, in mpi4py.MPI.PyMPI_test (src/mpi4py.MPI.c:44372)
mpi4py.MPI.Exception: MPI_ERR_TRUNCATE: message truncated
[host:66794] 19 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[host:66794] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
error

The script I'm running (test_irecv.py) is attached.
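
Roughly, the reproducer does something like the following; this is a sketch reconstructed from the command-line flags and the traceback above, not a copy of the attached file:

import argparse
import time
from mpi4py import MPI

parser = argparse.ArgumentParser()
parser.add_argument('--size', type=int, required=True)
parser.add_argument('--irecv', action='store_true')
args = parser.parse_args()

comm = MPI.COMM_WORLD
if comm.Get_rank() == 0:
    payload = bytearray(args.size)
    for dest in range(1, comm.Get_size()):       # send the payload to every other rank
        comm.send(payload, dest=dest, tag=10)
else:
    if args.irecv:
        result_object = comm.irecv(source=0, tag=10)   # no buffer argument
        success, data = result_object.test()           # raises MPI_ERR_TRUNCATE for large payloads
        while not success:
            time.sleep(0.01)
            success, data = result_object.test()
    else:
        data = comm.recv(source=0, tag=10)             # blocking recv() works at any size
    print(len(data))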

Comments (7)

  1. Lisandro Dalcin

    The implementation of irecv() for large messages requires users to pass a buffer-like object large enough to receive the pickled stream. This is not documented (as most of mpi4py), and it is non-obvious and unpythonic, but if you have some good knowledge of MPI you will understand its limitations for implementing irecv() for pickled streams.

    Your code should look like this:

    buf = bytearray(1<<20) # 1 MB buffer, make it larger if needed. 
    request = comm.irecv(buf, source=0, tag=10)
    ...
    success, data = request.test()
    

    Please try the fix above and confirm whether it works for you. If it does, please close this issue.

  2. kwikwag reporter

    @dalcinl Yes, it seems to be working now. Thank you! Although by the time I saw your answer I had also run into some problems on the sending side (which I can try to track down again, but don't have at hand), and in the meantime I implemented my job manager using Irecv and Isend, sending two messages: one with the buffer size and a subsequent one with the payload. It was pretty cumbersome. If I find the time, which I doubt I will, I'll try to pitch in on the code to make this work better. In the meantime, I do recommend this gets documented in some manner.

  3. Lisandro Dalcin

    On the sending side, if you are using request = comm.isend(), you should just take care to keep the request instance alive and eventually call request.wait(). Otherwise you have two issues: 1) you leak an MPI handle and break the rule that all initiated nonblocking communication should be completed, and 2) Python may garbage-collect the memory of the send buffer too early, leading to segfaults or garbage being communicated.
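
    A minimal sketch of that pattern (ranks, tag, buffer size, and payload are just illustrative):

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    if comm.Get_rank() == 0:
        payload = {'job': 42, 'data': [0] * 100000}
        request = comm.isend(payload, dest=1, tag=10)   # keep this reference alive
        # ... overlap other work while the send is in flight ...
        request.wait()   # completes the send and releases the MPI handle
    else:
        buf = bytearray(1 << 20)                        # large enough for the pickled stream
        request = comm.irecv(buf, source=0, tag=10)
        data = request.wait()                           # returns the unpickled object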
