Issue: python2 / python3
from __future__ import division, print_function
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Split 103 work items across the ranks as evenly as possible.
work_size = 103
work = np.zeros(work_size)
base = work_size // size
leftover = work_size % size
print('base', base, '+ leftover', leftover, 'on rank', rank)

# Per-rank counts and displacements for Allgatherv.
sizes = np.ones(size) * base
sizes[:leftover] += 1
offsets = np.zeros(size)
offsets[1:] = np.cumsum(sizes)[:-1]

# Each rank generates its own slice of the work array.
start = offsets[rank]
local_size = sizes[rank]
work_local = np.arange(start, start + local_size, dtype=np.float64)
print('local work: {} in rank {}'.format(work_local, rank))

# Gather all local slices into the full work array on every rank.
comm.Allgatherv(work_local, [work, sizes, offsets, MPI.DOUBLE])
print('after allgatherv', work)

# Sum-reduce the local sums into a global total on every rank.
total = np.empty(1, dtype=np.float64)
comm.Allreduce(np.sum(work_local), total)
print('work {} vs {} in rank {}'.format(np.sum(work), total, rank))
Hi,
I get the expected result (no errors, work 5253.0 vs [ 5253.] in rank 0) with: mpirun -np 4 python2 test.py
whereas: mpirun -np 4 python3 test.py
fails with the following error:
Traceback (most recent call last):
File "<...>/test.py", line 31, in <module>
comm.Allreduce(np.sum(work_local), total)
File "MPI/Comm.pyx", line 714, in mpi4py.MPI.Comm.Allreduce (src/mpi4py.MPI.c:99618)
File "MPI/msgbuffer.pxi", line 709, in mpi4py.MPI._p_msg_cco.for_allreduce (src/mpi4py.MPI.c:36450)
ValueError: mismatch in send count 8 and receive count 1
I use mpich 3.2.0 and mpi4py 2.0.0, built with the recipes provided in mpi4py/conf/conda-recipes, in both python 2.7.11 and python 3.5.1 environments.
What have I done wrong here?
UPDATE: the same bug happens with openmpi 1.10.2, regardless of the number of processes. I did the whole compilation in a vanilla environment: $ conda env create -y -n foo python=3.5.0 anaconda
mpi4py also causes bugs with h5py when switching from python2 to python3 (but I guess it is not related ...).
I really don't have a clue on this one, thank you for your help!
Comments (6)
-
reporter -
This is an issue with the automatic NumPy -> MPI datatype mapping. I'm investigating it; it should be related to NumPy and the buffer interface.
In the meantime, try the following; it should work:
comm.Allreduce([np.sum(work_local), MPI.DOUBLE], total)
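As a side note (a sketch not taken from the thread, just following the same diagnosis), the NumPy array scalar can also be avoided altogether by reducing from an explicit one-element buffer:
local_sum = np.array([np.sum(work_local)])  # a real ndarray, not an array scalar
comm.Allreduce(local_sum, total)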
-
@neok Indeed, I think this is a bug in NumPy. Memoryviews of array scalars do not return the right format. See for yourself:
>>> import numpy as np
>>> a = np.zeros(1, dtype=np.float64)
>>> memoryview(a).format
'd'
>>> memoryview(a[0]).format
'B'
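For comparison (an extra check in the same session, not part of the original comment), converting the array scalar back to an ndarray restores the expected format:
>>> memoryview(np.asarray(a[0])).format
'd'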
-
Indeed, the format is not implemented for scalar types: https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/scalartypes.c.src#L2409
-
reporter @dalcinl Thanks for pointing out the root of the problem. It seems that this is indeed not a bug but an unimplemented feature in NumPy, so we can close the issue.
So the advice here is to pass MPI datatypes explicitly.
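Concretely, a sketch of what that looks like for the two collective calls in the script above:
comm.Allgatherv([work_local, MPI.DOUBLE], [work, sizes, offsets, MPI.DOUBLE])
comm.Allreduce([np.sum(work_local), MPI.DOUBLE], total)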
Thanks for the support.
-
reporter - changed status to resolved