Issue python2 / python3

Issue #43 resolved
neok created an issue
from __future__ import division, print_function
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

work_size = 103
work = np.zeros(work_size)

base = work_size // size
leftover = work_size % size
print('base', base, '+ leftover', leftover, 'on rank', rank)

sizes = np.ones(size) * base
sizes[:leftover] += 1
offsets = np.zeros(size)
offsets[1:] = np.cumsum(sizes)[:-1]

start = offsets[rank]
local_size = sizes[rank]
work_local = np.arange(start, start + local_size, dtype=np.float64)

print ('local work: {} in rank {}'.format(work_local, rank))

comm.Allgatherv(work_local, [work, sizes, offsets, MPI.DOUBLE])
print('after allgatherv', work)
total = np.empty(1, dtype=np.float64)

comm.Allreduce(np.sum(work_local), total)

print ('work {} vs {} in rank {}'.format(np.sum(work), total, rank))


I get the expected result (no errors, work 5253.0 vs [ 5253.] in rank 0) with: mpirun -np 4 python2

whereas: mpirun -np 4 python3 fails with the following error:

Traceback (most recent call last):
  File "<...>/", line 31, in <module>
    comm.Allreduce(np.sum(work_local), total)
  File "MPI/Comm.pyx", line 714, in mpi4py.MPI.Comm.Allreduce (src/mpi4py.MPI.c:99618)
  File "MPI/msgbuffer.pxi", line 709, in mpi4py.MPI._p_msg_cco.for_allreduce (src/mpi4py.MPI.c:36450)
ValueError: mismatch in send count 8 and receive count 1

I use mpich 3.2.0 and mpi4py 2.0.0 with the recipes provided in mpi4py/conf/conda-recipes on both python 2.7.11 and python 3.5.1 environments.

What have I done wrong here ?
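
As a sanity check on the numbers in the script above, the block decomposition can be reproduced without MPI. The following is a minimal sketch in plain Python; the helper name `block_partition` is hypothetical and not part of the original script, and it uses integer lists where the script uses NumPy arrays. It splits 103 items over 4 ranks the same way (the first `leftover` ranks get one extra item) and confirms that the per-rank sums add up to the 5253 reported by the python2 run.

```python
def block_partition(n_items, n_ranks):
    """Split n_items over n_ranks; the first (n_items % n_ranks) ranks get one extra item."""
    base, leftover = divmod(n_items, n_ranks)
    sizes = [base + 1 if r < leftover else base for r in range(n_ranks)]
    # Each offset is the sum of the sizes of all lower ranks.
    offsets = [sum(sizes[:r]) for r in range(n_ranks)]
    return sizes, offsets


sizes, offsets = block_partition(103, 4)

# Rank r holds the integers offsets[r] .. offsets[r] + sizes[r] - 1,
# mirroring np.arange(start, start + local_size) in the script.
local_sums = [sum(range(o, o + s)) for s, o in zip(sizes, offsets)]
total = sum(local_sums)

print(sizes, offsets, total)
```

The total equals 0 + 1 + ... + 102, which is what both `np.sum(work)` after the Allgatherv and the Allreduce result should agree on.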

  1. neok reporter
    UPDATE: the same bug happens with openmpi 1.10.2, regardless of the number of processes. I did the whole compilation in a vanilla environment: $ conda env create -y -n foo python=3.5.0 anaconda

    mpi4py also causes bugs with h5py (but I guess it is not related ...) when switching from python2 to python3:

      File "<...>/_test/lib/python3.5/site-packages/h5py/_hl/", line 300, in close
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (<...>/work/h5py/_objects.c:3020)
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (<...>/work/h5py/_objects.c:2978)
      File "h5py/h5i.pyx", line 150, in h5py.h5i.dec_ref (<...>/h5py/h5i.c:2526)
    RuntimeError: Can't decrement id ref count (Invalid argument, error stack:
    mpi_file_set_size(76): inconsistent arguments to collective routine

    I really don't have a clue on this one, thank you for your help !

  2. Lisandro Dalcin

    This is an issue with the automatic NumPy -> MPI datatype mapping. I'm investigating it; it should be related to NumPy and the buffer interface.

    In the meantime, try the following; it should work:

    comm.Allreduce([np.sum(work_local), MPI.DOUBLE], total)
  3. Lisandro Dalcin

    @neok Indeed, I think this is a bug in NumPy. Memoryviews of array scalars do not return the right format. See for yourself:

    >>> import numpy as np
    >>> a = np.zeros(1, dtype=np.float64)
    >>> memoryview(a).format
    >>> memoryview(a[0]).format
  4. neok reporter

    @dalcinl I appreciate you pointing out the root of the problem. It does seem that this is not a bug but an unimplemented feature in NumPy, so we can close the issue.

    So the advice here is to pass MPI datatypes explicitly.

    Thanks for the support.
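
The count mismatch in the traceback (send count 8, receive count 1) is consistent with the same 8 bytes being described by two different buffer formats. The sketch below illustrates that idea with the standard library only; it is not what NumPy or mpi4py do internally, and it assumes that the array scalar's buffer was reported as raw bytes, which is an inference from the error message rather than something stated in the thread. The helper `element_count` is hypothetical.

```python
from array import array

# One double-precision value, exposed through the buffer protocol.
buf = array('d', [5253.0])

typed = memoryview(buf)   # format 'd': a single 8-byte item
raw = typed.cast('B')     # the same memory viewed as unsigned bytes


def element_count(view):
    """Number of items a consumer would infer from the view's item size."""
    return view.nbytes // view.itemsize


# A consumer that trusts the format sees 1 double in one case
# and 8 separate bytes in the other, although the memory is identical.
print(typed.format, element_count(typed))
print(raw.format, element_count(raw))
```

Passing the datatype explicitly, as in `[np.sum(work_local), MPI.DOUBLE]`, sidesteps this because the element type no longer has to be inferred from the buffer format.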

