Issue python2 / python3

Issue #43 resolved
neok created an issue
from __future__ import division, print_function
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

work_size = 103
work = np.zeros(work_size)

base = work_size // size
leftover = work_size % size
print('base', base, '+ leftover', leftover, 'on rank', rank)

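# per-rank counts and displacements for Allgatherv
# (the first `leftover` ranks get one extra element)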
sizes = np.ones(size) * base
sizes[:leftover] += 1
offsets = np.zeros(size)
offsets[1:] = np.cumsum(sizes)[:-1]

start = offsets[rank]
local_size = sizes[rank]
work_local = np.arange(start, start + local_size, dtype=np.float64)

print('local work: {} in rank {}'.format(work_local, rank))

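# gather every rank's local chunk into the full `work` array,
# using explicit counts, displacements and datatype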
comm.Allgatherv(work_local, [work, sizes, offsets, MPI.DOUBLE])
print('after allgatherv', work)
total = np.empty(1, dtype=np.float64)

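# global sum of the per-rank partial sums; this is the call
# that fails under python3 (see the traceback below)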
comm.Allreduce(np.sum(work_local), total)

print('work {} vs {} in rank {}'.format(np.sum(work), total, rank))

Hi,

I get the expected result (no errors, work 5253.0 vs [ 5253.] in rank 0) with mpirun -np 4 python2 test.py,

whereas mpirun -np 4 python3 test.py fails with the following error:

Traceback (most recent call last):
  File "<...>/test.py", line 31, in <module>
    comm.Allreduce(np.sum(work_local), total)
  File "MPI/Comm.pyx", line 714, in mpi4py.MPI.Comm.Allreduce (src/mpi4py.MPI.c:99618)
  File "MPI/msgbuffer.pxi", line 709, in mpi4py.MPI._p_msg_cco.for_allreduce (src/mpi4py.MPI.c:36450)
ValueError: mismatch in send count 8 and receive count 1

I use mpich 3.2.0 and mpi4py 2.0.0 with the recipes provided in mpi4py/conf/conda-recipes, in both the python 2.7.11 and python 3.5.1 environments.

What have I done wrong here?

Comments (6)

  1. neok reporter
    • edited description

    UPDATE: the same bug happens with openmpi 1.10.2, regardless of the number of processes. I did the whole compilation in a vanilla environment: $ conda env create -y -n foo python=3.5.0 anaconda

    mpi4py also causes bugs with h5py (but I guess it is not related...) when switching from python2 to python3:

      File "<...>/_test/lib/python3.5/site-packages/h5py/_hl/files.py", line 300, in close
        h5i.dec_ref(id_)
      File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (<...>/work/h5py/_objects.c:3020)
      File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (<...>/work/h5py/_objects.c:2978)
      File "h5py/h5i.pyx", line 150, in h5py.h5i.dec_ref (<...>/h5py/h5i.c:2526)
    
    RuntimeError: Can't decrement id ref count (Invalid argument, error stack:
    mpi_file_set_size(76): inconsistent arguments to collective routine)
    

    I really don't have a clue on this one; thank you for your help!

  2. Lisandro Dalcin

    This is an issue with the automatic NumPy -> MPI datatype mapping. I'm investigating it; it should be related to NumPy and the buffer interface.

    In the meantime, try the following; it should work:

    comm.Allreduce([np.sum(work_local), MPI.DOUBLE], total)
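
    A minimal sketch of the same workaround applied to the reproducer above (the name local_sum and the 1-element-array alternative are illustrative, not taken verbatim from this thread):

    local_sum = np.sum(work_local)                 # NumPy array scalar
    total = np.empty(1, dtype=np.float64)

    # Explicit datatype: mpi4py no longer has to infer the element type from
    # the scalar's buffer, which is exposed here as raw bytes (format 'B').
    comm.Allreduce([local_sum, MPI.DOUBLE], total)

    # Alternative sketch: wrap the scalar in a regular 1-element array, so the
    # buffer reports format 'd' and the automatic mapping works unchanged.
    local_sum_arr = np.array([np.sum(work_local)], dtype=np.float64)
    comm.Allreduce(local_sum_arr, total)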
    
  3. Lisandro Dalcin

    @neok Indeed, I think this is a bug in NumPy. Memoryviews of array scalars do not report the right format. See for yourself:

    >>> import numpy as np
    >>> a = np.zeros(1, dtype=np.float64)
    >>> memoryview(a).format
    'd'
    >>> memoryview(a[0]).format
    'B'
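
    An illustrative continuation of the same check: any genuine 1-element array keeps the expected format, which is why passing the datatype explicitly (or wrapping the scalar in an array) avoids the mismatch; the float64 scalar's 8 bytes of format 'B' are presumably counted as 8 one-byte elements, hence the "send count 8" in the traceback.

    >>> memoryview(a[0:1]).format                # 1-element slice: still an ndarray
    'd'
    >>> memoryview(np.atleast_1d(a[0])).format   # wrapping the scalar works too
    'd'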
    
  4. neok reporter

    @dalcinl Thanks for pointing out the root of the problem. It seems that this is indeed not a bug but an unimplemented feature in numpy, so we can close the issue.

    So the advice here is to pass MPI datatypes explicitly.
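
    In full-spec form, both sides of the reduction in the reproducer can carry an explicit datatype (an illustrative sketch; the receive-side spec is optional but makes both element types unambiguous):

    sendbuf = np.array([np.sum(work_local)], dtype=np.float64)
    comm.Allreduce([sendbuf, MPI.DOUBLE], [total, MPI.DOUBLE])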

    Thanks for the support.
