mpi4py initialization breaks Fortran MPI_IN_PLACE

Issue #162 resolved
Patrick McNally created an issue

I work on an application that is written mostly in Python and Fortran. After trying to track down why certain operations using MPI_IN_PLACE were failing in our software but would work fine in standalone code, I discovered that the difference is whether mpi4py initializes MPI or we call MPI_Init() ourselves. If we initialize everything, the IN_PLACE calls work fine but if mpi4py initializes it, we get garbage in the array (mostly zeros). Note that if we import mpi4py.MPI in a function instead of at the module level, it all works fine.
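
For reference, a minimal sketch of the in-place pattern at the mpi4py level (the failures we see are in the Fortran MPI_IN_PLACE calls, not in this pure-Python path; this is shown only to illustrate the operation involved):

    # Minimal in-place Allreduce sketch, for illustration only; the failing
    # calls in our application are the Fortran MPI_IN_PLACE equivalents.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    buf = np.full(4, comm.Get_rank() + 1.0)        # each rank contributes rank+1
    comm.Allreduce(MPI.IN_PLACE, buf, op=MPI.SUM)  # reduce into buf in place
    print(comm.Get_rank(), buf)                    # all ranks should print the same sums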

I’m attaching some standalone code that reproduces the issue. Sorry in advance for the number of files but I was trying to narrow down the circumstances in which this occurs. You can mostly ignore the files at the root of the directory; I just included them so you can see how I’m building and running. The various Python files in the source directory just initialize MPI in various ways and run the test. A sample run from my box is in run.out. Please let me know if you need any additional information.

My complete environment is:
Python 2.7.11
mpi4py 3.0.3 (initially found issue with 1.3.1)
gcc 7.3.0
mpich 3.2
Red Hat Enterprise Linux 7.8

Comments (15)

  1. Lisandro Dalcin

    Please note that mpi4py uses MPI_Init_thread() by default. Before going deeper, I would suggest modifying the failing example this way:

    import mpi4py
    mpi4py.rc.threads = False # This way mpi4py will use `MPI_Init()`
    from mpi4py import MPI
    

    Does it still fail with these changes?

  2. Lisandro Dalcin

    In runPy.py, does it work if you import the inplace extension module BEFORE from mpi4py import MPI? I bet it will work; otherwise, runPyFunc.py working would make no sense.
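
    In other words, something like this at the top of runPy.py (a sketch, assuming the extension is importable as inplace, as in the attachment):

    import inplace            # the compiled extension linked against libmpifort
    from mpi4py import MPI    # mpi4py initializes MPI only after inplace is loaded

    # ... run the MPI_IN_PLACE test as before ...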

  3. Patrick McNally reporter

    Setting threads to False had no effect. I had previously tried using MPI_Init_thread() in C and it still worked.

    However, your second suggestion (putting the import of inplace before the import of MPI) did make it work. Any idea why that might be? It isn’t an option in the full code to import all compiled code objects before importing MPI, but if I just have to import one, that might work.

  4. Lisandro Dalcin

    This is probably related to the way MPICH initializes MPI depending on which shared libraries have been loaded so far, and maybe on the load order. You should ask the MPICH folks for the gory details. There is very little mpi4py can do to alleviate the issue; mpi4py only depends on the C MPI library, and that is not going to change.

    All that being said, if you rebuild mpi4py with the following environment variable, things may work.

    export LDFLAGS="-Wl,--no-as-needed -lmpifort" # or whatever name the Fortran MPI library has in your system
    

    BTW, VERY IMPORTANT: Are you using shared libraries?? Can you double check that mpi4py and inplace are linked against the same MPI shared libs (modulo the Fortran one)?

  5. Patrick McNally reporter

    Yes, we are using shared libraries and they are linked against the same libmpi.so, but only inplace is linked against libmpifort.so:
    (1004)-> ldd inplace/INSTALL/inplace.so
    libpython2.7.so.1.0 => /opt/create/ptoolsrte/0.5.5/packages/Python-2.7.11/lib/libpython2.7.so.
    libmpifort.so.12 => /opt/create/ptoolsrte/0.5.5/packages/configs/gcc-7.3.0_mpich-3.2/install/g
    libmpi.so.12 => /opt/create/ptoolsrte/0.5.5/packages/configs/gcc-7.3.0_mpich-3.2/install/gcc-7
    ...
    (1005)-> ldd mpi4py-3.0.3/INSTALL/lib/python/mpi4py/MPI.so
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f8409c43000)
    libpython2.7.so.1.0 => /opt/create/ptoolsrte/0.5.5/packages/Python-2.7.11/lib/libpython2.7.so.
    libmpi.so.12 => /opt/create/ptoolsrte/0.5.5/packages/configs/gcc-7.3.0_mpich-3.2/install/gcc-7
    ...

    After rebuilding as you suggested, MPI.so is linked against libmpifort, but it still does not function properly if MPI is imported before inplace:
    (1012)-> ldd mpi4py-3.0.3/INSTALL/lib/python/mpi4py/MPI.so
    libmpifort.so.12 => /opt/create/ptoolsrte/0.5.5/packages/configs/gcc-7.3.0_mpich-3.2/install/g
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f283ba99000)
    libpython2.7.so.1.0 => /opt/create/ptoolsrte/0.5.5/packages/Python-2.7.11/lib/libpython2.7.so.
    libmpi.so.12 => /opt/create/ptoolsrte/0.5.5/packages/configs/gcc-7.3.0_mpich-3.2/install/gcc-7
    ...

    I suspected this might be an MPICH issue, but so far I’ve been unable to reproduce the issue without mpi4py in the loop.

  6. Patrick McNally reporter

    Sorry to bother you again, but I just wanted to confirm that even though forcing mpi4py to link with libmpifort didn’t solve the issue, you still believe this to be an MPICH problem (load order or whatnot)? And thank you very much for all your assistance.

  7. Lisandro Dalcin

    I’m still not 100% sure. It is very, very unlikely that this is an mpi4py bug, and maybe not even an MPICH bug, just some weird issue/assumption in the libraries that only triggers when they are dlopen'ed at runtime with RTLD_LOCAL, as Python does for extension modules.

    I have to try things on my side, but I hit some problems building your example on Fedora 32 with system MPICH packages.

    In the meantime, could you try to dlopen the C and Fortran MPI libraries using ctypes (with mode RTLD_GLOBAL) before the mpi4py.MPI import? Maybe you can also try running the failing example with the env var LD_PRELOAD="/path/to/libmpifort.so" set, and then run mpiexec -n 1 python ...?
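
    Something along these lines, for the ctypes part (the sonames are assumptions matching the MPICH 3.2 build above; adjust to your installation):

    import ctypes
    # Pre-load the MPI libraries with RTLD_GLOBAL so that every later consumer
    # (mpi4py's MPI.so and the inplace extension) resolves the same copy of the
    # MPI symbols, including the Fortran MPI_IN_PLACE sentinel.
    ctypes.CDLL("libmpi.so.12", mode=ctypes.RTLD_GLOBAL)
    ctypes.CDLL("libmpifort.so.12", mode=ctypes.RTLD_GLOBAL)
    from mpi4py import MPI
    import inplace
    # Alternative (no code change): LD_PRELOAD=/path/to/libmpifort.so mpiexec -n 1 python ...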

  8. Patrick McNally reporter

    I’m happy to try anything. Both of your suggestions fix the issue. For loading with ctypes, if I load either library (even just the C libmpi) it works and obviously loading both works as well. Same thing for LD_PRELOAD; I can point to either libmpi or libmpifort and it works. It seems odd that making it load the same libmpi that MPI.so is already linked to would cause it to work, but it does.

  9. Patrick McNally reporter

    I have not tried it on another system yet. One of my next steps was going to be trying this with OpenMPI and to add additional debug output to MPICH. Getting this working on my system would be nice, but we deploy this software on lots of systems that we don’t control so I would really like to understand the root cause.

    Looking at the Fortran bindings in MPICH, it looks like they compare the send buffer to the address of the Fortran MPI_IN_PLACE integer. My suspicion is that my code’s MPI_IN_PLACE is somehow resolving to another variable, so while the value might be the same, the address is different. It then passes my garbage buffer address on as the send buffer to the C MPI_Allreduce and copies garbage memory into my receive buffer.
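
    As a toy analogue of that suspicion (plain Python, not MPICH code): a sentinel recognized by identity rather than by value goes unrecognized if the caller ends up holding a second copy at a different address.

    IN_PLACE = object()              # the sentinel the library defines
    OTHER_COPY = object()            # same role, but a different address

    def allreduce(sendbuf, recvbuf):
        if sendbuf is IN_PLACE:      # identity check, like comparing addresses
            return recvbuf           # in-place: recvbuf is both input and output
        return sendbuf               # otherwise sendbuf is treated as the data

    print(allreduce(IN_PLACE, [1, 2, 3]))    # [1, 2, 3]
    print(allreduce(OTHER_COPY, [1, 2, 3]))  # the unrecognized sentinel, i.e. garbage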

  10. Lisandro Dalcin

    This is a minimal example reproducing the issue. It does not involve mpi4py, just ctypes loading shared libraries with RTLD_LOCAL.

    To reproduce, download the file and run:

    make
    python test.py
    

    It should work (same output value from C and Fortran).

    Next, edit test.py and replace if 0 with if 1 so that the C library is loaded first. Run again, and it should stop working, as expected.

    At this point, you can confirm this is not an mpi4py issue, but some weird thing related to shared libraries and load order when using RTLD_LOCAL.
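
    The attached files are not reproduced here; roughly, the Python side has the following shape (the helper library names are made up for this sketch, and ctypes.CDLL uses RTLD_LOCAL by default, which is the mode under discussion):

    import ctypes

    if 0:  # replace with `if 1` to load the C library first (the failing order)
        clib = ctypes.CDLL("./libctest.so")   # hypothetical C helper
        flib = ctypes.CDLL("./libftest.so")   # hypothetical Fortran helper
    else:
        flib = ctypes.CDLL("./libftest.so")
        clib = ctypes.CDLL("./libctest.so")

    # ... call the C and Fortran MPI_IN_PLACE reductions and compare the outputs ...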

  11. Patrick McNally reporter

    Ah, I see. I’ll engage with the MPICH developers and try to get this resolved. Thank you very much for your help and the minimal example, which is much less convoluted than what I would have likely created.

  12. Junchao Zhang

    @dalcinl I see in src/lib-mpi/compat/pcmpi.h, you have

    static void PyMPI_PCMPI_dlopen_libmpi(void)
    {
      void *handle1 = (void *)0;
      void *handle2 = (void *)0;
      int mode = RTLD_NOW | RTLD_GLOBAL;
      #ifdef RTLD_NOLOAD
      mode |= RTLD_NOLOAD;
      #endif
    ...
      if (!handle1) handle1 = dlopen("libmpi.so.2", mode);
      if (!handle1) handle1 = dlopen("libmpi.so.1", mode);
      if (!handle1) handle1 = dlopen("libmpi.so", mode);
    ...
    }
    

    So, why did you mention RTLD_LOCAL?

  13. Lisandro Dalcin

    @Junchao Zhang Those are workarounds to force the loading of the MPI library with RTLD_GLOBAL; otherwise, things will simply not work with those MPIs. For example, that was the case for Open MPI for a long time. It took years for them to fix the issue. Fortunately, the hack is no longer required for Open MPI.

    This RTLD_GLOBAL hack is effectively a way of overriding Python’s default of using RTLD_LOCAL. However, it is a very careful override, because it only affects the loading of the MPI library, with the extra precaution of using the RTLD_NOLOAD flag if available. The hack is unreliable, and to do things really well you have to load libraries by versioned name (otherwise it would not work on Linux distributions without development packages installed).
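
    For the curious, a Python-level analogue of what that C hack does (the soname is an assumption; Python 3’s os module exposes the RTLD_* flags on POSIX, and RTLD_NOLOAD only on some platforms):

    import os
    import ctypes

    # Re-dlopen an already-loaded MPI library with RTLD_GLOBAL so its symbols
    # become globally visible; RTLD_NOLOAD (where available) makes sure this
    # only promotes an existing mapping instead of pulling in a second copy.
    mode = os.RTLD_NOW | os.RTLD_GLOBAL
    if hasattr(os, "RTLD_NOLOAD"):
        mode |= os.RTLD_NOLOAD
    try:
        ctypes.CDLL("libmpi.so.12", mode=mode)
    except OSError:
        pass  # with RTLD_NOLOAD this fails if libmpi is not already loaded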

    Now, this RTLD_LOCAL issue is affecting MPICH, and perhaps this hack is the way to fix it, but I’m tired of wasting my time patching my code for issues that are not mine, while those responsible for the issue in the first place do not even acknowledge they have a problem.
