Memory Allocation Error on POWER

Issue #89 invalid
Andreas Herten created an issue

We have an issue with allocation of memory in a large application we run on our POWER8NVL system. We managed to isolate the problem to a simple Python application which iteratively allocates memory. As soon as mpi4py is imported, the Python script breaks with an out of memory error. When mpi4py is not imported, everything works fine.

Currently, my theory is that the problem is somehow related to mpi4py and possibly the underlying OpenMPI installation and/or the LSF batch submission engine, plus system configuration. Maybe someone can help in shining a light on what could go wrong and help us debug the issue.

The error occurs when running the script available as test_memory.py in this Gist repository (it works under both Python 3 and Python 2; I use Python 3). To trigger it:

$ bsub -Is -n 20 -x -tty /bin/bash
$$ mpirun -n 20 python3 test_memory.py

The script runs a number of iterations allocating data before it crashes:

Rank:  0 - Data size:  32064.06 MB - Mem used 42G (18.3 %)
Rank:  0 - Data size:  38464.07 MB - Mem used 48G (20.8 %)
Traceback (most recent call last):
Traceback (most recent call last):
  File "test_memory.py", line 18, in <module>
    data.append(np.ones((1024,1024,8), dtype=np.float64))
  File "/gpfs/software/opt/python/modules/3.6/numpy/1.13.1/lib/python3.6/site-packages/numpy-1.13.1-py3.6-linux-ppc64le.egg/numpy/core/numeric.py", line 192, in ones
    a = empty(shape, dtype, order)
MemoryError

The available system memory is ~240 GB, of which only 48 GB (20.8 %) are in use at the last iteration. Only rank 0 allocates memory.

When running with fewer ranks than mpirun -n 20, the process breaks at a later stage, i.e. after allocating more memory. (The relation between the memory used before the crash and the number of ranks is almost perfectly quadratic…)
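For illustration only (the per-peer constant below is a back-of-the-envelope fit to the numbers above, not a measured value): if each of the n ranks reserved roughly a fixed amount of memory per peer, e.g. for communication buffers, the total reservation would grow as n², which would match the observed quadratic relation:

```python
# Toy model, NOT a measured explanation: suppose each of n ranks reserves
# per_peer GB for every rank in the job (e.g. transport buffers). The total
# reservation then scales as per_peer * n**2, and the memory left for the
# application shrinks quadratically with the rank count.
TOTAL_GB = 240  # system memory from the report


def memory_left(n_ranks, per_peer=0.48):
    """GB remaining for the application under the toy model."""
    return TOTAL_GB - per_peer * n_ranks ** 2


print(memory_left(20))  # roughly the ~48 GB usable observed with 20 ranks
print(memory_left(10))  # 4x more headroom with half the ranks
```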

The architecture is a POWER8NVL Minsky server with 160 cores (2 nodes, 10 physical cores each, 8-fold simultaneous multi-threading). We are using Open MPI 2.1.2, Python 3.6.1, mpi4py 3.0.0, NumPy 1.13.1, all compiled with GCC 5.4.0. I also tested the current master branch of mpi4py, with the same result.

Does mpi4py change the way subsequent allocations occur? Could the package misinterpret the job environment? Are there system settings we could tune? I ran the reproducer on a similar POWER system but could not trigger the error.

Comments (4)

  1. Lisandro Dalcin

    mpi4py does not play any special games with memory allocation. I guess this issue is related to Open MPI.

    Please try the following: At the very beginning of your script add these lines:

    import os
    import mpi4py
    mpi4py.rc.initialize = False  # disable automatic MPI_Init() at import
    from mpi4py import MPI  # the module is imported, but MPI_Init() is not
                            # called; you would have to call MPI.Init() yourself
    ...
    rank = int(os.environ['OMPI_COMM_WORLD_RANK'])
    # rank = MPI.COMM_WORLD.Get_rank()  # cannot be used while MPI is uninitialized
    ...
    

    If this works, I would rule out mpi4py as being the guilty party. To be 100% sure, perhaps you should use ctypes to dlopen libmpi.so and next call MPI_Init(NULL,NULL), and see what happens.

  2. Andreas Herten reporter

    Adding mpi4py.rc.initialize = False does indeed solve the issue. Does that mean each process of the run reserves memory during MPI_Init()?

    Concerning your second suggestion, I wrote a small script to import libmpi.so with ctypes. That failed, so I found your Open MPI GitHub issue. If I understand it correctly, adding the absolute path to the .so.20.… should make it work, right? It does not: it still fails when calling mpirun -n 1 python3 mpi_init_ctypes.py. The script has been added to the Gist repo.

  3. Lisandro Dalcin

    Well, I have no idea what that actually means, but the call to MPI_Init() seems to be affecting the process in a weird way. I don't think Open MPI reserves so much memory during MPI_Init() that your subsequent allocations should fail.

    About your ctypes script, please use RTLD_GLOBAL, not RTLD_LOCAL. That GitHub issue is precisely about the problems related to dlopening with RTLD_LOCAL.

    Using "<prefix>/libmpi.so" should be enough. If not, just put the full lib name with the version numbers, that is, do not dlopen through the symlink.
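    A hypothetical sketch of such a loader (the library path passed in is an assumption; substitute your actual Open MPI prefix, with the full version suffix if the plain .so symlink does not work):

```python
# Sketch: dlopen Open MPI's libmpi via ctypes and call MPI_Init(NULL, NULL).
# The path argument is hypothetical; use your real <prefix>/lib/libmpi.so.
import ctypes


def init_mpi_via_ctypes(libmpi_path):
    # RTLD_GLOBAL is essential: Open MPI dlopens its own plugins, which
    # must be able to resolve symbols from libmpi (RTLD_LOCAL breaks that).
    libmpi = ctypes.CDLL(libmpi_path, mode=ctypes.RTLD_GLOBAL)
    libmpi.MPI_Init.restype = ctypes.c_int
    ierr = libmpi.MPI_Init(None, None)  # argc=NULL, argv=NULL
    if ierr != 0:
        raise RuntimeError("MPI_Init returned error code %d" % ierr)
    return libmpi
```

    Run it under mpirun (e.g. mpirun -n 1 python3 script.py) after calling init_mpi_via_ctypes(...), and remember to call libmpi.MPI_Finalize() before exiting.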

  4. Andreas Herten reporter

    Thanks to your help I was able to isolate the problem further, and I think I have found it. It wasn't related to mpi4py directly, so this bug can be closed; it does not concern the Python package.

    As you suggested, I added a ctypes-based MPI_Init() call directly into the script (the file is updated in the Gist). That still led to an out-of-memory error. To remove further dependencies, I allocated plain Python floats instead of np.ones(). That alone didn't solve the issue, because the file still had a (now pointless) import numpy at the top. Removing that import actually solves the issue! So the interplay of MPI and NumPy seems to be the culprit here.

    We linked our NumPy installation against an OpenBLAS library (it is a 160-thread machine, after all). I quickly compiled a NumPy version without OpenBLAS support, and that does indeed seem to solve the issue. NumPy's example site.cfg mentions something about forking and OpenBLAS, so that might really be the cause here. I'm going to revisit the problem after the weekend.
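    If OpenBLAS's per-thread buffer allocation really is the cause (still an assumption at this point), a common mitigation short of rebuilding NumPy is to cap the OpenBLAS thread count before NumPy is imported, for example:

```python
# Cap BLAS threading before NumPy (and thus OpenBLAS) is loaded.
# On a 160-thread machine, OpenBLAS otherwise starts one worker per
# hardware thread, each with its own working buffers. OPENBLAS_NUM_THREADS
# is the variable OpenBLAS documents; OMP_NUM_THREADS covers OpenMP builds.
import os

os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

import numpy as np  # noqa: E402  -- must come after the env vars are set
```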

    Thank you for your support!
