Memory Allocation Error on POWER
We have an issue with memory allocation in a large application we run on our POWER8NVL system. We managed to isolate the problem to a simple Python script which iteratively allocates memory. As soon as mpi4py is imported, the script breaks with an out-of-memory error. When mpi4py is not imported, everything works fine.
Currently, my theory is that the problem is somehow related to mpi4py, possibly in combination with the underlying Open MPI installation and/or the LSF batch submission engine, plus the system configuration. Maybe someone can help shine a light on what could be going wrong and help us debug the issue.
The error occurs when we call the script available in this Gist repository as test_memory.py (it works under both Python 2 and Python 3, though I use Python 3). The way to trigger it:
$ bsub -Is -n 20 -x -tty /bin/bash
$ mpirun -n 20 python3 test_memory.py
The script runs some iterations of allocating data before it crashes:
Rank: 0 - Data size: 32064.06 MB - Mem used 42G (18.3 %)
Rank: 0 - Data size: 38464.07 MB - Mem used 48G (20.8 %)
Traceback (most recent call last):
File "test_memory.py", line 18, in <module>
data.append(np.ones((1024,1024,8), dtype=np.float64))
File "/gpfs/software/opt/python/modules/3.6/numpy/1.13.1/lib/python3.6/site-packages/numpy-1.13.1-py3.6-linux-ppc64le.egg/numpy/core/numeric.py", line 192, in ones
a = empty(shape, dtype, order)
MemoryError
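Based on the traceback, the core of test_memory.py is presumably a loop like the following (a hypothetical reconstruction; the real script is in the Gist and additionally imports mpi4py and prints the per-rank memory usage each iteration):

```python
# Hypothetical reconstruction of the allocation loop in test_memory.py.
# Each np.ones((1024, 1024, 8), dtype=np.float64) chunk is roughly 67 MB.
import numpy as np

def allocate_chunks(n_chunks, shape=(1024, 1024, 8)):
    """Append n_chunks float64 arrays and return the total size in MB."""
    data = []
    for _ in range(n_chunks):
        data.append(np.ones(shape, dtype=np.float64))
    return sum(a.nbytes for a in data) / 1e6  # MB, as in the script's output

if __name__ == "__main__":
    print(f"Data size: {allocate_chunks(4):.2f} MB")
```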
The available system memory is ~240 GB, of which only 48 GB (20.8 %) are in use at the last iteration, and only rank 0 allocates memory. When using a lower value than mpirun -n 20, the process breaks at a later stage, i.e. after allocating more memory (the relation between the memory used at the break point and the number of ranks is almost perfectly quadratic).
The architecture is a POWER8NVL Minsky server with 160 cores (2 nodes, 10 physical cores each, 8-fold simultaneous multi-threading). We are using Open MPI 2.1.2, Python 3.6.1, mpi4py 3.0.0, and numpy 1.13.1, all compiled with GCC 5.4.0. I also tested the current master branch of mpi4py, with the same result.
Is mpi4py changing the way successive allocations occur? Could the package misinterpret the job environment? Are there system configurations we could tune?
I ran the reproducer on a similar POWER system but could not trigger the error.
Comments (5)
reporter: Adding

mpi4py.rc.initialize = False

does indeed solve the issue. Does that mean each process of the run reserves its memory with an MPI_Init()?

Concerning your second suggestion, I wrote a small script that imports libmpi.so with ctypes. That failed, so I found your Open MPI GitHub issue. If I understand it correctly, adding the absolute path to the .so.20.… should make it work, right? It does not: it still fails when calling mpirun -n 1 python3 mpi_init_ctypes.py. The script has been added to the Gist repo.
Well, I have no idea what that actually means, but the call to MPI_Init() seems to be affecting the process in a weird way. I don't think Open MPI requires that much memory after MPI_Init(), to the point of making your subsequent allocations fail.

About your ctypes script: please use RTLD_GLOBAL, not RTLD_LOCAL. That GitHub issue is precisely about the problems related to dlopening with RTLD_LOCAL.

Using "<prefix>/libmpi.so" should be enough. If not, just put the full library name with the version numbers, that is, do not dlopen through the symlink.
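Following that advice, a minimal ctypes sketch might look like this (hypothetical; the library path and soname depend on the actual installation):

```python
# Sketch: dlopen libmpi with RTLD_GLOBAL (as advised above) and call
# MPI_Init(NULL, NULL) via ctypes. Returns the library handle, or None
# if the shared object cannot be loaded.
import ctypes

def init_mpi(libmpi_path="libmpi.so"):
    try:
        lib = ctypes.CDLL(libmpi_path, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        return None
    lib.MPI_Init(None, None)  # argc and argv both NULL
    return lib
```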
reporter: Thanks to your help I was able to isolate the problem further, and I think I found it. It wasn't related to mpi4py directly, so my bug is pretty much closed, as it does not concern the Python package.

As you suggested, I added a ctypes-based MPI_Init() call directly into the script (the file is updated in the Gist). That still led to an out-of-memory error. To remove further dependencies, instead of allocating np.ones(), I added allocation of plain Python floats. This also didn't solve the issue, as the script still executed a (now pointless) import numpy at the top of the file. Removing that import actually solves the issue! So the interplay of MPI and NumPy seems to be the culprit here.

We linked our NumPy installation against an OpenBLAS library (it's a 160-thread machine, after all). I just quickly compiled a NumPy version without OpenBLAS support, and that does indeed solve the issue. NumPy's example site.cfg mentions something related to forking and OpenBLAS, so that might really be responsible here. I'm going to revisit the problem after the weekend.

Thank you for your support!
changed status to invalid
mpi4py does not play any special games with memory allocation. I guess this issue is related to Open MPI.
Please try the following: at the very beginning of your script, add these lines:
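(The snippet itself is missing from the page capture; reconstructed from the reporter's reply above, which quotes it:)

```python
import mpi4py
mpi4py.rc.initialize = False  # disable automatic MPI_Init() when mpi4py.MPI is imported
```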
If this works, I would rule out mpi4py as the guilty party. To be 100% sure, perhaps you should use ctypes to dlopen libmpi.so and then call MPI_Init(NULL, NULL), and see what happens.