memory leak allreduce
I get a memory leak when using allreduce with this code:
import os
import psutil
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def get_memory_usage():
    """Return the memory usage in Mo (MiB)."""
    process = psutil.Process(os.getpid())
    mem = process.memory_info()[0] / float(2 ** 20)
    return mem

hist = np.arange(1000)

for _ in range(10):
    print(f"rank {rank}, memory usage = {get_memory_usage():.3f} Mo")
    for _ in range(1000):
        # case 0: allreduce
        # memory leak
        result = comm.allreduce(hist, op=MPI.SUM)
        # case 1: reduce
        # no memory leak
        # result = comm.reduce(hist, op=MPI.SUM, root=0)
        # case 2: Allreduce
        # no memory leak
        # result = np.empty_like(hist)
        # comm.Allreduce(hist, result, op=MPI.SUM)
        assert result[0] == 0
        assert result[1] == comm.size
Example output:
rank 0, memory usage = 37.980 Mo
rank 1, memory usage = 37.941 Mo
rank 0, memory usage = 46.723 Mo
rank 1, memory usage = 38.871 Mo
rank 0, memory usage = 54.457 Mo
rank 1, memory usage = 38.871 Mo
rank 0, memory usage = 62.449 Mo
rank 1, memory usage = 39.188 Mo
rank 0, memory usage = 70.184 Mo
rank 1, memory usage = 39.188 Mo
rank 0, memory usage = 77.918 Mo
rank 1, memory usage = 39.188 Mo
rank 0, memory usage = 85.910 Mo
rank 1, memory usage = 39.188 Mo
rank 0, memory usage = 93.645 Mo
rank 1, memory usage = 39.445 Mo
rank 0, memory usage = 101.379 Mo
rank 1, memory usage = 39.445 Mo
rank 0, memory usage = 109.371 Mo
rank 1, memory usage = 39.445 Mo
Memory usage increases a lot for process 0.
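For reference, the lowercase allreduce sends arbitrary Python objects by pickling them on each call, while the uppercase Allreduce hands the NumPy buffers directly to the MPI library. A standalone sketch of the buffer-based pattern from case 2, the variant reported above as leak-free:

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
hist = np.arange(1000)
result = np.empty_like(hist)  # preallocate the receive buffer once

for _ in range(1000):
    # Buffer-based collective: the array memory is passed to MPI
    # directly, so no pickled temporaries are created per iteration.
    comm.Allreduce(hist, result, op=MPI.SUM)

assert result[0] == 0
assert result[1] == comm.size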
mpirun --version
mpirun (Open MPI) 2.1.1
python -c "import mpi4py as m; print(m.__version__)"
3.0.0
Comments (9)
- I cannot reproduce your issue. Maybe mpi4py 3.0.0 does have a problem, but then it seems that 3.0.1 fixed it, perhaps just because it uses a more recent Cython version to generate the C wrappers. Or maybe the issue is with Open MPI, or with a different Python 3 version? Maybe it is your numpy version and/or Python's garbage collector? Can you call gc.collect() at the end of the inner loop?
- reporter: Calling gc.collect() at the end of the inner loop changes nothing.
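For concreteness, this corresponds to forcing a collection after every allreduce call; a minimal sketch of that placement, with the same array and communicator as the reproduction script:

import gc

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
hist = np.arange(1000)

for _ in range(1000):
    result = comm.allreduce(hist, op=MPI.SUM)
    gc.collect()  # force a garbage collection at the end of each iteration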
- reporter: With mpi4py 3.0.1 it is even worse, because memory usage increases for all processes!
rank 1, memory usage = 38.090 Mo
rank 0, memory usage = 37.922 Mo
rank 1, memory usage = 46.578 Mo
rank 0, memory usage = 46.316 Mo
rank 1, memory usage = 54.695 Mo
rank 0, memory usage = 54.324 Mo
rank 1, memory usage = 62.688 Mo
rank 0, memory usage = 62.059 Mo
rank 1, memory usage = 70.633 Mo
rank 0, memory usage = 70.051 Mo
rank 1, memory usage = 78.848 Mo
rank 0, memory usage = 78.043 Mo
rank 1, memory usage = 86.809 Mo
rank 0, memory usage = 85.777 Mo
rank 1, memory usage = 94.855 Mo
rank 0, memory usage = 93.770 Mo
rank 1, memory usage = 102.910 Mo
rank 0, memory usage = 101.504 Mo
rank 1, memory usage = 110.902 Mo
rank 0, memory usage = 109.496 Mo
- Cython / Numpy versions:
cython --version
Cython version 0.29.2
python -c "import numpy as np; print(np.__version__)"
1.16.0
- The good news: I got the memory leak with Python 3.7.1 and there is no memory leak with Python 3.7.2. So it could just be an old CPython bug...
Thank you for the help!
- changed status to resolved
- I still have this issue with Intel MPI. Upgrading Python (using conda-forge) and mpi4py (using pip) did not help.
(gfdyn) [x_ashmo@tetralith2 sandbox]$ MPICC=mpiicc pip install -U 'mpi4py==3.0.1' --ignore-installed --no-cache
Collecting mpi4py==3.0.1
  Downloading https://files.pythonhosted.org/packages/55/a2/c827b196070e161357b49287fa46d69f25641930fd5f854722319d431843/mpi4py-3.0.1.tar.gz (1.4MB)
    100% |████████████████████████████████| 1.4MB 34.9MB/s
Installing collected packages: mpi4py
  Running setup.py install for mpi4py ... done
Successfully installed mpi4py-3.0.1
(gfdyn) [x_ashmo@tetralith2 sandbox]$ mpirun -np 2 python test_mem.py
rank 0, memory usage = 31.383 Mo
rank 0, memory usage = 41.164 Mo
rank 0, memory usage = 49.156 Mo
rank 0, memory usage = 56.891 Mo
rank 0, memory usage = 64.883 Mo
rank 0, memory usage = 72.875 Mo
rank 0, memory usage = 80.609 Mo
rank 0, memory usage = 88.660 Mo
rank 0, memory usage = 96.391 Mo
rank 0, memory usage = 104.383 Mo
rank 1, memory usage = 31.336 Mo
rank 1, memory usage = 40.793 Mo
rank 1, memory usage = 48.762 Mo
rank 1, memory usage = 56.723 Mo
rank 1, memory usage = 64.684 Mo
rank 1, memory usage = 72.934 Mo
rank 1, memory usage = 80.926 Mo
rank 1, memory usage = 88.918 Mo
rank 1, memory usage = 96.895 Mo
rank 1, memory usage = 104.887 Mo
(gfdyn) [x_ashmo@tetralith2 sandbox]$ pip show mpi4py
Name: mpi4py
Version: 3.0.1
Summary: Python bindings for MPI
Home-page: https://bitbucket.org/mpi4py/mpi4py/
Author: Lisandro Dalcin
Author-email: dalcinl@gmail.com
License: BSD
Location: /home/x_ashmo/.conda/envs/gfdyn/lib/python3.7/site-packages
Requires:
Required-by:
(gfdyn) [x_ashmo@tetralith2 sandbox]$ cython --version
Cython version 0.29.6
(gfdyn) [x_ashmo@tetralith2 sandbox]$ python -c "import numpy as np; print(np.__version__)"
1.16.0
(gfdyn) [x_ashmo@tetralith2 sandbox]$ ml list
Currently Loaded Modules:
  1) EasyBuild/3.5.3-nsc17d8ce4
  2) nsc-eb-scripts/1.0
  3) buildtool-easybuild/3.5.3-nsc17d8ce4
  4) buildenv-intel/2018a-eb
  5) Python/3.6.3-anaconda-5.0.1-nsc1
  6) GCCcore/6.4.0
  7) binutils/.2.28 (H)
  8) icc/.2018.1.163-GCC-6.4.0-2.28 (H)
  9) ifort/.2018.1.163-GCC-6.4.0-2.28 (H)
 10) impi/.2018.1.163 (H)
 11) imkl/.2018.1.163 (H)
 12) intel/2018a
 13) FFTW/3.3.6-nsc1
Where:
  H: Hidden Module
(gfdyn) [x_ashmo@tetralith2 sandbox]$ which mpicc
/software/sse/easybuild/prefix/software/impi/2018.1.163-iccifort-2018.1.163-GCC-6.4.0-2.28/bin64/mpicc
(gfdyn) [x_ashmo@tetralith2 sandbox]$ which mpirun
/software/sse/easybuild/prefix/software/impi/2018.1.163-iccifort-2018.1.163-GCC-6.4.0-2.28/bin64/mpirun
(gfdyn) [x_ashmo@tetralith2 sandbox]$ python --version
Python 3.7.2
- Does the issue still happen with other MPI implementations?
- @avmo I cannot reproduce the issue with the same Python version and MPICH, both from system packages in Fedora 29.
- I do not have access to MPICH, just OpenMPI. There is no issue with OpenMPI on the same hardware:
(gcc-openmpi) [x_ashmo@tetralith1 sandbox]$ mpirun -np 2 python test_mem.py
rank 1, memory usage = 57.000 Mo
rank 0, memory usage = 68.965 Mo
rank 1, memory usage = 58.805 Mo
rank 0, memory usage = 71.285 Mo
rank 1, memory usage = 58.805 Mo
rank 0, memory usage = 71.285 Mo
rank 1, memory usage = 58.805 Mo
rank 0, memory usage = 71.285 Mo
rank 1, memory usage = 58.805 Mo
rank 0, memory usage = 71.285 Mo
rank 1, memory usage = 58.805 Mo
rank 0, memory usage = 71.285 Mo
rank 1, memory usage = 58.805 Mo
rank 0, memory usage = 71.285 Mo
rank 1, memory usage = 58.805 Mo
rank 0, memory usage = 71.285 Mo
rank 1, memory usage = 58.805 Mo
rank 0, memory usage = 71.285 Mo
rank 1, memory usage = 58.805 Mo
rank 0, memory usage = 71.285 Mo
(gcc-openmpi) [x_ashmo@tetralith1 sandbox]$ ml list
Currently Loaded Modules:
  1) mpprun/4.0
  2) nsc/.1.1 (H,S)
  3) EasyBuild/3.5.3-nsc17d8ce4
  4) nsc-eb-scripts/1.0
  5) buildtool-easybuild/3.5.3-nsc17d8ce4
  6) GCCcore/6.4.0
  7) binutils/.2.28 (H)
  8) GCC/6.4.0-2.28
  9) numactl/.2.0.11 (H)
 10) hwloc/.1.11.8 (H)
 11) OpenMPI/.2.1.2 (H)
 12) OpenBLAS/.0.2.20 (H)
 13) FFTW/.3.3.7 (H)
 14) ScaLAPACK/.2.0.2-OpenBLAS-0.2.20 (H)
 15) foss/2018a
 16) buildenv-gcc/2018a-eb
 17) Python/3.6.3-anaconda-5.0.1-nsc1
Where:
  S: Module is Sticky, requires --force to unload or purge
  H: Hidden Module
(gcc-openmpi) [x_ashmo@tetralith1 sandbox]$ conda list
# packages in environment at /home/x_ashmo/.conda/envs/gcc-openmpi:
#
bzip2              1.0.6      h14c3975_1002    conda-forge
ca-certificates    2019.3.9   hecc5488_0       conda-forge
certifi            2019.3.9   py37_0           conda-forge
cython             0.29.6     py37hf484d3e_0   conda-forge
libblas            3.8.0      4_openblas       conda-forge
libcblas           3.8.0      4_openblas       conda-forge
libffi             3.2.1      he1b5a44_1006    conda-forge
libgcc-ng          8.2.0      hdf63c60_1
libgfortran        3.0.0      1                conda-forge
liblapack          3.8.0      4_openblas       conda-forge
libstdcxx-ng       8.2.0      hdf63c60_1
mpi4py             3.0.1      <pip>
ncurses            6.1        hf484d3e_1002    conda-forge
numpy              1.16.2     py37h8b7e671_1   conda-forge
openblas           0.3.5      ha44fe06_0       conda-forge
openssl            1.1.1b     h14c3975_1       conda-forge
pip                19.0.3     py37_0           conda-forge
psutil             5.6.1      <pip>
python             3.7.2      h381d211_0       conda-forge
readline           7.0        hf8c457e_1001    conda-forge
setuptools         40.8.0     py37_0           conda-forge
sqlite             3.26.0     h67949de_1001    conda-forge
tk                 8.6.9      h84994c4_1001    conda-forge
wheel              0.33.1     py37_0           conda-forge
xz                 5.2.4      h14c3975_1001    conda-forge
zlib               1.2.11     h14c3975_1004    conda-forge
- Good news: I can't reproduce it in a clean conda environment with Intel MPI... somehow my old conda environment is "infected", and I don't understand why. pip uninstall followed by a reinstall does not help.
- Fixed the old environment!
(gfdyn) [x_ashmo@tetralith1 sandbox]$ conda update numpy
Fetching package metadata .............
Solving package specifications: .
Package plan for installation in environment /home/x_ashmo/.conda/envs/gfdyn:
The following packages will be UPDATED:
    numpy: 1.16.0-py37_blas_openblash1522bff_1000 conda-forge [blas_openblas] --> 1.16.2-py37_blas_openblash1522bff_0 conda-forge [blas_openblas]
Proceed ([y]/n)? y
numpy-1.16.2-p 100% |##############################| Time: 0:00:00  82.09 MB/s
(gfdyn) [x_ashmo@tetralith1 sandbox]$ LD_LIBRARY_PATH=$LIBRARY_PATH mpirun -np 2 python test_mem.py
rank 0, memory usage = 81.500 Mo
rank 0, memory usage = 83.305 Mo
rank 0, memory usage = 83.305 Mo
rank 0, memory usage = 83.305 Mo
rank 0, memory usage = 83.305 Mo
rank 0, memory usage = 83.305 Mo
rank 0, memory usage = 83.305 Mo
rank 0, memory usage = 83.305 Mo
rank 0, memory usage = 83.305 Mo
rank 0, memory usage = 83.305 Mo
rank 1, memory usage = 31.602 Mo
rank 1, memory usage = 32.891 Mo
rank 1, memory usage = 32.891 Mo
rank 1, memory usage = 32.891 Mo
rank 1, memory usage = 32.891 Mo
rank 1, memory usage = 32.891 Mo
rank 1, memory usage = 32.891 Mo
rank 1, memory usage = 32.891 Mo
rank 1, memory usage = 32.891 Mo
rank 1, memory usage = 32.891 Mo
The memory leak seems to have originated from numpy==1.16.0.
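A closing note for anyone debugging similar growth: psutil reports the process RSS, which includes native allocations that the interpreter does not track, whereas tracemalloc only sees Python-level allocations, so comparing the two indicates whether a leak lives in Python objects or in C code. A minimal sketch, assuming the same reproduction setup:

import tracemalloc

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
hist = np.arange(1000)

tracemalloc.start()
for _ in range(1000):
    result = comm.allreduce(hist, op=MPI.SUM)
snapshot = tracemalloc.take_snapshot()

# If RSS grows but little shows up here, the leak is in native code
# (the MPI library or a C extension) rather than in Python objects.
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)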