Memory leak with repeated solve in Python 3 (when using OpenMPI)

Issue #986 resolved
Søren Madsen created an issue

Dear all,

I am having issues with a memory leak under Python 3. Here is a small program that shows steadily increasing memory usage:

#import petsc4py
#petsc4py.init('-log_view')
import dolfin                     # needed for dolfin.cpp.common below
from dolfin import *

rank = MPI.rank(mpi_comm_world())
dolfin.cpp.common.monitor_memory_usage()   # have dolfin report its memory usage

SolType = 'krylov'
#SolType = 'direct'

res = 10
mesh = BoxMesh(mpi_comm_world(), Point(0.0, 0.0, 0.0), Point(1.0, 1.0, 1.0), res, res, res)

V = FunctionSpace(mesh, 'Lagrange', 1)

def u0_boundary(x, on_boundary):
    return on_boundary
Dbc = DirichletBC(V, Constant(1.0), u0_boundary)

nSol = 2
sols = []                         # solution Functions, reused in every iteration
for i in range(0, nSol):
    sols.append(Function(V))

def Forward():
    # Build the forms, assemble and solve from scratch in every iteration.
    for i in range(0, nSol):
        u = TrialFunction(V)
        v = TestFunction(V)

        F = inner(grad(u), grad(v))*dx
        L = Constant(0)*v*dx

        A, b = assemble_system(F, L, Dbc)

        if SolType == 'krylov':
            pc = PETScPreconditioner('petsc_amg')
            solver = PETScKrylovSolver('gmres', pc)
            solver.set_operator(A)
            solver.solve(sols[i].vector(), b)
        else:
            solve(A, sols[i].vector(), b, 'mumps')

for i in range(0,2000):
    Forward()

Here is a graph of the memory usage dolfin reports when using the two different solvers: memoryLeak.png

The output from PETSc's -log_view shows the same number of created and destroyed objects, so there is no obvious issue there.
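
(For reference, the -log_view output was obtained by initializing petsc4py before importing dolfin, essentially the commented-out lines at the top of the script:)

import petsc4py
petsc4py.init('-log_view')   # must run before the first dolfin/PETSc import
from dolfin import *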

I am using Python 3.6.3 under Ubuntu and have compiled FEniCS 2018 myself (a possible source of error :-)).

How do I go about debugging this?

Is there some kind of smart workaround for this (kind of) issue?
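
One workaround that might avoid the repeated allocations (a sketch only, assuming the forms really do not change between solves) would be to assemble the system and create the solver once, outside the loop, so that only the solve is repeated. It would not explain the leak, but it avoids re-creating the PETSc objects 2000 times:

# Sketch only: assemble the system and create the solver once, reuse for all solves.
u = TrialFunction(V)
v = TestFunction(V)
F = inner(grad(u), grad(v))*dx
L = Constant(0)*v*dx
A, b = assemble_system(F, L, Dbc)

pc = PETScPreconditioner('petsc_amg')
solver = PETScKrylovSolver('gmres', pc)
solver.set_operator(A)

def Forward():
    # Only the solve is repeated; no new matrices, vectors or solvers per call.
    for i in range(0, nSol):
        solver.solve(sols[i].vector(), b)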

Comments (17)

  1. Søren Madsen reporter

    Yes, I also tried the following loop:

    import gc

    for i in range(0, 2000):
        Forward()
        gc.collect()
    

    Still leaking like above.

  2. Jan Blechta

    [deleted]

    Is it the latest stable release you used? Is it with SWIG or pybind11? Note that nobody will bother debugging the removed SWIG layer. If you could find a way to reproduce the problem in a Docker container, that would be helpful.

  3. Søren Madsen reporter

    I use the latest development version. It seems to use SWIG, as CMake reports nothing about pybind11. Do you know how to compile with pybind11 instead of SWIG?

  4. Søren Madsen reporter

    Out of curiosity, did you also see the memory leak? (The [deleted] part seemed to indicate that you did.)

  5. Jan Blechta

    The latest dev version does not have SWIG.

    I don't know right now. I somehow got confused by my multiple local installations and hence deleted that part. Let me get back to it later.

  6. Søren Madsen reporter

    I tried to compile the latest development version, using this guide to install with pybind11: https://bitbucket.org/fenics-project/dolfin/src/1c693f6a413833938ea984d900d7d123504e66c0/python/README.rst?at=master&fileviewer=file-view-default#README.rst-4,7,9,11,13,16,22

    It seems the program has to be changed a bit to run (mpi_comm_world() -> MPI.comm_world), and I could not find the 'dolfin.cpp.common.monitor_memory_usage()' function any more. To monitor memory usage I therefore used the Python 'resource' library and printed the usage reported for 'RUSAGE_SELF' after each call to 'Forward()'. The memory usage was much lower but still increased linearly.
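
    The monitoring was roughly along these lines (ru_maxrss from the standard 'resource' module; on Linux it is reported in kB):

    import resource

    for i in range(0, 2000):
        Forward()
        # Peak resident set size of this process so far (kB on Linux).
        print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)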

    I will see if I can get a docker image working and test with that.

  7. Søren Madsen reporter

    Hi again!

    It turns out that it works with the 2017.2.0 Docker image: no linear memory increase! My own build of 2017.2.0 shows the linear increase from the graph above. I'm closing the issue, as it seems to be my way of compiling that is at fault (the linear increase appears on two different flavours of Linux with two different versions of PETSc).

    Thanks for pointing me in the right direction :-)

    Best Regards, Søren

  8. Prof Garth Wells

    It could be a PETSc leak, especially in view of GAMG being used, which has changed a lot over the past two years. The PETSc version isn't reported.

  9. Søren Madsen reporter

    It could be that I'm not compiling with pybind11, as you pointed out earlier.

    The PETSc version is 3.7.6 on both systems on which I compile FEniCS.
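
    For what it's worth, the version can also be checked from Python via petsc4py (assuming the petsc4py in the environment is the one dolfin was built against):

    from petsc4py import PETSc

    # (major, minor, subminor) of the PETSc build that petsc4py wraps.
    print(PETSc.Sys.getVersion())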

  10. Søren Madsen reporter

    It seems that I get the memory leak when using OpenMPI. If I switch to MPICH, the memory usage is almost constant. Go figure!
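
    One way to confirm which MPI implementation is actually picked up at run time is via mpi4py (assuming mpi4py is installed alongside dolfin):

    from mpi4py import MPI

    # Identifies the underlying MPI library, e.g. a string starting with "Open MPI" or "MPICH".
    print(MPI.Get_library_version())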

  11. Søren Madsen reporter

    I have tested it some more, and switching from OpenMPI to MPICH indeed solved the problem, also in the large-scale optimization I was doing. Maybe a "warning" about using OpenMPI should be placed somewhere, or this should be investigated further?
