Memory leak with repeated solve in Python 3 (when using OpenMPI)

Issue #986 resolved
Søren Madsen created an issue

Dear all,

I am having issues with a memory leak under Python 3. Here is a small program that shows steadily increasing memory usage:

#import petsc4py
#petsc4py.init('-log_view')
import dolfin                     # needed for dolfin.cpp.common below
from dolfin import *

rank = MPI.rank(mpi_comm_world())
dolfin.cpp.common.monitor_memory_usage()   # have dolfin report its memory usage

SolType = 'krylov'
#SolType = 'direct'

res = 10
mesh = BoxMesh(mpi_comm_world(), Point(0.0, 0.0, 0.0), Point(1.0, 1.0, 1.0), res, res, res)

V = FunctionSpace(mesh, 'Lagrange', 1)

def u0_boundary(x, on_boundary):
    return on_boundary
Dbc = DirichletBC(V, Constant(1.0), u0_boundary)

nSol = 2
sols = []                         # solution Functions, reused in every iteration
for i in range(0, nSol):
    sols.append(Function(V))

def Forward():
    # Build the forms, assemble and solve from scratch in every iteration.
    for i in range(0, nSol):
        u = TrialFunction(V)
        v = TestFunction(V)

        F = inner(grad(u), grad(v))*dx
        L = Constant(0)*v*dx

        A, b = assemble_system(F, L, Dbc)

        if SolType == 'krylov':
            pc = PETScPreconditioner('petsc_amg')
            solver = PETScKrylovSolver('gmres', pc)
            solver.set_operator(A)
            solver.solve(sols[i].vector(), b)
        else:
            solve(A, sols[i].vector(), b, 'mumps')

for i in range(0,2000):
    Forward()

Here is a graph of the memory usage dolfin reports when using the two different solvers: memoryLeak.png

The output from PETSc's -log_view shows the same number of created and destroyed objects, so there is no obvious issue there.
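
(For reference, the -log_view output was obtained by initializing petsc4py before importing dolfin, essentially the commented-out lines at the top of the script:)

import petsc4py
petsc4py.init('-log_view')   # must run before the first dolfin/PETSc import
from dolfin import *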

I am using Python 3.6.3 under Ubuntu and have compiled FEniCS 2018 myself (a possible source of error :-)).

How do I go about debugging this?

Is there some kind of smart workaround for this (kind of) issue?
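
One workaround that might avoid the repeated allocations (a sketch only, assuming the forms really do not change between solves) would be to assemble the system and create the solver once, outside the loop, so that only the solve is repeated. It would not explain the leak, but it avoids re-creating the PETSc objects 2000 times:

# Sketch only: assemble the system and create the solver once, reuse for all solves.
u = TrialFunction(V)
v = TestFunction(V)
F = inner(grad(u), grad(v))*dx
L = Constant(0)*v*dx
A, b = assemble_system(F, L, Dbc)

pc = PETScPreconditioner('petsc_amg')
solver = PETScKrylovSolver('gmres', pc)
solver.set_operator(A)

def Forward():
    # Only the solve is repeated; no new matrices, vectors or solvers per call.
    for i in range(0, nSol):
        solver.solve(sols[i].vector(), b)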

Comments (17)

  1. Søren Madsen reporter

    Yes, I also tried the following loop:

    import gc

    for i in range(0, 2000):
        Forward()
        gc.collect()
    

    Still leaking like above.

  2. Jan Blechta

    [deleted]

    Is it the latest stable release you used? Is it with SWIG or pybind11? Note that nobody will bother debugging the removed SWIG layer. If you could find a way to reproduce the problem in a Docker container, that would be helpful.

  3. Søren Madsen reporter

    I use the latest development version. It seems to use SWIG, as CMake reports nothing about pybind11. Do you know how to compile with pybind11 instead of SWIG?

  4. Søren Madsen reporter

    Out of curiosity, did you also see the memory leak? (The [deleted] part seemed to indicate that you did.)

  5. Jan Blechta

    The latest dev version does not have SWIG.

    I don't know right now. I somehow got confused by my multiple local installations and hence deleted that part. Let me get back to it later.

  6. Søren Madsen reporter

    I tried to compile the latest development version, using this guide to install with pybind11: https://bitbucket.org/fenics-project/dolfin/src/1c693f6a413833938ea984d900d7d123504e66c0/python/README.rst?at=master&fileviewer=file-view-default#README.rst-4,7,9,11,13,16,22

    It seems the program has to be changed a bit to run (mpi_comm_world() -> MPI.comm_world), and I could not find the 'dolfin.cpp.common.monitor_memory_usage()' function any more. To monitor memory usage I therefore used the Python 'resource' library and printed the usage reported for 'RUSAGE_SELF' after each call to 'Forward()'. The memory usage was much lower but still increased linearly.
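
    The monitoring was roughly along these lines (ru_maxrss from the standard 'resource' module; on Linux it is reported in kB):

    import resource

    for i in range(0, 2000):
        Forward()
        # Peak resident set size of this process so far (kB on Linux).
        print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)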

    I will see if I can get a docker image working and test with that.

  7. Søren Madsen reporter

    Hi again!

    It turns out that it works with the 2017.2.0 Docker image: no linear memory increase! My own build of 2017.2.0 shows the linear increase from the graph above. I'm closing the issue, as it seems to be my way of compiling that is at fault (the linear increase appears on two different flavours of Linux with two different versions of PETSc).

    Thanks for pointing me in the right direction :-)

    Best Regards, Søren

  8. Prof Garth Wells

    It could be a PETSc leak, especially in view of GAMG being used, which has changed a lot over the past two years. The PETSc version isn't reported.

  9. Søren Madsen reporter

    It could be that I'm not compiling with pybind11, as you pointed out earlier.

    The PETSc version is 3.7.6 on both systems on which I compile FEniCS.
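
    For what it's worth, the version can also be checked from Python via petsc4py (assuming the petsc4py in the environment is the one dolfin was built against):

    from petsc4py import PETSc

    # (major, minor, subminor) of the PETSc build that petsc4py wraps.
    print(PETSc.Sys.getVersion())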

  10. Søren Madsen reporter

    It seems that I get the memory leak when using OpenMPI. If I switch to MPICH, the memory usage is almost constant. Go figure!
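
    One way to confirm which MPI implementation is actually picked up at run time is via mpi4py (assuming mpi4py is installed alongside dolfin):

    from mpi4py import MPI

    # Identifies the underlying MPI library, e.g. a string starting with "Open MPI" or "MPICH".
    print(MPI.Get_library_version())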

  11. Søren Madsen reporter

    I have tested it some more, and switching from OpenMPI to MPICH indeed solved the problem, also in the large-scale optimization I was doing. Maybe a "warning" about using OpenMPI should be placed somewhere, or this should be investigated further?
