- changed milestone to 2018.1
- assigned issue to
Python3 non-deterministic destruction of class members
The following code demonstrates that a class owning more than one object with a parallel (MPI-collective) destructor (here the MUMPS solver) leads to a non-deterministic destruction order and an eventual deadlock. This happens because members are owned through dictionaries, which have a random iteration order due to hash randomization in Python <= 3.5.
Here is a very important, and possibly widespread, pattern in user codes, which is the key of the problem:

```python
self.s0 = PETScLUSolver()
self.s1 = PETScLUSolver()
```

(imagine any other objects with parallel destructors). Objects `s0` and `s1` might get destroyed in a different order on different ranks, which leads to a deadlock.
A possible solution to this problem is to distribute FEniCS with Python 3.6, where the problem should not happen (not tested) because dicts are iterated in insertion order.
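The mechanism can be sketched outside FEniCS (the `Member`/`Owner` classes below are made up for illustration): instance attributes live in the instance `__dict__`, and when the instance dies its members are finalized in the dict's iteration order, which CPython >= 3.6 guarantees to be insertion order.

```python
# Sketch (hypothetical Member/Owner classes) of how the destruction
# order of instance attributes follows the iteration order of the
# instance __dict__. In CPython >= 3.6 dicts preserve insertion order,
# so teardown is deterministic; in <= 3.5 it may vary per run.
destroyed = []

class Member:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Record the teardown order instead of doing MPI-collective work
        destroyed.append(self.name)

class Owner:
    def __init__(self):
        self.s0 = Member("s0")
        self.s1 = Member("s1")

owner = Owner()
del owner  # last reference gone; members are finalized via __dict__
print(destroyed)
```

On CPython 3.6+ this prints the members in insertion order on every run; under hash randomization in 3.5 and earlier the analogous order could differ between processes, which is exactly what bites MPI-collective destructors.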
To reproduce, run the following code with `mpirun -n 2 python3 test_py3_gc.py`. To circumvent the problem, run `PYTHONHASHSEED=0 mpirun -n 2 python3 test_py3_gc.py`.
```python
from dolfin import *

# Bilinear and linear form
mesh = UnitSquareMesh(3, 3)
V = FunctionSpace(mesh, "P", 1)
u = TrialFunction(V)
v = TestFunction(V)
a = u*v*dx
L = v*dx

# Assemble A, x, b
A = PETScMatrix(mesh.mpi_comm())
assemble(a, tensor=A)
b = assemble(L)
u = Function(V)
x = u.vector()

# Make MUMPS talk to us
PETScOptions.set("mat_mumps_icntl_4", 2)

class Foo(object):
    """Class owning two objects (MUMPS solvers) with parallel
    destruction semantics"""
    def __init__(self, A, x, b):
        """Create two MUMPS solvers and init them by solving"""
        self.s0 = PETScLUSolver(A.mpi_comm(), A, "mumps")
        self.s1 = PETScLUSolver(A.mpi_comm(), A, "mumps")
        self.s0.solve(x, b)
        self.s1.solve(x, b)

# Create an instance owning two objects and observe
# they are destroyed in different order across MPI
# ranks leading to deadlocks
foo = Foo(A, x, b)
```
Attach a debugger and check the stack trace to see that the deadlock occurred during garbage collection of MUMPS, when MUMPS tries to do something MPI-collective.
Comments (6)
- reporter -
Does an explicit `del` avoid the problem, or do we need an explicit `destroy` function that calls the destructor?
- reporter: Explicit `del` where? The order of destruction in the example above differs across MPI ranks; that causes a deadlock.
There is a need for an explicit `destroy`. It is another issue filed in the issue tracker.
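The explicit-destroy approach can be sketched as follows. `FakeSolver` and its `destroy` method are hypothetical stand-ins for whatever collective cleanup a real solver would expose; the point is only that the owner tears members down in a fixed order identical on every rank.

```python
# Hypothetical sketch: an owner class with an explicit destroy() that
# releases its members in a fixed order, instead of relying on the
# (potentially rank-dependent) order of garbage collection.
teardown_order = []

class FakeSolver:
    """Stand-in for a solver with an MPI-collective destructor."""
    def __init__(self, name):
        self.name = name
        self.alive = True

    def destroy(self):
        # In the real case this would be the collective cleanup call;
        # here we only record the order in which it happens.
        if self.alive:
            teardown_order.append(self.name)
            self.alive = False

class Foo:
    def __init__(self):
        self.s0 = FakeSolver("s0")
        self.s1 = FakeSolver("s1")

    def destroy(self):
        # Fixed teardown order, the same on every rank, independent of
        # dict iteration order or GC timing
        self.s0.destroy()
        self.s1.destroy()

foo = Foo()
foo.destroy()
print(teardown_order)
```

Guarding `destroy` with an `alive` flag makes it idempotent, so a later garbage-collection pass cannot trigger the collective cleanup a second time.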
- Using `del` to force destruction in an appropriate order rather than relying on garbage collection. I have a vague recollection that using `del` does not guarantee that the destructor is called at that point.
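That recollection is correct: `del` only unbinds a name and decrements the reference count; `__del__` runs only when the count reaches zero (and even then, immediate finalization is a CPython implementation detail, not a language guarantee). A minimal illustration, with a made-up `Resource` class:

```python
# Sketch showing that `del` does not guarantee the destructor runs:
# it only drops one reference.
calls = []

class Resource:
    def __del__(self):
        calls.append("finalized")

a = Resource()
b = a       # a second reference keeps the object alive
del a       # unbinds the name; refcount is still 1, __del__ not called
assert calls == []
del b       # last reference gone; CPython finalizes immediately
assert calls == ["finalized"]
```

So even with a disciplined `del`, a stray reference held elsewhere (a cycle, a traceback, a cache) silently postpones the collective destructor, which is why an explicit `destroy` call is the more robust fix.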
- reporter: Yes, an explicit `destroy` is the safest thing one can do.
I am pointing out a usual programming pattern which fails in Python 3.5 and should be fine in 3.6. So I suggest abandoning Python 2 only once Python 3.6 is used for the FEniCS distribution channels.
- reporter - changed status to wontfix
Not an issue in Python 3.6.