Python3 non-deterministic destruction of class members

Issue #877 wontfix
Jan Blechta created an issue

The following code demonstrates that a class owning more than one object with a parallel (MPI-collective) destructor (here MUMPS solvers) leads to a non-deterministic destruction order and an eventual deadlock. This happens because members are owned through dictionaries, which have a random iteration order due to hash randomization in Python <= 3.5.

Here is a very important, and possibly widespread, pattern in user codes, which is at the heart of the problem:

        self.s0 = PETScLUSolver()
        self.s1 = PETScLUSolver()

(imagine any other objects with parallel destructors). Objects s0 and s1 might get destroyed in a different order on different ranks, which leads to a deadlock.
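For illustration, here is a minimal sketch of the mechanism, with no MPI and no FEniCS involved (the Member and Owner classes are made up for this example). On Python <= 3.5 the iteration order of the instance __dict__, and hence the order in which the members are released, can differ between interpreter runs (and thus between MPI ranks) unless PYTHONHASHSEED is fixed; on Python 3.6+ the order follows insertion order.

class Member(object):
    """Stand-in for an object with an MPI-collective destructor."""
    def __init__(self, name):
        self.name = name
    def __del__(self):
        # The real issue has an MPI-collective MUMPS destructor here;
        # printing is enough to observe the order.
        print("destroying", self.name)

class Owner(object):
    """Stand-in for the user class holding two such objects."""
    def __init__(self):
        self.s0 = Member("s0")
        self.s1 = Member("s1")

owner = Owner()
print(list(owner.__dict__))  # key order may vary across runs on Python <= 3.5
del owner                    # members are released following __dict__ order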

A possible solution to this problem is to distribute FEniCS with Python 3.6, where the problem should not happen (not tested) because dicts are iterated in insertion order.

To reproduce, run the following code with mpirun -n 2 python3 test_py3_gc.py. To work around the problem, run PYTHONHASHSEED=0 mpirun -n 2 python3 test_py3_gc.py.

from dolfin import *

# Bilinear and linear form
mesh = UnitSquareMesh(3, 3)
V = FunctionSpace(mesh, "P", 1)
u = TrialFunction(V)
v = TestFunction(V)
a = u*v*dx
L = v*dx

# Assemble A, x, b
A = PETScMatrix(mesh.mpi_comm())
assemble(a, tensor=A)
b = assemble(L)
u = Function(V)
x = u.vector()

# Make MUMPS talk to us
PETScOptions.set("-mat_mumps_icntl_4", 2)


class Foo(object):
    """Class owning two objects (MUMPS solvers) with parallel
    destruction semantics"""

    def __init__(self, A, x, b):
        """Create two MUMPS solvers and init them by solving"""
        self.s0 = PETScLUSolver(A.mpi_comm(), A, "mumps")
        self.s1 = PETScLUSolver(A.mpi_comm(), A, "mumps")
        self.s0.solve(x, b)
        self.s1.solve(x, b)

# Create an instance owning two objects and observe
# they are destroyed in different order across MPI
# ranks leading to deadlocks
foo = Foo(A, x, b)

Attach a debugger and check the stack trace to see that the deadlock occurred in garbage collection of MUMPS, when MUMPS tries to do something MPI-collective.
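As an alternative to attaching a debugger, one can print each rank's view of the attribute order just before shutdown (a hypothetical diagnostic appended to the end of test_py3_gc.py; it assumes mpi4py is available in the environment). If the ranks disagree on the order of s0 and s1, the deadlock in the collective MUMPS destructors follows; on a given run the orders may happen to coincide, in which case no deadlock occurs.

# Hypothetical diagnostic, appended after foo = Foo(A, x, b);
# assumes mpi4py is installed alongside DOLFIN.
from mpi4py import MPI as _MPI
print("rank", _MPI.COMM_WORLD.rank, "attribute order:", list(foo.__dict__))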

Comments (6)

  1. Prof Garth Wells

    Does an explicit del avoid the problem, or do we need an explicit destroy function that calls the destructor?

  2. Jan Blechta reporter

    Explicit del where? The order of destruction in the example above differs across MPI ranks. That causes a deadlock.

There is a need for an explicit destroy; that is another issue filed in the issue tracker.

  3. Prof Garth Wells

I meant using del to force destruction in an appropriate order rather than relying on garbage collection. I have a vague recollection that using del does not guarantee that the destructor is called at that point.

  4. Jan Blechta reporter

Yes, an explicit destroy is the safest thing one can do (a sketch of such an ordered teardown is below).

I am pointing out a common programming pattern which fails in Py 3.5 and should be fine in 3.6. So I suggest abandoning Py 2 only if Py 3.6 is already used by the FEniCS distribution channels.
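For reference, here is a sketch of the explicit-teardown approach discussed above, applied to the reproducer. The cleanup method is a hypothetical helper, not a DOLFIN API; in CPython it releases the solvers at the same point and in the same order on every rank, provided no other references to them exist.

class Foo(object):
    """Variant of the reproducer class with explicit, ordered teardown."""

    def __init__(self, A, x, b):
        self.s0 = PETScLUSolver(A.mpi_comm(), A, "mumps")
        self.s1 = PETScLUSolver(A.mpi_comm(), A, "mumps")
        self.s0.solve(x, b)
        self.s1.solve(x, b)

    def cleanup(self):
        # Drop the only references in a fixed order so the MPI-collective
        # MUMPS destructors run at the same point on all ranks.
        del self.s0
        del self.s1

foo = Foo(A, x, b)   # reuses A, x, b from the reproducer above
foo.cleanup()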
