PYTHONHASHSEED and random order of object destruction

Issue #73 wontfix
Former user created an issue

It is important to maintain a consistent PYTHONHASHSEED across the entire MPI application. The order in which Python destroys objects can depend on PYTHONHASHSEED.

If MPI calls are made during object destruction (e.g. freeing a communicator) and the destruction order differs between ranks, the code will crash or hang. Even if we avoid cyclic references (the gc also collects in an unpredictable order), this is the one place where the order of collective calls is out of the programmer's hands.

Minimal case to reproduce:

from mpi4py import MPI

class A:
    def __init__(self, comm, i):
        self.comm = comm
        self.i = i
    def __del__(self):
        # Collective calls in the destructor: every rank must reach them
        # for the same object, in the same order.
        alli = self.comm.allgather(self.i)
        if self.comm.rank == 0:
            print('deleting', alli)
        self.comm.barrier()

class B:
    def __init__(self, comm):
        # The destruction order of these two attributes depends on the hash seed.
        self.sample = A(comm, 0)
        self.that = A(comm, 1)

b = B(MPI.COMM_WORLD)
del b

On an SGI MPT system, the ranks disagree on the destruction order:

(3.5) PBS r327i0n14:~> mpiexec -n 4 python dead.py   
deleting [1, 0, 1, 0]
deleting [0, 1, 0, 1]

Setting PYTHONHASHSEED to a fixed value makes the order consistent across ranks:

(3.5) PBS r327i0n14:~> export PYTHONHASHSEED=2
(3.5) PBS r327i0n14:~> mpiexec -n 4 python dead.py 
deleting [0, 0, 0, 0]
deleting [1, 1, 1, 1]

Comments (13)

  1. Lisandro Dalcin

    Even if you had a way to change the hash seed at runtime, that would not prevent (in general, not just in simple examples) the gc from collecting objects in random order. This issue is even more problematic in a PyPy runtime. It is an inherent issue of using MPI with dynamic languages.

    There is nothing I can do from my side to fix/alleviate this issue. There are basically two approaches on the user's side to get things done:

    1. Implement a destroy() method in your classes that has to be called manually and collectively.
    2. Implement the context manager protocol (__enter__/__exit__) and use the with statement. This is the recommended, Pythonic way (a sketch follows below).
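
    A minimal sketch of approach 2, assuming a class that owns a duplicated communicator and frees it collectively on exit (the Solver name and its attributes are illustrative, not part of mpi4py):

        from mpi4py import MPI

        class Solver:
            def __init__(self, comm):
                self.comm = comm.Dup()    # acquired collectively by all ranks

            def __enter__(self):
                return self

            def __exit__(self, exc_type, exc, tb):
                # Free collectively, at a well-defined point in program order.
                self.comm.Free()
                return False              # do not suppress exceptions

        with Solver(MPI.COMM_WORLD) as s:
            s.comm.barrier()              # ... collective work with s.comm ...
        # the communicator is freed here, deterministically, on every rank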
  2. Feng Yu

    The two solutions you proposed are two facets of the same idea: bypass the life-cycle management of the hosting language and use a customized (with/destroy) one that is collective. It may call for rather pervasive use of the with statement -- requiring every collective object to be used as a context manager.

    It may be useful to have a CollectiveObject base class or a decorator to signal that collective objects carry a secondary life cycle... Have you seen examples of such base classes?

    PS: The issue goes beyond surface-level object destruction. The order in which objects held in a dictionary are destroyed is also affected by the seed, so we may need a CollectiveDictionary as well (a rough sketch is below).
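
    A rough sketch of what such a CollectiveDict might look like (purely illustrative; no such class exists in mpi4py or the standard library): a mapping whose entries are dropped in insertion order by an explicit method that every rank calls, so any collective calls made by the values' destructors line up across ranks.

        import collections

        class CollectiveDict(collections.OrderedDict):
            """Drop values in insertion order, identically on all ranks."""
            def collective_clear(self):
                # Must be called by every rank; entries are removed oldest-first,
                # so destructors (and their collective calls) run in the same
                # order everywhere.
                while self:
                    self.popitem(last=False)

    Note this only makes the dereferencing order deterministic; as the next comment explains, it cannot force the actual collection to happen at that point.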

  3. Lisandro Dalcin

    Destruction in the Python language is not deterministic by definition. Under some special circumstances CPython may offer some determinism; other runtimes like PyPy are even less deterministic. You can write a CollectiveDict to enforce that objects are dereferenced deterministically, but you cannot enforce the actual collection when the refcount drops to 0, because of a) possible reference cycles and b) deferred collection as in PyPy.

    IMHO, these kinds of experiments are beyond the scope of mpi4py. It is very easy to get things wrong, and I'm not willing to add something to the mpi4py core that I'm not 100% confident in, only to have trouble in the future because it does not meet the quality users expect.

    PS: I'm still struggling to find a way to define meaningful deallocation of MPI handles. For now, users are expected to call xxx.Free() manually on anything MPI they create.

  4. Feng Yu

    Without meddling with the hash seed, I think one can use an orderedobject base class -- I've just coded one up and put it on PyPI: https://pypi.python.org/pypi/orderedobject

    The case I am facing is the destruction of FFTW-MPI plans. My solver objects own a few FFTW plans, each wrapping an MPI communicator. I didn't have an issue with Open MPI or Cray, but I recently ported the code to an SGI and noticed that on MPT the plans must be freed collectively -- if not, the code hangs in mpi_comm_free.

    Unfortunately, I don't yet see a way to fit this use pattern into a context manager. If I create the plans inside a context manager, I have to replan every time I start a batch of FFTs -- which works against the spirit of having plans in the first place. The issue is that the underlying C library assumes the memory cycle and the resource life cycle are identical, but nondeterministic destruction means the destructor can only model the memory cycle -- and even that is under the assumption that the allocator supports releasing memory in arbitrary order, which may not be true with some libraries.

    The Python destructors are therefore mostly useless here -- so I think your current tactic of requiring explicit .Free() is probably the best way. (I may have missed a few Free() calls on MPI types... now that you've pointed this out.)

  5. Lisandro Dalcin

    Note however that most MPI handles should be safe to free from a collector. The exceptions are communicators (MPI_Comm), windows (MPI_Win), and files (MPI_File). Eventually, I would like to implement automatic collection for the other types and emit a warning for leaked comm/win/file handles. But I'm facing the problem of handling shared handles, i.e. two Python instances that internally reference the same MPI handle. Such handle sharing can occur either from Python code, e.g. comm = MPI.Comm(othercomm), or in extension modules using mpi4py's C API support, i.e. PyMPIComm_New(). Maybe I should implement my own refcounting of MPI handles, but I haven't made a final decision on the best approach.
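
    For illustration, a small sketch of the handle-sharing situation described above, using only documented mpi4py calls:

        from mpi4py import MPI

        dup = MPI.COMM_WORLD.Dup()          # a communicator this code owns
        alias = MPI.Comm(dup)               # a second Python object, same MPI_Comm

        assert dup.py2f() == alias.py2f()   # both wrap the same underlying handle

        dup.Free()                          # frees the handle; dup becomes COMM_NULL
        # 'alias' still holds the stale handle value, so freeing it again (e.g.
        # from an automatic collector) would be an error -- hence the need for
        # some form of handle refcounting before automatic collection is viable.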

  6. Lisandro Dalcin

    @rainwoodman You should not fill up the PyPI namespace with experiments. IMHO you should put your code in a repository, which is usually trivial to pip install by passing the URL. A public repo is a much better place for other folks to comment on your code, raise issues, etc.; otherwise your code ends up buried in a tarball, which is harder to find and look at.

  7. Lisandro Dalcin

    @rainwoodman Have you considered keeping track of your solver instances in a module-level list, implementing a solver._destroy() method, and then registering a function with atexit to traverse the list and destroy the solvers collectively and in order? At this point, I think you will agree that relying on __del__ methods for cleanup in MPI applications is quite hard to get right.
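
    A possible sketch of that suggestion (the Solver class and _destroy name are hypothetical), assuming solvers are created collectively so the registry holds the same order on every rank:

        import atexit
        from mpi4py import MPI

        _solvers = []                       # module-level registry

        class Solver:
            def __init__(self, comm):
                self.comm = comm.Dup()      # created collectively on all ranks
                _solvers.append(self)

            def _destroy(self):
                if self.comm != MPI.COMM_NULL:
                    self.comm.Free()        # collective

        @atexit.register
        def _teardown():
            # Destroy in reverse creation order, identically on every rank;
            # atexit handlers run while MPI is still initialized, since mpi4py
            # does not finalize MPI until interpreter shutdown.
            while _solvers:
                _solvers.pop()._destroy()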

  8. Feng Yu
    • orderedobject is also on GitHub; I use automatic deployment to publish it to PyPI. I think people will find orderedobject useful for other purposes as well.

    • I thought about a collector and really disliked it. I feel controlling refcounts is cleaner than controlling the GC. My sense is that a GC is the ultimate form of a collector -- if we want to keep the number of objects finite within the life cycle of the application -- and we already know a GC is nondeterministic.

    • I agree __del__ is hard to get right, but I think it is more intuitive, because on the programming side it only adds two simple, well-motivated rules for writing parallel applications: 1. Ensure collective operations occur in order (which MPI requires anyway); this means using ordereddict, orderedobject, and orderedclass (the last one is an example in the Python docs) in place of dict, object, and class. 2. Avoid cyclic references (a good design usually prefers this anyway).

    • The fundamental inconsistency is that modern languages encourage decoupling the resource life cycle from the memory allocation cycle, while the older languages MPI is built on do not. There must be a very good argument for decoupling them; I just haven't been able to fully appreciate it. To me, main memory is a resource too (and can error and fail), so why treat it so differently from other resources and add these complications? Whether it is worth adding a layer of collection on top of MPI to fake the decoupling may well hinge on the answer to this.

  9. Lisandro Dalcin

    Could you paste a link to the orderedobject repo? Also, if you have the time, it would be good if you raised all these issues and advertised orderedobject on mpi4py's mailing list on Google Groups.

  10. Feng Yu

    Here is the link. https://github.com/rainwoodman/orderedobject

    I will first need to start using and testing the solution in my downstream projects before advertising it to more people. But the demand for a solution is there.

    I wonder if we should get a group of MPI+Python people to sit together and work out what works and what doesn't. It may be that MPI needs to be improved (I remember fault tolerance is one big pain too), or it may be that Python needs to be more conservative -- at the least we should have a voice rather than just coping with what is out there. It might also shed some light on whether we need something beyond MPI these days, and how it should be shaped.

    The PyHPC meeting is in November, but I don't think they have slots for a BOF session -- it doesn't hurt to ask. What do you think?
