JIT file locking is probably buggy
This could be in either the FFC JIT or Instant. When running the DOLFIN tests with mpirun, we seem to get deadlocks on some tests, with all processes using 100% CPU forever. Making sure the tests are run in serial first seems to solve the problem. I may be wrong and the problem may be somewhere else, but the horrible implementation of file locking that we have is a likely culprit.
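(The locking code in question isn't quoted in this issue, so the following is only a sketch of the kind of fix being suggested, not Instant's actual implementation. The `jit_lock` name and the lock-file layout are hypothetical.) One pattern that avoids both busy-waiting at 100% CPU and deadlocks from crashed holders is an OS-level advisory lock via `fcntl.flock`, which sleeps in the kernel until the lock is free and is released automatically when the holding process exits:

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def jit_lock(cache_dir, name="jit.lock"):
    """Serialize JIT compilation across processes with an advisory lock.

    fcntl.flock(LOCK_EX) blocks in the kernel rather than polling, so
    waiting processes do not spin at 100% CPU, and the kernel drops the
    lock automatically if the holder dies mid-compile.
    """
    path = os.path.join(cache_dir, name)
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

This is POSIX-only (fine for the Linux buildbots discussed below); a lock-directory or `O_EXCL` scheme would be needed for portability.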
Comments (5)
reporter Yes, and I think blaming the jit file locking is incorrect here. Johannes pinned the precise-i386 failures down to a small mesh.xml that fails to get partitioned correctly, similar to a previous issue but in a different place:
https://bitbucket.org/fenics-project/dolfin/issue/476/not-all-processes-exit-when-creation-of
reporter changed status to invalid
Ok. On the mpich buildbot we could blame the two-year-old MPICH release, but what about the Py3 buildbot? Can you test with Py3 locally, Martin?
reporter removed milestone
Removing milestone: 1.6 (automated comment)
Are you referring to buildbot failures? They happen randomly on precise-i386, mpich and trusty-amd64-py3. The first two are timeouts, deadlocks or segfaults, but I wouldn't bother trying to fix those with such old MPI releases. The Py3 buildbot tends to get into infinite recursion (in different places in the plot demo) and stack overflow: http://fenicsproject.org:8010/builders/dolfin-master-full-trusty-amd64-py3/builds/665/steps/make%20run_regressiontests/logs/demo.log