Collective exception handling in unit tests

Issue #718 (status: new)
Reported by Jan Blechta

Running the following code with mpirun -n 3 python test_exchook.py, i.e. without pytest,

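# file: test_exchook.py
import pytest
from dolfin import *

r = MPI.rank(mpi_comm_world())
if r == 0:
    1/0  # only rank 0 raises (ZeroDivisionError)
MPI.barrier(mpi_comm_world())  # collective: the other ranks block here waiting for rank 0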

rank 0 will signal ABRT and mpirun will terminate all ranks.

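When run with pytest, i.e. mpirun -n 3 py.test test_exchook.py, this is not possible: the exception is caught and handled by pytest instead of terminating the process. This easily leads to a deadlock on a subsequent collective operation, because the failing rank skips collective calls that the other ranks are still waiting in.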
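The failure mode can be mimicked without MPI. The sketch below is a hypothetical simulation, not dolfin or pytest code: three Python threads stand in for three ranks, threading.Barrier stands in for MPI.barrier, and catching the ZeroDivisionError stands in for pytest's exception handling. A timeout is added only so that the simulated deadlock ends with an error instead of hanging forever.

```python
import threading

# Hypothetical stand-in: 3 threads play the role of 3 MPI ranks and a
# threading.Barrier plays the role of MPI.barrier. The timeout replaces
# the indefinite hang a real deadlock would cause.
NRANKS = 3
barrier = threading.Barrier(NRANKS, timeout=1.0)
outcome = {}


def run_rank(rank):
    try:
        if rank == 0:
            1 / 0                      # only rank 0 fails
        barrier.wait()                 # collective: needs every rank to arrive
        outcome[rank] = "passed barrier"
    except ZeroDivisionError:
        # Like pytest: the error is caught and reported, the "process" keeps running
        outcome[rank] = "error caught"
    except threading.BrokenBarrierError:
        # Raised once the timeout expires and breaks the barrier
        outcome[rank] = "stuck at barrier (timed out)"


threads = [threading.Thread(target=run_rank, args=(r,)) for r in range(NRANKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(outcome)
```

Rank 0 records its caught error while ranks 1 and 2 stay blocked in the barrier until the timeout breaks it; a real MPI.barrier has no timeout, so the job simply hangs.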

Comments (3)

  1. Jan Blechta (reporter)

    Possible resolution: something like

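    class CollectiveException(RuntimeError):
        pass

    def interruptable_barrier(comm=mpi_comm_world(), exc=None):
        # MPI.max is collective, so it also acts as the barrier
        if MPI.max(comm, int(bool(exc))):  # nonzero if any rank passed an exception
            raise exc or CollectiveException("other rank raised an exception")
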

    which would be used in place of the usual MPI.barrier in use_gc_barrier, and which would need to wrap the individual lines that are prone to fail (we could make it a context manager). Moreover, pytest would need to be instructed to call this hook when it handles an exception.
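The idea of a barrier that every rank reaches whether or not its own code failed can be sketched as a context manager. This is a hypothetical, thread-based simulation rather than dolfin code: threads stand in for MPI ranks, ThreadComm.max mimics the collective MPI.max reduction, and the names ThreadComm and collective_guard are illustrative only. The pytest hook part of the proposal is not covered.

```python
import contextlib
import threading


class CollectiveException(RuntimeError):
    """Raised on ranks whose own code did not fail."""


class ThreadComm:
    """Hypothetical stand-in for a communicator shared by threads."""

    def __init__(self, size):
        self._values = [0] * size
        self._barrier = threading.Barrier(size)

    def max(self, rank, value):
        # Collective max-reduction: every rank must call this
        self._values[rank] = value
        self._barrier.wait()          # wait until all ranks have contributed
        result = max(self._values)
        self._barrier.wait()          # wait until all ranks have read the result
        return result


@contextlib.contextmanager
def collective_guard(comm, rank):
    """Run a block that may fail; make every rank raise if any rank failed."""
    exc = None
    try:
        yield
    except Exception as e:
        exc = e
    # Reached on every rank, failed or not, so the reduction cannot deadlock
    if comm.max(rank, int(exc is not None)):
        if exc is not None:
            raise exc
        raise CollectiveException("other rank raised an exception")


NRANKS = 3
comm = ThreadComm(NRANKS)
outcome = {}


def run_rank(rank):
    try:
        with collective_guard(comm, rank):
            if rank == 0:
                1 / 0                 # only rank 0 fails
        outcome[rank] = "ok"
    except ZeroDivisionError:
        outcome[rank] = "own error"
    except CollectiveException:
        outcome[rank] = "collective error"


threads = [threading.Thread(target=run_rank, args=(r,)) for r in range(NRANKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(outcome)
```

Every rank now leaves the guarded block with an exception, rank 0 with its own ZeroDivisionError and the others with CollectiveException, so no rank is left waiting in a later collective call. With real MPI, the reduction would be MPI.max(comm, ...) as in the comment above.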
