Making a function space with non-matching cell type should raise an exception

Issue #833 resolved
Patrick Farrell created an issue

Consider the following code:

from dolfin import *

mesh = UnitIntervalMesh(2)
ele = FiniteElement("CG", triangle, 1) # wrong, should be interval
V = FunctionSpace(mesh, ele) # should raise an exception, not segfault

When I run it with a version of FEniCS compiled last week, I get

[pfarrell@saoirse:/tmp]$ python crash.py 
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[0]PETSC ERROR: to get more information on the crash.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
with errorcode 59.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

I think it should raise an exception rather than segfaulting.
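For illustration, the behaviour the report asks for might look like the following sketch. The classes here are hypothetical stand-ins, not DOLFIN's actual API; the point is simply that comparing the mesh's cell type with the element's cell type up front turns the crash into an ordinary Python exception:

```python
# Minimal sketch of the desired behaviour, using hypothetical stand-in
# classes (NOT DOLFIN's real classes): the function space constructor
# compares the mesh's cell type with the element's cell type and raises
# a Python exception instead of crashing later in native code.

class Mesh:
    def __init__(self, cell):
        self.cell = cell  # e.g. "interval" or "triangle"

class FiniteElement:
    def __init__(self, family, cell, degree):
        self.family, self.cell, self.degree = family, cell, degree

class FunctionSpace:
    def __init__(self, mesh, element):
        # The up-front compatibility check the issue asks for.
        if mesh.cell != element.cell:
            raise ValueError(
                "Element is defined on cell %r, but mesh consists of "
                "%r cells" % (element.cell, mesh.cell))
        self.mesh, self.element = mesh, element

mesh = Mesh("interval")
ele = FiniteElement("CG", "triangle", 1)  # wrong cell, as in the report
try:
    V = FunctionSpace(mesh, ele)
except ValueError as err:
    print(err)  # a clean error message instead of a segfault
```

The real fix would live wherever the mesh and element first meet (as discussed in the comments below, either on the DOLFIN or the UFL side), but the shape of the check is the same.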

Comments (29)

  1. Jan Blechta

    @martinal, could this be a UFL issue? There's a ufl.FunctionSpace construction hidden behind the call on the last line.

  2. Martin Sandve Alnæs

    I'm not saying it should; I'm saying this is not UFL but DOLFIN.

    Also, I don't get a SEGV:

    martinal:ffc/test/regression (martinal/topic-new-ufc-class-generation) $ mpirun -n 3 python crash.py 
    Building mesh (dist 0a)
    Process 0: *** Warning: Mesh is empty, unable to create entities of dimension 0.
    Process 0: *** Warning: Mesh is empty, unable to create connectivity 0 --> 1.
    Process 2: *** Warning: Mesh is empty, unable to create entities of dimension 0.
    Process 2: *** Warning: Mesh is empty, unable to create connectivity 0 --> 1.
    
    ===================================================================================
    =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
    =   PID 31709 RUNNING AT martinal-P15SM
    =   EXIT CODE: 134
    =   CLEANING UP REMAINING PROCESSES
    =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
    ===================================================================================
    YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
    This typically refers to a problem with your application.
    Please see the FAQ page for debugging suggestions
    

    which is the same message as with this code:

    from dolfin import *
    raise RuntimeError("")
    
  3. Chris Richardson

    In serial, I get:

    *** -------------------------------------------------------------------------
    *** DOLFIN encountered an error. If you are not able to resolve this issue
    *** using the information listed below, you can ask for help at
    ***
    ***     fenics-support@googlegroups.com
    ***
    *** Remember to include the error message listed below and, if possible,
    *** include a *minimal* running example to reproduce the error.
    ***
    *** -------------------------------------------------------------------------
    *** Error:   Unable to complete call to function build().
    *** Reason:  Assertion dofmap._ufc_dofmap->topological_dimension() == D failed.
    *** Where:   This error was encountered inside ../../dolfin/fem/DofMapBuilder.cpp (line 81).
    *** Process: 0
    *** 
    *** DOLFIN version: 2017.1.0.dev0
    *** Git changeset:  c7c843cf30c86e3b1042cf9f23b97f8b81480d11
    *** -------------------------------------------------------------------------
    

    which makes some kind of sense. In parallel it just calls abort(), but this suggests that the error can be picked up in DofMapBuilder.

  4. Martin Sandve Alnæs

    So:

    1) It's probably a segfault because the assertion is disabled in Patrick's build.

    2) dolfin's and/or ufl's FunctionSpace.__init__ should check for compatibility of the domain and the element.

  5. Chris Richardson

    I'll change the assert into an error. Maybe I can find the reason for the SEGV in parallel; possibly it is also due to the tiny mesh in the code above.

  6. Martin Sandve Alnæs

    My guess is that if the assert is removed in the release build, inconsistent dimensions will easily cause a segfault sooner or later.
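The effect Martin describes can be reproduced in pure Python: just as C++ assertions are compiled out when NDEBUG is defined in a release build, Python's assert statements are stripped under the -O flag, so the invalid state silently survives the check instead of failing early. A small self-contained demonstration:

```python
# Demonstrate how an "optimized" build disables assertions: the same
# snippet fails under a normal interpreter run but sails past the
# check when run with -O (Python's analogue of a release build).
import subprocess
import sys

snippet = "assert 1 == 2, 'dimension mismatch'; print('assertion skipped')"

# Normal run: the assert fires and the process exits with an error.
debug = subprocess.run([sys.executable, "-c", snippet],
                       capture_output=True, text=True)

# Optimized run (-O): asserts are stripped, execution continues.
release = subprocess.run([sys.executable, "-O", "-c", snippet],
                         capture_output=True, text=True)

print(debug.returncode != 0)                   # True: assertion fired
print("assertion skipped" in release.stdout)   # True: -O disabled it
```

This is why a disabled dolfin_assert turns a clean diagnostic into a crash further downstream.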

  7. Chris Richardson

    For some reason, dolfin_error() is not printing anything in parallel; it just calls abort()... I'll push a branch; maybe somebody else can take a look.

  8. Jan Blechta

    There's no reason to change the assertion in DofMapBuilder into an exception. This is internal library code (not strictly, since it is in the DOLFIN API, but barely any user calls it directly). The proper fix belongs in ufl.FunctionSpace.__init__.

  9. Jan Blechta

    dolfin_error() is not printing anything in parallel

    Are you running a Python or a C++ DOLFIN program? Python goes through here. Maybe having a short sleep there would be better.

  10. Chris Richardson

    Whether it is an error or an assertion is not very important; it is hardly a costly change, and it makes things safer. What I am more concerned about is why I don't see an error message with mpirun.

  11. Chris Richardson

    @blechta - yes, with Python. Or use an MPI barrier, perhaps? Maybe not; it is a difficult problem... Actually, it even fails to produce an error message for me with mpirun -n 1.

  12. Jan Blechta

    I don't see an error message with mpirun

    How do I reproduce it? For me, Patrick's code prints a traceback and an exception (coming from the assertion error), plus Aborted (core dumped), when run sequentially. With mpirun -n 3 it segfaults; the PETSc error handler catches SIGSEGV and calls MPI_Abort, which results in the message you're seeing. Maybe try

    from dolfin import *
    SubSystemsManager.init_petsc()
    from petsc4py import PETSc
    PETSc.Sys.pushErrorHandler("traceback")
    
    mesh = UnitIntervalMesh(2)
    ele = FiniteElement("CG", triangle, 1) # wrong, should be interval
    V = FunctionSpace(mesh, ele) # should raise an exception, not segfault
    

    to check it isn't just petsc4py, with its annoying error handler, hiding messages from PETSc.

  13. Chris Richardson

    It's not petsc4py - I am not using it. I guess there may be some variation between MPI implementations, which could explain the different behaviour. First, note that the mesh is too small to use in parallel, so increase it to, say, 20. I then put a print statement inside the clause with dolfin_error(): the print (cout) statement works, but dolfin_error() does not print.

  14. Jan Blechta

    Or use mpi barrier perhaps?

    Absolutely not.

    Let me investigate whether there's something new in exception hooks in Py3 that I missed. We push our own hook to sys.excepthook, and I was only tuning it on Py2 some time ago.
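For readers unfamiliar with the mechanism Jan mentions: a library can replace sys.excepthook to report uncaught exceptions before the MPI job is torn down. The sketch below is an illustrative stand-in, not DOLFIN's actual hook; the key point is printing and flushing before any abort-style exit, since output lost in that window is one way parallel error messages vanish.

```python
# Illustrative stand-in for a custom exception hook (NOT DOLFIN's
# actual hook): print the traceback and flush stderr before any
# abort-style exit, so the message is not lost when the process is
# killed immediately afterwards.
import sys
import traceback

def mpi_aware_hook(exc_type, exc_value, exc_tb):
    traceback.print_exception(exc_type, exc_value, exc_tb)
    sys.stderr.flush()  # flush BEFORE any MPI_Abort-style exit
    # A real MPI-aware hook would call MPI_Abort here.

previous_hook = sys.excepthook
sys.excepthook = mpi_aware_hook
print(sys.excepthook is mpi_aware_hook)  # True: custom hook installed
sys.excepthook = previous_hook           # restore the default
```

If the hook (or a handler further down, such as PETSc's) aborts the process before stdio is flushed, the traceback never reaches the terminal, which matches the symptom Chris reports.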

  15. Jan Blechta

    Fix is in master.

    Sorry for flooding the original issue. What do you suggest about that, @martinal? Can you fix it on the UFL side?
