Making a function space with non-matching cell type should raise an exception
Consider the following code:
from dolfin import *
mesh = UnitIntervalMesh(2)
ele = FiniteElement("CG", triangle, 1) # wrong, should be interval
V = FunctionSpace(mesh, ele) # should raise an exception, not segfault
When I run it with a version of FEniCS compiled last week, I get
[pfarrell@saoirse:/tmp]$ python crash.py
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 59.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
I think it should raise an exception rather than segfaulting.
Comments (29)
-
Isn't this just how dolfin behaves on exceptions?
-
@martinal - Python user code should not cause a SEGV!
-
I'm not saying it should, I'm saying this is not UFL but dolfin.
Also I don't get a segv:
martinal:ffc/test/regression (martinal/topic-new-ufc-class-generation) $ mpirun -n 3 python crash.py
Building mesh (dist 0a)
Process 0: *** Warning: Mesh is empty, unable to create entities of dimension 0.
Process 0: *** Warning: Mesh is empty, unable to create connectivity 0 --> 1.
Process 2: *** Warning: Mesh is empty, unable to create entities of dimension 0.
Process 2: *** Warning: Mesh is empty, unable to create connectivity 0 --> 1.
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 31709 RUNNING AT martinal-P15SM
=   EXIT CODE: 134
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
which is the same message as with this code:
from dolfin import *
raise RuntimeError("")
-
In serial, I get:
*** -------------------------------------------------------------------------
*** DOLFIN encountered an error. If you are not able to resolve this issue
*** using the information listed below, you can ask for help at
***
***     fenics-support@googlegroups.com
***
*** Remember to include the error message listed below and, if possible,
*** include a *minimal* running example to reproduce the error.
***
*** -------------------------------------------------------------------------
*** Error:   Unable to complete call to function build().
*** Reason:  Assertion dofmap._ufc_dofmap->topological_dimension() == D failed.
*** Where:   This error was encountered inside ../../dolfin/fem/DofMapBuilder.cpp (line 81).
*** Process: 0
***
*** DOLFIN version: 2017.1.0.dev0
*** Git changeset:  c7c843cf30c86e3b1042cf9f23b97f8b81480d11
*** -------------------------------------------------------------------------
which makes some kind of sense. In parallel, it does abort() - but it suggests that the error can be picked up in DofMapBuilder.
-
I guess this dolfin_assert() should be a dolfin_error()
-
So
1) It's probably a segfault because the assertion is disabled in Patrick's build
2) dolfin and/or ufl.FunctionSpace.__init__ should check for compatibility of domain and element
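A sketch of the kind of check meant in 2), using hypothetical names, not the actual UFL or DOLFIN code: compare the cell of the element with the cell of the mesh up front, and raise an ordinary Python exception before any dofmap is built.

```python
class CellMismatchError(ValueError):
    """Raised when an element's cell does not match the mesh's cell."""

def check_cell_compatibility(mesh_cell, element_cell):
    # Compare cell names ("interval", "triangle", ...) before construction,
    # so the user gets a Python exception instead of a crash in C++ code.
    if mesh_cell != element_cell:
        raise CellMismatchError(
            "Element is defined on cell %r but the mesh uses cell %r"
            % (element_cell, mesh_cell))

# Mirrors the original report: interval mesh, triangle element.
try:
    check_cell_compatibility("interval", "triangle")
except CellMismatchError as err:
    print("caught:", err)
```

With such a guard in FunctionSpace construction, the reporter's script would fail with a readable message on the `FunctionSpace(mesh, ele)` line in serial and parallel alike.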
-
I'll fix the assert->error. Maybe I can find the reason for SEGV in parallel. Possibly also due to tiny mesh in code above.
-
My guess is that if the assert is removed in the release build, inconsistent dimensions will easily cause a segfault sooner or later.
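By analogy (this is plain Python, not DOLFIN code): the same disappearing-check behaviour can be reproduced with Python's assert, which is stripped under python -O just as dolfin_assert is compiled out of a release build, letting execution continue past the broken invariant.

```python
import subprocess
import sys

# One-line child program: an assertion guarding the rest of the run.
snippet = "assert False, 'dimension mismatch'; print('kept going')"

# Default ("debug") run: the assertion fires and stops the program.
debug = subprocess.run([sys.executable, "-c", snippet],
                       capture_output=True, text=True)

# Optimized ("release") run: -O strips the assert, so execution continues
# past the broken invariant -- the analogue of segfaulting later.
release = subprocess.run([sys.executable, "-O", "-c", snippet],
                         capture_output=True, text=True)

print(debug.returncode != 0)           # True: assertion aborted the run
print("kept going" in release.stdout)  # True: the check was removed entirely
```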
-
For some reason, dolfin_error() is not printing anything in parallel, just calls abort()... I'll push a branch, maybe somebody else can take a look.
-
There's no reason to change the assertion error in DofMapBuilder into an exception. This is internal library code (not strictly, it is in the DOLFIN API, but barely any user is using it). The proper fix should be in ufl.FunctionSpace.__init__.
-
dolfin_error() is not printing anything in parallel
Are you running a Python or a C++ DOLFIN program? Python goes through here. Maybe having a short sleep there would be better.
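For reference, a minimal sketch of what such a hook has to do; this is my assumption about the mechanism, not DOLFIN's actual hook: print the traceback, flush explicitly (abort() bypasses normal interpreter shutdown, so buffered output is otherwise lost), and only then bring down the other ranks.

```python
import sys
import traceback

def mpi_excepthook(exc_type, exc_value, exc_tb):
    # Print the traceback ourselves; the default hook never runs once we
    # hand control to MPI_Abort.
    traceback.print_exception(exc_type, exc_value, exc_tb)
    # Flush explicitly: abort() skips interpreter shutdown, so anything
    # still sitting in a buffer would be lost.
    sys.stderr.flush()
    sys.stdout.flush()
    # In a real MPI run one would now call MPI.COMM_WORLD.Abort(1)
    # (mpi4py); omitted here so the sketch runs without MPI.

sys.excepthook = mpi_excepthook
```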
-
It's not very important whether or not it is an error or an assertion, it is hardly a costly change, and this makes things safer. What I am more concerned about is why I don't see an error message with mpirun.
-
@blechta - yes, with Python. Or use mpi barrier perhaps? Maybe not. It is a difficult problem... actually it even fails to produce an error message for me if I use mpirun -n 1.
-
I don't see an error message with mpirun
How do I reproduce it? Patrick's code prints a traceback and exception (coming from the assertion error) for me, and Aborted (core dumped) when run sequentially. With mpirun -n 3 it segfaults, the PETSc error handler catches SIGSEGV and calls MPI_Abort, which results in the message you're seeing. Maybe try

from dolfin import *
SubSystemsManager.init_petsc()
from petsc4py import PETSc
PETSc.Sys.pushErrorHandler("traceback")
mesh = UnitIntervalMesh(2)
ele = FiniteElement("CG", triangle, 1) # wrong, should be interval
V = FunctionSpace(mesh, ele) # should raise an exception, not segfault

to check it isn't just petsc4py with its annoying error handler hiding messages from PETSc.
-
It's not petsc4py - I am not using it. I guess there may be some variation in MPI implementations, so that could be the reason for different behaviour. First, note that the mesh is too small to use in parallel, so increase it to say 20. I then put a print statement inside the clause with dolfin_error(). The print (cout) statement works, but the dolfin_error() does not print.
-
-
Or use mpi barrier perhaps?
Absolutely not.
Let me investigate if there's something new in exception hooks in Py3 which I missed. We push our own hook to sys.excepthook and I was only tuning it on Py2 some time ago.
-
Can you confirm that Py2 behaviour is fine?
-
Yes, it is python3 related. python2 works fine.
-
Seems that it's buffering. Can you confirm that python3 -u works fine?
-
Yes, correct, python3 -u works.
-
Fix is in master.
Sorry for flooding the original issue. What do you suggest about that @martinal? Can you fix on the UFL side?
-
I don't have time.
-
- changed status to invalid
Let's solve this on UFL level: https://bitbucket.org/fenics-project/ufl/issues/94.
-
- changed status to resolved
Fix issue 833
→ <<cset 46a5e7711883>>
-
- changed status to open
Also needs a tweak in DOLFIN: 46a5e77
-
- changed status to resolved
Raise on cell mismatch in function space construction
Fixes issue #833. Thanks Patrick Farrell for reporting.
→ <<cset a02231f6693f>>
-
@martinal, could it be an UFL issue? There's ufl.FunctionSpace construction hidden behind a call on the last line.