Intermittent failures with multiple threads

Issue #326 resolved
James R. Maddison created an issue

Intermittent failures can be encountered when using multiple threads.

Example:

from dolfin import *
parameters["num_threads"] = 2
mesh = UnitSquareMesh(10, 10)
space = FunctionSpace(mesh, "CG", 1)
F = Function(space)
L = inner(TestFunction(space), F) * dx
solve(L == 0, F)

Example terminal output:

No Jacobian form specified for nonlinear variational problem.
Differentiating residual form F to obtain Jacobian J = F'.
Solving nonlinear variational problem.
  *** Warning: Form::coloring does not properly consider form type.
  Coloring mesh.
Segmentation fault (core dumped)

Comments (14)

  1. Chris Richardson

    Confirmed. I also have this behaviour. Running with gdb gives the following backtrace:

    Program received signal SIGABRT, Aborted.
    0x00007ffff6f08037 in raise () from /lib/x86_64-linux-gnu/libc.so.6
    (gdb) bt
    #0  0x00007ffff6f08037 in raise () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x00007ffff6f0b698 in abort () from /lib/x86_64-linux-gnu/libc.so.6
    #2  0x00007ffff6f455ab in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #3  0x00007ffff6f51a46 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #4  0x00007fffe9250f1d in PetscStackDestroy () at /home/chris/code/FEniCS/src/petsc-3.4.3/src/sys/error/pstack.c:151
    #5  0x00007fffe92eaff0 in PetscFinalize () at /home/chris/code/FEniCS/src/petsc-3.4.3/src/sys/objects/pinit.c:1015
    #6  0x00007fffebf28285 in dolfin::SubSystemsManager::finalize_petsc () at /home/chris/code/FEniCS/src/dolfin/dolfin/common/SubSystemsManager.cpp:274
    #7  0x00007fffebf28299 in dolfin::SubSystemsManager::finalize () at /home/chris/code/FEniCS/src/dolfin/dolfin/common/SubSystemsManager.cpp:218
    #8  0x00007ffff6f0d121 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #9  0x00007ffff6f0d1a5 in exit () from /lib/x86_64-linux-gnu/libc.so.6
    #10 0x00007ffff6ef2eac in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
    #11 0x00000000004e1115 in _start ()
    
  2. Prof Garth Wells

    @chris_richardson Is this using OpenMPI from Ubuntu? I think the package does not enable MPI with threads.
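
    (As an aside, a quick way to check what thread support the MPI library actually provides is to request MPI_THREAD_MULTIPLE and inspect the level returned. The small standalone program below is only a sketch and is independent of DOLFIN.)

    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char* argv[])
    {
      // Request full thread support and report what the library grants.
      int provided = 0;
      MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
      std::printf("requested MPI_THREAD_MULTIPLE (%d), provided level %d\n",
                  MPI_THREAD_MULTIPLE, provided);
      MPI_Finalize();
      return 0;
    }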

  3. Chris Richardson

    @garth-wells Yes, I think so. But, if not running in parallel, should that matter? I can try on another system with a different MPI, or recompiling without MPI, if you think that will help.

  4. Lawrence Mitchell

    I believe the problem may be that PETSc's internal stack management is not thread-safe unless PETSc is configured with --with-pthreadclasses. If there are calls to (say) MatSetValues inside an OpenMP region, this may be the cause.
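
    (To illustrate the pattern described here: the sketch below is not DOLFIN's assembler, just the general shape of several OpenMP threads calling MatSetValues on the same Mat. Even if the rows written by different threads never overlap, PETSc's internal stack/error bookkeeping can still race without that configure option. The helper name and arguments are hypothetical.)

    #include <petscmat.h>

    // Hypothetical helper, for illustration only: insert one n x n cell
    // tensor per cell from an OpenMP parallel loop.
    void insert_cell_tensors(Mat A, PetscInt ncells, PetscInt n,
                             const PetscInt* rows, const PetscInt* cols,
                             const PetscScalar* Ae)
    {
      #pragma omp parallel for
      for (PetscInt c = 0; c < ncells; ++c)
      {
        // The concurrent MatSetValues calls are the potentially unsafe part.
        MatSetValues(A, n, &rows[c*n], n, &cols[c*n], &Ae[c*n*n], ADD_VALUES);
      }
    }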

  5. Prof Garth Wells

    @chris_richardson It shouldn't matter, but I have recently seen a system that crashes when MPI threads are enabled and DOLFIN is run using Python and without mpirun.

    Threaded assembly crashes for me when PETSc is compiled in debug mode. PETSc is not officially thread-safe, but we have an ugly workaround with colouring and precomputation of the sparsity pattern. We should try the option suggested by @wence, which I think is relatively new.
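
    (A conceptual sketch of the colouring workaround mentioned above, not the actual DOLFIN implementation: cells of one colour share no dofs, so threads inserting within a single colour never write to the same matrix rows. The actual tabulation and insertion are elided.)

    #include <cstddef>
    #include <vector>

    void coloured_assembly(const std::vector<std::vector<std::size_t> >& colours)
    {
      // Colours are processed one after another; parallelism is only within
      // a colour, where insertions cannot collide by construction.
      for (std::size_t colour = 0; colour < colours.size(); ++colour)
      {
        const std::vector<std::size_t>& cells = colours[colour];
        #pragma omp parallel for
        for (long i = 0; i < (long) cells.size(); ++i)
        {
          // ... tabulate the cell tensor for cells[i] and insert it ...
          // (PETSc's debug-mode stack tracking can still be an issue here.)
        }
      }
    }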

  6. Chris Richardson

    @garth-wells @wence Is there a way to check whether PETSc has been compiled with that option, and disable threaded assembly if not?

  7. Lawrence Mitchell

    #include <petscconf.h>
    
    #if !(defined(PETSC_HAVE_PTHREADCLASSES) || defined(PETSC_HAVE_OPENMP))
    disable_threaded_assembly()
    #endif
    

    I think.

  8. Jan Blechta

    Or maybe

    #include <petscconf.h>
    
    #if !(defined(PETSC_HAVE_PTHREADCLASSES) || defined(PETSC_HAVE_OPENMP))
    disable_threaded_assembly_to_PETSc_tensors()
    keep_enabled_threaded_assembly_to_uBLAS_tensors()
    keep_enabled_threaded_assembly_to_scalars()
    #endif
    