Segfault in MatDestroy in harmonic smoothing test

Issue #395 invalid
Martin Sandve Alnæs created an issue

I'm trying to trigger parallel errors by running the different subsets of the unit tests with mpirun -n 2, 3 and 4, and I got a segfault on one of the processes with -n 2.

The stack traces below show that one process is in the destructor of LinearSolver while the other is still in LinearSolver::solve.

This should be investigated. I'll report back on whether I can reproduce it.
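
The sweep over process counts can be scripted. This is only a minimal sketch: the helper names `mpi_pytest_command` and `sweep` are made up for illustration, the command line mirrors the `mpirun -n 2 python -B -m pytest ...` invocation used in this report, and it assumes `mpirun` signals a failed or crashed rank through a nonzero exit code.

```python
import subprocess


def mpi_pytest_command(nprocs, test_path):
    """Build the mpirun/pytest command line for one process count (hypothetical helper)."""
    return ["mpirun", "-n", str(nprocs), "python", "-B", "-m", "pytest", test_path]


def sweep(test_paths, proc_counts=(2, 3, 4), runner=subprocess.call):
    """Run every test path under every process count and collect the failures.

    `runner` takes a command list and returns an exit code; it defaults to
    subprocess.call and can be swapped for a stub when no MPI is installed.
    """
    failures = []
    for test_path in test_paths:
        for nprocs in proc_counts:
            # Assumption: any nonzero exit code means at least one rank failed.
            if runner(mpi_pytest_command(nprocs, test_path)) != 0:
                failures.append((nprocs, test_path))
    return failures
```

Run from `test/unit/python`, a call such as `sweep(["ale/test_harmonic_smoothing.py"])` would return the `(nprocs, test_path)` pairs that did not exit cleanly.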

Process A (segfaulted):

(gdb) where
#0  0x00007fffeef8e34b in MatDestroy ()
   from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#1  0x00007fffef1e7b35 in PCReset_ML(_p_PC*) ()
   from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#2  0x00007fffef0eedfd in PCReset ()
   from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#3  0x00007fffef0eefe8 in PCDestroy ()
   from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#4  0x00007fffef187ac3 in KSPDestroy ()
   from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#5  0x00007ffff037a976 in dolfin::PETScKrylovSolver::~PETScKrylovSolver (
    this=0x1ca2250, __in_chrg=<optimized out>)
    at ../../dolfin/la/PETScKrylovSolver.cpp:189
#6  0x00007ffff037abd9 in dolfin::PETScKrylovSolver::~PETScKrylovSolver (
    this=0x1ca2250, __in_chrg=<optimized out>)
    at ../../dolfin/la/PETScKrylovSolver.cpp:192
#7  0x00007ffff038863a in _M_release (this=0x20e0520)
    at /usr/include/c++/4.8/bits/shared_ptr_base.h:144
#8  ~__shared_count (this=0x1c9d7a8, __in_chrg=<optimized out>)
    at /usr/include/c++/4.8/bits/shared_ptr_base.h:546
#9  ~__shared_ptr (this=0x1c9d7a0, __in_chrg=<optimized out>)
    at /usr/include/c++/4.8/bits/shared_ptr_base.h:781
#10 ~shared_ptr (this=0x1c9d7a0, __in_chrg=<optimized out>)
    at /usr/include/c++/4.8/bits/shared_ptr.h:93
#11 dolfin::KrylovSolver::~KrylovSolver (this=0x1c9d710, 
    __in_chrg=<optimized out>) at ../../dolfin/la/KrylovSolver.cpp:107
#12 0x00007ffff0388689 in dolfin::KrylovSolver::~KrylovSolver (this=0x1c9d710, 
    __in_chrg=<optimized out>) at ../../dolfin/la/KrylovSolver.cpp:110
#13 0x00007ffff038671c in operator() (this=<optimized out>, 
    __ptr=<optimized out>) at /usr/include/c++/4.8/bits/unique_ptr.h:67
#14 ~unique_ptr (this=0x7fffffff8830, __in_chrg=<optimized out>)
    at /usr/include/c++/4.8/bits/unique_ptr.h:184

#15 dolfin::LinearSolver::~LinearSolver (this=0x7fffffff87a0, 
    __in_chrg=<optimized out>) at ../../dolfin/la/LinearSolver.cpp:124
#16 0x00007ffff038d72c in dolfin::solve (A=..., x=..., b=..., method=..., 
    preconditioner=...) at ../../dolfin/la/solve.cpp:47

#17 0x00007ffff058979f in dolfin::HarmonicSmoothing::move (mesh=..., 
    new_boundary=...) at ../../dolfin/ale/HarmonicSmoothing.cpp:170
#18 0x00007ffff058ce49 in dolfin::ALE::move (mesh=..., new_boundary=...)
    at ../../dolfin/ale/ALE.cpp:36
#19 0x00007ffff0493059 in dolfin::Mesh::move (this=this@entry=0x1dae700, 
    boundary=...) at ../../dolfin/mesh/Mesh.cpp:317
#20 0x00007fffd0d929fc in _wrap_Mesh_move__SWIG_0 (swig_obj=0x7fffffff8de0, 
    nobjs=2) at modulePYTHON_wrap.cxx:21864
#21 _wrap_Mesh_move (self=<optimized out>, args=<optimized out>)

Process B:

(gdb) where
#0  0x00007fffee87b9b4 in opal_progress () from /usr/lib/libmpi.so.1
#1  0x00007fffee7c91f5 in ompi_request_default_wait_all ()
   from /usr/lib/libmpi.so.1
#2  0x00007fffcab3f8bf in ompi_coll_tuned_allreduce_intra_recursivedoubling ()
   from /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so
#3  0x00007fffee7d5775 in PMPI_Allreduce () from /usr/lib/libmpi.so.1
#4  0x00007fffe7a6be88 in ML_gsum_scalar_int ()
   from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libml.so.11
#5  0x00007fffe7a6d155 in ML_build_global_numbering ()
   from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libml.so.11
#6  0x00007fffef1eabc4 in MatWrapML_MPIAIJ(ML_Operator_Struct*, MatReuse, _p_Mat**) () from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#7  0x00007fffef1eeeb8 in PCSetUp_ML(_p_PC*) ()
   from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#8  0x00007fffef0efcfd in PCSetUp ()
   from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#9  0x00007fffef183c31 in KSPSetUp ()
   from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#10 0x00007fffef184426 in KSPSolve ()
   from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#11 0x00007ffff037f395 in dolfin::PETScKrylovSolver::solve (
    this=this@entry=0x1edee30, x=..., b=...)
    at ../../dolfin/la/PETScKrylovSolver.cpp:419
#12 0x00007ffff0381400 in dolfin::PETScKrylovSolver::solve (this=0x1edee30, A=
    ..., x=..., b=...) at ../../dolfin/la/PETScKrylovSolver.cpp:473
#13 0x00007ffff0388a51 in dolfin::KrylovSolver::solve (this=0x1b6cc80, A=..., 
    x=..., b=...) at ../../dolfin/la/KrylovSolver.cpp:153

#14 0x00007ffff03869d9 in dolfin::LinearSolver::solve (
    this=this@entry=0x7fffffff87a0, A=..., x=..., b=...)
    at ../../dolfin/la/LinearSolver.cpp:152
#15 0x00007ffff038d600 in dolfin::solve (A=..., x=..., b=..., method=..., 
    preconditioner=...) at ../../dolfin/la/solve.cpp:48

#16 0x00007ffff058979f in dolfin::HarmonicSmoothing::move (mesh=..., 
    new_boundary=...) at ../../dolfin/ale/HarmonicSmoothing.cpp:170
#17 0x00007ffff058ce49 in dolfin::ALE::move (mesh=..., new_boundary=...)
    at ../../dolfin/ale/ALE.cpp:36
#18 0x00007ffff0493059 in dolfin::Mesh::move (this=this@entry=0x1e83910, 
    boundary=...) at ../../dolfin/mesh/Mesh.cpp:317
#19 0x00007fffd0d929fc in _wrap_Mesh_move__SWIG_0 (swig_obj=0x7fffffff8de0, 
    nobjs=2) at modulePYTHON_wrap.cxx:21864
#20 _wrap_Mesh_move (self=<optimized out>, args=<optimized out>)
    at modulePYTHON_wrap.cxx:22029

Comments (5)

  1. Martin Sandve Alnæs reporter

    This is consistently reproducible:

    cd test/unit/python
    mpirun -n 2 python -B -m pytest ale/test_harmonic_smoothing.py
    
    ____________________________ test_HarmonicSmoothing ____________________________
    
        def test_HarmonicSmoothing():
            #print("Testing HarmonicSmoothing::move(Mesh& mesh, "const BoundaryMesh& new_boundary)")
    
            # Create some mesh and its boundary
            mesh = UnitSquareMesh(10, 10)
            boundary = BoundaryMesh(mesh, 'exterior')
    
            # Move boundary
            disp = Expression(("0.3*x[0]*x[1]", "0.5*(1.0-x[1])"))
            boundary.move(disp)
    
            # Move mesh according to given boundary
    >       mesh.move(boundary)
    E       RuntimeError: 
    E       
    E       *** -------------------------------------------------------------------------
    E       *** DOLFIN encountered an error. If you are not able to resolve this issue
    E       *** using the information listed below, you can ask for help at
    E       ***
    E       ***     fenics@fenicsproject.org
    E       ***
    E       *** Remember to include the error message listed below and, if possible,
    E       *** include a *minimal* running example to reproduce the error.
    E       ***
    E       *** -------------------------------------------------------------------------
    E       *** Error:   Unable to successfully call PETSc function 'KSPSolve'.
    E       *** Reason:  PETSc error code is: 63.
    E       *** Where:   This error was encountered inside ../../dolfin/la/PETScKrylovSolver.cpp.
    E       *** Process: unknown
    E       *** 
    E       *** DOLFIN version: 1.4.0+
    E       *** Git changeset:  252f5c1f7703a9ae67fd614773533caf6169d593
    E       *** --------------------------------------------------------------------
    
  2. Martin Sandve Alnæs reporter

    Still reproducible after the latest fix in next. Here's more relevant output.

    ale/test_harmonic_smoothing.py:41: RuntimeError
    ------------------------------- Captured stderr --------------------------------
    [1]PETSC ERROR: --------------------- Error Message ------------------------------------
    [1]PETSC ERROR: Argument out of range!
    [1]PETSC ERROR: nnz cannot be greater than row length: local row 0 value 3 rowlength 2!
    [1]PETSC ERROR: ------------------------------------------------------------------------
    [1]PETSC ERROR: Petsc Release Version 3.4.2, Jul, 02, 2013 
    [1]PETSC ERROR: See docs/changes/index.html for recent updates.
    [1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
    [1]PETSC ERROR: See docs/index.html for manual pages.
    [1]PETSC ERROR: ------------------------------------------------------------------------
    [1]PETSC ERROR: Unknown Name on a linux-gnu-cxx-opt named martinal-mc by martinal Wed Oct 22 11:04:52 2014
    [1]PETSC ERROR: Libraries linked from /home/martinal/opt/fenics/dorsal-dev-1410/lib
    [1]PETSC ERROR: Configure run at Tue Oct 21 10:51:41 2014
    [1]PETSC ERROR: Configure options --prefix=/home/martinal/opt/fenics/dorsal-dev-1410 COPTFLAGS=-O2 --with-debugging=0 --with-shared-libraries=1 --with-clanguage=cxx --with-c-support=1 --download-umfpack=1 --download-hypre=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-ptscotch=1 --download-scotch=1 --download-metis=1 --download-parmetis=1 --with-ml=1 --with-ml-lib=/home/martinal/opt/fenics/dorsal-dev-1410/lib/libml.so --with-ml-include=/home/martinal/opt/fenics/dorsal-dev-1410/include/trilinos
    [1]PETSC ERROR: ------------------------------------------------------------------------
    [1]PETSC ERROR: MatSeqAIJSetPreallocation_SeqAIJ() line 3524 in ../src/mat/impls/aij/seq/aij.c
    [1]PETSC ERROR: MatSeqAIJSetPreallocation() line 3496 in ../src/mat/impls/aij/seq/aij.c
    [1]PETSC ERROR: MatMPIAIJSetPreallocation_MPIAIJ() line 3307 in ../src/mat/impls/aij/mpi/mpiaij.c
    [1]PETSC ERROR: MatMPIAIJSetPreallocation() line 4015 in ../src/mat/impls/aij/mpi/mpiaij.c
    [1]PETSC ERROR: MatWrapML_MPIAIJ() line 426 in ../src/ksp/pc/impls/ml/ml.c
    [1]PETSC ERROR: PCSetUp_ML() line 912 in ../src/ksp/pc/impls/ml/ml.c
    [1]PETSC ERROR: PCSetUp() line 890 in ../src/ksp/pc/interface/precon.c
    [1]PETSC ERROR: KSPSetUp() line 278 in ../src/ksp/ksp/interface/itfunc.c
    [1]PETSC ERROR: KSPSolve() line 399 in ../src/ksp/ksp/interface/itfunc.c
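
As far as I can tell, the "nnz cannot be greater than row length" line means that the preallocation step was asked to reserve 3 nonzeros in local row 0 of a block that has only 2 columns. The snippet below is a plain-Python illustration of that kind of consistency check, not PETSc's code; `check_preallocation` and its arguments are hypothetical names.

```python
def check_preallocation(nnz, ncols):
    """Raise ValueError if any row requests more nonzeros than the block has columns.

    nnz   -- requested number of nonzeros per local row
    ncols -- number of columns in the local block (the "rowlength" in the message)
    """
    for row, count in enumerate(nnz):
        if count > ncols:
            raise ValueError(
                "nnz cannot be greater than row length: "
                "local row %d value %d rowlength %d" % (row, count, ncols)
            )
```

Calling `check_preallocation([3, 1], 2)` raises with the same row, value and row length as in the log above, whereas `check_preallocation([2, 1], 2)` passes silently.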
    
  3. Prof Garth Wells

    Can you try with the latest PETSc, and/or change the preconditioner to something other than ML? There was a bug in PETSc related to nnz and ML.
