Segfault in MatDestroy in harmonic smoothing test
Issue #395
invalid
I was trying to trigger parallel errors by running different subsets of the unit tests with mpirun -n 2, 3 and 4, and got a segfault on one of the processes with -n 2.
The stack traces below show that one process is in the destructor of LinearSolver while the other is still in LinearSolver::solve.
This should be investigated. I'll report back on whether I can reproduce it.
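For anyone wanting to repeat the sweep over process counts, here is a minimal sketch of how the runs could be scripted. It is an illustration, not the exact invocation used for this report: the helper names `mpirun_commands` and `run_all` are hypothetical, and `py.test` as the test runner is an assumption based on the captured-stderr output format.

```python
import subprocess


def mpirun_commands(test_path, process_counts=(2, 3, 4), runner=("py.test",)):
    """Build one mpirun command line per process count.

    The runner tuple is an assumption; substitute whatever launches the tests.
    """
    return [["mpirun", "-n", str(n), *runner, test_path] for n in process_counts]


def run_all(test_path, **kwargs):
    """Run each command in turn and return {process_count: exit_code}."""
    results = {}
    for cmd in mpirun_commands(test_path, **kwargs):
        # cmd[2] holds the process count passed to -n
        results[int(cmd[2])] = subprocess.call(cmd)
    return results


if __name__ == "__main__":
    # Only print the commands here; call run_all() to actually execute them.
    for cmd in mpirun_commands("ale/test_harmonic_smoothing.py"):
        print(" ".join(cmd))
```

A nonzero exit code for a given process count marks the run to inspect further, for example by attaching gdb to each rank as was done for the traces below.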
Process A (segfaulted):
(gdb) where
#0 0x00007fffeef8e34b in MatDestroy ()
from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#1 0x00007fffef1e7b35 in PCReset_ML(_p_PC*) ()
from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#2 0x00007fffef0eedfd in PCReset ()
from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#3 0x00007fffef0eefe8 in PCDestroy ()
from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#4 0x00007fffef187ac3 in KSPDestroy ()
from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#5 0x00007ffff037a976 in dolfin::PETScKrylovSolver::~PETScKrylovSolver (
this=0x1ca2250, __in_chrg=<optimized out>)
at ../../dolfin/la/PETScKrylovSolver.cpp:189
#6 0x00007ffff037abd9 in dolfin::PETScKrylovSolver::~PETScKrylovSolver (
this=0x1ca2250, __in_chrg=<optimized out>)
at ../../dolfin/la/PETScKrylovSolver.cpp:192
#7 0x00007ffff038863a in _M_release (this=0x20e0520)
at /usr/include/c++/4.8/bits/shared_ptr_base.h:144
#8 ~__shared_count (this=0x1c9d7a8, __in_chrg=<optimized out>)
at /usr/include/c++/4.8/bits/shared_ptr_base.h:546
#9 ~__shared_ptr (this=0x1c9d7a0, __in_chrg=<optimized out>)
at /usr/include/c++/4.8/bits/shared_ptr_base.h:781
#10 ~shared_ptr (this=0x1c9d7a0, __in_chrg=<optimized out>)
at /usr/include/c++/4.8/bits/shared_ptr.h:93
#11 dolfin::KrylovSolver::~KrylovSolver (this=0x1c9d710,
__in_chrg=<optimized out>) at ../../dolfin/la/KrylovSolver.cpp:107
#12 0x00007ffff0388689 in dolfin::KrylovSolver::~KrylovSolver (this=0x1c9d710,
__in_chrg=<optimized out>) at ../../dolfin/la/KrylovSolver.cpp:110
#13 0x00007ffff038671c in operator() (this=<optimized out>,
__ptr=<optimized out>) at /usr/include/c++/4.8/bits/unique_ptr.h:67
#14 ~unique_ptr (this=0x7fffffff8830, __in_chrg=<optimized out>)
at /usr/include/c++/4.8/bits/unique_ptr.h:184
#15 dolfin::LinearSolver::~LinearSolver (this=0x7fffffff87a0,
__in_chrg=<optimized out>) at ../../dolfin/la/LinearSolver.cpp:124
#16 0x00007ffff038d72c in dolfin::solve (A=..., x=..., b=..., method=...,
preconditioner=...) at ../../dolfin/la/solve.cpp:47
#17 0x00007ffff058979f in dolfin::HarmonicSmoothing::move (mesh=...,
new_boundary=...) at ../../dolfin/ale/HarmonicSmoothing.cpp:170
#18 0x00007ffff058ce49 in dolfin::ALE::move (mesh=..., new_boundary=...)
at ../../dolfin/ale/ALE.cpp:36
#19 0x00007ffff0493059 in dolfin::Mesh::move (this=this@entry=0x1dae700,
boundary=...) at ../../dolfin/mesh/Mesh.cpp:317
#20 0x00007fffd0d929fc in _wrap_Mesh_move__SWIG_0 (swig_obj=0x7fffffff8de0,
nobjs=2) at modulePYTHON_wrap.cxx:21864
#21 _wrap_Mesh_move (self=<optimized out>, args=<optimized out>)
Process B:
(gdb) where
#0 0x00007fffee87b9b4 in opal_progress () from /usr/lib/libmpi.so.1
#1 0x00007fffee7c91f5 in ompi_request_default_wait_all ()
from /usr/lib/libmpi.so.1
#2 0x00007fffcab3f8bf in ompi_coll_tuned_allreduce_intra_recursivedoubling ()
from /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so
#3 0x00007fffee7d5775 in PMPI_Allreduce () from /usr/lib/libmpi.so.1
#4 0x00007fffe7a6be88 in ML_gsum_scalar_int ()
from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libml.so.11
#5 0x00007fffe7a6d155 in ML_build_global_numbering ()
from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libml.so.11
#6 0x00007fffef1eabc4 in MatWrapML_MPIAIJ(ML_Operator_Struct*, MatReuse, _p_Mat**) () from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#7 0x00007fffef1eeeb8 in PCSetUp_ML(_p_PC*) ()
from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#8 0x00007fffef0efcfd in PCSetUp ()
from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#9 0x00007fffef183c31 in KSPSetUp ()
from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#10 0x00007fffef184426 in KSPSolve ()
from /home/martinal/opt/fenics/dorsal-dev-1410/lib/libpetsc.so
#11 0x00007ffff037f395 in dolfin::PETScKrylovSolver::solve (
this=this@entry=0x1edee30, x=..., b=...)
at ../../dolfin/la/PETScKrylovSolver.cpp:419
#12 0x00007ffff0381400 in dolfin::PETScKrylovSolver::solve (this=0x1edee30, A=
..., x=..., b=...) at ../../dolfin/la/PETScKrylovSolver.cpp:473
#13 0x00007ffff0388a51 in dolfin::KrylovSolver::solve (this=0x1b6cc80, A=...,
x=..., b=...) at ../../dolfin/la/KrylovSolver.cpp:153
#14 0x00007ffff03869d9 in dolfin::LinearSolver::solve (
this=this@entry=0x7fffffff87a0, A=..., x=..., b=...)
at ../../dolfin/la/LinearSolver.cpp:152
#15 0x00007ffff038d600 in dolfin::solve (A=..., x=..., b=..., method=...,
preconditioner=...) at ../../dolfin/la/solve.cpp:48
#16 0x00007ffff058979f in dolfin::HarmonicSmoothing::move (mesh=...,
new_boundary=...) at ../../dolfin/ale/HarmonicSmoothing.cpp:170
#17 0x00007ffff058ce49 in dolfin::ALE::move (mesh=..., new_boundary=...)
at ../../dolfin/ale/ALE.cpp:36
#18 0x00007ffff0493059 in dolfin::Mesh::move (this=this@entry=0x1e83910,
boundary=...) at ../../dolfin/mesh/Mesh.cpp:317
#19 0x00007fffd0d929fc in _wrap_Mesh_move__SWIG_0 (swig_obj=0x7fffffff8de0,
nobjs=2) at modulePYTHON_wrap.cxx:21864
#20 _wrap_Mesh_move (self=<optimized out>, args=<optimized out>)
at modulePYTHON_wrap.cxx:22029
Comments (5)
-
reporter -
reporter Still reproducible after the latest fix in next. Here's more relevant output.
ale/test_harmonic_smoothing.py:41: RuntimeError
------------------------------- Captured stderr --------------------------------
[1]PETSC ERROR: --------------------- Error Message ------------------------------------
[1]PETSC ERROR: Argument out of range!
[1]PETSC ERROR: nnz cannot be greater than row length: local row 0 value 3 rowlength 2!
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: Petsc Release Version 3.4.2, Jul, 02, 2013
[1]PETSC ERROR: See docs/changes/index.html for recent updates.
[1]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[1]PETSC ERROR: See docs/index.html for manual pages.
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: Unknown Name on a linux-gnu-cxx-opt named martinal-mc by martinal Wed Oct 22 11:04:52 2014
[1]PETSC ERROR: Libraries linked from /home/martinal/opt/fenics/dorsal-dev-1410/lib
[1]PETSC ERROR: Configure run at Tue Oct 21 10:51:41 2014
[1]PETSC ERROR: Configure options --prefix=/home/martinal/opt/fenics/dorsal-dev-1410 COPTFLAGS=-O2 --with-debugging=0 --with-shared-libraries=1 --with-clanguage=cxx --with-c-support=1 --download-umfpack=1 --download-hypre=1 --download-mumps=1 --download-scalapack=1 --download-blacs=1 --download-ptscotch=1 --download-scotch=1 --download-metis=1 --download-parmetis=1 --with-ml=1 --with-ml-lib=/home/martinal/opt/fenics/dorsal-dev-1410/lib/libml.so --with-ml-include=/home/martinal/opt/fenics/dorsal-dev-1410/include/trilinos
[1]PETSC ERROR: ------------------------------------------------------------------------
[1]PETSC ERROR: MatSeqAIJSetPreallocation_SeqAIJ() line 3524 in ../src/mat/impls/aij/seq/aij.c
[1]PETSC ERROR: MatSeqAIJSetPreallocation() line 3496 in ../src/mat/impls/aij/seq/aij.c
[1]PETSC ERROR: MatMPIAIJSetPreallocation_MPIAIJ() line 3307 in ../src/mat/impls/aij/mpi/mpiaij.c
[1]PETSC ERROR: MatMPIAIJSetPreallocation() line 4015 in ../src/mat/impls/aij/mpi/mpiaij.c
[1]PETSC ERROR: MatWrapML_MPIAIJ() line 426 in ../src/ksp/pc/impls/ml/ml.c
[1]PETSC ERROR: PCSetUp_ML() line 912 in ../src/ksp/pc/impls/ml/ml.c
[1]PETSC ERROR: PCSetUp() line 890 in ../src/ksp/pc/interface/precon.c
[1]PETSC ERROR: KSPSetUp() line 278 in ../src/ksp/ksp/interface/itfunc.c
[1]PETSC ERROR: KSPSolve() line 399 in ../src/ksp/ksp/interface/itfunc.c
-
Can you try with the latest PETSc, and/or change the preconditioner to something other than ML? There was a bug in PETSc related to nnz and ML.
-
reporter - changed status to invalid
Works with PETSc 3.5.2. Johannes has updated dorsal to use PETSc 3.5.2.
-
- removed milestone
Removing milestone: 1.5 (automated comment)
This is consistently reproducible: