Random MUMPS out of memory error in unit test under MPI

Issue #402 wontfix
Martin Sandve Alnæs created an issue

I get

[0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFO(1)=-9, INFO(2)=17346

in parallel-assembly-solve/test_solve_result_against_reference.py

sometimes when running all unit tests with mpirun -n 3. I have not been able to reproduce it by running only the failing test.

The MUMPS error code INFO(1)=-9 means "Main internal real/complex workarray S too small", and INFO(2) says the work array is 17346 entries short of what's needed. This number changes between runs, however: I tried again just now and got 7832 instead.
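When comparing logs from repeated runs, a small helper can pull the two MUMPS status values out of the PETSc error line. This is only a minimal sketch: the name `parse_mumps_error` and the lookup table are hypothetical, and the table contains just the -9 code quoted above.

```python
import re

# Only the code discussed in this issue is listed; the description is the
# one quoted above. Other codes are documented in the MUMPS user guide.
MUMPS_INFO1 = {
    -9: "Main internal real/complex workarray S too small",
}

# Matches the "INFO(1)=<int>, INFO(2)=<int>" pair in a PETSc/MUMPS error line.
_PATTERN = re.compile(r"INFO\(1\)=(-?\d+),\s*INFO\(2\)=(-?\d+)")


def parse_mumps_error(line):
    """Return (info1, info2, description), or None if no INFO pair is found."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    info1, info2 = int(match.group(1)), int(match.group(2))
    description = MUMPS_INFO1.get(info1, "unknown (see the MUMPS user guide)")
    return info1, info2, description


line = ("[0]PETSC ERROR: Error reported by MUMPS in numerical factorization "
        "phase: INFO(1)=-9, INFO(2)=17346")
print(parse_mumps_error(line))
```

If the shortfall turns out to be genuine rather than a symptom of the non-determinism, one commonly used workaround is to raise the MUMPS workspace margin ICNTL(14), for example through the PETSc option -mat_mumps_icntl_14 (percentage increase in the estimated working space). Whether that helps with this particular failure is untested.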

Comments (6)

  1. Jan Blechta

    There's a bug in PETSc which can make the behaviour non-deterministic, but it applies to Cholesky. Maybe there's a similar problem in other MatConvertToTriples routines.

    We have also been struggling for some time (at least since 1.4.0) with non-determinism in FEniCS/MUMPS applications. It may have something to do with local dof indexing and clean-up in ghosted vectors. Note that #263 was finally resolved in a different manner than agreed: by an early update, without a dirty flag. This could leave room for potential bugs.
