Random MUMPS out-of-memory error in unit test under MPI
I get
[0]PETSC ERROR: Error reported by MUMPS in numerical factorization phase: INFO(1)=-9, INFO(2)=17346
in parallel-assembly-solve/test_solve_result_against_reference.py
sometimes when running all unit tests with mpirun -n 3. I have not been able to reproduce it by running only the failing test.
The MUMPS error means "Main internal real/complex workarray S too small"; INFO(2) reports that the array is 17346 entries short of what's needed. This number changes, however: I tried again just now and got 7832 instead.
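For anyone collecting these failures from test logs, here is a minimal sketch of pulling INFO(1) and INFO(2) out of the PETSc error line so the shortfall can be compared across runs. The helper name `parse_mumps_error` and the lookup table are hypothetical, made up for illustration; only the -9 entry comes from this report.

```python
import re

# Hypothetical lookup table; only the -9 entry is taken from this report.
MUMPS_INFO1 = {
    -9: "Main internal real/complex workarray S too small",
}

# Matches the "INFO(1)=..., INFO(2)=..." part of the PETSc error message.
_PATTERN = re.compile(r"INFO\(1\)=(-?\d+),\s*INFO\(2\)=(-?\d+)")


def parse_mumps_error(line):
    """Return (info1, info2, description), or None if no INFO codes are found."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    info1 = int(match.group(1))
    info2 = int(match.group(2))
    return info1, info2, MUMPS_INFO1.get(info1, "unknown")


line = ("[0]PETSC ERROR: Error reported by MUMPS in numerical factorization "
        "phase: INFO(1)=-9, INFO(2)=17346")
print(parse_mumps_error(line))
```

A commonly used mitigation for INFO(1)=-9 is to raise the MUMPS workspace margin ICNTL(14), which PETSc exposes as the option `-mat_mumps_icntl_14 <percent>`. Whether that addresses the intermittent failure described here has not been established.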
Comments (6)
-
reporter -
Are you letting PETSc build MUMPS and ParMETIS, and do you have the latest PETSc?
-
There's a bug in PETSc which can make the behaviour non-deterministic. However, this applies to Cholesky. Maybe there's a similar problem in other MatConvertToTriples routines.
We have also been suffering for some time (at least since 1.4.0) from non-determinism in FEniCS/MUMPS applications. Maybe it has something to do with local dof indexing and clean-up in ghosted vectors. Note that #263 was finally resolved in a different manner than agreed: by early update, without a dirty flag. This could leave room for potential bugs.
-
- changed milestone to 1.6
-
- changed status to wontfix
No recent reports, so closing.
-
reporter - removed milestone
Removing milestone: 1.6 (automated comment)
The number is not completely random, I got 7832 and 17346 again.