Default test deadlocks with OpenMPI 1.7.3
When running the default test (make PETSC_DIR=... PETSC_ARCH=... test) with OpenMPI 1.7.3, I find that the program deadlocks on exit. Here is the (shortened) trace:
    (gdb) bt
    #0  __lll_lock_wait ()
    #1  0x000000349b81126a in _L_lock_55
    #2  0x000000349b8111e1 in __lll_lock_elision
    #3  0x000000349b80a02c in __GI___pthread_mutex_lock
    #4  0x00000031a802c78c in opal_mutex_lock
    #5  ompi_attr_get_c
    #6  0x00000031a8052d67 in PMPI_Attr_get
    #7  0x00000000004650cc in Petsc_DelComm_Outer
    #8  0x00000031a802d2a0 in ompi_attr_delete_impl
    #9  ompi_attr_delete
    #10 0x00000031a8052c7c in PMPI_Attr_delete
    #11 0x0000000000446d27 in PetscCommDestroy
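I believe this is related to OpenMPI's change from two locks to one: in the trace, Petsc_DelComm_Outer (the attribute delete callback) calls MPI_Attr_get while MPI_Attr_delete is still in progress, so the attribute lock appears to be requested again by the thread that already holds it. Now, ostensibly, this is an OpenMPI bug, but I'm opening this issue for two reasons: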
- It appears from this mailing list message that PETSc could be refactored to not use the MPI_Attr_get call in the delete function, thereby avoiding the deadlock entirely.
- I'm not familiar enough with the inner workings of PETSc with respect to attributes and split communicators to cut this down into a small test case. I tried using the test case from the above mailing list thread, but it did not trigger the problem.