HDF5 write crashes in parallel on OS X 10.10 and Debian Wheezy

Issue #465 (status: invalid)
Reported by Stephan Schmidt

Dear all,

I suspect that the precompiled FEniCS application for OS X (available here: http://www.fenicsproject.org/pub/software/fenics/fenics-1.5.0-p2-osx10.10.dmg) ships with a faulty HDF5 library.

If I run the attached minimal example on OS X 10.10.2 serially (a single process) with

"python H5Crash.py"

everything works normally. But if I run it in parallel with

"mpiexec -np 2 python H5Crash.py"

I get a ton of error messages, all similar to this one:

  #003: /Users/johannr/fenics-1.5.0/fenics-superbuild/build-fenics/CMakeExternals/src/HDF5/src/H5FDmpio.c line 1052 in H5FD_mpio_open(): MPI_File_open failed
    major: Internal error (too specific to document in detail)
    minor: Some MPI function failed

Comments (10)

  1. Max Julian

    I'm having the same issue when reading HDF5 files on OS X 10.10. Reading a mesh stored in an HDF5 file with one process (mpiexec -np 1) works fine, but with any larger process count it fails with the same errors as in the report.

  2. Johannes Ring

    This is not a bug in DOLFIN. It is a problem with older versions of OpenMPI. The fenics-install.sh script has been changed to use MPICH instead of OpenMPI, so this is no longer a problem there. The DMG bundle for OS X still uses an old version of OpenMPI, but I will look at upgrading to the latest version of OpenMPI or switching to MPICH.

  3. Charles Cook

    I'm having the same issue on an HPC cluster. I had also been using MPICH as a workaround until it raised too many red flags with the admins, who rely on OpenMPI for task monitoring and insist on it.

    Upgrading to the latest OpenMPI resolved this for me, although I had to upgrade a number of dependencies along the way.

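Because the diagnosis in comment 2 hinges on which MPI implementation (and which OpenMPI version) "mpiexec" belongs to, it can help to check this before filing a similar report. The following is a minimal sketch, not part of FEniCS or DOLFIN. It assumes that Open MPI's launcher prints a line of the form "mpiexec (Open MPI) <version>" and that MPICH's Hydra launcher prints "HYDRA build details:" followed by a "Version:" line; the function names parse_mpi_version and detect_mpi are hypothetical. The thread does not say which OpenMPI release fixed the problem, so the script only reports what it finds.

```python
import re
import shutil
import subprocess


def parse_mpi_version(text):
    """Classify launcher version output as Open MPI or MPICH.

    Assumed formats (not guaranteed for every release):
      Open MPI: "mpiexec (Open MPI) 1.8.4"
      MPICH:    "HYDRA build details:" followed by "Version: 3.1.4"
    Returns a (vendor, version) tuple, or (None, None) if unrecognized.
    """
    match = re.search(r"\(Open MPI\)\s+(\S+)", text)
    if match:
        return "Open MPI", match.group(1)
    if "HYDRA" in text:
        match = re.search(r"Version:\s+(\S+)", text)
        version = match.group(1) if match else None
        return "MPICH", version
    return None, None


def detect_mpi(launcher="mpiexec"):
    """Run '<launcher> --version' and classify its output."""
    path = shutil.which(launcher)
    if path is None:
        # Launcher not found on PATH.
        return None, None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some launchers print version info on stderr, so check both streams.
    return parse_mpi_version(result.stdout + result.stderr)


if __name__ == "__main__":
    vendor, version = detect_mpi()
    print("MPI vendor:", vendor, "version:", version)
```

Run it in the same environment used for "mpiexec -np 2 python H5Crash.py" so that it inspects the same launcher. Parsing the launcher's text output avoids depending on mpi4py, at the cost of relying on the assumed output formats above.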