Segmentation fault when reading HDF5 file in parallel
I keep getting the following error when trying to read a .h5m file in parallel.
[facsa-desktop:07100] *** Process received signal ***
[facsa-desktop:07100] Signal: Segmentation fault (11)
[facsa-desktop:07100] Signal code: Address not mapped (1)
[facsa-desktop:07100] Failing at address: 0xb6fff038
[facsa-desktop:07100] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f4db6571890]
[facsa-desktop:07100] [ 1] /usr/lib/x86_64-linux-gnu/libmpi.so.20(MPI_Comm_dup+0x52)[0x7f4db6d7b5d2]
[facsa-desktop:07100] [ 2] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5FD_mpi_comm_info_dup+0x50)[0x7f4db5ea75f0]
[facsa-desktop:07100] [ 3] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(+0x2a7cdd)[0x7f4db5ea7cdd]
[facsa-desktop:07100] [ 4] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(+0x1ae428)[0x7f4db5dae428]
[facsa-desktop:07100] [ 5] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(+0x1af0f8)[0x7f4db5daf0f8]
[facsa-desktop:07100] [ 6] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5P_set+0xef)[0x7f4db5dbdf0f]
[facsa-desktop:07100] [ 7] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5P_set_driver+0xd8)[0x7f4db5db0968]
[facsa-desktop:07100] [ 8] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5Pset_fapl_mpio+0x91)[0x7f4db5ea9b31]
[facsa-desktop:07100] [ 9] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab8ReadHDF511set_up_readEPKcRKNS_11FileOptionsE+0x711)[0x7f4dbbf5a041]
[facsa-desktop:07100] [10] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab8ReadHDF59load_fileEPKcPKmRKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoE+0x35)[0x7f4dbbf6d3a5]
[facsa-desktop:07100] [11] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab4Core16serial_load_fileEPKcPKmRKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoE+0x20d)[0x7f4dbbc4059d]
[facsa-desktop:07100] [12] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab12ReadParallel9load_fileEPPKciPKmiRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERSt6vectorIiSaIiEEbbSG_RKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoEibiiiiii+0xb68)[0x7f4dbbe74708]
[facsa-desktop:07100] [13] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab12ReadParallel9load_fileEPPKciPKmRKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoE+0xedf)[0x7f4dbbe78f7f]
[facsa-desktop:07100] [14] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab4Core9load_fileEPKcPKmS2_S2_PKii+0x413)[0x7f4dbbc41323]
[facsa-desktop:07100] [15] ./MPFADSolver(+0x12934)[0x563a950ad934]
[facsa-desktop:07100] [16] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f4db618fb97]
[facsa-desktop:07100] [17] ./MPFADSolver(+0x1276a)[0x563a950ad76a]
[facsa-desktop:07100] *** End of error message ***
I’ve tried reinstalling MOAB and its dependencies, but that didn’t fix it. I also tried reading different files to check whether the problem was specific to my file, but got the same error. This is not the first time this has happened. Last time, it disappeared after I reinstalled my distro.
I’m using Ubuntu 18.04.2 and MOAB 5.1.0 with HDF5 1.10.0 and MPICH 3.3a2. I’m attaching example code that triggers the error.
Comments (11)
-
reporter - attached config.log
Here is the config.log.
-
Thanks. I could not find anything particularly off in the configuration.
Can you also send us the h5m file that you are trying to load? Does it fail for all h5m files?
Edit: Please also run make check in the build directory and let us know if the I/O tests pass cleanly.
-
reporter - attached test-suite.log
That’s the log file from make check. One of the tests failed with the same error.
-
reporter - attached linear_test_part.h5m
Those are some of the files I'm using. I've read them on other machines and they seem to be fine.
-
reporter - attached 64bricks_512hex_256part.h5m
That’s one of the examples that came in the repository.
-
reporter - attached spe10_fourth_part.h5m
-
I have built MOAB on Ubuntu 18.04 with GNU 7.4, MPICH 3.3.1, HDF5 1.10.5, NetCDF 4.3.3.1c-4.4.2f-parallel, PnetCDF 1.6.1, Zoltan 3.8.3, and METIS 5.1.0.
It works fine for me. I do have an unrelated error, which I will fix soon.
-
Hello Filipe,
Can you add DEBUG_IO=1 to your reading options?
Something like
string read_opts = "PARALLEL=READ_PART;PARTITION=PARALLEL_PARTITION;PARALLEL_RESOLVE_SHARED_ENTS;DEBUG_IO=1";
It seems to be crashing at
err = H5Pset_fapl_mpio(file_prop, MPI_COMM_SELF, MPI_INFO_NULL);
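To make it easy to switch DEBUG_IO=1 on and off while chasing this, the options string could be assembled from its parts rather than edited by hand. This is only a minimal sketch: join_read_options and parallel_read_options are hypothetical helper names, not part of MOAB, and the option tokens are the ones quoted in this comment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of MOAB): joins option tokens with ';',
// the separator used in the read_opts string quoted above.
std::string join_read_options(const std::vector<std::string>& opts)
{
    std::string joined;
    for (const std::string& opt : opts) {
        // A token must not contain the separator itself.
        assert(opt.find(';') == std::string::npos);
        if (!joined.empty()) joined += ';';
        joined += opt;
    }
    return joined;
}

// Hypothetical helper: builds the parallel read options from this thread.
// When debug_io is true, DEBUG_IO=1 is appended as the last token.
std::string parallel_read_options(bool debug_io)
{
    std::vector<std::string> opts = {
        "PARALLEL=READ_PART",
        "PARTITION=PARALLEL_PARTITION",
        "PARALLEL_RESOLVE_SHARED_ENTS"
    };
    if (debug_io) opts.push_back("DEBUG_IO=1");
    return join_read_options(opts);
}
```

The result of parallel_read_options(true).c_str() would then go where read_opts is currently passed as the options argument of load_file.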
-
reporter Hi Iulian. That’s the output with the reading options set to PARALLEL=READ_PART;PARTITION=PARALLEL_PARTITION;PARALLEL_RESOLVE_SHARED_ENTS;DEBUG_IO=1.
1 H5M H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS is not defined
1 H5M (0.00 s) Getting file summary
1 H5M (0.00 s) Communicating file summary
2 H5M H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS is not defined
2 H5M (0.00 s) Getting file summary
2 H5M (0.00 s) Communicating file summary
0 H5M H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS is not defined
0 H5M (0.00 s) Getting file summary
3 H5M H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS is not defined
3 H5M (0.00 s) Getting file summary
3 H5M (0.00 s) Communicating file summary
[facsa-desktop:10612] *** Process received signal ***
[facsa-desktop:10612] Signal: Segmentation fault (11)
[facsa-desktop:10612] Signal code: Address not mapped (1)
[facsa-desktop:10612] Failing at address: 0x668d2038
[facsa-desktop:10612] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7f7065c9ef20]
[facsa-desktop:10612] [ 1] /usr/lib/x86_64-linux-gnu/libmpi.so.20(MPI_Comm_dup+0x52)[0x7f706664e5d2]
[facsa-desktop:10612] [ 2] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5FD_mpi_comm_info_dup+0x50)[0x7f70659995f0]
[facsa-desktop:10612] [ 3] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(+0x2a7cdd)[0x7f7065999cdd]
[facsa-desktop:10612] [ 4] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(+0x1ae428)[0x7f70658a0428]
[facsa-desktop:10612] [ 5] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(+0x1af0f8)[0x7f70658a10f8]
[facsa-desktop:10612] [ 6] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5P_set+0xef)[0x7f70658aff0f]
[facsa-desktop:10612] [ 7] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5P_set_driver+0xd8)[0x7f70658a2968]
[facsa-desktop:10612] [ 8] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5Pset_fapl_mpio+0x91)[0x7f706599bb31]
[facsa-desktop:10612] [ 9] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab8ReadHDF511set_up_readEPKcRKNS_11FileOptionsE+0x711)[0x7f7066ef4041]
[facsa-desktop:10612] [10] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab8ReadHDF59load_fileEPKcPKmRKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoE+0x35)[0x7f7066f073a5]
[facsa-desktop:10612] [11] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab4Core16serial_load_fileEPKcPKmRKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoE+0x20d)[0x7f7066bda59d]
[facsa-desktop:10612] [12] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab12ReadParallel9load_fileEPPKciPKmiRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERSt6vectorIiSaIiEEbbSG_RKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoEibiiiiii+0xb68)[0x7f7066e0e708]
[facsa-desktop:10612] [13] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab12ReadParallel9load_fileEPPKciPKmRKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoE+0xedf)[0x7f7066e12f7f]
[facsa-desktop:10612] [14] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab4Core9load_fileEPKcPKmS2_S2_PKii+0x413)[0x7f7066bdb323]
[facsa-desktop:10612] [15] ./parallel_hdf5_issue(+0xc6a1)[0x55f502bd56a1]
[facsa-desktop:10612] [16] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f7065c81b97]
[facsa-desktop:10612] [17] ./parallel_hdf5_issue(+0xc87a)[0x55f502bd587a]
[facsa-desktop:10612] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node facsa-desktop exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
-
@Iulian Grindeanu Any resolution for this issue?
-
This does look like an issue with your local install, since the segfault happens inside HDF5. Please send us your config.log so that we can make sure the configuration is correct.
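When checking a local install, it may help to list which MPI and HDF5 shared libraries the crash actually passes through. The posted trace names libmpi.so.20 alongside libhdf5_mpich.so.100; whether those two come from the same MPI installation is not established in this thread, so treat that as something to verify rather than a diagnosis. The sketch below pulls the library names out of backtrace frames of the shape shown in the report. The function names library_in_frame and mpi_related_libraries are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the file name of the binary or shared object named in one
// backtrace frame, or an empty string if the line is not a frame.
// Assumed frame shape: "[host:pid] [ n] /path/to/lib.so.N(symbol+off)[addr]"
std::string library_in_frame(const std::string& frame)
{
    const std::size_t paren = frame.find('(');
    if (paren == std::string::npos) return "";
    const std::size_t slash = frame.rfind('/', paren);
    if (slash == std::string::npos) return "";
    return frame.substr(slash + 1, paren - slash - 1);
}

// Collects the distinct libraries whose names start with "libmpi" or
// "libhdf5", so that MPI and HDF5 builds can be compared at a glance.
std::set<std::string> mpi_related_libraries(const std::vector<std::string>& frames)
{
    std::set<std::string> libs;
    for (const std::string& frame : frames) {
        const std::string lib = library_in_frame(frame);
        if (lib.find("libmpi") == 0 || lib.find("libhdf5") == 0) {
            libs.insert(lib);
        }
    }
    return libs;
}
```

Feeding it the lines of the trace gives a short list to compare against the MPI and HDF5 versions recorded in config.log.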