Segmentation fault when reading HDF5 file in parallel

Issue #107 new
Filipe Antônio Cumaru Silva Alves created an issue

I keep getting the following error when trying to read a .h5m file in parallel.

[facsa-desktop:07100] *** Process received signal ***
[facsa-desktop:07100] Signal: Segmentation fault (11)
[facsa-desktop:07100] Signal code: Address not mapped (1)
[facsa-desktop:07100] Failing at address: 0xb6fff038
[facsa-desktop:07100] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f4db6571890]
[facsa-desktop:07100] [ 1] /usr/lib/x86_64-linux-gnu/libmpi.so.20(MPI_Comm_dup+0x52)[0x7f4db6d7b5d2]
[facsa-desktop:07100] [ 2] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5FD_mpi_comm_info_dup+0x50)[0x7f4db5ea75f0]
[facsa-desktop:07100] [ 3] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(+0x2a7cdd)[0x7f4db5ea7cdd]
[facsa-desktop:07100] [ 4] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(+0x1ae428)[0x7f4db5dae428]
[facsa-desktop:07100] [ 5] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(+0x1af0f8)[0x7f4db5daf0f8]
[facsa-desktop:07100] [ 6] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5P_set+0xef)[0x7f4db5dbdf0f]
[facsa-desktop:07100] [ 7] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5P_set_driver+0xd8)[0x7f4db5db0968]
[facsa-desktop:07100] [ 8] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5Pset_fapl_mpio+0x91)[0x7f4db5ea9b31]
[facsa-desktop:07100] [ 9] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab8ReadHDF511set_up_readEPKcRKNS_11FileOptionsE+0x711)[0x7f4dbbf5a041]
[facsa-desktop:07100] [10] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab8ReadHDF59load_fileEPKcPKmRKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoE+0x35)[0x7f4dbbf6d3a5]
[facsa-desktop:07100] [11] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab4Core16serial_load_fileEPKcPKmRKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoE+0x20d)[0x7f4dbbc4059d]
[facsa-desktop:07100] [12] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab12ReadParallel9load_fileEPPKciPKmiRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERSt6vectorIiSaIiEEbbSG_RKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoEibiiiiii+0xb68)[0x7f4dbbe74708]
[facsa-desktop:07100] [13] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab12ReadParallel9load_fileEPPKciPKmRKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoE+0xedf)[0x7f4dbbe78f7f]
[facsa-desktop:07100] [14] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab4Core9load_fileEPKcPKmS2_S2_PKii+0x413)[0x7f4dbbc41323]
[facsa-desktop:07100] [15] ./MPFADSolver(+0x12934)[0x563a950ad934]
[facsa-desktop:07100] [16] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f4db618fb97]
[facsa-desktop:07100] [17] ./MPFADSolver(+0x1276a)[0x563a950ad76a]
[facsa-desktop:07100] *** End of error message ***

I’ve tried reinstalling MOAB and its dependencies, but that didn’t fix it. I also tried reading different files to check whether the problem was specific to my file, but got the same error. This is not the first time this has happened. Last time, it disappeared after I reinstalled my distro.

I’m using Ubuntu 18.04.2 and MOAB 5.1.0 with HDF5 1.10.0 and MPICH 3.3a2. I’m attaching example code that reproduces the error.

Comments (11)

  1. Vijay M

    This does look like an issue with your local install since the segfault happens from inside HDF5. Please send us your config.log so that we can make sure configuration is correct.
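
To check which libraries a crash passes through, it can help to list the shared object named in each backtrace frame. The following is a minimal sketch, assuming frames use the `[ N] /path/to/lib.so(symbol+offset)[address]` layout seen in the trace above. The helper name `frame_library` is hypothetical and not part of MOAB, HDF5, or MPI.

```cpp
#include <string>

// Hypothetical helper: return the file name of the binary named in one
// backtrace frame line, or an empty string if the line is not a frame.
// Assumes the "[host:pid] [ N] /path/lib.so(symbol+offset)[address]" layout.
std::string frame_library(const std::string& line)
{
    // The binary path ends at the first '(' on the line.
    std::string::size_type paren = line.find('(');
    if (paren == std::string::npos) return "";

    // The path starts right after the "] " that closes the frame index.
    std::string::size_type start = line.rfind("] ", paren);
    if (start == std::string::npos) return "";
    start += 2;

    std::string path = line.substr(start, paren - start);

    // Lines such as "Signal: Segmentation fault (11)" contain spaces here
    // and are not frames, so they are rejected.
    if (path.empty() || path.find(' ') != std::string::npos) return "";

    // Keep only the file name, dropping the directory part.
    std::string::size_type slash = path.rfind('/');
    return slash == std::string::npos ? path : path.substr(slash + 1);
}
```

Applied to frames 1, 2, and 9 of the trace above, this would yield `libmpi.so.20`, `libhdf5_mpich.so.100`, and `libMOAB.so.0` respectively, which makes it easier to compare the libraries actually loaded against the ones the build was configured with.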

  2. Vijay M

    Thanks. I could not find anything particularly off in the configuration.

    Can you also send us the h5m file that you are trying to load? Does it fail for all h5m files?

    Edit: Please also run make check in the build directory and let us know if the I/O tests pass cleanly.

  3. Iulian Grindeanu

    I have built MOAB on Ubuntu 18.04 with GNU 7.4, MPICH 3.3.1, HDF5 1.10.5, netcdf 4.3.3.1c-4.4.2f-parallel, pnetcdf 1.6.1, zoltan 3.8.3, and metis 5.1.0.

    It works fine for me. I do see an unrelated error, which I will fix soon.

  4. Iulian Grindeanu

    Hello Filipe,

    Can you add DEBUG_IO=1 to your read options?

    Something like:

    string read_opts = "PARALLEL=READ_PART;PARTITION=PARALLEL_PARTITION;PARALLEL_RESOLVE_SHARED_ENTS;DEBUG_IO=1";

    It seems to be crashing at:

       err = H5Pset_fapl_mpio(file_prop, MPI_COMM_SELF, MPI_INFO_NULL);
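
The read options above form a single semicolon-separated string, so a debug flag can be toggled without retyping the whole string. The following is a minimal sketch using only the standard library. The helper names `join_read_options` and `with_debug_io` are hypothetical and not part of MOAB. The resulting string is assumed to be passed as the options argument of `moab::Interface::load_file`.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: join individual read options into one
// semicolon-separated option string, skipping empty entries.
std::string join_read_options(const std::vector<std::string>& options)
{
    std::string joined;
    for (const std::string& opt : options) {
        if (opt.empty()) continue;
        if (!joined.empty()) joined += ';';
        joined += opt;
    }
    return joined;
}

// Hypothetical helper: append DEBUG_IO=<level> to a copy of the given
// options and return the joined option string.
std::string with_debug_io(std::vector<std::string> options, int level)
{
    options.push_back("DEBUG_IO=" + std::to_string(level));
    return join_read_options(options);
}
```

Calling `with_debug_io` with the three parallel options from the comment above and a level of 1 produces the same string as the `read_opts` line.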
    
  5. Filipe Antônio Cumaru Silva Alves reporter

    Hi Iulian. Here is the output with the read options set to PARALLEL=READ_PART;PARTITION=PARALLEL_PARTITION;PARALLEL_RESOLVE_SHARED_ENTS;DEBUG_IO=1.

      1  H5M H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS is not defined
      1  H5M (0.00 s) Getting file summary
      1  H5M (0.00 s) Communicating file summary
      2  H5M H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS is not defined
      2  H5M (0.00 s) Getting file summary
      2  H5M (0.00 s) Communicating file summary
      0  H5M H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS is not defined
      0  H5M (0.00 s) Getting file summary
      3  H5M H5_MPI_COMPLEX_DERIVED_DATATYPE_WORKS is not defined
      3  H5M (0.00 s) Getting file summary
      3  H5M (0.00 s) Communicating file summary
    [facsa-desktop:10612] *** Process received signal ***
    [facsa-desktop:10612] Signal: Segmentation fault (11)
    [facsa-desktop:10612] Signal code: Address not mapped (1)
    [facsa-desktop:10612] Failing at address: 0x668d2038
    [facsa-desktop:10612] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7f7065c9ef20]
    [facsa-desktop:10612] [ 1] /usr/lib/x86_64-linux-gnu/libmpi.so.20(MPI_Comm_dup+0x52)[0x7f706664e5d2]
    [facsa-desktop:10612] [ 2] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5FD_mpi_comm_info_dup+0x50)[0x7f70659995f0]
    [facsa-desktop:10612] [ 3] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(+0x2a7cdd)[0x7f7065999cdd]
    [facsa-desktop:10612] [ 4] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(+0x1ae428)[0x7f70658a0428]
    [facsa-desktop:10612] [ 5] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(+0x1af0f8)[0x7f70658a10f8]
    [facsa-desktop:10612] [ 6] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5P_set+0xef)[0x7f70658aff0f]
    [facsa-desktop:10612] [ 7] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5P_set_driver+0xd8)[0x7f70658a2968]
    [facsa-desktop:10612] [ 8] /usr/lib/x86_64-linux-gnu/libhdf5_mpich.so.100(H5Pset_fapl_mpio+0x91)[0x7f706599bb31]
    [facsa-desktop:10612] [ 9] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab8ReadHDF511set_up_readEPKcRKNS_11FileOptionsE+0x711)[0x7f7066ef4041]
    [facsa-desktop:10612] [10] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab8ReadHDF59load_fileEPKcPKmRKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoE+0x35)[0x7f7066f073a5]
    [facsa-desktop:10612] [11] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab4Core16serial_load_fileEPKcPKmRKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoE+0x20d)[0x7f7066bda59d]
    [facsa-desktop:10612] [12] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab12ReadParallel9load_fileEPPKciPKmiRNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERSt6vectorIiSaIiEEbbSG_RKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoEibiiiiii+0xb68)[0x7f7066e0e708]
    [facsa-desktop:10612] [13] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab12ReadParallel9load_fileEPPKciPKmRKNS_11FileOptionsEPKNS_11ReaderIface10SubsetListEPKPNS_7TagInfoE+0xedf)[0x7f7066e12f7f]
    [facsa-desktop:10612] [14] /home/facsa/MOAB/lib/libMOAB.so.0(_ZN4moab4Core9load_fileEPKcPKmS2_S2_PKii+0x413)[0x7f7066bdb323]
    [facsa-desktop:10612] [15] ./parallel_hdf5_issue(+0xc6a1)[0x55f502bd56a1]
    [facsa-desktop:10612] [16] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f7065c81b97]
    [facsa-desktop:10612] [17] ./parallel_hdf5_issue(+0xc87a)[0x55f502bd587a]
    [facsa-desktop:10612] *** End of error message ***
    --------------------------------------------------------------------------
    mpiexec noticed that process rank 0 with PID 0 on node facsa-desktop exited on signal 11 (Segmentation fault).
    --------------------------------------------------------------------------
    