Enable transparent HDF5 compression

Issue #103 new
Johannes Probst created an issue

HDF5 offers transparent compression with zlib out of the box [3]. Reading MOAB's source code, it seems that this feature is not used (please let me know if I'm wrong). The H5M files written by MOAB are in most cases still highly compressible. In addition to taking up less space, compressed files can speed up file I/O in cases where it is I/O bound (e.g. on slow network-mounted drives).

A possible way of configuring this would be to pass the zlib compression level (1-9) in the options string when creating the mesh object.
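As a rough illustration, the write path could validate the requested level and verify that the running HDF5 library actually provides the deflate filter. This is only a sketch; the COMPRESSION_LEVEL option name and the helper below are hypothetical, not existing MOAB API:

    #include <hdf5.h>

    /* Hypothetical helper: MOAB has no COMPRESSION_LEVEL option today.
     * Validates a requested zlib level and checks that this HDF5 build
     * ships the deflate filter (it is optional at build time). */
    static int usable_deflate_level(int requested)
    {
        if (requested < 1 || requested > 9)
            return -1;                      /* zlib accepts levels 1-9 */
        if (H5Zfilter_avail(H5Z_FILTER_DEFLATE) <= 0)
            return -1;                      /* built without zlib support */
        return requested;
    }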

To test the efficiency, I repacked an existing H5M file with the command h5repack -v -f GZIP=1 mesh.h5m compr.h5m. In this particular instance, it reduced the file size from 381 MB to 83 MB, and MOAB loaded the repacked file without any problems or modifications to the code.

Regarding the performance impact, I ran two tests, each on the compressed and on the uncompressed file. The results are summarized below:

Test case             Time with compressed mesh   Time with uncompressed mesh
Convert to OpenFOAM   142.47 sec                  148.51 sec
mbconvert to VTK      20.65 sec                   20.27 sec

References

[1] https://support.hdfgroup.org/HDF5/faq/compression.html

[2] https://www.hdfgroup.org/2017/05/hdf5-data-compression-demystified-2-performance-tuning/

[3] https://support.hdfgroup.org/HDF5/doc/H5.user/Filters.html

Comments (3)

  1. Vijay M

    @iulian07 @brtnfld It is my understanding that if HDF5 is configured with compression libraries (-lz or -lsz), these will be used by default to compress the datasets. But if there are options we could set through an API to reduce the size on disk, and in the process improve performance (though only marginally, judging from the results above), that would be a bonus, since almost nothing changes from a user standpoint. Please correct me if I am wrong.

  2. Scot Breitenfeld

    You still need to use an HDF5 API to enable compression in MOAB, which is also where you would specify the compression parameters (and whether to use zlib, szip, or some other third-party compression filter). Additionally, you need to switch to chunked datasets where they are not chunked already. Yes, it's transparent to the user. (See the first sketch after the comments for the relevant calls.)

  3. Scot Breitenfeld

    BTW, adding chunking is tricky: picking the correct chunk size is hard, especially for unstructured meshes. You also introduce issues like setting the chunk cache, how to specify chunks for datasets with more than one dimension, etc. This is why CGNS does not yet have chunking/compression, though it is on the to-do list. (The second sketch after the comments shows the chunk-cache knob.)
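
For concreteness, the HDF5 calls described in comment 2 look roughly like the sketch below. The dataset name and the fixed chunk size are placeholders; a real patch to MOAB's HDF5 writer would need to pick chunk dimensions per dataset:

    #include <hdf5.h>

    /* Sketch: create a chunked, deflate-compressed 1-D dataset.
     * HDF5 requires chunked layout before any compression filter. */
    hid_t create_compressed_dset(hid_t file_id, hsize_t nelems, unsigned level)
    {
        hsize_t dims[1]  = { nelems };
        hsize_t chunk[1] = { 16384 };            /* placeholder chunk size */
        if (chunk[0] > nelems)
            chunk[0] = nelems;                   /* chunk must not exceed extent */

        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 1, chunk);            /* switch to chunked layout */
        H5Pset_deflate(dcpl, level);             /* zlib, level 1-9 */

        hid_t dset = H5Dcreate2(file_id, "connectivity", H5T_NATIVE_LLONG,
                                space, H5P_DEFAULT, dcpl, H5P_DEFAULT);
        H5Pclose(dcpl);
        H5Sclose(space);
        return dset;                             /* caller closes with H5Dclose */
    }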

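On the read side, the chunk cache mentioned in comment 3 is tuned per dataset via the access property list. The slot count and cache size below are illustrative, not recommendations:

    #include <hdf5.h>

    /* Sketch: open a chunked dataset with a larger chunk cache so that
     * decompressed chunks are reused rather than re-read and re-inflated. */
    hid_t open_with_chunk_cache(hid_t file_id, const char *name)
    {
        hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
        /* 12421 hash slots (a prime), 64 MiB of cache, and w0 = 1.0 so
         * fully read chunks are evicted first. Values are illustrative. */
        H5Pset_chunk_cache(dapl, 12421, 64 * 1024 * 1024, 1.0);
        hid_t dset = H5Dopen2(file_id, name, dapl);
        H5Pclose(dapl);
        return dset;
    }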