Activate HDF5 shuffle filter

Issue #1874 closed
Erik Schnetter created an issue

Comments (7)

  1. Ian Hinder

    The HDF5 gzip decompression filter allocates a block of memory the same size as the compressed data for the uncompressed data. It then reallocates the buffer "in case" it needs more space, which it always will, since the uncompressed data is larger than the compressed data. This is ridiculous, and causes excessive memory fragmentation, to the extent that recovering from a compressed checkpoint file is very difficult unless you have a large amount of extra memory or swap space available (which seems to be rarer these days on HPC systems).

  2. Erik Schnetter reporter

    I think this issue is unrelated; to resolve it, you would need to disable compression, and there is a parameter for this. There is also h5repack, which can uncompress HDF5 files; you could run it before recovering.
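    A concrete invocation for that suggestion might look like the following sketch (the checkpoint file names are placeholders; `-f NONE` is h5repack's filter option for stripping existing filters from all datasets):

    ```shell
    # Rewrite a compressed checkpoint with all compression filters removed,
    # so that recovery does not go through the gzip decompression filter.
    h5repack -f NONE checkpoint.chkpt.h5 checkpoint-uncompressed.chkpt.h5
    ```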

  3. Barry Wardell

    I have personally found that the shuffle filter improves the compression efficiency for HDF5 files. In particular, the HDF5 output from the Multipole thorn can be more efficiently compressed when the shuffle filter is used in conjunction with compression.

    So, in summary, I agree that the suggestion is a good idea.
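    For readers unfamiliar with the filter: shuffling does not compress anything itself. It byte-transposes fixed-size elements so that, for example, the high-order bytes of a double array (which vary slowly in smooth simulation data) end up in long, highly compressible runs. A minimal standalone sketch in pure Python, illustrating the idea only (this is not the actual HDF5 filter code):

    ```python
    # Demonstrate why byte-shuffling improves gzip/deflate compression.
    import struct
    import zlib

    def shuffle(data: bytes, elem_size: int) -> bytes:
        """Byte-transpose: gather byte i of every element together."""
        n = len(data) // elem_size
        return bytes(data[j * elem_size + i]
                     for i in range(elem_size)
                     for j in range(n))

    # Smoothly varying doubles, like typical simulation output.
    values = [1.0 + 1e-6 * i for i in range(10000)]
    raw = struct.pack("<%dd" % len(values), *values)

    plain    = len(zlib.compress(raw, 9))
    shuffled = len(zlib.compress(shuffle(raw, 8), 9))
    print(plain, shuffled)  # the shuffled stream is typically much smaller

    # Shuffling is its own inverse when the roles of rows and columns swap,
    # so the filter is losslessly reversible on read.
    assert shuffle(shuffle(raw, 8), len(raw) // 8) == raw
    ```

    In HDF5's C API, enabling the real filter is a single call to H5Pset_shuffle() on the dataset creation property list, placed before H5Pset_deflate() so that shuffling runs ahead of compression in the filter pipeline.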

  4. Ian Hinder
    • changed status to open

    Replying to [comment:3 eschnett]:

    > I think this issue is unrelated; to resolve it, you would need to disable compression. There is a parameter for this. There is also h5repack that can uncompress HDF5 files; you could run it before recovering.

    Yes, I just felt like ranting, and this seemed like a good opportunity :) I now run without compression all the time. Uncompressing using h5repack is an interesting idea. We could also patch the decompression filter to use a saner memory allocation algorithm. It might also be useful to be able to enable compression for all HDF5 files except for checkpoints, which are the only files that Cactus reads back in.

    The shuffle filter looks like a good idea. Sorry I didn't look at it in time. Do you know of any expected or measured performance issues? The document at https://www.hdfgroup.org/HDF5/doc_resource/H5Shuffle_Perf.pdf describes the filter and reports performance tests and improvements in compression ratio. Based on that document, I would say it looks fine to use. Once we have standardised Cactus benchmarks, we can include one for this. Consider this a retroactive "reviewed OK" (the pull request was merged 2 days ago).
