- changed status to open
Activate HDF5 shuffle filter
The HDF5 shuffle filter increases the efficiency of the HDF5 deflate filter.
See https://bitbucket.org/cactuscode/cactuspughio/pull-requests/2/iohdf5util-activate-shuffle-filter/diff and https://bitbucket.org/eschnett/carpet/pull-requests/11/eschnett-hdf5-shuffle/diff.
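In the HDF5 C API, enabling the filter is a single call, H5Pset_shuffle(), on the dataset creation property list before H5Pset_deflate(). The effect is easy to demonstrate outside HDF5 too; here is a minimal sketch in plain Python (zlib standing in for the deflate filter; the `shuffle` function is my illustration, not HDF5's own code) that regroups the i-th byte of every element before compressing, which is what the shuffle filter does:

```python
import struct
import zlib

def shuffle(data: bytes, elem_size: int) -> bytes:
    """Regroup bytes as the HDF5 shuffle filter does: all first bytes
    of every element, then all second bytes, and so on."""
    n = len(data) // elem_size
    return bytes(data[j * elem_size + i]
                 for i in range(elem_size)
                 for j in range(n))

# Slowly varying doubles, typical of simulation output: the exponent
# and high mantissa bytes barely change, so grouping them gives
# deflate long runs of identical bytes.
values = [1.0 + 1e-9 * k for k in range(4096)]
raw = struct.pack("<%dd" % len(values), *values)

plain = len(zlib.compress(raw))
shuffled = len(zlib.compress(shuffle(raw, 8)))
print("plain:", plain, "shuffled:", shuffled)  # shuffled is smaller here
```

On data like this the shuffled stream compresses noticeably better; for single-byte or genuinely random data the filter gains nothing.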
Comments (7)
-
reporter -
The HDF5 gzip decompression filter allocates a block of memory the same size as the compressed data for the uncompressed data, then reallocs it "in case" it needs to grow; it will always need to. This is ridiculous and causes excessive memory fragmentation, to the extent that recovering from a compressed checkpoint file is very difficult unless you have a large amount of extra memory or swap available (and swap seems to be rarer these days on HPC systems).
-
eschnett -
I think this issue is unrelated; to resolve it, you would need to disable compression. There is a parameter for this. There is also h5repack, which can uncompress HDF5 files; you could run it before recovering.
I have personally found that the shuffle filter improves the compression efficiency of HDF5 files. In particular, the HDF5 output from the Multipole thorn compresses more efficiently when the shuffle filter is used in conjunction with compression.
So, in summary, I agree that the suggestion is a good idea.
-
- changed status to open
Replying to [comment:3 eschnett]:
> I think this issue is unrelated; to resolve it, you would need to disable compression. There is a parameter for this. There is also h5repack, which can uncompress HDF5 files; you could run it before recovering.

Yes, I just felt like ranting, and this seemed like a good opportunity :) I now run without compression all the time. Uncompressing with h5repack is an interesting idea. We could also patch the decompression filter to use a saner memory-allocation strategy. It might also be useful to be able to enable compression for all HDF5 files except checkpoints, which are the only ones that Cactus reads back in.
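A sketch of the h5repack route, assuming h5repack from the HDF5 tools is on the PATH (the checkpoint file names here are made up):

```shell
# Remove all filters (including deflate) from a compressed checkpoint,
# then recover from the uncompressed copy instead.
h5repack -f NONE checkpoint.it_000000.h5 checkpoint.uncompressed.h5
```

The -f NONE option tells h5repack to write all datasets without any filter, so recovery never touches the decompression path.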
The shuffle filter looks like a good idea; sorry I didn't look at it in time. Do you know of any expected or measured performance issues?

The document at https://www.hdfgroup.org/HDF5/doc_resource/H5Shuffle_Perf.pdf describes the filter and measures its runtime cost and the improvements in compression ratio. Based on that document, I would say it looks fine to use. When we have standardised Cactus benchmarks, we can include one for this. Consider this a retroactive "reviewed OK" (the pull request was merged two days ago).
-
- changed status to resolved
-
- edited description
- changed status to closed