HDF5 file integrity
HDF5 files might not be valid if a program is killed prematurely. It would be good to have a workaround for this. Need to:
1. Check what approaches HDF5 provides natively for this issue
2. Possibly allow output files, e.g. the 'heavy' HDF5 data in XDMF output, to be split into multiple files (controlled by the user) - see the sketch below
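For illustration, option 2 amounts to something like the following. This is a minimal sketch using the raw HDF5 C API; the file naming and dataset layout are made up for the example, not what any branch implements. Each time step's heavy data goes to its own HDF5 file, so an interrupted run loses at most the step being written:

```cpp
#include <hdf5.h>
#include <cstdio>
#include <vector>

// Write one time step's data to its own file (data_000000.h5, ...),
// so a crash can only corrupt the file currently being written.
void write_step(int step, const std::vector<double>& u)
{
  char name[64];
  std::snprintf(name, sizeof(name), "data_%06d.h5", step);

  hid_t file = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
  hsize_t dims[1] = {u.size()};
  hid_t space = H5Screate_simple(1, dims, nullptr);
  hid_t dset = H5Dcreate2(file, "/u", H5T_NATIVE_DOUBLE, space,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT,
           u.data());
  H5Dclose(dset);
  H5Sclose(space);
  H5Fclose(file);  // once closed, this step's file is consistent on disk
}
```

The XDMF index file would then point each time step at its own heavy file.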
Comments (15)
-
reporter -
There doesn't seem to be much out there about option 1. I'm not sure there is much you can actually do if you get SIGKILL-ed with the file open.
-
Maybe you could ask on an HDF5 mailing list? This must have come up before.
-
Strangely enough, it came up for me two days ago. My machine crashed and 1 GB and two days' worth of simulation results were lost. That is, the xdmf/hdf5 files were still there, but they were broken and, according to a few Google searches, unfixable.
Any chance this would have been avoided if I had set "flush_output" to true?
-
An explicit HDF5File.flush() could also be a useful feature. Besides other use cases, one could catch SIGKILL and call it.
-
@blechta You might find it hard to catch SIGKILL - it cannot be caught. Other signals can be caught, e.g. SIGTERM; I guess some batch scheduling systems can be configured to send SIGTERM just before SIGKILL.
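For what it's worth, the usual pattern is to have the handler only set a flag and do the flushing from the main loop, since library calls such as HDF5's are not async-signal-safe. A minimal sketch; the flush/close step is a placeholder for the proposed HDF5File.flush():

```cpp
#include <csignal>

// Set by the signal handler; checked in the time-stepping loop
static volatile std::sig_atomic_t g_term_requested = 0;

extern "C" void on_term(int) { g_term_requested = 1; }

int main()
{
  std::signal(SIGTERM, on_term);  // SIGKILL cannot be caught
  for (int step = 0; step < 1000; ++step)
  {
    // ... compute and write one time step ...
    if (g_term_requested)
    {
      // Flush/close the HDF5 file here (e.g. the proposed
      // HDF5File.flush()), then exit cleanly
      break;
    }
  }
  return 0;
}
```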
@mikael_mortensen I have found that setting parameters['flush_output'] = True helps maybe 50% of the time. I have made some edits to XDMFFile.cpp which should implement 2. above; I'll try and push them today.
-
Would it help to always close the file after a write? Then, when one writes to the file again, one reopens it, but now in append mode.
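With the plain HDF5 C API, that pattern would look roughly like this (a sketch; the dataset naming is invented, and the caller must pass a new name per step, e.g. "/u_000001"). The file is only open for the duration of one write, so a crash between writes cannot corrupt it:

```cpp
#include <hdf5.h>
#include <vector>

// Open (or create) the file, write one dataset, close again immediately.
void append_step(const char* filename, const char* dset_name,
                 const std::vector<double>& u)
{
  hid_t file = H5Fopen(filename, H5F_ACC_RDWR, H5P_DEFAULT);
  if (file < 0)  // first write: the file does not exist yet
    file = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

  hsize_t dims[1] = {u.size()};
  hid_t space = H5Screate_simple(1, dims, nullptr);
  hid_t dset = H5Dcreate2(file, dset_name, H5T_NATIVE_DOUBLE, space,
                          H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
  H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT,
           u.data());
  H5Dclose(dset);
  H5Sclose(space);
  H5Fclose(file);  // file is back in a consistent state once closed
}
```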
-
Chris: do you know how to configure ARCHER's PBS to send SIGTERM before SIGKILL?
-
I believe PBS just does this out of the box, but it may be turned off. There is a queue-specified delay (site specific) between sending SIGTERM and SIGKILL. Your best bet is probably ARCHER support.
-
@johanhake that would work, but doesn't protect against interrupts whilst writing.
Catching signals is a bit of a pain, and still not 100% effective against e.g. power outages.
I have just pushed xdmf-multiple-h5.
If anyone can test it out, that would be helpful...
-
Seems like the correct solution (?) is to use the H5FD_SPLIT file driver to save the file metadata separately from the main data. A backup of the metadata can be kept during any updates, which should make the file readable even if interrupted... However, XDMF seems not to support the split raw/metadata format.
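For reference, the split driver is configured through a file-access property list in the HDF5 C API; the extensions below are just examples. Only the small metadata file would then need backing up before each append:

```cpp
#include <hdf5.h>

// Create a file through the split driver: metadata and raw data are
// stored in two separate files (basename-m.h5 and basename-r.h5 here).
hid_t create_split_file(const char* basename)
{
  hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_split(fapl, "-m.h5", H5P_DEFAULT, "-r.h5", H5P_DEFAULT);
  hid_t file = H5Fcreate(basename, H5F_ACC_TRUNC, fapl, H5P_DEFAULT);
  H5Pclose(fapl);
  return file;  // caller is responsible for H5Fclose
}
```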
See also the bottom of the HDF5 metadata page - metadata journalling will be supported from HDF5 v1.10 (currently we are on v1.8).
Another option which might work is to copy the entire file to a backup file before appending. Obviously this has performance implications.
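That copy-before-append idea is essentially a one-liner (a sketch using std::filesystem; the path and .bak extension are arbitrary):

```cpp
#include <filesystem>
#include <string>

// Snapshot the last consistent state; if the following append is
// interrupted, restore e.g. results.h5 from results.h5.bak.
void backup_before_append(const std::string& path)
{
  std::filesystem::copy_file(path, path + ".bak",
      std::filesystem::copy_options::overwrite_existing);
}
```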
-
Has anyone else tried out the xdmf-multiple-h5 branch? It is working for me, so it could potentially be merged into next.
-
reporter - changed milestone to 1.5
-
- changed status to resolved
A workaround (multi-file option) is now in master, until such time as metadata journalling comes into HDF5.
-
reporter - removed milestone
Removing milestone: 1.5 (automated comment)