- removed comment
[PATCH]CarpetIOHDF5: make WriteLargeAttribute write a string, not an array of ints
The data that's supplied to it is to be interpreted as a string.
Before the attached patch the data is written as an array of int8, e.g.:
h5dump -d '/Parameters and Global Attributes/Datasets' alpha.h5
HDF5 "alpha.h5" {
DATASET "/Parameters and Global Attributes/Datasets" {
DATATYPE H5T_STD_I8LE
DATASPACE SIMPLE { ( 16 ) / ( 16 ) }
DATA {
(0): 77, 76, 95, 66, 83, 83, 78, 58, 58, 97, 108, 112, 104, 97, 10, 0
}
}
}
After the patch:
h5dump -d '/Parameters and Global Attributes/Datasets' alpha.h5
HDF5 "alpha.h5" {
DATASET "/Parameters and Global Attributes/Datasets" {
DATATYPE H5T_STRING {
STRSIZE 16;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
}
DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
DATA {
(0): "ML_BSSN::alpha
"
}
}
}
Keyword:
Comments (10)
-
-
- removed comment
Carpet currently writes an array of characters (not an array of integers), thus the difference between this and a string is small.
The patch does not close the new datatype, leading to a memory leak.
The readers of these attributes, used mostly during recovery, also need to be updated to read strings and not expect arrays of characters.
-
- removed comment
silly question: do we know if trac actually automatically sends emails to the email of the reporter? Just in case anton is not subscribed to the ET mailing list.
-
reporter - removed comment
Replying to [comment:2 eschnett]:
Carpet currently writes an array of characters (not an array of integers), thus the difference between this and a string is small.
Right. To clarify, I'm using this in my python script for processing the data, since it seemed the most "proper" way to get grid attributes. Or is it a better idea to use something else?
Of course I could do ''.join(map(chr, *)) on it, but writing it properly in the first place seems cleaner.
The patch does not close the new datatype, leading to a memory leak.
The readers of these attributes, used mostly during recovery, also need to be updated to read strings and not expect arrays of characters.
Sorry about this, I'm not very familiar with the code yet. Should be fixed in the new patch.
-
- removed comment
No need to be sorry! That's what a review is for, and I thank you for improving our code.
I am not overly familiar with strings in HDF5. I notice that the documentation of H5D_get_storage_size says that it is generally not recommended to use this function. H5T_get_size (the converse of the function used to create the string type) may be more appropriate. Could you update your patch?
Did you run the test cases after applying your patch? I assume that (a) this patch indeed makes it simpler for Python to read the attributes, and (b) that in particular the recovery test cases succeed.
To answer an earlier question: Yes, these attributes are how the grid structure is stored and retrieved. We opted for a human-readable (i.e. ASCII) format to simplify debugging.
-
reporter - removed comment
Replying to [comment:5 eschnett]:
No need to be sorry! That's what a review is for, and I thank you for improving our code.
I am not overly familiar with strings in HDF5. I notice that the documentation of H5D_get_storage_size says that it is generally not recommended to use this function. H5T_get_size (the converse of the function used to create the string type) may be more appropriate. Could you update your patch?
Ok, done. It makes the patch slightly longer, but I guess a cleaner solution is better.
Did you run the test cases after applying your patch? I assume that (a) this patch indeed makes it simpler for Python to read the attributes, and (b) that in particular the recovery test cases succeed.
Yes, specifically the CarpetWaveToyRecover_test_*proc passes both with and without the changes to the checkpoint file.
I'm a bit confused by the nobuffers test claiming No files created in test directory
Success: 0 files identical
but that seems unrelated to the patch.
-
- changed status to open
- removed comment
Okay, the patch looks good. I will apply it.
-
- changed status to open
- removed comment
-
- changed status to resolved
- removed comment
Applied.
-
- edited description
- changed status to closed
- Log in to comment
Does HDF5 still read in the string attribute as a CHAR array later on (eg. in the checkpoint input routine, I do not know if this is used anywhere else but to checkpoint/recover the parameter string from IOUtil)? Looks harmless otherwise :-).