[PATCH]CarpetIOHDF5: make WriteLargeAttribute write a string, not an array of ints

Create issue
Issue #1050 closed
anonymous created an issue

The data that's supplied to it is to be interpreted as a string.

Before the attached patch  the data is written as an array of int8, e.g.:
h5dump -d '/Parameters and Global Attributes/Datasets' alpha.h5
HDF5 "alpha.h5" {
DATASET "/Parameters and Global Attributes/Datasets" {
   DATATYPE  H5T_STD_I8LE
   DATASPACE  SIMPLE { ( 16 ) / ( 16 ) }
   DATA {
   (0): 77, 76, 95, 66, 83, 83, 78, 58, 58, 97, 108, 112, 104, 97, 10, 0
   }
}
}

After the patch:
h5dump -d '/Parameters and Global Attributes/Datasets' alpha.h5
HDF5 "alpha.h5" {
DATASET "/Parameters and Global Attributes/Datasets" {
   DATATYPE  H5T_STRING {
         STRSIZE 16;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
   DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
   DATA {
   (0): "ML_BSSN::alpha
           "
   }
}
}

Keyword:

Comments (10)

  1. Roland Haas
    • removed comment

    Does HDF5 still read in the string attribute as a CHAR array later on (eg. in the checkpoint input routine, I do not know if this is used anywhere else but to checkpoint/recover the parameter string from IOUtil)? Looks harmless otherwise :-).

  2. Erik Schnetter
    • removed comment

    Carpet currently writes an array of characters (not an array of integers), thus the difference between this and a string is small.

    The patch does not close the new datatype, leading to a memory leak.

    The readers of these attributes, used mostly during recovery, also need to be updated to read strings and not expect arrays of characters.

  3. Roland Haas
    • removed comment

    silly question: do we know if trac actually automatically sends emails to the email of the reporter? Just in case anton is not subscribed to the ET mailing list.

  4. anonymous reporter
    • removed comment

    Replying to [comment:2 eschnett]:

    Carpet currently writes an array of characters (not an array of integers), thus the difference between this and a string is small.

    Right. To clarify, I'm using this in my python script for processing the data, since it seemed the most "proper" way to get grid attributes. Or is it a better idea to use something else?

    Of course I could do ''.join(map(chr, *)) on it, but writing it properly in the first place seems cleaner.

    The patch does not close the new datatype, leading to a memory leak.

    The readers of these attributes, used mostly during recovery, also need to be updated to read strings and not expect arrays of characters.

    Sorry about this, I'm not very familiar with the code yet. Should be fixed in the new patch.

  5. Erik Schnetter
    • removed comment

    No need to be sorry! That's what a review is for, and I thank you for improving our code.

    I am not overly familiar with strings in HDF5. I notice that the documentation of H5D_get_storage_size says that it is generally not recommended to use this function. H5T_get_size (the converse of the function used to create the string type) may be more appropriate. Could you update your patch?

    Did you run the test cases after applying your patch? I assume that (a) this patch indeed makes it simpler for Python to read the attributes, and (b) that in particular the recovery test cases succeed.

    To answer an earlier question: Yes, these attributes are how the grid structure is stored and retrieved. We opted for a human-readable (i.e. ASCII) format to simplify debugging.

  6. anonymous reporter
    • removed comment

    Replying to [comment:5 eschnett]:

    No need to be sorry! That's what a review is for, and I thank you for improving our code.

    I am not overly familiar with strings in HDF5. I notice that the documentation of H5D_get_storage_size says that it is generally not recommended to use this function. H5T_get_size (the converse of the function used to create the string type) may be more appropriate. Could you update your patch?

    Ok, done. It makes the patch slightly longer, but I guess a cleaner solution is better.

    Did you run the test cases after applying your patch? I assume that (a) this patch indeed makes it simpler for Python to read the attributes, and (b) that in particular the recovery test cases succeed.

    Yes, specifically the CarpetWaveToyRecover_test_*proc passes both with and without the changes to the checkpoint file.

    I'm a bit confused by the nobuffers test claiming No files created in test directory

    Success: 0 files identical

    but that seems unrelated to the patch.

  7. Log in to comment