Provide a framework for simulation metadata

When writing tools to analyse the output of a Cactus simulation, it would be very useful to have more information than is currently available, and some of the available information could be provided in a more convenient way. For example, users set many parameters, and these are a good source of information about the simulation, but the parameter file is not always output (e.g. in testsuite data), and it does not contain the values of parameters which are unset and hence take their default values. Similarly, the value of a parameter is not necessarily a good indicator of what actually happened. Often, a group of parameters needs to be interpreted together to determine the required quantity. For example, if I want to know what the intended final time of the simulation was, I have to look at Cactus::terminate, Cactus::cctk_itlast and Cactus::cctk_final_time. If I want to know what the timestep or grid spacing on the coarsest grid is, I have to look at a similar set of parameters, or parse a grid structure file from Carpet for which there is no well-defined filename. If I want to know what the last iteration actually was, I have to find an output file and look at it, and there might not even be any appropriate files, depending on the user's choices.

I propose that Cactus provides a framework for simulation metadata. The following is one possible way that it could work.

1. Metadata for the simulation is collected and output to disk
2. The metadata comes from both the flesh and from thorns
3. The metadata format is extensible
4. The metadata format is easy to parse (hence, it is in a standard well-specified and commonly-supported format)
5. The metadata file is easily human-readable
6. The metadata file is always output, so that analysis tools can expect that it is present in modern simulations
7. The metadata file is not too large
8. The framework for metadata is managed by the flesh, as it is important and will be available for every Cactus simulation

- One possible format for the metadata file is the "ini" file format, as used by SimFactory. This satisfies requirements 3, 4 and 5 above.
- There would be one section per implementation active in the simulation, and one for the flesh.
- Each thorn is responsible for determining what metadata keys should be output.
- The flesh will output essential characteristics of the simulation that it knows about, e.g. start and end iteration and times, run title, etc.
- Output thorns will output the names of output files, and a description of what they contain.
- Some metadata will be available at startup, some at termination, and some will become available only periodically. For example, due to parameter steering, the set of available output files might get larger during the simulation. We could handle this either by parsing and rewriting the metadata file to insert extra information into existing sections, or by allowing sections to be repeated. We have a parsing framework in the flesh now (Piraha), so this should be straightforward.
- Metadata files will be modified safely (e.g. by writing a new one to a temporary file and moving it over the old one).
- A distinction will be made between metadata items and parameters. Often, there will be a 1-1 correspondence between these, so it would be good to have a convenient way for thorn authors to mark parameters as suitable for direct inclusion in the metadata file. For example, marking a parameter with a keyword "metadata = yes" or equivalent in the param.ccl file would cause a metadata key for this parameter to be automatically included in the metadata file.
- Information which can change during a simulation might not be a good candidate for metadata; maybe then it becomes "data" and should be output in a separate file (pointed to by a metadata entry, of course). In that case, setting both "steerable" and "metadata" for a parameter in param.ccl should lead to an error.
- Metadata entries could be restricted to string values, or could have richer types such as integers, floating point numbers, and lists (possibly with nesting), which might be convenient.
- The flesh could provide a function CCTK_RecordMetadata(key, value) [surely the caller should not need to tell the flesh which implementation it belongs to?]. This function would store the data in a flesh data structure, and note whether the on-disk file needed to be updated.
- Every iteration, the flesh (on the first process) would update the on-disk metadata file if it needed to be changed.
- The sections in the metadata file will correspond to implementations; multiple thorns providing the same implementation [who chose this name?] should, if they provide the same information, use the same key names.
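
To make the points above more concrete, here is a minimal sketch of the record / flush / read cycle. It is written in Python purely for illustration; the real framework would live in the flesh and be written in C. The names `MetadataStore`, `record`, `flush` and `read_metadata` are hypothetical, as are any section and key names used with them. It assumes string-valued entries, one ini section per implementation, a "dirty" flag so the file is only rewritten when something changed, and the write-to-temporary-then-rename approach for safe updates. The reader accepts repeated sections and merges them, which is one of the two options mentioned for metadata that appears during the run.

```python
import configparser
import os
import tempfile


class MetadataStore:
    """Hypothetical stand-in for the flesh-side metadata table."""

    def __init__(self, path):
        self.path = path
        self.sections = {}   # section name -> {key: value}
        self.dirty = False   # True when the on-disk file is out of date

    def record(self, section, key, value):
        """Rough analogue of the proposed CCTK_RecordMetadata(key, value).

        The section is passed explicitly here; in the flesh it could be
        deduced from the calling thorn's implementation (an assumption).
        """
        value = str(value)
        entries = self.sections.setdefault(section, {})
        if entries.get(key) != value:
            entries[key] = value
            self.dirty = True

    def flush(self):
        """Rewrite the file if needed; return True if a write happened."""
        if not self.dirty:
            return False
        parser = configparser.ConfigParser(interpolation=None)
        parser.optionxform = str  # keep key names case-sensitive
        parser.read_dict(self.sections)
        # Write to a temporary file in the same directory, then rename it
        # over the old file so readers never see a half-written file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as handle:
                parser.write(handle)
            os.replace(tmp, self.path)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
        self.dirty = False
        return True


def read_metadata(path):
    """Analysis-tool side: parse the file into {section: {key: value}}.

    strict=False lets repeated sections merge instead of raising an error.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str
    parser.read(path)
    return {name: dict(parser[name]) for name in parser.sections()}
```

In this sketch, the flesh would call `flush()` once per iteration on the first process, and thorns would call `record()` whenever they have something new to report; a second `flush()` with no intervening `record()` does nothing.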

An example of this sort of idea is already implemented by TwoPunctures (~~#551~~), which outputs a TwoPunctures.bbh metadata file in the "numerical relativity data format".
The thorn Formaline currently handles some metadata, but its scope is narrower than what this ticket proposes. The above proposal could be implemented using Formaline, but then analysis tools could not rely on the metadata file being present, as Formaline might not have been activated.
