
This is an experimental feature branch. The stable repository is at bitbucket.org/JohnSalmon/SDF

libSDF

Copyright (c) 1992-1994, John K. Salmon
Copyright (c) 1994, 2003, 2004, 2011-2014, Michael S. Warren
All rights reserved.

libSDF is distributed under the terms of the BSD 3-clause License.
The full license is in the file COPYING.txt, distributed with this software.

For use in research and related activities, please cite the following as appropriate:

John K. Salmon and Michael S. Warren, (2014). Self-Describing File (SDF) Library. ZENODO. doi:10.5281/zenodo.10469

This is the public release of version 1.0 of SDF ("Self-Describing Files" or "Super-Duper Files", depending on how impressed you are). It is the first version of the last I/O package you'll ever need. SDF files are binary data files with an optional header which contains 1) a description of the layout of the data, and 2) optional ASCII constants. The library has no output capability because SDF files are so easy to write :-). SDF has been used to manage the configuration and output of large parallel simulations for over 20 years.

The user-callable SDF functions are declared with prototypes in SDF.h. Writing SDF files needs no programmatic interface: just create the files with fprintf() and fwrite(). The basic idea behind SDF files is that they are self-describing. That is, they have an ASCII header (human-readable, machine-parseable) which describes the contents of the file. The ASCII header may contain explicit values, like:

  int npart = 200000;
  double redshift = 0.53;
  char text[] = "Perhaps a few words about the file's pedigree could go here";

Notice the similarity to C declarations.

In addition, the header contains declarations for the binary data that appears immediately after it. The allowed data types are char, int16_t, int32_t, int64_t, float and double; arrays of these; and structs containing the basic types and arrays. In declarations, 'short' is a synonym for int16_t, and 'int' and 'long' are both synonyms for int32_t. (Multi-dimensional arrays are not supported, nor are nested structures. But some kinds of two-dimensional arrays can be captured by an array of structs, cf. the 'id' vector in .tree files. These limitations may be relaxed in the future.)

The header is terminated by a comment of the form

# <anything> SDF-EOH <anything> \n

That is, any comment containing the string SDF-EOH. The final new-line is the last character of the header. The binary data follows immediately after the new-line character. It is strongly recommended that the terminal comment contain one or more form-feeds (ctrl-L, \f in ANSI, \014 (octal), 0xc (hex), 12 (decimal)). That way, 'more' or similar programs can be used on the file without getting confused by the binary data. Similarly, it is strongly recommended that the first line of an SDF file contain the comment:

 # SDF <anything> \n

This makes it easy for a modified version of 'file', as well as other utilities, to recognize that they are dealing with an SDF file.
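The two conventions above can be checked with plain stdio. The following is a minimal sketch, not part of libSDF: the helper names sdf_has_magic() and sdf_find_data_offset() are hypothetical, and the sketch assumes that the terminating comment starts at the beginning of a line, that header lines fit in the buffer, and that the file was opened in binary mode.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not in SDF.h): return 1 if the first line of the
 * file starts with the recommended "# SDF" comment, 0 otherwise.
 * The comment is only recommended, so a 0 here is a hint, not proof. */
int sdf_has_magic(FILE *fp)
{
    char line[16];

    assert(fp != NULL);
    rewind(fp);
    if (fgets(line, sizeof line, fp) == NULL)
        return 0;
    return strncmp(line, "# SDF", 5) == 0;
}

/* Hypothetical helper (not in SDF.h): scan line by line for a '#' comment
 * containing "SDF-EOH" and return the offset of the byte after its
 * new-line, i.e. the first byte of binary data. Returns -1 if no such
 * comment is found. Assumes the comment starts in column one. */
long sdf_find_data_offset(FILE *fp)
{
    char line[4096];

    assert(fp != NULL);
    rewind(fp);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (line[0] == '#' && strstr(line, "SDF-EOH") != NULL)
            return ftell(fp);
    }
    return -1;
}
```

A caller would open the file with fopen(path, "rb"), call sdf_find_data_offset(), and treat everything from the returned offset onward as binary data.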

Thus, the header for the output of an nbody simulation might look like:

# SDF 1.0
int64_t npart;
int iter;
double time;
...
char tree_header_text[384];
struct {
       float mass;
       float x, y, z;
       float vx, vy, vz;
       int64_t ident;
} [];
# SDF-EOH ^L^L

This header means that the scalars npart, iter, time, etc. are stored as binary data following the header. Then comes a 384-byte character array, followed by an array (of unspecified length) of structs containing the fields mass, x, y, z, vx, vy, vz, ident. Only the last array in the header may be of unspecified length. An unspecified length means that when the file is read, the array is assumed to extend to the end of the file. SDF routines determine the length of the file by asking the OS, and hence can compute the number of elements in an array of unspecified length. Specifications with unknown length are useful for creating generic SDF headers, i.e., headers that describe binary files that you may already be using. Notice that we have included a version number in the first line, which would enable future implementations to identify files that are incompatible with later format extensions.
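The length calculation for a trailing array of unspecified length is simple arithmetic, sketched below. The function sdf_trailing_count() is a hypothetical illustration, not libSDF's code, and it assumes that records are stored packed in declaration order (7 floats plus one int64_t is 36 bytes per record in the example above); this document does not say whether libSDF expects padding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not in SDF.h): number of records in a trailing
 * array of unspecified length.
 *   file_size   total size of the file in bytes, as reported by the OS
 *   data_offset size of the ASCII header, i.e. offset of the binary data
 *   fixed_bytes bytes of binary data declared before the trailing array
 *   record_size bytes per array element
 * Returns -1 if the sizes are inconsistent (negative remainder space or
 * a partial record at the end of the file). */
int64_t sdf_trailing_count(int64_t file_size, int64_t data_offset,
                           int64_t fixed_bytes, int64_t record_size)
{
    int64_t avail;

    assert(record_size > 0);
    avail = file_size - data_offset - fixed_bytes;
    if (avail < 0 || avail % record_size != 0)
        return -1;
    return avail / record_size;
}
```

For instance, with a purely illustrative 1000-byte header, 404 bytes of leading binary data and 36-byte records, a file of 1584 bytes holds (1584 - 1000 - 404) / 36 = 5 records.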

When writing out a new SDF file, it is possible to write a header exactly as above, followed by the identical binary data. However, it is much more convenient to write the "scalar" data as ASCII values into the header. A new SDF file might look like:

# SDF 1.0
/* Comments may be between C-like comment delimiters */
# Or they follow the shell-like comment delimiter to end-of-line
# This file was created by dave@h9k.hal.com on January 12, 1992...
int64_t npart = 200000;
float Gnewt = 1.0;
int iter = 17;
float masstot = 1.1;
float epsilon = 0.02;
...
struct {
    float mass;
    float x, y, z;
    float vx, vy, vz;
    int64_t ident;
}[200000];
# SDF-EOH ^L^L

This has the great advantage that most of the file's parameters are now both human-readable and machine-readable. A disadvantage of putting "history" information into comments is that it becomes inaccessible to programs (since SDF doesn't record comments). Another option is to put it in a character string:

   char history[] = 
   "This file created by ... on ...
   200000 body torqued Jaffe model constructed using:
   cubix ...
   ";

Don't bother with C syntax for newlines, etc. in character strings. SDF just scans until it hits another " character. It doesn't do any escape interpretation, so don't bother with '\n', and especially don't try to put the double-quote character inside strings.
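Putting these pieces together, a writer needs nothing beyond fprintf() and fwrite(). The sketch below is one possible way to do it, not code from libSDF: write_sdf() and struct particle are hypothetical names, the header is cut down to a few entries, and the records are assumed to be stored packed in declaration order, so each one is copied into a byte buffer to keep compiler padding out of the file. The data is written in the host's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative in-memory record; field names follow the example header. */
struct particle {
    float mass;
    float x, y, z;
    float vx, vy, vz;
    int64_t ident;
};

/* Bytes per record in the file, assuming packed storage (an assumption). */
#define RECORD_BYTES (7 * sizeof(float) + sizeof(int64_t))

/* Hypothetical helper (not in SDF.h): write a small SDF file.
 * Returns 0 on success, -1 on error. */
int write_sdf(FILE *fp, const struct particle *p, int64_t npart,
              const char *history)
{
    assert(fp != NULL && p != NULL && history != NULL);

    /* SDF does no escape interpretation, so an embedded double quote
     * would end the string early. Refuse such text. */
    if (strchr(history, '"') != NULL)
        return -1;

    fprintf(fp, "# SDF 1.0\n");
    fprintf(fp, "int64_t npart = %lld;\n", (long long)npart);
    fprintf(fp, "char history[] = \"%s\";\n", history);
    fprintf(fp, "struct {\n"
                "    float mass;\n"
                "    float x, y, z;\n"
                "    float vx, vy, vz;\n"
                "    int64_t ident;\n"
                "}[%lld];\n", (long long)npart);
    /* Terminal comment with two form-feeds; its new-line ends the header. */
    fprintf(fp, "# SDF-EOH \f\f\n");

    for (int64_t i = 0; i < npart; i++) {
        unsigned char rec[RECORD_BYTES];
        float f[7] = { p[i].mass, p[i].x, p[i].y, p[i].z,
                       p[i].vx, p[i].vy, p[i].vz };

        memcpy(rec, f, sizeof f);
        memcpy(rec + sizeof f, &p[i].ident, sizeof p[i].ident);
        if (fwrite(rec, 1, sizeof rec, fp) != sizeof rec)
            return -1;
    }
    return ferror(fp) ? -1 : 0;
}
```

The file should be opened with fopen(path, "wb") so that no new-line translation touches the binary data.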

Byte order is another headache. The function SDFcpubyteorder() returns an unsigned int which "describes" the host CPU's byte order, in a format which may be placed into an SDF header with the "parameter byteorder" keyword:

 parameter byteorder = 0x12345678;

You can make this line in a C program with:

fprintf(fp, "parameter byteorder = 0x%x;\n", SDFcpubyteorder());

If such a parameter is in the header, then the SDFread functions will assume that the binary data in the file is written in the byte order specified by the parameter. If the machine doing the reading has a different byte order, then bytes are swapped automatically. If there is no such parameter, you can tell the read functions to swap with SDFswapon(). Similarly, you can turn off swapping (for whatever reason) with SDFswapoff(), and you can inquire about swapping with SDFisswapping().
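For readers who handle the binary data themselves, swapping amounts to reversing the bytes within each basic element; the members of a struct are swapped one by one, never the record as a whole. The sketch below illustrates the idea with a hypothetical swap_elements() helper. It is independent of libSDF and is not a description of how the library implements its swapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in SDF.h): reverse the byte order of each of
 * `count` elements of `elem_size` bytes, in place. Elements of size 0 or
 * 1 (e.g. char) are left untouched. */
void swap_elements(void *buf, size_t elem_size, size_t count)
{
    unsigned char *p = buf;

    assert(buf != NULL || count == 0);
    if (elem_size < 2)
        return;
    for (size_t i = 0; i < count; i++, p += elem_size) {
        for (size_t lo = 0, hi = elem_size - 1; lo < hi; lo++, hi--) {
            unsigned char t = p[lo];
            p[lo] = p[hi];
            p[hi] = t;
        }
    }
}
```

For the particle records shown earlier, one would call this with elem_size 4 on the seven floats and with elem_size 8 on ident.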

Non-IEEE floats are completely out of the question.