Diagnostics usually come in the form of plots or summary statistics, and they can serve many purposes.
The diagnostics database table archives summary statistics that can be accessed across multiple stages of a pipeline, from different pipelines, and in HTML reports.
A diagnostics record looks like:
catalog_id | run_id | entity | attribute | value | timestamp
The entity field acts as a namespace to prevent attribute collisions, since the same attribute name can arise multiple times within a pipeline run.
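The record layout above can be sketched as an SQLite table. The column types and example values below are assumptions for illustration, not necessarily those used by BioLite:

```python
import sqlite3

# Hypothetical schema mirroring the record fields above;
# BioLite's actual column types and constraints may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE diagnostics (
        catalog_id TEXT,
        run_id     INTEGER,
        entity     TEXT,
        attribute  TEXT,
        value      TEXT,
        timestamp  TEXT)""")
conn.execute(
    "INSERT INTO diagnostics VALUES (?,?,?,?,?,?)",
    ("cat1", 1, "assemble.trim", "n_reads", "12345",
     "2014-01-01T12:00:00"))
row = conn.execute(
    "SELECT attribute, value FROM diagnostics").fetchone()
print(row)  # ('n_reads', '12345')
```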
When running a BioLite pipeline, the default entity is the pipeline name plus the stage name, so that values can be traced to the pipeline and stage during which they were entered. Entries in the diagnostics table can include paths to derivative files, which can be summaries of intermediate files that are used to generate reports or intermediate data files that serve as input to other stages and pipelines.
Before logging to diagnostics, your script must initialize this module with a BioLite catalog ID and a name for the run using the init method. This will return a new run ID from the runs Table. Optionally, you can pass an existing run ID to init to continue a previous run.
Diagnostics are automatically initialized by the Pipeline and IlluminaPipeline classes in the pipeline Module.
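The run-ID allocation that init performs can be illustrated with a self-contained SQLite sketch; the table layout and column names here are assumptions (the real runs table also records hostname, username, timestamp, done, and hidden):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal stand-in for the runs table.
conn.execute("""
    CREATE TABLE runs (
        run_id INTEGER PRIMARY KEY AUTOINCREMENT,
        catalog_id TEXT,
        name TEXT)""")

def init(catalog_id, name, run_id=None):
    """Return an existing run_id, or allocate a new one by inserting a row."""
    if run_id is not None:
        return str(run_id)  # continue a previous run
    cur = conn.execute(
        "INSERT INTO runs (catalog_id, name) VALUES (?, ?)",
        (catalog_id, name))
    return str(cur.lastrowid)

print(init("cat1", "assemble"))            # "1"
print(init("cat1", "report"))              # "2"
print(init("cat1", "assemble", run_id=1))  # "1"
```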
Use the log function described below.
Detailed system utilization statistics, including memory high-water marks and compute wall-time are recorded automatically (by the wrapper base class) for any wrapper that your pipeline calls, and for the overall pipeline itself.
Because every wrapper call is automatically logged, the diagnostics table holds a complete non-executable history of the analysis, which complements the original scripts that were used to run the analysis. In combination, the diagnostics table and original scripts provide provenance for all analyses.
OutputPattern(re, entity, attr)

   Bases: tuple

   attr
      Alias for field number 2

   entity
      Alias for field number 1

   re
      Alias for field number 0
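OutputPattern behaves like a namedtuple; a minimal equivalent definition, with field names taken from the signature above:

```python
from collections import namedtuple

# Note: the 're' field holds a regular expression *string*,
# not the re module.
OutputPattern = namedtuple("OutputPattern", ["re", "entity", "attr"])

p = OutputPattern(r"Total reads: (\d+)", "assemble.trim", "n_reads")
assert p.re == p[0] and p.entity == p[1] and p.attr == p[2]
```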
Run(done, run_id, id, name, hostname, username, timestamp, hidden)

   Bases: tuple

   done
      Alias for field number 0

   hidden
      Alias for field number 7

   hostname
      Alias for field number 4

   id
      Alias for field number 2

   name
      Alias for field number 3

   run_id
      Alias for field number 1

   timestamp
      Alias for field number 6

   username
      Alias for field number 5
Returns the current time in ISO 8601 format, e.g. YYYY-MM-DDTHH:MM:SS[.mmmmmm][+HH:MM].
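The same format can be produced with datetime.isoformat; a sketch of the idea, not necessarily BioLite's exact implementation:

```python
from datetime import datetime, timezone

def timestamp():
    # ISO 8601, e.g. 2014-01-01T12:00:00.123456+00:00
    return datetime.now(timezone.utc).isoformat()

print(timestamp())
```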
Converts a diagnostics string with key name in self.data into a list, by parsing it as a typical Python list representation [item1, item2, ... ].
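Such a string can be parsed safely with ast.literal_eval. A sketch of the conversion, assuming the items are quoted strings or numeric literals (BioLite's own parser may differ):

```python
import ast

def str2list(value):
    """Parse a string like "[1, 2, 3]" into a Python list."""
    parsed = ast.literal_eval(value)
    if not isinstance(parsed, list):
        raise ValueError("expected a list representation: %r" % value)
    return parsed

assert str2list("[1, 2, 3]") == [1, 2, 3]
assert str2list("['a', 'b']") == ["a", "b"]
```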
By default, appends to a file diagnostics.txt in the current working directory, but you can override this with the workdir argument.
You must specify a catalog id and a name for the run. If no run_id is specified, an auto-incremented run ID will be allocated by inserting a new row into the runs Table.
Returns the run_id (as a string).
Aborts if biolite.diagnostics.init() has not been called yet.
Merges the diagnostics and program caches into the SQLite database.
Similar to a merge, but loads the local diagnostics file into an in-memory cache instead of the SQLite database.
Uses the filename specified with name, or the file diagnostics.txt in the current working directory (default).
Log an attribute/value pair in the diagnostics using the currently set entity. The pair is written to the local diagnostics text file and also into the local in-memory cache.
Logs a path by writing these attributes at the current entity, with an optional prefix for this entry:

1) the full path string
2) the full path string, converted to an absolute path by os.path.abspath()
3) the size of the file/directory at the path (according to os.stat)
4) the access time of the file/directory at the path (according to os.stat)
5) the modify time of the file/directory at the path (according to os.stat)
6) the permissions of the file/directory at the path (according to os.stat)
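Gathering those six attributes can be sketched with os.path and os.stat; the attribute names below are assumptions, not BioLite's exact keys:

```python
import os

def path_attributes(path):
    """Collect the six path attributes described above."""
    st = os.stat(path)
    return {
        "path": path,
        "abspath": os.path.abspath(path),
        "size": st.st_size,
        "atime": st.st_atime,
        "mtime": st.st_mtime,
        "mode": oct(st.st_mode),
    }

attrs = path_attributes(".")
print(attrs["abspath"])
```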
Log a dictionary d by calling log for each key/value pair.
Enter the version string and a hash of the binary file at path into the programs table.
Read backwards through a program’s output to find any [biolite] markers, then log their key=value pairs in the diagnostics.
A marker can specify an entity suffix with the form [biolite.suffix].
[biolite.profile] markers are handled specially, since mem= and vmem= entries need to be accumulated. These are inserted into a program’s output on Linux systems by the preloaded memusage.so library.
You can optionally include a list of additional patterns, specified as OutputPattern tuples with:
(regular expression string, entity, attribute)
and the first line of program output matching the pattern will be logged to that entity and attribute name. The value will be the subexpressions matched by the regular expression, either a single value if there is one subexpression, or a string of the tuple if there are more.
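The marker scan can be illustrated with a self-contained parser. The exact BioLite implementation may differ, but the mechanics described above (read backwards, match [biolite] markers with optional entity suffixes, accumulate profile entries) look roughly like:

```python
import re

MARKER = re.compile(r"\[biolite(\.\w+)?\]\s+(\w+)=(\S+)")

def parse_output(lines, entity="pipeline.stage"):
    """Scan program output backwards for [biolite] markers."""
    records = []   # (entity, attribute, value) tuples
    profile = {}   # accumulated mem=/vmem= entries
    for line in reversed(lines):
        m = MARKER.search(line)
        if not m:
            continue
        suffix, key, value = m.groups()
        if suffix == ".profile" and key in ("mem", "vmem"):
            profile[key] = profile.get(key, 0) + int(value)
        else:
            records.append((entity + (suffix or ""), key, value))
    return records, profile

out = [
    "[biolite] n_reads=100",
    "[biolite.profile] mem=512",
    "[biolite.profile] mem=256",
]
records, profile = parse_output(out)
print(records)   # [('pipeline.stage', 'n_reads', '100')]
print(profile)   # {'mem': 768}
```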
Returns a dictionary of attribute/value pairs for the given run_id and entity in the SQLite database.
Returns an empty dictionary if no records are found.
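The lookup can be sketched with a self-contained sqlite3 example; the schema and query below are assumptions for illustration, not BioLite's exact SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE diagnostics (
    run_id INTEGER, entity TEXT, attribute TEXT, value TEXT)""")
conn.executemany(
    "INSERT INTO diagnostics VALUES (?,?,?,?)",
    [(1, "assemble.trim", "n_reads", "100"),
     (1, "assemble.trim", "n_bases", "9000"),
     (1, "assemble.report", "n_contigs", "42")])

def lookup(run_id, entity):
    """Return {attribute: value} for one run_id/entity, or {} if none."""
    rows = conn.execute(
        "SELECT attribute, value FROM diagnostics"
        " WHERE run_id=? AND entity=?", (run_id, entity))
    return dict(rows.fetchall())

print(lookup(1, "assemble.trim"))  # {'n_reads': '100', 'n_bases': '9000'}
print(lookup(2, "assemble.trim"))  # {}
```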
Similar to lookup, but queries the in-memory cache instead of the SQLite database. This can provide lookups when the local diagnostics text file has not yet been merged into the SQLite database (for instance, after restarting a pipeline that never completed, and hence never reached a diagnostics merge).
Returns an empty dictionary if no records are found.
Similar to lookup, but allows for wildcards in the entity name (either the SQL ‘%’ wildcard or the more standard UNIX ‘*’ wildcard).
Returns a dictionary of dictionaries keyed on [entity][attribute].
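Translating the UNIX '*' wildcard to SQL's '%' and grouping the results by entity can be sketched as follows (a self-contained illustration, not BioLite's exact query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE diagnostics (
    run_id INTEGER, entity TEXT, attribute TEXT, value TEXT)""")
conn.executemany(
    "INSERT INTO diagnostics VALUES (?,?,?,?)",
    [(1, "assemble.trim", "n_reads", "100"),
     (1, "assemble.report", "n_contigs", "42")])

def lookup_like(run_id, entity):
    """Wildcard lookup returning {entity: {attribute: value}}."""
    pattern = entity.replace("*", "%")   # accept UNIX-style wildcards
    rows = conn.execute(
        "SELECT entity, attribute, value FROM diagnostics"
        " WHERE run_id=? AND entity LIKE ?", (run_id, pattern))
    result = {}
    for ent, attr, value in rows:
        result.setdefault(ent, {})[attr] = value
    return result

print(lookup_like(1, "assemble.*"))
```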
Returns each value for the given attribute found in all entities for the given run_id, as an iterator of (entity, value) tuples.
If previous is an integer, tries to look up the exit diagnostics of a previous run with that run ID. If previous is a string, it is interpreted as a 'RUN_SPEC': either a specific run ID to look up in the diagnostics, or the wildcard '*', meaning the latest of any previous run found in the diagnostics for the given catalog ID. To input the results from a previous pipeline run, use the (--previous, -p) argument with a 'RUN_SPEC'.