JHOVE2 Glossary

Assessment - The process of determining the level of acceptability of a digital object for a specific purpose on the basis of locally-defined policy rules.
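
As a rough illustration, an assessment can be sketched as a list of named predicates evaluated against the properties already reported for an object. Everything in this sketch is an assumption made for illustration: the feature names (format, valid, pages), the three rules, and the assess function are hypothetical and are not part of JHOVE2.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, object]
Rule = Tuple[str, Callable[[Features], bool]]

# Hypothetical locally-defined policy: each rule is a name plus a predicate.
POLICY: List[Rule] = [
    ("format is PDF", lambda f: f.get("format") == "PDF"),
    ("object is valid", lambda f: f.get("valid") is True),
    ("at most 500 pages",
     lambda f: isinstance(f.get("pages"), int) and f["pages"] <= 500),
]


def assess(features: Features, policy: List[Rule] = POLICY) -> Tuple[bool, List[str]]:
    """Return (acceptable, names of failed rules) for one digital object."""
    failed = [name for name, rule in policy if not rule(features)]
    return (not failed, failed)
```

Returning the names of the failed rules, rather than a bare yes or no, lets a caller report why an object was judged unacceptable for the purpose at hand.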

Characterization - (1) Information about a digital object that describes its character or significant nature and that can function as a surrogate for the object itself for the purposes of much preservation analysis and decision making. (2) The process of deriving this information. This process has four important aspects: identification, feature extraction, validation, and assessment.

Identification - The process of determining the presumptive format of a digital object on the basis of suggestive extrinsic hints (e.g. HTTP content-type header) and intrinsic signatures, both internal (e.g. magic number) and external (e.g. file extension). Ideally, format identification should be reported in terms of a level of confidence.
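
A minimal sketch of identification with a reported confidence might look as follows. It is not the JHOVE2 algorithm: the signature table, the extension table, the identify function, and the numeric confidence values are all assumptions chosen only to show how an internal signature and a file extension can be combined.

```python
import os
from typing import Optional, Tuple

# Hypothetical internal signatures (magic numbers) mapped to format names.
MAGIC_SIGNATURES = {
    b"%PDF-": "PDF",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
}

# Hypothetical file extension hints mapped to format names.
EXTENSION_HINTS = {".pdf": "PDF", ".png": "PNG", ".gif": "GIF"}


def identify(data: bytes, filename: str = "") -> Tuple[Optional[str], float]:
    """Return (presumptive format, confidence between 0.0 and 1.0)."""
    by_magic = next(
        (fmt for sig, fmt in MAGIC_SIGNATURES.items() if data.startswith(sig)),
        None,
    )
    extension = os.path.splitext(filename)[1].lower()
    by_extension = EXTENSION_HINTS.get(extension)

    # The confidence values below are arbitrary illustrative weights.
    if by_magic and by_magic == by_extension:
        return by_magic, 0.9   # signature and extension agree
    if by_magic and by_extension:
        return by_magic, 0.5   # conflict: trust the signature, but less
    if by_magic:
        return by_magic, 0.7   # signature only
    if by_extension:
        return by_extension, 0.3   # extension only, the weakest evidence
    return None, 0.0
```

For example, identify(b"%PDF-1.7", "report.pdf") yields the format "PDF" with the highest confidence in this scheme, while the same bytes under the name "report.png" still yield "PDF" but with reduced confidence because the two pieces of evidence disagree.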

Feature extraction - The process of reporting the intrinsic properties of a digital object significant to preservation planning and action.

Format - A set of syntactic and semantic rules for encoding abstract information content into sequences of bits. Many formats can be grouped into loose categories, or families, sharing a general set of encoding rules that are further restricted or extended for the specific format or profile. A format version is considered a profile.

Parsing - The syntax-directed reading of a digital object's bit streams to retrieve the set of lexical tokens that encode that object's meaning.

Reportable unit - The logical object of characterization. Given N source units, the number of reportable units will be ≥ N. Each file is, by definition, a reportable unit, but proper subsets of files and aggregations of files may also constitute reportable units.

Source unit - A named file-like entity (or directory of such entities) passed to an invocation of JHOVE2 for characterization. The characterization of a directory entails the recursive traversal and characterization of each subsidiary file and sub-directory. Files must exist in a locally-attached file system, be network accessible by an http scheme URL, or be readable through a programmatic interface.
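
The recursive traversal of a directory source unit can be sketched as below. This is an illustration under stated assumptions, not JHOVE2 code: the function name file_reportable_units is hypothetical, only the local file system is handled (no URLs or programmatic interfaces), and only whole files are counted, so subsets and aggregations of files that may also be reportable units are ignored.

```python
import os
from typing import Iterable, List


def file_reportable_units(source_units: Iterable[str]) -> List[str]:
    """Expand source units (files or directories) into per-file units."""
    units: List[str] = []
    for source in source_units:
        if os.path.isdir(source):
            # os.walk descends into every sub-directory recursively.
            for root, _dirs, files in os.walk(source):
                for name in sorted(files):
                    units.append(os.path.join(root, name))
        else:
            # Any non-directory path is treated as a single file.
            units.append(source)
    return units
```

A single directory source unit holding three files, at any depth, therefore expands to three file-level units, which shows how one source unit can give rise to several reportable units.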

Validation - The process of determining the level of conformance of a digital object to the normative syntactic and semantic rules defined by the authoritative specification of the object's format.