
JHOVE has proven to be a successful tool for format-specific digital object identification, validation, and characterization, and has been integrated into the workflows of most major international preservation institutions and programs. Using an extensible plug-in architecture, JHOVE provides support for a variety of digital formats commonly used to represent audio, image, and textual content.

However, while the value of JHOVE has been amply demonstrated, the collective experience of the JHOVE user community over the past two years has revealed a number of areas that can benefit from subsequent work. The aims of the JHOVE2 project are threefold:

  • To refactor the existing JHOVE architecture and API with an eye towards:
    • Rectifying known inefficiencies and idiosyncrasies of design and implementation
    • Simplifying the process of integrating JHOVE2 functionality into other systems, services, and workflows
    • Encouraging third-party extensions to the base JHOVE2 functionality
  • To support enhancements to existing JHOVE functionality, including:
    • Separation of identity detection from validation
    • Standardized error handling
    • A rules-based approach towards configurable criteria for well-formedness and validity
    • Standardized handling of validation profiles
    • Standardized XML-based reporting with XSL stylesheet customization
    • The ability to invoke modules capable of performing arbitrary functions
    • A more sophisticated digital object model explicitly supporting complex multi-file or hierarchical objects
  • To develop JHOVE2 modules supporting a number of important preservation-related processes:
    • Identification based on internal signature matching
    • Validation and characterization, using evolving community metadata standards for reporting
    • Human-readable display, in symbolic form, of the contents of selected binary-formatted objects
    • Low-level API support for modifying formatted objects to create new objects, useful, for example, to correct existing internal metadata or to embed additional metadata in a syntactically correct manner
    • Format-based assessment based on configurable rules and heuristics
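
As an illustration of the rules-based items in the list above, the following minimal sketch shows one way configurable assessment rules could be applied to properties already extracted from an object. It is not the JHOVE2 design: the `Rule` class, the `assess` function, the property names (`compression`, `x_resolution`), and the thresholds are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    """A named, configurable criterion (hypothetical, for illustration)."""
    name: str
    predicate: Callable[[Dict[str, Any]], bool]


def assess(properties: Dict[str, Any], rules: List[Rule]) -> List[str]:
    """Return the names of the rules that the given properties fail."""
    return [rule.name for rule in rules if not rule.predicate(properties)]


# Example rule set; an institution would configure its own criteria.
RULES = [
    Rule("uncompressed", lambda p: p.get("compression") == "none"),
    Rule("min-resolution", lambda p: p.get("x_resolution", 0) >= 300),
]

if __name__ == "__main__":
    extracted = {"compression": "lzw", "x_resolution": 72}
    print(assess(extracted, RULES))
```

Keeping the rules as data, separate from the code that extracts properties, is what would let the criteria be changed without touching a format module.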

JHOVE2 signature-based identification should be able to recognize the set of formats with documented signatures in PRONOM and the Unix/Linux magic(4) database.
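
To make the idea of internal signature matching concrete, here is a minimal sketch that compares byte patterns at fixed offsets against the start of a file. It is not JHOVE2 code. The `Signature` type, the `identify` function, and the small hand-written table are assumptions for illustration; the table is not drawn from PRONOM or the magic database, both of which describe far more formats and richer pattern types.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Signature(NamedTuple):
    """A format name plus the byte patterns that must all be present."""
    format_name: str
    # Each part is (byte offset, expected bytes); every part must match.
    parts: Tuple[Tuple[int, bytes], ...]


# Illustrative table only (hypothetical, hand-written for this example).
SIGNATURES: Sequence[Signature] = (
    Signature("PDF", ((0, b"%PDF-"),)),
    Signature("TIFF", ((0, b"II*\x00"),)),  # little-endian byte order
    Signature("TIFF", ((0, b"MM\x00*"),)),  # big-endian byte order
    Signature("JP2", ((0, b"\x00\x00\x00\x0cjP  \r\n\x87\n"),)),
    Signature("WAVE", ((0, b"RIFF"), (8, b"WAVE"))),
    Signature("ICC color profile", ((36, b"acsp"),)),
)


def identify(data: bytes) -> Optional[str]:
    """Return the first format whose signature parts all match, else None."""
    for sig in SIGNATURES:
        if all(data[off:off + len(magic)] == magic for off, magic in sig.parts):
            return sig.format_name
    return None


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n"))
    print(identify(b"RIFF\x00\x00\x00\x00WAVEfmt "))
    print(identify(b"plain text"))
```

In practice only the leading bytes of a file would need to be read and passed to `identify`. A match of this kind only suggests a format; it says nothing about whether the object is well-formed or valid.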

JHOVE2 will provide validation and feature extraction support for the following formats and their subtypes:

  • ICC color profile
  • JPEG 2000
    • JP2, JPX
  • PDF
    • PDF 1.0 - 1.7, PDF/A, PDF/X
  • SGML
  • Shapefile
  • TIFF
    • TIFF 4 - 6, Class B, G, R, Y, F, TIFF-FX, TIFF/EP, TIFF/IT, Exif, GeoTIFF, DNG
  • UTF-8
    • ASCII
  • WAVE
    • Broadcast WAVE
  • XML

Unfortunately, budgetary limitations prevent the JHOVE2 project team from committing to support additional formats. Nevertheless, every effort will be made to provide some level of support for the following JHOVE1-supported formats as resources permit: AIFF, GIF, and JPEG. The complexity of supporting HTML effectively precludes any significant work towards supporting it in JHOVE2.