Introduction to OYSTER

OYSTER (Open sYSTem Entity Resolution) is an open-source software development project sponsored by the ERIQ Research Center at the University of Arkansas at Little Rock (ualr.edu/eriq). It is an entity resolution (ER) system that supports probabilistic direct matching, transitive linking, and asserted linking. To facilitate prospecting for match candidates (blocking), the system builds and maintains an in-memory index that maps attribute values to identities. Because OYSTER has an identity management system, it also supports persistent identity identifiers. OYSTER is unique among ER systems in that it is built to incorporate Entity Identity Information Management (EIIM). OYSTER supports EIIM by enforcing identifiers that are unique among identities, by maintaining persistent IDs over the life of an identity, and by making it possible to correct false-positive and false-negative resolutions that matching rules alone cannot fix, through assertion, traceability, and other features.
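The blocking idea described above can be illustrated with a minimal Java sketch. It is not OYSTER's actual implementation: the class name `BlockingIndex`, its methods, the `attribute=value` key format, and the normalization (trim and lowercase) are all assumptions made for illustration. The sketch keeps a map from attribute values to identity IDs and returns, as match candidates, every identity that shares at least one attribute value with an incoming reference.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of an in-memory blocking index; not OYSTER's actual classes. */
class BlockingIndex {
    // Maps a normalized "attribute=value" key to the IDs of identities holding that value.
    private final Map<String, Set<String>> index = new HashMap<>();

    // Assumed normalization: trim whitespace and ignore case.
    private static String key(String attribute, String value) {
        return attribute + "=" + value.trim().toLowerCase();
    }

    /** Registers every attribute value of an identity in the index. */
    void add(String identityId, Map<String, String> attributes) {
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            index.computeIfAbsent(key(e.getKey(), e.getValue()), k -> new TreeSet<>())
                 .add(identityId);
        }
    }

    /** Returns the IDs of identities sharing at least one attribute value with the reference. */
    Set<String> candidates(Map<String, String> reference) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> e : reference.entrySet()) {
            result.addAll(index.getOrDefault(key(e.getKey(), e.getValue()),
                                             Collections.emptySet()));
        }
        return result;
    }

    public static void main(String[] args) {
        BlockingIndex idx = new BlockingIndex();
        idx.add("ID1", Map.of("last", "Smith", "dob", "1990-01-01"));
        idx.add("ID2", Map.of("last", "Jones", "dob", "1990-01-01"));
        idx.add("ID3", Map.of("last", "Brown", "dob", "1985-06-30"));
        // ID1 matches on both attributes, ID2 on dob only, ID3 on neither.
        System.out.println(idx.candidates(Map.of("last", "SMITH ", "dob", "1990-01-01")));
        // prints [ID1, ID2]
    }
}
```

Only the candidates returned by such an index would need to be compared with the full matching rules, which is the point of blocking: it avoids comparing each incoming reference against every stored identity.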

OYSTER is written in Java, and the source code and documentation are available as a free download from the Files page of the OYSTERER SourceForge website (http://sourceforge.net/projects/oysterer/). OYSTER is free for use under the OYSTER open-source license and the GNU General Public License version 2.0 (GPLv2).

Although the original version of OYSTER was developed to support entity resolution (ER) for student records in longitudinal studies, the system design readily accommodates a broad range of ER domains and entity types. A key feature of the system is that all entity-specific and reference-specific information is interpreted at run time through user-defined XML scripts. This allows OYSTER to be configured as a merge-purge, identity capture, identity resolution, identity update, or assertion system.

The OYSTER Project has been guided by several design principles:

  • OYSTER does not use an internal database for its operation.
  • System inputs and outputs can either be text files or database tables.
  • XML scripts are used to define:
      • Entity identity attributes.
      • The layout of each reference source.
      • Identity rules for resolving each reference source.

Next to Entity Identity Information Management-EIIM Page

Back to OYSTER User Guide Page