Example Data Handler

Author: Daniel Needham (daniel.needham@manchester.ac.uk)

Overview

The example data handler shows how the names disambiguation library can be used to identify the unique individuals within a source data set.

Within the Names context, a data handler is a component that retrieves metadata about individuals from a source data set and passes it to the disambiguation library. The library first normalises that data, then compares the resulting records pairwise to produce a match score for each comparison. These match scores can then be used to identify candidate matches.
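As a rough illustration of the normalise, compare and score flow described above, here is a minimal sketch. It does not use the names-disambiguator API: the class NameMatchSketch, its methods, the edit-distance score and the 0.85 threshold are all assumptions chosen for illustration only.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical sketch only; this is not the names-disambiguator API. */
final class NameMatchSketch {

    /** Strips accents and punctuation, collapses whitespace and lower-cases. */
    static String normalise(String rawName) {
        String decomposed = Normalizer.normalize(rawName, Normalizer.Form.NFD);
        String withoutAccents = decomposed.replaceAll("\\p{M}", "");
        String lettersOnly = withoutAccents.replaceAll("[^\\p{L}\\s]", " ");
        return lettersOnly.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Classic two-row Levenshtein edit distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed score in [0, 1]: 1 minus edit distance over the longer length. */
    static double matchScore(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    /** A pair is a candidate match when its score reaches the chosen threshold. */
    static boolean isCandidateMatch(double score, double threshold) {
        return score >= threshold;
    }

    public static void main(String[] args) {
        String a = normalise("Smith, John");
        String b = normalise("smith  john");
        double score = matchScore(a, b);
        // 0.85 is an arbitrary illustrative threshold.
        System.out.println(a + " vs " + b + ": " + score
                + " candidate=" + isCandidateMatch(score, 0.85));
    }
}
```

In practice the library's own normalisation and scoring would replace normalise and matchScore; only the final thresholding step is likely to stay in the data handler.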

Dependencies

The example data handler is a Maven-managed Java application. Its dependencies are:

  1. Log4J
    • This should be resolved automatically from Maven's central repository
  2. names-disambiguator
    • This currently needs to be downloaded from here and installed into your local Maven repository

The example performs the following steps:

  1. Build an example data source in memory.
  2. Iterate through the data source in batches, transforming each record into a normalised names record.
  3. Use the names disambiguator to derive match scores for comparisons between each record.
  4. Write the resulting match scores to a tab-delimited file.
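
The four steps above can be sketched end to end as follows. This is an illustrative outline under stated assumptions, not the example handler's actual code: ExampleHandlerSketch, NormalisedRecord, the token-based normalisation, the Jaccard score and the output file name match-scores.tsv are all hypothetical stand-ins for what the names-disambiguator library would provide.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the four steps; not the real handler or library API. */
final class ExampleHandlerSketch {

    /** Assumed shape of a normalised record: source id plus cleaned-up name. */
    static final class NormalisedRecord {
        final String id;
        final String name;

        NormalisedRecord(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Step 1: build an in-memory data source of {id, raw name} rows. */
    static List<String[]> buildDataSource() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "Smith, John"});
        rows.add(new String[] {"2", "JOHN SMITH"});
        rows.add(new String[] {"3", "Jones, Mary"});
        return rows;
    }

    /** Placeholder normalisation: lower-case, drop punctuation, sort the tokens. */
    static String normalise(String raw) {
        String[] tokens = raw.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\s]", " ").trim().split("\\s+");
        Arrays.sort(tokens);
        return String.join(" ", tokens);
    }

    /** Step 2: walk the source in batches, normalising each record. */
    static List<NormalisedRecord> normaliseInBatches(List<String[]> source, int batchSize) {
        List<NormalisedRecord> out = new ArrayList<>();
        for (int start = 0; start < source.size(); start += batchSize) {
            int end = Math.min(start + batchSize, source.size());
            for (String[] row : source.subList(start, end)) {
                out.add(new NormalisedRecord(row[0], normalise(row[1])));
            }
        }
        return out;
    }

    /** Placeholder score: Jaccard overlap of the two records' name tokens. */
    static double matchScore(NormalisedRecord a, NormalisedRecord b) {
        Set<String> tokensA = new HashSet<>(Arrays.asList(a.name.split(" ")));
        Set<String> tokensB = new HashSet<>(Arrays.asList(b.name.split(" ")));
        Set<String> union = new HashSet<>(tokensA);
        union.addAll(tokensB);
        tokensA.retainAll(tokensB); // tokensA now holds the intersection
        return union.isEmpty() ? 0.0 : (double) tokensA.size() / union.size();
    }

    /** Steps 3 and 4: score every distinct pair and write tab-delimited rows. */
    static void writeScores(List<NormalisedRecord> records, Writer target) {
        PrintWriter out = new PrintWriter(target);
        out.print("id_a\tid_b\tscore\n");
        for (int i = 0; i < records.size(); i++) {
            for (int j = i + 1; j < records.size(); j++) {
                NormalisedRecord a = records.get(i);
                NormalisedRecord b = records.get(j);
                out.printf(Locale.ROOT, "%s\t%s\t%.3f\n", a.id, b.id, matchScore(a, b));
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        List<NormalisedRecord> records = normaliseInBatches(buildDataSource(), 2);
        // The output file name is an arbitrary choice for this sketch.
        try (Writer writer = Files.newBufferedWriter(
                Paths.get("match-scores.tsv"), StandardCharsets.UTF_8)) {
            writeScores(records, writer);
        }
    }
}
```

Comparing every pair is quadratic in the number of records, which is fine for a small in-memory example but would need some form of blocking or indexing for a large source data set.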