User_Define_Index-UDI

OYSTER v3.3 introduced an exciting new feature that allows a user to define multiple customized indices. As mentioned earlier, OYSTER uses indices to build the candidate lists used when matching is performed. In the original design of OYSTER, the attribute values of incoming references were inserted into an inverted index as a way to find the most probable match candidates for newly input references. However, this method of building the inverted index was found to have three major drawbacks.

  1. It is predicated on the idea that each identity rule will have at least one exact match term. System performance tends to degrade dramatically when rules have one or more inexact match terms (LED, Scan, etc.).
  2. The index key is always a value from a single attribute. To find candidates for multi-term identity rules, the system has to perform multiple lookups, one for each rule term with an exact match, before reducing the candidate set to a manageable size.
  3. The index logic is fixed. Therefore, to gain maximum performance, users must tailor identity rules to fit the logic of the index scheme rather than using an index logic that best fits the identity rules being used.
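
The second drawback can be illustrated with a minimal Python sketch of a single-attribute inverted index. This is not OYSTER's code: the function names, the attribute names, and the use of set intersection to reduce the candidate set are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(records):
    """Map each (attribute, value) pair to the ids of records containing it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for attribute, value in record.items():
            index[(attribute, value)].add(record_id)
    return index


def find_candidates(index, reference, exact_attributes):
    """One lookup per exact-match term, then intersect the hits (assumed strategy)."""
    result = None
    for attribute in exact_attributes:
        hits = index.get((attribute, reference[attribute]), set())
        result = set(hits) if result is None else result & hits
    return result if result is not None else set()


# Hypothetical data: three stored references and one incoming reference.
records = {
    1: {"First": "ANN", "Last": "LEE"},
    2: {"First": "ANN", "Last": "RAY"},
    3: {"First": "BOB", "Last": "LEE"},
}
index = build_inverted_index(records)
incoming = {"First": "ANN", "Last": "LEE"}
candidates = find_candidates(index, incoming, ["First", "Last"])
```

Each rule term costs its own lookup, and each lookup can return a large set (every "ANN", every "LEE") before the intersection narrows it down.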

The new user-defined index scheme allows the user to direct OYSTER to create a single index value that represents multiple terms (attributes) in a rule. It also lets the user define more than one index, so that an index can be customized for specific rules. Each index defined by the user is a single value formed by concatenating a series of “hash values.” Each hash value is created by applying a pre-defined transformation to the value of an attribute. Many of the hash algorithms may be the same as or directly related to a particular similarity function (Soundex, Scan, etc.).
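
As a rough illustration of how such a key could be formed, the Python sketch below concatenates one hash value per attribute. It does not use OYSTER's actual UDI syntax, which is defined in the Reference Guide. The helper names (`soundex`, `scan_alnum`, `build_index_key`) and the attribute names are hypothetical, `soundex` is a plain American Soundex written for this example, and `scan_alnum` is only a stand-in for a Scan-style transformation.

```python
def soundex(name):
    """Standard American Soundex code (letter plus three digits)."""
    groups = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
              ("L", "4"), ("MN", "5"), ("R", "6"))
    codes = {ch: digit for letters, digit in groups for ch in letters}
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the previous code
    return (result + "000")[:4]


def scan_alnum(value):
    """Assumed stand-in for a Scan-style hash: keep letters and digits, uppercased."""
    return "".join(c for c in value.upper() if c.isalnum())


def build_index_key(record, index_definition):
    """Concatenate the hash value of each attribute named in the index definition."""
    return "".join(transform(record.get(attribute, ""))
                   for attribute, transform in index_definition)


# Hypothetical index covering two rule terms at once.
index_definition = [("LastName", soundex), ("Zip", scan_alnum)]
key_a = build_index_key({"LastName": "Robert", "Zip": "72204-1099"}, index_definition)
key_b = build_index_key({"LastName": "Rupert", "Zip": "722041099"}, index_definition)
```

Here both records produce the same key even though neither attribute matches exactly, so they would land in the same candidate list with a single lookup.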

This ability to build custom indices provides drastic improvements in runtimes. The tradeoff is that the user must have an intimate knowledge of the rules and data to build optimized indices. When indices are designed correctly, the OYSTER runtime can decrease from hours or possibly days to a matter of minutes. It is important to note that with the introduction of UDI into the OYSTER system, the default inverted index has been removed; if no UDI is defined, OYSTER will do brute force comparisons (compare every record with every other record).
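
To give a sense of why the index matters, the following Python sketch counts the record pairs that would be compared with and without an index. The record ids and index keys are made up for the example, and the functions are illustrative only, not part of OYSTER.

```python
from collections import defaultdict
from itertools import combinations


def brute_force_pairs(record_ids):
    """Every record compared with every other record: n * (n - 1) / 2 pairs."""
    return list(combinations(record_ids, 2))


def indexed_pairs(key_by_record):
    """Only records that share an index key are compared with each other."""
    blocks = defaultdict(list)
    for record_id, key in key_by_record.items():
        blocks[key].append(record_id)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


# Hypothetical index keys for six records.
key_by_record = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "C"}
all_pairs = brute_force_pairs(list(key_by_record))
blocked_pairs = indexed_pairs(key_by_record)
```

With six records the difference is 15 pairs versus 4. Because the brute force count grows quadratically with the number of records, the gap widens quickly on real data, provided the index keeps true matches together.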

The syntax of UDIs is defined in detail in the Oyster v3.3 Reference Guide.
