OYSTER / Alignment_of_Index_with_Match_Rules

The index generators work independently of the match rules, so the two must be designed to work together in order to get accurate ER results. The match rules can only compare an input record to the set of candidate references returned by the index. Even when an input is capable of matching a previously processed reference, the system will only discover that match if the index returns the matching reference as a candidate, i.e. if the input and the previously processed reference generate the same match key.
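The dependency between index and rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OYSTER's implementation: the `resolve` function, the record layout, the `same_last_name` rule, and both key functions are hypothetical names introduced here. It shows that a rule is only ever evaluated on references that share the input's match key.

```python
from collections import defaultdict

def resolve(records, key_fn, rule):
    """Return pairs of record ids linked by `rule`, using `key_fn` as the index.

    Hypothetical sketch: each incoming record is compared only with earlier
    records that were stored under the same match key.
    """
    index = defaultdict(list)  # match key -> previously processed records
    links = []
    for rec_id, rec in records:
        key = key_fn(rec)
        for cand_id, cand in index[key]:  # candidates returned by the index
            if rule(rec, cand):
                links.append((cand_id, rec_id))
        index[key].append((rec_id, rec))
    return links

# Illustrative rule: two references match when their last names are identical.
def same_last_name(a, b):
    return a["last"] == b["last"]

# Two hypothetical indexes over the same records.
def key_last5(rec):
    return rec["last"][:5]

def key_initial_last5(rec):
    return rec["first"][:1] + rec["last"][:5]

records = [
    ("r1", {"first": "MARY", "last": "JONES"}),
    ("r2", {"first": "ANN", "last": "JONES"}),
]

aligned_links = resolve(records, key_last5, same_last_name)
misaligned_links = resolve(records, key_initial_last5, same_last_name)
print(aligned_links)     # [('r1', 'r2')]
print(misaligned_links)  # []
```

The rule is identical in both runs. Only the index changes, and with the second index the two references never meet, so the rule never gets the chance to link them.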

Figure 4: Examples of Good and Poor Rule-to-Index Alignment

The Rules and Index in Figure 4 labeled "Proper Alignment" are in alignment. The Soundex algorithm does not change the first character of its input string, so if two first names match by Soundex, they must begin with the same letter, and the hash function that extracts the leftmost character of the first name returns the same value for both. Similarly, if two last names are the same (Exact match), then the hash function that extracts the first five letters of the last name also returns the same value for both.
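The alignment argument can be checked mechanically. The sketch below uses a standard American Soundex written out here for illustration; it may differ in detail from OYSTER's own comparator. The helper `left` and the sample names are assumptions, and names are assumed to be already cleaned to plain letters.

```python
from itertools import combinations

# Standard American Soundex digit groups.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODE = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}

def soundex(name):
    """Four-character Soundex code; the first character is the name's first letter."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODE.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters that share a code
        code = _CODE.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel is kept
    return (result + "000")[:4]

def left(name, n):
    """Hypothetical index hash: the leftmost n characters of the name."""
    return name.strip().upper()[:n]

first_names = ["ROBERT", "RUPERT", "ROBIN", "BOB", "ANN", "ANNE"]

# Every pair that the Soundex rule would match...
soundex_pairs = [(a, b) for a, b in combinations(first_names, 2)
                 if soundex(a) == soundex(b)]

# ...also agrees on the first-letter index key.
for a, b in soundex_pairs:
    assert left(a, 1) == left(b, 1)

print(soundex("ROBERT"), soundex("RUPERT"))  # R163 R163
```

The same reasoning covers the last name: two identical strings have identical five-character prefixes, so an Exact match always implies equal `left(name, 5)` values.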

On the other hand, the Rules and Index shown in Figure 4 and labeled "Misalignment" fail to align properly. The problem is that the first name comparator matches by nickname or alias, and a nickname need not begin with the same letter as the name it stands for. For example, a reference with first name "ROBERT" and last name "SMITH" will generate the match key "RSMITH" by this index, while another reference with first name "BOB" and last name "SMITH" will generate the match key "BSMITH". The rules would match these two references, since "BOB" is a common nickname for "ROBERT", but because the references generate different match keys, the index never presents them for comparison. Therefore there is a rule-to-index misalignment.
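The "ROBERT"/"BOB" case can be reproduced with a small sketch. The nickname table, the function names, and the third sample reference are hypothetical, and the index is assumed to be the first letter of the first name followed by the first five letters of the last name, which is consistent with the match keys quoted above.

```python
# Hypothetical nickname table; a real alias list would be much larger.
NICKNAMES = {"BOB": "ROBERT", "ROB": "ROBERT", "BILL": "WILLIAM"}

def canonical(first):
    """Map a nickname to its full form; other names map to themselves."""
    first = first.upper()
    return NICKNAMES.get(first, first)

def rule_matches(a, b):
    """Rule: first names match by nickname, last names match exactly."""
    return canonical(a[0]) == canonical(b[0]) and a[1].upper() == b[1].upper()

def match_key(ref):
    """Assumed index: first letter of first name + first five letters of last name."""
    first, last = ref
    return first[:1].upper() + last[:5].upper()

def misaligned_pairs(refs):
    """Pairs the rule would match but the index would never bring together."""
    found = []
    for i, a in enumerate(refs):
        for b in refs[i + 1:]:
            if rule_matches(a, b) and match_key(a) != match_key(b):
                found.append((a, b))
    return found

robert = ("ROBERT", "SMITH")
bob = ("BOB", "SMITH")
rob = ("ROB", "SMITH")

print(match_key(robert), match_key(bob))  # RSMITH BSMITH
print(rule_matches(robert, bob))          # True
lost = misaligned_pairs([robert, bob, rob])
```

A check of this kind, run over a sample of references, gives a rough picture of how many true matches an index would hide from a given rule. Here "ROB" and "ROBERT" still share the key "RSMITH", so only the pairs involving "BOB" are lost.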

