OYSTER / Multiple_Index_Strategy

Most entity resolution systems that use match-key indices will also allow for defining more than one index. When more than one index is defined, the candidate list returned to the match rules is the union of all of the candidates returned by each of the indices. Multiple indices are often needed when different rules compare different sets of attributes.

Figure 28 shows a set of two matching rules on four attributes. Even though both rules match on First Name, their comparators differ. Because the second rule compares First Name by NickName, indexing on First Name would cause misalignment. The only single-index solution that preserves alignment is to index on Last Name alone. For every input reference, that index would return a list of all previously processed references that share the same first five characters of the last name.

The two-index solution shown on the right of the figure provides better performance while still maintaining alignment. Index 1 is designed to align with Match Rule 1 and takes advantage of the fact that names matching by Soundex must also agree on the first character. It also includes Last Name and Birth Year, since these are Exact match attributes in Rule 1. Similarly, Index 2 is designed to align with Rule 2. It does not use First Name because of the NickName comparator, but it does use School ID, which appears in Rule 2 but not in Rule 1. Since each rule is in alignment with at least one index, the entire set of rules is in alignment. Furthermore, the combined set of candidates returned for each input record by the two indices will be no larger than the set of candidates returned by the single index, and in most cases will be smaller.
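The comparison between the single index and the two-index solution can be sketched in Python. This is an illustration only, not OYSTER's own index configuration or code: the record fields, the sample data, and the helper names (`index1_key`, `index2_key`, `single_index_key`, `build_index`) are hypothetical, and the assumption that Index 2 combines Last Name with School ID is ours, since the text only states that Index 2 uses School ID and omits First Name.

```python
from collections import defaultdict

# Hypothetical match-key generators (assumptions, not OYSTER's API).
def index1_key(ref):
    # Aligned with Rule 1: first character of First Name (Soundex matches
    # must agree on it), plus exact Last Name and Birth Year.
    return (ref["first"][:1].upper(), ref["last"].upper(), ref["birth_year"])

def index2_key(ref):
    # Aligned with Rule 2: no First Name (NickName comparator), but School ID.
    # Including Last Name here is an assumption made for this sketch.
    return (ref["last"].upper(), ref["school_id"])

def single_index_key(ref):
    # Single-index alternative: first five characters of Last Name.
    return ref["last"][:5].upper()

def build_index(refs, key_fn):
    # Map each match key to the set of reference IDs that generate it.
    index = defaultdict(set)
    for ref in refs:
        index[key_fn(ref)].add(ref["id"])
    return index

# Made-up sample of previously processed references.
processed = [
    {"id": 1, "first": "Robert", "last": "Johnson", "birth_year": 2001, "school_id": "S10"},
    {"id": 2, "first": "Bobby", "last": "Johnson", "birth_year": 2001, "school_id": "S22"},
    {"id": 3, "first": "Mary", "last": "Johnson", "birth_year": 1999, "school_id": "S31"},
    {"id": 4, "first": "Alan", "last": "Johnston", "birth_year": 2001, "school_id": "S10"},
]
incoming = {"id": 9, "first": "Bob", "last": "Johnson", "birth_year": 2001, "school_id": "S10"}

idx1 = build_index(processed, index1_key)
idx2 = build_index(processed, index2_key)
idx_single = build_index(processed, single_index_key)

# Candidate list for the two-index solution is the union of both lookups.
two_index_candidates = idx1.get(index1_key(incoming), set()) | idx2.get(index2_key(incoming), set())
single_index_candidates = idx_single.get(single_index_key(incoming), set())

print(sorted(two_index_candidates))
print(sorted(single_index_candidates))
```

In this sample, Index 1 finds the "Bobby" record and Index 2 finds the "Robert" record through the shared School ID, while the single index also returns the two unrelated references that merely share the last-name prefix.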

The effect of combining multiple indices is illustrated in Figure 29. Whereas neither index by itself is in alignment with the rules, together they create alignment. Any overlap between the indices (i.e., the same candidate reference returned by more than one index) can easily be removed prior to matching, so overlap between multiple indices does not add to the number of comparisons that must be made.
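Removing the overlap before matching amounts to taking a de-duplicated union of the per-index candidate lists. The following is a minimal sketch under that reading; the function name `candidate_list` and the reference IDs are hypothetical and are not taken from OYSTER.

```python
def candidate_list(index_hits):
    """Merge per-index candidate lists, keeping each reference ID once."""
    seen = set()
    merged = []
    for hits in index_hits:
        for ref_id in hits:
            if ref_id not in seen:  # skip references another index already returned
                seen.add(ref_id)
                merged.append(ref_id)
    return merged

# Made-up hits: references 205 and 317 are returned by both indices.
hits_index1 = [101, 205, 317]
hits_index2 = [205, 317, 422]

merged = candidate_list([hits_index1, hits_index2])
print(merged, len(merged))
```

The match rules then see four candidates rather than six, so the overlapping references are compared only once.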

