OYSTER / Index_Recall_and_Precision

Two measures often applied to an index are its recall and its precision. Recall is the percentage of the record pairs matching by at least one of the rules that also generate the same match key in at least one of the indices. Proper alignment between rules and indices is achieved when the indices have 100% recall. Precision, on the other hand, is the percentage of the record pairs generating the same match key that also match by at least one of the rules.
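The two definitions can be sketched in a few lines of Python. This is only an illustration and not part of OYSTER: the function name `index_recall_precision`, the record IDs, and the convention of reporting 100% when a set is empty are all assumptions made for the example.

```python
def index_recall_precision(rule_pairs, index_pairs):
    """Return (recall, precision) as percentages.

    rule_pairs:  record pairs that match by at least one rule
    index_pairs: record pairs that share a match key in at least one index
    """
    both = rule_pairs & index_pairs
    # Assumed convention: an empty denominator is reported as 100%.
    recall = 100.0 * len(both) / len(rule_pairs) if rule_pairs else 100.0
    precision = 100.0 * len(both) / len(index_pairs) if index_pairs else 100.0
    return recall, precision


# Hypothetical record IDs; each pair is a frozenset so that order does not matter.
rule_pairs = {frozenset(p) for p in [("r1", "r2"), ("r1", "r3"), ("r4", "r5")]}
index_pairs = {
    frozenset(p)
    for p in [("r1", "r2"), ("r4", "r5"), ("r2", "r6"), ("r5", "r6")]
}

recall, precision = index_recall_precision(rule_pairs, index_pairs)
print(f"recall = {recall:.1f}%, precision = {precision:.1f}%")
```

In this toy data the index misses the rule match between r1 and r3, so recall falls below 100%, and it groups two pairs that no rule matches, so precision falls below 100% as well.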

When an index has less than 100% precision, it returns some records to the rules for match comparison that will not match. The lower the precision, the more unnecessary effort the process expends comparing an input record to previously processed records that will not match by any of the rules. It is the recall measure that has the most impact on the accuracy of the ER process. When recall is less than 100%, the indices will fail to find some matches between records that are present in the data. In this respect, recall is the more important consideration for the accuracy of the ER process, whereas precision is the more important consideration for its efficiency (performance). The general rule for index design is to first achieve 100% recall (alignment), then work to increase precision while maintaining 100% recall.
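The design rule can be illustrated with a small, self-contained Python experiment. Everything here is hypothetical and is not OYSTER's implementation: the six name records, the match rule (same last name and same first initial), and the three candidate index key functions are invented for the example. The sketch enumerates all record pairs, which is practical only for toy data.

```python
from itertools import combinations

# Hypothetical records: id -> (first name, last name)
records = {
    1: ("John", "Smith"),
    2: ("Jon", "Smith"),
    3: ("Jane", "Smith"),
    4: ("John", "Smyth"),
    5: ("Mary", "Jones"),
    6: ("Mark", "Jones"),
}


def rule_match(a, b):
    # Assumed match rule: same last name and same first initial.
    return a[1].upper() == b[1].upper() and a[0][0].upper() == b[0][0].upper()


def evaluate(index_key):
    """Return (recall, precision) in percent for one index key function."""
    all_pairs = list(combinations(sorted(records), 2))
    rule_pairs = {p for p in all_pairs if rule_match(records[p[0]], records[p[1]])}
    index_pairs = {
        p for p in all_pairs if index_key(records[p[0]]) == index_key(records[p[1]])
    }
    hits = rule_pairs & index_pairs
    recall = 100.0 * len(hits) / len(rule_pairs) if rule_pairs else 100.0
    precision = 100.0 * len(hits) / len(index_pairs) if index_pairs else 100.0
    return recall, precision


# Candidate index keys, from loosest to tightest.
candidates = {
    "last initial": lambda r: r[1][0].upper(),
    "last name + first initial": lambda r: (r[1].upper(), r[0][0].upper()),
    "last name + first two letters": lambda r: (r[1].upper(), r[0][:2].upper()),
}

results = {name: evaluate(key) for name, key in candidates.items()}
for name, (recall, precision) in results.items():
    print(f"{name}: recall = {recall:.1f}%, precision = {precision:.1f}%")
```

In this toy data the loosest key already reaches 100% recall but sends non-matching pairs to the rule. Tightening it to last name plus first initial raises precision while recall stays at 100%. Tightening further to the first two letters of the first name keeps precision high but drops recall, because the index no longer brings every rule match together, which is the situation the design rule is meant to avoid.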

