Oyster_Attributes

The OysterAttributes file identifies each attribute present in the input source and the rules OYSTER uses to perform entity resolution (ER) on the input source records. Each XML <Attribute> element in the document defines one OYSTER attribute, labeled by the value of the XML “Item” attribute. These labels are used in the Source Descriptor to identify the logical items in a Reference Source that are OYSTER identity attributes. In addition to the attribute name, the value of the XML attribute “Algo” specifies a pre-defined matching algorithm to be used for comparing attribute values of this type. Specifying an algorithm is optional; if it is not given, or if the given name is not found, a default matching algorithm is used.
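To make the element and attribute names concrete, here is a minimal sketch that reads a small fragment and lists each “Item” label with its “Algo” value. Only <Attribute>, “Item” and “Algo” come from the description above; the root element name, the item labels FirstName and LastName, and the class name AttributeListingSketch are assumptions made for illustration. This is not OYSTER's own parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class AttributeListingSketch {

    // Hypothetical fragment: the root element name and the item labels are
    // assumptions; only <Attribute>, Item and Algo are taken from the text.
    static final String SAMPLE =
          "<OysterAttributes>"
        + "<Attribute Item=\"FirstName\" Algo=\"None\"/>"
        + "<Attribute Item=\"LastName\"/>"
        + "</OysterAttributes>";

    // Returns Item label -> Algo value, in document order.
    // An absent Algo attribute is reported as "(default)".
    public static Map<String, String> listAttributes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("Attribute");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                String algo = e.hasAttribute("Algo") ? e.getAttribute("Algo") : "(default)";
                result.put(e.getAttribute("Item"), algo);
            }
            return result;
        } catch (Exception ex) {
            // Wrap checked parser exceptions so callers need no throws clause
            throw new IllegalStateException("Could not parse attributes XML", ex);
        }
    }

    public static void main(String[] args) {
        listAttributes(SAMPLE).forEach(
            (item, algo) -> System.out.println(item + " -> " + algo));
    }
}
```

The second <Attribute> omits “Algo” on purpose, to reflect that specifying an algorithm is optional.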

A sample OysterAttributes file is illustrated in Figure 13.

Figure 13.PNG

In this example the default matching algorithm is used because Algo is specified as “None”. The default algorithm lets users match with the following comparators:

  • True/False

    • EXACT
    • EXACT_IGNORE_CASE
    • TRANSPOSE
    • INITIAL
    • NICKNAME
    • SOUNDEX
    • DMSOUNDEX
    • IBMALPHACODE
    • MATCHRATING
    • NYSIIS
    • CAVERPHONE
    • METAPHONE
  • Functionalized

    • LED - the default threshold is 0.8; to set a user-defined threshold, use the signature LED(threshold)
    • QTR - the default threshold is 0.25; to set a user-defined threshold, use the signature QTR(threshold)
    • SUBSTRLEFT(length)
    • SUBSTRRIGHT(length)
    • SUBSTRMID(start, length)
    • Scan(Direction, CharType, Length, Casing, Order)
    • SmithWaterman(Match, Mismatch, Gap, Threshold)

All the above comparators are described in detail in the “Oyster v3.3 Reference Guide”.
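As an illustration of how a thresholded comparator such as LED(threshold) can behave, the sketch below computes a normalized Levenshtein similarity and compares it to a threshold. It assumes LED is normalized as 1 - distance / length of the longer string and that a score equal to the threshold counts as a match; the Reference Guide is the authority on OYSTER's actual definition. The class and method names are hypothetical.

```java
class LedSketch {

    // Classic Levenshtein edit distance using two rolling rows
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                    Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                    prev[j - 1] + cost);                    // substitute or keep
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 means identical, 0.0 means nothing in common
    public static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) levenshtein(a, b) / longest;
    }

    // Assumed rule: a score at or above the threshold is a match
    public static boolean ledMatch(String a, String b, double threshold) {
        return similarity(a, b) >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(similarity("JOHNSON", "JOHNSTON"));
        System.out.println(ledMatch("JOHNSON", "JOHNSTON", 0.8));
    }
}
```

Under these assumptions, "JOHNSON" and "JOHNSTON" differ by one insertion over eight characters, so they would match at the default threshold of 0.8.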

The last section of the file, illustrated in Figure 14, specifies the identity rules that are defined by the user to be used to perform ER on the records in the input source.

Figure 14.PNG

In this script, MatchResult="Exact" and MatchResult="Initial" are used. By default OYSTER can use the match codes listed above, but users can add their own by extending the base class OysterComparator.java with a new class whose name starts with “OysterCompare” and implementing the method String getMatchCode(String, String).
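The extension mechanism described above might look roughly like the following. Because the real OysterComparator base class is not reproduced on this page, the sketch uses a stand-in abstract class (OysterComparatorStandIn) so that it compiles on its own; in a real extension the class would extend OysterComparator instead. The comparison rule and the match codes "FirstThree" and "NoMatch" are hypothetical, and how OYSTER expects a non-match to be reported should be checked in the Reference Guide.

```java
// Stand-in for the real base class so the sketch is self-contained.
// Only the getMatchCode(String, String) signature comes from the text.
abstract class OysterComparatorStandIn {
    public abstract String getMatchCode(String source, String target);
}

// The class name starts with "OysterCompare", as the text requires
class OysterCompareFirstThree extends OysterComparatorStandIn {

    // Hypothetical match codes used only in this sketch
    static final String MATCH = "FirstThree";
    static final String NO_MATCH = "NoMatch";

    // Reports a match when the first three characters agree, ignoring case
    @Override
    public String getMatchCode(String source, String target) {
        if (source == null || target == null) {
            return NO_MATCH;
        }
        String s = source.trim();
        String t = target.trim();
        if (s.length() < 3 || t.length() < 3) {
            return NO_MATCH; // too short to compare three characters
        }
        return s.regionMatches(true, 0, t, 0, 3) ? MATCH : NO_MATCH;
    }

    public static void main(String[] args) {
        OysterCompareFirstThree cmp = new OysterCompareFirstThree();
        System.out.println(cmp.getMatchCode("Johnson", "JOHnston"));
        System.out.println(cmp.getMatchCode("Smith", "Jones"));
    }
}
```

A rule in the OysterAttributes file would then presumably refer to the new code through MatchResult, in the same way the built-in "Exact" and "Initial" codes are used.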

A full reference of the OysterAttributes file can be found in the OYSTER Reference Guide.

As of OYSTER v3.3, the OysterAttributes file also allows User Defined Indexes (UDI) to be defined for the run and lets users specify Cross-Attribute Comparison (CAC) rules. Both of these new features are defined in detail in the OYSTER Reference Guide.

Previous to OYSTER Reserved Words Page | Next to 4 - Example Scenario Page

Back to OYSTER User Guide Page
