With_OYSTER

To perform this task using OYSTER, the first step is to analyze the dataset and create an OysterSourceDescriptor file. This file assigns a logical name to each attribute in the dataset.

This dataset contains four attributes: ID, FirstName, LastName, and DateofBirth. These four attributes should be translated into the ReferenceItems section of the OysterSourceDescriptor file. Note that when a delimited text file is used as the input source, the “Name” and “Attribute” values defined for each item do not need to match the labels given in the file. The “Pos” value, however, must match the relative position of the attribute in the input source.

The ReferenceItems section would be defined as illustrated in Figure 15.

Figure 15.PNG

The last step in building the OysterSourceDescriptor is specifying the type of file and where it is located. Since this dataset is delimited by the character “|”, the entry would look like the Source in Figure 16, assuming the file has been saved as Z:\Oyster\Input\Data_1.txt.

Figure 16.PNG
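To make the role of “Pos” and the “|” delimiter concrete, here is a minimal Python sketch of reading a delimited file by position. It is not how OYSTER itself is implemented. The mapping `POSITIONS`, the function `read_delimited`, the zero-based positions, and the sample rows are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical mapping from logical attribute name to column position.
# Zero-based positions are an assumption made for this sketch.
POSITIONS = {"ID": 0, "FirstName": 1, "LastName": 2, "DateofBirth": 3}


def read_delimited(text, positions=POSITIONS, delimiter="|", has_header=True):
    """Return one dict per record, keyed by logical attribute name."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    if has_header:
        next(rows, None)  # header labels are ignored; only position matters
    records = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        records.append({name: row[pos].strip() for name, pos in positions.items()})
    return records


# Made-up sample rows, not the dataset used in this guide.
SAMPLE = "emp_no|given|family|dob\nA1|Ann|Lee|1980-01-02\nA2||Lee|1980-01-02\n"
records = read_delimited(SAMPLE)
```

The header labels in the sample (emp_no, given, family, dob) differ from the logical names on purpose: the values are picked up by position, so the labels never have to match.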

Those are all of the steps required to build the OysterSourceDescriptor. Make a note of where the file is saved, as this location must be specified later in the OysterRunScript. The completed OysterSourceDescriptor file is illustrated in Figure 17. Save this file in the Scripts folder discussed earlier in the Files and Structure section of this document.

Figure 17.PNG

Now that the OysterSourceDescriptor file has been created, the OysterAttributes file must be created to define which algorithm is used on each attribute during matching and which matching rules are used during ER. Note that each attribute's Item value in the OysterAttributes file must match the value assigned to Attribute in the OysterSourceDescriptor file. Since only Missing and Exact are required for matching this dataset, the default OYSTER matching algorithm can be used. The defined attributes are shown in Figure 18. The Algo attribute can be left off since OYSTER will use the default algorithm.

Figure 18.PNG
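Because a mismatch between the Item values and the Attribute values is easy to introduce by hand, a quick check before running can help. The Python sketch below only relies on the XML attribute names Attribute and Item mentioned above. The helper names and the element names in the sample strings are placeholders chosen for illustration and are not taken from the real OYSTER file layout.

```python
import xml.etree.ElementTree as ET


def attribute_values(xml_text, attr_name):
    """Collect every value of the XML attribute attr_name, on any element."""
    root = ET.fromstring(xml_text.strip())
    return {el.get(attr_name) for el in root.iter() if el.get(attr_name) is not None}


def unmatched_items(source_descriptor_xml, attributes_xml):
    """Return Item values that have no matching Attribute value."""
    declared = attribute_values(source_descriptor_xml, "Attribute")
    used = attribute_values(attributes_xml, "Item")
    return sorted(used - declared)


# Placeholder documents: the element names here are hypothetical.
SOURCE_XML = """
<Descriptor>
  <ReferenceItems>
    <Item Name="First" Attribute="FirstName" Pos="1"/>
    <Item Name="Last" Attribute="LastName" Pos="2"/>
  </ReferenceItems>
</Descriptor>
"""
ATTRIBUTES_XML = """
<Attributes>
  <Attribute Item="FirstName"/>
  <Attribute Item="Lastname"/>
</Attributes>
"""

# "Lastname" differs from "LastName" only by case, so it is reported.
problems = unmatched_items(SOURCE_XML, ATTRIBUTES_XML)
```

The comparison is case-sensitive in this sketch, so a value that differs only in capitalization is reported as unmatched.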

The next step is to review the matching rules provided and translate them into Rules that can be used by OYSTER. Rule one states that the first name is missing, the last names are the same, and the dates of birth are the same. As discussed earlier, by default OYSTER can handle Exact and Missing matching (along with others). That means this rule could be translated into the OYSTER rule illustrated in Figure 19.

Figure 19.PNG

The next rule states that the first name was entered incorrectly, but the last names are the same and the dates of birth are the same. This rule can be translated into the OYSTER rule illustrated in Figure 20.

Figure 20.PNG

The last rule states that the first names are the same, the last names are the same, and the dates of birth are the same, but the employee was supplied a new ID. This rule can be translated into the OYSTER rule illustrated in Figure 21.

Figure 21.PNG
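The three rules can also be read as simple predicates over a pair of records. The Python sketch below is an illustration of that reading, not OYSTER's matching code. It assumes that “Missing” holds when at least one of the two values is blank, and it models the second rule by simply not comparing first names, which is only one possible reading of the rule in Figure 20. All function and rule names are hypothetical.

```python
def is_blank(value):
    """True when a value is absent or contains only whitespace."""
    return value is None or value.strip() == ""


def exact(a, b):
    """Both values are present and identical."""
    return not is_blank(a) and not is_blank(b) and a == b


def missing(a, b):
    """Assumption: at least one of the two values is blank."""
    return is_blank(a) or is_blank(b)


# Each rule maps an attribute name to the comparison it must satisfy.
# ID is never compared, so a new ID does not prevent a match.
RULES = {
    "rule1": {"FirstName": missing, "LastName": exact, "DateofBirth": exact},
    "rule2": {"LastName": exact, "DateofBirth": exact},
    "rule3": {"FirstName": exact, "LastName": exact, "DateofBirth": exact},
}


def matching_rules(rec_a, rec_b, rules=RULES):
    """Return the names of all rules whose terms all hold for the pair."""
    hits = []
    for name, terms in rules.items():
        if all(compare(rec_a.get(attr), rec_b.get(attr)) for attr, compare in terms.items()):
            hits.append(name)
    return hits


# Made-up records for illustration only.
ann = {"ID": "A1", "FirstName": "Ann", "LastName": "Lee", "DateofBirth": "1980-01-02"}
blank = {"ID": "A2", "FirstName": "", "LastName": "Lee", "DateofBirth": "1980-01-02"}
typo = {"ID": "A3", "FirstName": "Anne", "LastName": "Lee", "DateofBirth": "1980-01-02"}
new_id = {"ID": "B9", "FirstName": "Ann", "LastName": "Lee", "DateofBirth": "1980-01-02"}
other = {"ID": "C1", "FirstName": "Ann", "LastName": "Kim", "DateofBirth": "1980-01-02"}
```

With this modeling, `matching_rules(ann, new_id)` reports both rule2 and rule3, because any pair that satisfies the third rule also satisfies the looser second one.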

The complete OysterAttributes file is illustrated in Figure 22.

Figure 22.PNG

Save this file in the Scripts folder discussed earlier in the Files and Structure section of this document. This location will be specified in the OysterRunScript.

Now that the OysterAttributes and OysterSourceDescriptor files have been created, the OysterRunScript can be created. This file requires you to specify the location of your OysterAttributes and OysterSourceDescriptor files as well as the location where you want the output files to be stored. You can also set the level of logging and the RunMode to be used for this particular run. The complete OysterRunScript is illustrated in Figure 23. Save this file as “Z:\Oyster\OysterRunScript.xml”.

Figure 23.PNG

Now that each of the three required XML files has been created, you can launch OYSTER through either of the methods discussed in the Launching OYSTER section of this document. Once launched, you will be prompted for the name of the OysterRunScript, as shown in Figure 24:

Figure 24.PNG

The OYSTER version number is displayed when OYSTER is launched, as depicted in Figure 24.

Once you specify the name of the OysterRunScript and press Enter, OYSTER will process your records and display the results on the screen. The results for this run are shown in Figure 25 and Figure 26.

Figure 25.PNG

Figure 26.PNG

From the results you can see that OYSTER was able to read nine records and resolve them to five entities (Clusters).
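The step from pairwise matches to entities can be pictured as grouping records that are connected by any chain of matches. The following Python sketch uses a generic union-find structure for this. It is not a description of OYSTER's internal algorithm, and the function `resolve`, the match function, and the sample records are hypothetical.

```python
def resolve(records, is_match):
    """Group record indexes into clusters using pairwise matches."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair once and merge the clusters of matching records.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_match(records[i], records[j]):
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def same_last_name_and_dob(a, b):
    """Simplified stand-in for a matching rule."""
    return a["LastName"] == b["LastName"] and a["DateofBirth"] == b["DateofBirth"]


# Made-up records, not the nine records from this example.
people = [
    {"LastName": "Lee", "DateofBirth": "1980-01-02"},
    {"LastName": "Lee", "DateofBirth": "1980-01-02"},
    {"LastName": "Kim", "DateofBirth": "1975-06-30"},
    {"LastName": "Lee", "DateofBirth": "1980-01-02"},
]
clusters = resolve(people, same_last_name_and_dob)
```

Here four records resolve to two clusters, in the same way that the nine input records above resolve to five entities.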

The OYSTER output is stored in XML format in the “Identity Change Report.txt”, “Test.idty”, and “Test.link” files that were specified in the OysterRunScript. The contents of these files are shown in Figure 27, Figure 28, and Figure 29, respectively.

Figure 27.PNG

Figure 28.PNG

Figure 29.PNG
