
Example - Identity Resolution

This example uses the test source file named ‘IdentityResolutionTest.txt’, shown in Figure 98. This data consists of six references, each defined by the RefID, FirstName, LastName, DOB, and SchCode attributes.

This run also uses the identities created by the previous assertions run, which are stored in the AssertionsOutputIdentities.idty file. An identity resolution run requires a set of input identities so that it can determine which references in the input source resolve to one of those identities.

Figure 98.PNG

After analyzing the source data, the source descriptor file, named ‘IdentityResolutionSourceDescriptor.xml’, can be created. This file is shown in Figure 99.

Figure 99.PNG

Following the same process used to set up the merge-purge example, once the source descriptor is defined, the source attributes file must also be defined. This file is stored in the Source folder along with the source descriptor file. The attributes file defines the attributes in the source, the algorithm used to compare each attribute, and the matching (identity) rules used when entity resolution (ER) is performed. This example uses two identity rules. The first rule states that references are considered equivalent if the FirstName, LastName, and DOB attributes match. The second rule states that references are equivalent if the LastName, DOB, and SchCode attributes match. The source attributes file is named ‘IdentityResolutionAttributes.xml’ and is depicted in Figure 100.
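The two identity rules can be read as "all attributes in the rule match". Two references are equivalent when either rule is satisfied. The Python sketch below illustrates only that logic. It is not OYSTER's implementation. It assumes plain exact-match comparison on every attribute, whereas the real comparison algorithm for each attribute is whatever IdentityResolutionAttributes.xml specifies. The names `RULES`, `rule_matches`, and `equivalent`, along with the sample values, are hypothetical.

```python
from typing import Dict, List

# Hypothetical encoding of the two identity rules in this example.
# Each rule lists attributes that must ALL match (assumed exact match;
# OYSTER's actual comparators are configured in the attributes file).
RULES: List[List[str]] = [
    ["FirstName", "LastName", "DOB"],  # Rule 1
    ["LastName", "DOB", "SchCode"],    # Rule 2
]


def rule_matches(a: Dict[str, str], b: Dict[str, str], rule: List[str]) -> bool:
    """True if every attribute in the rule is non-empty and equal in both references."""
    return all(bool(a.get(attr)) and a.get(attr) == b.get(attr) for attr in rule)


def equivalent(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """References are equivalent if ANY single rule is satisfied."""
    return any(rule_matches(a, b, rule) for rule in RULES)


# Hypothetical references (not the data in Figure 98).
r1 = {"FirstName": "JOHN", "LastName": "DOE", "DOB": "19900101", "SchCode": "A1"}
r2 = {"FirstName": "JON", "LastName": "DOE", "DOB": "19900101", "SchCode": "A1"}
r3 = {"FirstName": "JON", "LastName": "DOE", "DOB": "19900101", "SchCode": "B2"}

print(equivalent(r1, r2))  # Rule 1 fails on FirstName, Rule 2 succeeds
print(equivalent(r1, r3))  # Both rules fail
```

In this sketch, r1 and r2 resolve together only through the second rule. That is why a second rule that drops FirstName can catch nickname or spelling variants the first rule misses.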

Figure 100.PNG

As with the other examples, the last file that needs to be created is the Run Script for this example. For this identity resolution example, no output identity file should be specified in the Run Script, but both the input identity file and the link file should be specified. The Run Script should again be stored in the root OYSTER folder, because that is where the OYSTER program expects the file to reside. The file for this example is named ‘IdentityResolutionRunScript.xml’ and is shown in Figure 101.

Figure 101.PNG

Now that all the files for the Identity Resolution example have been created, we can run OYSTER. This process is depicted in Figure 37, Figure 38, and Figure 39 and described in the surrounding text in the Example section.

Once the run is complete, OYSTER writes the output for the run to the command box. This output is shown in Figure 102 and Figure 103.

Figure 102.png

Figure 102: Screen Output Created by Identity Resolution Example Run - 1

Figure 103.png

Figure 103: Screen Output Created by Identity Resolution Example Run - 2

The output in Figure 103 may seem confusing because it states that OYSTER processed 6 records but found only 2 entities (clusters). This is due to the way identity resolution works: it only creates links for records that resolve to an existing identity in the identity input file. If a reference in the source input does not resolve, it is not counted.
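The gap between "records processed" and "clusters" can be made concrete with a small count over (RefID, OYSTER ID) link pairs. Every reference counts as processed. Only the distinct existing identities that at least one reference resolved to count as clusters, and references carrying the unresolved OYSTER ID ‘XXXXXXXXXXXXXXXX’ are excluded. This is a minimal sketch of that reading of the screen output, not OYSTER's own counting code. The function name and the RefIDs and identity IDs below are hypothetical.

```python
from typing import Iterable, List, Tuple

# OYSTER ID given in the link file to references that do not resolve.
UNRESOLVED = "XXXXXXXXXXXXXXXX"


def run_summary(links: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (records processed, clusters) from (RefID, OYSTER ID) link pairs.

    All references count as processed; clusters are the distinct existing
    identities that at least one reference resolved to.
    """
    pairs: List[Tuple[str, str]] = list(links)
    clusters = {oid for _, oid in pairs if oid != UNRESOLVED}
    return len(pairs), len(clusters)


# Hypothetical link pairs shaped like this run: 6 references,
# 5 resolving to two existing identities, 1 unresolved.
links = [
    ("ref1", "ID-A"), ("ref2", "ID-A"), ("ref3", "ID-A"),
    ("ref4", "ID-B"), ("ref5", "ID-B"), ("ref6", UNRESOLVED),
]
print(run_summary(links))
```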

Figure 104.PNG

As you can see in the link file generated by this identity resolution example, shown in Figure 104, OYSTER was able to create links for 5 of the 6 references in the source data. The sixth reference was assigned an OYSTER ID of ‘XXXXXXXXXXXXXXXX’, which means that it could not be linked to any existing identity in the identity input file for this run. Unresolved references like this one are the type of data you are looking for when performing identity resolution.
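The link-file behavior described above can be sketched as follows. Each source reference is compared against the fixed set of input identities. If any identity rule matches, the reference takes that identity's OYSTER ID. Otherwise it receives the all-X ID, and no new identity is created. This is an illustrative sketch under several assumptions, not OYSTER's algorithm:

- each identity is simplified to a single attribute record, whereas real OYSTER identities can hold several values per attribute;
- matching is exact;
- the first matching identity wins (how OYSTER handles a reference matching several identities is not covered here).

All function names and data below are hypothetical.

```python
from typing import Dict, List, Tuple

UNRESOLVED = "XXXXXXXXXXXXXXXX"  # OYSTER ID for references that do not resolve

# Assumed exact-match versions of this example's two identity rules.
RULES = [("FirstName", "LastName", "DOB"), ("LastName", "DOB", "SchCode")]


def resolves_to(ref: Dict[str, str], identity: Dict[str, str]) -> bool:
    """True if any rule matches between a reference and an identity."""
    return any(
        all(bool(ref.get(a)) and ref.get(a) == identity.get(a) for a in rule)
        for rule in RULES
    )


def resolve(refs: List[Dict[str, str]],
            identities: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (RefID, OYSTER ID) pairs; identities are read-only inputs."""
    links = []
    for ref in refs:
        # First matching identity wins (a simplifying assumption).
        oid = next((oid for oid, ident in identities.items()
                    if resolves_to(ref, ident)), UNRESOLVED)
        links.append((ref["RefID"], oid))
    return links


# Hypothetical input identities and source references.
identities = {
    "ID-A": {"FirstName": "ANN", "LastName": "LEE", "DOB": "20050312", "SchCode": "101"},
    "ID-B": {"FirstName": "BOB", "LastName": "RAY", "DOB": "20040720", "SchCode": "202"},
}
refs = [
    {"RefID": "r1", "FirstName": "ANN", "LastName": "LEE", "DOB": "20050312", "SchCode": "999"},
    {"RefID": "r2", "FirstName": "ROBERT", "LastName": "RAY", "DOB": "20040720", "SchCode": "202"},
    {"RefID": "r3", "FirstName": "CAL", "LastName": "KIM", "DOB": "20030101", "SchCode": "303"},
]
for ref_id, oid in resolve(refs, identities):
    print(ref_id, oid)
```

Here r1 resolves through the first rule, r2 through the second, and r3 matches nothing, so it receives the all-X ID. Running this through a count of distinct non-X IDs would report 3 records and 2 clusters, mirroring the pattern seen in this run.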

Previous: Configuration - Identity Resolution Page

Back to OYSTER User Guide Page
