Demo 3 - IdentityResolution

Identity Resolution is a form of Entity Resolution in which all incoming references are resolved against a predefined set of managed identities (the knowledge base). Each identity in an identity resolution system has a fixed identifier that can be used to link references that are equivalent to the identity, thus creating a persistent link.

This run uses the test source file named ‘IdentityResolutionTest.txt’. The data consists of the same six references that were used in the previous Merge-purge and IdentityCapture examples, shown in Figure 1.

inputfile.jpg

Figure 1: Identity Resolution Source Input

The Match Rules defined for this run are likewise identical to the Match Rules used in the Merge-purge and IdentityCapture run. The rules can be seen in Figure 2.

IdentityResolutionAttributes.jpg

Figure 2: Identity Resolution Match Rules

An IdentityResolution run requires that previously defined identities be provided as input in the form of an .idty file. This run uses the file IdentityResolutionInputIdentities.idty, located in the IdentityResolution Input folder. As in the Merge-purge run, Identity Resolution does not retain any identities. Unlike the Merge-purge and IdentityCapture runs, no matching is done between records in the input source. The only matching that takes place is a look-up match between each record in the input source and the identities in the input .idty file.
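The look-up behavior described above can be illustrated with a minimal Python sketch. This is not OYSTER's implementation (OYSTER is driven by its XML configuration files); the record layout, the names `rule1_key`, `build_index`, and `resolve`, the sample people, and the identifiers `ID-A` and `ID-B` are all assumptions made for illustration. The only rule modeled is an exact match on FirstName, LastName, and DOB.

```python
from typing import Dict, List, Tuple

# Placeholder assigned to references with no matching identity (as in this demo).
UNMATCHED_ID = "XXXXXXXXXXXXX"

Record = Dict[str, str]
Key = Tuple[str, str, str]


def rule1_key(rec: Record) -> Key:
    """Hypothetical match key: exact FirstName, LastName, and DOB."""
    return (rec["FirstName"], rec["LastName"], rec["DOB"])


def build_index(identities: Dict[str, List[Record]]) -> Dict[Key, str]:
    """Map the match key of every reference held by an identity to that identity's ID."""
    index: Dict[Key, str] = {}
    for identity_id, refs in identities.items():
        for ref in refs:
            index.setdefault(rule1_key(ref), identity_id)
    return index


def resolve(source: List[Record], index: Dict[Key, str]) -> Dict[str, str]:
    """Look up each source record independently; source records are never
    compared with each other, and the index is never updated."""
    return {rec["RefID"]: index.get(rule1_key(rec), UNMATCHED_ID) for rec in source}


# Hypothetical knowledge base and source records (not the demo data).
identities = {
    "ID-A": [{"FirstName": "Mary", "LastName": "Doe", "DOB": "1990-01-01"}],
    "ID-B": [{"FirstName": "John", "LastName": "Roe", "DOB": "1985-05-05"}],
}
source = [
    {"RefID": "S1", "FirstName": "Mary", "LastName": "Doe", "DOB": "1990-01-01"},
    {"RefID": "S2", "FirstName": "John", "LastName": "Roe", "DOB": "1985-05-05"},
    {"RefID": "S3", "FirstName": "Mary", "LastName": "Doe", "DOB": "1990-01-01"},
    {"RefID": "S4", "FirstName": "Ann", "LastName": "Poe", "DOB": "1970-07-07"},
]

links = resolve(source, build_index(identities))
print(links)
```

In this sketch S1 and S3 receive the same identifier only because each one matches the same stored identity, not because they were compared with each other, and S4 receives the placeholder because nothing in the knowledge base matches it.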

1. Enter ‘IdentityResolutionRunScript.xml’ and press Enter to perform the run, shown in Figure 3.

IdentityResolutionRunScript.jpg

Figure 3: Running IdentityResolution Run Script

2. Information about the run will be displayed in the Command Prompt. For this run, 6 references are processed and grouped into 2 identities. The OYSTER Run Statistics are shown in Figures 4 to 7. The identity count may seem confusing at first: for Identity Resolution runs, OYSTER counts only the unique identities that were matched with input records when reporting the number of identities for the run. This is explained further after step 4.

IdentityResolution OYSTER Run Statistics 1.jpg IdentityResolution OYSTER Run Statistics 2.jpg IdentityResolution OYSTER Run Statistics 3.jpg IdentityResolution OYSTER Run Statistics 4.jpg

Figures 4-7: IdentityResolution OYSTER Run Statistics

3. After the run finishes, the Output folder will contain the IdentityResolution.link file and other files automatically generated by the run, as shown in Figure 8.

IdentityResoluton Output Folder.jpg

Figure 8: IdentityResolution Output Folder

4. For this run, OYSTER does not create persistent identifiers. Instead, it looks up the OYSTER ID of each EIS that was found to match a source reference and lists these matching IDs in the link index file, IdentityResolution.link, shown in Figure 9.

IdentityResolution.link File.jpg

Figure 9: IdentityResolution.link File

By examining the IdentityResolutionInputIdentities.idty file in the IdentityResolution Input folder alongside the input for this run, it can be seen that a look-up occurred: each source record that matched an identity in the .idty file received that identity's OYSTER ID. For example, records IR1.1, IR1.3, and IR1.5 from this run matched a previously defined identity on Rule 1, since they have the same FirstName, LastName, and DOB as at least one of the records in the previously defined identity “BMN73PME2SJBOHK9”. Similarly, IR1.2 and IR1.4 matched a previously defined identity on Rule 1, since they have the same FirstName, LastName, and DOB as at least one of the records in the previously defined identity “R9AY3WK1JAHUTWKK”. Note that IR1.6 was assigned the OYSTER ID ‘XXXXXXXXXXXXX’. This is because the source reference was not found in the knowledge base used as input for this run; ‘XXXXXXXXXXXXX’ indicates that OYSTER contains no knowledge about the source reference. As mentioned in step 2, OYSTER does not count records that receive an OYSTER ID of ‘XXXXXXXXXXXXX’ when compiling the run statistics.
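The counting rule above can be checked with a short Python sketch. The reference-to-identity assignments are the ones described in this demo; the function name `run_counts` and the in-memory dictionary are assumptions for illustration and do not reflect the actual layout of the .link file or how OYSTER computes its statistics internally.

```python
from typing import Dict, Tuple

# Placeholder assigned to references with no matching identity in this demo.
UNMATCHED_ID = "XXXXXXXXXXXXX"

# Reference -> OYSTER ID assignments as described for this run.
demo_links: Dict[str, str] = {
    "IR1.1": "BMN73PME2SJBOHK9",
    "IR1.2": "R9AY3WK1JAHUTWKK",
    "IR1.3": "BMN73PME2SJBOHK9",
    "IR1.4": "R9AY3WK1JAHUTWKK",
    "IR1.5": "BMN73PME2SJBOHK9",
    "IR1.6": UNMATCHED_ID,
}


def run_counts(links: Dict[str, str]) -> Tuple[int, int]:
    """Return (references processed, unique matched identities).

    Every reference is counted as processed, but the placeholder ID
    is excluded from the identity count.
    """
    matched_ids = {oid for oid in links.values() if oid != UNMATCHED_ID}
    return len(links), len(matched_ids)


references, identity_count = run_counts(demo_links)
print(f"{references} references, {identity_count} identities")
# prints "6 references, 2 identities"
```

This agrees with the run statistics in step 2: all six references are processed, but only the two identities that were actually matched are counted.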

You may replace the input data in the IdentityResolutionTest.txt file with your own data, and edit the IdentityResolutionSourceDescriptor.xml, IdentityResolutionAttributes.xml, and IdentityResolutionRunScript.xml files to correspond to the new data. Information on each of the XML configurations can be found in the OYSTER Reference Guide.

Back to OYSTER Demonstration Run page
