Demo 8 - StrSplitAssertion

Structure Split Assertion (SplitStr) is a type of assertion in the OYSTER system that forces a single identity structure in an existing knowledge base to be divided into two (2) or more identity structures. It is used to correct false positive matches produced by the OYSTER match rules in previous runs. A SplitStr Assertion forces the identity structure to split and places negative assertion rules in the knowledge base that prevent the newly split identity structures from ever being merged in future runs. The splits are based on prior knowledge of the references in the identity structure.

This run uses the test data file named ‘StrSplitAssertionTest.txt’, illustrated in Figure 1. The data consists of two references, each composed of four attributes. The first attribute is the RefID, a unique identifier associated with each record. The second attribute is the @RID, which specifies the reference in the identity structure that is to be removed. The third attribute is the @OID; it is assigned by the user and is the OysterID, taken from the input identity file, of the identity from which the reference is to be removed. The last attribute is the AssertSplitStr; it is also set by the user, and references that are to be split from the identity specified by the OID value but kept together with each other must share the same AssertSplitStr value.

Figure1.JPG Figure 1: Input file StrSplitAssertionTest.txt
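
The role of the four attributes can be illustrated with a short Python sketch. This is not OYSTER code: the tab delimiter, the header row, and every value in the sample are hypothetical and do not reproduce the contents of StrSplitAssertionTest.txt. The sketch only shows how the RIDs to be removed could be grouped by OID and AssertSplitStr value.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: the delimiter, header names, and values are
# illustrative assumptions, not the contents of StrSplitAssertionTest.txt.
SAMPLE = """RefID\t@RID\t@OID\t@AssertSplitStr
A1\tsrc.R2\tOID_X\tS1
A2\tsrc.R3\tOID_X\tS2
"""

def group_split_assertions(text, delimiter="\t"):
    """Group the RIDs to be split off by (OID, AssertSplitStr value)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        groups[(row["@OID"], row["@AssertSplitStr"])].append(row["@RID"])
    return dict(groups)

groups = group_split_assertions(SAMPLE)
for (oid, label), rids in groups.items():
    print(f"split from {oid}, kept together under {label}: {rids}")
```

In this sample the two records carry different AssertSplitStr values, so each RID forms its own group. Giving both records the same value would place both RIDs in a single group.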

SplitStr Assertion runs do not require any Match Rules to be specified, since OYSTER bases its decisions solely on the values assigned by the user to the RID, OID, and AssertSplitStr fields. Users are, however, required to specify which fields are to be used for Assertions by using the “@RID”, “@OID”, and “@AssertSplitStr” keywords in the StrSplitAssertionSourceDescriptor.xml file.

Enter ‘StrSplitAssertionRunScript.xml’ and press Enter to perform the run as shown in Figure 2.

Figure2.JPG

Information about the run will be displayed in the Command Prompt. The OYSTER run statistics for this run are shown in Figures 3-5.

Figure3.JPG Figure4.JPG Figure5.JPG

Figures 3-5: Output statistics from the Command Prompt.

After the run finishes, the Output folder will contain the AssertionsOutputIdentities.idty, Identity Change Report.txt, AssertionsOutputIdentities.emap, and AssertionsOutputIdentities.indx files, as shown in Figure 6.

Figure6.JPG

OYSTER creates no link index file when running in SplitStr Assertion mode.

A SplitStr Assertion run writes the updated identities to the StrSplitAssertionOutputIdentities.idty file. This file is the updated Identity Knowledge Base, which can be updated and maintained in future runs. The contents of this file are shown in Figure 7.

Figure7.1.JPG Figure7.2.JPG Figure7.3.JPG

Figure 7: StrSplitAssertionOutputIdentities.idty

Note that no rules were defined in the above run; through the SplitStr Assertion, the split identities were assigned a NegStrStr value, which keeps these references from ever matching in subsequent runs. You will also note that, as we continue to update the identity knowledgebase that was originally created by the Identity Capture run, the Modification history now shows the original creation, the Identity Update run, the RefToStr Assertion run, the StrToStr Assertion run, and the current StrSplit Assertion run.

You will notice that the above run caused a single identity with three references to be split into three separate identities. This is due to how the source input was created. If we wanted four of the references to stay in the same identity structure, we could have used the source input shown in Figure 8.

Figure8.jpg
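
The effect of the AssertSplitStr values on the outcome of a split can be modeled in a few lines of Python. This is a conceptual sketch only, not OYSTER's implementation: the function names, the reference IDs, and the way negative assertions are stored (as pairs of list positions) are all assumptions made for illustration, and the NegStrStr format of the .idty file is not modeled.

```python
from itertools import combinations

def split_identity(references, split_labels):
    """Split one identity's references into several identities.

    references:   all reference IDs currently in the identity
    split_labels: maps each reference to be removed to its split label;
                  references sharing a label stay together
    Returns (identities, negative_pairs).
    """
    # References without a label remain in the original identity.
    remaining = [r for r in references if r not in split_labels]
    by_label = {}
    for ref in references:
        if ref in split_labels:
            by_label.setdefault(split_labels[ref], []).append(ref)
    identities = ([remaining] if remaining else []) + list(by_label.values())
    # Record a negative assertion between every pair of resulting identities.
    negative_pairs = {
        frozenset((i, j)) for i, j in combinations(range(len(identities)), 2)
    }
    return identities, negative_pairs

def may_merge(i, j, negative_pairs):
    """A later match between identities i and j is rejected if asserted apart."""
    return frozenset((i, j)) not in negative_pairs

refs = ["R1", "R2", "R3"]  # hypothetical reference IDs
# Distinct labels: each removed reference ends up in its own identity.
three, neg = split_identity(refs, {"R2": "S1", "R3": "S2"})
# Shared label: the removed references stay together in one new identity.
two, _ = split_identity(refs, {"R2": "S1", "R3": "S1"})
print(three)
print(two)
print(may_merge(0, 1, neg))
```

With distinct labels the three references end up in three identities, mirroring the run above. With a shared label the two removed references form one new identity, and in both cases the recorded negative assertions block any later merge of the resulting identities.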

The Identity Change Report for this run is shown in Figure 9.

Figure9.jpg

You may replace the input data and edit the StrSplitSourceDescriptor.xml, StrSplitAttributes.xml, and StrSplitAssertionRunScript.xml files to correspond to your new data. Detailed information for each of the XML configurations can be found in the OYSTER Reference Guide.

In this scenario we removed two of the references from an existing EIS and forced them to be placed into their own EISs. This configuration is used to remove references from an EIS into which they have been falsely matched.

It is important to reiterate that the Merge-Purge run is used solely as a standalone configuration that identifies matches with a source. The RefToRef Assertion run and the Identity Capture run are used to create an initial knowledgebase. Lastly, the only runs that can be performed on an existing identity knowledgebase are the Identity Update run, the RefToStr Assertion run, the StrToStr Assertion run, and the current StrSplit Assertion run.

Back to OYSTER Demonstration Run page
