Structure_Split_Configuration

When OYSTER is configured to build identities from Structure Split assertions, an input identity file is required. A link file and an output identity file must also be specified, as shown in Figure 86.

Figure 86.PNG

OYSTER can use the identity update architecture to split identities based on the assertion information. Structure Split assertions represent knowledge about false positive resolutions made to one or more known entity identities. The identities built through this process can be used as input when performing Identity Resolution or Identity Update, so that the previous knowledge represented by the assertions is enforced in those runs.

Lastly, the RunMode should be set to “AssertSplitStr” as shown in Figure 87.

Figure 87.PNG

Example

Running OYSTER in the Structure Split configuration allows identity information to be asserted, preserved, and input into later OYSTER runs that use the Identity Resolution or Identity Update configuration. These identities can be built from a set of assertion sources that represent knowledge about the entities. This can be used to fix false positive resolutions from previous runs.

For this example, the test source file is named ‘AssertionsSource.txt’, shown in Figure 88. This data consists of four references; each reference is constructed from the following attributes:

  • RefID
  • @RID
  • @OID
  • @AssertSplitStr

Figure 88.PNG
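As a rough illustration of what such a source file could look like, the sketch below builds a four-reference delimited text in Python. The tab delimiter, the presence of a header row, and every value in the rows are assumptions made for illustration only; the real layout is the one shown in Figure 88. The sketch also assumes that references sharing the same @AssertSplitStr value are meant to stay together after the split.

```python
import csv
import io

# Column names taken from the attribute list above.
HEADER = ["RefID", "@RID", "@OID", "@AssertSplitStr"]

# Hypothetical rows: every identifier and value here is made up.
# Assumption: references with the same @AssertSplitStr value stay together.
ROWS = [
    ("A1", "source.1", "OID0001", "S1"),
    ("A2", "source.2", "OID0001", "S1"),
    ("A3", "source.3", "OID0001", "S2"),
    ("A4", "source.4", "OID0001", "S2"),
]


def build_assertion_source(rows, delimiter="\t"):
    """Return the delimited text for an assertion source (header + rows)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


source_text = build_assertion_source(ROWS)
print(source_text)
```

To produce a file on disk, the returned text could be written to ‘AssertionsSource.txt’ with an ordinary `open(..., "w")` call.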

Note that, based on previous knowledge, an Assert, RID, and OID attribute have been included in each source reference. See the reference guide for details on the function of each. Since a Structure Split assertion run is based on previous knowledge of the references, there is no need to analyze the source data. From this knowledge of the source references, the source descriptor file, named “AssertionsSourceDescriptor.xml”, can be created. This file is shown in Figure 89.

Figure 89.PNG

Note that when creating the source descriptor for a SplitStr assertion run, as mentioned earlier, an Assert, OID, and RID attribute are added to each record to represent the previous knowledge. To tell OYSTER that this is a SplitStr assertion run, a predefined keyword must be assigned as the value of the Attribute attribute of the Assert attribute. This keyword is @AssertSplitStr. The @AssertSplitStr keyword forces OYSTER to use SplitStr assertion logic on the source input and to ignore any user-defined matching rules.

As in the previous two examples, once the source descriptor is defined, the source attributes file must also be defined. This file is stored in the Source folder along with the source descriptor file. The attributes file defines the attributes in the source, the algorithms used to compare them, and the matching (identity) rules used when performing ER. For this example run, no matching rules will be defined. Instead, as mentioned earlier, the matching will depend solely on the values of the Assert attribute.

The source attribute file is named ‘AssertionsAttributes.xml’ and is depicted in Figure 90.

Figure 90.PNG

Since the SplitStr attributes consist only of attributes specified by OYSTER keywords, no attributes need to be configured in the attributes file. Note also that, as mentioned earlier, no rule is defined for this run; the Rule tag must still be included, however, or the OYSTER run will fail.

As with the previous two examples, the last file that needs to be created is the RunScript for this example. For the attributes example, no input identity file should be specified in the Run Script but both the output identity file and the link files should be specified. The Run Script should again be stored in the root OYSTER folder as this is where the OYSTER program is expecting the file to reside. The file for this sample is named ‘StrSplitAttributesRunScript.xml’ and is shown in Figure 91.

Figure 91.PNG

Now that all the scripts for the Assertions example have been created we can run OYSTER. This process is depicted in Figure 37, Figure 38, and Figure 39 and described in their surrounding text in the Example section.

Once the run is complete the output for the run will be written to the command box by OYSTER. This output is shown in Figure 92 and Figure 93.

Figure 92.png

Figure 92: Output to Command Box Generated by OYSTER Run - 1

Figure 93.png

Figure 93: Output to Command Box Generated by OYSTER Run - 2

The statistics for this run may be slightly confusing. According to the statistics, OYSTER processed 0 records and found they belong to 5 real-world identities. This is because the run is a SplitStr assertion run: the references were asserted to split, not matched. SplitStr runs generate no link output file.

The entire point of this SplitStr assertion run is to fix false positive resolutions made by previous OYSTER runs. As shown in Figure 94 this caused the specified identity to split into two separate identities.

Figure 94.png

Figure 94: Identity File Created for Identity Build from SplitStr Assertions

Note that the split identities were assigned a NegStrStr value which keeps these references from ever matching on following runs.
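The effect of a split can be pictured as regrouping the asserted references. The sketch below is only a conceptual illustration in Python, not OYSTER's implementation: it assumes each assertion is a (RID, OID, assertion value) triple, that references sharing the same OID and assertion value end up in the same new identity, and all identifiers in the sample data are hypothetical.

```python
from collections import defaultdict


def split_identity(assertions):
    """Group asserted references by (OID, assertion value).

    Each resulting group stands for one identity after the split.
    This is an illustrative model only, not OYSTER's internal logic.
    """
    groups = defaultdict(list)
    for rid, oid, split_value in assertions:
        groups[(oid, split_value)].append(rid)
    return dict(groups)


# Hypothetical assertions: four references that a previous run
# wrongly resolved into the single identity "OID0001".
sample = [
    ("source.1", "OID0001", "S1"),
    ("source.2", "OID0001", "S1"),
    ("source.3", "OID0001", "S2"),
    ("source.4", "OID0001", "S2"),
]

new_identities = split_identity(sample)
for (oid, value), rids in new_identities.items():
    print(oid, value, rids)
```

With this sample data the single original identity is regrouped into two sets of references, which mirrors the two identities described for Figure 94. The sketch does not model the value that keeps the split references from matching again.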

As with the previous examples, this sample run was done using a delimited text file. Examples of how to connect to a Fixed Width text file, a Microsoft Access DB, MySQL, and Microsoft SQLServer can be seen in the OYSTER Reference Guide.

Previous to Structure to Structure Configuration Page

Next to 8 - Identity Resolution Page

Back to OYSTER User Guide Page
