
Reference to Reference Configuration

OYSTER can use the identity capture architecture to build a set of identities from a set of assertions. Reference to Reference assertions represent knowledge about one or more known entity identities. The identities built through this process can be used as input when performing Identity Resolution or Identity Update, forcing a match based on the previous knowledge represented by the assertions.

When OYSTER is configured to build identities from an assertion file, input identities are optional; if they are specified, the newly asserted identities are added to the existing identity structure. A link file and an output identity file must be specified. This is shown in Figure 57.

Figure 57.PNG

Finally, the RunMode should be set to “AssertRefToRef”, as shown in Figure 58.
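The run settings above amount to a short checklist. The sketch below is a minimal Python illustration under stated assumptions: `RunConfig` and `check_ref_to_ref_config` are hypothetical names, not part of OYSTER, and the sketch does not read OYSTER's actual Run Script XML. It only encodes the rules stated on this page: the RunMode must be “AssertRefToRef”, a link file and an output identity file are required, and an input identity file is optional.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunConfig:
    """Hypothetical summary of a RefToRef run's settings (not an OYSTER API)."""
    run_mode: str
    link_file: Optional[str]
    output_identity_file: Optional[str]
    input_identity_file: Optional[str] = None  # optional for assertion runs


def check_ref_to_ref_config(cfg: RunConfig) -> List[str]:
    """Return a list of problems; an empty list means the settings look complete."""
    problems = []
    if cfg.run_mode != "AssertRefToRef":
        problems.append(f"RunMode is {cfg.run_mode!r}, expected 'AssertRefToRef'")
    if not cfg.link_file:
        problems.append("a link file must be specified")
    if not cfg.output_identity_file:
        problems.append("an output identity file must be specified")
    # input_identity_file is deliberately not checked: it may be given or omitted.
    return problems
```

For example, `check_ref_to_ref_config(RunConfig("AssertRefToRef", "links.txt", "out.xml"))` returns an empty list, while leaving out the link file yields one problem message. The file names are placeholders.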

Figure 58.PNG

Example

Running OYSTER in the Reference to Reference Assertions Configuration allows identity information to be asserted, preserved, and used as input to later OYSTER runs in the Identity Resolution or Identity Update Configuration. These identities can be built from a set of assertion sources that represent knowledge about the entities.

For this example, the test source file is named ‘AssertionsSource.txt’, shown in Figure 59. This data consists of four references; each reference is constructed from the following attributes:

  • RefID
  • FirstName
  • LastName
  • DOB
  • SchCode
  • Assert

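To make the record layout concrete, here is a minimal Python sketch of one source reference with the six attributes listed above. It assumes a pipe (`|`) delimiter and a field order matching the list; the actual delimiter and layout of ‘AssertionsSource.txt’ are given by Figure 59 and the source descriptor, so adjust accordingly. `AssertionRef` and `parse_line` are hypothetical helpers for illustration, not OYSTER code.

```python
from dataclasses import dataclass

# Field order assumed to follow the attribute list above.
FIELDS = ("RefID", "FirstName", "LastName", "DOB", "SchCode", "Assert")


@dataclass(frozen=True)
class AssertionRef:
    ref_id: str
    first_name: str
    last_name: str
    dob: str
    sch_code: str
    assert_value: str  # renamed because "assert" is a Python keyword


def parse_line(line: str, delimiter: str = "|") -> AssertionRef:
    """Split one delimited record into the six attributes (delimiter is an assumption)."""
    parts = [p.strip() for p in line.rstrip("\n").split(delimiter)]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {line!r}")
    return AssertionRef(*parts)
```

For example, `parse_line("1|Mary|Smith|1990-01-01|S01|A")` yields a reference whose `assert_value` is `"A"`; these sample values are invented for illustration.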
Figure 59.PNG

Note that, based on previous knowledge, an Assert attribute has been added to the source records. Records known to represent the same entity should share the same Assert value. Because a reference to reference assertion run relies on previous knowledge of the references, there is no need to analyze the source data. Using this knowledge of the source file, the source descriptor file, named “AssertionsSourceDescriptor.xml”, can be created. This file is shown in Figure 60.

Figure 60.PNG

Note that when creating the source descriptor for an assertion run, as mentioned earlier, an Assert attribute is added to each record to represent the previous knowledge. To tell OYSTER that this is a RefToRef assertion run, a predefined keyword must be assigned as the value of the Attribute attribute of the Assert entry. This keyword is @AssertRefToRef, and Figure 60 shows it in use. The @AssertRefToRef keyword forces OYSTER to apply RefToRef assertion logic to the source input and to ignore any user-defined matching rules. Matching occurs only between records whose Assert values in the source file are the same.
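The effect of @AssertRefToRef can be pictured as grouping references by their Assert value. The Python sketch below is a simplified model of that behaviour as described on this page, not OYSTER's actual implementation: every reference sharing an Assert value lands in the same identity, and all other attribute values are ignored. The function name `assert_ref_to_ref` and the `(ref_id, attributes, assert_value)` tuple format are assumptions made for illustration.

```python
from collections import defaultdict


def assert_ref_to_ref(refs):
    """Group references into identities using only their Assert value.

    refs: iterable of (ref_id, attributes_dict, assert_value) tuples.
    Returns a dict mapping each distinct Assert value to the sorted ref IDs sharing it.
    The attributes are never compared, mirroring how no matching rules are applied.
    """
    identities = defaultdict(list)
    for ref_id, _attributes, assert_value in refs:
        identities[assert_value].append(ref_id)
    return {key: sorted(ids) for key, ids in identities.items()}


# Hypothetical references: names differ within each group, but Assert values agree.
refs = [
    ("r1", {"FirstName": "John", "LastName": "Doe"}, "A"),
    ("r2", {"FirstName": "Jon", "LastName": "Smith"}, "A"),
    ("r3", {"FirstName": "Ann", "LastName": "Lee"}, "B"),
    ("r4", {"FirstName": "Anne", "LastName": "Li"}, "B"),
]
print(assert_ref_to_ref(refs))
# {'A': ['r1', 'r2'], 'B': ['r3', 'r4']}
```

Four references with two distinct Assert values collapse into two identities regardless of how dissimilar their names are, which is the kind of outcome an assertion run is meant to force.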

Following the same process as in the previous two examples, once the source descriptor is defined, the source attributes file must also be defined. This file is stored in the Source folder along with the source descriptor file. The attributes file defines the attributes in the source, the algorithms used to compare them, and the matching (Identity) rules used when performing ER. For this example run, no matching rules are defined. Instead, as mentioned earlier, matching depends solely on the values of the Assert attribute.

The source attribute file is named ‘AssertionsAttributes.xml’ and is depicted in Figure 61.

Figure 61.PNG

The number of defined attributes matches the number of distinct Attribute values assigned in the source descriptor. Note also that, as mentioned earlier, no rule is defined for this run, but the Rule tag must still be included or the OYSTER run will fail.

As with the previous two examples, the last file to create is the Run Script. For this assertions example, no input identity file should be specified in the Run Script, but both the output identity file and the link file should be. The Run Script should again be stored in the root OYSTER folder, because that is where the OYSTER program expects it to reside. The file for this example is named ‘RefToRefAttributesRunScript.xml’ and is shown in Figure 62.

Figure 62.PNG

Now that all the files for the Assertions example have been created, we can run OYSTER. This process is depicted in Figure 37, Figure 38, and Figure 39 and described in the text surrounding them in the earlier Example section.

Once the run is complete, OYSTER writes the output for the run to the command box. This output is shown in Figure 63 and Figure 64.

Figure 63.png

Figure 63: Output to Command Box Generated by OYSTER Run - 1

Figure 64.PNG

The statistics for this run may be slightly confusing. According to the statistics, OYSTER processed 0 records and found that they belong to 2 real-world identities, as shown in Figure 65. This is because this is an assertion run: the references were asserted into equivalence rather than matched. The figure also shows that no rules were used for matching; instead, all matching was done through assertion.

Figure 65.PNG

The entire point of this RefToRef assertion run is to build a set of identities that can be used as input when performing Identity Resolution or Identity Update. These identities are constructed from previous knowledge about the references. As shown in Figure 66, the 4 references resolve to 2 identities. Assigning @AssertRefToRef as the Attribute value of the Assert attribute in the source descriptor forced OYSTER to match the records without regard to their other attribute values.

Figure 66.PNG

As with the previous examples, this sample run was done using a delimited text file. Examples of how to connect to a Fixed Width text file, a Microsoft Access DB, MySQL, and Microsoft SQLServer can be seen in the OYSTER Reference Guide.

Previous: 7 - Identity Build from Assertions Page | Next: Reference to Structure Configuration Page

Back to OYSTER User Guide Page
