Structure_to_Structure_Configuration

When OYSTER is configured to build identities from a Structure to Structure assertion run, an input identity file is required because the asserted identities are pulled from the existing idty structure. A link file and an output identity file must also be specified. This is shown in Figure 77.

Figure 77.PNG

OYSTER can use the identity update architecture to build a set of identities from a set of assertions. Structure to Structure Assertions represent knowledge about two or more known entity identities. The identities built through this process can be used as an input when performing Identity Resolution or Identity Update to force a match based on the previous knowledge represented by the assertions.

Lastly, the RunMode should be set to “AssertStrToStr” as shown in Figure 78.

Figure 78.PNG

Example

Running OYSTER in the Structure To Structure Assertions configuration allows identity information to be asserted, preserved, and input into later processes (OYSTER runs) that run in the Identity Resolution or Identity Update Configuration. These identities can be built from a set of assertion sources that represent knowledge about the existing entities.

For this example, the test source file is named ‘AssertionsSource.txt’, shown in Figure 79. This data consists of 2 references; each reference is constructed from the following attributes:

  • RefID
  • OID
  • AssertStrToStrLastName

Figure 79.PNG
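To make the three-attribute layout concrete, the following is a minimal Python sketch of reading such a delimited source into records. The sample rows, the comma delimiter, and the function name `read_assertions` are assumptions made for illustration only; the actual contents of the source file are those shown in Figure 79.

```python
import csv
import io

# Hypothetical sample contents, invented for illustration.
# The real AssertionsSource.txt is the one shown in Figure 79.
SAMPLE = """RefID,OID,AssertStrToStrLastName
A1,OID_0001,1
A2,OID_0002,1
"""


def read_assertions(text, delimiter=","):
    """Parse delimited assertion records into a list of dicts keyed by column name.

    The delimiter is an assumption; adjust it to match the real source file.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


records = read_assertions(SAMPLE)
for rec in records:
    print(rec["RefID"], rec["OID"], rec["AssertStrToStrLastName"])
```

Each record carries its own reference ID, the OYSTER ID of an existing identity, and the assertion value that ties it to other records.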

Note that, based on previous knowledge, an OID and an Assert attribute have been configured to force two identities in the identity source to merge. The Assert attribute value should match for identities that are known to represent the same entity. Since a Structure to Structure assertion run is based on previous knowledge of the identities, there is no need to analyze the source data. Using the knowledge about the source references, the source descriptor file, named “AssertionsSourceDescriptor.xml”, can be created. This file is shown in Figure 80.

Figure 80.PNG

Note that when creating the source descriptor for a Structure to Structure assertion run, as mentioned earlier, an AssertStrToStr attribute and an OID attribute are configured for each record to represent the previous knowledge. To tell OYSTER that this is a StrToStr assertion run, a predefined keyword must be assigned as the value of the Attribute attribute of the Assert item. This keyword is @AssertStrToStr. The keyword @OID must also be specified, and it must correspond to the attribute that holds the OYSTER IDs of the identities to be merged. The @AssertStrToStr keyword forces OYSTER to use StrToStr assertion logic on the source input and to ignore any user-defined matching rules. Matching occurs only when the Assert attribute values in the source file are the same for multiple records.
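The matching behavior described above can be pictured as grouping records by their Assert value. The following Python sketch illustrates that idea only and is not OYSTER's implementation; the function name `group_asserted_oids`, the field names used as defaults, and the example records are all hypothetical.

```python
from collections import defaultdict


def group_asserted_oids(records, assert_field="AssertStrToStrLastName",
                        oid_field="OID"):
    """Group OIDs by shared assertion value.

    Only assertion values that appear on two or more records are returned,
    mirroring the statement that matching occurs only when the Assert value
    is the same for multiple records. No other attribute is consulted.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[assert_field]].append(rec[oid_field])
    return {value: oids for value, oids in groups.items() if len(oids) > 1}


# Hypothetical records, invented for illustration.
example = [
    {"RefID": "A1", "OID": "OID_0001", "AssertStrToStrLastName": "1"},
    {"RefID": "A2", "OID": "OID_0002", "AssertStrToStrLastName": "1"},
    {"RefID": "A3", "OID": "OID_0003", "AssertStrToStrLastName": "2"},
]

print(group_asserted_oids(example))
```

In this sketch the third record shares its assertion value with no other record, so it produces no merge.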

Following the same process as in the previous two examples, once the source descriptor is defined the source attributes file must also be defined. This file is stored in the Source folder along with the source descriptor file. The attributes file normally defines the attributes in the source, the algorithms used to compare them, and the matching (Identity) rules used when performing ER. For this example run, no matching rules are defined. Instead, as mentioned earlier, the matching depends solely on the values of the Assert attribute.

The source attribute file is named ‘AssertionsAttributes.xml’ and is depicted in Figure 81.

Figure 81.PNG

Since a StrToStr assertion is based only on existing identities in the idty file, and the attributes are specified by OYSTER keywords, the attribute file requires no attributes to be defined. As mentioned earlier, no rule is defined for this run, but the Rule tag must still be included or the OYSTER run will fail.

As with the previous two examples, the last file that needs to be created is the Run Script. For a StrToStr assertion run, the input identity file must be specified in the Run Script, because the asserted identities are pulled from it, and both the output identity file and the link file should be specified as well. The Run Script should again be stored in the root OYSTER folder, as this is where the OYSTER program expects the file to reside. The file for this sample is named ‘StrToStrAttributeRunScript.xml’ and is shown in Figure 82.

Figure 82.PNG

Now that all the scripts for the StrToStr Assertions example have been created we can run OYSTER. This process is depicted in Figure 37, Figure 38, and Figure 39 and described in their surrounding text in the Example section.

Once the run is complete the output for the run will be written to the command box by OYSTER. This output is shown in Figure 83 and Figure 84.

Figure 83.PNG

Figure 83: Output to Command Box Generated by OYSTER Run - 1.

Figure 84.png

Figure 84: Output to Command Box Generated by OYSTER Run - 2.

The statistics for this run may be slightly confusing. According to the statistics, OYSTER processed 0 records and found that they belong to 2 real-world identities. This is because the run is an assertion run: the identities were asserted into equivalence, not matched. The output also shows that no rules were used for matching; instead, all matching was done through assertion. StrToStr assertion runs do not generate a link output file.

The entire point of this StrToStr assertion run is to merge a set of identities. These identities are constructed using previous knowledge about the references contained in the identities. As shown in Figure 85, these 4 references resolve to 2 identities. Assigning the @AssertStrToStr Attribute value to the Assert attribute in the source descriptor forced OYSTER to match the records without regard to their other attribute values.

Figure 85.PNG
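The effect of merging identities can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not OYSTER's algorithm: the function name `merge_identities`, the identity and reference IDs, and the choice to keep the first OID of each group as the survivor are all hypothetical.

```python
def merge_identities(identities, oid_groups):
    """Merge existing identities according to asserted OID groups.

    identities: dict mapping OID -> set of reference IDs.
    oid_groups: iterable of OID lists, each asserted to be the same entity.
    Keeping the first OID of each group as the survivor is an assumption
    made for this sketch only.
    """
    merged = {oid: set(refs) for oid, refs in identities.items()}
    for group in oid_groups:
        # Skip OIDs that are unknown or were already absorbed by a prior merge.
        present = [oid for oid in group if oid in merged]
        if len(present) < 2:
            continue
        survivor = present[0]
        for oid in present[1:]:
            merged[survivor] |= merged.pop(oid)
    return merged


# Hypothetical starting identities, invented for illustration.
identities = {
    "OID_A": {"R1"},
    "OID_B": {"R2"},
    "OID_C": {"R3", "R4"},
}

result = merge_identities(identities, [["OID_A", "OID_B"]])
print(len(result), sum(len(refs) for refs in result.values()))
```

The merge changes only which identity each reference belongs to: the total number of references is unchanged, while the number of identities drops.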

As with the previous examples, this sample run was done using a delimited text file. Examples of how to connect to a Fixed Width text file, a Microsoft Access DB, MySQL, and Microsoft SQLServer can be seen in the OYSTER Reference Guide.
