Cluster Diff Node

DataFlux Data Management Studio 2.5: User Guide

You can add a Cluster Diff node to a data job to compare two sets of clustered records. The node takes input from two tables, referred to as the "left" table and the "right" table. From each table, it takes two fields: a record ID field and a cluster number field. If a record in the left table has no matching record in the right table, it is marked as "deleted." If a record in the right table has no matching record in the left table, it is marked as "added."
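The added/deleted rule above can be expressed in a few lines. The following Python sketch is illustrative only and is not the node's implementation. It assumes that each table is supplied as a dictionary that maps a record ID to its cluster number; the function name `classify_records` is hypothetical.

```python
def classify_records(left, right):
    """Label record IDs that appear in only one of the two tables.

    left and right map record ID -> cluster number (a hypothetical input shape).
    """
    status = {}
    for rec_id in left.keys() - right.keys():
        status[rec_id] = "deleted"  # present in the left table only
    for rec_id in right.keys() - left.keys():
        status[rec_id] = "added"    # present in the right table only
    return status


left = {1: 10, 2: 10, 3: 11}
right = {2: 20, 3: 21, 4: 21}
print(sorted(classify_records(left, right).items()))
# [(1, 'deleted'), (4, 'added')]
```

Records 2 and 3 exist in both tables, so they receive no added/deleted label even though their cluster numbers differ; comparing the clusters themselves is the job of the Diff set and Diff type outputs.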

Once you have added the node, you can double-click it to open its properties dialog. The properties dialog includes the following elements:

Name - Specifies a name for the node.

Notes - Opens the Notes dialog, where you can enter optional details or any other relevant information about the node.

The Left table and Right table sections of the dialog include the following elements:

Record ID - Select the field that contains the unique ID for records in the corresponding (left or right) table. This ID field must be numeric, not a string field. If your data does not already have a unique numeric identifier, use the Sequencer (Autonumber) node to create one. The record IDs in the left and right tables must correspond one-to-one. To achieve this, generate record IDs for your data first, and then use a Branch node to route the data to the left-hand and right-hand sides before you set up your clustering criteria.

Cluster number - Use this drop-down list to select the field that contains the cluster numbers for the table. The cluster numbers should have been generated by a Clustering node.
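The record ID requirements can be checked outside the tool before you build the job. The following Python sketch mimics the recommended "number once, then branch" pattern and validates the IDs. It is an illustration under stated assumptions, not DataFlux expression code: `assign_record_ids` stands in for the Sequencer (Autonumber) node, and both function names and the `record_id` field name are hypothetical.

```python
from itertools import count


def assign_record_ids(rows, start=1):
    """Return copies of rows with a sequential numeric record_id field added."""
    next_id = count(start)
    return [dict(row, record_id=next(next_id)) for row in rows]


def check_record_ids(left_ids, right_ids):
    """Raise ValueError if the record IDs break the rules for the Record ID field."""
    for side, ids in (("left", left_ids), ("right", right_ids)):
        # IDs must be numbers, not strings (bool is excluded because it subclasses int).
        if not all(isinstance(i, (int, float)) and not isinstance(i, bool) for i in ids):
            raise ValueError(f"{side} record IDs must be numeric")
        if len(set(ids)) != len(ids):
            raise ValueError(f"{side} record IDs are not unique")
    # Every ID on one side must have exactly one partner on the other side.
    if set(left_ids) != set(right_ids):
        raise ValueError("left and right record IDs do not correspond one-to-one")


# Number the rows once, then send the same numbered rows down both branches.
numbered = assign_record_ids([{"name": "Ann"}, {"name": "Anne"}, {"name": "Bob"}])
left_ids = [row["record_id"] for row in numbered]
right_ids = [row["record_id"] for row in numbered]
check_record_ids(left_ids, right_ids)  # passes: IDs 1, 2, 3 on both sides
```

Numbering before branching is what guarantees the one-to-one correspondence: each branch then applies its own clustering criteria to identical record IDs.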

The Output field section of the dialog includes the following elements:

Diff set - Enables you to enter the name of the field that contains the Diff Set number. The Diff Set number labels sets of records that are related across the left and right table clusters.

Diff type - Enables you to enter the name of the field that contains the Diff Type indicator. The Diff Type value describes the type of change seen in the cluster number comparison. Possible values include COMBINE, DIVIDE, and NETWORK.

Skip rows with "same" diff type - When selected, prevents records with the Diff Type value of "." from being output by the Cluster Diff node. This option is useful when you want your output to include only records whose clusters differ between the left and right tables.

Additional Outputs - Displays the Additional Outputs dialog, where you can specify which fields are made available to the next node in your data job.
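This guide does not spell out how diff sets and diff types are computed. One plausible reading, offered here as an assumption and not as the node's documented algorithm, is to link a left cluster to a right cluster whenever they share a record ID and treat each connected group as a diff set: several left clusters feeding one right cluster is COMBINE, one left cluster split across several right clusters is DIVIDE, many-to-many is NETWORK, and a one-to-one pair is unchanged. The Python sketch below implements that reading with a union-find structure. The function name `cluster_diff`, the `skip_same` parameter (modeled on the "Skip rows with "same" diff type" option), and the placeholder label "SAME" are all hypothetical; the guide itself describes unchanged rows by the Diff Type value ".".

```python
from collections import defaultdict


def cluster_diff(left, right, skip_same=False):
    """Return {record_id: (diff_set, diff_type)} for IDs present in both tables.

    left and right map record ID -> cluster number. This follows an assumed
    interpretation of diff sets, not the documented behavior of the node.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    shared = sorted(left.keys() & right.keys())
    for rec_id in shared:
        # Tag clusters with their side so left cluster 1 differs from right cluster 1.
        a, b = find(("L", left[rec_id])), find(("R", right[rec_id]))
        if a != b:
            parent[a] = b

    # Collect the left and right clusters that belong to each connected group.
    members = defaultdict(lambda: {"L": set(), "R": set()})
    for rec_id in shared:
        root = find(("L", left[rec_id]))
        members[root]["L"].add(left[rec_id])
        members[root]["R"].add(right[rec_id])

    # Number groups in order of first appearance and classify their shape.
    set_number, labels = {}, {}
    for root, sides in members.items():
        n_left, n_right = len(sides["L"]), len(sides["R"])
        if n_left == 1 and n_right == 1:
            kind = "SAME"      # placeholder for an unchanged cluster
        elif n_right == 1:
            kind = "COMBINE"   # several left clusters merged into one
        elif n_left == 1:
            kind = "DIVIDE"    # one left cluster split into several
        else:
            kind = "NETWORK"   # many-to-many regrouping
        set_number[root] = len(set_number) + 1
        labels[root] = kind

    result = {}
    for rec_id in shared:
        root = find(("L", left[rec_id]))
        if skip_same and labels[root] == "SAME":
            continue
        result[rec_id] = (set_number[root], labels[root])
    return result


left = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3, 6: 4, 7: 4}
right = {1: 10, 2: 10, 3: 10, 4: 20, 5: 21, 6: 30, 7: 30}
for rec_id, (diff_set, diff_type) in sorted(cluster_diff(left, right).items()):
    print(rec_id, diff_set, diff_type)
# 1 1 COMBINE
# 2 1 COMBINE
# 3 1 COMBINE
# 4 2 DIVIDE
# 5 2 DIVIDE
# 6 3 SAME
# 7 3 SAME
```

In this example, left clusters 1 and 2 both land in right cluster 10 (COMBINE), left cluster 3 splits into right clusters 20 and 21 (DIVIDE), and left cluster 4 maps unchanged onto right cluster 30. Passing `skip_same=True` drops records 6 and 7, which mirrors the effect of the skip option. Records that exist on only one side are ignored here; the added/deleted rule handles them.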

You can access the following advanced properties by right-clicking the Cluster Diff node:

Documentation Feedback: yourturn@sas.com
Note: Always include the Doc ID when providing documentation feedback.

Doc ID: dfU_PFInt_ClustDiff.html