DataFlux Data Management Studio 2.8: User Guide

Surviving Record Identification Node

You can add a Surviving Record Identification node to a data job to examine clustered data and determine a surviving record for each cluster. This surviving record identification (SRI) process lets you eliminate duplicate information in a data source. The surviving record is identified using one or more user-configurable record rules. Once you have added the node, you can double-click it to open its properties dialog. The properties dialog includes the following elements:

Name - Specifies a name for the node.

Notes - Enables you to open the Notes dialog. You use the dialog to enter optional details or any other relevant information for the input.

Cluster ID Field - Specifies the input field that contains the cluster identifier for the incoming data.

Options - Displays the Options dialog, where you can set the following options:

The Record rules section of the dialog contains rules that are used to determine which record in the cluster should be chosen as the surviving record. The section includes the following elements:

Record rules table - Displays the current record rules that are associated with the selected cluster ID.

Add - Displays the Add Record Rule Expression dialog, which enables you to add record rules.

Edit - Enables you to modify a record rule.

Stop processing after first rule yields records - When selected, stops evaluating further record rules as soon as one rule yields surviving record results. Any sub-rules that accompany that first rule are still processed.

The first record in the cluster is chosen as the surviving record when no rules produce a single surviving record.
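The selection behavior described in this section can be pictured with a small Python sketch. It is an illustration only, not the node's implementation. The names pick_survivor, longest, and RecordRule are hypothetical, and the sketch assumes that each rule narrows the candidates left by the previous rule and that a cluster is never empty. The guide does not confirm either assumption.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
# Hypothetical rule shape: given candidate records, return those that satisfy the rule.
RecordRule = Callable[[List[Record]], List[Record]]


def longest(field: str) -> RecordRule:
    """Illustrative rule: keep the records whose `field` value is longest."""
    def rule(records: List[Record]) -> List[Record]:
        present = [r for r in records if r.get(field) is not None]
        if not present:
            return []
        best = max(len(str(r[field])) for r in present)
        return [r for r in present if len(str(r[field])) == best]
    return rule


def pick_survivor(cluster: List[Record],
                  rules: List[RecordRule],
                  stop_after_first_yield: bool) -> Record:
    """Return the surviving record for one non-empty cluster."""
    candidates = cluster
    for rule in rules:
        matched = rule(candidates)
        if matched:
            # Assumption: a later rule narrows the result of an earlier one.
            candidates = matched
            if stop_after_first_yield:
                break
    if len(candidates) == 1:
        return candidates[0]
    # No rule produced a single survivor: fall back to the first record.
    return cluster[0]


cluster = [
    {"id": 1, "name": "Bob"},
    {"id": 2, "name": "Robert"},
    {"id": 3, "name": "Rob"},
]
print(pick_survivor(cluster, [longest("name")], True))  # {'id': 2, 'name': 'Robert'}
```

With no rules, or with a rule that leaves more than one candidate, the sketch returns the first record in the cluster, which mirrors the fallback described above.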

The Output fields section of the dialog includes the following elements:

Available - Displays the fields that you can make available for the next step in your data job. Items displayed in this list are dependent on your data sources and any preceding steps in your data job.

Selected - Displays the fields that will be made available to the next node in your data job.

Field Rules - Opens the Field Rules dialog, where you can create and maintain field rules composed of rules and expressions. These field rules enable you to substitute values into specific fields of the chosen surviving record.

You can click Add to access the Add Field Rule dialog, which enables you to select and build field rule expressions that consist of fields and conditions that govern their behavior. These field rules can be constructed in the Add Field Rule Expression dialog.

Note: If the fields examined by the "Minimum" or "Shortest" functions contain NULL values, the first such field/row is always selected (the NULL value is used in preference to non-NULL data).

These rules are used to determine which value, from all of the cluster record values for one or more given fields, should be assigned to the field in the surviving record. Note that the updated values are passed along as part of the node's output in the data flow. It is up to a subsequent node in the data flow to make use of these values.
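The NULL behavior described in the note on the "Minimum" and "Shortest" functions can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's code: NULL is modeled as None, the input list is assumed to be non-empty, and the helper names pick_with_null_first, minimum_value, and shortest_value are hypothetical.

```python
from typing import Callable, List, Optional


def pick_with_null_first(values: List[Optional[str]],
                         key: Callable[[str], object]) -> Optional[str]:
    """Pick a value the way the note describes: a NULL beats non-NULL data."""
    for value in values:
        if value is None:
            # The first NULL field/row is selected over any non-NULL value.
            return None
    # No NULLs present: min() returns the first of several equal minima.
    return min(values, key=key)


def minimum_value(values: List[Optional[str]]) -> Optional[str]:
    return pick_with_null_first(values, key=lambda v: v)


def shortest_value(values: List[Optional[str]]) -> Optional[str]:
    return pick_with_null_first(values, key=len)


print(shortest_value(["Robert", None, "Bob"]))   # None
print(shortest_value(["Robert", "Bob", "Rob"]))  # Bob
```

The second call shows the case with no NULL values, where the first of the equally short values is kept.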

If no rules yield a row from which to use a given field's value, then the value in the field of the already chosen surviving record is retained. However, if the rules yield multiple equal candidates from which to use a given field’s value, then the value from the first candidate in the list is chosen. For example, consider the rule Highest occurrence of Field1 and the values listed in the following table:

RowID Field1
0 A
1 B
2 C
3 B
4 C


When this rule is applied, the candidates in the following table are created:

RowID Field1
1 B
2 B
3 C
4 C


The values B and C each occur twice, so they are equal candidates. The value B is used in this case because it is the first candidate among the equal candidates in the list.
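The Highest occurrence example can be reproduced with a short Python sketch. It is only an approximation of the behavior described here: the function name highest_occurrence and its parameters are hypothetical, and the sketch assumes that equal candidates are kept in their original row order.

```python
from collections import Counter
from typing import List, Optional, Tuple


def highest_occurrence(rows: List[Tuple[int, str]],
                       survivor_value: Optional[str]) -> Optional[str]:
    """Return the most frequent field value; ties go to the first candidate."""
    counts = Counter(value for _, value in rows)
    if not counts:
        # No rule yielded a row: keep the surviving record's own value.
        return survivor_value
    top = max(counts.values())
    # Assumption: candidates keep the order of the incoming rows.
    candidates = [(row_id, value) for row_id, value in rows if counts[value] == top]
    return candidates[0][1]


rows = [(0, "A"), (1, "B"), (2, "C"), (3, "B"), (4, "C")]
print(highest_occurrence(rows, survivor_value="A"))  # B
```

When the list of rows is empty, the sketch returns the value already present in the surviving record, which corresponds to the case where no rule yields a row.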

You can access the following advanced properties by right-clicking the Surviving Record Identification node:
