Surviving Record Identification Node

DataFlux Data Management Studio 2.6: User Guide

You can add a Surviving Record Identification node to a data job to examine clustered data and determine a surviving record for each cluster. This surviving record identification (SRI) process lets you eliminate duplicate information in a data source. The surviving record is identified using one or more user-configurable record rules. Once you have added the node, you can double-click it to open its properties dialog. The properties dialog includes the following elements:

Name - Specifies a name for the node.

Notes - Enables you to open the Notes dialog. You use the dialog to enter optional details or any other relevant information for the input.

Cluster ID Field - Specifies the input field that contains the cluster identifier for the incoming data.

Options - Displays the Options dialog, where you can set the following options:

Memory to use for processing each cluster - Specifies the amount of memory allocated for processing each cluster.
Keep duplicate records - When selected, passes all incoming records through to the output of the SRI step. Clear the check box to remove all non-surviving records from the output of the step.
Surviving record ID field - Enables you to enter the name of a new output field that will contain information used to identify the surviving record in a cluster.
Use primary key as surviving record ID - When selected, places the value of the primary key field from the surviving record in the surviving record ID field of all records in the cluster. Clear the check box to write a Boolean value to the surviving record ID field instead: "true" for the surviving record and "false" for the other records in the cluster.
Generate distinct surviving record - When selected, creates a new record that is a copy of the original surviving record and applies the edits from field rules to that copy. This setting enables you to keep the original record unchanged and generate a distinct surviving record.
Primary key field - Enables you to select the input field that contains the primary key values for the incoming data.
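
The interaction between Keep duplicate records and Use primary key as surviving record ID can be illustrated with a small Python sketch. This is not DataFlux code. The function name mark_survivors, the field name SR_ID, and the pick_survivor callable (a stand-in for the record rules) are all hypothetical, and the sketch models only the behavior described in the options above.

```python
from collections import defaultdict


def mark_survivors(records, cluster_field, pk_field, pick_survivor,
                   use_primary_key=True, keep_duplicates=True,
                   sr_id_field="SR_ID"):
    """Annotate clustered records with a surviving record ID field.

    pick_survivor is a hypothetical stand-in for the record rules: it
    receives the list of records in one cluster and must return one of
    those same record objects as the survivor.
    """
    # Group the incoming records by their cluster identifier.
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec[cluster_field]].append(rec)

    output = []
    for members in clusters.values():
        survivor = pick_survivor(members)
        for rec in members:
            # With "Keep duplicate records" cleared, drop non-survivors.
            if not keep_duplicates and rec is not survivor:
                continue
            out = dict(rec)
            if use_primary_key:
                # Every record in the cluster carries the survivor's key.
                out[sr_id_field] = survivor[pk_field]
            else:
                # Boolean flag: True only for the surviving record.
                out[sr_id_field] = rec is survivor
            output.append(out)
    return output
```

For example, with a stand-in rule that picks the record with the longest name, a two-record cluster with primary keys 1 and 2 where record 2 survives produces SR_ID = 2 on both records when the primary key option is selected, and False/True when it is cleared.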

The Record rules section of the dialog contains rules that are used to determine which record in the cluster should be chosen as the surviving record. The section includes the following elements:

Record rules table - Displays the current record rules that are associated with the selected cluster ID.

Add - Displays the Add Record Rule Expression dialog, which enables you to add record rules.

Edit - Enables you to modify a record rule.

Stop processing after first rule yields records - When selected, stops processing the remaining rules if the first rule generates surviving record results. Any sub-rules that accompany the first rule are still processed.

Note that when the surviving record row is part of a rule's result set, the rule takes its values from the surviving record row rather than from the first matching row.
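
As a rough illustration of the Stop processing after first rule yields records option, the Python sketch below evaluates rules in order and stops at the first one that returns any records. It is one possible reading of the option, not the product's implementation. Each rule is modeled as a hypothetical callable that maps the cluster's records to the subset it selects; sub-rules and the behavior when the option is cleared are not modeled.

```python
def first_rule_with_records(cluster, rules):
    """Return (rule_index, matching_records) for the first rule that matches.

    Each rule is a hypothetical callable that takes the list of records
    in a cluster and returns the sublist it selects. Returns (None, [])
    when no rule yields any records.
    """
    for index, rule in enumerate(rules):
        matches = rule(cluster)
        if matches:
            # Stop here: later rules are not evaluated.
            return index, matches
    return None, []
```

In this sketch, a rule list such as "score greater than 10" followed by "email is not empty" falls through to the second rule only when no record in the cluster satisfies the first.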

The Output fields section of the dialog includes the following elements:

Available - Displays the fields that you can make available for the next step in your data job. Items displayed in this list are dependent on your data sources and any preceding steps in your data job.

Selected - Displays the fields that will be made available to the next node in your data job.

Field Rules - Enables you to access the Field Rules dialog, where you can create and maintain field rules. You can click Add to access the Add Field Rule dialog, which enables you to select and build field rule expressions that consist of fields and the conditions that govern their behavior. These field rules can be constructed in the Add Field Rule Expression dialog.

Note: If a field examined by the “Minimum” or “Shortest” function contains NULL values, the first row with a NULL value in that field is always selected, so the NULL value is used in preference to non-NULL data.

These rules are used to determine which value from all of the cluster record values for one or more given fields should be assigned to the field in the surviving record. Note that the updated values are passed along as part of the node's output in the data flow. It is up to a subsequent node in the data flow to do something meaningful with these values.
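
The effect of a field rule, including the NULL behavior noted for “Minimum” and “Shortest”, can be sketched in Python as follows. The helper names null_first and apply_field_rule are hypothetical, NULL is represented by None, and the sketch assumes a non-empty cluster. It models only what is described above, not the product's actual evaluation logic.

```python
def null_first(values, key):
    """Pick the value with the smallest key, but let a NULL (None) win.

    Mirrors the documented note: if any examined value is NULL, the NULL
    is selected over non-NULL data. On ties, min() keeps the first value.
    """
    if any(v is None for v in values):
        return None
    return min(values, key=key)


def apply_field_rule(survivor, cluster, field, choose):
    """Return a copy of the survivor with `field` replaced by the value
    that `choose` selects from all records in the cluster."""
    updated = dict(survivor)
    updated[field] = choose([rec[field] for rec in cluster])
    return updated


# Hypothetical stand-ins for the "Shortest" and "Minimum" functions.
shortest = lambda values: null_first(values, key=len)
minimum = lambda values: null_first(values, key=lambda v: v)
```

Because apply_field_rule returns a copy, the original surviving record is left untouched. Whether the updated values are written anywhere is, as described above, up to a subsequent node in the data flow.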

You can access the following advanced properties by right-clicking the Surviving Record Identification node:

Doc ID: dfU_PFInt_SurvRecdID.html