Transformations Tree

Introduction to Transformations

A transformation is a metadata object that specifies how to extract data, transform data, or load data into data stores. Each transformation that you specify in a process flow diagram generates or retrieves SAS code. You can also specify user-written code in the metadata for any transformation in a process flow diagram.

Overview of the Transformations Tree

The Transformations tree organizes transformations into a set of folders. You can drag a transformation from the Transformations tree to the Job Editor, where you can connect it to source and target tables and update its default metadata. By updating a transformation with the metadata for actual sources, targets, and transformations, you can quickly create process flow diagrams for common scenarios. The following display shows the standard Transformations tree.
Transformations Tree
Transformations Tree
This document has an example of the main transformations used in SAS Data Integration Studio, and the online Help has an example of all transformations. The following sections describe the contents of the Transformations tree folders.

Access Folder

The following table describes the transformations in the Access folder in the Transformations tree.
Access Folder Transformations
Name
Description
File Reader
Reads an external file and writes to a SAS or DBMS table.
File Writer
Reads a SAS or DBMS table and writes to an external file.
Library Contents
Generates an output table that lists the tables contained in an input library. If there is no input library, then the transformation generates a list of tables from all of the libraries that are allocated on the SAS Workspace Server.
Microsoft Queue Reader
Delivers content from a Microsoft MQ message queue to SAS Data Integration Studio. If the message is being sent into a table, the message queue content is sent to a table or a SAS Data Integration Studio transformation. If the message is being sent to a macro variable or file, then these files or macro variables can be referenced by a later step.
Microsoft Queue Writer
Enables writing files in binary mode, tables, or structured lines of text to the WebSphere MQ messaging system. The queue and queue manager objects necessary to get to the messaging system are defined in SAS Management Console.
SPD Server Table Loader
Reads a source and writes to a SAS SPD Server target. Enables you to specify options that are specific to SAS SPD Server tables.
Table Loader
Reads a source table and writes to a target table. Provides more loading options than other transformations that create tables.
Websphere Queue Reader
Delivers content from a WebSphere MQ message queue to SAS Data Integration Studio. If the message is being sent into a table, the message queue content is sent to a table or a SAS Data Integration Studio transformation. If the message is being sent to a macro variable or file, then these files or macro variables can be referenced by a later step.
Websphere Queue Writer
Enables writing files in binary mode, tables, or structured lines of text to the WebSphere MQ messaging system. The queue and queue manager objects necessary to get to the messaging system are defined in SAS Management Console.
XML Writer
Puts data into an XML table. In a SAS Data Integration Studio job, if you want to put data into an XML table, you must use an XML Writer transformation. You cannot use the Table Loader transformation to load an XML table, for example.

Analysis Folder

The following table describes the transformations in the Analysis folder in the Transformations tree.
Analysis Folder Transformations
Name
Description
Correlations
Creates an output table that contains correlation statistics.
Distribution Analysis
Creates an output table that contains a distribution analysis.
Frequency
Creates an output table that contains frequency information.
One-Way Frequency
Creates a one-way output table that contains frequency information about the relationship between two classification variables.
Summary Statistics
Creates an output table that contains summary statistics.
Summary Tables
Creates an output table that contains descriptive statistics in tabular format, using some or all of the variables in a data set. It computes many of the same statistics that are computed by other descriptive statistical procedures such as MEANS, FREQ, and REPORT.

Archived Folder

In order to support backwards compatibility for existing processes and guarantee that processes run exactly as defined using older transformations, SAS has developed a methodology for archiving older versions of transformations in the Process library. The process library continues to surface the archived transformations for some number of releases. When a job is opened that contains a newer transformation replacement, a dialog box displays that indicates the name of the old transformation. The dialog box also provides the name and location of the new transformation in the process library tree.
The following table describes the deprecated and archived transformations in the Archived Transforms folder in the Transformations tree.
Archived Transforms Folder Transformations
Name
Description
Fact Table Lookup
Loads source data into a fact table and translates business keys into generated keys.
This older transformation is marked with a flag on its icon. This flag indicates that the transformation is an older version of an updated transformation.

Change Data Capture Folder

Change data capture (CDC) is a process that shortens the time required to load data from relational databases. The CDC loading process is more efficient because the source table contains changed data only. The changed data table is much smaller than the relational base table. The following table describes the transformations in the Change Data Capture folder in the Transformations tree.
Change Folder Transformations
Name
Description
Attunity CDC
Loads changed data only from Attunity and other selected databases.
DB2 CDC
Loads changed data only from DB2 databases.
General CDC
Loads changed data only from a wide range of databases.
Oracle CDC
Loads changed data only from Oracle databases.

Control Folder

The following table describes the transformations in the Control folder in the Transformations tree.
Control Folder Transformations
Name
Description
Loop
Marks the beginning of the iterative processing sequence in an iterative job.
Loop End
Marks the end of the iterative processing sequence in an iterative job.
Return Code Check
Provides status-handling logic at a desired point in the process flow diagram for a job. Can be inserted between existing transformations and removed later without affecting the mappings in the original process flow.

Data Folder

The following table describes the transformations in the Data Transforms folder in the Transformations tree.
Data Folder Transformations
Name
Description
Append
Creates a single target table by combining data from several source tables.
Data Transfer
Moves data directly from one machine to another. Direct data transfer is more efficient than the default transfer mechanism.
Data Validation
Cleanses data before it is added to a data warehouse or data mart.
Extract
Selects multiple sets of rows from a source and writes those rows to a target. Typically used to create one subset from a source. Can also be used to create columns in a target that are derived from columns in a source.
Key Effective Date
Enables change tracking in intersection tables.
Lookup
Loads a target with columns taken from a source and from several lookup tables.
Mining Results
Integrates a SAS Enterprise Miner model into a SAS Data Integration Studio data warehouse. Typically used to create target tables from a SAS Enterprise Miner model.
Rank
Ranks one or more numeric column variables in the source and stores the ranks in the target.
SCD Type 2 Loader
Loads source data into a dimension table, detects changes between source and target rows, updates change tracking columns, and applies generated key values. This transformation implements slowly changing dimensions.
Sort
Reads data from a source, sorts it, and writes the sorted data to a target.
Splitter
Selects multiple sets of rows from one source and writes each set of rows to a different target. Typically used to create two or more subsets of a source. Can also be used to create two or more copies of a source.
SQL Join
Selects multiple sets of rows from one or more sources and writes each set of rows to a single target. Typically used to merge two or more sources into one target. Can also be used to merge two or more copies of a single source.
Standardize
Creates an output table that contains data standardized to a particular number.
Surrogate Key Generator
Loads a target, adds generated whole number values to a surrogate key column, and sorts and saves the source based on the values in the business key column or columns.
Transpose
Creates an output table that contains transposed data.
User Written Code
Retrieves a user-written transformation. Can be inserted between existing transformations and removed later without affecting the mappings in the original process flow. Can also be used to document the process flow for the transformation so that you can view and analyze the metadata for a user-written transformation, similarly to how you can analyze metadata for other transformations.

Data Quality Folder

The following table describes the transformations in the Data Quality folder in the Transformations tree.
Name
Description
Apply Lookup Standardization
Applies one or more schemes to one or more columns in a source table. Applying schemes modifies your source data according to rules that are defined in the schemes.
Create Match Code
Establish relationships between source rows. You can create match codes at specified levels of sensitivity. You can also assign cluster numbers to groups of source rows that generate the same match codes.
DataFlux IS Job
Executes jobs services on DataFlux Integration Servers from DataFlux, a SAS company. Used to cleanse larger amounts of source data.
DataFlux IS Service
Executes services on DataFlux Integration Servers from DataFlux, a SAS company. Used to synchronously process smaller amounts of data, in coordination with client applications that await a response from the server.

Data Transforms Folder

The Data Transforms folder contains any transformations that have been created with the Transformation Generator wizard and not assigned to a transformation category. The folder is displayed only when a generated transformation is present. It is displayed only to other users when the generated transformations are placed in the Shared Data folder.

Output Folder

The following table describes the transformations in the Output folder in the Transformations tree.
Output Folder Transformations
Name
Description
List Data
Creates an HTML report that contains selected columns from a source table.

Publish Folder

The following table describes the transformations in the Publish folder in the Transformations tree.
Publish Folder Transformations
Name
Description
Publish to Archive
Creates an HTML report and an archive of the report.
Publish to Email
Creates an HTML report and e-mails it to a designated address.
Publish to Queue
Creates an HTML report and publishes it to a queue using MQSeries.

SPD Server Dynamic Cluster Folder

The following table describes the transformations in the SPD Server Dynamic Cluster folder in the Transformations tree.
SPD Server Dynamic Cluster Folder Transformations
Name
Description
Create or Add to a Cluster
Creates or updates an SPD Server cluster table.
List Cluster Contents
Lists the contents of an SPD Server cluster table.
Remove Cluster Definition
Deletes an SPD Server cluster table.