Working with Transformations

Introduction to Transformations

You want to select the right transformation to perform a specific task. The transformation enables you to include that task in a SAS Data Integration Studio job flow.
A transformation is a metadata object that specifies how to extract data, transform data, or load data into data stores. Each transformation that you specify in a process flow diagram generates or retrieves SAS code. You can also specify user-written code in the metadata for any transformation in a process flow diagram.

Overview of the Transformations Tree

The Transformations tree organizes transformations into a set of folders. You can drag a transformation from the Transformations tree to the Job Editor, where you can connect it to source and target tables and update its default metadata. By updating a transformation with the metadata for actual sources, targets, and transformations, you can quickly create process flow diagrams for common scenarios. The following display shows the standard Transformations tree.
Transformations Tree
This document provides examples of the main transformations used in SAS Data Integration Studio, and the online Help provides examples of all transformations. The following sections describe the contents of the Transformations tree folders.

Access Folder

The following table describes the transformations in the Access folder in the Transformations tree.
Access Folder Transformations
Name
Description
DB2 Bulk Table Loader
Used to bulk load SAS and most DBMS source tables to a DB2 target table. For more information, see About the DB2 Bulk Table Loader.
File Reader
Reads an external file and writes to a SAS or DBMS table. For more information, see Using an External File in the Process Flow for a Job.
File Writer
Reads a SAS or DBMS table and writes to an external file. For more information, see Using an External File in the Process Flow for a Job.
Library Contents
Generates an output table that lists the tables contained in an input library. If there is no input library, then the transformation generates a list of tables from all of the libraries that are allocated on the SAS Workspace Server. For more information, see Creating a Control Table.
Microsoft Queue Reader
Delivers content from a Microsoft MQ message queue to SAS Data Integration Studio. If the message is sent to a table, the message queue content is delivered to a table or to a SAS Data Integration Studio transformation. If the message is sent to a macro variable or a file, that macro variable or file can be referenced by a later step. For more information, see Processing a Microsoft Queue.
Microsoft Queue Writer
Enables writing files in binary mode, tables, or structured lines of text to the Microsoft MQ messaging system. The queue and queue manager objects necessary to access the messaging system are defined in SAS Management Console. For more information, see Processing a Microsoft Queue.
Oracle Bulk Table Loader
Enables bulk loading of SAS or Oracle source data into an Oracle target. For more information, see About the Oracle Bulk Table Loader Transformation.
SPD Server Table Loader
Reads a source and writes to a SAS SPD Server target. Enables you to specify options that are specific to SAS SPD Server tables. For more information, see About the SPD Server Table Loader Transformation.
Table Loader
Reads a source table and writes to a target table. Provides more loading options than other transformations that create tables. For more information, see About the Table Loader Transformation.
Teradata Table Loader
Enables you to set table options that are unique to Teradata tables. Also supports the pushdown feature, which enables you to process relational database tables directly on the appropriate relational database server. For more information, see Teradata Table Loader Transformation.
WebSphere Queue Reader
Delivers content from a WebSphere MQ message queue to SAS Data Integration Studio. If the message is sent to a table, the message queue content is delivered to a table or to a SAS Data Integration Studio transformation. If the message is sent to a macro variable or a file, that macro variable or file can be referenced by a later step. For more information, see Processing a WebSphere Queue.
WebSphere Queue Writer
Enables writing files in binary mode, tables, or structured lines of text to the WebSphere MQ messaging system. The queue and queue manager objects necessary to get to the messaging system are defined in SAS Management Console. For more information, see Processing a WebSphere Queue.
XML Writer
Puts data into an XML table. In a SAS Data Integration Studio job, if you want to put data into an XML table, you must use an XML Writer transformation. For example, you cannot use the Table Loader transformation to load an XML table. For more information, see Converting a SAS or DBMS Table to an XML Table.
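Many of these transformations generate straightforward SAS code. As an illustration, the output of a Library Contents step can be approximated by a query against the DICTIONARY.TABLES view; this is a minimal sketch, and the libref MYLIB is an assumption for the example.

```sas
/* Sketch of the kind of code a Library Contents step generates:
   build a control table that lists the tables in an input library.
   MYLIB is a hypothetical libref. */
proc sql;
   create table work.control_table as
   select libname, memname
   from dictionary.tables
   where libname = 'MYLIB';  /* omit the WHERE clause to list all allocated libraries */
quit;
```

In a real job, the generated code also carries metadata registration and status handling that this sketch leaves out.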

Analysis Folder

The following table describes the transformations in the Analysis folder in the Transformations tree.
Analysis Folder Transformations
Name
Description
Correlations
Creates an output table that contains correlation statistics. For more information, see Creating a Correlation Analysis.
Distribution Analysis
Creates an output table that contains a distribution analysis. For more information, see Creating a Distribution Analysis.
Forecasting
Enables you to run the High-Performance Forecasting procedure (PROC HPF) against a warehouse data store. PROC HPF provides a quick and automatic way to generate forecasts for many sets of time series or transactional data. For more information, see Generating Forecasts.
Frequency
Creates an output table that contains frequency information. For more information, see Frequency of Eye Color By Hair Color Crosstabulation.
One-Way Frequency
Creates a one-way output table that contains frequency information about the relationship between two classification variables. For more information, see One-Way Frequency of Eye Color By Region.
Summary Statistics
Creates an output table that contains summary statistics. For more information, see Creating Summary Statistics for a Table.
Summary Tables
Creates an output table that contains descriptive statistics in tabular format, using some or all of the variables in a data set. It computes many of the same statistics that are computed by other descriptive statistical procedures such as MEANS, FREQ, and REPORT. For more information, see Creating a Summary Tables Report from Table Data.
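The Analysis transformations wrap standard Base SAS procedures. For example, the crosstabulation that the Frequency transformation produces corresponds to a PROC FREQ step; this is a sketch only, and the table and column names below are assumptions.

```sas
/* Sketch of the PROC FREQ code behind a Frequency transformation:
   a two-way crosstabulation of eye color by hair color.
   WORK.PEOPLE, EYE_COLOR, and HAIR_COLOR are hypothetical names. */
proc freq data=work.people;
   tables eye_color * hair_color / out=work.freq_output;  /* write counts to an output table */
run;
```

The Summary Statistics and Summary Tables transformations wrap PROC MEANS and PROC TABULATE in the same fashion.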

Archived Folder

To support backward compatibility for existing processes and to guarantee that those processes run exactly as defined with older transformations, SAS archives older versions of transformations in the Process Library. The Process Library continues to surface the archived transformations for several releases. When you open a job that contains an old transformation that has a newer replacement, a dialog box displays the name of the old transformation and provides the name and location of the new transformation in the Process Library tree.
The following table describes the deprecated and archived transformations in the Archived Transforms folder in the Transformations tree.
Archived Transforms Folder Transformations
Name
Description
Fact Table Lookup
Loads source data into a fact table and translates business keys into generated keys.
This older transformation is marked with a flag on its icon. This flag indicates that the transformation is an older version of an updated transformation. For information about the current version, see About Fact Tables.

Change Data Capture Folder

Change data capture (CDC) is a process that shortens the time required to load data from relational databases. The CDC loading process is more efficient because the source table contains changed data only. The changed data table is much smaller than the relational base table. The following table describes the transformations in the Change Data Capture folder in the Transformations tree.
Change Data Capture Folder Transformations
Name
Description
Attunity CDC
Loads changed data only from Attunity and other selected databases. For more information, see Working with Change Data Capture.
DB2 CDC
Loads changed data only from DB2 databases. For more information, see Working with Change Data Capture.
General CDC
Loads changed data only from a wide range of databases. For more information, see Working with Change Data Capture.
Oracle CDC
Loads changed data only from Oracle databases. For more information, see Working with Change Data Capture.

Control Folder

The following table describes the transformations in the Control folder in the Transformations tree.
Control Folder Transformations
Name
Description
Loop
Marks the beginning of the iterative processing sequence in an iterative job. For more information, see Creating and Running an Iterative Job.
Loop End
Marks the end of the iterative processing sequence in an iterative job. For more information, see Creating and Running an Iterative Job.
Return Code Check
Provides status-handling logic at a desired point in the process flow diagram for a job. Can be inserted between existing transformations and removed later without affecting the mappings in the original process flow. For more information, see Perform Actions Based on the Status of a Transformation.

Data Folder

The following table describes the transformations in the Data Transforms folder in the Transformations tree.
Data Folder Transformations
Name
Description
Append
Creates a single target table by combining data from several source tables. For more information, see Creating a Table That Appends Two or More Source Tables.
Business Rules
Enables you to use the business rule flow packages that are created in SAS Business Rules Manager in the context of a SAS Data Integration Studio job. You can import business rule flows, specify flow versions, map source table columns to required input columns, and set business rule options. The Business Rules transformation enables you to map your source data and output data into and out of the rules package. Then, the SAS Data Integration Studio job applies the rules to your data as it is run. When you run a job that includes a rules package, statistics are collected, such as the number of rules that were triggered, and the number of invalid and valid data record values. You can use this information to further refine your data as it flows through your transformation logic. For more information, see Using a Business Rule Flow in a Job.
Compare Tables
Enables you to detect changes between two tables such as an update table and a master table and generate a variety of output for matched and unmatched records. For more information, see Comparing Tables.
Data Transfer
Moves data directly from one machine to another. Direct data transfer is more efficient than the default transfer mechanism. For more information, see Moving Data Directly from One Machine to Another Machine.
Data Validation
Cleanses data before it is added to a data warehouse or data mart. For more information, see Validating Product Data.
Key Effective Date
Enables change tracking in intersection tables. For more information, see Tracking Changes in Source Datetime Values.
Lookup
Loads a target with columns taken from a source and from several lookup tables. For more information, see Loading a Fact Table Using Dimension Table Lookup.
Mining Results
Integrates a SAS Enterprise Miner model into a SAS Data Integration Studio data warehouse. Typically used to create target tables from a SAS Enterprise Miner model. For more information, see Integrating a SAS Enterprise Miner Model with Existing SAS Data.
Rank
Ranks one or more numeric column variables in the source and stores the ranks in the target. For more information, see Create a Table That Ranks the Contents of a Source.
SCD Type 1 Loader
Enables you to load a dimension table using type 1 updates. Type 1 updates insert new rows, update existing rows, and generate surrogate key values in a dimension table without maintaining a history of data changes. Each business key is represented by a single row in the dimension table. For more information, see Loading a Dimension Table with Type 1 Updates.
SCD Type 2 Loader
Loads source data into a dimension table, detects changes between source and target rows, updates change tracking columns, and applies generated key values. This transformation implements slowly changing dimensions. For more information, see Loading a Dimension Table with Type 1 and 2 Updates.
Sort
Reads data from a source, sorts it, and writes the sorted data to a target. For more information, see Creating a Table That Contains the Sorted Contents of a Source.
Splitter
Selects multiple sets of rows from one source and writes each set of rows to a different target. Typically used to create two or more subsets of a source. Can also be used to create two or more copies of a source. For more information, see Create Two Tables That Are Subsets of a Source.
Standardize
Creates an output table that contains data standardized to a particular mean and standard deviation. For more information, see Creating Standardized Statistics from Table Data.
Surrogate Key Generator
Loads a target, adds generated whole number values to a surrogate key column, and sorts and saves the source based on the values in the business key column or columns. For more information, see Loading a Table and Adding a Surrogate Primary Key.
Transpose
Creates an output table that contains transposed data. For more information, see Creating Transposed Data from Table Data.
User Written Code
Retrieves a user-written transformation. Can be inserted between existing transformations and removed later without affecting the mappings in the original process flow. Can also be used to document the process flow for the transformation so that you can view and analyze the metadata for a user-written transformation, similarly to how you can analyze metadata for other transformations. For more information, see Adding a User Written Code Transformation to a Job.
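Several of the Data transformations generate simple DATA step logic. A Splitter, for instance, corresponds to a DATA step with multiple output tables; this is a sketch under assumed names, not the exact generated code.

```sas
/* Sketch of Splitter-style code: write two subsets of one source
   to two different targets. WORK.SALES and the REGION column
   are hypothetical. */
data work.sales_east work.sales_west;
   set work.sales;
   if region = 'EAST' then output work.sales_east;
   else if region = 'WEST' then output work.sales_west;
run;
```

Writing every row to both targets instead of routing by condition would produce the "two copies of a source" behavior that the Splitter description mentions.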

Data Quality Folder

The following table describes the transformations in the Data Quality folder in the Transformations tree. In general, you can use Apply Lookup Standardization, Create Match Code, and Standardize with Definition for data cleansing operations. You can use DataFlux Batch Job and DataFlux Data Service to perform tasks that are a specialty of DataFlux software, such as profiling, monitoring, or address verification.
Name
Description
Apply Lookup Standardization
Enables you to select and apply DataFlux schemes that standardize the format, casing, and spelling of character columns in a source table. For more information, see Standardizing Values with a Standardization Scheme.
Create Match Code
Enables you to analyze source data and generate match codes based on common information shared by clusters of records. Comparing match codes instead of actual data enables you to identify records that are in fact the same entity, despite minor variations in the data. For more information, see Using Match Codes to Improve Record Matching.
DataFlux Batch Job
Enables you to select and execute a DataFlux job that is stored on a DataFlux Data Management Server. You can execute DataFlux Data Management Studio data jobs, process jobs, and profiles. You can also execute Architect jobs that were created with DataFlux® dfPower® Studio. For more information, see Executing a DataFlux Job or Profile from SAS Data Integration Studio.
DataFlux Data Service
Enables you to select and execute a data job that has been configured as a real-time service and deployed to a DataFlux Data Management Server. For more information, see Using a DataFlux Data Service in a Job.
Standardize with Definition
Enables you to select and apply DataFlux standardization definitions to elements within a text string. For example, you might want to change all instances of “Mister” to “Mr.” but only when “Mister” is used as a salutation. For more information, see Standardizing Values with a Definition.
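When SAS Data Quality Server is licensed, match codes like those produced by the Create Match Code transformation can be generated in a DATA step with the DQMATCH function. This is a sketch; the table, column, sensitivity value, and locale are assumptions, and the locale must be loaded in the session.

```sas
/* Sketch of match-code generation with the DQMATCH function
   (requires SAS Data Quality Server and a loaded locale).
   WORK.CUSTOMERS and NAME are hypothetical names. */
data work.customers_mc;
   set work.customers;
   /* match definition 'Name', sensitivity 85, ENUSA locale */
   match_code = dqMatch(name, 'Name', 85, 'ENUSA');
run;
```

Records that share a match code can then be treated as candidate duplicates despite minor spelling or formatting differences.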

Hadoop Folder

The following table describes the transformations in the Hadoop folder in the Transformations tree. Apache Hadoop is an open-source software project that supports scalable, distributed computing.
Name
Description
Hadoop Container
Enables you to connect all the sources and targets for the various steps in a container step. This container step allows for one connection to the Hadoop Cluster in the context of a SAS Data Integration Studio job. Then, all of the steps that are included in the container are submitted during the connection. For more information, see Creating a Hadoop Container Job.
Hadoop File Reader
Supports reading files from the Hadoop Cluster into SAS in the context of a SAS Data Integration Studio job.
Hadoop File Writer
Supports writing files from SAS to the Hadoop Cluster in the context of a SAS Data Integration Studio job.
Hive
Supports submitting Hive code to the Hadoop Cluster in the context of a SAS Data Integration Studio job. Hive is a data warehouse system for Hadoop. You can easily summarize data, run ad hoc queries, and analyze large data sets stored in Hadoop-compatible file systems. Hive also enables you to project structure onto this data and query the data by using an SQL-like language called HiveQL. For more information, see Creating a Hive Job.
Map Reduce
Supports submitting Map Reduce code to the Hadoop Cluster in the context of a SAS Data Integration Studio job. Hadoop Map Reduce enables you to write applications that reliably process vast amounts of data in parallel on large clusters. A Map Reduce job or step splits the input data set into chunks that are processed by the map tasks in parallel. The outputs of the maps are sorted and then input to the reduce tasks. The input and the output of the job are typically stored in a file system. For an example of Map Reduce processing in a Hadoop container job, see Creating a Hadoop Container Job.
Pig
Supports submitting Pig code to the Hadoop Cluster in the context of a SAS Data Integration Studio job. The transformation contains an enhanced, color-coded editor specific to the Pig Latin language. Pig Latin is a high-level language used for expressing and evaluating data analysis programs. Pig Latin supports substantial parallelization and can handle very large data sets. For more information, see Creating a Pig Job.
Transfer From Hadoop
Supports the transfer of data from the Hadoop Cluster in the context of a SAS Data Integration Studio job.
Transfer To Hadoop
Supports the transfer of data to the Hadoop Cluster in the context of a SAS Data Integration Studio job.
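Outside of these transformations, Base SAS also provides PROC HADOOP for submitting HDFS commands. The sketch below shows a transfer from the cluster, similar in spirit to Transfer From Hadoop; the configuration file, credentials, and paths are all placeholders, and a working Hadoop connection is assumed.

```sas
/* Sketch only: copy a file from HDFS to the local file system.
   The cfg file, user name, password, and paths are placeholders
   for a site-specific Hadoop configuration. */
proc hadoop cfg='C:\hadoop\config.xml'
            username='sasuser' password='XXXX';
   hdfs copytolocal='/user/sasuser/sales.txt'
        out='C:\data\sales.txt';
run;
```

The Transfer To Hadoop direction corresponds to the COPYFROMLOCAL= form of the HDFS statement.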

High-Performance Analytics Folder

The following table describes the transformations in the High-Performance Analytics folder in the Transformations tree. SAS LASR Analytic Server is a direct-access, NoSQL, NoMDX server that is engineered for maximum analytic performance through multithreading and distributed computing. SAS Data Integration Studio provides the following experimental Hadoop transformations for SAS LASR Analytic Servers:
Name
Description
SAS Data in HDFS Loader
Used to stage data into a Hadoop cluster.
SAS Data in HDFS Unloader
Loads data from Hadoop into a SAS LASR Analytic Server.
SAS LASR Analytic Server Loader
Loads data to a SAS LASR Analytic Server.
SAS LASR Analytic Server Unloader
Unloads data that has previously been loaded into a SAS LASR Analytic Server.
For more information about these experimental transformations, contact SAS Technical Support.

Output Folder

The following table describes the transformations in the Output folder in the Transformations tree.
Output Folder Transformations
Name
Description
List Data
Creates an HTML report that contains selected columns from a source table. For more information, see Creating Reports from Table Data.

Publish Folder

The following table describes the transformations in the Publish folder in the Transformations tree.
Publish Folder Transformations
Name
Description
Publish to Archive
Creates an HTML report and an archive of the report. For more information, see Creating a Publish to Archive Report from Table Data.
Publish to Email
Creates an HTML report and emails it to a designated address. For more information, see Creating a Publish to Email Report from Table Data.
Publish to Queue
Creates an HTML report and publishes it to a queue using MQSeries. For more information, see Creating a Publish to Queue Report from Table Data.
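The Publish transformations build on SAS publishing facilities. A rough hand-coded analogue of Publish to Email combines the EMAIL access method with ODS HTML; this is a sketch only, the address, subject, and table name are placeholders, and a mail server configured for the SAS session is assumed.

```sas
/* Sketch only: render a table as HTML and send it by e-mail.
   The address, subject, and WORK.SALES_SUMMARY are placeholders. */
filename report email
   to='analyst@example.com'
   subject='Daily summary'
   content_type='text/html';

ods html body=report;
proc print data=work.sales_summary noobs;
run;
ods html close;

filename report clear;  /* release the fileref */
```

The actual transformation routes the report through the Publishing Framework rather than a raw fileref, but the generated HTML is comparable.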

SPD Server Dynamic Cluster Folder

The following table describes the transformations in the SPD Server Dynamic Cluster folder in the Transformations tree.
SPD Server Dynamic Cluster Folder Transformations
Name
Description
Create or Add to a Cluster
Creates or updates an SPD Server cluster table. For more information, see Creating an SPD Server Cluster Table.
List Cluster Contents
Lists the contents of an SPD Server cluster table. For more information, see Maintaining an SPD Server Cluster.
Remove Cluster Definition
Deletes an SPD Server cluster table. For more information, see Maintaining an SPD Server Cluster.

SQL Folder

The following table describes the transformations in the SQL folder in the Transformations tree. For more information, see Working with SQL Join Transformations and Working with Other SQL Transformations.
SQL Folder Transformations
Name
Description
Create Table
Provides a simple SQL interface for creating tables.
Delete
Generates a PROC SQL statement that deletes user-selected rows in a single target table. Supports delete, truncate, or delete with a WHERE clause. Also supports implicit and explicit pass-through.
Execute
Enables you to specify custom SQL code to be executed and provides SQL templates for supported databases.
Extract
Selects multiple sets of rows from a source and writes those rows to a target. Typically used to create one subset from a source. Can also be used to create columns in a target that are derived from columns in a source. For more information, see Extracting Data from a Source Table.
Insert Rows
Provides a simple SQL interface for inserting rows into a target table. For more information, see Inserting Rows into a Target Table.
Join
Selects multiple sets of rows from one or more sources and writes each set of rows to a single target. Typically used to merge two or more sources into one target. Can also be used to merge two or more copies of a single source. For more information, see Creating a Simple SQL Query.
Merge
Inserts new rows and updates existing rows by using the SQL MERGE DML command. The command was officially introduced in the SQL:2003 standard.
Set Operators
Enables you to use set operators to combine the results of table-based queries. For more information, see Using the SQL Set Operators Transformation.
Update
Updates user-selected columns in a single target table. The target columns can be updated by case, constant, expression, or subquery. Handles correlated subqueries.
Note: Some functions in the Delete, Execute, Insert Rows, Merge, and Update transformations might work only when the table comes from a database management system that implements the corresponding SQL command and for which a SAS/ACCESS interface is available. Sorting is one example. You can still use SAS tables and tables from database management systems that do not implement the command, but these command-specific functions might not work.
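The SQL transformations generate PROC SQL code. For example, a simple inner join built with the Join transformation corresponds to a step like the following; this is a sketch, and all table and column names are assumptions.

```sas
/* Sketch of the PROC SQL code behind a simple Join:
   merge two sources into one target on a shared key.
   All names are hypothetical. */
proc sql;
   create table work.orders_enriched as
   select o.order_id,
          o.amount,
          c.customer_name
   from work.orders as o
        inner join work.customers as c
        on o.customer_id = c.customer_id;
quit;
```

When both sources live in the same DBMS, the transformation can push a query like this down to the database through implicit pass-through.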

Ungrouped Folder

The Ungrouped folder contains any transformations that were created with the Transformation Generator wizard but not assigned to a transformation category. The folder is displayed only when a generated transformation is present. Generated transformations are displayed to other users only when they are placed in the Shared Data folder.