Overview of Transformations

Introduction to Transformations

Selecting the right transformation for a specific task enables you to include that task in a SAS Data Integration Studio job flow.
A transformation is a metadata object that specifies how to extract data, transform data, or load data into data stores. Each transformation that you specify in a process flow diagram generates or retrieves SAS code. You can also specify user-written code in the metadata for any transformation in a process flow diagram.

Overview of the Transformations Tree

The Transformations tree organizes transformations into a set of folders. You can drag a transformation from the Transformations tree to the Job Editor, where you can connect it to source and target tables and update its default metadata. By updating a transformation with the metadata for actual sources and targets, you can quickly create process flow diagrams for common scenarios. The following display shows the standard Transformations tree.
Transformations Tree
This document provides examples of the main transformations used in SAS Data Integration Studio, and the online Help provides examples of all transformations. The following sections describe the contents of the Transformations tree folders.

Access Folder

The following table describes the transformations in the Access folder in the Transformations tree.
Access Folder Transformations
Name
Description
Cloud Analytic Services Transfer
Used to move data to and from the SAS Viya platform using the Cloud Analytic Services LIBNAME engine.
DB2 Bulk Table Loader
Used to bulk load SAS and most DBMS source tables to a DB2 target table. For more information, see About the DB2 Bulk Table Loader.
Download File from Amazon S3
Enables you to download files from Amazon S3. For more information, see Working with Amazon S3 Files.
File Reader
Reads an external file and writes to a SAS or DBMS table (see the sketch after this table). For more information, see Using an External File in the Process Flow for a Job.
File Writer
Reads a SAS or DBMS table and writes to an external file. For more information, see Using an External File in the Process Flow for a Job.
Library Contents
Generates an output table that lists the tables contained in an input library. If there is no input library, then the transformation generates a list of tables from all of the libraries that are allocated on the SAS Workspace Server. For more information, see Creating a Control Table.
Microsoft Queue Reader
Delivers content from a Microsoft MQ message queue to SAS Data Integration Studio. If the message is sent to a table, the queue content can feed a table or another SAS Data Integration Studio transformation. If the message is sent to a macro variable or a file, a later step can reference those macro variables or files. For more information, see Processing a Microsoft Queue.
Microsoft Queue Writer
Enables writing files in binary mode, tables, or structured lines of text to the Microsoft MQ messaging system. The queue and queue manager objects necessary to get to the messaging system are defined in SAS Management Console. For more information, see Processing a Microsoft Queue.
Oracle Bulk Table Loader
Enables bulk loading of SAS or Oracle source data into an Oracle target. For more information, see About the Oracle Bulk Table Loader Transformation.
REST
Enables you to use the REST approach to read from and write to a third-party web service in the context of a SAS Data Integration Studio job. For more information, see Using REST to Access a Third-Party Web Service.
sFTP
Provides a secure connection for file transfers between two hosts. For more information, see Using the sFTP Transformation to Securely Access and Transfer Data.
SOAP
Enables you to use the SAS SOAP procedure to read from and write to a third-party web service in the context of a SAS Data Integration Studio job. For more information, see Using SOAP to Access a Third-Party Web Service.
SPD Server Table Loader
Reads a source and writes to a SAS SPD Server target. Enables you to specify options that are specific to SAS SPD Server tables. For more information, see About the SPD Server Table Loader Transformation.
Table Loader
Reads a source table and writes to a target table. Provides more loading options than other transformations that create tables. For more information, see About the Table Loader Transformation.
Teradata Table Loader
Enables you to set table options that are unique to Teradata tables. Also supports the pushdown feature, which enables you to process relational database tables directly on the appropriate relational database server. For more information, see Teradata Table Loader Transformation.
Upload File to Amazon S3
Enables you to upload files to Amazon S3. For more information, see Working with Amazon S3 Files.
WebSphere Queue Reader
Delivers content from a WebSphere MQ message queue to SAS Data Integration Studio. If the message is sent to a table, the queue content can feed a table or another SAS Data Integration Studio transformation. If the message is sent to a macro variable or a file, a later step can reference those macro variables or files. For more information, see Processing a WebSphere Queue.
WebSphere Queue Writer
Enables writing files in binary mode, tables, or structured lines of text to the WebSphere MQ messaging system. The queue and queue manager objects necessary to get to the messaging system are defined in SAS Management Console. For more information, see Processing a WebSphere Queue.
XML Writer
Puts data into an XML table. In a SAS Data Integration Studio job, if you want to put data into an XML table, you must use an XML Writer transformation. For example, you cannot use the Table Loader transformation to load an XML table. For more information, see Converting a SAS or DBMS Table to an XML Table.
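
Like other transformations, those in the Access folder generate SAS code behind the scenes. As a point of reference, the following minimal sketch suggests the general style of code that a File Reader transformation might produce for a delimited external file. It is a hypothetical example: the library, file paths, and column layout are assumptions, not code emitted by the product.

libname stage "/data/stage";   /* assumed target SAS library */

data stage.customers;
   /* read a comma-delimited external file, skipping the header row */
   infile "/data/incoming/customers.csv" dsd firstobs=2 truncover;
   length cust_id 8 name $40 city $30;
   input cust_id name city;
run;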

Analysis Folder

The following table describes the transformations in the Analysis folder in the Transformations tree.
Analysis Folder Transformations
Name
Description
Correlations
Creates an output table that contains correlation statistics. For more information, see Creating a Correlation Analysis.
Distribution Analysis
Creates an output table that contains a distribution analysis. For more information, see Creating a Distribution Analysis.
Forecasting
Enables you to run the High-Performance Forecasting procedure (PROC HPF) against a warehouse data store. PROC HPF provides a quick and automatic way to generate forecasts for many sets of time series or transactional data. For more information, see Generating Forecasts.
Frequency
Creates an output table that contains frequency information. For more information, see Frequency of Eye Color By Hair Color Crosstabulation.
One-Way Frequency
Creates a one-way output table that contains frequency information about the relationship between two classification variables. For more information, see One-Way Frequency of Eye Color By Region.
Summary Statistics
Creates an output table that contains summary statistics (see the sketch after this table). For more information, see Creating Summary Statistics for a Table.
Summary Tables
Creates an output table that contains descriptive statistics in tabular format, using some or all of the variables in a data set. It computes many of the same statistics that are computed by other descriptive statistical procedures such as MEANS, FREQ, and REPORT. For more information, see Creating a Summary Tables Report from Table Data.
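
A minimal sketch of the kind of code that the Summary Statistics transformation might generate follows. It uses PROC MEANS; the table, class variable, and analysis variable are hypothetical.

proc means data=work.orders n mean min max std;
   class region;                               /* one summary row per region  */
   var sale_amount;                            /* numeric column to summarize */
   output out=work.order_stats mean=avg_sale;  /* summary statistics target   */
run;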

Archived Folder

To support backward compatibility for existing processes and to guarantee that processes run exactly as defined with older transformations, SAS archives older versions of transformations in the Process Library. The Process Library continues to surface the archived transformations for some number of releases. When you open a job that contains an older transformation that has a newer replacement, a dialog box displays the name of the old transformation and provides the name and location of the new transformation in the Process Library tree.
The following table describes the deprecated and archived transformations in the Archived Transforms folder in the Transformations tree.
Archived Transforms Folder Transformations
Name
Description
Fact Table Lookup
Loads source data into a fact table and translates business keys into generated keys.
This older transformation is marked with a flag on its icon, which indicates that it is an older version of an updated transformation. For information about the current version, see About Fact Tables.

Change Data Capture Folder

Change data capture (CDC) is a process that shortens the time required to load data from relational databases. The CDC loading process is more efficient because the source table contains changed data only. The changed data table is much smaller than the relational base table. The following table describes the transformations in the Change Data Capture folder in the Transformations tree.
Change Data Capture Folder Transformations
Name
Description
Attunity CDC
Loads changed data only from Attunity and other selected databases. For more information, see Working with Change Data Capture.
DB2 CDC
Loads changed data only from DB2 databases. For more information, see Working with Change Data Capture.
General CDC
Loads changed data only from a wide range of databases; the sketch after this table illustrates the general changed-data load pattern. For more information, see Working with Change Data Capture.
Oracle CDC
Loads changed data only from Oracle databases. For more information, see Working with Change Data Capture.
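
The CDC transformations generate database-specific code, but the general changed-data load pattern can be sketched in PROC SQL. This is a generic, hypothetical illustration, not the code that any CDC transformation actually emits; the librefs, tables, and key column are assumptions.

proc sql;
   /* remove base rows that have a newer version in the changed-data table */
   delete from dw.customer_base
      where cust_id in (select cust_id from stage.customer_changes);

   /* apply the changed rows to the base table */
   insert into dw.customer_base
      select * from stage.customer_changes;
quit;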

Control Folder

The following table describes the transformations in the Control folder in the Transformations tree.
Control Folder Transformations
Name
Description
Conditional End
Marks the end of a conditional process in a job. For more information, see Running Conditional Processes.
Conditional Start
Marks the beginning of a conditional process in a job. For more information, see Running Conditional Processes.
Fork
Marks the beginning of a separate session that enables a portion of a SAS job to run in parallel with other SAS code. For more information, see Parallel Processing Using the Fork, Fork End, and Wait for Completion Transformations.
Fork End
Marks the end of a portion of a SAS job that was running in parallel with another portion of that job. Any code between the Fork transformation and the Fork End transformation is executed in one SAS session. For more information, see Parallel Processing Using the Fork, Fork End, and Wait for Completion Transformations.
Loop
Marks the beginning of the iterative processing sequence in an iterative job. For more information, see Creating and Running an Iterative Job.
Loop End
Marks the end of the iterative processing sequence in an iterative job. For more information, see Creating and Running an Iterative Job.
Return Code Check
Provides status-handling logic at a desired point in the process flow diagram for a job (see the sketch after this table). Can be inserted between existing transformations and removed later without affecting the mappings in the original process flow. For more information, see Perform Actions Based on the Status of a Transformation.
Wait For Completion
Provides logic that enables any or all parts of a Fork job to complete before the job moves on to the next processing step. The input is typically an output table from the Fork transformation. For more information, see Parallel Processing Using the Fork, Fork End, and Wait for Completion Transformations.
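
The following sketch suggests status-handling logic in the spirit of the Return Code Check transformation: it tests the SYSCC automatic macro variable after a step and cancels the job on error. The macro name and threshold are assumptions, and the code generated in an actual job is more elaborate.

%macro check_status;
   %if &syscc > 4 %then %do;
      %put ERROR: Previous step failed with condition code &syscc..;
      %abort cancel;   /* stop the submitted job */
   %end;
%mend check_status;

proc sort data=sashelp.class out=work.class_sorted;
   by age;
run;

%check_status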

Data Folder

The following table describes the transformations in the Data folder in the Transformations tree.
Data Folder Transformations
Name
Description
Append
Creates a single target table by combining data from several source tables. For more information, see Creating a Table That Appends Two or More Source Tables.
Business Rules
Enables you to use the business rule flow packages that are created in SAS Business Rules Manager in the context of a SAS Data Integration Studio job. You can import business rule flows, specify flow versions, map source table columns to required input columns, and set business rule options. The Business Rules transformation maps your source data and output data into and out of the rules package, and the job applies the rules to your data when it runs. When you run a job that includes a rules package, statistics are collected, including the number of rules that were triggered and the number of valid and invalid data record values. You can use this information to further refine your data as it flows through your transformation logic. For more information, see Using a Business Rule Flow in a Job.
Compare Tables
Enables you to detect changes between two tables, such as an update table and a master table, and to generate a variety of outputs for matched and unmatched records. For more information, see Comparing Tables.
Data Transfer
Moves data directly from one machine to another. Direct data transfer is more efficient than the default transfer mechanism. For more information, see Moving Data Directly from One Machine to Another Machine.
Data Validation
Cleanses data before it is added to a data warehouse or data mart. For more information, see Validating Product Data.
Enterprise Decision Management
Maps physical data from an Enterprise Decision Management flow package to decision flows. The output tables attached to the transformation produce decision-making results from the mapped input data. For more information, see Generating Enterprise Decision Management Output.
Key Effective Date
Enables change tracking in intersection tables. For more information, see Tracking Changes in Source Datetime Values.
Lookup
Loads a target with columns taken from a source and from several lookup tables (see the sketch after this table). For more information, see Loading a Fact Table Using Dimension Table Lookup.
Model Scoring
Integrates a SAS Enterprise Miner model into a SAS Data Integration Studio data warehouse. Typically used to create target tables from a SAS Enterprise Miner model. For more information, see Integrating a SAS Enterprise Miner Model with Existing SAS Data.
Rank
Ranks one or more numeric column variables in the source and stores the ranks in the target. For more information, see Create a Table That Ranks the Contents of a Source.
SCD Type 1 Loader
Enables you to load a dimension table using type 1 updates. Type 1 updates insert new rows, update existing rows, and generate surrogate key values in a dimension table without maintaining a history of data changes. Each business key is represented by a single row in the dimension table. For more information, see Loading a Dimension Table with Type 1 Updates.
SCD Type 2 Loader
Loads source data into a dimension table, detects changes between source and target rows, updates change tracking columns, and applies generated key values. This transformation implements slowly changing dimensions. For more information, see Loading a Dimension Table with Type 1 and 2 Updates.
Sort
Reads data from a source, sorts it, and writes the sorted data to a target. For more information, see Creating a Table That Contains the Sorted Contents of a Source.
Splitter
Selects multiple sets of rows from one source and writes each set of rows to a different target. Typically used to create two or more subsets of a source. Can also be used to create two or more copies of a source. For more information, see Create Two Tables That Are Subsets of a Source.
Standardize
Creates an output table that contains numeric data standardized to a specified mean and standard deviation. For more information, see Creating Standardized Statistics from Table Data.
Surrogate Key Generator
Loads a target, adds generated whole number values to a surrogate key column, and sorts and saves the source based on the values in the business key column or columns. For more information, see Loading a Table and Adding a Surrogate Primary Key.
Transpose
Creates an output table that contains transposed data. For more information, see Creating Transposed Data from Table Data.
User Written Code
Retrieves a user-written transformation. Can be inserted between existing transformations and removed later without affecting the mappings in the original process flow. Can also be used to document the process flow for the transformation so that you can view and analyze the metadata for a user-written transformation. For more information, see Adding a User Written Code Transformation to a Job.
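
A DATA step hash lookup in the spirit of the Lookup transformation is sketched below: source rows are matched against a dimension table on a business key, and the surrogate key is retrieved. All librefs, tables, and column names are hypothetical assumptions.

data work.fact_sales;
   length cust_sk 8;
   if _n_ = 1 then do;
      declare hash dim(dataset: "work.dim_customer");
      dim.defineKey("cust_id");     /* business key              */
      dim.defineData("cust_sk");    /* surrogate key to retrieve */
      dim.defineDone();
      call missing(cust_sk);
   end;
   set work.stage_sales;
   if dim.find() = 0 then output;   /* keep rows with a matching key */
run;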

Data Quality Folder

The following table describes the transformations in the Data Quality folder in the Transformations tree. In general, you can use Apply Lookup Standardization, Create Match Code, and Standardize with Definition for data cleansing operations. You can use DataFlux Batch Job and DataFlux Data Service to perform tasks that are a specialty of DataFlux software, such as profiling, monitoring, or address verification.
Data Quality Folder Transformations
Name
Description
Apply Lookup Standardization
Enables you to select and apply DataFlux schemes that standardize the format, casing, and spelling of character columns in a source table. For more information, see Standardizing Values with a Standardization Scheme.
Create Match Code
Enables you to analyze source data and generate match codes based on common information shared by clusters of records. Comparing match codes instead of actual data enables you to identify records that are in fact the same entity, despite minor variations in the data. For more information, see Using Match Codes to Improve Record Matching.
DataFlux Batch Job
Enables you to select and execute a DataFlux job that is stored on a DataFlux Data Management Server. You can execute DataFlux Data Management Studio data jobs, process jobs, and profiles. You can also execute Architect jobs that were created with DataFlux® dfPower® Studio. For more information, see Using a DataFlux Job or Profile in a SAS Data Integration Studio Job.
DataFlux Data Service
Enables you to select and execute a data job that has been configured as a real-time service and deployed to a DataFlux Data Management Server. For more information, see Using a DataFlux Data Service in a SAS Data Integration Studio Job.
Standardize with Definition
Enables you to select and apply DataFlux standardization definitions to elements within a text string. For example, you might want to change all instances of “Mister” to “Mr.” but only when “Mister” is used as a salutation. For more information, see Standardizing Values with a Definition.

Hadoop Folder

Hadoop is an open-source technology for storing and processing large volumes of data. Hadoop provides scalability through the union of the Hadoop Distributed File System (HDFS), its high-bandwidth clustered storage system, and MapReduce, its fault-tolerant, distributed processing framework.
The following table describes the transformations in the Hadoop folder in the Transformations tree.
Hadoop Folder Transformations
Name
Description
Data Loader Directive
Runs a saved directive from SAS Data Loader for Hadoop in the context of a SAS Data Integration Studio job. The saved directive brings SAS Data Loader for Hadoop functions that are not otherwise available in SAS Data Integration Studio into the job. For more information, see Using a SAS Data Loader for Hadoop Saved Directive in a Job.
Hadoop Container
Enables you to use one transformation to perform a series of steps in a single connection to the Hadoop cluster. These steps can include transfers to and from Hadoop, MapReduce processing, and Pig Latin processing. For more information, see Creating a Hadoop Container Job.
Hadoop File Reader
Reads a specified file from a Hadoop cluster.
Hadoop File Writer
Writes a specified file to a Hadoop cluster.
Hive
Enables you to submit your own HiveQL code in the context of a job. For more information, see Creating a Hive Job.
Map Reduce
Enables you to submit your own MapReduce code in the context of a job. You must create your own MapReduce program in Java and save it to a JAR file. You then specify this JAR file in the Map Reduce transformation, along with some relevant arguments. Your Hadoop installation usually includes an example MapReduce program. For an example of MapReduce processing in a Hadoop container job, see Creating a Hadoop Container Job.
Pig
Enables you to submit your own Pig Latin code in the context of a job. For more information, see Creating a Pig Job.
Transfer From Hadoop
Transfers a specified file from a Hadoop cluster. For an example of how this transformation can be used, see Creating a Hadoop Container Job.
Transfer To Hadoop
Transfers a specified file to a Hadoop cluster (see the sketch at the end of this section).
For more information about these transformations, see Working with Hadoop and SAS LASR Analytic Server.
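
For file transfers such as those performed by the Transfer To Hadoop transformation, the underlying pattern can be sketched with PROC HADOOP. This hypothetical example assumes a configured cluster connection; the configuration file, user name, and paths are placeholders, and option names vary by SAS release.

proc hadoop cfg="/etc/hadoop/conf/sas_hadoop_config.xml" username="sasdemo";
   hdfs mkdir="/user/sasdemo/staging";            /* create HDFS directory */
   hdfs copyfromlocal="/local/data/orders.csv"    /* local source file     */
        out="/user/sasdemo/staging/orders.csv";   /* HDFS target path      */
run;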

High-Performance Analytics Folder

The Transformations tree in SAS Data Integration Studio includes a High-Performance Analytics folder. You can use these transformations to load and unload tables on a Hadoop cluster or a SAS LASR Analytic Server. These transformations are typically used to support a SAS Analytics solution that includes both SAS Data Integration Studio and SAS LASR Analytic Server.
High-Performance Analytics Folder Transformations
Name
Description
SAS Data in HDFS Loader
Loads a table to the file system (HDFS) on a Hadoop cluster. The source can be a SAS data set or a table in any DBMS supported by SAS. The target is a table in a SAS Data in HDFS Library.
SAS Data in HDFS Unloader
Unloads a table from HDFS. The input is a table in a SAS Data in HDFS Library.
SAS LASR Analytic Server Loader
Loads a table to memory on a SAS LASR Analytic Server (see the sketch at the end of this section). The source can be a SAS data set, a table in any DBMS supported by SAS, or a table in a SAS Data in HDFS Library. The target is an in-memory table in a SAS LASR Analytic Server Library.
SAS LASR Analytic Server Unloader
Unloads a table from memory on a SAS LASR Analytic Server. The input is an in-memory table in a SAS LASR Analytic Server Library.
For more information about these transformations, see Working with Hadoop and SAS LASR Analytic Server.
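
A load comparable to the SAS LASR Analytic Server Loader transformation can be sketched with PROC LASR. This minimal, hypothetical example assumes a running server; the host, port, and table names are placeholder assumptions.

proc lasr add data=work.orders port=10010;   /* load the table into memory */
   performance host="lasr.example.com";      /* assumed server head node   */
run;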

Output Folder

The following table describes the transformations in the Output folder in the Transformations tree.
Output Folder Transformations
Name
Description
List Data
Creates an HTML report that contains selected columns from a source table (see the sketch after this table). For more information, see Creating Reports from Table Data.
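
A minimal sketch of the kind of code that the List Data transformation generates follows: an ODS HTML listing of selected columns. The output path is a placeholder assumption.

ods html file="/reports/class_listing.html";
proc print data=sashelp.class noobs label;
   var name age height;   /* selected columns only */
run;
ods html close;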

Publish Folder

The following table describes the transformations in the Publish folder in the Transformations tree.
Publish Folder Transformations
Name
Description
Publish to Archive
Creates an HTML report and an archive of the report. For more information, see Creating a Publish to Archive Report from Table Data.
Publish to Email
Creates an HTML report and emails it to a designated address (see the sketch after this table). For more information, see Creating a Publish to Email Report from Table Data.
Publish to Queue
Creates an HTML report and publishes it to a queue using MQSeries. For more information, see Creating a Publish to Queue Report from Table Data.
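
The publish transformations use the SAS Publishing Framework, but the publish-to-email pattern can be sketched with the EMAIL access method, as shown below. This is a hypothetical illustration; the address, subject, and mail setup are assumptions.

filename report email
   to="analyst@example.com"
   subject="Daily listing"
   type="text/html";       /* send the message body as HTML */

ods html body=report;
proc print data=sashelp.class noobs;
run;
ods html close;

filename report clear;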

SPD Server Dynamic Cluster Folder

The following table describes the transformations in the SPD Server Dynamic Cluster folder in the Transformations tree.
SPD Server Dynamic Cluster Folder Transformations
Name
Description
Create or Add to a Cluster
Creates or updates an SPD Server cluster table (see the sketch after this table). For more information, see Creating an SPD Server Cluster Table.
List Cluster Contents
Lists the contents of an SPD Server cluster table. For more information, see Maintaining an SPD Server Cluster.
Remove Cluster Definition
Deletes an SPD Server cluster table. For more information, see Maintaining an SPD Server Cluster.
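
On the server side, dynamic clusters are managed with PROC SPDO cluster commands. The following sketch of creating a cluster table is a loose, hypothetical illustration; the libref, cluster name, member tables, and MAXSLOT value are assumptions, and the exact syntax varies by SPD Server release.

proc spdo library=spdlib;
   cluster create sales_hist   /* new cluster table            */
      mem=sales2016            /* member tables to combine     */
      mem=sales2017
      maxslot=12;              /* maximum number of members    */
quit;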

SQL Folder

The following table describes the transformations in the SQL folder in the Transformations tree. For more information, see Working with SQL Join Transformations and Working with Other SQL Transformations.
SQL Folder Transformations
Name
Description
Create Table
Provides a simple SQL interface for creating tables.
Delete
Generates a PROC SQL statement that deletes user-selected rows in a single target table. Supports delete, truncate, or delete with a WHERE clause. Also supports implicit and explicit pass-through.
Execute
Enables you to specify custom SQL code to be executed and provides SQL templates for supported databases.
Extract
Selects multiple sets of rows from a source and writes those rows to a target. Typically used to create one subset from a source. Can also be used to create columns in a target that are derived from columns in a source. For more information, see Extracting Data from a Source Table.
Insert Rows
Provides a simple SQL interface for inserting rows into a target table. For more information, see Inserting Rows into a Target Table.
Join
Selects multiple sets of rows from one or more sources and writes each set of rows to a single target (see the sketch at the end of this section). Typically used to merge two or more sources into one target. Can also be used to merge two or more copies of a single source. For more information, see Creating a Simple SQL Query.
Merge
Inserts new rows and updates existing rows by using the SQL MERGE DML command, which was officially introduced in the SQL:2003 standard.
Set Operators
Enables you to use set operators to combine the results of table-based queries. For more information, see Using the SQL Set Operators Transformation.
Update
Updates user-selected columns in a single target table. The target columns can be updated by case, constant, expression, or subquery. Handles correlated subqueries.
Note: Some functions in the Delete, Execute, Insert Rows, Merge, and Update transformations might work only when the table comes from a database management system that provides an implementation of the relevant SQL command and for which a SAS/ACCESS interface is available. One example is sorting. You can use SAS tables and tables from database management systems that do not implement the SQL command, but these command-specific functions might not work.
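
A minimal sketch of the kind of query that the Join transformation generates follows. The librefs, tables, and columns are hypothetical assumptions.

proc sql;
   create table work.order_detail as
   select o.order_id,
          o.order_date,
          c.name as customer_name
      from work.orders as o
           inner join work.customers as c
           on o.cust_id = c.cust_id;   /* merge sources on a shared key */
quit;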

Ungrouped Folder

The Ungrouped folder contains any transformations that have been created with the Transformation Generator wizard and not assigned to a transformation category. The folder is displayed only when a generated transformation is present. It is displayed to other users only when the generated transformations are placed in the Shared Data folder.
Last updated: January 16, 2018