What's New in SAS Data Integration Studio 4.5

Overview

The main enhancements for SAS Data Integration Studio 4.5 include the following:
  • Support for Hadoop
  • Experimental High-Performance Analytics Components
  • New Business Rules Transformation
  • Other New Features

Support for Hadoop

Hadoop is an open-source software project that supports scalable, distributed computing. The following transformations and related tools support the use of Hadoop Clusters in the context of SAS Data Integration Studio jobs (an illustrative code sketch follows the list):
  • The Hadoop Container transformation enables you to connect all the sources and targets for the various steps in a container step. This container step allows for one connection to the Hadoop Cluster in the context of a SAS Data Integration Studio job. Then, all of the steps that are included in the container are submitted during the connection.
  • The Hadoop File Reader and Hadoop File Writer transformations support reading and writing files from and to the Hadoop Cluster into SAS in the context of a SAS Data Integration Studio job.
  • The Hive transformation supports submitting Hive code to the Hadoop Cluster in the context of a SAS Data Integration Studio job. Hive is a data warehouse system for Hadoop. You can easily summarize data, run ad hoc queries, and analyze large data sets stored in file systems that are compatible with Hadoop. Hive also enables you to project structure onto this data and query the data by using an SQL-like language called HiveQL.
  • The Map Reduce transformation supports submitting MapReduce code to the Hadoop Cluster in the context of a SAS Data Integration Studio job. Hadoop MapReduce enables you to write applications that reliably process vast amounts of data in parallel on large clusters. A MapReduce job splits the input data set into chunks that are processed by the map tasks in parallel. The outputs of the maps are sorted and then input to the reduce tasks. The input and the output of the job are typically stored in a file system.
  • The Pig transformation supports submitting Pig code to the Hadoop Cluster in the context of a SAS Data Integration Studio job. The transformation contains an enhanced, color-coded editor specific to the Pig Latin language. Pig Latin is a high-level language used for expressing and evaluating data analysis programs. Pig Latin supports substantial parallelization and can handle very large data sets.
  • The Transfer From and Transfer To transformations support the transfer of data from and to the Hadoop Cluster in the context of a SAS Data Integration Studio job.
  • The Hadoop Monitor items in the Tools menu enable you to run reports that monitor the performance of a Hadoop Cluster.
  • The Hive Source Designer enables you to register tables in a Hive database.
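Outside of SAS Data Integration Studio, similar work can be expressed directly in SAS code with PROC HADOOP and, for Hive, with SQL pass-through through SAS/ACCESS® Interface to Hadoop. The following sketch is illustrative only: the configuration file, server names, paths, credentials, and Java class names are placeholder assumptions, and the exact options that your site requires depend on your Hadoop distribution and SAS release.

    /* Connect to the Hadoop Cluster; OPTIONS= points to a Hadoop configuration file */
    proc hadoop options="/u/sasdemo/hadoop-config.xml"
                username="sasdemo" password="XXXXXXXX" verbose;

       /* Copy a local file into HDFS (Hadoop File Writer / Transfer To) */
       hdfs copyfromlocal="/local/data/orders.csv" out="/user/sasdemo/orders.csv";

       /* Submit a Pig Latin script (Pig transformation) */
       pig code="/local/scripts/summarize_orders.pig";

       /* Submit a MapReduce job (Map Reduce transformation); class names are placeholders */
       mapreduce input="/user/sasdemo/orders"
                 output="/user/sasdemo/orders_summary"
                 jar="/local/jars/orders-mr.jar"
                 map="com.example.OrdersMapper"
                 reduce="com.example.OrdersReducer"
                 outputkey="org.apache.hadoop.io.Text"
                 outputvalue="org.apache.hadoop.io.IntWritable";
    run;

    /* Query Hive with explicit SQL pass-through (Hive transformation) */
    proc sql;
       connect to hadoop (server="hive-node.example.com" port=10000 user="sasdemo");
       create table work.order_counts as
       select * from connection to hadoop
          (select customer_id, count(*) as order_count
             from orders
            group by customer_id);
       disconnect from hadoop;
    quit;

In a job, the Hadoop transformations generate and manage comparable code for you, along with the connection to the Hadoop Cluster.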

Experimental High-Performance Analytics Components

SAS® LASR™ Analytic Server is a direct-access, NoSQL, NoMDX server that is engineered for maximum analytic performance through multithreading and distributed computing. SAS Data Integration Studio provides the following experimental High-Performance Analytics transformations for SAS LASR Analytic Servers (a simplified code sketch follows the list):
  • The SAS Data in HDFS Loader transformation is used to stage data into a Hadoop cluster.
  • The SAS Data in HDFS Unloader transformation unloads data that has previously been staged in a Hadoop cluster.
  • The SAS LASR Analytic Server Loader transformation loads data to a SAS LASR Analytic Server.
  • The SAS LASR Analytic Server Unloader transformation unloads data that has previously been loaded into a SAS LASR Analytic Server.
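As a rough illustration of what these experimental transformations automate, the following sketch stages a SAS table in HDFS with the SASHDAT engine and then loads it into server memory with PROC LASR. The host name, installation path, HDFS path, and port are placeholder assumptions, and the code is a general sketch rather than what the transformations generate; contact SAS Technical Support for the syntax that is supported at your site.

    /* Stage a SAS table in HDFS (SAS Data in HDFS Loader); host, install, and path are placeholders */
    libname hdat sashdat path="/hps/sales" host="grid001.example.com" install="/opt/TKGrid";

    data hdat.orders;
       set work.orders;
    run;

    /* Start a SAS LASR Analytic Server instance and load the staged table into memory
       (SAS LASR Analytic Server Loader) */
    proc lasr create port=10010 path="/tmp";
       performance host="grid001.example.com" install="/opt/TKGrid" nodes=all;
    run;

    proc lasr add data=hdat.orders port=10010;
       performance host="grid001.example.com";
    run;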
Source Designer wizards are used to register tables on the SAS Metadata Server. SAS Data Integration Studio provides the following experimental Source Designers for High-Performance Analytics tables:
  • The SAS Data in HDFS Source Designer enables you to register SAS tables in a Hadoop Cluster.
  • The SAS LASR Analytic Server Source Designer enables you to register SAS LASR Analytic tables.
For more information about these experimental components, contact SAS Technical Support.

New Business Rules Transformation

The Business Rules transformation enables you to use the business rule flow packages that are created in SAS® Business Rules Manager in the context of a SAS Data Integration Studio job. You can import business rule flows, specify flow versions, map source table columns to required input columns, and set business rule options.
The Business Rules transformation enables you to map your source data and output data into and out of the rules package. The SAS Data Integration Studio job then applies the rules to your data when the job runs. When you run a job that includes a rules package, statistics are collected, such as the number of rules that were triggered and the number of valid and invalid data record values. You can use this information to further refine your data as it flows through your transformation logic.
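The transformation manages the generated code, but the idea behind the collected statistics can be shown with a simple hand-written DATA step. This conceptual sketch is not the code that the Business Rules transformation or SAS Business Rules Manager produces; the data set, columns, and rule are hypothetical.

    /* Conceptual sketch: apply one rule to each record and count the results */
    data work.rule_results;
       set work.customer_source end=_last;

       /* Hypothetical rule: an order amount must be greater than zero */
       if order_amount > 0 then valid_values + 1;
       else do;
          invalid_values + 1;
          rules_triggered + 1;   /* the rule fired for this record */
       end;

       if _last then
          put "NOTE: rules triggered=" rules_triggered
              " valid=" valid_values " invalid=" invalid_values;
    run;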

Other New Features

Here are some of the most notable enhancements included in this release:
Support for user-defined functions (UDFs) enables you to import UDFs for models registered through Model Manager for supported databases, including DB2, Teradata, and Netezza. You can also import native UDFs from Oracle, DB2, and Teradata. After the UDFs are imported, you can access them on the Functions tab of the Expression Builder window.
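After import, a UDF can be referenced in an expression much like any other function. The following sketch shows the general pattern with explicit SQL pass-through to Teradata; the connection options, schema, and the score_risk function are hypothetical, and the UDF must already exist in the database.

    proc sql;
       /* Placeholder connection options for your site's Teradata server and credentials */
       connect to teradata (server="tdprod" user=sasdemo password="XXXXXXXX");

       create table work.scored_customers as
       select * from connection to teradata
          (  /* score_risk is a hypothetical database UDF */
             select customer_id,
                    analytics.score_risk(annual_income, total_debt) as risk_score
               from analytics.customers
          );

       disconnect from teradata;
    quit;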
Performance enhancements for the SCD Type 2 Loader transformation include the following (a conceptual code sketch follows this list):
  • the ability to use character-based columns for change tracking
  • an option to create an index on the permanent cross reference table
  • an option to specify the SPD server update technique
  • an option to sort target table records before creating the temporary cross reference table
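These options correspond to steps that you could otherwise code by hand. For example, the following sketch pre-sorts the target table and creates an index on a permanent cross reference table; the library, table, and column names are placeholder assumptions rather than the names that the transformation generates.

    /* Sort target table records before the temporary cross reference table is built */
    proc sort data=scd.customer_dim out=work.customer_dim_sorted;
       by customer_id valid_from_dttm;
    run;

    /* Create an index on the permanent cross reference table */
    proc datasets library=scd nolist;
       modify customer_xref;
       index create customer_id;
    quit;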
In the past, if you selected Tools, then Options, then the Data Quality tab, and changed the DQ Setup Location, the new location could not be applied to data quality transformations in existing jobs. Now, if you change the global DQ Setup Location, you have the option to apply the new location to data quality transformations in existing jobs. To apply the global DQ Setup Location to a transformation, click the Reset DQ Setup Location button on the appropriate tab, such as the Standardization tab for the Apply Lookup Standardization transformation. The following data quality transformations support this option: Apply Lookup Standardization, Standardize with Definition, and Create Match Codes.
A new Federation Server Source Designer enables you to register data sources that are available through a DataFlux® Federation Server. You can then access these data sources in a SAS Data Integration Studio job.
Direct lookup using a hash object in the Data Validation transformation is now supported.
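A hash object performs the lookup in memory during a single pass of the source data. The following DATA step is a simplified, hand-coded illustration of the technique, not the code that the Data Validation transformation generates; the data set and column names are placeholders.

    data work.valid_rows work.invalid_rows;
       /* Load the lookup table into a hash object once, on the first iteration */
       if _n_ = 1 then do;
          declare hash codes(dataset: "work.valid_product_codes");
          codes.defineKey("product_code");
          codes.defineData("product_code");
          codes.defineDone();
       end;

       set work.source_orders;

       /* Direct lookup: find() returns 0 when the key exists in the hash table */
       if codes.find() = 0 then output work.valid_rows;
       else output work.invalid_rows;
    run;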