What's New in SAS Data Integration Studio 4.5
         
         
            Overview
            The main enhancements
               for SAS Data Integration Studio 4.5 include the following:
               
               
                  
                     - 
                        Support for Hadoop
                            
- 
                        Experimental High-Performance Analytics Components
                            
- 
                        New Business Rules Transformation
                            
- 
                        Other New Features
                     
 
             
         
            Support for Hadoop
            
            Hadoop is an open-source
               software project that supports scalable, distributed computing. The
               following transformations support the use of Hadoop Clusters in the
               context of SAS Data Integration Studio jobs: 
               
               
                  
                     - 
                        The Hadoop Container transformation
                              enables you to connect all the sources and targets for the various
                              steps in a container step. This container step allows for one connection
                              to the Hadoop Cluster in the context of a SAS Data Integration Studio
                              job. Then, all of the steps that are included in the container are
                              submitted during the connection.
                            
- 
                        The Hadoop File Reader and Hadoop
                              File Writer transformations support reading files from the Hadoop
                              Cluster into SAS and writing files from SAS to the Hadoop Cluster
                              in the context of a SAS Data Integration Studio job.
                            
- 
                        The Hive transformation supports
                              submitting Hive code to the Hadoop Cluster in the context of a SAS
                              Data Integration Studio job. Hive is a data warehouse system for Hadoop.
                              With Hive, you can summarize data, run ad hoc queries, and
                              analyze large data sets stored in file systems that are compatible
                              with Hadoop. Hive also enables you to project structure onto this
                              data and query the data by using an SQL-like language called HiveQL.
                            
- 
                        The Map Reduce transformation supports
                              submitting MapReduce code to the Hadoop Cluster in the context of
                              a SAS Data Integration Studio job. Hadoop MapReduce enables you to
                              write applications that reliably process vast amounts of data in parallel
                              on large clusters. A MapReduce job splits the input data set into
                              chunks that are processed by the map tasks in parallel. The outputs
                              of the maps are sorted and then input to the reduce tasks. The input
                              and the output of the job are typically stored in a file system.
                            
- 
                        The Pig transformation supports
                              submitting Pig code to the Hadoop Cluster in the context of a SAS
                              Data Integration Studio job. The transformation contains an enhanced,
                              color-coded editor specific to the Pig Latin language. Pig Latin is
                              a high-level language used for expressing and evaluating data analysis
                              programs. Pig Latin supports substantial parallelization and can handle
                              very large data sets.
                            
- 
                        The Transfer From and Transfer
                              To transformations support the transfer of data from and to the Hadoop
                              Cluster in the context of a SAS Data Integration Studio job.
                            
- 
                        The Hadoop Monitor items
                              in the Tools menu enable you to run reports
                              that monitor the performance of a Hadoop Cluster.
                            
- 
                        The Hive Source
                              Designer enables you to register tables in a Hive database. 
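
The MapReduce flow described above (split the input into chunks, run map tasks on the chunks, sort the map outputs, then pass them to reduce tasks) can be illustrated with a minimal sketch in plain Python. This is not Hadoop code and not what the Map Reduce transformation submits. The function names `map_task`, `reduce_task`, and `run_job`, the word-count task, and the `chunk_size` parameter are all hypothetical and chosen only for illustration. The map tasks run sequentially here, whereas a cluster would run them in parallel.

```python
from itertools import groupby
from operator import itemgetter


def map_task(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input lines."""
    for line in chunk:
        for word in line.lower().split():
            yield word, 1


def reduce_task(key, values):
    """Combine all values that share a key into a single total."""
    return key, sum(values)


def run_job(lines, chunk_size=2):
    """Hypothetical driver: split, map, sort, and reduce in one process."""
    # Split the input data set into chunks, one per map task.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Run the map tasks (sequentially here; a cluster runs them in parallel).
    mapped = [pair for chunk in chunks for pair in map_task(chunk)]
    # Sort the map outputs by key so that equal keys are adjacent.
    mapped.sort(key=itemgetter(0))
    # Feed each group of values to a reduce task.
    return dict(
        reduce_task(key, (value for _, value in group))
        for key, group in groupby(mapped, key=itemgetter(0))
    )


if __name__ == "__main__":
    sample = ["big data big cluster", "data in parallel", "big jobs"]
    print(run_job(sample))
```

The sort step is what lets each reduce task see all of the values for one key together, which mirrors the statement that the outputs of the maps are sorted before they are input to the reduce tasks.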
                            
 
             
         
            Experimental High-Performance Analytics Components
            
            SAS® LASR™
               Analytic Server is a direct-access, NoSQL, NoMDX server that is engineered
               for maximum analytic performance through multithreading and distributed
               computing. SAS Data Integration Studio provides the following experimental
               High-Performance Analytics transformations for SAS LASR Analytic Servers: 
               
               
                  
                     - 
                        The SAS Data in HDFS Loader transformation
                              is used to stage data into a Hadoop cluster.
                            
- 
                        The SAS Data in HDFS Unloader transformation
                              loads data from Hadoop into a SAS LASR Analytic Server.  
                            
- 
                        The SAS LASR Analytic Server Loader
                              transformation loads data to a SAS LASR Analytic Server.
                            
- 
                        The SAS LASR Analytic Server Unloader
                              transformation unloads data that has previously been loaded into a
                              SAS LASR Analytic Server. 
                            
 
            Source Designer wizards
               are used to register tables on the SAS Metadata Server. SAS Data Integration
               Studio provides the following experimental Source Designers for High-Performance
               Analytics tables: 
               
               
                  
                     - 
                        The SAS Data in HDFS Source
                              Designer enables you to register SAS tables in a Hadoop Cluster. 
                            
- 
                        The SAS LASR Analytic
                                 Server Source Designer enables you to register SAS LASR
                              Analytic tables. 
                            
 
            For more information
               about these experimental components, contact SAS Technical Support. 
            
 
         
            New Business Rules Transformation
            
            The Business Rules transformation
               enables you to use the business rule flow packages that are created in
               SAS® Business Rules Manager in the context of a SAS Data Integration
               Studio job. You can import business rule flows, specify flow versions,
               map source table columns to required input columns, and set business
               rule options.
            
The Business Rules transformation
               enables you to map your source data and output data into and out of
               the rules package. Then, the SAS Data Integration Studio job applies
               the rules to your data as it is run. When you run a job that includes
               a rules package, statistics are collected, such as the number of rules
               that were triggered, and the number of invalid and valid data record
               values. You can use this information to further refine your data as
               it flows through your transformation logic. 
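
As a rough illustration of the kind of statistics described above, the following Python sketch applies a set of rules to each record and counts triggered rules and valid and invalid records. It does not reflect how SAS Business Rules Manager packages or the Business Rules transformation are implemented. The function `apply_rules`, the rule names, and the sample columns are hypothetical, and the sketch assumes that a record counts as invalid when at least one rule is triggered for it.

```python
def apply_rules(records, rules):
    """Apply every rule to every record and collect run statistics.

    `rules` maps a rule name to a predicate that returns True when the
    rule is triggered for a record (assumption: a triggered rule marks
    the record as invalid).
    """
    stats = {"rules_triggered": 0, "valid_records": 0, "invalid_records": 0}
    flagged = []
    for record in records:
        # Names of all rules that fire for this record.
        triggered = [name for name, check in rules.items() if check(record)]
        stats["rules_triggered"] += len(triggered)
        if triggered:
            stats["invalid_records"] += 1
        else:
            stats["valid_records"] += 1
        # Keep the original columns and add the list of triggered rules.
        flagged.append({**record, "triggered_rules": triggered})
    return flagged, stats


if __name__ == "__main__":
    # Hypothetical rules and source rows, for illustration only.
    rules = {
        "age_out_of_range": lambda r: not 0 <= r["age"] <= 120,
        "missing_country": lambda r: not r["country"],
    }
    rows = [
        {"id": 1, "age": 34, "country": "US"},
        {"id": 2, "age": -5, "country": "US"},
        {"id": 3, "age": 130, "country": ""},
    ]
    flagged, stats = apply_rules(rows, rules)
    print(stats)
```

Counts like these are what you would inspect to decide whether the upstream data needs further cleansing before it reaches later steps in the job.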
            
 
         
            Other New Features
            
            Here are some of the
               most notable enhancements included in this release:
            
Support for SQL Server
               user-defined functions (UDFs) enables you to import UDFs for models
               registered through Model Manager for supported databases that include
               DB2, Teradata, and Netezza. You can also import native UDFs from Oracle,
               DB2, and Teradata. After the UDFs are imported, you can access them
               on the Functions tab of the Expression Builder window.
            
Performance enhancements
               for the SCD Type 2 Loader transform include the following:
               
               
                  
                     - 
                        the ability to use character-based
                              columns for change tracking
                            
- 
                        an option to create an index on
                              the permanent cross-reference table
                            
- 
                        an option to specify the SPD Server
                              update technique
                            
- 
                        an option to sort target table
                              records before creating the temporary cross-reference table
                            
 
            In the past, if you
               selected Tools > Options > Data Quality tab and changed
               the DQ Setup Location, the new location could not be applied to data
               quality transformations in existing jobs. Now, if you change the global
               DQ Setup Location, you have the option to apply the new location to
               data quality transformations in existing jobs. To apply the global
               DQ Setup Location to a transformation, click the Reset DQ Setup
               Location button on the appropriate tab, such as the
               Standardization tab for the Apply
               Lookup Standardization transformation. The following data quality
               transformations support this option: Apply Lookup Standardization
               transformations, Standardize with Definition transformations, and
               Create Match Codes transformations. 
            
A new Federation Server
               Source Designer enables you to register data sources that are available
               through a DataFlux® Federation Server. You can then access these
               data sources in a SAS Data Integration Studio job. 
            
Direct lookup using
               a hash object in the Data Validation transformation is now supported. 
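
The idea behind a direct hash lookup can be sketched in Python, where a `dict` plays the role of the in-memory hash table: the lookup table is loaded once, and each source row is then checked with a single keyed probe instead of a scan or join. This is an analogy only, not the code that the Data Validation transformation generates. The function names `build_lookup` and `validate` and the sample columns are hypothetical.

```python
def build_lookup(rows, key_column):
    """Load the lookup table into a hash table (a dict) keyed on key_column."""
    return {row[key_column]: row for row in rows}


def validate(source_rows, column, lookup):
    """Split source rows into valid and error rows with one probe per row."""
    valid, errors = [], []
    for row in source_rows:
        # Membership in a dict is a hash probe, not a search of the table.
        if row[column] in lookup:
            valid.append(row)
        else:
            errors.append(row)
    return valid, errors


if __name__ == "__main__":
    # Hypothetical lookup table and source rows, for illustration only.
    countries = [{"code": "US"}, {"code": "DE"}, {"code": "JP"}]
    orders = [
        {"order_id": 1, "country": "US"},
        {"order_id": 2, "country": "XX"},
        {"order_id": 3, "country": "JP"},
    ]
    lookup = build_lookup(countries, "code")
    valid, errors = validate(orders, "country", lookup)
    print(len(valid), "valid;", len(errors), "invalid")
```

The trade-off is memory for speed: the whole lookup table must fit in memory, but each validation check then costs roughly constant time regardless of the table's size.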
            
 
       
      
      
         Copyright © SAS Institute Inc. All rights reserved.