What's New in SAS Data Integration Studio 4.3

Overview

The main enhancements for SAS Data Integration Studio 4.3 include the following:
  • integrated version control and rollback support for metadata
  • new job management reports
  • enhanced SAS Code Importer
  • new dialog box for searching metadata
  • the ability to deploy jobs from a command line
  • enhanced support for z/OS
  • new wizard for standardizing column metadata
  • enhanced support for user-defined formats and analytic functions in the Expression Builder
  • better workload balancing with SAS Grid Computing
  • new transformations
  • integration with DataFlux® Data Management Platform

Integrated Versioning

Within SAS Data Integration Studio, you can now save versions of jobs, tables, and other metadata to a SAS Package file, and then archive the package in a third-party versioning system. You can view the differences between versions of a selected object or between an archived version and the current version of that object. If desired, you can restore archived objects to the SAS metadata repository.
SAS Data Integration Studio supports Concurrent Versions System (CVS) or Apache Subversion (SVN) by default. If you want to use other versioning systems, you can do so with the application programming interface (API).

New Job Management Reports

Previous releases enabled you to run a job interactively in SAS Data Integration Studio and display run-time statistics for that job. Now you can use SAS® Web Report Studio or the SAS® Stored Process Server to display pre-built reports for multiple jobs that were executed in batch mode.
The information for these reports is captured in logs at run time, using SAS Application Resource Monitoring (ARM) capabilities. ARM correlates each job with the hardware on which it runs, so that memory use and I/O can be captured and attributed to a specific job. Performance records are combined with error messages, warnings, table names, and other information to allow for complete, drillable reporting on historical job performance and problems. For example, you can use cube-based reports in SAS Web Report Studio to track outlier executions of a job down to the specific, offending job step. You can use summary and detail reports to quickly diagnose problems without having to traverse multiple log files by hand. Detail reports of job steps support stringent historical auditing of data sources and targets.
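The logging configuration for these reports is set up by your administrator, but as a rough illustration of the underlying mechanism, ARM instrumentation can be enabled in a SAS session with system options such as the following (a minimal sketch; the subsystem list and log location are illustrative, not the configuration that the job management reports require):

  /* Enable ARM records for procedure and DATA step I/O boundaries.
     Values shown are illustrative only. */
  options armsubsys=(arm_proc arm_dsio)  /* which subsystems to instrument */
          armloc="myjob_arm.log";        /* where to write the ARM records */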

Enhanced SAS Code Importer

In the Import SAS Code wizard, you can expand SAS macros in your jobs and create a node for each step inside your macros. Expanding the macros provides additional detail about your job and how it works. When you run your job with the macro expansion option enabled, you get more detailed performance information about slow-running steps. You can also identify which steps consume the most memory, I/O, or CPU time.
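For example, consider importing a job such as the following hypothetical program. With macro expansion enabled, the wizard creates one node for the PROC SORT step and one node for the DATA step inside the macro, instead of a single opaque node for the macro call (all library, table, and column names are illustrative):

  %macro load_stage(src, tgt);
     /* Step 1: becomes its own node when macros are expanded */
     proc sort data=&src out=work.sorted;
        by customer_id;
     run;

     /* Step 2: also becomes its own node */
     data &tgt;
        set work.sorted;
        total_sales = price * quantity;
     run;
  %mend load_stage;

  %load_stage(srclib.orders, stage.orders_fact)
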
Another option registers all work tables as physical tables in a WORK library, which enables you to import SAS code in which a temporary table is both the source and the target of a step. You can also analyze your job to determine the type and number of steps that it contains. This information is provided in a report that you can review before importing the job.

New Dialog Box for Searching Metadata

If you select Tools > Search from the main menu, you can search for metadata objects by full or partial name. You can subset a search to a specific folder, or search by object type, by last change date, or by other user-defined criteria. You can also save searches to a folder and access them later when needed. For example, you can use the saved search feature to maintain a “recently changed” object list.

Deploy Jobs from a Command Line

You can now deploy SAS Data Integration Studio jobs from a command line. An example batch file that shows you how to use this new feature is installed with the product. You can deploy any number of jobs without having to start SAS Data Integration Studio.
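The installed batch file documents the actual command and parameters. As a loose sketch of the general shape of such an invocation (the command name and every parameter below are assumptions for illustration only; consult the example batch file for the real interface):

  rem Illustrative sketch only -- see the example batch file installed
  rem with the product for the actual command name and parameters.
  DeployJobs -host meta.example.com -port 8561 ^
             -user myuser -password XXXXXXXX ^
             -deploytype deploy ^
             -objects "/Shared Data/Jobs/Load Orders" ^
             -sourcedir "C:\SAS\DeployedJobs"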

Enhanced Support for z/OS

Generated code can now be limited to line lengths of 80 characters or less. The JCL for deployed jobs can also be restricted to fit within the z/OS 80-character line length limit. Lines that exceed the limit wrap onto the next line.

New Wizard for Standardizing Column Metadata

In some cases, table columns that have the same name and that are used for the same purpose should also have the same attribute values. For example, two columns named Total Sales should perhaps have the same data type and column length. The new Column Standardization wizard finds columns that have similar names and compares their attributes. It shows any inconsistency in the column attribute values. You can use this wizard to generate a report about column differences or log updates for audit purposes. You can also use the wizard to update the metadata for a target table in a SAS Data Integration Studio job. The new column attributes would be applied to the table the next time you run the job.
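The wizard operates on metadata, but as a rough illustration of the kind of inconsistency it flags, the following query finds columns that share a name yet disagree on type or length across the tables in a library (a sketch against the SAS dictionary views, not the wizard's implementation; the library name is illustrative):

  /* Find column names that appear with more than one type or length. */
  proc sql;
     select name
        from sashelp.vcolumn
        where libname = 'STAGE'          /* illustrative library name */
        group by name
        having count(distinct type) > 1
            or count(distinct length) > 1;
  quit;
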

User-Defined Formats and Functions in the Expression Builder

You can discover and register user-defined formats and analytic scoring functions from a relational database. After the formats and functions are registered, they appear as expressions that are available from the SAS Data Integration Studio Expression Builder. The discovered functions and formats can be placed in a folder location of your choice. Parameters on the functions are also discovered and registered, so you have enough detail to use the function or format in your jobs.
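For example, after registration, expressions in the Expression Builder might apply a discovered format or function directly (both names below are hypothetical, for illustration only):

  /* salesband. is an assumed user-defined format;
     score_churn() is an assumed registered scoring function. */
  put(TOTAL_SALES, salesband.)
  score_churn(CUSTOMER_ID, TOTAL_SALES)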

Better Workload Balancing with SAS Grid Computing

SAS Data Integration Studio and SAS Grid Computing now provide better workload balancing for SAS Data Integration Studio jobs. When running interactively on a grid, previous versions of SAS Data Integration Studio would create a new session for each job execution and terminate the session when the job finished. SAS Data Integration Studio now keeps the session open until the user closes the job. This better supports incremental job development because intermediate work tables will remain while the session is active, allowing you to inspect run results. You can also use various debugging features, such as running specific transformations individually.
Interactive submissions to a grid give administrators the ability to manage workload through prioritization, resource utilization thresholds, and limits on the number of concurrent jobs. SAS Grid Computing also supports the ability to implement run policies, such as a Fair Share policy, which allows prioritization of jobs based on user and workload.

New Transformations

The Compare Tables transformation detects changes between two tables. It can compare a source table to another table, or it can compare a source table to a cross-reference table that includes a compare digest. The transformation supports either a direct lookup (hash object) or a disk-based compare through a MERGE statement. The hash lookup is faster but requires that the entire lookup table fit into memory; if that is not practical, you can choose the MERGE statement method instead. The transformation can produce New, Updated, Missing, and Unchanged tables as output, and you can retain or delete any of these outputs as needed to increase efficiency. The transformation generates its results in a single pass of the data.
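The generated code depends on the options that you select, but the disk-based method is conceptually similar to the following sketch, in which a MERGE by key against a cross-reference table routes each row to one of the four outputs (all table, key, and column names are illustrative, not the transformation's generated code):

  /* Sketch of a MERGE-based compare. Both tables are sorted by the key,
     and the cross-reference table carries a digest of the non-key columns. */
  data new_rows updated_rows missing_rows unchanged_rows;
     merge source(in=in_src) xref(in=in_xref);
     by customer_id;
     length new_digest $32;
     new_digest = put(md5(catx('|', name, city, status)), $hex32.);
     if in_src and not in_xref then output new_rows;           /* key only in source */
     else if in_xref and not in_src then output missing_rows;  /* key only in cross-reference */
     else if new_digest ne digest then output updated_rows;    /* key matches, values changed */
     else output unchanged_rows;
  run;
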
The SCD Type 1 Loader performs Type 1 slowly changing dimension processing: it detects value changes between the source and target tables and overwrites the target values with the source values when a change is detected. Unlike Type 2 processing, no change history is retained.
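Because matched rows are simply overwritten, the core of Type 1 processing can be sketched with an ordinary MERGE, where values from the data set listed last win for matching keys (table and key names are illustrative, not the transformation's generated code):

  /* Sketch of Type 1 overwrite: for matching keys, common variables take
     their values from the source (listed last), overwriting the dimension;
     unmatched source keys are added as new rows. */
  data dw.customer_dim;
     merge dw.customer_dim staging.customer;
     by customer_id;
  run;
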
The SQL Set Operators transformation generates a PROC SQL statement that combines the results of two or more queries by using the following set operators (see the sketch after this list):
  • UNION – produces all unique rows from both queries
  • EXCEPT – produces rows that are part of the first query only
  • INTERSECT – produces rows that are common to both query results
  • OUTER UNION – concatenates the query results
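For example, the generated code for a UNION of two queries has this general shape (library, table, and column names are illustrative):

  /* Combine two queries with UNION; duplicate rows are eliminated. */
  proc sql;
     create table work.all_customers as
     select customer_id, name from warehouse.us_customers
     union
     select customer_id, name from warehouse.eu_customers;
  quit;
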
The SQL Set Operators transformation is often faster than the SQL Join transformation. You can write complex queries on either side of a set operator, combining result sets that have the same attributes but require different access paths. Rather than trying to integrate all of the logic into a single join, you can create two SELECT statements and then combine them with a set operator. The new transformation supports full pushdown, so the entire transformation is pushed down to the database when the source and target schemas match.
The Oracle Bulk Table Loader supports all bulk-load options for Oracle tables. You can select options that affect how indexes, constraints, and table statistics are handled, including the percentage of rows sampled when gathering statistics. You can configure other Oracle load options as well, such as partition support, commit level, and Direct Path load support.
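The transformation generates the load for you, but its options map onto SAS/ACCESS bulk-load behavior that can be sketched with data set options such as the following (connection details and option values are illustrative, and the exact set of supported options depends on your SAS/ACCESS Interface to Oracle configuration):

  /* Sketch of a bulk load to Oracle using SAS/ACCESS data set options. */
  libname ora oracle user=myuser password=XXXXXXXX path=orcl;

  proc append base=ora.sales_fact
                   (bulkload=yes              /* load with SQL*Loader rather than inserts */
                    bl_options='ERRORS=0')    /* pass options through to SQL*Loader */
              data=work.sales_stage;
  run;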
The Standardize with Definition transformation applies a selected DataFlux® Data Management Studio definition to a source column in order to standardize the data according to a set of rules.

Integration with DataFlux Data Management Platform

DataFlux, a SAS company, recently released the DataFlux Data Management Platform. This platform provides a single environment for managing data quality, data integration, and master data management (MDM). Enterprise bundles that include SAS Data Integration Studio now also include the DataFlux Data Management Platform. One component of the new platform is DataFlux Data Management Studio, which is a desktop client that combines data quality and data integration features.
The data quality features of the platform are especially useful for SAS Data Integration Studio jobs. For example, you can use DataFlux Data Management Studio to analyze and cleanse source data before it is included in SAS Data Integration Studio jobs. You can use DataFlux data quality schemes and definitions in SAS Data Integration Studio jobs. You can also execute DataFlux jobs, profiles, and services from SAS Data Integration Studio.