What's New in SAS Data Integration Studio 4.3
Overview
The main enhancements
for SAS Data Integration Studio 4.3 include the following:
-
integrated version control and
rollback support for metadata
-
new job management reports
-
enhanced SAS Code Importer
-
new dialog box for searching metadata
-
deploy jobs from a command line
-
enhanced support for
z/OS
-
new wizard for standardizing column
metadata
-
enhanced support for user-defined
formats and analytic functions in the Expression Builder
-
better workload balancing with
SAS Grid Computing
-
-
integration with DataFlux®
Data Management Platform
Integrated Versioning
Within SAS Data Integration
Studio, you can now save versions of jobs, tables, and other metadata
to a SAS Package file, and then archive the package in a third-party
versioning system. You can view the differences between versions of
a selected object or between an archived version and the current version
of that object. If desired, you can restore archived objects to the
SAS metadata repository.
SAS Data Integration
Studio supports Concurrent Versions System (CVS) or Apache Subversion
(SVN) by default. If you want to use other versioning systems, you
can do so with the application programming interface (API).
New Job Management Reports
Previous releases enabled
you to run a job interactively in SAS Data Integration Studio and
display run-time statistics for that job. Now you can use SAS®
Web Report Studio or the SAS® Stored Process Server to display
pre-built reports for multiple jobs that were executed in batch mode.
The information for
these reports is captured in logs at run time, using SAS Application
Resource Monitoring (ARM) capabilities. ARM correlates the job with
the hardware that it is being run on, so that memory use and I/O can
be captured and tagged to a specific job. Performance records are
combined with error messages, warnings, table names, and other information
to allow for complete, drillable reporting on historical job performance
and problems. For example, you can use cube-based reports in SAS Web
Report Studio to track outlier executions of a job down to the specific,
offending job step. You can use summary and detail reports to quickly
diagnose problems without having to traverse multiple log files by
hand. Detail reports of job-steps support stringent historical auditing
of data sources and targets.
Enhanced SAS Code Importer
In the
Import
SAS Code wizard, you can expand SAS macros in your jobs
and create a node for each step inside of your macros. Expanding the
macros provides additional detail about your job and how it works.
When you run your job with the macros expansion option enabled, you
can get more performance information about slow running steps. You
can also identify which steps use more memory or I/O, and CPU performance.
Another option enables
you to register all work tables as physical tables in a WORK library.
This enables you to import SAS code that uses temporary tables that
are both the source and target of a step. You can also analyze your
job to determine the type and number of steps in your job. This information
is provided in a report that you can review prior to importing the
job.
New Dialog Box for Searching Metadata
If you select
ToolsSearch from the main menu, you can search for metadata objects
by full name or partial name. You can subset a search to a specific
folder, search by type, by last change date, or by other user-defined
criteria. You can also save searches to a folder and access them later
when needed. For example, you can use the saved search feature to
maintain a “recently changed” object list.
Deploy Jobs from a Command Line
You can now deploy SAS
Data Integration Studio jobs from a command line. An example batch
file is installed with the product that shows you how to use this
new feature. You can deploy any number of jobs without having to start
SAS Data Integration Studio.
Enhanced Support for z/OS
Code generation line
lengths can be limited to 80 characters or less. The JCL for deployed
jobs can also be restricted to fit within the
z/OS 80-character line
length limit. Lines that go beyond the 80-character limit will flow
over onto the next line.
New Wizard for Standardizing Column Metadata
In some cases, table
columns that have the same name and that are used for the same purpose
should also have the same attribute values. For example, two columns
named Total Sales should perhaps have the same data type and column
length. The new
Column Standardization wizard
finds columns that have similar names and compares their attributes.
It shows any inconsistency in the column attribute values. You can
use this wizard to generate a report about column differences or log
updates for audit purposes. You can also use the wizard to update
the metadata for a target table in a SAS Data Integration Studio job.
The new column attributes would be applied to the table the next time
you run the job.
User-Defined Formats and Functions in the Expression Builder
You can discover and
register user-defined formats and analytic scoring functions from
a relational database. After the formats and functions are registered,
they appear as expressions that are available from the SAS Data Integration
Studio Expression Builder. The discovered functions and formats can
be placed in a folder location of your choice. Parameters on the function
are also discovered and registered so that you will have enough detailed
information to be able to use the function or format in your jobs.
Better Workload Balancing with SAS Grid Computing
SAS Data Integration
Studio and SAS Grid Computing now provide better workload balancing
for SAS Data Integration Studio jobs. When running interactively on
a grid, previous versions of SAS Data Integration Studio would create
a new session for each job execution and terminate the session when
the job finished. SAS Data Integration Studio now keeps the session
open until the user closes the job. This better supports incremental
job development because intermediate work tables will remain while
the session is active, allowing you to inspect run results. You can
also use various debugging features, such as running specific transformations
individually.
Interactive submissions
to a grid give administrators the ability to automate workload through
prioritization, through resource utilization thresholds, and by limiting
the number of concurrent jobs. SAS Grid Computing also supports the
ability to implement run policies, such as a Fair Share policy, which
allows prioritization of jobs based on user and workload.
New Transformations
The
Compare
Tables transformation can be used to detect changes between
two tables. It can compare a source table to another table, or it
can compare a source table to a cross-reference table that includes
a compare digest. This transformation supports either a direct lookup
(hash object) or a disk-based compare through a MERGE statement. The
hash lookup will perform faster but requires that the entire table
fit into memory. If this is not practical, you can choose the MERGE
statement method instead. The transformation can handle New, Update
Missing, and Unchanged tables as output. You can choose to retain
or delete any of the possible outputs as needed to increase efficiency.
The transformation generates its results in a single pass of the data.
The
SCD
Type 1 Loader is used for Type 1 slowly changing dimension
processing. It will detect value changes between the source and target
tables and then overwrite the target table values with the source
table values when a change is detected.
The
SQL
Set Operators transformation generates a PROC SQL statement
combines the results of two or more queries by using the following
set operators:
-
UNION – produces all unique
rows from both queries
-
EXCEPT – produces rows that
are part of the first query only
-
INTERSECT – produces rows
that are common to both query results
-
OUTER UNION – concatenates
the query results
The
SQL
Set Operator transformation is often faster than the
SQL
Join transformation. You can write complex queries on
either side of the SQL Set Operator, combining results sets that have
the same attributes but require different access paths. You can create
two select statements and then combine them with an SQL Set operator,
rather than trying to integrate all logic into a single join. The
new transformation supports full pushdown capability so that the entire
transformation will be pushed down to the database when the source
and target schemas match.
The
Oracle
Bulk Table Loader supports all bulk-load options for
Oracle tables. You can select options that affect how indexes, constraints,
and table statistics are handled, including the percentage of rows
sampled when gathering statistics. You can configure other Oracle
load options as well, such as partition support, commit level, and
Direct Path load support.
The
Standardize
with Definition transformation applies a selected DataFlux®
Data Management Studio definition to a source column in order to standardize
the data according to a set of rules.
Integration with DataFlux Data Management Platform
DataFlux, a SAS company,
recently released the DataFlux Data Management Platform. This platform
provides a single environment for managing data quality, data integration,
and master data management (MDM). Enterprise bundles that include
SAS Data Integration Studio now also include the DataFlux Data Management
Platform. One component of the new platform is DataFlux Data Management
Studio, which is a desktop client that combines data quality and data
integration features.
The data quality features
of the platform are especially useful for SAS Data Integration Studio
jobs. For example, you can use DataFlux Data Management Studio to
analyze and cleanse source data before it is included in SAS Data
Integration Studio jobs. You can use DataFlux data quality schemes
and definitions in SAS Data Integration Studio jobs. You can also
execute DataFlux jobs, profiles, and services from SAS Data Integration
Studio.
Copyright © SAS Institute Inc. All rights reserved.