What's New in SAS Data Loader 2.3 for Hadoop
Overview
The main enhancements for SAS Data Loader 2.3 include the following:
-
Migration Support for SAS Data Loader 2.2
-
New Support for Importing Delimited Files into Hadoop
-
Enhanced Support for SAS LASR Analytic Server
-
Profile Enhancements
-
New Features for Data Quality Analysis
-
New Support for Deleting Rows from a Selected Table
-
Usability Enhancements
-
Enhanced Support for Hadoop
-
Enhanced Support for Apache Hive
-
New Support for Active Directory (LDAP) Authentication
Migration Support for SAS Data Loader 2.2
When you configure the
vApp for SAS Data Loader 2.3, you can migrate your jobs, profiles,
and configuration settings from SAS Data Loader 2.2. For more information,
see the “Migrate From a Previous Version” chapter of
the SAS Data Loader for Hadoop: vApp Deployment Guide.
New Support for Importing Delimited Files into Hadoop
The new Import a File directive imports and registers delimited source files in Hadoop.
The contents of each imported file are saved to a Hive table. New tables
receive columns that are derived from the source file. You can edit the column
definitions and store them for reuse in future imports.
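Deriving columns from a delimited source usually means reading the header row and inspecting a sample of data rows to choose a type for each column. The following Python sketch illustrates that general idea only; it is not how SAS Data Loader derives columns. The function names derive_columns and infer_hive_type, the sample size of 100 rows, and the BIGINT/DOUBLE/STRING type choices are all assumptions made for illustration.

```python
import csv
import io
import itertools


def infer_hive_type(values):
    """Return a Hive type name that fits every non-empty sample value (hypothetical rule)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "STRING"  # nothing to go on, so fall back to the widest type

    def all_parse(cast):
        for v in non_empty:
            try:
                cast(v)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "BIGINT"
    if all_parse(float):
        return "DOUBLE"
    return "STRING"


def derive_columns(text, delimiter=",", sample_size=100):
    """Derive (name, type) pairs from a header row plus up to sample_size data rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    samples = list(itertools.islice(reader, sample_size))
    columns = []
    for i, name in enumerate(header):
        # Short rows contribute an empty value for the missing trailing fields.
        values = [row[i] if i < len(row) else "" for row in samples]
        columns.append((name.strip(), infer_hive_type(values)))
    return columns


if __name__ == "__main__":
    print(derive_columns("id,name,score\n1,Ann,3.5\n2,Bob,4\n"))
    # [('id', 'BIGINT'), ('name', 'STRING'), ('score', 'DOUBLE')]
```

Sampling keeps inference fast on large files, at the cost that a value outside the sample can still violate the chosen type; this is one reason stored, editable column definitions are useful.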
Enhanced Support for SAS LASR Analytic Server
The Load Data to LASR directive now supports symmetric multiprocessing (SMP) with the SASIOLA
engine when it loads data into non-grid configurations of the SAS LASR
Analytic Server software. In grid configurations, the directive continues
to support massively parallel processing (MPP).
Profile Enhancements
The Profile Data and Saved Profile Reports directives have been enhanced as follows:
-
New configuration settings give
you more control over how profile jobs are processed. For example,
you can specify the number of parallel threads to use when processing
a job. You can also choose to minimize the number of threads that
are used.
For
more information about these settings, see Profiles Panel.
-
When you view a profile report, you can display trend graphs
to quickly visualize how the profiled data has changed over time.
For more
information about using trend graphs in reports, see Open Saved Profile Reports.
-
When you generate a profile report, SAS Data Loader
now automatically performs data type analysis and includes the results
in the report. The analysis detects and reports on the types of content
in source columns. For example, the analysis can indicate that a column
contains phone numbers or ZIP codes.
For more
information about profile reports, see About Profile Reports.
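One simple way to detect the type of content in a column is to test sample values against a set of patterns and report a type when most values match. The Python sketch below illustrates that idea. It is not the analysis that SAS Data Loader performs. The two simplified US-style patterns, the 90% match threshold, and the name detect_content_type are assumptions for illustration only.

```python
import re

# Hypothetical, simplified US-style patterns; real data needs broader rules.
CONTENT_PATTERNS = {
    "ZIP code": re.compile(r"\d{5}(-\d{4})?"),
    "phone number": re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]\d{4}"),
}


def detect_content_type(values, threshold=0.9):
    """Return the content type that at least `threshold` of non-empty values match, or None."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    best_label, best_share = None, 0.0
    for label, pattern in CONTENT_PATTERNS.items():
        matches = sum(1 for v in non_empty if pattern.fullmatch(v))
        share = matches / len(non_empty)
        # Keep the label with the highest share that clears the threshold.
        if share >= threshold and share > best_share:
            best_label, best_share = label, share
    return best_label


if __name__ == "__main__":
    print(detect_content_type(["27513", "27513-1234", "90210"]))   # ZIP code
    print(detect_content_type(["919-555-0100", "(919) 555-0101"]))  # phone number
```

A threshold below 100% lets a column that contains a few malformed entries still be classified. Such columns are often the ones that a profile report should flag.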
New Features for Data Quality Analysis
The Cleanse Data in Hadoop directive now provides the following new transformations: Change Case, Gender
Analysis, Pattern Analysis, and Field Extraction.
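Pattern analysis generally reduces each value to a character-class pattern and then counts how often each pattern occurs, so that unusual formats stand out. The Python sketch below shows one common form of this technique. The symbols used (A for uppercase letters, a for other letters, 9 for digits, all other characters kept as they are) and the function names are assumptions. They are not necessarily the pattern symbols that the Cleanse Data in Hadoop directive produces.

```python
from collections import Counter


def char_pattern(value):
    """Map each character to a class symbol: A = uppercase, a = other letters, 9 = digit."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            # Letters that are not uppercase, including caseless scripts, map to "a".
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # punctuation and spaces are kept as they are
    return "".join(out)


def pattern_frequencies(values):
    """Count how many values share each character pattern."""
    return Counter(char_pattern(v) for v in values)


if __name__ == "__main__":
    print(char_pattern("NC 27513"))  # AA 99999
    print(pattern_frequencies(["27513", "27514", "2751"]).most_common())
    # [('99999', 2), ('9999', 1)]
```

In this example, the rare pattern 9999 points to a ZIP code that is missing a digit.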
New Support for Deleting Rows from a Selected Table
The new Delete Rows directive deletes specified rows within a source table
without writing output to a target table. Rows can be deleted by one or
more rules or by a user-written Hive expression. You can paste and edit
existing expressions. An expression builder provides access to Hive function
syntax and source column names. To use this feature, the Hadoop cluster must
be configured with release 0.14 or later of the Apache Hive data warehouse
software. Hive 0.14 supports transactional tables.
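In HiveQL, a row deletion takes the form DELETE FROM table WHERE expression. Hive supports this statement starting with release 0.14, and only on tables that support ACID transactions. The Python sketch below shows how a set of simple rules might be combined into such a statement. It is an illustration only, not the HiveQL that the Delete Rows directive generates. The (column, operator, value) rule format, the operator list, and the function names are hypothetical.

```python
ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">="}  # hypothetical rule operators


def hive_literal(value):
    """Render a Python value as a HiveQL literal."""
    if isinstance(value, bool):  # test bool first because bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    # Hive string literals use C-style escapes, so escape backslashes and quotes.
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def build_delete(table, rules, combine="AND"):
    """Combine (column, operator, value) rules into one HiveQL DELETE statement."""
    if combine not in ("AND", "OR"):
        raise ValueError("combine must be AND or OR")
    clauses = []
    for column, op, value in rules:
        if op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f"`{column}` {op} {hive_literal(value)}")
    if not clauses:
        # Without a WHERE clause, DELETE would remove every row in the table.
        raise ValueError("refusing to build an unfiltered DELETE")
    return f"DELETE FROM {table} WHERE " + f" {combine} ".join(clauses)


if __name__ == "__main__":
    print(build_delete("sales", [("region", "=", "West"), ("qty", "<", 0)], "OR"))
    # DELETE FROM sales WHERE `region` = 'West' OR `qty` < 0
```

The sketch rejects an empty rule set on purpose, because an accidental unfiltered deletion cannot be undone by rewriting a target table.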
Usability Enhancements
Features that are new to this release enable you to do the following tasks:
-
display available tables and data
sources in a list view or in the existing grid view.
-
use a new Search field to locate
tables and data sources in the list view.
-
configure the list view and grid
view to open the most recently used data source automatically.
For more information,
see About the SAS Table Viewer.
-
select from the ten most recently
accessed tables or data sources.
-
delete individual jobs in the Run
Status directive.
-
filter rows using an expression
builder or the existing rules mechanism in several directives.
-
view short column names, not tablename.columnname, in the SAS Table Viewer.
Short column names are easier to read.
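A list of the ten most recently accessed tables behaves like a capped most-recently-used (MRU) list. Accessing an item moves it to the front, and the oldest item is dropped when the list is full. The Python sketch below shows that data structure. The class name RecentSources and its methods are hypothetical and do not describe how SAS Data Loader stores its history.

```python
from collections import OrderedDict


class RecentSources:
    """Most-recently-used list capped at `limit` entries (hypothetical helper)."""

    def __init__(self, limit=10):
        self.limit = limit
        self._items = OrderedDict()  # insertion order runs from oldest to newest

    def touch(self, name):
        """Record an access; a repeated access moves the name to the newest end."""
        self._items.pop(name, None)
        self._items[name] = None
        while len(self._items) > self.limit:
            self._items.popitem(last=False)  # evict the least recently used entry

    def most_recent(self):
        """Return the names from newest to oldest."""
        return list(reversed(self._items))


if __name__ == "__main__":
    recent = RecentSources(limit=3)
    for name in ["a", "b", "c", "a", "d"]:
        recent.touch(name)
    print(recent.most_recent())  # ['d', 'a', 'c']
```

An OrderedDict gives constant-time moves and evictions, so the cost of each access does not depend on the history length.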
Enhanced Support for Hadoop
New versions of Cloudera
and Hortonworks are supported. Support for MapR has been added. Kerberos
is not supported in combination with MapR.
If the version of Hadoop
on your cluster has changed, you can now update the version of Hadoop
that is specified in the vApp to match. For more information, see
“Configuring a New Version of Hadoop” in the SAS Data Loader for Hadoop: vApp Deployment Guide.
If the default temporary
storage directory for SAS Data Loader is not suitable, you can now
change it. For example, if the default temporary directory is not
writable, you can specify another directory.
For more information, see the description of the
SAS HDFS temporary file location field
in the topic Hadoop Configuration Panel.
Enhanced Support for Apache Hive
The new Run a Hive Program directive enables you to paste and edit existing Hive programs and
then run those programs in Hadoop.
The Run a Hive Program
directive provides a resource selector that enables you to browse
and add Hive function syntax. The resource selector is also provided
in the Hive expression builder in the following directives: Delete
Rows, Query or Join Tables in Hadoop, and Sort and De-Duplicate Data
in Hadoop.
You can now store data
in the default Hive warehouse location, or you can specify an alternate
location in the Hadoop file system. This location setting can be applied
globally for all target tables that are generated. You can also override
the global setting on a job-by-job basis for individual target tables.
For more information,
see Change the File Format of Hadoop Target Tables.
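The storage location follows a simple precedence rule. A per-job override wins over the global setting, and the global setting wins over the default Hive warehouse location. The Python sketch below expresses that rule. The function name and parameters are hypothetical. The path /user/hive/warehouse is Hive's usual default for hive.metastore.warehouse.dir, but your cluster might use a different value.

```python
# Hive's usual default warehouse directory; check hive.metastore.warehouse.dir on your cluster.
DEFAULT_HIVE_WAREHOUSE = "/user/hive/warehouse"


def resolve_target_location(table, job_override=None, global_location=None):
    """Pick the base directory for a target table: job override, then global setting, then default."""
    base = job_override or global_location or DEFAULT_HIVE_WAREHOUSE
    return f"{base.rstrip('/')}/{table}"


if __name__ == "__main__":
    print(resolve_target_location("sales"))                                # /user/hive/warehouse/sales
    print(resolve_target_location("sales", global_location="/data/dl/"))  # /data/dl/sales
```

Evaluating the override first means that a single job can redirect its target table without any change to the global setting that other jobs rely on.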
New Support for Active Directory (LDAP) Authentication
SAS Data Loader can now connect to Hadoop clusters that are protected with Active
Directory and LDAP (Lightweight Directory Access Protocol).
For
more information, see Active Directory (LDAP) Authentication.
Copyright © SAS Institute Inc. All rights reserved.