Overview of the High-Performance Analytics Transformations

High-Performance Analytics Transformations

The Transformations tree in SAS Data Integration Studio includes a High-Performance Analytics folder. You can use these transformations to load and unload tables on a Hadoop cluster or a SAS LASR Analytic Server. These transformations are typically used to support a SAS Analytics solution that includes both SAS Data Integration Studio and SAS LASR Analytic Server.
For example, SAS Data Integration Studio can be used to support a SAS LASR Analytic Server with the Hadoop Distributed File System (HDFS) as a co-located data provider. You can create a process flow that loads the output table from a transformation to HDFS. Then the HDFS table can be loaded onto the SAS LASR Analytic Server, where it can be analyzed with SAS Visual Analytics. The following display illustrates such a job.
Example Flow for High-Performance Analytics Transformations
The process flow in the previous display includes the following components:
  • The first ALL_EMP table in the flow is the output from a previous transformation in the flow. It is a table in a Base SAS library.
  • The SAS Data in HDFS Loader reads the first ALL_EMP table and loads its contents into a table with the same physical storage name in a SAS Data in HDFS library. This is the second ALL_EMP table in the flow. The library provides the connection to HDFS.
  • The SAS LASR Analytic Server Loader reads the second ALL_EMP table and loads its contents into a table with the same physical storage name in a SAS LASR Analytic Server library. This is the third ALL_EMP table in the flow. The library provides the connection to the SAS LASR Analytic Server cluster and is used to load the table into memory.
The following table describes the High-Performance Analytics transformations.
Name | Description
-----|------------
SAS Data in HDFS Loader | Loads a table to the file system (HDFS) on a Hadoop cluster. The source can be a SAS data set or a table in any DBMS supported by SAS. The target is a table in a SAS Data in HDFS Library.
SAS Data in HDFS Unloader | Unloads a table from HDFS. The input is a table in a SAS Data in HDFS Library.
SAS LASR Analytic Server Loader | Loads a table to memory on a SAS LASR Analytic Server. The source can be a SAS data set, a table in any DBMS supported by SAS, or a table in a SAS Data in HDFS Library. The target is an in-memory table in a SAS LASR Analytic Server Library.
SAS LASR Analytic Server Unloader | Unloads a table from memory on a SAS LASR Analytic Server. The input is an in-memory table in a SAS LASR Analytic Server Library.
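The source and target rules described above can be sketched as a small lookup structure. The following Python model is only an illustration and is not part of SAS Data Integration Studio. The storage-kind labels and the run_flow function are hypothetical names introduced here. The model checks that each step in a flow can read the output of the step before it, using the example flow (Base SAS table, then HDFS, then SAS LASR Analytic Server) as input.

```python
# Toy model of the transformation descriptions; the storage-kind labels and
# function names are illustrative assumptions, not SAS identifiers.
BASE = "SAS data set"
DBMS = "DBMS table"
HDFS = "SAS Data in HDFS library table"
LASR = "SAS LASR Analytic Server in-memory table"

# transformation name -> (accepted source kinds, target kind; None for unloaders)
TRANSFORMATIONS = {
    "SAS Data in HDFS Loader": ({BASE, DBMS}, HDFS),
    "SAS Data in HDFS Unloader": ({HDFS}, None),
    "SAS LASR Analytic Server Loader": ({BASE, DBMS, HDFS}, LASR),
    "SAS LASR Analytic Server Unloader": ({LASR}, None),
}


def run_flow(start_kind, steps):
    """Return the storage kind after each step.

    Raises ValueError if a step cannot read the output of the previous step.
    """
    kinds = [start_kind]
    current = start_kind
    for name in steps:
        sources, target = TRANSFORMATIONS[name]
        if current not in sources:
            raise ValueError(f"{name} cannot read a {current}")
        current = target
        kinds.append(current)
    return kinds


# The example flow: Base SAS table -> HDFS table -> in-memory LASR table.
flow = run_flow(BASE, ["SAS Data in HDFS Loader", "SAS LASR Analytic Server Loader"])
print(flow)
```

In this model, a flow that tries to feed an in-memory table into the SAS Data in HDFS Loader is rejected, because that loader accepts only a SAS data set or a DBMS table as its source.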

Software Used by These Transformations

SAS LASR Analytic Server

The High-Performance Analytics transformations in SAS Data Integration Studio use the OLIPHANT procedure and the LASR procedure, which are part of SAS LASR Analytic Server.
The SAS LASR Analytic Server is an analytic platform that provides a secure, multi-user environment for concurrent access to data that is loaded into memory. The server can take advantage of a distributed computing environment by distributing data and the workload among multiple machines and performing massively parallel processing. The server can also be deployed on a single machine when the workload and data volumes do not demand a distributed computing environment.
For distributed deployments, local storage on each machine is critical for storing large data sets in distributed form. The SAS LASR Analytic Server supports HDFS as a co-located data provider. HDFS is used because the server can read from and write to HDFS in parallel. In addition, HDFS provides replication for data redundancy: it stores data as blocks in distributed form on the blades, and the replicated copies of those blocks provide failover capabilities.
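To illustrate why replicated blocks provide failover, the following Python sketch places the blocks of a table on several machines and checks whether every block is still readable after some machines fail. It is a toy model only. The round-robin placement, the machine names, and both function names are assumptions made for this example, and they do not reflect the actual block placement policy of HDFS.

```python
def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes, round-robin.

    This is a simplified stand-in for a real placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds node count")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def readable_after_failure(placement, failed):
    """Return True if every block still has a copy on a surviving node."""
    return all(
        any(node not in failed for node in copies)
        for copies in placement.values()
    )


nodes = ["blade1", "blade2", "blade3", "blade4"]  # hypothetical machine names
placement = place_blocks(num_blocks=8, nodes=nodes, replication=2)

# With two copies of each block, losing one machine leaves every block readable.
one_lost = readable_after_failure(placement, failed={"blade2"})
# Losing both machines that hold the copies of some block makes it unreadable.
two_lost = readable_after_failure(placement, failed={"blade2", "blade3"})
print(one_lost, two_lost)  # True False
```

In this sketch, a higher replication value tolerates more simultaneous failures at the cost of more local storage on each machine.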
For more information about this server and related software, see the SAS LASR Analytic Server: Reference Guide. This book is available at the following location: http://support.sas.com/documentation/onlinedoc/securedoc/index_lasrserver.html

SAS Data in HDFS Library

The second ALL_EMP table in the previous display is stored in a SAS Data in HDFS library. This is a library for tables that are stored in the Hadoop Distributed File System (HDFS). The library works only with SASHDAT files that are created with the OLIPHANT procedure or with the SAS Data in HDFS Engine. SASHDAT is the data format used for SAS tables that are added to HDFS.

OLIPHANT Procedure

The SAS Data in HDFS Loader and SAS Data in HDFS Unloader transformations use the OLIPHANT procedure to add, delete, and manage SASHDAT files that are stored in the Hadoop Distributed File System (HDFS). You can manually specify some OLIPHANT options in the transformation. For more information about this procedure, see the SAS LASR Analytic Server: Reference Guide.

LASR Procedure

The SAS LASR Analytic Server Loader and SAS LASR Analytic Server Unloader transformations use the LASR procedure to load tables into memory on that server and to unload tables from it. You can manually specify some LASR options in the transformation. For more information about this procedure, see the SAS LASR Analytic Server: Reference Guide.

SAS LASR Analytic Server Library

The third ALL_EMP table in the previous display is stored in a SAS LASR Analytic Server library. This is a library used to register tables that are loaded in memory on a SAS LASR Analytic Server.