The SAS High-Performance
Analytics infrastructure consists of software that performs analytic
tasks in a high-performance environment, which is characterized by
massively parallel processing (MPP). The infrastructure is used by
SAS products and solutions that typically analyze big data that resides
in a distributed data storage appliance or Hadoop cluster.
The SAS High-Performance
Analytics infrastructure consists of the following components:
-
SAS High-Performance Analytics
environment
The SAS High-Performance
Analytics environment is the core of the infrastructure. The environment
performs analytic computations on an analytics cluster. The analytics
cluster is a Hadoop cluster or a data appliance.
-
(Optional) SAS Plug-ins for Hadoop
Some solutions, such
as SAS Visual Analytics, rely on a SAS data store that is co-located
with the SAS High-Performance Analytics environment on the analytics
cluster. One option for this co-located data store is SAS Plug-ins
for Hadoop.
If you already have one of the supported Hadoop distributions, you can modify it with
files from the SAS Plug-ins for Hadoop package. Hadoop modified with SAS Plug-ins
for Hadoop enables the SAS High-Performance Analytics environment to write SASHDAT
file blocks evenly across the HDFS file system. This even distribution balances the workload
across the machines in the cluster and enables SAS analytic processes to read SASHDAT tables
at high rates.
-
(Optional) SAS High-Performance
Computing Management Console
The SAS High-Performance
Computing Management Console eases the administration of
distributed, high-performance computing (HPC) environments. It simplifies
tasks such as configuring passwordless SSH, propagating user accounts and
public keys, and managing CPU and memory resources on the analytics
cluster.
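To illustrate why the even distribution of SASHDAT blocks described for SAS Plug-ins for Hadoop matters, the following minimal Python sketch compares a round-robin block placement with a skewed one. It is not SAS code and does not model how SAS Plug-ins for Hadoop actually places blocks. The function names, the node count, and the block count are all hypothetical. The sketch assumes only that each machine reads its own blocks while all machines work in parallel, so a read finishes when the most heavily loaded machine finishes.

```python
from collections import Counter


def place_round_robin(num_blocks, num_nodes):
    """Hypothetical even placement: block i goes to node i % num_nodes."""
    return [i % num_nodes for i in range(num_blocks)]


def place_skewed(num_blocks, num_nodes):
    """Hypothetical uneven placement: half of the blocks land on node 0,
    and the remaining blocks are spread round-robin over all nodes."""
    half = num_blocks // 2
    rest = [i % num_nodes for i in range(num_blocks - half)]
    return [0] * half + rest


def parallel_read_steps(placement, num_nodes):
    """Nodes read their local blocks one at a time, in parallel with each
    other, so the read takes as many steps as the busiest node has blocks."""
    counts = Counter(placement)
    return max(counts.get(node, 0) for node in range(num_nodes))


# Assumed example sizes: 120 blocks on a 4-node analytics cluster.
even = parallel_read_steps(place_round_robin(120, 4), 4)
skewed = parallel_read_steps(place_skewed(120, 4), 4)
print(even, skewed)  # prints "30 75"
```

With the same number of blocks and machines, the skewed placement takes 75 read steps instead of 30, because one machine holds most of the data while the others sit idle.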
Other software on the
analytics cluster includes the following:
-
SAS/ACCESS Interface and SAS Embedded
Process
Together, the SAS/ACCESS
Interface and SAS Embedded Process provide a high-speed parallel connection
that delivers data from the co-located SAS data source to the SAS
High-Performance Analytics environment on the analytics cluster. These
components are contained in a deployment package that is specific
to your data source.
Note: For deployments that use
Hadoop for the co-located data provider and access SASHDAT tables
exclusively, SAS/ACCESS and SAS Embedded Process are not needed.
-
Database client libraries or
JAR files
Client libraries supplied by the data vendor (or, in the case of Hadoop, JAR files) are required
for the SAS Embedded Process to transfer data between the
data store and the SAS High-Performance Analytics environment.
-
SAS solutions
The SAS High-Performance
Analytics infrastructure is used by various SAS High-Performance solutions
such as the following: