What is the Infrastructure?

The SAS High-Performance Analytics infrastructure consists of software that performs analytic tasks in a high-performance environment, which is characterized by massively parallel processing (MPP). The infrastructure is used by SAS products and solutions that typically analyze big data residing in a distributed data storage appliance or a Hadoop cluster.
The following figure depicts the SAS High-Performance Analytics infrastructure in its most basic topology:
SAS High-Performance Analytics Infrastructure Topology (Simplified)
The SAS High-Performance Analytics infrastructure consists of the following components:
  • SAS High-Performance Analytics environment
    The SAS High-Performance Analytics environment is the core of the infrastructure. The environment performs analytic computations on an analytics cluster. The analytics cluster is a Hadoop cluster or a data appliance.
  • (Optional) SAS Plug-ins for Hadoop
    Some solutions, such as SAS Visual Analytics, rely on a SAS data store that is co-located with the SAS High-Performance Analytics environment on the analytics cluster. One option for this co-located data store is SAS Plug-ins for Hadoop.
    If you already have one of the supported Hadoop distributions, you can modify it with files from the SAS Plug-ins for Hadoop package. Hadoop modified with SAS Plug-ins for Hadoop enables the SAS High-Performance Analytics environment to write SASHDAT file blocks evenly across HDFS. This even distribution balances the workload across the machines in the cluster and enables SAS analytic processes to read SASHDAT tables at high rates.
    For more information, see Overview of Modifying Co-located Hadoop.
  • (Optional) SAS High-Performance Computing Management Console
    The SAS High-Performance Computing Management Console eases the administration of distributed, high-performance computing (HPC) environments. The management console simplifies tasks such as configuring passwordless SSH, propagating user accounts and public keys, and managing CPU and memory resources on the analytics cluster.
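To illustrate why the even distribution of SASHDAT file blocks described under SAS Plug-ins for Hadoop balances the workload, the following is a minimal Python sketch that assumes a simple round-robin placement of blocks across worker nodes. It is not the placement logic that SAS Plug-ins for Hadoop actually uses. The function names and node names are hypothetical. The sketch shows only that round-robin placement leaves every node holding within one block of every other node.

```python
from collections import Counter


def assign_blocks_round_robin(num_blocks: int, nodes: list[str]) -> dict[int, str]:
    """Map each block index to a node in round-robin order (hypothetical model)."""
    if not nodes:
        raise ValueError("at least one node is required")
    return {block: nodes[block % len(nodes)] for block in range(num_blocks)}


def blocks_per_node(assignment: dict[int, str], nodes: list[str]) -> dict[str, int]:
    """Count how many blocks each node holds, including nodes that hold none."""
    counts = Counter(assignment.values())
    return {node: counts.get(node, 0) for node in nodes}


if __name__ == "__main__":
    # Hypothetical four-node analytics cluster and a table split into 10 blocks.
    nodes = ["worker1", "worker2", "worker3", "worker4"]
    placement = assign_blocks_round_robin(10, nodes)
    print(blocks_per_node(placement, nodes))
```

With 10 blocks and 4 nodes, two nodes hold three blocks and two hold two. Because no node holds much more data than the others, no single machine becomes a bottleneck when all nodes read their blocks in parallel.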
Other software on the analytics cluster includes the following:
  • SAS/ACCESS Interface and SAS Embedded Process
    Together, the SAS/ACCESS Interface and SAS Embedded Process provide a high-speed parallel connection that delivers data from the co-located SAS data source to the SAS High-Performance Analytics environment on the analytics cluster. These components are contained in a deployment package that is specific to your data source.
    Note: For deployments that use Hadoop as the co-located data provider and access SASHDAT tables exclusively, SAS/ACCESS and SAS Embedded Process are not needed.
  • Database client libraries or JAR files
    Client libraries supplied by the data vendor (or, in the case of Hadoop, JAR files) are required for the SAS Embedded Process to transfer data between the data store and the SAS High-Performance Analytics environment.
  • SAS solutions
    The SAS High-Performance Analytics infrastructure is used by various SAS High-Performance solutions such as the following:
Last updated: June 19, 2017