Where Do I Locate My Analytics Cluster?

Overview of Locating Your Analytics Cluster

You have two options for where to locate your SAS analytics cluster:
  • Co-locate SAS with your data store.
  • Separate SAS from your data store.
    When your SAS analytics cluster is separated (remote) from your data store, you have two basic options for transferring data:
    • Serial data transfer using SAS/ACCESS.
    • Parallel data transfer using SAS/ACCESS in conjunction with the SAS Embedded Process.
The topics in this section contain simple diagrams that describe each option for analytics cluster placement.
Tip: Where you locate your cluster depends on a number of criteria. Your SAS representative will know the latest supported configurations and can help you determine which cluster placement option works best for your site. There might also be solution-specific criteria to consider when you determine your analytics cluster location. For more information, see the installation or administration guide for your specific SAS solution.

Analytics Cluster Co-Located with Your Hadoop Cluster

Note: In a co-located configuration, the SAS High-Performance Analytics environment supports the Apache, Cloudera, Hortonworks, IBM BigInsights, MapR, and Pivotal HD distributions of Hadoop. For more specific version information, see the SAS 9.4 Supported Hadoop Distributions.
The following figure shows the analytics cluster co-located on your Hadoop cluster:
Figure: Analytics Cluster Co-Located with the Hadoop Cluster
Note: For deployments that use Hadoop for the co-located data provider and access SASHDAT tables exclusively, SAS/ACCESS and the SAS Embedded Process are not needed.

Analytics Cluster Remote from Your Data Store (Serial Connection)

The following figure shows the analytics cluster using a serial connection to your remote data store:
Figure: Analytics Cluster Remote from Your Data Store (Serial Connection)
The serial connection between the analytics cluster and your data store uses the SAS/ACCESS Interface. SAS/ACCESS is ordered in a deployment package that is specific to your data source. For more information, refer to the SAS/ACCESS for Relational Databases: Reference.

Analytics Cluster Remote from Your Data Store (Parallel Connection)

Note: In the third maintenance release of SAS 9.4, SAS Embedded Process supports the Cloudera, Hortonworks, IBM BigInsights, MapR, and Pivotal HD distributions of Hadoop. For more specific version information, see the SAS 9.4 Support for Hadoop.
The following figure shows the analytics cluster using a parallel connection to your remote data store:
Figure: Analytics Cluster Remote from Your Data Store (Parallel Connection)
Together, the SAS/ACCESS Interface and the SAS Embedded Process provide a high-speed parallel connection that delivers data from your data source to the SAS High-Performance Analytics environment on the analytics cluster. These components are contained in a deployment package that is specific to your data source. For more information, refer to the SAS In-Database Products: Administrator’s Guide.
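
The relationship between cluster placement and the components described in the preceding topics can be summarized in a short sketch. This is an illustration only, not a SAS API: the names `Placement` and `required_components` are hypothetical, and the rules simply restate the notes in this section (SASHDAT-only co-located deployments need neither component, serial access needs SAS/ACCESS, and parallel access needs SAS/ACCESS plus the SAS Embedded Process).

```python
from enum import Enum


class Placement(Enum):
    """Hypothetical labels for the two placement options in this section."""
    CO_LOCATED = "co-located"
    REMOTE = "remote"


def required_components(placement, sashdat_only=False, parallel=False):
    """Return the components a deployment would need (illustrative only).

    placement    -- a Placement value
    sashdat_only -- True if the site accesses SASHDAT tables exclusively
    parallel     -- True if a parallel connection to the data store is wanted
    """
    if sashdat_only:
        if placement is Placement.REMOTE:
            # Per the comparison table, SASHDAT is not supported remotely.
            raise ValueError("SASHDAT is not supported for a remote data provider")
        # Co-located Hadoop with SASHDAT only: neither component is needed.
        return set()

    components = {"SAS/ACCESS"}  # serial transfer uses SAS/ACCESS alone
    if parallel:
        # Parallel transfer pairs SAS/ACCESS with the SAS Embedded Process.
        components.add("SAS Embedded Process")
    return components
```

For example, `required_components(Placement.REMOTE, parallel=True)` returns both components, while a co-located, SASHDAT-only deployment returns an empty set. Your SAS representative remains the authority on which combinations are supported for your site.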

Hadoop Deployment Comparison

The following table compares various Hadoop deployment scenarios.
Table: Hadoop Deployment Comparison

| | Co-located with Hadoop (SASHDAT: Yes; SAS Embedded Process: No) | Co-located with Hadoop (SASHDAT: No; SAS Embedded Process: Yes) | Remote Data Provider (SASHDAT: Not Supported; SAS Embedded Process: No) | Remote Data Provider (SASHDAT: Not Supported; SAS Embedded Process: Yes) |
| --- | --- | --- | --- | --- |
| SASHDAT Support | Yes. | No. | No. SASHDAT is co-located or MapR NFS only. | No. SASHDAT is co-located or MapR NFS only. |
| Parallel R/W | Yes for SASHDAT and CSV. No for SAS/ACCESS because there is no SAS Embedded Process. | Yes. (At least for PROC HDMD.) | No. SAS/ACCESS can perform a serial read through the root node. | Yes. SAS/ACCESS and SAS Embedded Process enable this. |
| Asymmetric¹ | No for SASHDAT. No for SAS/ACCESS. SAS/ACCESS can perform a serial read through the root node. | Yes. SAS Embedded Process on all the machines can deliver data to a fewer or greater number of machines. | No. SAS/ACCESS can perform a serial read through the root node. | Yes. SAS Embedded Process on all the machines can deliver data to a fewer or greater number of machines. |
| Serial Reads for SAS/ACCESS | SAS/ACCESS reads are always serial without SAS Embedded Process. | (If something is misconfigured, SAS/ACCESS performs a serial read.) | SAS/ACCESS reads are always serial without SAS Embedded Process. | (If something is misconfigured, SAS/ACCESS performs a serial read.) |
| Popularity | This is the SAS Visual Analytics configuration. | Rare. | Rare. | Popular. (Can be combined with a co-located Hadoop configuration.) |

¹ Asymmetric refers to a deployment where the total number of SAS High-Performance Analytics environment worker nodes is not equal to the total number of Hadoop data nodes. Symmetric refers to an equal number of worker nodes and data nodes.
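
The comparison and the definition of an asymmetric deployment can also be expressed as data. The following is a minimal sketch: the `Scenario` class, its field names, and the helper functions are hypothetical and exist only to illustrate the table. The `parallel_rw` flag is a simplification, because the first scenario is parallel for SASHDAT and CSV but not for SAS/ACCESS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One column of the Hadoop Deployment Comparison table (illustrative)."""
    placement: str           # "co-located" or "remote"
    sashdat: bool            # SASHDAT in use
    embedded_process: bool   # SAS Embedded Process deployed
    parallel_rw: bool        # simplified: see lead-in for the first scenario
    asymmetric_ok: bool      # supports unequal worker/data node counts
    popularity: str


SCENARIOS = [
    Scenario("co-located", True, False, True, False,
             "SAS Visual Analytics configuration"),
    Scenario("co-located", False, True, True, True, "rare"),
    Scenario("remote", False, False, False, False, "rare"),
    Scenario("remote", False, True, True, True, "popular"),
]


def is_asymmetric(worker_nodes, data_nodes):
    """True when the worker-node count differs from the data-node count."""
    return worker_nodes != data_nodes


def scenarios_for(worker_nodes, data_nodes):
    """Return the table scenarios compatible with the given node counts."""
    if not is_asymmetric(worker_nodes, data_nodes):
        return list(SCENARIOS)
    # Per the table, only scenarios with the SAS Embedded Process
    # can deliver data to a fewer or greater number of machines.
    return [s for s in SCENARIOS if s.asymmetric_ok]
```

For instance, `scenarios_for(8, 16)` keeps only the two scenarios that deploy the SAS Embedded Process, which matches the Asymmetric row of the table.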
Last updated: June 19, 2017