Where Do I Locate My Analytics Cluster?

Overview of Locating Your Analytics Cluster

If you plan to use a distributed SAS LASR Analytic Server, you need to establish an analytics cluster. An analytics cluster is a high-performance environment that uses massively parallel processing (MPP) to perform analytic tasks on big data that resides in a distributed data storage appliance or in a Hadoop cluster.
You have two options for where you locate your SAS analytics cluster.
  • It can be co-located with your supported Hadoop data store.
  • It can be remote from your data store.
    When the SAS analytics cluster is separated (remote) from your data store, you have two options for data transfer.
    • You can perform a serial data transfer using SAS/ACCESS Interface to Hadoop.
    • You can perform a parallel data transfer using SAS/ACCESS Interface to Hadoop with SAS Embedded Process.
The topics in this section contain basic diagrams that describe each option for your analytics cluster location. Where you locate your analytics cluster depends on a number of criteria. Your SAS representative knows the latest supported configurations and can help you determine which cluster location works best for your site.
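The two remote data transfer options described above differ only in which SAS products are involved. The following Python sketch restates that relationship for planning purposes. It is an illustration only: the names `Transfer` and `remote_transfer_products` are hypothetical and are not part of any SAS product or API.

```python
from enum import Enum


class Transfer(Enum):
    """Data transfer options for an analytics cluster that is remote from the data store."""
    SERIAL = "serial"
    PARALLEL = "parallel"


def remote_transfer_products(transfer: Transfer) -> list[str]:
    """Return the SAS products that this section names for the given transfer option.

    Hypothetical helper: it only restates the overview list and does not
    query a SAS deployment or order.
    """
    # Both remote options use SAS/ACCESS Interface to Hadoop.
    products = ["SAS/ACCESS Interface to Hadoop"]
    # The parallel option adds SAS Embedded Process.
    if transfer is Transfer.PARALLEL:
        products.append("SAS Embedded Process")
    return products


if __name__ == "__main__":
    for option in Transfer:
        print(option.value, "->", ", ".join(remote_transfer_products(option)))
```

The sketch covers only the remote options, because the overview does not list products for the co-located layout.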

Co-located with Your Data Store

The following figure shows the analytics cluster co-located with a supported Hadoop data store:
Analytics Cluster Co-located with a Supported Hadoop Data Store
Note: For deployments that access SASHDAT tables exclusively, SAS/ACCESS Interface to Hadoop and SAS Embedded Process are not required.
If you choose to co-locate SAS Visual Analytics with Hadoop, the following vendors are supported:
  • Apache Hadoop, 2.7 and later
  • Cloudera Hadoop
  • Hortonworks Data Platform Hadoop
  • IBM BigInsights Hadoop
  • MapR Hadoop
  • Pivotal HD Hadoop
For the complete list of supported Hadoop vendors and their versions, see https://support.sas.com/resources/thirdpartysupport/v94/hadoop/hadoop-distributions.html.
Note: To co-locate SAS Visual Analytics with Hadoop, make sure that you select Hadoop (co-located HDFS) when running the SAS Deployment Wizard. For more information, see SAS Visual Analytics Data Provider.
SAS Visual Analytics Data Provider page

Remote from Your Data Store (Serial Connection)

The serial connection between the analytics cluster and your data store uses SAS/ACCESS Interface to Hadoop. SAS/ACCESS Interface to Hadoop can be ordered in a deployment package that is specific to your data source. For more information, see the SAS/ACCESS for Relational Databases: Reference.
The following figure shows the analytics cluster running on a supported Hadoop cluster using a serial connection to your remote data store:
Analytics Cluster Remote from Your Data Store (Serial Connection)
If you choose to use SAS Visual Analytics with a serial connection to a remote data source, SAS/ACCESS supports the following vendors:
  • Data storage appliance vendors:
    • Greenplum
    • HANA
    • Oracle
    • Teradata
  • Hadoop vendors:
    • Cloudera Hadoop
    • Hortonworks Data Platform Hadoop
Note: Data tables that SAS Visual Analytics loads serially can originate from a variety of sources, not just the ones listed here. For example, a table in an ODBC-compliant database that a SAS session can read, or a SAS data set, can be read serially into an analytics cluster.
Note: To use SAS Visual Analytics with a serial connection to a remote data source, make sure that you select your data provider with SAS Embedded Process. If your data provider is not listed, then select Hadoop (with SAS embedded process) when running the SAS Deployment Wizard. For more information, see SAS Visual Analytics Data Provider.
SAS Visual Analytics Data Provider page

Remote from Your Data Store (Parallel Connection)

Together, SAS/ACCESS Interface to Hadoop and SAS Embedded Process provide a high-speed parallel connection that delivers data from your data source to the SAS High-Performance Analytics environment on the analytics cluster. These products are delivered in a deployment package that is specific to your data source. For more information, see the SAS 9.4 In-Database Products: Administrator’s Guide.
The following figure shows the analytics cluster running on a supported Hadoop cluster using a parallel connection to your remote data store:
Analytics Cluster Remote from Your Data Store (Parallel Connection)
If you choose to use SAS Visual Analytics with a parallel connection to a remote data source, SAS Embedded Process supports the following vendors:
  • Data storage appliance vendors:
    • Greenplum
    • HANA
    • Oracle
    • Teradata
  • Hadoop vendors:
    • Cloudera Hadoop
    • Hortonworks Data Platform Hadoop
    • IBM BigInsights Hadoop
    • MapR Hadoop
    • Pivotal HD Hadoop
Note: To use SAS Visual Analytics with a parallel connection to a remote data source, when running the SAS Deployment Wizard, make sure that you select your data provider with SAS Embedded Process. If you are using HANA or Oracle, make sure that you select Greenplum (with SAS embedded process). If your data provider is not listed, then select Hadoop (with SAS embedded process). After running the SAS Deployment Wizard, you must manually configure your provider. For more information, see SAS Visual Analytics Data Provider.
SAS Visual Analytics Data Provider page
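To compare the vendor lists for the three cluster locations side by side, the lists in this section can be transcribed into a small lookup. The following Python sketch is one way to do that. The set names and the `options_for` function are hypothetical, and the vendor names are copied from this page as of its last update, so the SAS support site remains the authoritative source. An empty result does not mean that a source cannot be loaded, because serially loaded tables can originate from sources that are not listed here.

```python
# Vendor lists transcribed from this section. These are assumptions for
# illustration; confirm current support with your SAS representative.
HADOOP_ALL = {
    "Cloudera Hadoop",
    "Hortonworks Data Platform Hadoop",
    "IBM BigInsights Hadoop",
    "MapR Hadoop",
    "Pivotal HD Hadoop",
}
APPLIANCES = {"Greenplum", "HANA", "Oracle", "Teradata"}

CO_LOCATED = HADOOP_ALL | {"Apache Hadoop"}
SERIAL = APPLIANCES | {"Cloudera Hadoop", "Hortonworks Data Platform Hadoop"}
PARALLEL = APPLIANCES | HADOOP_ALL


def options_for(vendor: str) -> list[str]:
    """Return the cluster location options whose vendor list names this vendor."""
    checks = [
        ("co-located", CO_LOCATED),
        ("remote (serial)", SERIAL),
        ("remote (parallel)", PARALLEL),
    ]
    # Keep the options in the same order in which this section presents them.
    return [name for name, vendors in checks if vendor in vendors]


if __name__ == "__main__":
    for vendor in ("Cloudera Hadoop", "MapR Hadoop", "Teradata"):
        print(vendor, "->", options_for(vendor))
```

Keeping the lists as plain sets makes it easy to update them when the supported configurations change.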
Last updated: August 1, 2017