Deploying SAS In-Memory Statistics for Hadoop

Installation Sequence

If SAS In-Memory Statistics for Hadoop is installed along with a SAS solution such as SAS Visual Analytics, then follow the steps that are provided in the installation guide for the solution. The software is automatically installed when it is delivered with a SAS solution.
If you are not installing the software as part of a solution, then you are performing a "Basic" installation instead of a "Planned" installation. Use the documents in the following sections to install the software.

Software for Your Hadoop Cluster

Basic Steps

Information about installing and configuring SAS High-Performance Analytics Environment, SAS High-Performance Deployment of Hadoop, and SAS High-Performance Computing Management Console is available in the SAS High-Performance Analytics Infrastructure: Installation and Configuration Guide. This book is available at the following URL:
Note: SAS High-Performance Analytics Infrastructure: Installation and Configuration Guide directs you to install the SAS Embedded Process. You can install it only after SAS Foundation and SAS/ACCESS are installed.
If you are using SAS High-Performance Deployment of Hadoop, keep in mind the following items as you follow the procedures in SAS High-Performance Analytics Infrastructure: Installation and Configuration Guide:
  • When you follow the instructions in section "Preparing Your Data Provider for a Parallel Connection with SAS," the list of JAR files is not supplied for SAS High-Performance Deployment of Hadoop. There are no MapReduce 1 JAR files. The files for Common, HDFS, and MapReduce 2 are in the following directories:
    Common
    $HADOOP_HOME/share/hadoop/common/lib/*.jar
    $HADOOP_HOME/share/hadoop/common/*.jar
    HDFS
    $HADOOP_HOME/share/hadoop/hdfs/lib/*.jar
    $HADOOP_HOME/share/hadoop/hdfs/*.jar
    MapReduce 2
    $HADOOP_HOME/share/hadoop/mapreduce/lib/*.jar
    $HADOOP_HOME/share/hadoop/mapreduce/*.jar
    Copy the JAR files from those directories into a single directory on the machine that is used as the Hadoop NameNode.
  • You need to perform the steps in section "Configuring for a Remote Data Store." Those steps install TKGrid_REP. When the installation script prompts you to specify the location of the Hadoop and client JAR files, enter the path to the single directory that has the collection of JAR files.
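
The JAR collection described in the first item above can be sketched as a small shell function. This is a minimal sketch, not a SAS-supplied tool: the function name collect_hadoop_jars and the destination path in the example call are assumptions chosen for illustration. It copies every *.jar file from the six directories listed above into one directory.

```shell
#!/bin/sh
# Sketch: gather the Common, HDFS, and MapReduce 2 JAR files into one directory.
# collect_hadoop_jars and its arguments are illustrative names, not SAS or Hadoop tools.
collect_hadoop_jars() {
    hadoop_home=$1   # value of $HADOOP_HOME on the NameNode
    dest=$2          # single directory that receives the JAR files (your choice)

    mkdir -p "$dest" || return 1

    for component in common hdfs mapreduce; do
        for dir in "$hadoop_home/share/hadoop/$component/lib" \
                   "$hadoop_home/share/hadoop/$component"; do
            for jar in "$dir"/*.jar; do
                # Skip the literal pattern when a directory has no JAR files.
                [ -f "$jar" ] || continue
                cp -p "$jar" "$dest/" || return 1
            done
        done
    done
}

# Example call (the destination path is an assumption):
# collect_hadoop_jars "$HADOOP_HOME" /opt/hadoop_jars
```

If the same file name exists in more than one source directory, the copy made later overwrites the earlier one. The directory that you pass as the second argument is the one to enter when the TKGrid_REP installation script prompts for the location of the JAR files.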

Additional Steps for SAS High-Performance Deployment of Hadoop

SAS High-Performance Deployment of Hadoop provides Apache Hadoop, SAS services that run inside Hadoop to enable working with the SASHDAT format, and an installation program that helps you install and configure the software.
SAS High-Performance Deployment of Hadoop is initially configured for using HDFS and SAS LASR Analytic Server together. The SAS Embedded Process is needed to access data in formats other than SASHDAT in parallel. To use it, you need to configure Hadoop for MapReduce and YARN services.
To configure the MapReduce and YARN services:
  1. Log on to the host for the Hadoop NameNode as root or with an account that has sudo privileges.
  2. Edit the $HADOOP_HOME/etc/hadoop/yarn-site.xml file. Include the following properties. Substitute the host name of your NameNode for gridhost001.example.com:
    <?xml version="1.0"?>
    <configuration>

      <!-- Site specific YARN configuration properties -->
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce.shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>gridhost001.example.com:8025</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>gridhost001.example.com:8040</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>gridhost001.example.com:8030</value>
      </property>
    </configuration>
    
  3. Edit the $HADOOP_HOME/etc/hadoop/mapred-site.xml file. Add the following property above the closing </configuration> tag:
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    
  4. Copy the two files to the $HADOOP_HOME/etc/hadoop directory on the other machines in the cluster.
    Tip: You can use /opt/TKGrid/bin/simcp to copy the files.
  5. Edit /etc/profile.d/java.sh and include the following lines. Adjust the values to match your environment:
    export JAVA_HOME=/usr/lib/jvm/jre
    export PATH=$PATH:$JAVA_HOME/bin
    export YARN_HOME=/hadoop/hadoop-0.23.1
  6. Copy the /etc/profile.d/java.sh file to the /etc/profile.d directory on the other machines in the cluster.
  7. Using the account that you use to run Hadoop, start YARN:
    su - hadoop
    $HADOOP_HOME/sbin/start-yarn.sh
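
Before you start the services in step 7, a quick check of the edited files can catch two common slips: a missing mapreduce.framework.name property, and a yarn-site.xml file that still contains the sample host name. The following is a minimal sketch under stated assumptions. The function name check_yarn_config is hypothetical, and the check is a plain text search rather than XML validation, so it expects each <name> element to be on a single line, as in the examples above.

```shell
#!/bin/sh
# Sketch: sanity-check the edited configuration files before starting YARN.
# check_yarn_config is an illustrative helper, not part of Hadoop or SAS.
check_yarn_config() {
    conf_dir=$1   # typically $HADOOP_HOME/etc/hadoop

    for f in yarn-site.xml mapred-site.xml; do
        if [ ! -f "$conf_dir/$f" ]; then
            echo "missing file: $conf_dir/$f" >&2
            return 1
        fi
    done

    # mapred-site.xml must contain the framework property from step 3.
    if ! grep -F -q '<name>mapreduce.framework.name</name>' "$conf_dir/mapred-site.xml"; then
        echo "mapreduce.framework.name is not set in mapred-site.xml" >&2
        return 1
    fi

    # The sample host name must be replaced with your NameNode host.
    if grep -F -q 'gridhost001.example.com' "$conf_dir/yarn-site.xml"; then
        echo "yarn-site.xml still contains the sample host name" >&2
        return 1
    fi

    echo "configuration looks plausible"
}

# Example call:
# check_yarn_config "$HADOOP_HOME/etc/hadoop"
```

Running the same check on each machine after step 4 is one way to confirm that the copied files arrived intact.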

SAS Foundation and Related Software

Install SAS Foundation

On the machine that you will use as the SAS client for writing and submitting SAS programs, run the SAS Deployment Wizard to install SAS Foundation. The type of SAS Studio that is installed depends on your host operating system. See the following sections for details.

About SAS Studio Basic and SAS Studio - Single User

SAS Studio is a development application for writing SAS programs and submitting them. You can access SAS Studio through your web browser.
SAS Studio Basic is included with an order for SAS In-Memory Statistics for Hadoop on Linux for x64.
SAS Studio - Single User is included with an order for SAS In-Memory Statistics for Hadoop on Windows.
See SAS Studio: Administrator's Guide for information about installing and administering SAS Studio.

UNIX Hosts

Use the SAS Deployment Wizard to install SAS. Refer to the documentation for UNIX hosts at the following URL:
When you run the SAS Deployment Wizard, you can choose to install SAS Studio Basic. Refer to "SAS Studio Basic" in the SAS Studio: Administrator's Guide.
After the SAS Deployment Wizard installs the software, be sure to follow the instructions for configuring Hadoop JAR files. Refer to Configuration Guide for SAS 9.4 Foundation for UNIX Environments available at the preceding URL.

Windows Hosts

Use the SAS Deployment Wizard to install SAS. Refer to the documentation for Windows hosts at the following URL:
After the SAS Deployment Wizard installs the software, be sure to follow the instructions for configuring Hadoop JAR files. Refer to Configuration Guide for SAS 9.4 Foundation for Microsoft Windows for x64 available at the preceding URL.

SAS In-Database Products and SAS Embedded Process

After SAS Foundation and SAS/ACCESS are installed, follow the instructions in the SAS In-Database Products: Administrator's Guide to install the deployment package for Hadoop on the cluster.
Note: SAS High-Performance Analytics Infrastructure: Installation and Configuration Guide, which is mentioned in a preceding section, also directs you to install the SAS Embedded Process. You do not need to install the SAS Embedded Process twice. It is mentioned at this point because you can install it only after SAS Foundation and SAS/ACCESS are installed.
To perform the procedure for installing the deployment package, you need a list of JAR files. If you are using SAS High-Performance Deployment of Hadoop, then use the same directory with the JAR files that you collected earlier. You need to copy the contents of the directory to the client machine that you will use to run SAS programs. As described in SAS In-Database Products: Administrator's Guide, you also need to copy the sas.hadoop.ep.apache023.jar and sas.hadoop.ep.apache023.nls.jar files to the same directory on the client machine. This is the directory that you will specify as the SAS_HADOOP_JAR_PATH on the client machine.
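
On a UNIX client, the final check can be sketched as follows. This is an illustration only: the function name use_sas_hadoop_jar_dir and the example path are assumptions, and the export affects only the current shell session. How you make SAS_HADOOP_JAR_PATH visible to your SAS sessions depends on your environment, so follow the SAS documentation for that step.

```shell
#!/bin/sh
# Sketch: confirm the client-side JAR directory is complete, then export it.
# use_sas_hadoop_jar_dir is an illustrative name, not a SAS utility.
use_sas_hadoop_jar_dir() {
    jar_dir=$1   # directory on the client machine that holds the copied JAR files

    # Both SAS Embedded Process JAR files must be present alongside the Hadoop JARs.
    for jar in sas.hadoop.ep.apache023.jar sas.hadoop.ep.apache023.nls.jar; do
        if [ ! -f "$jar_dir/$jar" ]; then
            echo "missing: $jar_dir/$jar" >&2
            return 1
        fi
    done

    SAS_HADOOP_JAR_PATH=$jar_dir
    export SAS_HADOOP_JAR_PATH
}

# Example call (the path is an assumption):
# use_sas_hadoop_jar_dir /opt/hadoop_jars
```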