Modify the Existing Apache Hadoop Cluster, Version 2.7

If you want to co-locate the SAS High-Performance Analytics environment with a pre-existing Apache Hadoop cluster, you can modify your cluster with files from the SAS Plug-ins for Hadoop package. Apache Hadoop modified with SAS Plug-ins for Hadoop enables the SAS High-Performance Analytics environment to write SASHDAT file blocks evenly across the HDFS file system.
To modify version 2.7 or later of Apache Hadoop with SAS Plug-ins for Hadoop, follow these steps:
  1. Make sure that you have reviewed all of the information contained in the section Requirements for Co-located Hadoop.
  2. Log on to the Hadoop NameNode machine (blade 0) as root.
  3. Create a directory, sas, under $HADOOP_HOME/share/hadoop.
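    For example, assuming that $HADOOP_HOME is already set in your environment, a command such as the following creates the directory:
    mkdir -p $HADOOP_HOME/share/hadoop/sas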
  4. Locate the software for SAS Plug-ins for Hadoop in the SAS Software Depot that was created by your site’s depot administrator:
    depot-installation-location/standalone_installs/SAS_Plug-ins_for_Hadoop/1_0/Linux_for_x64/hdatplugins.tar.gz
  5. Copy the hdatplugins.tar.gz file to a temporary location and extract it:
    cp hdatplugins.tar.gz /tmp
    cd /tmp
    tar xzf hdatplugins.tar.gz
    A directory that is named hdatplugins is created.
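    You can list the extracted directory to confirm that it contains the files referenced in the following steps (the three JAR files, the saslasrfd binary, and the SAS_VERSION file):
    ls /tmp/hdatplugins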
  6. Propagate the following three JAR files in hdatplugins to the $HADOOP_HOME/share/hadoop/sas directory on each machine in the Apache Hadoop cluster:
    • sas.lasr.jar
    • sas.lasr.hadoop.jar
    • sas.grid.provider.yarn.jar
    Tip
     If you have already installed the SAS High-Performance Computing Management Console or the SAS High-Performance Analytics environment, you can use the simcp command to propagate each JAR file across all machines in the cluster with a single command per file. For example:
     /opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.jar $HADOOP_HOME/share/hadoop/sas
     /opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.hadoop.jar $HADOOP_HOME/share/hadoop/sas
     /opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.grid.provider.yarn.jar $HADOOP_HOME/share/hadoop/sas
    For more information, see Simultaneous Utilities Commands.
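     If simcp is not available, any remote copy tool works. A minimal sketch using scp, assuming a hypothetical file hosts.txt (not part of the SAS package) that lists one cluster machine per line, and assuming Hadoop is installed at the same path on every machine:
     # hosts.txt (hypothetical) lists one cluster machine per line;
     # $HADOOP_HOME expands on the local machine, so paths must match across the cluster
     for host in $(cat hosts.txt); do
       scp /tmp/hdatplugins/sas.lasr.jar \
           /tmp/hdatplugins/sas.lasr.hadoop.jar \
           /tmp/hdatplugins/sas.grid.provider.yarn.jar \
           "$host:$HADOOP_HOME/share/hadoop/sas"
     done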
  7. Propagate saslasrfd in hdatplugins to the $HADOOP_HOME/bin directory on each machine in the Apache Hadoop cluster.
  8. Propagate SAS_VERSION in hdatplugins to the $HADOOP_HOME/ directory on each machine in the Apache Hadoop cluster.
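    As in step 6, if simcp is available, you can propagate saslasrfd and SAS_VERSION with one command each. For example:
    /opt/TKGrid/bin/simcp /tmp/hdatplugins/saslasrfd $HADOOP_HOME/bin
    /opt/TKGrid/bin/simcp /tmp/hdatplugins/SAS_VERSION $HADOOP_HOME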
  9. Define the following properties in $HADOOP_HOME/etc/hadoop/hdfs-site.xml and propagate the changes across all nodes in your Hadoop cluster:
      <property>
        <name>dfs.namenode.plugins</name>
        <value>com.sas.lasr.hadoop.NameNodeService</value>
      </property>
      <property>
        <name>dfs.datanode.plugins</name>
        <value>com.sas.lasr.hadoop.DataNodeService</value>
      </property>
      <property>
        <name>com.sas.lasr.hadoop.fileinfo</name>
        <value>ls -l {0}</value>
        <description>The command used to get the user, group, and permission
        information for a file.
        </description>
      </property>
      <property>
        <name>com.sas.lasr.service.allow.put</name>
        <value>true</value>
        <description>Flag indicating whether the PUT command is enabled when
        running as a service. The default is false.
        </description>
      </property>
      <property>
        <name>com.sas.lasr.hadoop.service.namenode.port</name>
        <value>15452</value>
      </property>
      <property>
        <name>com.sas.lasr.hadoop.service.datanode.port</name>
        <value>15453</value>
      </property>
    
    Note: You can change the ports for the SAS name node and data node plug-ins. This example shows the default ports (15452 and 15453, respectively).
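    If simcp is available, one way to propagate the edited file to all nodes (assuming Hadoop is installed at the same path on every machine) is:
    /opt/TKGrid/bin/simcp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $HADOOP_HOME/etc/hadoop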
  10. Add the plug-in JAR files to the Hadoop class path in $HADOOP_HOME/etc/hadoop/hadoop-env.sh and propagate the changes across all nodes in your Hadoop cluster:
    # Append each SAS plug-in JAR file to the Hadoop class path
    for f in $HADOOP_HOME/share/hadoop/sas/*.jar; do
      if [ "$HADOOP_CLASSPATH" ]; then
        export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
      else
        export HADOOP_CLASSPATH=$f
      fi
    done
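    After propagating the change, you can confirm that the JAR files appear on the class path by printing the effective class path with the hadoop classpath command. For example:
    hadoop classpath | tr ':' '\n' | grep sas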
  11. Restart the HDFS service and any dependencies.
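    On a plain Apache Hadoop installation that is managed with the bundled scripts (an assumption; clusters managed by other tooling restart HDFS through that tooling), the restart might look like this:
    # stop and restart the HDFS daemons (NameNode and DataNodes)
    $HADOOP_HOME/sbin/stop-dfs.sh
    $HADOOP_HOME/sbin/start-dfs.sh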
  12. If you are deploying SAS Visual Analytics, see Hadoop Configuration Step for SAS Visual Analytics.
Last updated: June 19, 2017