If you want to co-locate the SAS High-Performance Analytics environment with a pre-existing Apache Hadoop cluster, you can modify your cluster with files from the SAS Plug-ins for Hadoop package. Apache Hadoop modified with SAS Plug-ins for Hadoop enables the SAS High-Performance Analytics environment to write SASHDAT file blocks evenly across the HDFS file system.
To modify release 2.7 or later of Apache Hadoop with SAS Plug-ins for Hadoop, follow these steps:
- Log on to the Hadoop NameNode machine (blade 0) as root.
- Create a directory, sas, under $HADOOP_HOME/share/hadoop.
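For example, a minimal sketch of this step, assuming $HADOOP_HOME is set in the root session:
# Create the target directory for the SAS plug-in JAR files
mkdir -p $HADOOP_HOME/share/hadoop/sas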
- The software that is needed for SAS Plug-ins for Hadoop is available from within the SAS Software Depot that was created by your site’s depot administrator:
depot-installation-location/standalone_installs/SAS_Plug-ins_for_Hadoop/1_0/Linux_for_x64/hdatplugins.tar.gz
- Copy the hdatplugins.tar.gz file to a temporary location and extract it:
cp hdatplugins.tar.gz /tmp
cd /tmp
tar xzf hdatplugins.tar.gz
A directory that is named hdatplugins is created.
- Propagate the following three JAR files in hdatplugins to the $HADOOP_HOME/share/hadoop/sas directory on each machine in the Apache Hadoop cluster:
- sas.lasr.jar
- sas.lasr.hadoop.jar
- sas.grid.provider.yarn.jar
Tip: If you have already installed the SAS High-Performance Computing Management Console or the SAS High-Performance Analytics environment, you can issue a single simcp command to propagate JAR files across all machines in the cluster. For example:
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.jar $HADOOP_HOME/share/hadoop/sas
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.hadoop.jar $HADOOP_HOME/share/hadoop/sas
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.grid.provider.yarn.jar $HADOOP_HOME/share/hadoop/sas
For more information, see Simultaneous Utilities Commands.
- Propagate saslasrfd in hdatplugins to the $HADOOP_HOME/bin directory on each machine in the Apache Hadoop cluster.
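If the SAS simultaneous utilities are available (as in the tip for the previous step), a one-line sketch:
/opt/TKGrid/bin/simcp /tmp/hdatplugins/saslasrfd $HADOOP_HOME/bin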
- Propagate SAS_VERSION in hdatplugins to the $HADOOP_HOME/ directory on each machine in the Apache Hadoop cluster.
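Similarly, a hedged simcp sketch for propagating SAS_VERSION:
/opt/TKGrid/bin/simcp /tmp/hdatplugins/SAS_VERSION $HADOOP_HOME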
- Define the following properties in $HADOOP_HOME/etc/hadoop/hdfs-site.xml and propagate the changes across all nodes in your Hadoop cluster:
<property>
<name>dfs.namenode.plugins</name>
<value>com.sas.lasr.hadoop.NameNodeService</value>
</property>
<property>
<name>dfs.datanode.plugins</name>
<value>com.sas.lasr.hadoop.DataNodeService</value>
</property>
<property>
<name>com.sas.lasr.hadoop.fileinfo</name>
<value>ls -l {0}</value>
<description>The command used to get the user, group, and permission
information for a file.
</description>
</property>
<property>
<name>com.sas.lasr.service.allow.put</name>
<value>true</value>
<description>Flag indicating whether the PUT command is enabled when
running as a service. The default is false.
</description>
</property>
<property>
<name>com.sas.lasr.hadoop.service.namenode.port</name>
<value>15452</value>
</property>
<property>
<name>com.sas.lasr.hadoop.service.datanode.port</name>
<value>15453</value>
</property>
Note: You can change the port for
the SAS name node and data node plug-ins. This example shows the default
ports (15452 and 15453, respectively).
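One hedged way to propagate the edited file, again assuming the simcp utility from the earlier tip, is:
/opt/TKGrid/bin/simcp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $HADOOP_HOME/etc/hadoop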
-
Add the plug-in JAR files to the Hadoop class path in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
and
propagate the changes across all nodes in your Hadoop cluster:
for f in $HADOOP_HOME/share/hadoop/sas/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done
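As with hdfs-site.xml, a hedged sketch for propagating the edited hadoop-env.sh with simcp:
/opt/TKGrid/bin/simcp $HADOOP_HOME/etc/hadoop/hadoop-env.sh $HADOOP_HOME/etc/hadoop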
- Restart the HDFS service and any dependencies.
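For a stock Apache Hadoop 2.7 deployment that is managed with the bundled control scripts, a minimal sketch (your site might use a different service manager):
# Stop and then start HDFS so that the NameNode and DataNode plug-ins are loaded
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh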