If you want to co-locate the SAS High-Performance Analytics environment with a pre-existing Apache Hadoop cluster, you can modify your cluster with files from the SAS Plug-ins for Hadoop package. Apache Hadoop modified with SAS Plug-ins for Hadoop enables the SAS High-Performance Analytics environment to write SASHDAT file blocks evenly across the HDFS file system.
To modify release 2.7 or later of Apache Hadoop with SAS Plug-ins for Hadoop, follow these steps:
- Log on to the Hadoop NameNode machine (blade 0) as root.
- Create a directory, sas, under $HADOOP_HOME/share/hadoop.
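For example, a minimal sketch of this step, assuming $HADOOP_HOME is set in the root session:
# Create the target directory for the SAS plug-in JAR files
mkdir -p $HADOOP_HOME/share/hadoop/sas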
- The software that is needed for SAS Plug-ins for Hadoop is available from within the SAS Software Depot that was created by your site’s depot administrator:
depot-installation-location/standalone_installs/SAS_Plug-ins_for_Hadoop/1_0/Linux_for_x64/hdatplugins.tar.gz
- Copy the hdatplugins.tar.gz file to a temporary location and extract it:
cp hdatplugins.tar.gz /tmp
cd /tmp
tar xzf hdatplugins.tar.gz
A directory that is named hdatplugins is created.
- Propagate the following three JAR files in hdatplugins to the $HADOOP_HOME/share/hadoop/sas directory on each machine in the Apache Hadoop cluster:
- sas.lasr.jar
- sas.lasr.hadoop.jar
- sas.grid.provider.yarn.jar
Tip: If you have already installed the SAS High-Performance Computing Management Console or the SAS High-Performance Analytics environment, you can issue a single simcp command to propagate JAR files across all machines in the cluster. For example:
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.jar $HADOOP_HOME/share/hadoop/sas
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.hadoop.jar $HADOOP_HOME/share/hadoop/sas
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.grid.provider.yarn.jar $HADOOP_HOME/share/hadoop/sas
For more information, see Simultaneous Utilities Commands.
- Propagate saslasrfd in hdatplugins to the $HADOOP_HOME/bin directory on each machine in the Apache Hadoop cluster.
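If the SAS simultaneous utilities are available (as in the tip for the previous step), a one-line sketch:
/opt/TKGrid/bin/simcp /tmp/hdatplugins/saslasrfd $HADOOP_HOME/bin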
- Propagate SAS_VERSION in hdatplugins to the $HADOOP_HOME/ directory on each machine in the Apache Hadoop cluster.
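Similarly, a hedged simcp sketch for propagating SAS_VERSION:
/opt/TKGrid/bin/simcp /tmp/hdatplugins/SAS_VERSION $HADOOP_HOME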
- Define the following properties in $HADOOP_HOME/etc/hadoop/hdfs-site.xml and propagate the changes across all nodes in your Hadoop cluster:
<property>
<name>dfs.namenode.plugins</name>
<value>com.sas.lasr.hadoop.NameNodeService</value>
</property>
<property>
<name>dfs.datanode.plugins</name>
<value>com.sas.lasr.hadoop.DataNodeService</value>
</property>
<property>
<name>com.sas.lasr.hadoop.fileinfo</name>
<value>ls -l {0}</value>
<description>The command used to get the user, group, and permission
information for a file.
</description>
</property>
<property>
<name>com.sas.lasr.service.allow.put</name>
<value>true</value>
<description>Flag indicating whether the PUT command is enabled when
running as a service. The default is false.
</description>
</property>
<property>
<name>com.sas.lasr.hadoop.service.namenode.port</name>
<value>15452</value>
</property>
<property>
<name>com.sas.lasr.hadoop.service.datanode.port</name>
<value>15453</value>
</property>
Note: You can change the port for
the SAS name node and data node plug-ins. This example shows the default
ports (15452 and 15453, respectively).
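One hedged way to propagate the edited file, again assuming the simcp utility from the earlier tip, is:
/opt/TKGrid/bin/simcp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $HADOOP_HOME/etc/hadoop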
-
Add the plug-in JAR files to the Hadoop class path in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
and
propagate the changes across all nodes in your Hadoop cluster:
for f in $HADOOP_HOME/share/hadoop/sas/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done
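As with hdfs-site.xml, a hedged sketch for propagating the edited hadoop-env.sh with simcp:
/opt/TKGrid/bin/simcp $HADOOP_HOME/etc/hadoop/hadoop-env.sh $HADOOP_HOME/etc/hadoop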
- Restart the HDFS service and any dependencies.
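For a stock Apache Hadoop 2.7 deployment that is managed with the bundled control scripts, a minimal sketch (your site might use a different service manager):
# Stop and then start HDFS so that the NameNode and DataNode plug-ins are loaded
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/start-dfs.sh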