If you want to co-locate the SAS High-Performance Analytics environment with a pre-existing Cloudera 5 Hadoop (CDH) cluster, you can modify your cluster with files from the SAS Plug-ins for Hadoop package. CDH modified with SAS Plug-ins for Hadoop enables the SAS High-Performance Analytics environment to write SASHDAT file blocks evenly across the HDFS file system.
- Untar the SAS Plug-ins for Hadoop tarball, and propagate five files (identified in the following steps) to every machine in your CDH cluster:
- Navigate to the directory that contains the SAS Plug-ins for Hadoop tarball in your SAS Software Depot:
cd depot-installation-location/standalone_installs/SAS_Plug-ins_for_Hadoop/1_0/Linux_for_x64/
- Copy hdatplugins.tar.gz to a temporary location where you have Write access.
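For example, to stage the tarball in /tmp (a sketch only; /tmp is the temporary location that the simcp examples later in this list assume):
cp hdatplugins.tar.gz /tmp/
cd /tmp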
- Untar hdatplugins.tar.gz:
tar xzf hdatplugins.tar.gz
The hdatplugins directory is created.
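If you want to confirm the package contents before propagating them (an optional check, assuming /tmp was used as the temporary location):
ls /tmp/hdatplugins
The listing should include the five files referenced in the steps that follow: sas.lasr.jar, sas.lasr.hadoop.jar, sas.grid.provider.yarn.jar, saslasrfd, and SAS_VERSION.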
- Propagate the following three JAR files in hdatplugins into the CDH library path on every machine in the CDH cluster:
- sas.lasr.jar
- sas.lasr.hadoop.jar
- sas.grid.provider.yarn.jar
Tip: If you have already installed the SAS High-Performance Computing Management Console or the SAS High-Performance Analytics environment, you can issue a single simcp command to propagate each JAR file across all machines in the cluster. For example:
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.jar /opt/cloudera/parcels/CDH-5.0.0-0.cdh5b1.p0.57/lib/hadoop/lib/
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.hadoop.jar /opt/cloudera/parcels/CDH-5.0.0-0.cdh5b1.p0.57/lib/hadoop/lib/
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.grid.provider.yarn.jar /opt/cloudera/parcels/CDH-5.0.0-0.cdh5b1.p0.57/lib/hadoop/lib/
For more information, see Simultaneous Utilities Commands.
- Propagate saslasrfd in hdatplugins into the CDH bin directory on every machine in the CDH cluster. For example:
/opt/TKGrid/bin/simcp saslasrfd /opt/cloudera/parcels/CDH-5.0.0-0.cdh5b1.p0.57/lib/hadoop/bin/
- Propagate SAS_VERSION in hdatplugins to the $HADOOP_HOME directory on each machine in the CDH cluster.
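If you use simcp here as well, the command might look like the following (a sketch: it assumes the /tmp/hdatplugins staging location from the earlier examples and that the parcel's lib/hadoop directory is $HADOOP_HOME at your site):
/opt/TKGrid/bin/simcp /tmp/hdatplugins/SAS_VERSION /opt/cloudera/parcels/CDH-5.0.0-0.cdh5b1.p0.57/lib/hadoop/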
- Log on to Cloudera Manager as an administrator.
- In dfs.namenode.plugins, add the following to the plug-in configuration for the NameNode:
com.sas.lasr.hadoop.NameNodeService
- In dfs.datanode.plugins, add the following to the plug-in configuration for DataNodes:
com.sas.lasr.hadoop.DataNodeService
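Cloudera Manager renders these plug-in settings as ordinary entries in the cluster's hdfs-site.xml. As a rough sketch of the result (the generated file is managed by Cloudera Manager and may list additional classes, comma-separated):
<property>
  <name>dfs.namenode.plugins</name>
  <value>com.sas.lasr.hadoop.NameNodeService</value>
</property>
<property>
  <name>dfs.datanode.plugins</name>
  <value>com.sas.lasr.hadoop.DataNodeService</value>
</property>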
- Add the following lines to the service-wide advanced configuration. These lines are placed in the HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml:
<property>
  <name>com.sas.lasr.service.allow.put</name>
  <value>true</value>
</property>
<property>
  <name>com.sas.lasr.hadoop.service.namenode.port</name>
  <value>15452</value>
</property>
<property>
  <name>com.sas.lasr.hadoop.service.datanode.port</name>
  <value>15453</value>
</property>
<property>
  <name>dfs.namenode.fs-limits.min-block-size</name>
  <value>0</value>
</property>
- Add the following properties to the HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml, under Advanced within the Gateway Default Group. Make sure that you change path-to-data-dir to the data directory location for your site (for example, <value>file:///dfs/dn</value>):
<property>
  <name>com.sas.lasr.service.allow.put</name>
  <value>true</value>
</property>
<property>
  <name>com.sas.lasr.hadoop.service.namenode.port</name>
  <value>15452</value>
</property>
<property>
  <name>com.sas.lasr.hadoop.service.datanode.port</name>
  <value>15453</value>
</property>
<property>
  <name>dfs.namenode.fs-limits.min-block-size</name>
  <value>0</value>
</property>
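If your client snippet also carries the data directory setting that path-to-data-dir refers to, such an entry would look roughly like this (dfs.datanode.data.dir is assumed here as the typical CDH 5 property name; substitute the property and path that actually apply at your site):
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///dfs/dn</value>
</property>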
- Add the location of JAVA_HOME to the HDFS Client Environment Advanced Configuration Snippet for hadoop-env.sh (Safety Valve), located under Advanced in the Gateway Default Group. For example:
JAVA_HOME=/usr/lib/java/jdk1.7.0_07
Note: When Cloudera Manager prioritizes the HDFS client configuration, the client safety valve is used. When Cloudera Manager prioritizes anything else (such as YARN), the service safety valve is used. Therefore, updating both safety valves is the best practice. For more information, see the Cloudera documentation.
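To verify that the path you supply points to a working JDK, you can run the Java binary under it on any host (a simple check using the example path above; substitute your site's JDK location):
/usr/lib/java/jdk1.7.0_07/bin/java -version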
- Save your changes and deploy the client configuration to each host in the cluster.
- Restart the HDFS service and any dependencies in Cloudera Manager.
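After the restart, you can check for listeners on the ports configured earlier (a quick sanity check, assuming the plug-in services listen on the default ports shown above and that the ss utility is available; run the first command on the NameNode host and the second on a DataNode host):
ss -lnt | grep 15452
ss -lnt | grep 15453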
- Create the /test directory in HDFS and set its mode so that the cluster can be tested with SAS test jobs. You might need to set HADOOP_HOME first, and you must run the following commands as the user running HDFS (typically, hdfs):
$HADOOP_HOME/bin/hadoop fs -mkdir /test
$HADOOP_HOME/bin/hadoop fs -chmod 777 /test
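To confirm the result (an optional check; /test should appear in the listing with drwxrwxrwx permissions):
$HADOOP_HOME/bin/hadoop fs -ls /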