If you want to co-locate the SAS High-Performance Analytics environment with a pre-existing Cloudera 4 Hadoop (CDH) cluster, you can modify your cluster with files from the SAS Plug-ins for Hadoop package. CDH modified with SAS Plug-ins for Hadoop enables the SAS High-Performance Analytics environment to write SASHDAT file blocks evenly across the HDFS file system.
Tip
In Cloudera 4.2 and earlier, you must install the enterprise license, even if your Hadoop cluster is below the stated 50-node limit at which a license is required.
-
Untar the SAS Plug-ins for Hadoop tarball, and propagate five files (identified in the following steps) to every machine in your CDH cluster:
-
Navigate to the SAS Plug-ins for Hadoop tarball in your SAS Software depot:
cd depot-installation-location/standalone_installs/SAS_Plug-ins_for_Hadoop/1_0/Linux_for_x64/
-
Copy hdatplugins.tar.gz to a temporary location where you have write access.
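For example, assuming that you use /tmp as the temporary location (the same location that the simcp examples later in this topic use):
cp hdatplugins.tar.gz /tmp/
cd /tmp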
-
Untar hdatplugins.tar.gz:
tar xzf hdatplugins.tar.gz
The hdatplugins directory is created.
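You can list the directory to confirm that all five files are present (sas.lasr.jar, sas.lasr.hadoop.jar, sas.grid.provider.yarn.jar, saslasrfd, and SAS_VERSION):
ls hdatplugins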
-
Propagate the following three JAR files in hdatplugins into the CDH library path on every machine in the CDH cluster:
-
sas.lasr.jar
-
sas.lasr.hadoop.jar
-
sas.grid.provider.yarn.jar
Tip
If you have already installed the SAS High-Performance Computing Management Console or the SAS High-Performance Analytics environment, you can use a single simcp command per file to propagate the JAR files across all machines in the cluster. For example:
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.jar /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop/lib/
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.hadoop.jar /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop/lib/
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.grid.provider.yarn.jar /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop/lib/
For more information, see Simultaneous Utilities Commands.
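If simcp is not available, a plain scp loop achieves the same result. This is a minimal sketch, assuming a hypothetical hosts.txt file that lists one cluster machine per line and passwordless SSH access to each machine:
for host in $(cat hosts.txt); do
  scp /tmp/hdatplugins/*.jar "$host":/opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop/lib/
done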
-
Propagate saslasrfd in hdatplugins into the CDH bin directory on every machine in the CDH cluster. For example:
/opt/TKGrid/bin/simcp /tmp/hdatplugins/saslasrfd /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop/bin/
-
Propagate SAS_VERSION in hdatplugins to the $HADOOP_HOME directory on each machine in the CDH cluster.
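For example, with simcp and assuming that HADOOP_HOME resolves to the lib/hadoop directory of the parcel used in the earlier examples:
/opt/TKGrid/bin/simcp /tmp/hdatplugins/SAS_VERSION /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop/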
-
Log on to Cloudera Manager as an administrator.
-
Add the following to the plug-in configuration for the NameNode:
com.sas.lasr.hadoop.NameNodeService
-
Add the following to the plug-in configuration for DataNodes:
com.sas.lasr.hadoop.DataNodeService
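These class names register with the standard Hadoop plug-in mechanism. As a sketch, if you were to manage hdfs-site.xml directly instead of through Cloudera Manager, the equivalent entries would be:
<property>
<name>dfs.namenode.plugins</name>
<value>com.sas.lasr.hadoop.NameNodeService</value>
</property>
<property>
<name>dfs.datanode.plugins</name>
<value>com.sas.lasr.hadoop.DataNodeService</value>
</property>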
-
Add the following lines to the service-wide advanced configuration. These lines are placed in the HDFS Service Configuration Safety Valve property for hdfs-site.xml:
<property>
<name>com.sas.lasr.service.allow.put</name>
<value>true</value>
</property>
<property>
<name>com.sas.lasr.hadoop.service.namenode.port</name>
<value>15452</value>
</property>
<property>
<name>com.sas.lasr.hadoop.service.datanode.port</name>
<value>15453</value>
</property>
<property>
<name>dfs.namenode.fs-limits.min-block-size</name>
<value>0</value>
</property>
-
Restart all Cloudera Manager services.
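After the restart, you can confirm that the SAS plug-in services started by checking for listeners on the ports configured above. For example, on the NameNode machine:
netstat -an | grep 15452
On each DataNode machine, check port 15453 the same way.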
-
Add the following to the HDFS Client Configuration Safety Valve:
<property>
<name>com.sas.lasr.service.allow.put</name>
<value>true</value>
</property>
<property>
<name>com.sas.lasr.hadoop.service.namenode.port</name>
<value>15452</value>
</property>
<property>
<name>com.sas.lasr.hadoop.service.datanode.port</name>
<value>15453</value>
</property>
<property>
<name>dfs.namenode.fs-limits.min-block-size</name>
<value>0</value>
</property>
-
Add the location of JAVA_HOME to the Client Environment Safety Valve for hadoop-env.sh. For example:
JAVA_HOME=/usr/lib/java/jdk1.7.0_07
-
Save your changes and deploy the client configuration to each host in the cluster.
Tip
Remember the value of HADOOP_HOME, because the SAS High-Performance Analytics environment installer prompts for it during installation. By default, these are the values for Cloudera:
-
Cloudera 4.5:
/opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30
-
Cloudera 4.2 and earlier:
/opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop
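If you are not sure which parcel is active on your cluster, listing the parcels directory on a cluster machine shows the exact directory name to use:
ls /opt/cloudera/parcels/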