If you want to co-locate the SAS High-Performance Analytics environment with a pre-existing Pivotal HD (PHD) Hadoop cluster, you can modify your cluster with files from the SAS Plug-ins for Hadoop package. PHD modified with SAS Plug-ins for Hadoop enables the SAS High-Performance Analytics environment to write SASHDAT file blocks evenly across the HDFS file system.
- Log on to Pivotal Command Center (PCC) as gpadmin. (The default password is Gpadmin1.)
- Untar the SAS Plug-ins for Hadoop tarball, and propagate five files (identified in the following steps) to every machine in your PHD cluster:
- Navigate to the SAS Plug-ins for Hadoop tarball in your SAS Software depot:
cd depot-installation-location/standalone_installs/SAS_Plug-ins_for_Hadoop/1_0/Linux_for_x64/
- Copy hdatplugins.tar.gz to a temporary location where you have write access.
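For example, assuming the depot path shown above and /tmp as the temporary location:
cp depot-installation-location/standalone_installs/SAS_Plug-ins_for_Hadoop/1_0/Linux_for_x64/hdatplugins.tar.gz /tmp/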
- Untar hdatplugins.tar.gz:
tar xzf hdatplugins.tar.gz
The hdatplugins directory is created.
- Propagate the following three JAR files in hdatplugins into the library path on every machine in the PHD cluster:
- sas.lasr.jar
- sas.lasr.hadoop.jar
- sas.grid.provider.yarn.jar
Tip: If you have already installed the SAS High-Performance Computing Management Console or the SAS High-Performance Analytics environment, you can issue a single simcp command to propagate the JAR files across all machines in the cluster. For example:
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.jar /usr/lib/gphd/hadoop/lib/
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.hadoop.jar /usr/lib/gphd/hadoop/lib/
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.grid.provider.yarn.jar /usr/lib/gphd/hadoop/lib/
For more information, see Simultaneous Utilities Commands.
- Propagate saslasrfd in hdatplugins into the PHD bin directory on every machine in the PHD cluster. For example:
/opt/TKGrid/bin/simcp saslasrfd /usr/lib/gphd/hadoop/bin/
- Propagate SAS_VERSION in hdatplugins to the $HADOOP_HOME directory on each machine in the PHD cluster.
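For example, assuming that the SAS High-Performance Analytics environment is installed in /opt/TKGrid, that hdatplugins was untarred in /tmp, and that $HADOOP_HOME is /usr/lib/gphd/hadoop on your PHD machines:
/opt/TKGrid/bin/simcp /tmp/hdatplugins/SAS_VERSION /usr/lib/gphd/hadoop/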
- In the PCC, for YARN, make sure that the Resource Manager, History Server, and Node Managers have unique host names.
- In the PCC, make sure that the ZooKeeper Server has a unique host name.
- Add the following SAS properties to the HDFS configuration file, hdfs-site.xml:
<property>
  <name>dfs.namenode.plugins</name>
  <value>com.sas.lasr.hadoop.NameNodeService</value>
</property>
<property>
  <name>dfs.datanode.plugins</name>
  <value>com.sas.lasr.hadoop.DataNodeService</value>
</property>
<property>
  <name>com.sas.lasr.service.allow.put</name>
  <value>true</value>
</property>
<property>
  <name>com.sas.lasr.hadoop.service.namenode.port</name>
  <value>15452</value>
</property>
<property>
  <name>com.sas.lasr.hadoop.service.datanode.port</name>
  <value>15453</value>
</property>
<property>
  <name>dfs.namenode.fs-limits.min-block-size</name>
  <value>0</value>
</property>
- Restart your cluster by using PCC, and verify in the dashboard that HDFS is running.
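If you also want a command-line check, one option (assuming the HDFS client commands are on your path and you kept the default SAS service ports shown above) is to confirm that HDFS responds and that the SAS name node service port is listening:
hdfs dfsadmin -report
netstat -an | grep 15452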