If you want to co-locate the SAS High-Performance Analytics environment with a pre-existing Pivotal HD (PHD) Hadoop cluster, you can modify your cluster with files from the SAS Plug-ins for Hadoop package. PHD modified with SAS Plug-ins for Hadoop enables the SAS High-Performance Analytics environment to write SASHDAT file blocks evenly across the HDFS file system.
- Log on to Pivotal Command Center (PCC) as gpadmin. (The default password is Gpadmin1.)
- Untar the SAS Plug-ins for Hadoop tarball, and propagate five files (identified in the following steps) to every machine in your PHD cluster:
- Navigate to the SAS Plug-ins for Hadoop tarball in your SAS Software depot:
cd depot-installation-location/standalone_installs/SAS_Plug-ins_for_Hadoop/1_0/Linux_for_x64/
- Copy hdatplugins.tar.gz to a temporary location where you have write access.
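For example, assuming the depot path shown above and /tmp as the temporary location:
cp depot-installation-location/standalone_installs/SAS_Plug-ins_for_Hadoop/1_0/Linux_for_x64/hdatplugins.tar.gz /tmp/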
- Untar hdatplugins.tar.gz:
tar xzf hdatplugins.tar.gz
The hdatplugins directory is created.
- Propagate the following three JAR files in hdatplugins into the library path on every machine in the PHD cluster:
- sas.lasr.jar
- sas.lasr.hadoop.jar
- sas.grid.provider.yarn.jar
Tip: If you have already installed the SAS High-Performance Computing Management Console or the SAS High-Performance Analytics environment, you can issue a single simcp command to propagate the JAR files across all machines in the cluster. For example:
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.jar /usr/lib/gphd/hadoop/lib/
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.hadoop.jar /usr/lib/gphd/hadoop/lib/
/opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.grid.provider.yarn.jar /usr/lib/gphd/hadoop/lib/
For more information, see Simultaneous Utilities Commands.
- Propagate saslasrfd in hdatplugins into the PHD bin directory on every machine in the PHD cluster. For example:
/opt/TKGrid/bin/simcp saslasrfd /usr/lib/gphd/hadoop/bin/
- Propagate SAS_VERSION in hdatplugins to the $HADOOP_HOME directory on each machine in the PHD cluster.
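For example, assuming that the SAS High-Performance Analytics environment is installed in /opt/TKGrid, that hdatplugins was untarred in /tmp, and that $HADOOP_HOME is /usr/lib/gphd/hadoop on your PHD machines:
/opt/TKGrid/bin/simcp /tmp/hdatplugins/SAS_VERSION /usr/lib/gphd/hadoop/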
- In the PCC, for YARN, make sure that the Resource Manager, History Server, and Node Managers have unique host names.
- In the PCC, make sure that the ZooKeeper Server has a unique host name.
- Add the following SAS properties to the HDFS configuration file, hdfs-site.xml:
<property>
  <name>dfs.namenode.plugins</name>
  <value>com.sas.lasr.hadoop.NameNodeService</value>
</property>
<property>
  <name>dfs.datanode.plugins</name>
  <value>com.sas.lasr.hadoop.DataNodeService</value>
</property>
<property>
  <name>com.sas.lasr.service.allow.put</name>
  <value>true</value>
</property>
<property>
  <name>com.sas.lasr.hadoop.service.namenode.port</name>
  <value>15452</value>
</property>
<property>
  <name>com.sas.lasr.hadoop.service.datanode.port</name>
  <value>15453</value>
</property>
<property>
  <name>dfs.namenode.fs-limits.min-block-size</name>
  <value>0</value>
</property>
- Restart your cluster by using PCC, and verify in the dashboard that HDFS is running.
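If you also want a command-line check, one option (assuming the HDFS client commands are on your path and you kept the default SAS service ports shown above) is to confirm that HDFS responds and that the SAS name node service port is listening:
hdfs dfsadmin -report
netstat -an | grep 15452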