Modify the Existing Cloudera Hadoop Cluster, Version 4

If you want to co-locate the SAS High-Performance Analytics environment with an existing Cloudera Hadoop version 4 (CDH 4) cluster, you can modify the cluster with files from the SAS Plug-ins for Hadoop package. A CDH cluster modified with SAS Plug-ins for Hadoop enables the SAS High-Performance Analytics environment to write SASHDAT file blocks evenly across the HDFS file system.
Tip
In Cloudera 4.2 and earlier, you must install the enterprise license even if your Hadoop cluster is below the stated 50-node threshold at which a license is otherwise required.
  1. Untar the SAS Plug-ins for Hadoop tarball, and propagate five files (identified in the following steps) to every machine in your CDH cluster:
    1. Navigate to the SAS Plug-ins for Hadoop tarball in your SAS Software depot:
      cd depot-installation-location/standalone_installs/SAS_Plug-ins_for_Hadoop/1_0/Linux_for_x64/
    2. Copy hdatplugins.tar.gz to a temporary location where you have write access.
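      For example, assuming /tmp as the temporary location:
      cp hdatplugins.tar.gz /tmp
      cd /tmp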
    3. Untar hdatplugins.tar.gz:
      tar xzf hdatplugins.tar.gz
      The hdatplugins directory is created.
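      To verify the extraction, you can list the directory; it should include the five files that the following steps propagate (sas.lasr.jar, sas.lasr.hadoop.jar, sas.grid.provider.yarn.jar, saslasrfd, and SAS_VERSION):
      ls hdatplugins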
    4. Propagate the following three JAR files in hdatplugins into the CDH library path on every machine in the CDH cluster:
      • sas.lasr.jar
      • sas.lasr.hadoop.jar
      • sas.grid.provider.yarn.jar
      Tip
      If you have already installed the SAS High-Performance Computing Management Console or the SAS High-Performance Analytics environment, you can use the simcp command to propagate each JAR file across all machines in the cluster with a single command. For example:
           /opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.jar \
             /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop/lib/
           /opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.hadoop.jar \
             /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop/lib/
           /opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.grid.provider.yarn.jar \
             /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop/lib/
      For more information, see Simultaneous Utilities Commands.
    5. Propagate saslasrfd in hdatplugins into the CDH bin directory on every machine in the CDH cluster. For example:
      /opt/TKGrid/bin/simcp saslasrfd \
        /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop/bin/
  2. Propagate SAS_VERSION in hdatplugins to the $HADOOP_HOME directory on each machine in the CDH cluster.
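    For example, with simcp and the CDH 4.4 parcel layout used in the earlier examples (where the parcel's lib/hadoop directory serves as $HADOOP_HOME):
    /opt/TKGrid/bin/simcp /tmp/hdatplugins/SAS_VERSION \
      /opt/cloudera/parcels/CDH-4.4.0-1.cdh4.4.0.p0.39/lib/hadoop/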
  3. Log on to the Cloudera Manager as an administrator.
  4. Add the following to the plug-in configuration for the NameNode:
    com.sas.lasr.hadoop.NameNodeService
  5. Add the following to the plug-in configuration for DataNodes:
    com.sas.lasr.hadoop.DataNodeService
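    If you manage hdfs-site.xml directly rather than through Cloudera Manager, these plug-in entries correspond to the standard Hadoop dfs.namenode.plugins and dfs.datanode.plugins properties. A minimal sketch of the equivalent XML:
    <property>
    <name>dfs.namenode.plugins</name>
    <value>com.sas.lasr.hadoop.NameNodeService</value>
    </property>
    <property>
    <name>dfs.datanode.plugins</name>
    <value>com.sas.lasr.hadoop.DataNodeService</value>
    </property>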
  6. Add the following lines to the service-wide advanced configuration. These lines go in the HDFS Service Configuration Safety Valve property for hdfs-site.xml. They allow the SAS services to store data, set the ports for the SAS NameNode and DataNode services, and remove the HDFS minimum block-size limit so that SASHDAT blocks of any size can be written:
    <property>
    <name>com.sas.lasr.service.allow.put</name>
    <value>true</value>
    </property>
    <property>
    <name>com.sas.lasr.hadoop.service.namenode.port</name>
    <value>15452</value>
    </property>
    <property>
    <name>com.sas.lasr.hadoop.service.datanode.port</name>
    <value>15453</value>
    </property>
    <property>
    <name>dfs.namenode.fs-limits.min-block-size</name>
    <value>0</value>
    </property>
  7. Restart all Cloudera Manager services.
  8. Add the following to the HDFS Client Configuration Safety Valve for hdfs-site.xml:
    <property>
    <name>com.sas.lasr.service.allow.put</name>
    <value>true</value>
    </property>
    <property>
    <name>com.sas.lasr.hadoop.service.namenode.port</name>
    <value>15452</value>
    </property>
    <property>
    <name>com.sas.lasr.hadoop.service.datanode.port</name>
    <value>15453</value>
    </property>
    <property>
    <name>dfs.namenode.fs-limits.min-block-size</name>
    <value>0</value>
    </property>
  9. Add the location of JAVA_HOME to the Client Environment Safety Valve for hadoop-env.sh. For example:
    JAVA_HOME=/usr/lib/java/jdk1.7.0_07
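    If you are not sure of the Java location on a host, one way to find it is to resolve the java binary and strip the trailing /bin/java from the result:
    readlink -f "$(which java)"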
  10. Save your changes and deploy the client configuration to each host in the cluster.
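    To spot-check a host after deployment, you can look for the SAS properties in the deployed client configuration (assuming the standard /etc/hadoop/conf client configuration path):
    grep -A 1 "com.sas.lasr" /etc/hadoop/conf/hdfs-site.xml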
  11. If you are deploying SAS Visual Analytics, see Hadoop Configuration Step for SAS Visual Analytics.
Tip
Remember the value of HADOOP_HOME; the SAS High-Performance Analytics environment installer prompts for it during installation. By default, these are the values for Cloudera:
  • Cloudera 4.5:
    /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hadoop
  • Cloudera 4.2 and earlier:
    /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/hadoop
Last updated: June 19, 2017