Modify the Existing Cloudera Hadoop Cluster, Version 5

If you want to co-locate the SAS High-Performance Analytics environment with an existing Cloudera Hadoop (CDH) 5 cluster, you can modify your cluster with files from the SAS Plug-ins for Hadoop package. CDH modified with SAS Plug-ins for Hadoop enables the SAS High-Performance Analytics environment to write SASHDAT file blocks evenly across the HDFS file system.
  1. Untar the SAS Plug-ins for Hadoop tarball, and propagate five files (identified in the following steps) to every machine in your CDH cluster:
    1. Navigate to the SAS Plug-ins for Hadoop tarball in your SAS Software depot:
      cd depot-installation-location/standalone_installs/SAS_Plug-ins_for_Hadoop/1_0/Linux_for_x64/
    2. Copy hdatplugins.tar.gz to a temporary location where you have Write access.
    3. Untar hdatplugins.tar.gz:
      tar xzf hdatplugins.tar.gz
      The hdatplugins directory is created.
    4. Propagate the following three JAR files in hdatplugins into the CDH library path on every machine in the CDH cluster:
      • sas.lasr.jar
      • sas.lasr.hadoop.jar
      • sas.grid.provider.yarn.jar
      Tip
       If you have already installed the SAS High-Performance Computing Management Console or the SAS High-Performance Analytics environment, you can use the simcp command to copy each JAR file to every machine in the cluster with a single invocation. For example:
           /opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.jar /opt/cloudera/parcels/CDH-5.0.0-0.cdh5b1.p0.57/lib/hadoop/lib/
           /opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.hadoop.jar /opt/cloudera/parcels/CDH-5.0.0-0.cdh5b1.p0.57/lib/hadoop/lib/
           /opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.grid.provider.yarn.jar /opt/cloudera/parcels/CDH-5.0.0-0.cdh5b1.p0.57/lib/hadoop/lib/
      For more information, see Simultaneous Utilities Commands.
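       As an optional check, if the companion simsh utility from the same Simultaneous Utilities set is installed at your site, you can confirm that the JAR files reached every machine. This is a sketch only; the parcel path matches the example above and will differ at your site:
           /opt/TKGrid/bin/simsh ls /opt/cloudera/parcels/CDH-5.0.0-0.cdh5b1.p0.57/lib/hadoop/lib/ | grep sas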
    5. Propagate saslasrfd in hdatplugins into the CDH bin directory on every machine in the CDH cluster. For example:
      /opt/TKGrid/bin/simcp saslasrfd /opt/cloudera/parcels/CDH-5.0.0-0.cdh5b1.p0.57/lib/hadoop/bin/
    6. Propagate SAS_VERSION in hdatplugins to the $HADOOP_HOME directory on each machine in the CDH cluster.
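      For example, the following sketch assumes the same /tmp/hdatplugins staging directory and simcp utility shown above, and that $HADOOP_HOME is set in your shell (for the example parcel, /opt/cloudera/parcels/CDH-5.0.0-0.cdh5b1.p0.57/lib/hadoop):
      /opt/TKGrid/bin/simcp /tmp/hdatplugins/SAS_VERSION $HADOOP_HOME/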
  2. Log on to Cloudera Manager as an administrator.
  3. In dfs.namenode.plugins, add the following to the plug-in configuration for the NameNode:
    com.sas.lasr.hadoop.NameNodeService
  4. In dfs.datanode.plugins, add the following to the plug-in configuration for DataNodes:
    com.sas.lasr.hadoop.DataNodeService
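    For reference, after these settings are saved and the configuration is deployed, the generated hdfs-site.xml should contain entries equivalent to the following (the values are entered through the Cloudera Manager fields in steps 3 and 4, not edited by hand):
    <property>
    <name>dfs.namenode.plugins</name>
    <value>com.sas.lasr.hadoop.NameNodeService</value>
    </property>
    <property>
    <name>dfs.datanode.plugins</name>
    <value>com.sas.lasr.hadoop.DataNodeService</value>
    </property>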
  5. Add the following lines to the service-wide advanced configuration. These lines are placed in the HDFS Service Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml:
    <property>
    <name>com.sas.lasr.service.allow.put</name>
    <value>true</value>
    </property>
    <property>
    <name>com.sas.lasr.hadoop.service.namenode.port</name>
    <value>15452</value>
    </property>
    <property>
    <name>com.sas.lasr.hadoop.service.datanode.port</name>
    <value>15453</value>
    </property>
    <property>
    <name>dfs.namenode.fs-limits.min-block-size</name>
    <value>0</value>
    </property>
    
  6. Add the following property to the HDFS Client Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml under Advanced within the Gateway Default Group. Make sure that you change path-to-data-dir to the data directory location for your site (for example, <value>file:///dfs/dn</value>):
    <property>
    <name>com.sas.lasr.service.allow.put</name>
    <value>true</value>
    </property>
    <property>
    <name>com.sas.lasr.hadoop.service.namenode.port</name>
    <value>15452</value>
    </property>
    <property>
    <name>com.sas.lasr.hadoop.service.datanode.port</name>
    <value>15453</value>
    </property>
    <property>
    <name>dfs.namenode.fs-limits.min-block-size</name>
    <value>0</value>
    </property>
    
  7. Add the location of JAVA_HOME to the HDFS Client Environment Advanced Configuration Snippet for hadoop-env.sh (Safety Valve), located under Advanced in the Gateway Default Group. For example:
    JAVA_HOME=/usr/lib/java/jdk1.7.0_07
    Note: When Cloudera Manager prioritizes the HDFS client configuration, the client safety valve is used. When Cloudera Manager prioritizes anything else (such as YARN), the service safety valve is used. Therefore, updating both safety valves is the best practice. For more information, see the Cloudera documentation.
  8. Save your changes and deploy the client configuration to each host in the cluster.
  9. Restart the HDFS service and any dependencies in Cloudera Manager.
  10. Create the /test directory in HDFS and set its permissions so that you can test the cluster with SAS test jobs. You might need to set HADOOP_HOME first, and you must run the following commands as the user running HDFS (typically, hdfs).
    $HADOOP_HOME/bin/hadoop fs -mkdir /test
    
    $HADOOP_HOME/bin/hadoop fs -chmod 777 /test
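    If you are logged on as root rather than as the hdfs user, the following sketch shows one way to run the same commands, assuming sudo access to the hdfs account and that HADOOP_HOME is exported:
    sudo -u hdfs $HADOOP_HOME/bin/hadoop fs -mkdir /test
    sudo -u hdfs $HADOOP_HOME/bin/hadoop fs -chmod 777 /test
    $HADOOP_HOME/bin/hadoop fs -ls /    # confirm that /test exists with drwxrwxrwx permissions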
  11. If you are deploying SAS Visual Analytics, see Hadoop Configuration Step for SAS Visual Analytics.
Last updated: June 19, 2017