Updating Co-located Hadoop

Overview of Updating Co-located Hadoop

Follow the instructions appropriate for your co-located Hadoop distribution:

Updating Co-located Distributions of Hadoop

This topic affects the following supported Hadoop distributions that are co-located with the SAS High-Performance Analytics environment:
  • Apache Hadoop (version 2.7 and later)
  • Cloudera Hadoop
  • Hortonworks Data Platform Hadoop
  • IBM BigInsights Hadoop
  • MapR Hadoop
  • Pivotal HD Hadoop
Note: If you are using SAS High-Performance Deployment for Hadoop, see Updating SAS High-Performance Deployment for Hadoop (Prior to Version 3.0).
Before upgrading your Hadoop distribution, remove or disable these properties to prevent HDFS start-up failures:
  • dfs.namenode.plugins=com.sas.lasr.hadoop.NameNodeService
  • dfs.datanode.plugins=com.sas.lasr.hadoop.DataNodeService
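On a cluster where you maintain the Hadoop configuration files by hand, these dfs.* properties typically appear in hdfs-site.xml. A minimal sketch of the entries to remove or comment out (on a cluster managed through Cloudera Manager or Ambari, remove the properties in the management console instead):
    <property>
      <name>dfs.namenode.plugins</name>
      <value>com.sas.lasr.hadoop.NameNodeService</value>
    </property>
    <property>
      <name>dfs.datanode.plugins</name>
      <value>com.sas.lasr.hadoop.DataNodeService</value>
    </property>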
After you have upgraded your co-located Hadoop, redo the steps in Modifying Co-located Hadoop that are appropriate for your Hadoop distribution. Performing these steps ensures that the SAS executable file and SAS JAR files are placed in the correct upgraded Hadoop paths and that the configuration properties that enable them are in place.

Updating SAS High-Performance Deployment for Hadoop (Prior to Version 3.0)

Which Version of SAS High-Performance Deployment for Hadoop Have I Deployed?

SAS has discontinued SAS High-Performance Deployment for Hadoop.
If you are running a version of SAS High-Performance Deployment for Hadoop prior to version 3.0, then you must update your Hadoop distribution with files from the SAS Plug-ins for Hadoop package.
To verify the version of SAS High-Performance Deployment for Hadoop that you are running, run the following command:
more $HADOOP_HOME/SAS_VERSION
If the output matches the following, then you do not have to upgrade:
SAS Hadoop Extensions Version 3.0
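If the SAS High-Performance Analytics environment is already installed, you can run the check on every machine at once with the simsh utility. A minimal sketch, assuming that $HADOOP_HOME resolves to the same path on every machine in the cluster:
    /HPA-environment-installation-directory/bin/simsh cat $HADOOP_HOME/SAS_VERSION
For more information, see Simultaneous Utilities Commands.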

Overview of Updating SAS High-Performance Deployment for Hadoop

Updating a version of SAS High-Performance Deployment for Hadoop prior to 3.0 consists of the following steps, which are described in Update SAS High-Performance Deployment for Hadoop:
  1. Copy the SAS Plug-ins for Hadoop package to a temporary location and untar it.
  2. Propagate several files from the package to specific Hadoop directories on every machine in your cluster.
  3. Restart the HDFS service and any dependencies.

Preparing to Update SAS High-Performance Deployment for Hadoop

Before starting the SAS High-Performance Deployment for Hadoop update, perform the following steps:
  1. If one does not already exist, create a SAS Software Depot that contains the installation software that you will use to update Hadoop.
    For more information, see Creating a SAS Software Depot in SAS Intelligence Platform: Installation and Configuration Guide.
  2. Log on to the Hadoop NameNode as the hdfs user.
  3. Run the following command to make sure that the Hadoop file system is healthy: hadoop fsck /
    A healthy file system ends its report with the line The filesystem under path '/' is HEALTHY. Correct any issues before proceeding.
  4. Stop any other processes, such as YARN, running on the Hadoop cluster.
    Confirm that all processes have stopped across all the cluster machines. (You might have to become another user to have the necessary privileges to stop all processes.)
  5. As the hdfs user, run the command $HADOOP_HOME/sbin/stop-dfs.sh to stop HDFS daemons, and confirm that all processes have ceased on all the machines in the cluster.
    Tip
    Check that there are no Java processes owned by the hdfs user account still running on any machine: ps -ef | grep hdfs. If you find any, terminate them. You can issue a single simsh command to check all the machines in the cluster simultaneously: /HPA-environment-installation-directory/bin/simsh ps -ef | grep hdfs.
  6. Back up the Hadoop name directory (hadoop-name by default).
    Perform a file system backup using tar or whatever tool or process your site uses for backups; a minimal tar sketch follows this list.
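A minimal backup sketch for step 6 using tar, assuming a hypothetical name directory of /hadoop/hadoop-name and a backup location of /backup with sufficient free space:
    cd /hadoop
    tar czf /backup/hadoop-name-backup.tar.gz hadoop-name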

Update SAS High-Performance Deployment for Hadoop

To update versions prior to 3.0 of SAS High-Performance Deployment for Hadoop, follow these steps:
  1. If your version of SAS High-Performance Deployment for Hadoop is version 3.0, then you do not have to update Hadoop. For more information, see Which Version of SAS High-Performance Deployment for Hadoop Have I Deployed?.
  2. Make sure that you have completed all the steps in Preparing to Update SAS High-Performance Deployment for Hadoop.
  3. Log on to the Hadoop NameNode machine (blade 0) as root.
    The software that is needed for SAS Plug-ins for Hadoop is available from within the SAS Software Depot that was created by your site’s depot administrator:
    depot-installation-location/standalone_installs/SAS_Plug-ins_for_Hadoop/1_0/Linux_for_x64/hdatplugins.tar.gz
  4. Copy the hdatplugins.tar.gz file to a temporary location and extract it:
    cp hdatplugins.tar.gz /tmp
    cd /tmp
    tar xzf hdatplugins.tar.gz
    A directory that is named hdatplugins is created.
  5. Propagate the following three JAR files in hdatplugins to the $HADOOP_HOME/share/hadoop/sas directory on each machine in the Hadoop cluster:
    • sas.lasr.jar
    • sas.lasr.hadoop.jar
    • sas.grid.provider.yarn.jar
    CAUTION:
    If your distribution of SAS High-Performance Deployment for Hadoop is based on Apache Hadoop version 0.23, then you must propagate these JAR files to $HADOOP_HOME/share/hadoop/hdfs/lib instead.
    Tip
    If you have already installed the SAS High-Performance Computing Management Console or the SAS High-Performance Analytics environment, you can issue a single simcp command to propagate JAR files across all machines in the cluster. For example:
     /opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.jar $HADOOP_HOME/share/hadoop/sas
     /opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.lasr.hadoop.jar $HADOOP_HOME/share/hadoop/sas
     /opt/TKGrid/bin/simcp /tmp/hdatplugins/sas.grid.provider.yarn.jar $HADOOP_HOME/share/hadoop/sas
    For more information, see Simultaneous Utilities Commands.
  6. Propagate saslasrfd in hdatplugins to the $HADOOP_HOME/share/hadoop/sas/bin directory on each machine in the Hadoop cluster.
  7. Propagate SAS_VERSION in hdatplugins to the $HADOOP_HOME/share/hadoop/sas directory on each machine in the Hadoop cluster. As with the JAR files, you can use simcp for these two files; see the sketch after these steps.
  8. Restart Hadoop by entering the following command (a quick verification sketch follows these steps):
    $HADOOP_HOME/sbin/start-dfs.sh
  9. If you are deploying SAS Visual Analytics, see Hadoop Configuration Step for SAS Visual Analytics.
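For steps 6 and 7, the same simcp approach shown in the earlier tip also works. A minimal sketch, assuming that /opt/TKGrid is your SAS High-Performance Analytics environment installation directory and that the target directories exist on every machine:
    /opt/TKGrid/bin/simcp /tmp/hdatplugins/saslasrfd $HADOOP_HOME/share/hadoop/sas/bin
    /opt/TKGrid/bin/simcp /tmp/hdatplugins/SAS_VERSION $HADOOP_HOME/share/hadoop/sas
After the restart in step 8, you can confirm that the HDFS daemons came back up before continuing. A sketch using standard JDK and Hadoop tools (jps lists Java processes; dfsadmin reports cluster status):
    jps | grep -E 'NameNode|DataNode'
    hdfs dfsadmin -report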
Last updated: June 19, 2017