Hadoop Installation and Configuration

Hadoop Installation and Configuration Steps

  1. If you are upgrading from or reinstalling a previous release, follow the instructions in Upgrading from or Reinstalling a Previous Version before installing the in-database deployment package.
  2. Move the SAS Embedded Process and SAS Hadoop MapReduce JAR file install scripts to the Hadoop master node (the NameNode).
    Note: The location where you transfer the install scripts becomes the SAS Embedded Process home and is referred to as SASEPHome throughout this chapter.
    Note: Both the SAS Embedded Process install script and the SAS Hadoop MapReduce JAR file install script must be transferred to the SASEPHome directory.
  3. Install the SAS Embedded Process and the SAS Hadoop MapReduce JAR files.
Note: If you are installing the SAS High-Performance Analytics environment, you must perform additional steps after you install the SAS Embedded Process. For more information, see SAS High-Performance Analytics Infrastructure: Installation and Configuration Guide.

Upgrading from or Reinstalling a Previous Version

To upgrade from or reinstall a previous version, follow these steps.
  1. If you are upgrading from SAS 9.3, follow these steps. If you are upgrading from SAS 9.4, start with Step 2.
    1. Stop the Hadoop SAS Embedded Process.
      SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.35/bin/sasep-stop.all.sh
      SASEPHome is the SAS Embedded Process home directory on the master node where you installed the SAS Embedded Process.
    2. Delete the Hadoop SAS Embedded Process from all nodes.
      SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.35/bin/sasep-delete.all.sh
    3. Verify that all files named sas.hadoop.ep.distribution-name.jar have been deleted.
      The JAR files are located at HadoopHome/lib.
      For Cloudera, the JAR files are typically located here:
      /opt/cloudera/parcels/CDH/lib/hadoop/lib
      For Hortonworks, the JAR files are typically located here:
      /usr/lib/hadoop/lib
    4. Continue with Step 3.
  2. If you are upgrading from SAS 9.4, follow these steps.
    1. Stop the Hadoop SAS Embedded Process.
      SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.*/bin/sasep-servers.sh
         -stop -hostfile host-list-filename | -host "host-list"
      SASEPHome is the SAS Embedded Process home directory on the master node where you installed the SAS Embedded Process.
      For more information, see SASEP-SERVERS.SH Script.
    2. Remove the SAS Embedded Process from all nodes.
      SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.*/bin/sasep-servers.sh
         -remove -hostfile host-list-filename | -host "host-list"
         -mrhome dir
      Note: This step ensures that all old SAS Hadoop MapReduce JAR files are removed.
      For more information, see SASEP-SERVERS.SH Script.
    3. Verify that all files named sas.hadoop.ep.apache*.jar have been deleted.
      The JAR files are located at HadoopHome/lib.
      For Cloudera, the JAR files are typically located here:
      /opt/cloudera/parcels/CDH/lib/hadoop/lib
      For Hortonworks, the JAR files are typically located here:
      /usr/lib/hadoop/lib
      Note: If all the files have not been deleted, then you must delete them. Open-source utilities are available that can delete these files across multiple nodes.
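      For example, here is a minimal sketch of a cross-node cleanup over plain SSH. It assumes a nodes.txt file that lists one host per line and uses the Hortonworks path shown above (adjust the path for your distribution):
      while read node; do
          # -n keeps ssh from consuming the rest of the host list on stdin
          ssh -n "$node" 'rm -f /usr/lib/hadoop/lib/sas.hadoop.ep.apache*.jar'
      done < nodes.txt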
    4. Verify that all the SAS Embedded Process directories and files have been deleted on all nodes, except the node from which you are running the script. The sasep-servers.sh -remove script removes the files everywhere except on the node from which you ran the script.
      Note: If all the directories and files have not been deleted, then you must delete them. Open-source utilities are available that can delete these directories and files across multiple nodes.
      Manually remove the SAS Embedded Process directories and files on the node from which you ran the script.
      The sasep-servers.sh -remove script displays instructions that are similar to the following example:
      localhost WARN: Apparently, you are trying to uninstall SAS Embedded Process 
      for Hadoop from the local node.
      The binary files located at
      local_node/SAS/SASTKInDatabaseServerForHadoop and
      local_node/SAS/SASACCESStoHadoopMapReduceJARFiles will not be removed.
      localhost WARN: The init script will be removed from /etc/init.d and the 
      SAS Map Reduce JAR files will be removed from /usr/lib/hadoop-mapreduce/lib.
      localhost WARN: The binary files located at local_node/SAS 
      should be removed manually.
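      Based on the paths in that example output, here is a minimal sketch of the manual cleanup on the local node. The directory names are the defaults used throughout this chapter; confirm them on your system before deleting:
      rm -rf SASEPHome/SAS/SASTKInDatabaseServerForHadoop
      rm -rf SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles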
  3. Continue the installation process.

Moving the SAS Embedded Process and SAS Hadoop MapReduce JAR File Install Scripts

Creating the SAS Embedded Process Directory

Before you can install the SAS Embedded Process and the SAS Hadoop MapReduce JAR files, you must move the SAS Embedded Process and SAS Hadoop MapReduce JAR file install scripts to a directory on the Hadoop master node (the NameNode).
Create a new directory that is not part of an existing directory structure, such as /sasep.
This path is created on each node in the Hadoop cluster during the SAS Embedded Process installation. Do not use existing system directories such as /opt or /usr. This new directory becomes the SAS Embedded Process home and is referred to as SASEPHome throughout this chapter.
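For example, a minimal sketch that creates the /sasep directory used in the example above:
mkdir /sasep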

Moving the SAS Embedded Process Install Script

The SAS Embedded Process install script is contained in a self-extracting archive file named tkindbsrv-9.42-n_lax.sh, where n is a number that indicates the latest version of the file. If this is the initial installation, n has a value of 1. Each time you reinstall or upgrade, n is incremented by 1. The self-extracting archive file is located in the [SASHome]/SASTKInDatabaseServer/9.4/HadooponLinuxx64 directory.
Using a method of your choice, transfer the SAS Embedded Process install script to your Hadoop master node.
This example uses secure copy, and SASEPHome is the location where you want to install the SAS Embedded Process.
scp tkindbsrv-9.42-n_lax.sh username@hadoop:/SASEPHome
Note: The location where you transfer the install script becomes the SAS Embedded Process home.
Note: Both the SAS Embedded Process install script and the SAS Hadoop MapReduce JAR file install script must be transferred to the SASEPHome directory.

Moving the SAS Hadoop MapReduce JAR File Install Script

The SAS Hadoop MapReduce JAR file install script is contained in a self-extracting archive file named hadoopmrjars-9.42-n_lax.sh, where n is a number that indicates the latest version of the file. If this is the initial installation, n has a value of 1. Each time you reinstall or upgrade, n is incremented by 1. The self-extracting archive file is located in the [SASHome]/SASACCESStoHadoopMapReduceJARFiles/9.42 directory.
Using a method of your choice, transfer the SAS Hadoop MapReduce JAR file install script to your Hadoop master node.
This example uses secure copy, and SASEPHome is the location where you want to install the SAS Hadoop MapReduce JAR files.
scp hadoopmrjars-9.42-n_lax.sh username@hadoop:/SASEPHome
Note: Both the SAS Embedded Process install script and the SAS Hadoop MapReduce JAR file install script must be transferred to the SASEPHome directory.

Installing the SAS Embedded Process and SAS Hadoop MapReduce JAR Files

To install the SAS Embedded Process, follow these steps.
Note: Permissions are needed to install the SAS Embedded Process and SAS Hadoop MapReduce JAR files. For more information, see Hadoop Permissions.
  1. Log on to the server using SSH as a user who has sudo access, and then switch to the root user.
    ssh username@serverhostname
    sudo su - root
  2. On the Hadoop master node, change to the directory where you want the SAS Embedded Process installed.
    cd /SASEPHome
    SASEPHome is the same location to which you copied the self-extracting archive file. For more information, see Moving the SAS Embedded Process Install Script.
    Note: Before continuing with the next step, ensure that each self-extracting archive file has Execute permission.
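    For example, here is a minimal sketch that grants Execute permission to both archives, assuming they are in the current SASEPHome directory:
    chmod +x tkindbsrv-9.42-*_lax.sh hadoopmrjars-9.42-*_lax.sh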
  3. Use the following command to unpack the tkindbsrv-9.42-n_lax.sh file.
    ./tkindbsrv-9.42-n_lax.sh
    
    n is a number that indicates the latest version of the file. If this is the initial installation, n has a value of 1. Each time you reinstall or upgrade, n is incremented by 1.
    Note: If you unpack the file in the wrong directory, you can move the unpacked directory structure afterward.
    After this script is run and the files are unpacked, the script creates the following directory structure, where SASEPHome is the SAS Embedded Process home directory from Step 2.
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/misc
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/sasexe
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/utilities
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/build
    The content of the SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin directory should look similar to this.
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/sas.ep4hadoop.template
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/sasep-servers.sh
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/sasep-common.sh
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/sasep-server-start.sh
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/sasep-server-status.sh
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/sasep-server-stop.sh
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/InstallTKIndbsrv.sh
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/MANIFEST.MF
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/qkbpush.sh
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/sas.tools.qkb.hadoop.jar
  4. Use this command to unpack the SAS Hadoop MapReduce JAR files.
    ./hadoopmrjars-9.42-1_lax.sh
    After the script runs, it creates the following directory and unpacks these files into it.
    SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib/ep-config.xml
    SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib/sas.hadoop.ep.apache023.jar
    SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib/sas.hadoop.ep.apache023.nls.jar
    SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib/sas.hadoop.ep.apache121.jar
    SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib/sas.hadoop.ep.apache121.nls.jar
    SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib/sas.hadoop.ep.apache205.jar
    SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib/sas.hadoop.ep.apache205.nls.jar
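    To confirm the unpack, you can list the directory contents (a minimal check; the 9.42-1 version suffix assumes an initial installation):
    ls SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib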
  5. Use the sasep-servers.sh script with the -add option to deploy the SAS Embedded Process installation across all nodes. The SAS Embedded Process is installed as a Linux service.
    Note: If you are running on a cluster with Kerberos, complete both substeps 1 and 2. If you are not running with Kerberos, complete only substep 2.
    1. If you are running on a cluster with Kerberos, you must kinit the HDFS user.
      sudo su - root
      su - hdfs | hdfs-userid
      kinit -kt location-of-keytab-file user-for-which-you-are-requesting-a-ticket
      exit
      Here is an example:
      sudo su - root
      su - hdfs
      kinit -kt hdfs.keytab hdfs
      exit
      Note: The default HDFS user is hdfs. You can specify a different user ID with the -hdfsuser argument when you run the sasep-servers.sh -add command.
      Note: If you are running on a cluster with Kerberos, a keytab is required when running the sasep-servers.sh -add command.
      Note: You can run klist while you are running as an HDFS user to check the status of your Kerberos ticket on the server. Here is an example:
      klist
      Ticket cache: FILE:/tmp/krb5cc_493
      Default principal: hdfs@HOST.COMPANY.COM
      
      Valid starting    Expires           Service principal
      06/20/14 09:51:26 06/27/14 09:51:26 krbtgt/HOST.COMPANY.COM@HOST.COMPANY.COM
           renew until 06/22/14 09:51:26
    2. Run the sasep-servers.sh script. Review all of the information in this step before running the script.
      cd SASEPHOME/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin
      ./sasep-servers.sh -add
      Tip
      There are many options available when installing the SAS Embedded Process. We recommend that you review the script syntax before running it. For more information, see SASEP-SERVERS.SH Script.
    During the install process, the script asks whether you want to start the SAS Embedded Process. If you choose Y or y, the SAS Embedded Process is started on all nodes after the install is complete. If you choose N or n, you can start the SAS Embedded Process later by running the ./sasep-servers.sh -start command.
    Note: When you enter the sasep-servers.sh -add command, a user and group named sasep is created. You can specify a different user and group name with the -epuser and -epgroup arguments when you enter the sasep-servers.sh -add command.
    Note: The sasep-servers.sh script can be run from any location. You can also add its location to the PATH environment variable.
    Tip
    Although you can install the SAS Embedded Process in multiple locations, the best practice is to install only one instance. Only one version of the SASEP JAR files is installed in your HadoopHome/lib directory.
    Note: The SAS Embedded Process must be installed on all nodes that are capable of executing MapReduce 2 tasks. For MapReduce 2, these are the nodes where a NodeManager is running. Usually, every DataNode has a YARN NodeManager running. By default, the SAS Embedded Process install script (sasep-servers.sh) discovers the cluster topology and installs the SAS Embedded Process on all DataNode nodes, including the host node from which you run the script (the Hadoop master NameNode), even if a DataNode is not present on that host. If you want to limit the nodes on which the SAS Embedded Process is installed, run the sasep-servers.sh script with the -host <hosts> option, as in the sketch below.
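    For example, here is a minimal sketch that limits the installation to the hosts named in a file, assuming a plain-text host list with one host name per line:
    ./sasep-servers.sh -add -hostfile host-list-filename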
    Note: If you install the SAS Embedded Process on a large cluster, the SSHD daemon might reach the maximum number of concurrent connections. The ssh_exchange_identification: Connection closed by remote host SSHD error might occur. To work around the problem, edit the /etc/ssh/sshd_config file, change the MaxStartups option to the number that accommodates your cluster, and save the file. Then, reload the SSHD daemon by running the /etc/init.d/sshd reload command.
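    Here is a minimal sketch of that workaround. The value 100 is only an example; choose a number that accommodates your cluster:
    # In /etc/ssh/sshd_config, raise the concurrent connection limit:
    MaxStartups 100
    # Then reload the SSHD daemon:
    /etc/init.d/sshd reload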
  6. Verify that the SAS Embedded Process is installed and running. Change directories and then run the sasep-servers.sh script with the -status option.
    cd SASEPHOME/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin
    ./sasep-servers.sh -status
    This command returns the status of the SAS Embedded Process running on each node of the Hadoop cluster. Verify that the SAS Embedded Process home directory is correct on all the nodes.
    Note: The sasep-servers.sh -status command cannot run successfully if the SAS Embedded Process is not installed.
  7. Verify that the sas.hadoop.ep.apache*.jar files are now in place on all nodes.
    The JAR files are located at HadoopHome/lib.
    For Cloudera, the JAR files are typically located here:
    /opt/cloudera/parcels/CDH/lib/hadoop/lib
    For Hortonworks, the JAR files are typically located here:
    /usr/lib/hadoop/lib
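    For example, a quick spot check on a single Hortonworks node (a sketch; adjust the path for your distribution):
    ls /usr/lib/hadoop/lib/sas.hadoop.ep.apache*.jar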
  8. Restart the Hadoop YARN or MapReduce service.
    This enables the cluster to reload the SAS Hadoop JAR files (sas.hadoop.ep.*.jar).
    Note: It is preferable to restart the service by using Cloudera Manager or Hortonworks Ambari.
  9. Verify that an init.d service named sas.ep4hadoop was created in the following location.
    /etc/init.d/sas.ep4hadoop
    View the sas.ep4hadoop file and verify that the SAS Embedded Process home directory is correct.
    The init.d service is configured to start at level 3 and level 5.
    Note: The SAS Embedded Process needs to run on all nodes in the Hadoop cluster.
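    For example, here is a minimal sketch that checks the configured run levels, assuming a chkconfig-based Linux distribution:
    chkconfig --list sas.ep4hadoop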
  10. Verify that configuration files were written to the HDFS file system.
    hadoop fs -ls /sas/ep/config
    Note: If you are running on a cluster with Kerberos, you need a Kerberos ticket to run this command. Otherwise, you can use the WebHDFS browser.
    Note: The /sas/ep/config directory is created automatically when you run the install script.
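    For example, here is a minimal sketch that inspects the deployed configuration file, assuming the install script wrote ep-config.xml to this directory:
    hadoop fs -cat /sas/ep/config/ep-config.xml | head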