Hadoop Installation and Configuration

Hadoop Installation and Configuration Steps

  1. If you are upgrading from or reinstalling a previous release, follow the instructions in Upgrading from or Reinstalling a Previous Version before installing the in-database deployment package.
  2. Move the SAS Embedded Process and SAS Hadoop MapReduce JAR file install scripts to the Hadoop master node (the NameNode).
    Note: The location where you transfer the install scripts becomes the SAS Embedded Process home and is referred to as SASEPHome throughout this chapter.
    Note: Both the SAS Embedded Process install script and the SAS Hadoop MapReduce JAR file install script must be transferred to the SASEPHome directory.
  3. Install the SAS Embedded Process and the SAS Hadoop MapReduce JAR files.
Note: If you are installing the SAS High-Performance Analytics environment, you must perform additional steps after you install the SAS Embedded Process. For more information, see SAS High-Performance Analytics Infrastructure: Installation and Configuration Guide.

Upgrading from or Reinstalling a Previous Version

To upgrade from or reinstall a previous version, follow these steps.
  1. If you are upgrading from SAS 9.3, follow these steps. If you are upgrading from SAS 9.4, start with Step 2.
    1. Stop the Hadoop SAS Embedded Process.
      SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.35/bin/sasep-stop.all.sh
      SASEPHome is the SAS Embedded Process home directory on the master node where you installed the SAS Embedded Process.
    2. Delete the Hadoop SAS Embedded Process from all nodes.
      SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.35/bin/sasep-delete.all.sh
    3. Verify that all files named sas.hadoop.ep.distribution-name.jar have been deleted.
      The JAR files are located at HadoopHome/lib.
      For Cloudera, the JAR files are typically located here:
      /opt/cloudera/parcels/CDH/lib/hadoop/lib
      For Hortonworks, the JAR files are typically located here:
      /usr/lib/hadoop/lib
    4. Continue with Step 3.
  2. If you are upgrading from SAS 9.4, follow these steps.
    1. Stop the Hadoop SAS Embedded Process.
      SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.*/bin/sasep-servers.sh
         -stop -hostfile host-list-filename | -host "host-list"
      SASEPHome is the SAS Embedded Process home directory on the master node where you installed the SAS Embedded Process.
      For more information, see SASEP-SERVERS.SH Script.
    2. Remove the SAS Embedded Process from all nodes.
      SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.*/bin/sasep-servers.sh
         -remove -hostfile host-list-filename | -host "host-list"
         -mrhome dir
      Note: This step ensures that all old SAS Hadoop MapReduce JAR files are removed.
      For more information, see SASEP-SERVERS.SH Script.
    3. Verify that all files named sas.hadoop.ep.apache*.jar have been deleted.
      The JAR files are located at HadoopHome/lib.
      For Cloudera, the JAR files are typically located here:
      /opt/cloudera/parcels/CDH/lib/hadoop/lib
      For Hortonworks, the JAR files are typically located here:
      /usr/lib/hadoop/lib
      Note: If all the files have not been deleted, then you must delete them. Open-source utilities are available that can delete these files across multiple nodes.
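      For example, here is a minimal sketch of a cross-node cleanup over plain SSH. It assumes a nodes.txt file that lists one host per line and uses the Hortonworks path shown above (adjust the path for your distribution):
      while read node; do
          # -n keeps ssh from consuming the rest of the host list on stdin
          ssh -n "$node" 'rm -f /usr/lib/hadoop/lib/sas.hadoop.ep.apache*.jar'
      done < nodes.txt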
    4. Verify that all the SAS Embedded Process directories and files have been deleted on all nodes, except the node from which you are running the script. The sasep-servers.sh -remove script removes the files everywhere except on the node from which you ran the script.
      Note: If all the directories and files have not been deleted, then you must delete them. Open-source utilities are available that can delete these directories and files across multiple nodes.
      Manually remove the SAS Embedded Process directories and files on the node from which you ran the script.
      The sasep-servers.sh -remove script displays instructions that are similar to the following example:
      localhost WARN: Apparently, you are trying to uninstall SAS Embedded Process 
      for Hadoop from the local node.
      The binary files located at
      local_node/SAS/SASTKInDatabaseServerForHadoop and
      local_node/SAS/SASACCESStoHadoopMapReduceJARFiles will not be removed.
      localhost WARN: The init script will be removed from /etc/init.d and the 
      SAS Map Reduce JAR files will be removed from /usr/lib/hadoop-mapreduce/lib.
      localhost WARN: The binary files located at local_node/SAS 
      should be removed manually.
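      Based on the paths in that example output, here is a minimal sketch of the manual cleanup on the local node. The directory names are the defaults used throughout this chapter; confirm them on your system before deleting:
      rm -rf SASEPHome/SAS/SASTKInDatabaseServerForHadoop
      rm -rf SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles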
  3. Continue the installation process.

Moving the SAS Embedded Process and SAS Hadoop MapReduce JAR File Install Scripts

Creating the SAS Embedded Process Directory

Before you can install the SAS Embedded Process and the SAS Hadoop MapReduce JAR files, you must move the SAS Embedded Process and SAS Hadoop MapReduce JAR file install scripts to a directory on the Hadoop master node (the NameNode).
Create a new directory that is not part of an existing directory structure, such as /sasep.
This path is created on each node in the Hadoop cluster during the SAS Embedded Process installation. Do not use existing system directories such as /opt or /usr. This new directory becomes the SAS Embedded Process home and is referred to as SASEPHome throughout this chapter.
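For example, a minimal sketch that creates the /sasep directory used in the example above:
mkdir /sasep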

Moving the SAS Embedded Process Install Script

The SAS Embedded Process install script is contained in a self-extracting archive file named tkindbsrv-9.42-n_lax.sh, where n is a number that indicates the latest version of the file. If this is the initial installation, n has a value of 1. Each time you reinstall or upgrade, n is incremented by 1. The self-extracting archive file is located in the [SASHome]/SASTKInDatabaseServer/9.4/HadooponLinuxx64 directory.
Using a method of your choice, transfer the SAS Embedded Process install script to your Hadoop master node.
This example uses secure copy, and SASEPHome is the location where you want to install the SAS Embedded Process.
scp tkindbsrv-9.42-n_lax.sh username@hadoop:/SASEPHome
Note: The location where you transfer the install script becomes the SAS Embedded Process home.
Note: Both the SAS Embedded Process install script and the SAS Hadoop MapReduce JAR file install script must be transferred to the SASEPHome directory.

Moving the SAS Hadoop MapReduce JAR File Install Script

The SAS Hadoop MapReduce JAR file install script is contained in a self-extracting archive file named hadoopmrjars-9.42-n_lax.sh, where n is a number that indicates the latest version of the file. If this is the initial installation, n has a value of 1. Each time you reinstall or upgrade, n is incremented by 1. The self-extracting archive file is located in the [SASHome]/SASACCESStoHadoopMapReduceJARFiles/9.42 directory.
Using a method of your choice, transfer the SAS Hadoop MapReduce JAR file install script to your Hadoop master node.
This example uses secure copy, and SASEPHome is the location where you want to install the SAS Hadoop MapReduce JAR files.
scp hadoopmrjars-9.42-n_lax.sh username@hadoop:/SASEPHome
Note: Both the SAS Embedded Process install script and the SAS Hadoop MapReduce JAR file install script must be transferred to the SASEPHome directory.

Installing the SAS Embedded Process and SAS Hadoop MapReduce JAR Files

To install the SAS Embedded Process, follow these steps.
Note: Permissions are needed to install the SAS Embedded Process and SAS Hadoop MapReduce JAR files. For more information, see Hadoop Permissions.
  1. Log on to the server using SSH as a user who has sudo access, and then switch to the root user.
    ssh username@serverhostname
    sudo su - root
  2. On the Hadoop master node, change to the directory where you want the SAS Embedded Process installed.
    cd /SASEPHome
    SASEPHome is the same location to which you copied the self-extracting archive file. For more information, see Moving the SAS Embedded Process Install Script.
    Note: Before continuing with the next step, ensure that each self-extracting archive file has Execute permission.
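    For example, here is a minimal sketch that grants Execute permission to both archives, assuming they are in the current SASEPHome directory:
    chmod +x tkindbsrv-9.42-*_lax.sh hadoopmrjars-9.42-*_lax.sh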
  3. Use the following command to unpack the tkindbsrv-9.42-n_lax.sh file.
    ./tkindbsrv-9.42-n_lax.sh
    
    n is a number that indicates the latest version of the file. If this is the initial installation, n has a value of 1. Each time you reinstall or upgrade, n is incremented by 1.
    Note: If you unpack the file in the wrong directory, you can move the unpacked directory structure afterward.
    After this script is run and the files are unpacked, the script creates the following directory structure, where SASEPHome is the SAS Embedded Process home directory from Step 2.
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/misc
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/sasexe
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/utilities
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/build
    The content of the SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin directory should look similar to this.
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/sas.ep4hadoop.template
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/sasep-servers.sh
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/sasep-common.sh
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/sasep-server-start.sh
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/sasep-server-status.sh
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/sasep-server-stop.sh
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/InstallTKIndbsrv.sh
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/MANIFEST.MF
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/qkbpush.sh
    SASEPHome/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin/sas.tools.qkb.hadoop.jar
  4. Use this command to unpack the SAS Hadoop MapReduce JAR files.
    ./hadoopmrjars-9.42-1_lax.sh
    After the script runs, it creates the following directory and unpacks these files into it.
    SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib/ep-config.xml
    SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib/sas.hadoop.ep.apache023.jar
    SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib/sas.hadoop.ep.apache023.nls.jar
    SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib/sas.hadoop.ep.apache121.jar
    SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib/sas.hadoop.ep.apache121.nls.jar
    SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib/sas.hadoop.ep.apache205.jar
    SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib/sas.hadoop.ep.apache205.nls.jar
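    To confirm the unpack, you can list the directory contents (a minimal check; the 9.42-1 version suffix assumes an initial installation):
    ls SASEPHome/SAS/SASACCESStoHadoopMapReduceJARFiles/9.42-1/lib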
  5. Use the sasep-servers.sh script with the -add option to deploy the SAS Embedded Process installation across all nodes. The SAS Embedded Process is installed as a Linux service.
    Note: If you are running on a cluster with Kerberos, complete both substeps 1 and 2. If you are not running with Kerberos, complete only substep 2.
    1. If you are running on a cluster with Kerberos, you must kinit the HDFS user.
      sudo su - root
      su - hdfs | hdfs-userid
      kinit -kt location-of-keytab-file user-for-which-you-are-requesting-a-ticket
      exit
      Here is an example:
      sudo su - root
      su - hdfs
      kinit -kt hdfs.keytab hdfs
      exit
      Note: The default HDFS user is hdfs. You can specify a different user ID with the -hdfsuser argument when you run the sasep-servers.sh -add command.
      Note: If you are running on a cluster with Kerberos, a keytab is required when running the sasep-servers.sh -add command.
      Note: You can run klist while you are running as an HDFS user to check the status of your Kerberos ticket on the server. Here is an example:
      klist
      Ticket cache: FILE:/tmp/krb5cc_493
      Default principal: hdfs@HOST.COMPANY.COM
      
      Valid starting    Expires           Service principal
      06/20/14 09:51:26 06/27/14 09:51:26 krbtgt/HOST.COMPANY.COM@HOST.COMPANY.COM
           renew until 06/22/14 09:51:26
    2. Run the sasep-servers.sh script. Review all of the information in this step before running the script.
      cd SASEPHOME/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin
      ./sasep-servers.sh -add
      Tip
      There are many options available when installing the SAS Embedded Process. We recommend that you review the script syntax before running it. For more information, see SASEP-SERVERS.SH Script.
    During the install process, the script asks whether you want to start the SAS Embedded Process. If you choose Y or y, the SAS Embedded Process is started on all nodes after the install is complete. If you choose N or n, you can start the SAS Embedded Process later by running the ./sasep-servers.sh -start command.
    Note: When you enter the sasep-servers.sh -add command, a user and group named sasep is created. You can specify a different user and group name with the -epuser and -epgroup arguments when you enter the sasep-servers.sh -add command.
    Note: The sasep-servers.sh script can be run from any location. You can also add its location to the PATH environment variable.
    Tip
    Although you can install the SAS Embedded Process in multiple locations, the best practice is to install only one instance. Only one version of the SASEP JAR files is installed in your HadoopHome/lib directory.
    Note: The SAS Embedded Process must be installed on all nodes that are capable of executing MapReduce 2 tasks. For MapReduce 2, these are the nodes where a NodeManager is running. Usually, every DataNode has a YARN NodeManager running. By default, the SAS Embedded Process install script (sasep-servers.sh) discovers the cluster topology and installs the SAS Embedded Process on all DataNode nodes, including the host node from which you run the script (the Hadoop master NameNode), even if a DataNode is not present on that host. If you want to limit the nodes on which the SAS Embedded Process is installed, run the sasep-servers.sh script with the -host <hosts> option, as in the sketch below.
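    For example, here is a minimal sketch that limits the installation to the hosts named in a file, assuming a plain-text host list with one host name per line:
    ./sasep-servers.sh -add -hostfile host-list-filename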
    Note: If you install the SAS Embedded Process on a large cluster, the SSHD daemon might reach the maximum number of concurrent connections. The ssh_exchange_identification: Connection closed by remote host SSHD error might occur. To work around the problem, edit the /etc/ssh/sshd_config file, change the MaxStartups option to the number that accommodates your cluster, and save the file. Then, reload the SSHD daemon by running the /etc/init.d/sshd reload command.
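    Here is a minimal sketch of that workaround. The value 100 is only an example; choose a number that accommodates your cluster:
    # In /etc/ssh/sshd_config, raise the concurrent connection limit:
    MaxStartups 100
    # Then reload the SSHD daemon:
    /etc/init.d/sshd reload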
  6. Verify that the SAS Embedded Process is installed and running. Change directories and then run the sasep-servers.sh script with the -status option.
    cd SASEPHOME/SAS/SASTKInDatabaseServerForHadoop/9.42-1/bin
    ./sasep-servers.sh -status
    This command returns the status of the SAS Embedded Process running on each node of the Hadoop cluster. Verify that the SAS Embedded Process home directory is correct on all the nodes.
    Note: The sasep-servers.sh -status command cannot run successfully if the SAS Embedded Process is not installed.
  7. Verify that the sas.hadoop.ep.apache*.jar files are now in place on all nodes.
    The JAR files are located at HadoopHome/lib.
    For Cloudera, the JAR files are typically located here:
    /opt/cloudera/parcels/CDH/lib/hadoop/lib
    For Hortonworks, the JAR files are typically located here:
    /usr/lib/hadoop/lib
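    For example, a quick spot check on a single Hortonworks node (a sketch; adjust the path for your distribution):
    ls /usr/lib/hadoop/lib/sas.hadoop.ep.apache*.jar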
  8. Restart the Hadoop YARN or MapReduce service.
    This enables the cluster to reload the SAS Hadoop JAR files (sas.hadoop.ep.*.jar).
    Note: It is preferable to restart the service by using Cloudera Manager or Hortonworks Ambari.
  9. Verify that an init.d service named sas.ep4hadoop was created in the following location.
    /etc/init.d/sas.ep4hadoop
    View the sas.ep4hadoop file and verify that the SAS Embedded Process home directory is correct.
    The init.d service is configured to start at level 3 and level 5.
    Note: The SAS Embedded Process needs to run on all nodes in the Hadoop cluster.
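    For example, here is a minimal sketch that checks the configured run levels, assuming a chkconfig-based Linux distribution:
    chkconfig --list sas.ep4hadoop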
  10. Verify that configuration files were written to the HDFS file system.
    hadoop fs -ls /sas/ep/config
    Note: If you are running on a cluster with Kerberos, you need a Kerberos ticket to run this command. Otherwise, you can use the WebHDFS browser.
    Note: The /sas/ep/config directory is created automatically when you run the install script.
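    For example, here is a minimal sketch that inspects the deployed configuration file, assuming the install script wrote ep-config.xml to this directory:
    hadoop fs -cat /sas/ep/config/ep-config.xml | head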