Create a new directory that is not part of an existing directory structure, such as /sasdmp. Do not create the directory under an existing system structure such as /opt or /usr. This new directory is referred to as DMPInstallDir throughout this section.
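As a minimal sketch, the directory might be created like this. The /sasdmp name is the example from above; the PREFIX variable is an addition here so the sketch is safe to run anywhere (use the filesystem root on a real master node):

```shell
# Sketch: create the new install directory (/sasdmp is the example name above).
# PREFIX keeps this safe to run anywhere; set PREFIX=/ on a real master node.
PREFIX="${PREFIX:-/tmp}"
mkdir -p "$PREFIX/sasdmp/sasexe"
ls -d "$PREFIX/sasdmp/sasexe"
```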
The following are located in the DMPInstallDir/sasexe directory:

DMPInstallDir/sasexe/SASDMPHome
DMPInstallDir/sasexe/dmsprkhadp-2.40000-1.sh
The SASDMPHome directory contains the following subdirectories:

DMPInstallDir/sasexe/SASDMPHome/bin
DMPInstallDir/sasexe/SASDMPHome/dat
DMPInstallDir/sasexe/SASDMPHome/etc
DMPInstallDir/sasexe/SASDMPHome/lib
DMPInstallDir/sasexe/SASDMPHome/share
DMPInstallDir/sasexe/SASDMPHome/var
The contents of the DMPInstallDir/sasexe/SASDMPHome/bin directory look like this:

DMPInstallDir/sasexe/SASDMPHome/bin/dfwsvc
DMPInstallDir/sasexe/SASDMPHome/bin/dfxver
DMPInstallDir/sasexe/SASDMPHome/bin/dfxver.bin
DMPInstallDir/sasexe/SASDMPHome/bin/sasdmp_admin.sh
DMPInstallDir/sasexe/SASDMPHome/bin/settings.sh
DMPInstallDir/sasexe/SASDMPHome/bin/dmpsvc
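As a hedged sketch, the dfxver utility in this directory presumably reports version details for the installed binaries (an assumption here; it is not described in this section). DRYRUN=echo makes this a dry run that only prints the command:

```shell
# Dry-run sketch: DRYRUN=echo prints the command instead of executing it.
# Run from DMPInstallDir/sasexe/SASDMPHome on a real node (remove DRYRUN there).
DRYRUN=echo
$DRYRUN bin/dfxver
```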
sudo su - root
su - hdfs | hdfs-userid
kinit -kt <location of keytab file> <user for which you are requesting a ticket>
exit
The default HDFS superuser is hdfs. For MapR distributions, the default MapR superuser is mapr. You can specify a different user ID with the -hdfsuser argument when you run the bin/sasdmp_admin.sh -add script.
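The Kerberos steps above can be sketched as follows. The keytab path and principal are hypothetical placeholders (substitute your own), and DRYRUN=echo keeps this a dry run:

```shell
# Dry-run sketch: remove DRYRUN to execute on a real, Kerberized cluster.
# The keytab path and principal below are hypothetical; substitute your own.
DRYRUN=echo
$DRYRUN kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@HOST.COMPANY.COM
$DRYRUN klist
```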
klist
Ticket cache: FILE:/tmp/krb5cc_493
Default principal: hdfs@HOST.COMPANY.COM
Valid starting       Expires              Service principal
06/20/15 09:51:26    06/27/15 09:51:26    krbtgt/HOST.COMPANY.COM@HOST.COMPANY.COM
        renew until 06/22/15 09:51:26
cd DMPInstallDir/sasexe/SASDMPHome/
bin/sasdmp_admin.sh -genconfig
bin/sasdmp_admin.sh -add
You can specify the hosts on which to install by using the -host <hosts> option.
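A minimal sketch of an install limited to specific hosts (the host names are hypothetical; multiple hosts go inside double quotation marks, and DRYRUN=echo keeps this a dry run):

```shell
# Dry-run sketch: remove DRYRUN to execute on a real cluster.
# Host names are hypothetical; multiple hosts must be double-quoted.
DRYRUN=echo
$DRYRUN bin/sasdmp_admin.sh -add -host "server1 server2"
```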
Verify the installation by using the -check option:

cd DMPInstallDir/sasexe/SASDMPHome/
bin/sasdmp_admin.sh -check
hadoop fs -ls /sas/ep/config
The /sas/ep/config directory is created automatically when you run the install script. If you used -dmpconfig or -genconfig to specify a non-default location, use that location to find the dmp-config.xml file.
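As a sketch of the non-default case, a custom HDFS location could be supplied at install time and then inspected. The /user/sas path is hypothetical, and DRYRUN=echo keeps this a dry run:

```shell
# Dry-run sketch: remove DRYRUN to execute on a real cluster.
# /user/sas/dmp-config.xml is a hypothetical non-default HDFS location.
DRYRUN=echo
$DRYRUN bin/sasdmp_admin.sh -add -dmpconfig /user/sas/dmp-config.xml
$DRYRUN hadoop fs -ls /user/sas
```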
-add
installs SAS Data Management Accelerator for Spark.
Tip | If at a later time you add nodes to the cluster, you can specify the hosts on which you want to install SAS Data Management Accelerator for Spark by using the -hostfile or -host option. The -hostfile and -host options are mutually exclusive. |
See | -hostfile and -host option |
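A minimal sketch of the two install variants described above: the default run, which discovers the data nodes, and a later targeted run for a newly added node (host name newnode1 is hypothetical; DRYRUN=echo keeps this a dry run):

```shell
# Dry-run sketch: remove DRYRUN to execute on a real cluster.
DRYRUN=echo
# Default: let sasdmp_admin.sh discover the cluster's data nodes.
$DRYRUN bin/sasdmp_admin.sh -add
# Later, install only on a newly added node (hypothetical host name).
$DRYRUN bin/sasdmp_admin.sh -add -host newnode1
```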
-dmpconfig config-filename
generates the SAS Data Management Accelerator for Spark configuration file in the specified location.
Default | /sas/ep/config/dmp-config.xml |
Interaction | Use the -dmpconfig argument in conjunction with the -add or -remove argument to specify the HDFS location of the configuration file. Use the -genconfig argument when you upgrade to a new version of your Hadoop distribution. |
Tip | Use the -dmpconfig argument to create the configuration file in a non-default location. |
See | -genconfig config-filename -force |
specifies the maximum number of parallel copies between the master and data nodes.
Default | 10 |
Interaction | Use this argument in conjunction with the -add argument. |
-hostfile file-name
specifies the full path of a file that contains the list of hosts where SAS Data Management Accelerator for Spark is installed or removed.
Default | The sasdmp_admin.sh script discovers the cluster topology and uses the retrieved list of data nodes. |
Interaction | Use the -hostfile argument in conjunction with the -add argument when new nodes are added to the cluster. |
Tip | You can also assign a host list filename to a UNIX variable, SASEP_HOSTS_FILE: export SASEP_HOSTS_FILE=/etc/hadoop/conf/slaves |
See | -hdfsuser user-id |
Example | -hostfile /etc/hadoop/conf/slaves |
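The tip and example above can be combined as a sketch; /etc/hadoop/conf/slaves is the example host list file from this section, and DRYRUN=echo keeps this a dry run:

```shell
# Dry-run sketch: remove DRYRUN to execute on a real cluster.
# /etc/hadoop/conf/slaves is the example host list file from this section.
export SASEP_HOSTS_FILE=/etc/hadoop/conf/slaves
DRYRUN=echo
$DRYRUN bin/sasdmp_admin.sh -add -hostfile "$SASEP_HOSTS_FILE"
```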
-host "host-list"
specifies the target host or host list where SAS Data Management Accelerator for Spark is installed or removed.
Default | The sasdmp_admin.sh script discovers the cluster topology and uses the retrieved list of data nodes. |
Requirement | If you specify more than one host, the hosts must be enclosed in double quotation marks and separated by spaces. |
Interaction | Use the -host argument in conjunction with the -add argument when new nodes are added to the cluster. |
Tip | You can also assign a list of hosts to a UNIX variable, SASEP_HOSTS: export SASEP_HOSTS="server1 server2 server3" |
See | -hdfsuser user-id |
Example | -host "server1 server2 server3" -host bluesvr |
-hdfsuser user-id
specifies the user ID that has Write access to the HDFS root directory.
Default | hdfs for Cloudera, Hortonworks, Pivotal HD, and IBM BigInsights; mapr for MapR |
Interaction | Use the -hdfsuser argument in conjunction with the -add or -remove argument to change or remove the HDFS user ID. |
Note | The user ID is used to copy the SAS Data Management Accelerator for Spark configuration files to HDFS. |
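A sketch of an install run as the MapR superuser (mapr, per the defaults above; DRYRUN=echo keeps this a dry run):

```shell
# Dry-run sketch: remove DRYRUN to execute on a real cluster.
# On MapR distributions, the superuser is mapr (see the default above).
DRYRUN=echo
$DRYRUN bin/sasdmp_admin.sh -add -hdfsuser mapr
```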
-log filename
writes the installation output to the specified filename.
Interaction | Use the -log argument in conjunction with the -add or -remove argument to write or remove the installation output file. |
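A sketch of capturing the install output to a file (the /tmp/sasdmp_add.log path is hypothetical; DRYRUN=echo keeps this a dry run):

```shell
# Dry-run sketch: remove DRYRUN to execute on a real cluster.
# /tmp/sasdmp_add.log is a hypothetical log location.
DRYRUN=echo
$DRYRUN bin/sasdmp_admin.sh -add -log /tmp/sasdmp_add.log
```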
-remove
removes SAS Data Management Accelerator for Spark.
Tips | You can specify the hosts for which you want to remove SAS Data Management Accelerator for Spark by using the -hostfile or -host option. The -hostfile and -host options are mutually exclusive. This argument removes the generated dmp-config.xml file; use the -keepconfig argument to retain the existing configuration file. |
See | -hostfile and -host option |
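A sketch of a removal that keeps the existing configuration file, using the -keepconfig argument named above (DRYRUN=echo keeps this a dry run):

```shell
# Dry-run sketch: remove DRYRUN to execute on a real cluster.
# -keepconfig retains the existing dmp-config.xml file in HDFS.
DRYRUN=echo
$DRYRUN bin/sasdmp_admin.sh -remove -keepconfig
```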
-genconfig config-filename
generates a new SAS Data Management Accelerator for Spark configuration file in the specified location.
Default | /sas/ep/config/dmp-config.xml |
Interaction | Use the -dmpconfig argument in conjunction with the -add or -remove argument to specify the HDFS location of the configuration file. Use the -genconfig argument when you upgrade to a new version of your Hadoop distribution. |
Tip | This argument generates an updated dmp-config.xml file. Use the -force argument to overwrite the existing configuration file. |
See | -dmpconfig config-filename |
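A sketch of regenerating the configuration file after a Hadoop distribution upgrade, overwriting the file at the default location given above (DRYRUN=echo keeps this a dry run):

```shell
# Dry-run sketch: remove DRYRUN to execute on a real cluster.
# Regenerate the configuration file at its default location after a
# Hadoop upgrade; -force overwrites the existing file.
DRYRUN=echo
$DRYRUN bin/sasdmp_admin.sh -genconfig /sas/ep/config/dmp-config.xml -force
```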
-check
checks whether SAS Data Management Accelerator for Spark is installed correctly on all data nodes.
displays the Hadoop configuration environment.
displays the Hadoop version information for the cluster.
installs a hotfix on an existing SAS Data Management Accelerator for Spark installation.
displays all live DataNodes on the cluster.
displays the Spark version information for the cluster.
validates the install by executing simple Spark and MapReduce jobs.
displays the version of SAS Data Management Accelerator for Spark that is installed.