Manual Installation

Creating the SAS Data Management Accelerator for Spark Directory

Create a new directory on the Hadoop master node that is not part of an existing directory structure, such as /sasdmp.
This path is created on each node in the Hadoop cluster during the SAS Data Management Accelerator for Spark installation. Do not use existing system directories such as /opt or /usr. This new directory is referred to as DMPInstallDir throughout this section.

Copying the SAS Data Management Accelerator for Spark Install Script

The SAS Data Management Accelerator for Spark install script, sasdmp_admin.sh, is delivered inside a self-extracting archive file named dmsprkhadp-2.40000-1.sh. That archive is packaged in a ZIP file that is located in a directory in your SAS Software Depot.
To copy the ZIP file to the DMPInstallDir on your Hadoop master node, follow these steps:
  1. Navigate to the YourSASDepot/standalone_installs directory.
    This directory was created when your SAS Software Depot was created by the SAS Download Manager.
  2. Locate the en_sasexe.zip file. This file is in the following directory: YourSASDepot/standalone_installs/SAS_Data_Management_Accelerator_for_Spark/2_4/Hadoop_on_Linux_x64.
    The sasdmp_admin.sh file is included in this ZIP file.
  3. Log on to the cluster using SSH with sudo access.
    ssh username@serverhostname
    sudo su - 
  4. Copy the en_sasexe.zip file from the client to the DMPInstallDir on the cluster. The following example uses secure copy:
    scp en_sasexe.zip username@hdpclus1:/DMPInstallDir
    Note: The DMPInstallDir location becomes the SAS Data Management Accelerator for Spark home.

Installing SAS Data Management Accelerator for Spark

To install SAS Data Management Accelerator for Spark, follow these steps:
Note: Permissions are required to install SAS Data Management Accelerator for Spark. For more information, see Hadoop Permissions.
  1. Navigate to the location on your Hadoop master node where you copied the en_sasexe.zip file.
    cd /DMPInstallDir
  2. Ensure that both the DMPInstallDir folder and the en_sasexe.zip file have Read, Write, and Execute permissions (chmod 777).
  3. Unzip the en_sasexe.zip file.
    unzip en_sasexe.zip
    After the file is unzipped, a sasexe directory is created in the same location as the en_sasexe.zip file. The dmsprkhadp-2.40000-1.sh file is located in the sasexe directory.
    DMPInstallDir/sasexe/dmsprkhadp-2.40000-1.sh
  4. Use the following command to unpack the dmsprkhadp-2.40000-1.sh file.
    ./dmsprkhadp-2.40000-1.sh
    After this script is run and the files are unpacked, the script creates the following directory structure:
    DMPInstallDir/sasexe/SASDMPHome
    DMPInstallDir/sasexe/dmsprkhadp-2.40000-1.sh
    Note: During the install process, the dmsprkhadp-2.40000-1.sh file is copied to all data nodes. Do not remove or move this file from the DMPInstallDir/sasexe directory.
    The SASDMPHome directory structure looks like this.
    DMPInstallDir/sasexe/SASDMPHome/bin
    DMPInstallDir/sasexe/SASDMPHome/dat
    DMPInstallDir/sasexe/SASDMPHome/etc
    DMPInstallDir/sasexe/SASDMPHome/lib
    DMPInstallDir/sasexe/SASDMPHome/share
    DMPInstallDir/sasexe/SASDMPHome/var
    The DMPInstallDir/sasexe/SASDMPHome/bin directory looks like this.
    DMPInstallDir/sasexe/SASDMPHome/bin/dfwsvc
    DMPInstallDir/sasexe/SASDMPHome/bin/dfxver
    DMPInstallDir/sasexe/SASDMPHome/bin/dfxver.bin
    DMPInstallDir/sasexe/SASDMPHome/bin/sasdmp_admin.sh
    DMPInstallDir/sasexe/SASDMPHome/bin/settings.sh
    DMPInstallDir/sasexe/SASDMPHome/bin/dmpsvc
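Before deploying, you can confirm that the archive unpacked cleanly. The helper below is a hypothetical sketch, not part of the product; it simply checks for the bin files listed above (DMPInstallDir is a placeholder for your actual path):

```shell
# Hypothetical helper (not part of the product): confirm that the
# unpacked SASDMPHome contains the binaries listed above.
check_unpack() {
  home=$1
  for f in dfwsvc dfxver dfxver.bin sasdmp_admin.sh settings.sh dmpsvc; do
    if [ ! -e "$home/bin/$f" ]; then
      echo "missing: $f"
      return 1
    fi
  done
  echo "all expected files present"
}

# Example: check_unpack DMPInstallDir/sasexe/SASDMPHome
```

If any file is reported missing, re-run the dmsprkhadp-2.40000-1.sh script before continuing.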
  5. Use the sasdmp_admin.sh script to deploy the SAS Data Management Accelerator for Spark installation across all nodes.
    Tip
    Many options are available for installing SAS Data Management Accelerator for Spark. Review the script syntax before running it. For more information, see Overview of the SASDMP_ADMIN.SH Script.
    Note: If your cluster is secured with Kerberos, complete both steps 1 and 2. If your cluster is not secured with Kerberos, complete only step 2.
    1. If your cluster is secured with Kerberos, the HDFS user must have a valid Kerberos ticket to access HDFS. This can be done with kinit.
      sudo su - root
      su - hdfs | hdfs-userid
      kinit -kt <location-of-keytab-file> <user-for-whom-you-are-requesting-a-ticket>
      exit
      Note: For all Hadoop distributions except MapR, the default HDFS user is hdfs. For MapR distributions, the default MapR superuser is mapr. You can specify a different user ID with the -hdfsuser argument when you run the bin/sasdmp_admin.sh -add script.
      Note: To check the status of your Kerberos ticket on the server, run klist while you are running as the -hdfsuser user. Here is an example:
      klist
      Ticket cache: FILE:/tmp/krb5cc_493
      Default principal: hdfs@HOST.COMPANY.COM
      
      Valid starting    Expires           Service principal
      06/20/15 09:51:26 06/27/15 09:51:26 krbtgt/HOST.COMPANY.COM@HOST.COMPANY.COM
           renew until 06/22/15 09:51:26
    2. Run the sasdmp_admin.sh script. Review all of the information in this step before running the script.
      cd DMPInstallDir/sasexe/SASDMPHome/
      bin/sasdmp_admin.sh -genconfig
      bin/sasdmp_admin.sh -add
      
      Tip
      Many options are available when installing SAS Data Management Accelerator for Spark. Review the script syntax before running it. For more information, see Overview of the SASDMP_ADMIN.SH Script.
    Note: By default, the SAS Data Management Accelerator for Spark install script (sasdmp_admin.sh) discovers the cluster topology and installs SAS Data Management Accelerator for Spark on all DataNode nodes, including the host node from which you run the script (the Hadoop master NameNode). This occurs even if a DataNode is not present on the Hadoop master NameNode. If you want to add SAS Data Management Accelerator for Spark to new nodes at a later time, run the sasdmp_admin.sh script with the -host <hosts> option.
  6. Verify that SAS Data Management Accelerator for Spark is installed by running the sasdmp_admin.sh script with the -check option.
    cd DMPInstallDir/sasexe/SASDMPHome/
    bin/sasdmp_admin.sh -check
    This command checks whether SAS Data Management Accelerator for Spark is installed on all data nodes.
    Note: The sasdmp_admin.sh -check script does not run successfully if SAS Data Management Accelerator for Spark is not installed.
  7. Verify that the configuration file, dmp-config.xml, was written to the HDFS file system.
    hadoop fs -ls /sas/ep/config
    Note: If your cluster is secured with Kerberos, you need a valid Kerberos ticket to access HDFS. If your cluster is not secured with Kerberos, you can use the WebHDFS browser instead.
    Note: The /sas/ep/config directory is created automatically when you run the install script. If you used -dmpconfig or -genconfig to specify a non-default location, use that location to find the dmp-config.xml file.
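The verification in this step can be wrapped in a small script. The helper below is a sketch, not part of the product; it assumes the hadoop CLI is on the PATH and defaults to the standard configuration path unless you pass the location you chose with -dmpconfig or -genconfig:

```shell
# Sketch (not part of the product): report whether dmp-config.xml
# exists in HDFS. Assumes the hadoop CLI is available on the PATH.
check_dmp_config() {
  path=${1:-/sas/ep/config/dmp-config.xml}
  if hadoop fs -test -e "$path" 2>/dev/null; then
    echo "found: $path"
  else
    echo "missing: $path"
  fi
}

# Example with the default location: check_dmp_config
# Example with a custom -dmpconfig location:
#   check_dmp_config /user/sas/dmp-config.xml
```

A "missing" result on a Kerberos-secured cluster may simply mean the current user has no valid ticket; run kinit first and retry.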

Overview of the SASDMP_ADMIN.SH Script

The sasdmp_admin.sh script enables you to perform the following actions.
  • Install or uninstall SAS Data Management Accelerator for Spark on a single node or a group of nodes.
  • Check if SAS Data Management Accelerator for Spark is installed correctly.
  • Generate a SAS Data Management Accelerator for Spark configuration file and write the file to an HDFS location.
  • Write the installation output to a log file.
  • Display all live data nodes on the cluster.
  • Display the Hadoop configuration environment.
Note: You need sudo access only on the master node to run the sasdmp_admin.sh script. You must also set up passwordless SSH from the master node to all data nodes on the cluster where SAS Data Management Accelerator for Spark is installed.
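Before running sasdmp_admin.sh, you can confirm that passwordless SSH actually works from the master node. This is a sketch with placeholder host names, not a product utility; BatchMode=yes makes ssh fail instead of prompting for a password:

```shell
# Sketch (not part of the product): verify passwordless SSH from the
# master node to each data node. BatchMode=yes fails rather than prompts.
check_ssh() {
  rc=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "ok: $host"
    else
      echo "FAIL: $host"
      rc=1
    fi
  done
  return $rc
}

# Example (hypothetical host names): check_ssh server1 server2 server3
```

Any host reported as FAIL needs its SSH keys fixed before the install script can deploy to it.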

SASDMP_ADMIN.SH Syntax

sasdmp_admin.sh
-add <-dmpconfig config-filename> <-maxscp number-of-copies>
<-hostfile host-list-filename | -host <">host-list<">>
<-hdfsuser user-id> <-log filename>
sasdmp_admin.sh
-remove <-dmpconfig config-filename> <-hostfile host-list-filename | -host <">host-list<">>
<-hdfsuser user-id> <-log filename> <-keepconfig>
sasdmp_admin.sh
<-genconfig config-filename <-force>>
<-check> <-hostfile host-list-filename | -host <">host-list<">>
<-env>
<-hadoopversion>
<-hotfix>
<-log filename>
<-nodelist>
<-sparkversion>
<-validate>
<-version>
Arguments

-add

installs SAS Data Management Accelerator for Spark.

Tip If at a later time you add nodes to the cluster, you can specify the hosts on which you want to install SAS Data Management Accelerator for Spark by using the -hostfile or -host option. The -hostfile and -host options are mutually exclusive.
See -hostfile and -host option

-dmpconfig config-filename

generates the SAS Data Management Accelerator for Spark configuration file in the specified location.

Default /sas/ep/config/dmp-config.xml
Interaction Use the -dmpconfig argument in conjunction with the -add or -remove argument to specify the HDFS location of the configuration file.
Tip Use the -dmpconfig argument to create the configuration file in a non-default location.
See -genconfig config-filename -force

-maxscp number-of-copies

specifies the maximum number of parallel copies between the master and data nodes.

Default 10
Interaction Use this argument in conjunction with the -add argument.

-hostfile host-list-filename

specifies the full path of a file that contains the list of hosts where SAS Data Management Accelerator for Spark is installed or removed.

Default The sasdmp_admin.sh script discovers the cluster topology and uses the retrieved list of data nodes.
Interaction Use the -hostfile argument in conjunction with the -add argument when new nodes are added to the cluster.
Tip You can also assign a host list filename to a UNIX variable, SASEP_HOSTS_FILE.
export SASEP_HOSTS_FILE=/etc/hadoop/conf/slaves
See -hdfsuser user-id
Example
-hostfile /etc/hadoop/conf/slaves
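A host list file is plain text with one host name per line, the same layout as Hadoop's conf/slaves file referenced above. A minimal sketch, using hypothetical host names:

```shell
# Sketch: create a host list file, one host name per line.
# The host names are placeholders; substitute your own data nodes.
cat > /tmp/dmp_hosts <<'EOF'
server1
server2
server3
EOF

# Then pass the file to the install script, for example:
#   bin/sasdmp_admin.sh -add -hostfile /tmp/dmp_hosts
```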

-host <">host-list<">

specifies the target host or host list where SAS Data Management Accelerator for Spark is installed or removed.

Default The sasdmp_admin.sh script discovers the cluster topology and uses the retrieved list of data nodes.
Requirement If you specify more than one host, the hosts must be enclosed in double quotation marks and separated by spaces.
Interaction Use the -host argument in conjunction with the -add argument when new nodes are added to the cluster.
Tip You can also assign a list of hosts to a UNIX variable, SASEP_HOSTS.
export SASEP_HOSTS="server1 server2 server3"
See -hdfsuser user-id
Example
-host "server1 server2 server3"
-host bluesvr

-hdfsuser user-id

specifies the user ID that has Write access to the HDFS root directory.

Default hdfs for Cloudera, Hortonworks, Pivotal HD, and IBM BigInsights
mapr for MapR
Interaction Use the -hdfsuser argument in conjunction with the -add or -remove argument to change or remove the HDFS user ID.
Note The user ID is used to copy the SAS Data Management Accelerator for Spark configuration files to HDFS.

-log filename

writes the installation output to the specified filename.

Interaction Use the -log argument in conjunction with the -add or -remove argument to write the installation or removal output to the specified file.

-remove <-keepconfig>

removes SAS Data Management Accelerator for Spark.

Tips You can specify the hosts from which you want to remove SAS Data Management Accelerator for Spark by using the -hostfile or -host option. The -hostfile and -host options are mutually exclusive.
This argument removes the generated dmp-config.xml file. Use the -keepconfig argument to retain the existing configuration file.
See -hostfile and -host option

-genconfig config-filename <-force>

generates a new SAS Data Management Accelerator for Spark configuration file in the specified location.

Default /sas/ep/config/dmp-config.xml
Interaction Use the -genconfig argument when you upgrade to a new version of your Hadoop distribution.
Tip This argument generates an updated dmp-config.xml file. Use the -force argument to overwrite the existing configuration file.
See -dmpconfig config-filename

-check

checks if SAS Data Management Accelerator for Spark is installed correctly on all data nodes.

-env

displays the Hadoop configuration environment.

-hadoopversion

displays the Hadoop version information for the cluster.

-hotfix

installs a hotfix on an existing SAS Data Management Accelerator for Spark installation.

-nodelist

displays all live DataNodes on the cluster.

-sparkversion

displays the Spark version information for the cluster.

-validate

validates the install by executing simple Spark and MapReduce jobs.

-version

displays the version of SAS Data Management Accelerator for Spark that is installed.