Installing the SAS Embedded Process

To install the SAS Embedded Process and SAS Hadoop Embedded Process JAR file, follow these steps:
Note: Permissions are needed to install the SAS Embedded Process and SAS Hadoop Embedded Process JAR file. For more information, see Hadoop Permissions.
  1. Navigate to the location on your Hadoop master node where you copied the sepcorehadp-12.00000-1.sh file.
    cd /EPInstallDir
  2. Ensure that both the EPInstallDir folder and the sepcorehadp-12.00000-1.sh file have Read, Write, and Execute permissions (chmod -R 755).
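    For example, assuming EPInstallDir is /EPInstallDir, you can set and then verify the permissions as follows:
    # apply Read, Write, and Execute permissions recursively
    chmod -R 755 /EPInstallDir
    # verify: expect drwxr-xr-x on the folder and -rwxr-xr-x on the file
    ls -ld /EPInstallDir /EPInstallDir/sepcorehadp-12.00000-1.sh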
  3. Use the following command to unpack the sepcorehadp-12.00000-1.sh file.
    ./sepcorehadp-12.00000-1.sh <--verbose>
    Note: The --quiet option is enabled by default, so only error messages are displayed. The --verbose option displays all messages that are generated during the installation process. Using verbose messaging can increase the time that is required to perform the installation.
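    For example, to unpack the file with full installation messaging:
    ./sepcorehadp-12.00000-1.sh --verbose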
    After this script is run and the files are unpacked, the following directory structure is created, where EPInstallDir is the location on the master node from Step 1.
    EPInstallDir/sasexe/SASEPHome
    EPInstallDir/sasexe/sepcorehadp-12.00000-1.sh
    Note: During the install process, the sepcorehadp-12.00000-1.sh file is copied to all data nodes. Do not remove or move this file from the EPInstallDir/sasexe directory.
    The SASEPHome directory should have the following structure:
    EPInstallDir/sasexe/SASEPHome/bin
    EPInstallDir/sasexe/SASEPHome/install
    EPInstallDir/sasexe/SASEPHome/jars
    EPInstallDir/sasexe/SASEPHome/misc
    EPInstallDir/sasexe/SASEPHome/sasexe
    EPInstallDir/sasexe/SASEPHome/utilities
    
    The EPInstallDir/sasexe/SASEPHome/jars directory contains the SAS Hadoop Embedded Process JAR file.
    EPInstallDir/sasexe/SASEPHome/jars/sas.hadp2.jar
    The EPInstallDir/sasexe/SASEPHome/install directory contains install scripts for other SAS software that is packaged with the SAS Embedded Process. These files exist only if you have licensed this additional software. For more information about which components are also deployed, see Overview of the In-Database Deployment Package for Hadoop.
    The EPInstallDir/sasexe/SASEPHome/bin directory should contain the following script.
    EPInstallDir/sasexe/SASEPHome/bin/sasep-admin.sh
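    To confirm that the file unpacked correctly, you can list the SASEPHome directory. A minimal check, assuming EPInstallDir is /EPInstallDir:
    ls /EPInstallDir/sasexe/SASEPHome
    # expected, per the structure above: bin  install  jars  misc  sasexe  utilities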
  4. If your Hadoop cluster is secured with Kerberos and you have sudo access, the HDFS user must have a valid Kerberos ticket in order to access HDFS. You can obtain a valid Kerberos ticket with the kinit command.
    sudo su - root
    su - hdfs | hdfs-userid
    kinit -kt location-of-keytab-file principal-name
    exit
    The principal-name is the principal of the user for which you are requesting a ticket.
    Note: For all Hadoop distributions except MapR, the default HDFS user is hdfs. For MapR distributions, the default HDFS user is mapr. You can specify a different user ID with the -hdfsuser argument when you run the sasep-admin.sh -add script. If you use a different HDFS superuser, ensure that the user has a home directory in HDFS before you run the sasep-admin.sh -add command. For example, if the HDFS superuser is prodhdfs, ensure that the /user/prodhdfs directory exists in HDFS.
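    For example, on a cluster whose HDFS superuser is hdfs, the sequence might look like this (the keytab path and principal are hypothetical placeholders for your site's values):
    sudo su - root
    su - hdfs
    # hypothetical keytab path and principal; substitute your site's values
    kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@HOST.COMPANY.COM
    exit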
    Tip
    To check the status of your Kerberos ticket on the server, as the HDFS user, run the klist command. Here is an example of the command and its output:
    klist
    Ticket cache: FILE:/tmp/krb5cc_493
    Default principal: hdfs@HOST.COMPANY.COM
    
    Valid starting    Expires           Service principal
    06/20/16 09:51:26 06/27/16 09:51:26 krbtgt/HOST.COMPANY.COM@HOST.COMPANY.COM
         renew until 06/22/16 09:51:26
  5. Run the sasep-admin.sh script to deploy the SAS Embedded Process across all nodes. How you run the script depends on whether you have sudo access.
    Note: It is recommended that the sasep-admin.sh script be run from the EPInstallDir/sasexe/SASEPHome/bin/ location.
    Tip
    Many options are available for installing the SAS Embedded Process. We recommend that you review the script syntax before running it. For more information, see SASEP-ADMIN.SH Syntax.
    • If you have sudo access, run the sasep-admin.sh script as follows to deploy SAS Embedded Process on all nodes. Review all of the information in this step and the script syntax before you run the script.
      cd EPInstallDir/sasexe/SASEPHome/bin/
      ./sasep-admin.sh -add
      If you have sudo access, the SAS Embedded Process install script (sasep-admin.sh) detects the Hadoop cluster topology and installs the SAS Embedded Process on all DataNode nodes. The install script also installs the SAS Embedded Process on the host node from which you run the script (the Hadoop master NameNode). The SAS Embedded Process is installed even if a DataNode is not present. To add the SAS Embedded Process to new nodes at a later time, run the sasep-admin.sh script with the -host <hosts> option.
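      For example, to add the SAS Embedded Process to two new nodes at a later time (the host names here are hypothetical):
      # hypothetical host names; quotation marks around the host list are optional
      ./sasep-admin.sh -add -host "newnode1.company.com newnode2.company.com"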
      In addition, a configuration file, ep-config.xml, is automatically created and written to the EPInstallDir/sasexe/SASEPHome/conf directory and to the HDFS file system in the /sas/ep/config directory.
    • If you do not have sudo access, follow these steps to deploy the SAS Embedded Process:
      1. Run the sasep-admin.sh script as follows to deploy the SAS Embedded Process across all nodes.
        cd EPInstallDir/sasexe/SASEPHome/bin/
        ./sasep-admin.sh -x -add -hostfile host-list-filename | -host <">host-list<">
        CAUTION:
        The SAS Embedded Process must be installed on all nodes that are capable of running a MapReduce task (MapReduce 1) or on all nodes that are capable of running a YARN container (MapReduce 2). The SAS Embedded Process must also be installed on the host node from which you run the script (the Hadoop master NameNode). Hive and HCatalog must be available on all nodes where the SAS Embedded Process is installed.
        Otherwise, the SAS Embedded Process does not function properly.
        Note: If you do not have sudo access, you must use the -x option and specify the hosts on which the SAS Embedded Process is deployed with either the -hostfile or -host option. Automatic detection of the Hadoop cluster topology is not available when you run the installation script with the -x option.
        The sepcorehadp-12.00000-1.sh file is copied to all nodes that you specify. The configuration file, ep-config.xml, is created and written to the EPInstallDir/sasexe/SASEPHome/conf directory.
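        For example, using a hypothetical host list file (assuming hosts.txt lists one host name per line):
        # hosts.txt is hypothetical, for example:
        #   datanode1.company.com
        #   datanode2.company.com
        ./sasep-admin.sh -x -add -hostfile hosts.txt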
      2. Manually copy the ep-config.xml configuration file to the HDFS.
        Note: This step must be performed by a user who has Write permission to the HDFS root (/) directory. If your Hadoop cluster is secured with Kerberos, the user who copies the configuration file to HDFS must have a valid Kerberos ticket.
        1. Log on as your HDFS user or as the user that you use to access HDFS.
        2. Create the /sas/ep/config directory for the configuration file.
          hadoop fs -mkdir -p /sas/ep/config
        3. Navigate to the EPInstallDir/sasexe/SASEPHome/conf directory.
          cd EPInstallDir/sasexe/SASEPHome/conf
        4. Use the Hadoop copyFromLocal command to copy the ep-config.xml file to HDFS.
          hadoop fs -copyFromLocal ep-config.xml /sas/ep/config/ep-config.xml
  6. Verify that the SAS Embedded Process is installed by running the sasep-admin.sh script with the -check option.
    • If you ran the sasep-admin.sh script with sudo access, run the following command. By default, this command verifies that the SAS Embedded Process was installed on all nodes.
      cd EPInstallDir/sasexe/SASEPHome/bin/
      ./sasep-admin.sh -check
    • If you ran the sasep-admin.sh script with the -x argument, run the following command. This command verifies that the SAS Embedded Process was installed on the hosts that you specified.
      cd EPInstallDir/sasexe/SASEPHome/bin/
      ./sasep-admin.sh -x -check
         -hostfile host-list-filename | -host <">host-list<">
    Note: The sasep-admin.sh -check script does not run successfully if the SAS Embedded Process is not installed.
  7. If your distribution is running MapReduce 1, follow these steps. Otherwise, skip to Step 8.
    Note: For more information, see Backward Compatibility.
    1. Verify that the SAS Hadoop Embedded Process JAR file (sas.hadp2.jar) is now in the hadoop/lib directory.
      For Cloudera, the JAR file is typically located here:
      /opt/cloudera/parcels/CDH/lib/hadoop/lib
      For Hortonworks, the JAR file is typically located here:
      /usr/lib/hadoop/lib
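      As an optional check, you can list the directory. For example, on a Cloudera node:
      ls /opt/cloudera/parcels/CDH/lib/hadoop/lib | grep sas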
    2. Restart the Hadoop MapReduce service.
      This enables the cluster to load the SAS Hadoop Embedded Process JAR file (sas.hadp2.jar).
      Note: It is preferable to restart the service by using Cloudera Manager or Ambari (for Hortonworks), if available.
  8. Verify that the configuration file, ep-config.xml, was written to the HDFS file system.
    hadoop fs -ls /sas/ep/config/
    hadoop fs -cat /sas/ep/config/ep-config.xml
    Note: If your cluster is secured with Kerberos, you need a valid Kerberos ticket in order to access HDFS. Alternatively, you can use the WebHDFS browser.
    Note: The /sas/ep/config directory is created automatically when you run the install script with sudo access. If you used the -genconfig option to specify a non-default location, use that location to find the ep-config.xml file. When using a non-default location, a configuration property must be added to the mapred-site.xml configuration file that is used on the client side.
    <property>
      <name>sas.ep.config.file</name>
      <value>config-file-location-on-hdfs</value>
    </property>
    The config-file-location-on-hdfs is the location of the SAS Embedded Process configuration file on HDFS.
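    For example, if the configuration file was written to a hypothetical custom location such as /user/prodhdfs/sas/ep/config, you can verify that the value in mapred-site.xml points at an existing file:
    # /user/prodhdfs/... is a hypothetical location; substitute your own
    hadoop fs -cat /user/prodhdfs/sas/ep/config/ep-config.xml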