To install the SAS Embedded Process and the SAS Hadoop Embedded Process JAR file, follow these steps:
Note: Permissions are needed to install the SAS Embedded Process and the SAS Hadoop Embedded Process JAR file. For more information, see Hadoop Permissions.
3. Navigate to the location on your Hadoop master node where you copied the sepcorehadp-12.00000-1.sh file.
cd /EPInstallDir
4. Ensure that both the EPInstallDir folder and the sepcorehadp-12.00000-1.sh file have Read, Write, and Execute permissions (chmod -R 755).
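As a sketch, the permissions for the install directory could be set and verified as follows. The temporary directory here is a stand-in for illustration only; substitute your actual EPInstallDir location:

```shell
# Create a stand-in install directory for illustration; replace with your real EPInstallDir.
EPINSTALLDIR=$(mktemp -d)
touch "$EPINSTALLDIR/sepcorehadp-12.00000-1.sh"

# Grant Read, Write, and Execute to the owner (and Read/Execute to group and
# others), recursively, so the install script can be unpacked and run.
chmod -R 755 "$EPINSTALLDIR"

# Verify: both the directory and the script should report mode 755.
stat -c '%a %n' "$EPINSTALLDIR" "$EPINSTALLDIR/sepcorehadp-12.00000-1.sh"
```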
5. Use the following command to unpack the sepcorehadp-12.00000-1.sh file.
./sepcorehadp-12.00000-1.sh <--verbose>
Note: The --quiet option is enabled by default, so only error messages are displayed. The --verbose option displays all messages that are generated during the installation process. Using verbose messaging can increase the time that is required to perform the installation.
After this script is run and the files are unpacked, the following directory structure is created, where EPInstallDir is the location on the master node from Step 2.
EPInstallDir/sasexe/SASEPHome
EPInstallDir/sasexe/sepcorehadp-12.00000-1.sh
Note: During the install process, the sepcorehadp-12.00000-1.sh file is copied to all data nodes. Do not remove or move this file from the EPInstallDir/sasexe directory.
The SASEPHome directory
should have the following structure:
EPInstallDir/sasexe/SASEPHome/bin
EPInstallDir/sasexe/SASEPHome/install
EPInstallDir/sasexe/SASEPHome/jars
EPInstallDir/sasexe/SASEPHome/misc
EPInstallDir/sasexe/SASEPHome/sasexe
EPInstallDir/sasexe/SASEPHome/utilities
The EPInstallDir/sasexe/SASEPHome/jars directory contains the SAS Hadoop Embedded Process JAR file.
EPInstallDir/sasexe/SASEPHome/jars/sas.hadp2.jar
The EPInstallDir/sasexe/SASEPHome/install directory contains install scripts for other SAS software that is packaged with the SAS Embedded Process. These files exist only if you have licensed this additional software.
For more information
about what components are also deployed, see Overview of the In-Database Deployment Package for Hadoop.
The EPInstallDir/sasexe/SASEPHome/bin directory should contain the following script.
EPInstallDir/sasexe/SASEPHome/bin/sasep-admin.sh
6. If your Hadoop cluster is secured with Kerberos and you have sudo access, the HDFS user must have a valid Kerberos ticket in order to access HDFS. You can obtain a valid Kerberos ticket with the kinit command, using the keytab file of the user for which you are requesting a ticket.
sudo su - root
su - hdfs | hdfs-userid
kinit -kt location-of-keytab-file principal-name
exit
Note: For all Hadoop distributions except MapR, the default HDFS user is hdfs. For MapR distributions, the default HDFS user is mapr. You can specify a different user ID with the -hdfsuser argument when you run the sasep-admin.sh -add script. If you use a different HDFS superuser, ensure that the user has a home directory in HDFS before you run the sasep-admin.sh -add command. For example, if the HDFS superuser is prodhdfs, ensure that the /user/prodhdfs directory exists in HDFS.
Tip: To check the status of your Kerberos ticket on the server, run the klist command as the HDFS user. Here is an example of the command and its output:
klist
Ticket cache: FILE:/tmp/krb5cc_493
Default principal: hdfs@HOST.COMPANY.COM
Valid starting       Expires              Service principal
06/20/16 09:51:26    06/27/16 09:51:26    krbtgt/HOST.COMPANY.COM@HOST.COMPANY.COM
        renew until 06/22/16 09:51:26
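The ticket-granting sequence above can be sketched as a script. The keytab path and principal shown here are hypothetical placeholders, not values from any particular cluster:

```shell
# Hypothetical keytab location and principal; substitute your site's values.
KEYTAB=/etc/security/keytabs/hdfs.headless.keytab
PRINCIPAL=hdfs@HOST.COMPANY.COM

# Build the kinit command that the HDFS user would run to obtain a ticket.
KINIT_CMD="kinit -kt $KEYTAB $PRINCIPAL"
echo "$KINIT_CMD"

# On the cluster, the HDFS user would run the command and then confirm
# the ticket with klist:
#   $KINIT_CMD
#   klist
```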
7. Run the sasep-admin.sh script to deploy the SAS Embedded Process across all nodes. How you run the script depends on whether you have sudo access.
Note: It is recommended that you run the sasep-admin.sh script from the EPInstallDir/sasexe/SASEPHome/bin/ location.
Tip: Many options are available for installing the SAS Embedded Process. We recommend that you review the script syntax before running it. For more information, see SASEP-ADMIN.SH Syntax.
a. If you have sudo access, run the sasep-admin.sh script as follows to deploy the SAS Embedded Process on all nodes. Review all of the information in this step and the script syntax before you run the script.
cd EPInstallDir/sasexe/SASEPHome/bin/
./sasep-admin.sh -add
If you have sudo access, the SAS Embedded Process install script (sasep-admin.sh) detects the Hadoop cluster topology and installs the SAS Embedded Process on all DataNode nodes. The install script also installs the SAS Embedded Process on the host node from which you run the script (the Hadoop master NameNode). The SAS Embedded Process is installed even if a DataNode is not present. To add the SAS Embedded Process to new nodes at a later time, run the sasep-admin.sh script with the -host <hosts> option.
In addition, a configuration file, ep-config.xml, is automatically created and written to the EPInstallDir/sasexe/SASEPHome/conf directory and to the HDFS file system in the /sas/ep/config directory.
b. If you do not have sudo access, follow these steps to deploy the SAS Embedded Process:
i. Run the sasep-admin.sh script as follows to deploy the SAS Embedded Process across all nodes.
cd EPInstallDir/sasexe/SASEPHome/bin/
./sasep-admin.sh -x -add -hostfile host-list-filename | -host "host-list"
CAUTION: The SAS Embedded Process must be installed on all nodes that are capable of running a MapReduce task (MapReduce 1) or on all nodes that are capable of running a YARN container (MapReduce 2). The SAS Embedded Process must also be installed on the host node from which you run the script (the Hadoop master NameNode). Hive and HCatalog must be available on all nodes where the SAS Embedded Process is installed. Otherwise, the SAS Embedded Process does not function properly.
Note: If you do not have sudo access, you must use the -x option and specify the hosts on which the SAS Embedded Process is deployed with either the -hostfile or -host option. Automatic detection of the Hadoop cluster topology is not available when you run the installation script with the -x option.
The sepcorehadp-12.00000-1.sh file is copied to all nodes that you specify. The configuration file, ep-config.xml, is created and written to the EPInstallDir/sasexe/SASEPHome/conf directory.
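As a sketch, a host list file is a plain text file with one host name per line. The host names below are hypothetical; replace them with the nodes of your cluster:

```shell
# Write a hypothetical host list; substitute your cluster's node names.
cat > hosts.txt <<'EOF'
namenode.company.com
datanode1.company.com
datanode2.company.com
EOF

# The deployment would then reference the file, for example:
#   ./sasep-admin.sh -x -add -hostfile hosts.txt

# One host per line: three hosts in this example.
wc -l < hosts.txt   # → 3
```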
ii. Manually copy the ep-config.xml configuration file to HDFS.
Note: This step must be performed by a user that has Write permission to the HDFS root (/) directory. If your Hadoop cluster is secured with Kerberos, the user who copies the configuration file to HDFS must have a valid Kerberos ticket.
- Log on as your HDFS user or as the user that you use to access HDFS.
- Create the /sas/ep/config directory for the configuration file.
hadoop fs -mkdir -p /sas/ep/config
- Navigate to the EPInstallDir/sasexe/SASEPHome/conf directory.
cd EPInstallDir/sasexe/SASEPHome/conf
- Use the Hadoop copyFromLocal command to copy the ep-config.xml file to HDFS.
hadoop fs -copyFromLocal ep-config.xml /sas/ep/config/ep-config.xml
8. Verify that the SAS Embedded Process is installed by running the sasep-admin.sh script with the -check option.
a. If you ran the sasep-admin.sh script with sudo access, run the following command. By default, this command verifies that the SAS Embedded Process was installed on all nodes.
cd EPInstallDir/sasexe/SASEPHome/bin/
./sasep-admin.sh -check
b. If you ran the sasep-admin.sh script with the -x argument, run the following command. This command verifies that the SAS Embedded Process was installed on the hosts that you specified.
cd EPInstallDir/sasexe/SASEPHome/bin/
./sasep-admin.sh -x -check -hostfile host-list-filename | -host "host-list"
Note: The sasep-admin.sh -check script does not run successfully if the SAS Embedded Process is not installed.
9. If your distribution is running MapReduce 1, follow these steps. Otherwise, skip to Step 10.
a. Verify that the sas.hadp2.jar file is now in the hadoop/lib directory.
For Cloudera, the JAR file is typically located here:
/opt/cloudera/parcels/CDH/lib/hadoop/lib
For Hortonworks, the JAR file is typically located here:
/usr/lib/hadoop/lib
b. Restart the Hadoop MapReduce service. This enables the cluster to load the SAS Hadoop Embedded Process JAR file (sas.hadp2.jar).
Note: It is preferable to restart the service by using Cloudera Manager or Ambari (for Hortonworks), if available.
10. Verify that the configuration file, ep-config.xml, was written to the HDFS file system.
hadoop fs -ls /sas/ep/config/
hadoop fs -cat /sas/ep/config/ep-config.xml
Note: If your cluster is secured
with Kerberos, you need a valid Kerberos ticket in order to access
HDFS. Otherwise, you can use the WebHDFS browser.
Note: The /sas/ep/config directory is created automatically when you run the install script with sudo access. If you used the -genconfig option to specify a non-default location, use that location to find the ep-config.xml file. When using a non-default location, a configuration property must be added to the mapred-site.xml configuration file that is used on the client side.
<property>
  <name>sas.ep.config.file</name>
  <value>config-file-location-on-hdfs</value>
</property>
The config-file-location-on-hdfs is the location of the SAS Embedded Process configuration file on HDFS.
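For example, if the configuration file were written to a hypothetical non-default HDFS location such as /user/sasep/ep-config.xml, the client-side mapred-site.xml entry might look like this (the path is illustrative only):

```xml
<property>
  <name>sas.ep.config.file</name>
  <value>/user/sasep/ep-config.xml</value>
</property>
```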