This section describes
how to install and configure the in-database deployment package for
Hadoop (SAS Embedded Process).
The in-database deployment
package for Hadoop must be installed and configured before you can
transform data in Hadoop and extract transformed data out of Hadoop
for analysis.
The in-database deployment
package for Hadoop includes the SAS Embedded Process and two SAS Hadoop
MapReduce JAR files. The SAS Embedded Process is a SAS server process
that runs within Hadoop to read and write data. The SAS Embedded Process
contains macros, run-time libraries, and other software that is installed
on your Hadoop system.
The SAS Embedded Process
must be installed on all nodes capable of executing MapReduce
2 and YARN tasks, that is, nodes where a NodeManager is running. Usually,
every DataNode node has a YARN NodeManager
running. By default, the SAS Embedded Process install script (sasep-servers.sh)
discovers the cluster topology and installs the SAS Embedded Process
on all DataNode nodes and on the host node from which you run
the script (the Hadoop master NameNode). The host node is included even
if no DataNode is running on it. To limit the nodes on which the
SAS Embedded Process is installed, run the sasep-servers.sh
script with the -host <hosts> option.
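
The node selection described above can be modeled with a short sketch. This is an illustration only, not the logic of sasep-servers.sh itself: the function name, the node names, and the assumption that a -host list replaces the discovered set (rather than filtering it) are all hypothetical.

```python
def install_targets(datanodes, host_node, host_option=None):
    """Return the sorted nodes that would receive the SAS Embedded Process.

    Illustrative model only; it does not reproduce sasep-servers.sh.
    """
    if host_option:
        # Assumption: an explicit -host list replaces the discovered set.
        return sorted(set(host_option))
    # Default: every discovered DataNode node plus the node the script
    # runs on, even if that node does not run a DataNode itself.
    return sorted(set(datanodes) | {host_node})


# Hypothetical cluster: three DataNode nodes and a separate master node.
datanodes = ["dn1", "dn2", "dn3"]

print(install_targets(datanodes, "nn1"))
print(install_targets(datanodes, "nn1", host_option=["dn1", "dn2"]))
```

With these hypothetical names, the first call includes nn1 alongside the three DataNode nodes, and the second call returns only the two nodes named in the host list.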
The SAS Hadoop MapReduce JAR files must be installed on all nodes
of a Hadoop cluster.