Prerequisites for Installing the In-Database Deployment Package for Hadoop
The following prerequisites
are required before you install and configure the in-database deployment
package for Hadoop:
-
The required Hadoop JAR and configuration
files are available to the SAS client machine.
Depending on your SAS
software, these JAR and configuration files can be gathered in several
ways. Gathering the JAR and configuration files is a one-time
process (unless you are updating your cluster or changing Hadoop vendors).
If you have already gathered the Hadoop JAR and configuration files
for another SAS component, you do not need to do it again.
For more information
on obtaining the JAR and configuration files, see the following documentation,
depending on your SAS software:
-
SAS Hadoop Configuration
Guide for Base SAS and SAS/ACCESS
-
SAS Data Loader for
Hadoop: Installation and Configuration Guide
-
SAS Contextual Analysis
In-Database Scoring in Hadoop: Administrator's Guide
-
SAS/ACCESS Interface to Hadoop
has been configured.
-
You have working knowledge of the
Hadoop vendor distribution that you are using (for example, Cloudera
or Hortonworks).
You also need working
knowledge of the Hadoop Distributed File System (HDFS), MapReduce
1, MapReduce 2, YARN, Hive, and HiveServer2 services. For more information,
see the Apache website or the vendor’s website.
-
Ensure that the HCatalog, HDFS,
Hive, MapReduce, Oozie, Sqoop, and YARN services are running on the
Hadoop cluster. The SAS Embedded Process does not necessarily use
these services. However, other SAS software that relies on the SAS
Embedded Process might use them. Having all of these services running
ensures that the appropriate JAR files are gathered during the configuration.
-
The SAS in-database and high-performance
analytic products require a specific version of the Hadoop distribution.
For more information, see the SAS Foundation system requirements documentation
for your operating environment.
-
The master node needs to connect
to the slave nodes using passwordless SSH. For more information, see
the Linux manual pages on ssh-keygen and ssh-copy-id.
-
You understand and can verify your
security setup.
If your cluster is
secured with Kerberos, you need the ability to get a Kerberos ticket.
You also need to have knowledge of any additional security policies.
-
You have permission to restart
the Hadoop MapReduce service (only needed for backward compatibility
with SAS 9.4M2 or SAS 9.4M3 and MapReduce 1).
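Several of the prerequisites above can be checked from a script before you start the installation. The following Python code is a minimal sketch, not a SAS-supplied tool. It assumes that the gathered JAR files end in .jar, that the configuration files follow the usual Hadoop *-site.xml naming, and that the ssh and klist (MIT Kerberos) commands are on the PATH. The function names count_files, ssh_check_command, command_succeeds, and preflight are hypothetical and were chosen for this illustration.

```python
import shutil
import subprocess
from pathlib import Path


def count_files(directory, pattern):
    """Return how many regular files in `directory` match the glob `pattern`."""
    if not directory:
        return 0
    path = Path(directory)
    if not path.is_dir():
        return 0
    return sum(1 for p in path.glob(pattern) if p.is_file())


def ssh_check_command(host):
    """Build an ssh command that fails instead of prompting for a password."""
    # BatchMode=yes disables password prompts, so the command succeeds only
    # when key-based (passwordless) login already works for this host.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def command_succeeds(command):
    """Run `command`; return True only if it is installed and exits with 0."""
    if shutil.which(command[0]) is None:
        return False
    result = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def preflight(jar_dir, config_dir, worker_hosts, kerberos=False):
    """Return a dict that maps each check name to True (passed) or False."""
    results = {
        "Hadoop JAR files present": count_files(jar_dir, "*.jar") > 0,
        "Hadoop configuration files present":
            count_files(config_dir, "*-site.xml") > 0,
    }
    for host in worker_hosts:
        results[f"passwordless SSH to {host}"] = command_succeeds(
            ssh_check_command(host)
        )
    if kerberos:
        # klist -s prints nothing and exits non-zero when the credentials
        # cache cannot be read or the ticket has expired.
        results["valid Kerberos ticket"] = command_succeeds(["klist", "-s"])
    return results
```

Run preflight on the master node with the directories that hold your gathered files and the host names of your slave nodes, and pass kerberos=True only if the cluster is secured with Kerberos. The SSH check also reports a failure when the host key of a node has not yet been accepted, because BatchMode suppresses that prompt as well. A passing result does not replace the vendor and SAS documentation. It only confirms that the files exist and that the commands returned successfully.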
Copyright © SAS Institute Inc. All Rights Reserved.
Last updated: February 9, 2017