The following prerequisites
are required before you install and configure the in-database deployment
package for Hadoop:
- SAS/ACCESS Interface to Hadoop has been configured.
- You have working knowledge of the Hadoop vendor distribution that you are using (for example, Cloudera or Hortonworks). You also need working knowledge of the Hadoop Distributed File System (HDFS), MapReduce 1, MapReduce 2, YARN, Hive, and HiveServer2 services. For more information, see the Apache website or the vendor's website.
- Ensure that the HCatalog, HDFS, Hive, MapReduce, Oozie, Sqoop, and YARN services are running on the Hadoop cluster. The SAS Embedded Process does not necessarily use these services. However, other SAS software that relies on the SAS Embedded Process might use them. Having these services running ensures that the appropriate JAR files are gathered during the configuration.
- The SAS in-database and high-performance analytic products require a specific version of the Hadoop distribution. For more information, see the SAS Foundation system requirements documentation for your operating environment.
- You have sudo access on the NameNode.
- Your HDFS user has Write permission to the root of HDFS.
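One way to confirm this permission is a quick check from the command line. This is a sketch only; the test directory name is an arbitrary example, and the commands assume the `hdfs` client is on your PATH and that you run them as your HDFS user.

```shell
# Show the permissions on the HDFS root directory itself:
hdfs dfs -ls -d /

# Quick write test: create and then remove a temporary directory
# at the root (the directory name is an example):
hdfs dfs -mkdir /sas_perm_test && hdfs dfs -rm -r /sas_perm_test
```

If the `mkdir` step fails with a permission error, adjust the root permissions or run as a user (such as the `hdfs` superuser) that has Write access.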
- The master node must be able to connect to the slave nodes using passwordless SSH. For more information, see the Linux manual pages for ssh-keygen and ssh-copy-id.
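A typical way to set this up, sketched below with placeholder user and host names, uses the two utilities mentioned above:

```shell
# On the master node, generate a key pair with no passphrase,
# if one does not already exist:
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Copy the public key to each slave node
# (user and host names are placeholders):
ssh-copy-id -i ~/.ssh/id_rsa.pub user@slave-node-01

# Verify that the login no longer prompts for a password:
ssh user@slave-node-01 hostname
```

Repeat the `ssh-copy-id` step for every slave node in the cluster.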
- You understand and can verify your security setup. If your cluster is secured with Kerberos, you need the ability to get a Kerberos ticket. You also need to have knowledge of any additional security policies.
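On a Kerberos-secured cluster, you can verify that you are able to obtain a ticket with the standard Kerberos client tools. The principal and keytab path below are examples, not values from this document:

```shell
# Obtain a ticket interactively (principal is an example):
kinit sasuser@EXAMPLE.COM

# Or obtain one non-interactively from a keytab (path is an example):
kinit -kt /etc/security/keytabs/sasuser.keytab sasuser@EXAMPLE.COM

# Verify the ticket cache:
klist
```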
- You have permission to restart the Hadoop MapReduce service.
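How the restart is performed depends on the distribution: Cloudera Manager and Ambari clusters restart services through their management consoles. On a plain Apache Hadoop 2.x installation, a hedged example (assuming `HADOOP_HOME` is set and you run as the service user) looks like this:

```shell
# Restart the MapReduce Job History Server on a plain Apache
# Hadoop 2.x cluster (paths and user are assumptions):
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
```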