Prerequisites for Installing the In-Database Deployment Package for Hadoop

Before you install and configure the in-database deployment package for Hadoop, verify that the following prerequisites are met:
  • SAS/ACCESS Interface to Hadoop has been configured.
    For more information, see SAS 9.4 Hadoop Configuration Guide for Base SAS and SAS/ACCESS at SAS 9.4 Support for Hadoop.
  • You have working knowledge of the Hadoop vendor distribution that you are using (for example, Cloudera or Hortonworks).
    You also need working knowledge of the Hadoop Distributed File System (HDFS), MapReduce 1, MapReduce 2, YARN, Hive, and HiveServer2 services. For more information, see the Apache website or the vendor’s website.
  • The HCatalog, HDFS, Hive, MapReduce, Oozie, Sqoop, and YARN services are running on the Hadoop cluster. The SAS Embedded Process does not necessarily use these services, but other SAS software that relies on the SAS Embedded Process might. Having these services running ensures that the appropriate JAR files are gathered during the configuration.
  • The SAS in-database and high-performance analytic products require a specific version of the Hadoop distribution. For more information, see the SAS Foundation system requirements documentation for your operating environment.
  • You have sudo access on the NameNode.
  • Your HDFS user has Write permission to the root of HDFS.
  • The master node can connect to the slave nodes using passwordless SSH. For more information, see the Linux manual pages for ssh-keygen and ssh-copy-id.
  • You understand and can verify your security setup.
    If your cluster is secured with Kerberos, you need the ability to get a Kerberos ticket. You also need to have knowledge of any additional security policies.
  • You have permission to restart the Hadoop MapReduce service.
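
Several of the prerequisites above can be spot-checked from the command line before you begin. The following Python script is a minimal sketch of such a check, not part of the SAS deployment package. It assumes that the hdfs, klist, and ssh commands are on the PATH of the node where you run it, that your Hadoop release supports the -w option of hdfs dfs -test (older releases might not), and that you pass the host names of the other nodes as arguments. The script name and the function names build_checks and run_check are hypothetical.

```python
import shutil
import subprocess
import sys


def build_checks(worker_nodes):
    """Return (description, command) pairs for some of the prerequisites."""
    checks = [
        # Exit status 0 if "/" exists and the current user can write to it.
        ("Write permission to the root of HDFS",
         ["hdfs", "dfs", "-test", "-w", "/"]),
        # Exit status 0 if the credentials cache holds an unexpired ticket.
        # Only meaningful when the cluster is secured with Kerberos.
        ("valid Kerberos ticket",
         ["klist", "-s"]),
    ]
    for node in worker_nodes:
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        checks.append((
            f"passwordless SSH to {node}",
            ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
             node, "true"],
        ))
    return checks


def run_check(command):
    """Run one command; return True only if it exits with status 0."""
    if shutil.which(command[0]) is None:
        return False  # the tool is not installed on this machine
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Host names to test over SSH are taken from the command line.
    for description, command in build_checks(sys.argv[1:]):
        status = "ok" if run_check(command) else "FAILED"
        print(f"{status:6} {description}: {' '.join(command)}")
```

For example, if you save the script as check_prereqs.py (a hypothetical name), you can run python check_prereqs.py followed by the host names of the slave nodes. A FAILED line indicates only that the command did not succeed on this node. It does not identify the cause, and a failed Kerberos check is expected on a cluster that is not secured with Kerberos.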