Prerequisites for Installing the In-Database Deployment Package for Hadoop

The following prerequisites are required before you install and configure the in-database deployment package for Hadoop:
  • The required Hadoop JAR and configuration files are available to the SAS client machine.
    How these JAR and configuration files are gathered depends on your SAS software. Gathering them is a one-time process, unless you update your cluster or change Hadoop vendors. If you have already gathered the Hadoop JAR and configuration files for another SAS component, you do not need to gather them again.
    For more information on obtaining the JAR and configuration files, see the following documentation, depending on your SAS software:
    • SAS Hadoop Configuration Guide for Base SAS and SAS/ACCESS
    • SAS Data Loader for Hadoop: Installation and Configuration Guide
    • SAS Contextual Analysis In-Database Scoring in Hadoop: Administrator's Guide
  • SAS/ACCESS Interface to Hadoop has been configured.
    For more information, see SAS Hadoop Configuration Guide for Base SAS and SAS/ACCESS at SAS 9.4 Support for Hadoop.
  • You have working knowledge of the Hadoop vendor distribution that you are using (for example, Cloudera or Hortonworks).
    You also need working knowledge of the Hadoop Distributed File System (HDFS), MapReduce 1, MapReduce 2, YARN, Hive, and HiveServer2 services. For more information, see the Apache website or the vendor’s website.
  • The HCatalog, HDFS, Hive, MapReduce, Oozie, Sqoop, and YARN services are running on the Hadoop cluster. Running these services ensures that the appropriate JAR files are gathered during the configuration. The SAS Embedded Process does not necessarily use all of these services, but other SAS software that relies on the SAS Embedded Process might.
  • The SAS in-database and high-performance analytic products require a specific version of the Hadoop distribution. For more information, see the SAS Foundation system requirements documentation for your operating environment.
  • The master node must be able to connect to the slave nodes by using passwordless SSH. For more information, see the Linux manual pages for ssh-keygen and ssh-copy-id.
  • You understand and can verify your security setup.
    If your cluster is secured with Kerberos, you need to be able to obtain a Kerberos ticket. You also need to be aware of any additional security policies.
  • You have permission to restart the Hadoop MapReduce service (only needed for backward compatibility with SAS 9.4M2 or SAS 9.4M3 and MapReduce 1).
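To verify the passwordless SSH prerequisite before installation, you might run a quick check from the master node. The following is a minimal sketch, not a SAS-supplied tool. It assumes that the OpenSSH client is installed and that you supply your own slave node host names. The helper names ssh_probe_command and check_passwordless_ssh are hypothetical. The check relies on the OpenSSH BatchMode=yes option, which disables password prompts and host key confirmation, so a node whose key is not yet in known_hosts is also reported as a failure.

```python
"""Sketch: verify passwordless SSH from the master node to each slave node.

Assumptions: the OpenSSH client is on PATH, and the host names passed in
are your cluster's slave nodes. The function names are hypothetical.
"""
import shutil
import subprocess
from typing import Dict, Iterable, List, Optional


def ssh_probe_command(host: str, user: Optional[str] = None, timeout_s: int = 5) -> List[str]:
    """Build an ssh command that fails instead of prompting for a password."""
    target = f"{user}@{host}" if user else host
    return [
        "ssh",
        "-o", "BatchMode=yes",               # never prompt; fail if key auth is not set up
        "-o", f"ConnectTimeout={timeout_s}",  # give up on unreachable nodes quickly
        target,
        "true",                               # run a no-op command on the remote node
    ]


def check_passwordless_ssh(hosts: Iterable[str], user: Optional[str] = None) -> Dict[str, bool]:
    """Return a map of host -> True if a non-interactive SSH login succeeded."""
    if shutil.which("ssh") is None:
        raise RuntimeError("ssh client not found on PATH")
    results: Dict[str, bool] = {}
    for host in hosts:
        proc = subprocess.run(ssh_probe_command(host, user), capture_output=True)
        results[host] = proc.returncode == 0
    return results
```

For example, check_passwordless_ssh(["slave1.example.com", "slave2.example.com"]) (hypothetical host names) returns a dictionary in which any False entry marks a node that still needs ssh-copy-id or a known_hosts entry.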
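If your cluster is secured with Kerberos, one way to confirm that you currently hold a usable ticket is to call klist -s. With MIT Kerberos, klist -s produces no output and exits with status 0 only if it finds a credentials cache whose tickets have not expired. The following minimal sketch assumes MIT Kerberos client tools; the function name has_valid_kerberos_ticket is hypothetical, and other Kerberos implementations might behave differently.

```python
"""Sketch: check for a non-expired Kerberos ticket by using 'klist -s'.

Assumes MIT Kerberos client tools. The function name is hypothetical.
"""
import shutil
import subprocess


def has_valid_kerberos_ticket() -> bool:
    """Return True if 'klist -s' reports a usable credentials cache."""
    klist = shutil.which("klist")
    if klist is None:
        return False  # Kerberos client tools are not installed on this machine
    # 'klist -s' runs silently and sets its exit status: 0 if a credentials
    # cache is found and its tickets are not expired, nonzero otherwise.
    return subprocess.run([klist, "-s"], capture_output=True).returncode == 0
```

If the function returns False, obtain a ticket (for example, with kinit) before you start the installation.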
Last updated: February 9, 2017