Requirements for Co-located Hadoop

If you already have one of the supported Hadoop distributions, you can modify it with files from the SAS Plug-ins for Hadoop package. A cluster modified with SAS Plug-ins for Hadoop enables the SAS High-Performance Analytics environment to write SASHDAT file blocks evenly across the HDFS file system.
An existing Hadoop cluster must meet the following requirements before the SAS High-Performance Analytics environment can be co-located with it:
  • Each machine in the cluster must be able to resolve the host name of all the other machines.
  • The machine configured as the NameNode cannot also be configured as a DataNode.
  • These Hadoop directories must reside on local storage:
    • the directory on the file system where the Hadoop NameNode persistently stores the namespace and transaction logs
    • the directory on the file system where temporary MapReduce data is written
    • the directory on the file system where the MapReduce framework writes system files
    Note: The exception is the hadoop-data directory, which can be on a storage area network (SAN). Network-attached storage (NAS) devices are not supported.
  • Time must be synchronized across all machines in the cluster.
  • (Cloudera 5 only) Make sure that all machines configured for the SAS High-Performance Analytics environment are in the same role group.
  • For Kerberos, /etc/hosts on each machine in the SAS High-Performance Analytics environment must list the machine names in the cluster in this order: short name, then fully qualified domain name.
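Several of the requirements above can be verified from the command line before installation. The following is a minimal preflight sketch, not a SAS-provided tool: the host list, the NTP client in use, and the sample /etc/hosts line are assumptions you would replace with your own cluster's values.

```shell
#!/bin/sh
# Hypothetical preflight checks for a co-located Hadoop cluster (sketch only).

# Assumed list of cluster machine names; replace with your own.
HOSTS="node1.example.com node2.example.com"

# Requirement: each machine must be able to resolve every other machine's name.
for h in $HOSTS; do
    getent hosts "$h" >/dev/null || echo "cannot resolve: $h"
done

# Requirement: time must be synchronized across the cluster.
# chronyc or ntpstat, depending on which NTP client the distribution uses.
chronyc tracking >/dev/null 2>&1 || ntpstat >/dev/null 2>&1 || \
    echo "no NTP synchronization detected on this machine"

# Kerberos requirement: /etc/hosts entries must list the short name first,
# then the fully qualified domain name.
check_hosts_order() {
    # Expects one line such as: "10.0.0.1 node1 node1.example.com".
    # Prints "ok" if field 3 (FQDN) begins with field 2 (short name) plus a dot.
    echo "$1" | awk '{ if (index($3, $2".") == 1) print "ok"; else print "bad" }'
}
check_hosts_order "10.0.0.1 node1 node1.example.com"
```

Running the same checks on every machine (for example, via a parallel-ssh tool) gives a quick pass/fail view of the cluster before the SAS High-Performance Analytics environment is deployed.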
Last updated: June 19, 2017