If you already have one of the supported Hadoop distributions, you can modify it with
files from the SAS Plug-ins for Hadoop package. Hadoop modified with SAS Plug-ins
for Hadoop enables the SAS High-Performance Analytics environment to write SASHDAT
file blocks evenly across the HDFS file system.
The following requirements apply to existing Hadoop clusters with which the
SAS High-Performance Analytics environment can be co-located:
- Each machine in the cluster must be able to resolve the host names of all the other machines.
- The machine configured as the NameNode cannot also be configured as a DataNode.
- These Hadoop directories must reside on local storage:
  - the directory on the file system where the Hadoop NameNode stores the namespace and transaction logs persistently
  - the directory on the file system where temporary MapReduce data is written
  - the directory on the file system where the MapReduce framework writes system files
  Note: The exception is the hadoop-data directory, which can be on a storage area network (SAN). Network-attached storage (NAS) devices are not supported.
- Time must be synchronized across all machines in the cluster.
- (Cloudera 5 only) Make sure that all machines configured for the SAS High-Performance Analytics environment are in the same role group.
- For Kerberos, in the SAS High-Performance Analytics environment, /etc/hosts must list the machine names in the cluster in this order: short name, then fully qualified domain name.
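Two of the requirements above (host name resolution and the /etc/hosts name order for Kerberos) can be spot-checked with a short script. The following is a minimal sketch, not part of any SAS or Hadoop tooling. The function names, the host names (node1, node2), and the example.com domain are hypothetical, and the script assumes the conventional /etc/hosts layout of an IP address followed by one or more names.

```python
import socket
from typing import Callable, Iterable, List


def unresolvable(hosts: Iterable[str],
                 resolve: Callable[[str], str] = socket.gethostbyname) -> List[str]:
    """Return the host names that this machine cannot resolve."""
    failed = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return failed


def misordered_entries(hosts_text: str, cluster_hosts: Iterable[str]) -> List[str]:
    """Return /etc/hosts entries for cluster machines whose names are not
    listed as: short name first, fully qualified domain name second."""
    wanted = {h.split(".")[0] for h in cluster_hosts}
    bad = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        names = line.split()[1:]  # the first field is the IP address
        if not any(n.split(".")[0] in wanted for n in names):
            continue  # entry does not belong to a cluster machine
        ok = (len(names) >= 2
              and "." not in names[0]
              and names[1].startswith(names[0] + "."))
        if not ok:
            bad.append(line)
    return bad


if __name__ == "__main__":
    cluster = ["node1", "node2"]  # hypothetical short names; replace with your own
    with open("/etc/hosts") as f:
        print("Misordered entries:", misordered_entries(f.read(), cluster))
    print("Unresolvable hosts:", unresolvable(cluster))
```

Because resolution depends on each machine's own configuration, a check like this would need to be run on every machine in the cluster, not only on the NameNode.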