Specifying the Hadoop Parameter File Options

Making the Hadoop Cluster Configuration Files and Hadoop Distribution JAR Files Available to SPD Server

To make the Hadoop cluster configuration files and the Hadoop distribution JAR files available to SPD Server, you must specify Hadoop parameter file options in either the libnames.parm file (as part of an individual SPD Server domain definition) or in the spdsserv.parm file.
Settings that you specify in the libnames.parm file take precedence over settings in the spdsserv.parm file. This enables the SPD Server administrator to define common Hadoop settings in the spdsserv.parm file and to use the libnames.parm file to override settings when needed. For more information about the libnames.parm file and the spdsserv.parm file, see SAS Scalable Performance Data Server: Administrator’s Guide.
The following parameter file options are needed to access Hadoop:
  • HADOOP=YES in the libnames.parm file to specify that an SPD Server domain can access data in HDFS.
  • HADOOPCFG= to specify the location of the Hadoop cluster configuration files in either the libnames.parm file or the spdsserv.parm file. For example, if the cluster configuration files are copied to the location /u/hadoop/hdist/cdh/confdir, then the following syntax sets the location appropriately:
    HADOOPCFG=/u/hadoop/hdist/cdh/confdir;
  • HADOOPJAR= to specify the location of the Hadoop distribution JAR files in either the libnames.parm file or the spdsserv.parm file. For example, if the JAR files are copied to the location /u/hadoop/hdist/cdh/cdh54, then the following syntax sets the location appropriately:
    HADOOPJAR=/u/hadoop/hdist/cdh/cdh54;
If you specify the HADOOPCFG= and HADOOPJAR= parameter file options in the libnames.parm file, SPD Server automatically assumes that the domain is a Hadoop domain. You do not have to specify HADOOP=YES.
If you specify the HADOOPCFG= and HADOOPJAR= parameter file options in the spdsserv.parm file, the settings become the default settings for all Hadoop domains. In the libnames.parm file, you must specify HADOOP=YES for each domain definition that you want to be a Hadoop domain.
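For example, the following sketch shows one way to divide the settings between the two files. The domain name and paths are hypothetical; substitute values from your environment. The spdsserv.parm file supplies the default Hadoop settings:
    HADOOPCFG=/u/hadoop/hdist/cdh/confdir;
    HADOOPJAR=/u/hadoop/hdist/cdh/cdh54;
The libnames.parm file then marks an individual domain as a Hadoop domain with HADOOP=YES:
    LIBNAME=hdfsdata PATHNAME=/user/spds/hdfsdata HADOOP=YES;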

Determining the JRE Version

Specify the Java Runtime Environment (JRE) version for the Hadoop cluster as a major Java version (for example, 1.7 or 1.8). The default is 1.7. If the JRE version for the Hadoop cluster is not 1.7, include the HADOOPACCELJVER= parameter file option in the spdsserv.parm file to specify the version.
The JRE version of the JAR files in the JAR file directory must match the JRE version of the Hadoop environment. If you have multiple Hadoop servers that are running different JRE versions, then you must create a different JAR file directory for each JRE version.
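As a sketch, if the Hadoop cluster runs a 1.8 JRE, the spdsserv.parm file might contain settings similar to the following. The JAR file directory name is hypothetical and simply illustrates keeping a separate JAR file directory for each JRE version:
    HADOOPACCELJVER=1.8;
    HADOOPJAR=/u/hadoop/jars/cdh54_jre1.8;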

Kerberos Security

To access a Hadoop cluster that is secured with Kerberos, use the following information:
  • If you are running on Linux, a valid Kerberos keytab file is required for the HDFS user ID for your Hadoop cluster. In either the libnames.parm file or the spdsserv.parm file, you must include the HADOOPKEYTAB=, HADOOPREALM=, and HADOOPUSER= parameter file options (see the example after this list).
    Including the parameter file options in the libnames.parm file specifies that you want the domain to be treated as a domain that is secured with Kerberos. Including the parameter file options in the spdsserv.parm file specifies that any domain that is defined in the libnames.parm file is treated as a domain that is secured with Kerberos.
    Tip
    If you have a domain that you do not want to be treated as a domain that is secured with Kerberos, you can specify HADOOPKERBEROS=NO in the libnames.parm file. For more information, see HADOOPKERBEROS= Parameter File Option.
  • If you are running on Microsoft Windows, a keytab file is not required for Kerberos security if your Windows realm trusts your Hadoop realm. The HADOOPKEYTAB=, HADOOPREALM=, and HADOOPUSER= parameter file options should not be used in this case. The single sign-on to the Windows desktop provides the Kerberos TGT to authenticate to Hadoop. However, Microsoft Active Directory stores the credentials cache in memory. To allow access to the credentials cache, you must add the registry key AllowTGTSessionKey and set the value to 1. For more information, see Registry Key to Allow Session Keys to Be Sent in Kerberos Ticket-Granting-Ticket.
  • In the spdsserv.parm parameter file, modify the java.security.krb5.conf property in JREOPTIONS to specify your Kerberos configuration file. Here is an example:
    -jreoptions 
       (-Djava.security.krb5.conf=/u/fedadmin/krb5/krb5_PDE.KRB.SAS.COM.ini)
  • If you are using Advanced Encryption Standard (AES) encryption with Kerberos, you must manually add the Java Cryptography Extension local_policy.jar file in every place that JAVA Home resides on the cluster. If you are outside the United States, you must also manually add the US_export_policy.jar file. The addition of these files is governed by the United States import control restrictions.
    These two JAR files also need to replace the existing local_policy.jar and US_export_policy.jar files in the SAS JRE location, which is the <install-dir>/SASPrivateJavaRuntimeEnvironment/9.4/jre/lib/security/ directory. It is recommended that you back up the existing local_policy.jar and US_export_policy.jar files first in case they need to be restored.
    These files can be obtained from the IBM or Oracle website.
  • When SPD Server is secured with Kerberos and is running as a Microsoft Windows service, you must change the Log On account for the three SPD Server services: SPD 5.3 Data Server, SPD 5.3 Name Server, and SPD 5.3 Snet Server. The account should match the authorized Hadoop account for your Hadoop cluster. Follow these steps:
    1. Access the Services (Local) window.
    2. Right-click the service and select Properties.
    3. Select the Log On tab.
    4. Select This Account, specify the authorized account and password, and click OK.
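For example, on Linux, a Kerberos-secured Hadoop domain definition in the libnames.parm file might look similar to the following sketch. The domain name, pathname, keytab location, realm, and user ID are hypothetical placeholders; substitute the values for your Hadoop cluster:
    LIBNAME=hdfssecure PATHNAME=/user/spds/secure HADOOP=YES HADOOPKEYTAB=/u/spdsadmin/hdfs.keytab HADOOPREALM=KRB.SAS.COM HADOOPUSER=hdfsuser;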

SPD Server ACLs with Hadoop Domains

Specify the default location for SPD Server Hadoop ACLs with the ACLDIR= start-up option in the rc.spds script. If you define a domain that contains the Hadoop parameter file option HADOOP=YES, SPD Server creates a directory by using the following naming scheme: hadoop-acl-path/HADOOPACLS/domain_name.
ACLs for SPD Server resources are typically created in the root pathname of each domain. Storing ACLs in this location does not work with Hadoop domains for the following reasons:
  • ACLs are small and are updated frequently. Updating data in HDFS is very slow and can, therefore, significantly degrade performance.
  • HDFS does not support the type of locking required for ACL processing.
Because of these restrictions, ACLs for Hadoop domains must be stored in a local file system. The SPD Server parameter file option HADOOPACLPATH= specifies a local file system directory where ACLs for the Hadoop environment are created and stored.
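For example, the following hypothetical setting in the spdsserv.parm file stores ACLs for Hadoop domains on the local file system. Assuming that the hadoop-acl-path portion of the directory scheme described earlier resolves to this value, ACLs for a Hadoop domain named hdfsdata would be created in /opt/spds/acls/HADOOPACLS/hdfsdata:
    HADOOPACLPATH=/opt/spds/acls;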