Using the Directory Cleanup Utility with Hadoop

Overview: Using the Directory Cleanup Utility with Hadoop

The directory cleanup utility performs routine maintenance functions on directories that are used by SPD Server. To execute the utility, you submit the spdsclean command, which supports a simple command-line interface. You control the level of cleanup and the behavior of the utility by specifying command options. For more information about the directory cleanup utility, including the spdsclean command options, see SAS Scalable Performance Data Server: Administrator’s Guide.
For a Hadoop domain, you can specify any of the standard spdsclean command options, with the addition of some Hadoop options. The Hadoop options define parameter values for SPD Server environment variables such as PATH, LD_LIBRARY_PATH, TKPATH, and JREOPTIONS.

Providing Hadoop Configuration Files and JAR Files Information

To use the directory cleanup utility with a Hadoop domain, you must provide the Hadoop cluster configuration files path and the Hadoop distribution JAR files path. There are several methods that you can use.
  • Use the -hadoopcfg and the -hadoopjar options in the spdsclean command to specify the locations of the configuration files and the JAR files:
    spdsclean 
       -hadoopcfg /u/fedadmin/hadoopcfg/cdh54p1
       -hadoopjar /u/fedadmin/hadoopjars/cdh54
       other-options
  • Specify the location of the configuration files and the JAR files in either the libnames.parm or the spdsserv.parm parameter file, and then reference that parameter file in the spdsclean command with the -libnamefile or the -parmfile option, respectively. Here is an spdsclean command that references the configuration file and JAR file locations that are specified in the libnames.parm parameter file:
    spdsclean -libnamefile /opt/spds/site/libnames.parm other-options
    Here is the content of the libnames.parm file:
    libname=Stuff1
       pathname=/user/userlname
       hadoopcfg=/u/fedadmin/hadoopcfg/cdh54p1
       hadoopjar=/u/fedadmin/hadoopjars/cdh54
       hadoop=yes;
  • Use SAS environment variables to specify the location of the configuration files and the JAR files. Set the environment variables at a UNIX command prompt before submitting the spdsclean command:
    export SAS_HADOOP_CONFIG_PATH=/u/fedadmin/hadoopcfg/cdh54p1 
    export SAS_HADOOP_JAR_PATH=/u/fedadmin/hadoopjars/cdh54
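Whichever method you choose, it can help to confirm that both paths exist before submitting spdsclean. The following sketch uses the environment variable method with the example paths from above; the spdsclean invocation is shown as a comment because its remaining options depend on your site:

```shell
# Sketch: set the SAS Hadoop environment variables (example paths from above)
# and warn if either directory is missing before submitting spdsclean.
export SAS_HADOOP_CONFIG_PATH=/u/fedadmin/hadoopcfg/cdh54p1
export SAS_HADOOP_JAR_PATH=/u/fedadmin/hadoopjars/cdh54

for dir in "$SAS_HADOOP_CONFIG_PATH" "$SAS_HADOOP_JAR_PATH"; do
    # -d tests that the path exists and is a directory
    [ -d "$dir" ] || echo "warning: directory not found: $dir" >&2
done

# spdsclean other-options    # submit spdsclean in this same shell session
```

Submitting spdsclean in the same shell session ensures that it inherits both exported variables.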

Using the Directory Cleanup Utility on Hadoop ACL Files

To use the directory cleanup utility on Hadoop domain ACL files, you must specify the path to the directory that contains the ACL files. Here are two methods that you can use:
  • Use the -hadoopaclpath option in the spdsclean command to specify the path to the ACL files. In addition, you must include the -all or the -acl option. The -hadoopaclpath option overrides any path specified with the HADOOPACLPATH= parameter file option in the spdsserv.parm parameter file. Here is an example of the spdsclean command:
    spdsclean -libnamefile /opt/spds53/site/libnames.parm
       -hadoopaclpath /opt/spds53/site/acls
       -domains concur
       -acl
       -verbose
  • Use the HADOOPACLPATH= parameter file option in the spdsserv.parm parameter file to specify the path to the ACL files, and use the -parmfile option in the spdsclean command to specify the location of the spdsserv.parm parameter file. Here is an example of the spdsclean command:
    spdsclean -libnamefile /opt/spds53/site/libnames.parm
       -parmfile /opt/spds53/site/spdsserv.parm
       -domains concur
       -acl
       -verbose
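For the second method, the corresponding entry in the spdsserv.parm parameter file might look like the following sketch, which uses the example ACL path from above and assumes the usual option=value; syntax of spdsserv.parm entries:

```
HADOOPACLPATH=/opt/spds53/site/acls;
```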

Using the Directory Cleanup Utility on Hadoop Domains Secured with Kerberos

To use the directory cleanup utility on Hadoop domains that are secured with Kerberos, submit the following kinit command before submitting the spdsclean command:
kinit -kt full-path-to-keytab-file UserID
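As a sketch, the full sequence might look like the following. The keytab path and user ID are hypothetical placeholders, not site defaults; the kinit command is built as a string and echoed here only to show its shape:

```shell
# Sketch: acquire a Kerberos ticket, then run the cleanup. The keytab path
# and user ID below are hypothetical placeholders for your site's values.
KEYTAB=/u/fedadmin/keytabs/spdsadm.keytab   # hypothetical keytab file
USERID=spdsadm                              # hypothetical user ID

KINIT_CMD="kinit -kt $KEYTAB $USERID"
echo "$KINIT_CMD"                           # shows the command to submit first
# spdsclean -libnamefile /opt/spds/site/libnames.parm -acl -verbose
```

The ticket obtained by kinit is cached in the credentials cache, so spdsclean can then authenticate to the Kerberos-secured Hadoop cluster in the same session.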