Deployment Overview

SAS Data Loader provides a script for deploying your QKB on the Hadoop cluster. Before you can run this script, you must copy your QKB to the Hadoop cluster. This can be done by transferring the directory structure to the Hadoop master node via FTP, or by mounting the file system where the QKB is located on the Hadoop master node.
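For example, the transfer can be done with scp. The QKB path and host name below are illustrative, not fixed values:

```shell
# Copy the local QKB directory tree to the Hadoop master node.
# /opt/sas/qkb/CI_26 and hadoop-master are example values;
# substitute your own QKB directory and host name.
scp -r /opt/sas/qkb/CI_26 hadoop-master:/tmp/qkb_ci26
```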
It is recommended that you run the script, named qkbpush.sh, on the Hadoop master node (NameNode). By default, the script automatically discovers all nodes in the cluster and deploys the QKB on them. Flags are available to deploy the QKB on individual nodes, or on a subset of nodes, instead.
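A typical default run, executed on the master node, passes the path of the copied QKB directory to the script. This invocation form is a sketch, not the documented syntax; consult the script's own help output for the exact arguments and node-selection flags:

```shell
# Run on the Hadoop master node. Deploys the QKB from the
# given directory to all discovered nodes (default behavior).
# The argument form shown here is an assumption.
./qkbpush.sh /tmp/qkb_ci26
```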
The qkbpush.sh script performs two tasks:
  • It copies the specified QKB directory to a fixed location (/opt/qkb/default) on the specified nodes and sets the QKB’s permissions so that the QKB is owned by the user account under which the SAS Embedded Process runs.
  • It generates an index file from the contents of the QKB and pushes this index file to HDFS. This index file, named default.idx, is created in the /sas/qkb directory in HDFS. The default.idx file provides a list of QKB definition and token names to SAS Data Loader. SAS Data Loader surfaces the names in its graphical user interface.
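After the script completes, the two results above can be spot-checked with standard commands. The local and HDFS paths are the fixed locations named in this section:

```shell
# On a deployed node: confirm the QKB files and their ownership.
ls -l /opt/qkb/default

# In HDFS: confirm that the default.idx index file was created.
hdfs dfs -ls /sas/qkb
```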
Creating the index file requires special permissions in a Kerberos security environment. For more information, see Kerberos Security Requirements.
Only one QKB and one index file are supported in the Hadoop framework at a time. Subsequent QKB and index pushes replace prior ones.
After the QKB deployment is complete, you must restart the SAS Embedded Process on each Hadoop node so that each instance of the SAS Embedded Process loads the newly deployed QKB. Use the sasep-servers.sh script to restart the SAS Embedded Process. For information about the sasep-servers.sh script, see the information for Hadoop in the SAS In-Database Products: Administrator's Guide.
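A restart can be sketched as a stop followed by a start. The flag names below are assumptions; see the information for Hadoop in the SAS In-Database Products: Administrator's Guide for the documented options:

```shell
# Restart the SAS Embedded Process so each instance loads
# the newly deployed QKB. Flag names are assumptions.
./sasep-servers.sh -stop
./sasep-servers.sh -start
```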