Deploying SAS Data Quality Accelerator for Hadoop

Running the Install Script

The SAS Data Quality Accelerator for Hadoop is provided as an install script. To deploy it manually, follow these steps:
  1. Copy the SAS Data Quality Accelerator install script (sepdqacchadp) to the Hadoop master node.
  2. Execute sepdqacchadp.sh.
  3. Execute dq_install.sh.

Copying the SAS Data Quality Accelerator Install Script to the Hadoop NameNode

The SAS Data Quality Accelerator for Hadoop install script is contained in a self-extracting archive file named sedqacchadp-2.70000-1.sh. This file is contained in a ZIP file that is located in a directory in your SAS Software Depot.
To copy the SAS Data Quality Accelerator install script to the Hadoop NameNode, follow these steps:
  1. Navigate to the YourSASDepot/standalone_installs directory.
    This directory was created when your SAS Software Depot was created by the SAS Download Manager.
  2. Locate the en_sasexe.zip file. This file is in the YourSASDepot/standalone_installs/SAS_Data_Quality_Accelerator_Embedded_Process_Package_for_Hadoop/2_7/Hadoop_on_Linux_x64 directory.
    The sedqacchadp-2.70000-1.sh file is included in this ZIP file.
  3. Unzip the ZIP file on the client.
    unzip en_sasexe.zip
    The ZIP file contains one file: sedqacchadp-2.70000-1.sh.
  4. Copy the sedqacchadp-2.70000-1.sh file to the EPInstallDir directory on the Hadoop master node (NameNode). The following example uses secure copy:
    scp sedqacchadp-2.70000-1.sh username@hdpclus1:/EPInstallDir
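
The copy steps above can be rehearsed before touching the cluster. The following is a minimal sketch, not part of the SAS deliverable: the DEPOT, TARGET, and DRY_RUN variables and the run and copy_installer functions are illustrative names, and the user name and host are the placeholders from the example. With DRY_RUN=1 (the default here) the script only prints the commands that it would run.

```shell
#!/bin/sh
# Sketch only. DEPOT, TARGET, and DRY_RUN are illustrative names, not SAS settings.
DEPOT="${DEPOT:-YourSASDepot}"
PKG_DIR="$DEPOT/standalone_installs/SAS_Data_Quality_Accelerator_Embedded_Process_Package_for_Hadoop/2_7/Hadoop_on_Linux_x64"
ARCHIVE="sedqacchadp-2.70000-1.sh"   # archive name as given in step 3
TARGET="${TARGET:-username@hdpclus1:/EPInstallDir}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1; otherwise execute it in the current shell.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1 through 4: change to the package directory, unzip, copy to the NameNode.
copy_installer() {
    run cd "$PKG_DIR" || return 1
    run unzip en_sasexe.zip || return 1
    run scp "$ARCHIVE" "$TARGET"
}

copy_installer
```

Set DRY_RUN=0 and replace the placeholder values to run the commands for real.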

Executing the SAS Data Quality Accelerator Install Script

To install the SAS Data Quality Accelerator for Hadoop on the cluster, log on to the Hadoop NameNode as root. Then, execute the following command from the EPInstallDir directory:
./sedqacchadp-2.70000-1.sh
In addition to other files, the command creates the following files in the EPInstallDir/SASEPHome/bin directory on the Hadoop NameNode:
  • dq_install.sh
  • qkb_push.sh
  • dq_uninstall.sh
The dq_install.sh executable file enables you to copy the SAS Data Quality Accelerator files that were installed on the NameNode to the cluster nodes. Execute this file next as described in Deploying SAS Data Quality Accelerator Files to the Cluster.
The qkb_push.sh file enables you to deploy the QKB on the cluster. Before you can use qkb_push.sh, you must install and copy a QKB to the Hadoop NameNode. For instructions to install and deploy the QKB after you have copied SAS Data Quality Accelerator files to the cluster, see SAS Quality Knowledge Base (QKB).
The dq_uninstall.sh file enables you to remove SAS Data Quality Accelerator files from the Hadoop cluster. For more information, see Removing the SAS Data Quality Binaries from the Hadoop Cluster.

Deploying SAS Data Quality Accelerator Files to the Cluster

To deploy SAS Data Quality Accelerator for Hadoop binaries to the cluster, execute the dq_install.sh file. You must have root or sudo access to execute dq_install.sh.
By default, the dq_install.sh file automatically discovers all nodes in the cluster and deploys the SAS Data Quality Accelerator files to them. To execute dq_install.sh, enter:
cd EPInstallDir/SASEPHome/bin
./dq_install.sh
By default, dq_install.sh does not list the names of the host nodes on which it installs the files. To list them, include the -v flag in the command. Flags are also available to direct the deployment to a specific node or group of nodes. Use these flags (-f or -h) to avoid redeploying the SAS Data Quality Accelerator files to the entire cluster when you add new nodes.
The dq_install.sh file supports the following flags:

-?

prints usage information.

-l logfile

directs status information to the specified log file, instead of to standard output.

-f hostfile

specifies to perform the deployment only on the host names or IP addresses in the specified file.

-h hostname

specifies to perform the deployment only on the specified host name or IP address.

-v

specifies verbose output, which lists the names of the nodes on which dq_install.sh ran.
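
As an illustration of the -f and -v flags, the following sketch prepares a host file for newly added nodes and prints the matching dq_install.sh command. It rests on assumptions that this guide does not confirm: that the host file holds one host name or IP address per line, and that -v can be combined with -f. The node names, the host file path, and the write_hostfile and install_cmd helpers are hypothetical.

```shell
#!/bin/sh
# Sketch only. The node names and the host file path are hypothetical.
HOSTFILE="${HOSTFILE:-/tmp/dq_new_nodes.txt}"

# Write one host per line (assumed format; the guide does not specify it).
write_hostfile() {
    hostfile=$1
    shift
    printf '%s\n' "$@" > "$hostfile"
}

# Print the dq_install.sh command line for a deployment limited to a host file.
# Combining -v with -f is an assumption.
install_cmd() {
    printf './dq_install.sh -v -f %s\n' "$1"
}

write_hostfile "$HOSTFILE" hdpnode21 hdpnode22
echo "cd EPInstallDir/SASEPHome/bin"
install_cmd "$HOSTFILE"
```

The script only prints the commands. Run them as root or with sudo on the NameNode after reviewing the host file.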

Verifying the SAS Data Quality Accelerator Deployment

The dq_install.sh script creates the following files on each node on which it is executed. The files are created relative to the EPInstallDir/SASEPHome directory:
  • /bin/dq_install.sh
  • /bin/dq_uninstall.sh
  • /bin/qkb_push.sh
  • /bin/dq_env.sh
  • /jars/sas.tools.qkb.hadoop.jar
  • /sasexe/tkeblufn.so
  • /sasexe/t0w7zt.so
  • /sasexe/t0w7zh.so
  • /sasexe/t0w7ko.so
  • /sasexe/t0w7ja.so
  • /sasexe/t0w7fr.so
  • /sasexe/t0w7en.so
  • /sasexe/d2dqtokens.so
  • /sasexe/d2dqlocales.so
  • /sasexe/d2dqdefns.so
  • /sasexe/d2dq.so
Check these directories on several of the nodes to confirm that the files are present. At a minimum, verify that EPInstallDir/SASEPHome/sasexe/d2dq.so exists on the nodes.
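
The file check can be scripted. The following is a minimal sketch that reports any of the files listed above that are missing under a given SASEPHome directory. The EXPECTED_FILES variable and the check_dq_files function are illustrative names, not SAS tools. The function checks only the local file system, so it must be run on each node that you want to verify.

```shell
#!/bin/sh
# Sketch only. Paths are relative to EPInstallDir/SASEPHome, as listed in this section.
EXPECTED_FILES="bin/dq_install.sh
bin/dq_uninstall.sh
bin/qkb_push.sh
bin/dq_env.sh
jars/sas.tools.qkb.hadoop.jar
sasexe/tkeblufn.so
sasexe/t0w7zt.so
sasexe/t0w7zh.so
sasexe/t0w7ko.so
sasexe/t0w7ja.so
sasexe/t0w7fr.so
sasexe/t0w7en.so
sasexe/d2dqtokens.so
sasexe/d2dqlocales.so
sasexe/d2dqdefns.so
sasexe/d2dq.so"

# Print one line per missing file; return nonzero if any file is missing.
check_dq_files() {
    home=$1
    missing=0
    for f in $EXPECTED_FILES; do
        if [ ! -e "$home/$f" ]; then
            echo "MISSING: $home/$f"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

For example, on a node, check_dq_files EPInstallDir/SASEPHome prints nothing and returns zero when every file is present.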