About Data Quality Directives

SAS Data Quality Accelerator is a required component for SAS Data Loader for Hadoop and is included in SAS In-Database Technologies for Hadoop. The SAS Quality Knowledge Base (QKB) is a collection of files that store the data and logic that support data management operations. SAS Data Loader for Hadoop data quality directives reference the QKB when they perform data quality operations on your data. Both SAS Data Quality Accelerator and the QKB must be deployed in the Hadoop cluster.
The steps required to complete this deployment depend on your Hadoop distribution and on how you install SAS In-Database Technologies for Hadoop:
  • If you are using Cloudera or Hortonworks and installing SAS In-Database Technologies for Hadoop through the SAS Deployment Manager, SAS Data Quality Accelerator files are already installed and you need only deploy the QKB. See SAS Quality Knowledge Base (QKB).
  • If you are using a Hadoop distribution other than Cloudera or Hortonworks, or you are using Cloudera or Hortonworks but are not installing SAS In-Database Technologies for Hadoop through the SAS Deployment Manager, you must deploy SAS Data Quality Accelerator before deploying the QKB, as follows:
    1. Deploy SAS Data Quality Accelerator for Hadoop in the cluster. See Deploying SAS Data Quality Accelerator for Hadoop. The SAS Data Quality Accelerator for Hadoop install script deploys files required by data quality operations and the QKB.
    2. Deploy the QKB in the cluster. See SAS Quality Knowledge Base (QKB).
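The decision described above can be summarized as a short sketch. This is only an illustration of the logic in this section, not part of any SAS tool: the function name `deployment_steps`, its parameters, and the step strings are hypothetical and chosen for readability.

```python
from typing import List

# Distributions for which, per this section, the SAS Deployment Manager
# installs the SAS Data Quality Accelerator files as part of
# SAS In-Database Technologies for Hadoop.
AUTO_INSTALL_DISTRIBUTIONS = {"cloudera", "hortonworks"}


def deployment_steps(distribution: str, uses_deployment_manager: bool) -> List[str]:
    """Return the ordered deployment steps for the data quality components.

    Both arguments are hypothetical inputs used only for illustration.
    """
    steps: List[str] = []

    # The accelerator files are already in place only when BOTH conditions
    # hold: a Cloudera or Hortonworks cluster AND installation through
    # the SAS Deployment Manager.
    accelerator_preinstalled = (
        distribution.strip().lower() in AUTO_INSTALL_DISTRIBUTIONS
        and uses_deployment_manager
    )

    if not accelerator_preinstalled:
        # The accelerator must be deployed before the QKB.
        steps.append("Deploy SAS Data Quality Accelerator for Hadoop")

    # The QKB is deployed in every case.
    steps.append("Deploy the QKB")
    return steps
```

For example, calling `deployment_steps("Cloudera", True)` yields only the QKB step, while any other combination yields the accelerator step followed by the QKB step.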