SAS Data Loader and SAS In-Database Technologies for Hadoop

About SAS In-Database Technologies for Hadoop

SAS In-Database Technologies for Hadoop supports the Hadoop operations of SAS Data Loader for Hadoop, which is web-client software that is installed as a vApp and runs on a virtual machine. SAS In-Database Technologies for Hadoop includes the following products: SAS In-Database Deployment Package, SAS Data Quality Accelerator, SAS Quality Knowledge Base, and SAS Data Management Accelerator for Spark.

SAS In-Database Deployment Package

The SAS In-Database Deployment Package includes the SAS Embedded Process and the SAS Hadoop MapReduce JAR files. The SAS Embedded Process runs within MapReduce to read and write data. You must deploy the SAS In-Database Deployment Package, but you need to deploy and configure it only once for each Hadoop cluster.

SAS Data Quality Accelerator and SAS Quality Knowledge Base

The data quality directives in SAS Data Loader for Hadoop are supported by SAS Data Quality Accelerator and the SAS Quality Knowledge Base (QKB). Both are required components for SAS Data Loader for Hadoop and are included in SAS In-Database Technologies for Hadoop. The QKB is a collection of files that store data and logic to support data management operations. A QKB is specific to a locale, that is, to a country and language. SAS Data Loader for Hadoop data quality directives reference the QKB when performing data quality operations on your data. It is recommended that you periodically update the QKB. For more information, see Updating and Customizing the QKB.
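To make the idea of locale-specific data and logic more concrete, the following minimal Python sketch standardizes street abbreviations by looking them up in rules that are keyed by locale. It is an illustration only: the TOY_KNOWLEDGE_BASE dictionary, the locale keys en_US and fr_FR, and the standardize function are hypothetical and do not reflect the actual QKB file format or any SAS API.

```python
# Hypothetical, greatly simplified stand-in for a locale-specific knowledge base.
# The real QKB stores far richer data and logic; its format is not shown here.
TOY_KNOWLEDGE_BASE = {
    "en_US": {"st": "Street", "ave": "Avenue", "rd": "Road"},
    "fr_FR": {"av": "Avenue", "bd": "Boulevard", "r": "Rue"},
}


def standardize(value: str, locale: str) -> str:
    """Expand known abbreviations using the rules for a single locale."""
    rules = TOY_KNOWLEDGE_BASE[locale]
    # Look up each word case-insensitively; keep words that have no rule.
    return " ".join(rules.get(word.lower(), word) for word in value.split())


print(standardize("12 Main st", "en_US"))
print(standardize("5 bd Voltaire", "fr_FR"))
```

The same input can produce different results under different locales, which is why a data quality operation has to reference the rules for the locale of the data that it processes.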

SAS Data Management Accelerator for Spark

Spark is a processing engine that is compatible with Hadoop data. SAS Data Management Accelerator for Spark runs data integration and data quality tasks in a Spark environment. These tasks include mapping columns, summarizing columns, performing data quality tasks such as clustering and survivorship, and standardizing data. Deploy SAS Data Management Accelerator for Spark only if Spark is available on the cluster.
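As a rough illustration of what clustering and survivorship mean, the following plain Python sketch groups records that share a standardized name and then builds one surviving record for each group. It does not run on Spark and is not how SAS Data Management Accelerator for Spark is implemented. The sample records, the match_key, cluster, and survive functions, and the survivorship rule (keep the first non-empty value for each field) are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical sample data: the first two rows describe the same person.
records = [
    {"name": "Jon Smith ", "email": "jon@example.com", "phone": ""},
    {"name": "jon smith", "email": "", "phone": "555-0100"},
    {"name": "Ann Lee", "email": "ann@example.com", "phone": "555-0199"},
]


def match_key(record):
    """Standardize the name (case and whitespace) so duplicates share a key."""
    return " ".join(record["name"].lower().split())


def cluster(rows):
    """Clustering: group the rows that have the same match key."""
    groups = defaultdict(list)
    for row in rows:
        groups[match_key(row)].append(row)
    return groups


def survive(group):
    """Survivorship: keep the first non-empty value for each field."""
    survivor = {}
    for field in group[0]:
        values = (row[field].strip() for row in group)
        survivor[field] = next((v for v in values if v), "")
    return survivor


survivors = [survive(group) for group in cluster(records).values()]
for row in survivors:
    print(row)
```

In this sketch the two Jon Smith rows collapse into a single record that carries both the email address and the phone number. Real survivorship rules are usually more selective, for example preferring the most recent or the most complete value.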