How Does It Work?

SAS Data Loader for Hadoop is a software offering that includes SAS Data Loader, SAS/ACCESS Interface to Hadoop, SAS In-Database Code Accelerator for Hadoop, and SAS Data Quality Accelerator for Hadoop. The following diagram illustrates an installed configuration.
[Figure: SAS Data Loader for Hadoop Block Diagram]
The SAS Data Loader for Hadoop web application runs inside a virtual machine, referred to as the vApp. The vApp is started and managed by a hypervisor application called VMware Player Pro. The web application uses SAS software in the vApp and on the Hadoop cluster to manage data within Hadoop.
The hypervisor provides a web (HTTP) address that you enter into a web browser. The web address opens the SAS Data Loader: Information Center. The Information Center does the following:
  • starts the SAS Data Loader web application in a new browser tab.
  • provides a Settings window to configure the vApp connection to Hadoop.
  • checks for and installs available vApp software updates.
All of the files that are accessed by the vApp reside in the shared folder. The shared folder is the only location on the user host that is accessed by the vApp. The shared folder contains the JDBC drivers needed to connect to external databases, and the Hadoop JAR files that were copied to the client from the Hadoop cluster.
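As a rough illustration of the shared folder's role, the following Python sketch lists the JAR files found in two subfolders of a shared folder. The subfolder names (`JDBCDrivers` and `hadoop/lib`) and the function names are assumptions made for this example, not documented paths or part of the product; substitute the layout of your own installation.

```python
from pathlib import Path
from typing import Dict, List


def list_jars(shared_folder: Path, subfolder: str) -> List[str]:
    """Return the sorted names of .jar files directly under shared_folder/subfolder."""
    target = shared_folder / subfolder
    if not target.is_dir():
        # A missing subfolder is reported as "no JARs found".
        return []
    return sorted(p.name for p in target.glob("*.jar"))


def check_shared_folder(
    shared_folder: Path,
    jdbc_dir: str = "JDBCDrivers",   # hypothetical subfolder name
    hadoop_dir: str = "hadoop/lib",  # hypothetical subfolder name
) -> Dict[str, List[str]]:
    """Summarize the JDBC driver JARs and Hadoop JARs present in the shared folder."""
    return {
        "jdbc_drivers": list_jars(shared_folder, jdbc_dir),
        "hadoop_jars": list_jars(shared_folder, hadoop_dir),
    }
```

An empty list for either key suggests that the corresponding files have not yet been copied into the shared folder.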
When you create a job using a directive, the web application generates code that is then sent to the Hadoop cluster for execution. When the job is complete, the Hadoop cluster writes data to the target file and delivers log and status information to the vApp. Saved directives are stored in a database within the vApp.
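The job flow described above can be pictured with a small conceptual model in Python. This is only a sketch of the sequence of steps: the class names, the status value, and the placeholder "generated code" string are all invented for illustration and do not reflect the actual code, messages, or interfaces that SAS Data Loader uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobResult:
    """Log and status information delivered back to the vApp."""
    status: str
    log: List[str]


@dataclass
class FakeCluster:
    """Stand-in for the Hadoop cluster; target files are kept in memory."""
    files: Dict[str, str] = field(default_factory=dict)

    def execute(self, code: str, target: str) -> JobResult:
        # The cluster runs the code, writes data to the target, and reports back.
        self.files[target] = "output of: " + code
        return JobResult(status="COMPLETED", log=["received code", "wrote " + target])


@dataclass
class WebApp:
    """Stand-in for the web application that runs inside the vApp."""
    cluster: FakeCluster
    # Hypothetical in-memory substitute for the vApp database of saved directives.
    saved_directives: Dict[str, str] = field(default_factory=dict)

    def run_directive(self, name: str, target: str) -> JobResult:
        # Placeholder text; the real application generates executable code.
        code = "generated code for directive '" + name + "'"
        self.saved_directives[name] = code
        return self.cluster.execute(code, target)
```

For example, calling `WebApp(FakeCluster()).run_directive("copy_table", "/data/target")` returns a `JobResult` whose status and log stand in for the information that the cluster delivers to the vApp.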
The SAS In-Database Technologies for Hadoop software is deployed to each node in the Hadoop cluster. The in-database technologies consist of the following components:
  • SAS Quality Knowledge Base for reference to data cleansing definitions.
  • SAS Embedded Process software for code acceleration.
  • SAS Data Quality Accelerator software for SAS DS2 methods that pertain to data cleansing.