What is SAS Data Loader for Hadoop?

SAS Data Loader for Hadoop is a software offering that makes it easier to move, cleanse, and analyze data in Hadoop. It enables business users and data scientists to do self-service data preparation on a Hadoop cluster.
Hadoop is highly efficient at storing and processing large amounts of data. However, moving, cleansing, and analyzing data in Hadoop can be labor-intensive, and these tasks usually require specialized coding skills. As a result, business users and data scientists often depend on IT personnel to prepare large Hadoop data sets for analysis. This dependency makes it harder to turn Hadoop data into useful knowledge.
SAS Data Loader for Hadoop provides a set of wizards, called “directives,” that help business users and data scientists do the following tasks:
  • Copy data to and from Hadoop, using parallel, bulk data transfer.
  • Perform data integration, data quality, and data preparation tasks within Hadoop, without writing complex MapReduce code or asking for outside help.
  • Minimize data movement for increased scalability, governance, and performance.
  • Load data in memory to prepare it for high-performance reporting, visualization, or analytics.