SAS Data Loader for Hadoop is a software offering that makes it easier
to move, cleanse, and analyze data in Hadoop. It enables business users
and data scientists to do self-service data preparation on a Hadoop
cluster.

Hadoop is highly efficient at storing and processing large amounts of
data. However, moving, cleansing, and analyzing data in Hadoop can be
labor-intensive, and these tasks usually require specialized coding
skills. As a result, business users and data scientists usually depend
on IT personnel to prepare large Hadoop data sets for analysis. This
technical overhead makes it harder to turn Hadoop data into useful
knowledge.

SAS Data Loader for Hadoop provides a set of wizards, called
“directives,” that help business users and data scientists perform
the following tasks:
- copy data to and from Hadoop, using parallel, bulk data transfer
- perform data integration, data quality, and data preparation tasks
  within Hadoop without writing complex MapReduce code or asking for
  outside help
- minimize data movement for increased scalability, governance, and
  performance
- load data in memory to prepare it for high-performance reporting,
  visualization, or analytics