Hadoop is an open-source technology for storing and processing large volumes of data. Hadoop provides scalability by combining the Hadoop Distributed File System (HDFS), its high-bandwidth clustered storage system, with MapReduce, its fault-tolerant distributed processing algorithm.
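The MapReduce model mentioned above can be illustrated with a minimal, single-process sketch. This is not Hadoop's API and does not run on a cluster; it only mimics the map, shuffle, and reduce phases in plain Python. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit one (word, 1) pair for each word in the input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values by key, as a MapReduce framework
    # would do between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: collapse the values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Run the three phases in sequence in a single process.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

In a real Hadoop cluster, the map and reduce steps run in parallel across nodes and failed tasks are rescheduled by the framework. The sketch leaves out both aspects.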
SAS Data Integration Studio provides integration with Hadoop in the following ways:
- reading and writing data to and from HDFS with the Hadoop File Reader and Hadoop File Writer transformations
- data processing that sends programs to Hadoop systems and manages their execution, using the Transfer To Hadoop and Transfer From Hadoop transformations
- a data transformation library for writing Hadoop programs in Hadoop languages that include Pig, Hive, and MapReduce, using the Hive, Pig, Map Reduce, and Hadoop Container transformations