About Hadoop

Hadoop is an open-source technology for storing and processing large volumes of data. Hadoop achieves scalability by combining the Hadoop Distributed File System (HDFS), its high-bandwidth, clustered storage system, with MapReduce, its fault-tolerant distributed processing framework.
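To make the MapReduce processing model more concrete, here is a minimal, single-process Python sketch of its three stages (map, shuffle, reduce) applied to a word count. It is only an illustration of the programming model under the assumption that every record fits in memory on one machine. It is not Hadoop's Java API and not code that SAS Data Integration Studio generates. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group intermediate values by key. In Hadoop the framework
    # does this across the cluster; here it is a simple in-memory dict.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values collected for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Chain the three stages; keys are sorted only to make output predictable.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    records = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(records))
```

In a real cluster, many map and reduce tasks run in parallel on different nodes, and a failed task is simply rerun. That is where MapReduce's fault tolerance comes from, and this sketch does not model it.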
SAS Data Integration Studio provides integration with Hadoop in the following ways:
  • reading and writing data to and from HDFS with the Hadoop File Reader and Hadoop File Writer transformations
  • data processing, which sends programs to Hadoop systems and manages their execution, with the Transfer To Hadoop and Transfer From Hadoop transformations
  • a data transformation library for writing Hadoop programs in Pig, Hive, and MapReduce with the Hive, Pig, Map Reduce, and Hadoop Container transformations
Four experimental transformations support the SAS LASR Analytic Server. For more information, see "High-Performance Analytics Folder."