About Hadoop


Hadoop is an open-source technology for storing and processing large volumes of data. Hadoop provides scalability by combining the Hadoop Distributed File System (HDFS), its high-bandwidth clustered storage system, with MapReduce, its fault-tolerant distributed processing framework.
SAS Data Integration Studio provides integration with Hadoop in the following ways:
  • reading data from and writing data to HDFS with the Hadoop File Reader and Hadoop File Writer transformations
  • sending programs to Hadoop systems and managing their execution with the Transfer To Hadoop and Transfer From Hadoop transformations
  • writing Hadoop programs in Hadoop languages, including Pig, Hive, and MapReduce, with a data transformation library that consists of the Hive, Pig, Map Reduce, and Hadoop Container transformations
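
As a rough illustration of the MapReduce processing model mentioned above, the following Python sketch counts words in a single process. It is only a conceptual sketch: it does not use the Hadoop API, it is not the code that SAS Data Integration Studio generates, and the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    lines = ["hadoop stores data", "hadoop processes data"]
    print(run_job(lines))
```

In a real Hadoop cluster, the map and reduce steps run in parallel on many nodes against data stored in HDFS, and the framework reruns failed tasks. The sketch shows only the shape of the computation.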

Experimental Features

Four experimental transformations support SAS® LASR™ Analytic Servers. For more information, see High-Performance Analytics Folder.
SAS Data Integration Studio has experimental support for sending the output of Hadoop Pig code to a SAS LASR Analytic Server in HDAT format. This support is implemented with an experimental user-defined function (UDF) that writes the output of the Pig code in HDAT format so that the SAS LASR Analytic Server can consume it. The support also includes a Store in HDAT format template, which provides an example snippet of the Pig code that is needed to integrate the JAR file. Because the HDAT format is represented by a table in SAS Data Integration Studio, you can add a table output to the Pig transformation. Previously, the Pig transformation could output only to external files.
To add the UDF, click Add in the User-defined jars section on the Pig Latin tab of the Pig transformation. Then navigate to the file named hdatudf.jar, which is located in the hadoop/udf directory under the SAS Data Integration Studio installation directory. To add the Store in HDAT format template, click Add template in the Pig Latin Statements section of the Pig Latin tab.