Overview of the Hadoop Transformations

The Transformations tree in SAS Data Integration Studio includes a Hadoop folder. The transformations in this folder enable you to perform the following operations within a job:
  • read files from a Hadoop cluster and write files to the cluster
  • transfer files to and from a Hadoop cluster
  • submit your own Pig Latin, HiveQL, or Map Reduce code
  • use one transformation to perform a series of steps in one connection to the Hadoop cluster, such as transfers to and from Hadoop, Map Reduce processing, and Pig Latin processing
Hadoop is an open-source technology for storing and processing large volumes of data. Hadoop provides scalability by combining two components: the Hadoop Distributed File System (HDFS), its high-bandwidth, clustered storage system, and Map Reduce, its fault-tolerant, distributed processing algorithm.
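To make the Map Reduce model more concrete, the following is a minimal sketch of a word count written in plain Java, with no Hadoop libraries. It only illustrates the three conceptual phases (map, shuffle, reduce) in a single process; it is not a Hadoop job and does not use the Hadoop API. The class name WordCountSketch and the method names mapLine and wordCount are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustration only: a single-process imitation of the Map Reduce phases.
// A real Hadoop job would distribute this work across the cluster.
public class WordCountSketch {

    // Map phase: split one input line into lowercase words.
    static List<String> mapLine(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    // Shuffle and reduce phases: group identical words together,
    // then sum a count of 1 for each occurrence of a word.
    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> mapLine(line).stream())
                .collect(Collectors.groupingBy(
                        word -> word,
                        TreeMap::new,
                        Collectors.summingInt(word -> 1)));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("Hadoop stores data", "Hadoop processes data");
        System.out.println(wordCount(lines));
    }
}
```

In Hadoop itself, the map and reduce steps run as separate tasks on different nodes, and the framework performs the grouping between them. The sketch collapses all of that into one stream pipeline so that the data flow is easy to follow.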
Apache Pig is a high-level platform for creating Map Reduce programs that are used with Hadoop. The language for this platform is called Pig Latin. Apache Hive is a data warehouse infrastructure built on top of Hadoop for data queries, analysis, and summarization. It provides an SQL-like language called HiveQL.
The following table describes the Hadoop transformations.
Name | Description
Hadoop Container | Enables you to use one transformation to perform a series of steps in one connection to the Hadoop cluster. The steps can include transfers to and from Hadoop, Map Reduce processing, and Pig Latin processing. For more information, see Creating a Hadoop Container Job.
Hadoop File Reader | Reads a specified file from a Hadoop cluster.
Hadoop File Writer | Writes a specified file to a Hadoop cluster.
Hive | Enables you to submit your own HiveQL code in the context of a job. For more information, see Creating a Hive Job.
Map Reduce | Enables you to submit your own Map Reduce code in the context of a job. You must create your own Map Reduce program in Java and save it to a JAR file. You then specify this JAR file in the Map Reduce transformation, along with any relevant arguments. Your Hadoop installation usually includes an example Map Reduce program. For an example of Map Reduce processing in a Hadoop container job, see Creating a Hadoop Container Job.
Pig | Enables you to submit your own Pig Latin code in the context of a job. For more information, see Creating a Pig Job.
Transfer From Hadoop | Transfers a specified file from a Hadoop cluster. For an example of how this transformation can be used, see Creating a Hadoop Container Job.
Transfer To Hadoop | Transfers a specified file to a Hadoop cluster.