HADOOP Procedure

Example 1: Submitting HDFS Commands

Details

This PROC HADOOP example submits HDFS commands to a Hadoop server. The statements create a directory, delete a directory, and copy a file from HDFS to a local output location.

Program

filename cfg "C:\Users\sasabc\hadoop\sample_config.xml";
proc hadoop options=cfg username="sasabc" password="sasabc" verbose;
   hdfs mkdir="/user/sasabc/new_directory";
   hdfs delete="/user/sasabc/temp2_directory";
   hdfs copytolocal="/user/sasabc/testdata.txt"
        out="C:\Users\sasabc\Hadoop\testdata.txt" overwrite;
run;

Program Description

Assign a file reference to the Hadoop configuration file. The FILENAME statement assigns the file reference CFG to the physical location of the Hadoop configuration file sample_config.xml, which is shown in Using PROC HADOOP.
filename cfg "C:\Users\sasabc\hadoop\sample_config.xml";
Execute the PROC HADOOP statement. The PROC HADOOP statement controls access to the Hadoop server: the OPTIONS= option references the Hadoop configuration file, the USERNAME= and PASSWORD= options identify the user ID and password on the Hadoop server, and the VERBOSE option writes additional messages to the SAS log.
proc hadoop options=cfg username="sasabc" password="sasabc" verbose;
Create an HDFS path. The first HDFS statement specifies the MKDIR= option to create an HDFS path.
   hdfs mkdir="/user/sasabc/new_directory";
Delete an HDFS path. The second HDFS statement specifies the DELETE= option to delete an HDFS path.
   hdfs delete="/user/sasabc/temp2_directory";
Copy an HDFS file. The third HDFS statement uses the COPYTOLOCAL= option to identify the HDFS file to copy, the OUT= option to identify the output location on the local machine, and the OVERWRITE option to write over the output location if it already exists.
   hdfs copytolocal="/user/sasabc/testdata.txt"
        out="C:\Users\sasabc\Hadoop\testdata.txt" overwrite;
run;
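
Variation: Copying a Local File to HDFS

The example above copies a file out of HDFS. As a minimal sketch of the opposite direction, the following program uses the COPYFROMLOCAL= option of the HDFS statement with OUT= naming the HDFS destination. It assumes the same configuration file and credentials as the example. The local file newdata.txt and its HDFS destination are hypothetical names chosen for illustration, and the program has not been run against a particular Hadoop distribution.

```sas
/* Sketch only: newdata.txt and its HDFS destination are hypothetical. */
filename cfg "C:\Users\sasabc\hadoop\sample_config.xml";
proc hadoop options=cfg username="sasabc" password="sasabc" verbose;
   /* Copy a local file into HDFS; OVERWRITE replaces an existing copy. */
   hdfs copyfromlocal="C:\Users\sasabc\Hadoop\newdata.txt"
        out="/user/sasabc/new_directory/newdata.txt" overwrite;
run;
```

Because VERBOSE is specified, the SAS log should contain additional messages that can help confirm whether the copy succeeded.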