HADOOP Procedure

Example 3: Submitting Pig Language Code

Details

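This PROC HADOOP example submits Pig language code to a Hadoop cluster. The code loads the comma-delimited file /user/sasabc/testdata.txt, keeps only the records whose marital_status value is 'Single', and stores the result in /user/sasabc/output_customer. This is the Pig language code to be executed: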
A = LOAD '/user/sasabc/testdata.txt' USING PigStorage(',')
    AS (customer_number,account_number,tax_id,date_of_birth,status,
        residence_country_code,marital_status,email_address,phone_number,
        annual_income,net_worth_amount,risk_classification);
B = FILTER A BY marital_status == 'Single';
store B into '/user/sasabc/output_customer' USING PigStorage(',');
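
If the Pig language code is not yet stored in a file, one possible way to create it from within the SAS session is a DATA _NULL_ step that writes each line to the fileref that the PIG statement later reads. This is a minimal sketch, not part of the original example. It assumes the same local path that the example program assigns to the fileref CODE, and it assumes that the SAS session has write access to that directory.

```sas
/* Sketch: write the Pig language code to the local file that the   */
/* fileref CODE points to. The path is the one used in the example. */
filename code "C:\Users\sasabc\hadoop\sample_pig.txt";

data _null_;
   file code;   /* route PUT output to the fileref instead of the log */
   put "A = LOAD '/user/sasabc/testdata.txt' USING PigStorage(',')";
   put "    AS (customer_number,account_number,tax_id,date_of_birth,status,";
   put "        residence_country_code,marital_status,email_address,phone_number,";
   put "        annual_income,net_worth_amount,risk_classification);";
   put "B = FILTER A BY marital_status == 'Single';";
   put "store B into '/user/sasabc/output_customer' USING PigStorage(',');";
run;
```

The Pig statements are enclosed in double quotation marks so that the single quotation marks that Pig requires around paths and string literals can be written unchanged.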

Program

filename cfg "C:\Users\sasabc\hadoop\sample_config.xml";
filename code "C:\Users\sasabc\hadoop\sample_pig.txt";
proc hadoop options=cfg username="sasabc" password="sasabc" verbose;
   pig code=code registerjar="C:\Users\sasabc\Hadoop\jars\myudf.jar";
run;

Program Description

Assign a file reference to the Hadoop configuration file. The first FILENAME statement assigns the file reference CFG to the physical location of a Hadoop configuration file that is named sample_config.xml, which is shown in Using PROC HADOOP.
filename cfg "C:\Users\sasabc\hadoop\sample_config.xml";
Assign a file reference to the Pig language code. The second FILENAME statement assigns the file reference CODE to the physical location of the file that is named sample_pig.txt, which contains the Pig language code that is shown above.
filename code "C:\Users\sasabc\hadoop\sample_pig.txt";
Execute the PROC HADOOP statement. The PROC HADOOP statement controls access to the Hadoop server. The OPTIONS= option references the Hadoop configuration file, the USERNAME= and PASSWORD= options identify the user ID and password on the Hadoop server, and the VERBOSE option writes additional messages to the SAS log.
proc hadoop options=cfg username="sasabc" password="sasabc" verbose;
Execute the PIG statement. The PIG statement uses the CODE= option to specify the SAS fileref CODE, which is assigned to the physical location of the file that contains the Pig language code, and the REGISTERJAR= option to specify the JAR file that contains the Pig scripts to execute.
   pig code=code registerjar="C:\Users\sasabc\Hadoop\jars\myudf.jar";
run;
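
After the Pig code runs, the filtered records are in the HDFS directory /user/sasabc/output_customer. One possible way to inspect them is to copy an output file to the local machine with the HDFS statement and its COPYTOLOCAL= and OUT= options. The following is a sketch under stated assumptions, not part of the original example: the part file name part-m-00000 is the name that a map-only Pig job typically produces and might differ on your cluster, and the local file name output_customer.txt is hypothetical.

```sas
/* Sketch: copy one Pig output file from HDFS to the local machine. */
filename cfg "C:\Users\sasabc\hadoop\sample_config.xml";

proc hadoop options=cfg username="sasabc" password="sasabc" verbose;
   /* ASSUMPTION: part-m-00000 is the output file name; check the   */
   /* contents of /user/sasabc/output_customer on your cluster.     */
   /* ASSUMPTION: output_customer.txt is a hypothetical local name. */
   hdfs copytolocal="/user/sasabc/output_customer/part-m-00000"
        out="C:\Users\sasabc\hadoop\output_customer.txt";
run;
```

Because the STORE statement in the Pig code uses PigStorage(','), the copied file is expected to be comma-delimited text with the same twelve fields as the input.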