The configuration for the Pig transformation is simple. Open the Hadoop Options tab
and select the Delete outputs before executing hadoop statements check box.
The configuration needed for the Pig transformation varies from job to job. This sample
job requires that you add three Pig Latin statements and four
substitution parameters on the Pig Latin tab.
The Pig Latin statements are entered in the Pig Latin field. The sample job
contains the following statements:
A = load '/user/test/PIG/$inputfilename' USING PigStorage(',')
AS (f1:int,f2:int,f3:int);
B1 = filter A by $filtercolumn == $filtervalue;
store B1 into '/user/test/PIG/$outputfilename' USING PigStorage(',');
Similarly, the substitution parameters are entered in the Substitution parameters
field, as follows:
Name = inputfilename, Value = numbers_target.txt, Description = This is the name of the file loaded into hadoop
Name = outputfilename, Value = &output, Description = Output filename
Name = filtervalue, Value = 5
Name = filtercolumn, Value = f3
You also need to open the Hadoop Options tab and select the Delete outputs before
executing hadoop statements check box.
Finally, you need to create a new prompt for the substitution parameter on the
Parameters tab. The general values for this job are Name=output and Displayed
text=Pig target. The prompt type and values are Prompt type=Text and Default
value=PIG_SubstitutionParamtarget.txt.
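To see what the sample statements look like once the parameters are applied, the substitution can be sketched outside the product. The Python sketch below is only an illustration: it assumes that each $name placeholder is replaced by the matching parameter value, and that a value starting with & (here &output) is replaced by the default value of the prompt with that name. The function resolve_statements and the dictionary names are hypothetical, and Python's string.Template merely stands in for whatever mechanism the product actually uses.

```python
from string import Template

# Pig Latin statements from the sample job, with $name placeholders.
PIG_LATIN = """\
A = load '/user/test/PIG/$inputfilename' USING PigStorage(',')
AS (f1:int,f2:int,f3:int);
B1 = filter A by $filtercolumn == $filtervalue;
store B1 into '/user/test/PIG/$outputfilename' USING PigStorage(',');
"""

# Substitution parameters as entered in the Substitution parameters field.
SUBSTITUTION_PARAMETERS = {
    "inputfilename": "numbers_target.txt",
    "outputfilename": "&output",
    "filtervalue": "5",
    "filtercolumn": "f3",
}

# Default value of the prompt defined on the Parameters tab.
PROMPT_DEFAULTS = {"output": "PIG_SubstitutionParamtarget.txt"}


def resolve_statements(statements, parameters, prompts):
    """Replace $name placeholders (hypothetical helper, not a product API).

    Assumption: a parameter value of the form &name refers to the prompt
    called name and takes that prompt's value.
    """
    resolved = {}
    for name, value in parameters.items():
        if value.startswith("&"):
            value = prompts[value[1:]]
        resolved[name] = value
    # substitute() raises KeyError if a placeholder has no parameter.
    return Template(statements).substitute(resolved)


resolved_text = resolve_statements(
    PIG_LATIN, SUBSTITUTION_PARAMETERS, PROMPT_DEFAULTS
)
print(resolved_text)
```

Under these assumptions, the job loads /user/test/PIG/numbers_target.txt, keeps the rows where f3 equals 5, and stores them in /user/test/PIG/PIG_SubstitutionParamtarget.txt unless a different value is entered at the prompt.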