Here are the restrictions
for using the DATA step in Hadoop:
-
More than one SET statement is
not supported.
-
These statements are not supported:
-
BY (or FIRST. and LAST. variables)
-
The ABORT statement has these restrictions:
-
The ABORT statement does not accept
arguments.
-
The ABORT statement is not supported
within functions. It is valid only in the main program.
-
The subsetting IF statement is
not supported.
-
The INPUT function does not support
the question mark (?) and double question mark (??) modifiers.
-
No SET statement options are allowed.
-
You can use only SAS formats and
functions that are supported by the DS2 language. For more information,
see SAS DS2 Language Reference.
-
Some CALL routines are not supported.
Routines are supported if there is an equivalent function.
-
Component objects are not supported.
-
Scoring input variables cannot
be modified.
-
Large models can consume large
amounts of memory on the client side. It is recommended that you set
the MEMSIZE= system option to MAX.
-
If you create a table in Hadoop
using the SAS/ACCESS HADOOP LIBNAME engine and the SAS Embedded Process
and then drop an input variable with a DROP statement, missing values
are assigned to any variable that is created from that input variable.
Any variable that is dropped from the output table is also dropped
from the input table. Dropping an input variable with the DROP statement
is the same as using the DROP= data set option in the SET statement.
The workaround is to leave the DROP statement out of the DATA step that
creates the new variable, and then use a separate DATA step to drop the
input variable from the output table.
In this example, the variable x is dropped in the second data program.
The new variable that is created in the data program, newx, is
created with a missing value.
/* This code cannot be run stand-alone because options */
/* specific to your site are required on the OPTION    */
/* and LIBNAME statements.                             */
/* Options for Hadoop JAR and configuration files      */
option set=SAS_HADOOP_CONFIG_PATH="/saswork/hadoop_files/cdh5config";
option set=SAS_HADOOP_JAR_PATH="/saswork/hadoop_files/cdh5jars";
/* This option enables the DATA step to run, with */
/* limitations, inside Hadoop using */
/* SAS/ACCESS and SAS Embedded Process. */
options dsaccel='any' msglevel=i;
/* libname statement for Hadoop data */
libname hive hadoop server='sasts009.unx.sas.com' user=a database=default;
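/* Create a Hive table with the input variable x */
data hive.test;
   x=99;
run;
/* DROP also removes x from the input, so newx is missing */
data hive.test2;
   set hive.test;
   newx=x+10;
   drop x;
run;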
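The following is a minimal sketch of one way to apply the workaround. It assumes that the goal is an output table that contains newx but not x. The intermediate table name hive.test2_full is hypothetical, and the sketch reuses the site-specific OPTION and LIBNAME statements from the example above, so it cannot be run stand-alone either. Whether each step actually runs inside Hadoop depends on your configuration. The MSGLEVEL=I setting shown earlier writes a note to the log that indicates where the step ran.

```sas
/* Sketch only: assumes the hive libref and the DSACCEL= and     */
/* MSGLEVEL= settings from the preceding example are in effect.  */
/* hive.test2_full is a hypothetical intermediate table name.    */

/* Step 1: create newx while x is still available (no DROP). */
data hive.test2_full;
   set hive.test;
   newx=x+10;
run;

/* Step 2: drop x in a separate step. No variable is created */
/* from x here, so no missing values are introduced.         */
data hive.test2;
   set hive.test2_full;
   drop x;
run;
```

Splitting the work this way keeps the DROP statement out of the step that reads x to compute newx, at the cost of writing one extra table in Hadoop.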