Restrictions in DATA Step Processing

Here are the restrictions for using the DATA step in Hadoop:
  • More than one SET statement is not supported.
  • These statements are not supported:
    • BY (or FIRST. and LAST. variables)
    • CONTINUE
    • DISPLAY
    • FILE
    • INFILE
    • INPUT
    • LEAVE
    • MERGE
    • MODIFY
    • OUTPUT
    • PUT
    • REMOVE
    • RENAME
    • REPLACE
    • RETAIN
    • UPDATE
    • WHERE
    • WINDOW
  • The ABORT statement has these restrictions:
    • The ABORT statement does not accept arguments.
    • The ABORT statement is not supported within functions. It is valid only in the main program.
  • The subsetting IF statement is not supported.
  • The INPUT function does not support the question mark (?) and double question mark (??) modifiers.
  • No SET statement options are allowed.
  • You can use only SAS formats and functions that are supported by the DS2 language. For more information, see SAS DS2 Language Reference.
  • Some CALL routines are not supported. Routines are supported if there is an equivalent function.
  • Component objects are not supported.
  • Scoring input variables cannot be modified.
  • Large models can consume large amounts of memory on the client side. It is recommended that you set the MEMSIZE= system option to MAX.
  • If you create a table in Hadoop using the SAS/ACCESS HADOOP LIBNAME engine and the SAS Embedded Process and then drop an input variable with a DROP statement, missing values are assigned to any variable that is created from that input variable. Any variable that is dropped from the output table is also dropped from the input table. Dropping an input variable with the DROP statement is the same as using the DROP= data set option in the SET statement. The workaround is to create a separate DATA step to drop NEWX from the output table.
    In this example, the variable x is dropped in the second DATA step. The new variable that is created in that step, newx, is created with a missing value.
    /* This code cannot be run as stand-alone */
    /* as options specific to your site  */
    /* are required on OPTION and LIBNAME statements. */
    
    /* options for Hadoop jars and configuration files */
    
    option set=SAS_HADOOP_CONFIG_PATH="/saswork/hadoop_files/cdh5config";
    option set=SAS_HADOOP_JAR_PATH="/saswork/hadoop_files/cdh5jars";
    
    /* This option enables the DATA step to run, with */
    /* limitations, inside Hadoop using */
    /* SAS/ACCESS and SAS Embedded Process. */
    
    
    options dsaccel='any' msglevel=i;
    
    /* libname statement for Hadoop data */
    libname hive hadoop server='sasts009.unx.sas.com' user=a database=default;
    
    data hive.test;
    x=99;
    run;
    
    data hive.test2;
    set hive.test;
    newx=x+10;
    drop x;
    run;
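
The separate-DATA-step workaround might be sketched as follows. This is only one reading of it: it assumes the intent is to leave x in place while newx is computed and to remove x in a later step, so that newx already holds its value by the time x is dropped. It also assumes the OPTION and LIBNAME statements from the example above are in effect. The table name hive.test3 is hypothetical and is not part of the original example.

```sas
/* Sketch only: relies on the options and the hive libref */
/* defined in the example above.                           */
/* hive.test3 is a hypothetical table name.                */

/* Step 1: compute newx while x is still available. */
data hive.test2;
set hive.test;
newx=x+10;
run;

/* Step 2: drop x in a separate DATA step. Here newx is    */
/* read from hive.test2 rather than computed from x, so    */
/* dropping x should not affect its value.                 */
data hive.test3;
set hive.test2;
drop x;
run;
```

With msglevel=i set, the SAS log should indicate whether each step ran inside Hadoop, which helps confirm that the restriction applies to the code being tested.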