To accelerate DATA step processing of data stored in Hadoop, the
DATA step has been enhanced to determine when user code is suitable
for export to the Hadoop MapReduce facility. If you have installed
and activated the SAS Embedded Process on a Hadoop cluster, DATA
step code can execute in parallel against input data that resides
in the HDFS file system.
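One common way to reference tables in Hadoop is a libref assigned
with the SAS/ACCESS Hadoop engine. A minimal sketch follows; the
libref, server name, port, and user are placeholders, not values
from this document:

   /* Assign a libref to Hive tables in Hadoop.          */
   /* SERVER=, PORT=, and USER= values are placeholders. */
   libname hdp hadoop server="hdpcluster.example.com"
           port=10000 user=sasdemo;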
Because of the single-source,
shared-nothing nature of MapReduce processing and the immutable nature
of HDFS files, only a subset of the full DATA step syntax can be passed
through for parallel execution. The DATA step can be run inside Hadoop
for scoring with the following limitations:
- Only one input file and one output file are allowed.
- The input file and the output file are in Hadoop.
- Only functions and formats that are supported by the DS2 language
  compile successfully.
- Some DATA step statements are not allowed, such as those
  pertaining to input and output.
To enable the DATA step
to run inside Hadoop, set the DSACCEL= system option to ANY, as
shown below.
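For example, the following sketch scores a Hive table in place,
using the hdp libref assigned earlier. The table names, variables,
and coefficients are hypothetical scoring logic, not part of this
document:

   options dsaccel=any;

   /* One input table and one output table, both in Hadoop.   */
   /* Only DS2-supported functions appear in the step, so it  */
   /* is eligible to run in parallel inside Hadoop.           */
   data hdp.scored;
      set hdp.customers;
      score = 0.42 + 0.3*income + 0.7*tenure;
   run;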
If a SAS program does
not meet the requirements for running in Hadoop, the code executes
in your Base SAS session. In this case, SAS reads and writes large
tables over the network.
You can determine whether
your code is compliant for Hadoop by setting the system option
MSGLEVEL=I. When MSGLEVEL=I, SAS writes messages to the log that
identify the non-compliant code.
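For example, the following sketch combines MSGLEVEL=I with
DSACCEL=ANY. The step reads two input tables, which violates the
one-input limitation above, so it executes in the Base SAS session
instead; the table names are hypothetical:

   options msglevel=i dsaccel=any;

   /* Two input tables violate the one-input limitation, so  */
   /* this step cannot run inside Hadoop. With MSGLEVEL=I,   */
   /* the log identifies the non-compliant code, and the     */
   /* step runs in the Base SAS session instead.             */
   data hdp.combined;
      set hdp.accounts hdp.customers;
   run;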