The SAS Data in HDFS engine is used only with SAS High-Performance Deployment of Hadoop.
The engine is designed as a write-only engine for transferring data to HDFS. However,
SAS High-Performance Analytics procedures are designed to read data in parallel from a
co-located data provider, so the LASR procedure, and other procedures such as HPREG and
HPLOGISTIC, can read data from HDFS with the engine. The HPDS2 procedure is designed to
read and write data in parallel, so it can be used with the engine both to read data
from HDFS and to create new tables in HDFS.
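
A minimal sketch of these access patterns follows. The libref, host name, HDFS path, install path, and port are hypothetical, and the exact LIBNAME and PERFORMANCE statement options can vary by release; consult the engine and procedure documentation for your deployment.

```sas
/* Hypothetical libref and site-specific values, for illustration only. */
libname hdfs sashdat path="/user/sasdata" host="grid001.example.com"
        install="/opt/TKGrid";

/* Read a table from HDFS in parallel and load it into a
   SAS LASR Analytic Server instance. */
proc lasr add data=hdfs.sales port=10010;
   performance host="grid001.example.com" install="/opt/TKGrid";
run;

/* Read one HDFS table and write a new HDFS table, both in parallel,
   with the HPDS2 procedure. */
proc hpds2 data=hdfs.sales out=hdfs.sales_subset;
   performance host="grid001.example.com" install="/opt/TKGrid";
   data DS2GTF.out;
      method run();
         set DS2GTF.in;
         if amount > 0;   /* keep only rows with a positive amount */
      end;
   enddata;
run;
```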
Whenever a SAS High-Performance Analytics procedure is used to create data in HDFS,
the procedure creates the data with a default block size of 8 megabytes.
This
size can be overridden with the BLOCKSIZE= data set option.
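
For example, a table could be written with a larger block size as sketched below. The libref, paths, and host are hypothetical, and the accepted value syntax for BLOCKSIZE= should be confirmed in the data set option documentation for your release.

```sas
/* Hypothetical libref and site-specific values, for illustration only. */
libname hdfs sashdat path="/user/sasdata" host="grid001.example.com"
        install="/opt/TKGrid";

/* Write the table with a 32-megabyte block size instead of the
   8-megabyte default. */
data hdfs.sales(blocksize=32m);
   set work.sales;
run;
```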