The SAS Data in HDFS engine
is used to distribute data in the Hadoop Distributed File System (HDFS)
that is provided by SAS High-Performance Deployment of Hadoop.
The engine enables you to distribute the data in a format that is
designed for high-performance analytics. The block structure that the
engine creates complements the block redundancy and distributed computing
that SAS High-Performance Deployment of Hadoop provides.
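For example, you can assign a libref with the SAS Data in HDFS (SASHDAT) engine and then use a DATA step to distribute a data set to HDFS. The following statements are a minimal sketch only: the grid host name, installation path, and HDFS directory are placeholder values, and the LIBNAME options that are available can vary by release and deployment.

   /* Assign a libref with the SAS Data in HDFS engine.               */
   /* The host, install location, and HDFS path are placeholders.     */
   libname hdfs sashdat
      host="grid001.example.com"    /* head node of the Hadoop cluster */
      install="/opt/TKGrid"         /* SAS High-Performance software   */
      path="/user/sasdemo";         /* HDFS directory for the data     */

   /* Distribute a SAS data set to HDFS in the engine's block format. */
   data hdfs.cars;
      set sashelp.cars;
   run;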
The
engine is designed to distribute data in HDFS only. Because the data
volumes in HDFS are typically very large, the engine is not designed
to read from HDFS and transfer data back to the SAS client. For example,
consider the inefficiency of reading several terabytes of data from a distributed
computing environment, transferring that data back to a SAS session,
and then running the UNIVARIATE or REG procedure on such a large volume
of data. In contrast, the SAS High-Performance Analytics procedures
are designed to operate in a distributed computing environment and
to read data in parallel from a co-located data provider such as SAS High-Performance Deployment of Hadoop.