The SASHDAT engine is used to distribute data in the Hadoop Distributed File System
(HDFS) or in an NFS-mounted file system, such as the one provided by the MapR NFS
service. The
engine enables you to distribute the data in a format that is designed for high-performance
analytics. The block redundancy and distributed computing provided by HDFS are complemented
by the block structure that the engine creates.
The engine is designed to distribute data only. Because the data volumes in HDFS are
typically very large, the engine is not designed to read SASHDAT files and transfer
data back to the SAS client. For example, consider the case of reading several terabytes
of data from a distributed computing environment, transferring that data back to a
SAS session, and then running the UNIVARIATE or REG procedure on such a large volume
of data. Moving that much data to a single session is impractical. In contrast, the
SAS High-Performance Analytics procedures are designed to operate in a distributed
computing environment and to read data in parallel from a co-located data provider
such as HDFS.
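
As a rough illustration of this write-only workflow, the following sketch assigns a
SASHDAT libref, distributes a data set to HDFS, and then analyzes it in place with a
high-performance procedure. The host name, installation path, HDFS directory, data set
name, and variable names are all placeholders assumed for the example, and the options
that are available can differ by release and site configuration.

```sas
/* Assumed placeholders: grid host, install location, and HDFS directory. */
libname hdfs sashdat host="grid001.example.com"
                     install="/opt/TKGrid"
                     path="/hps";

/* Distribute a hypothetical local data set to HDFS as a SASHDAT file. */
data hdfs.sales;
   set work.sales;
run;

/* Analyze the distributed data in place with a high-performance   */
/* procedure. The variable names are hypothetical, and the         */
/* PERFORMANCE statement might be unnecessary at sites that set    */
/* the grid host and install location through environment options. */
proc hpreg data=hdfs.sales;
   model revenue = price discount;
   performance host="grid001.example.com" install="/opt/TKGrid";
run;
```

The reverse direction, such as a DATA step that uses SET hdfs.sales to pull the file
back into the SAS session, is the access pattern that the engine is not designed to
support.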