Loading data from disk to memory is efficient when the SAS LASR Analytic Server is co-located with a distributed data provider. The Hadoop Distributed File System (HDFS) provided by SAS High-Performance Deployment of Hadoop acts as a co-located data provider. HDFS offers some key benefits:
- Parallel I/O. The SAS LASR Analytic Server can read data in parallel at high rates from a co-located data provider.
- Data redundancy. By default, two copies of the data are stored in HDFS. If a machine in the cluster becomes unavailable, the SAS LASR Analytic Server instance on another machine in the cluster retrieves the data from a redundant block and loads the data into memory.
- Homogeneous block distribution. HDFS stores files in blocks. The SAS implementation enables a homogeneous block distribution that results in balanced memory utilization across the SAS LASR Analytic Server and reduces execution time.
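
The redundancy and block-distribution benefits listed above can be illustrated with a small toy model. The Python sketch below is not the SAS or HDFS placement algorithm. It assumes a simple round-robin scheme with two copies per block, and the names `place_blocks`, `read_plan`, and `node1` through `node4` are hypothetical. It shows only the general idea: an even spread of blocks gives each machine the same amount of data to load, and a second copy lets another machine take over the blocks of one that is unavailable.

```python
from collections import Counter


def place_blocks(num_blocks, nodes, copies=2):
    """Toy round-robin placement (an assumption, not the real algorithm):
    copy k of block b is stored on node (b + k) mod n."""
    n = len(nodes)
    if copies > n:
        raise ValueError("need at least as many nodes as copies")
    return {
        b: [nodes[(b + k) % n] for k in range(copies)]
        for b in range(num_blocks)
    }


def read_plan(placement, failed=frozenset()):
    """For each block, choose the first holder that is still available."""
    plan = {}
    for block, holders in placement.items():
        alive = [h for h in holders if h not in failed]
        if not alive:
            raise RuntimeError(f"block {block} has no surviving copy")
        plan[block] = alive[0]
    return plan


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical machine names
placement = place_blocks(12, nodes)

# All machines available: every machine loads the same number of blocks.
healthy = Counter(read_plan(placement).values())

# One machine unavailable: its blocks are read from the second copy instead.
degraded = Counter(read_plan(placement, failed={"node2"}).values())

print("healthy :", dict(healthy))
print("degraded:", dict(degraded))
```

In this model the healthy run assigns three blocks to each of the four machines. With `node2` marked as failed, all twelve blocks are still readable, but the machine that holds the second copies carries a heavier load.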