Loading data from disk to memory is efficient when the SAS LASR Analytic
Server is co-located with a distributed data provider. The Hadoop
Distributed File System (HDFS) provided by SAS High-Performance Deployment
of Hadoop acts as a co-located data provider. HDFS offers some key benefits:
- Parallel I/O. The SAS LASR Analytic Server can read data in parallel at
  high rates from a co-located data provider.
- Data redundancy. By default, two copies of the data are stored in HDFS.
  If a machine in the cluster becomes unavailable or fails, the SAS LASR
  Analytic Server instance on another machine in the cluster retrieves the
  data from a redundant block and loads the data into memory.
- Homogeneous block distribution. HDFS stores files in blocks. The SAS
  implementation enables a homogeneous block distribution that results in
  balanced memory utilization across the machines in the SAS LASR Analytic
  Server cluster and reduces execution time.
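
The redundancy and block-distribution points can be illustrated with a small toy model in Python. It is only a sketch under stated assumptions: it does not reproduce the placement logic of HDFS or of the SAS implementation. The round-robin rule, the machine names, and the functions place_blocks and load_plan are all hypothetical and chosen for illustration.

```python
from collections import Counter


def place_blocks(num_blocks, machines, copies=2):
    """Toy round-robin placement (an assumption, not the SAS algorithm):
    copy k of block b is stored on machine (b + k) mod n."""
    n = len(machines)
    return {
        b: [machines[(b + k) % n] for k in range(copies)]
        for b in range(num_blocks)
    }


def load_plan(placement, available):
    """For each block, choose the first machine that holds a copy
    and is still available; that machine loads the block into memory."""
    plan = {}
    for block, holders in placement.items():
        live = [m for m in holders if m in available]
        if not live:
            raise RuntimeError(f"block {block} has no available copy")
        plan[block] = live[0]
    return plan


# Hypothetical four-machine cluster holding a file of eight blocks.
machines = ["node1", "node2", "node3", "node4"]
placement = place_blocks(8, machines)

# All machines up: each machine loads the blocks it holds as first copy.
healthy = load_plan(placement, set(machines))
print(Counter(healthy.values()))

# node2 is unavailable: its blocks are loaded from the redundant copies.
degraded = load_plan(placement, set(machines) - {"node2"})
print(Counter(degraded.values()))
```

In this model, every machine loads two blocks when the cluster is healthy, which is the balanced memory utilization that an even block distribution aims for. When node2 is removed, no block is lost: its two blocks are loaded by node3, which holds the second copies.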