Loading data from disk to memory is efficient when the SAS LASR Analytic Server is co-located with a distributed data provider. The Hadoop Distributed File System (HDFS) provided by SAS High-Performance Deployment of Hadoop acts as a co-located data provider. HDFS offers some key benefits:
- Parallel I/O. The SAS LASR Analytic Server can read data in parallel at high rates from a co-located data provider.
- Data redundancy. By default, two copies of the data are stored in HDFS. If a machine in the cluster becomes unavailable or fails, the SAS LASR Analytic Server instance on another machine in the cluster retrieves the data from a redundant block and loads the data into memory.
- Homogeneous block distribution. HDFS stores files in blocks. The SAS implementation enables a homogeneous block distribution that results in balanced memory utilization across the SAS LASR Analytic Server and reduces execution time.
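
The redundancy and block-distribution benefits above can be illustrated with a small toy model. The following Python sketch is not SAS code and does not reflect the actual placement algorithm, which this section does not describe. It assumes a simple round-robin placement with two copies per block; the names `place_blocks`, `load_plan`, and the `node1` to `node4` machine names are hypothetical.

```python
from collections import Counter


def place_blocks(num_blocks, machines, copies=2):
    """Assign each block to `copies` distinct machines in round-robin order.

    Assumption: round-robin placement stands in for a homogeneous
    block distribution; the real SAS/HDFS placement logic may differ.
    """
    if copies > len(machines):
        raise ValueError("need at least as many machines as copies")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            machines[(block + k) % len(machines)] for k in range(copies)
        ]
    return placement


def load_plan(placement, available):
    """For each block, pick the first machine that holds a copy and is available."""
    plan = {}
    for block, holders in placement.items():
        live = [m for m in holders if m in available]
        if not live:
            raise RuntimeError(f"block {block} has no available copy")
        plan[block] = live[0]
    return plan


machines = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
placement = place_blocks(8, machines)

# All machines healthy: every machine loads the same number of blocks.
healthy = Counter(load_plan(placement, set(machines)).values())

# node2 unavailable: its blocks are loaded from the redundant copies instead.
degraded = Counter(load_plan(placement, set(machines) - {"node2"}).values())

print("healthy: ", dict(healthy))
print("degraded:", dict(degraded))
```

In this model, an even placement gives each machine the same number of blocks to load, which corresponds to the balanced memory utilization described above. When one machine is removed, every block is still loadable from its second copy, at the cost of extra load on the machine that holds it.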