The SAS LASR Analytic Server
provides a client/server environment where the client connects to
the server, sends requests to the server, and receives results back
from the server. The server-side environment is a distributed computing
environment. A typical deployment is to use a series of blades in
a cluster. Both the hardware profile and the software installation
are homogeneous: the same operating system is used throughout, and
the same SAS software is installed on each blade that is used for
the server. So that the software on each blade can share the workload
and still act as a single server, the SAS software on each blade
implements the Message Passing Interface (MPI), which handles
communication between the blades.
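As a concrete illustration, the following sketch starts a server
instance with the LASR procedure. The host name, installation path,
port, and signature-file directory are assumed values for illustration,
not part of any specific deployment:

   /* Start a server instance on the cluster (illustrative values). */
   proc lasr create port=10010 path="/tmp/lasr";    /* signature files */
      performance host="grid001.example.com"        /* assumed head node */
                  install="/opt/TKGrid";            /* assumed install dir */
   run;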
After a client connection
is authenticated, the server performs the operations that the client
requests. Any authorized request (for example, a request for summary
statistics) executes, and once the server completes the request, no
trace of it remains on the server. Client requests are executed in
parallel across the blades, so responses are fast and the communication
appears seamless to the client.
There are two ways to
load data into a distributed server:
- Load data from tables and data sets. You can start a server
  instance and directly load tables into the server by using the
  SAS LASR Analytic Server engine or the LASR procedure from a SAS
  session that has a network connection to the cluster. Any data
  source that can be accessed with a SAS engine can be loaded into
  memory. The data is transferred to the root node, and the root
  node distributes the data to the worker nodes. You can also append
  rows to an in-memory table with the SAS LASR Analytic Server
  engine. (See the first sketch after this list.)
- Load tables from a co-located data provider:
  - Tables can be read from the Hadoop Distributed File System (HDFS)
    that is provided by SAS High-Performance Deployment of Hadoop.
    You can use the SAS Data in HDFS engine to add tables to HDFS.
    When a table is added to HDFS, it is divided into blocks that
    are distributed across the machines in the cluster. The server
    software is designed to read data in parallel from HDFS: when
    the LASR procedure reads data from HDFS, each worker node reads
    the blocks of data that are local to its machine. (See the
    second sketch after this list.)
  - Tables can also be read from a third-party vendor database. For
    distributed databases like Teradata and Greenplum, the SAS LASR
    Analytic Server can access the local data on each machine that
    is used for the database.
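To make the first loading method concrete, here is a minimal sketch
that uses the SAS LASR Analytic Server (SASIOLA) engine; the host,
port, tag, and the WORK.MORE_CARS table are assumptions for
illustration:

   /* Connect the SAS session to a running server. */
   libname lasrlib sasiola host="grid001.example.com" port=10010 tag=hps;

   /* Load a table: the rows travel to the root node, which
      distributes them to the worker nodes. */
   data lasrlib.cars;
      set sashelp.cars;
   run;

   /* Append rows to the in-memory table. */
   data lasrlib.cars(append=yes);
      set work.more_cars;   /* hypothetical table of new rows */
   run;

For the co-located HDFS case, a similar sketch, again with assumed
host, path, and port values:

   /* Point the SAS Data in HDFS (SASHDAT) engine at an HDFS directory. */
   libname hdfs sashdat host="grid001.example.com" install="/opt/TKGrid"
           path="/user/sasdemo";

   /* Add a table to HDFS; it is divided into blocks that are
      distributed across the machines in the cluster. */
   data hdfs.cars;
      set sashelp.cars;
   run;

   /* Load the table into the server; each worker node reads the
      blocks that are local to its machine. */
   proc lasr add data=hdfs.cars port=10010;
      performance host="grid001.example.com" install="/opt/TKGrid";
   run;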
Note: Like the HDFS deployment described above, deployments
that use a third-party vendor database are also distributed,
but different procedures and software components are used for
distributing and reading the data.
After the data is loaded
into memory on the server, it resides in memory until the table is
unloaded or the server terminates. Once a table is in memory, client
applications that are authorized to access it can send requests to
the server and receive results.
In-memory tables can
be saved. You can use the SAS LASR Analytic Server engine
to save an in-memory table as a SAS data set or as any other output
that a SAS engine can write (see the sketch below). For large tables,
saving to HDFS is supported
with the LASR procedure and the SAS Data in HDFS engine.
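As a sketch of the save path, the following copies an in-memory table
back out through the SAS LASR Analytic Server engine as an ordinary
SAS data set; the library and directory names are assumptions:

   libname lasrlib sasiola host="grid001.example.com" port=10010 tag=hps;
   libname archive "/data/archive";   /* assumed output directory */

   /* Write the in-memory table to a SAS data set on disk. */
   data archive.cars;
      set lasrlib.cars;
   run;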