The server provides a client/server environment in which clients connect to the server, send requests, and receive results. The server-side environment is a distributed computing environment: a typical deployment uses a series of blades in a cluster. The hardware profile is homogeneous, and so is the software installation. The same operating system runs on every blade, and the same SAS software is installed on each blade that is used for the server.
So that the software on each blade can share the workload and still act as a single server, the SAS software that is installed on each blade implements the Message Passing Interface (MPI), which is used for communication between the blades.
After a client connection is authenticated, the server performs the operations that the client requests. Any authorized request (for example, a request for summary statistics) is executed, and after the server completes the request, no trace of the request remains on the server. Client requests are executed in parallel across the blades, and communication with the server is practically instantaneous and seamless from the client's perspective.
There are two ways to
load data into a distributed server:
- Load data from tables and data sets. You can start a server instance and directly load tables into the server by using the SAS LASR Analytic Server engine or the LASR procedure from a SAS session that has a network connection to the cluster. Any data source that can be accessed with a SAS engine can be loaded into memory. The data is transferred to the root node of the server, and the root node distributes the data to the worker nodes. You can also append rows to an in-memory table with the SAS LASR Analytic Server engine. (A sketch of this approach appears after the note below.)
- Load tables from a co-located data provider:
  - Tables can be read from the Hadoop Distributed File System (HDFS) or an NFS-mounted distributed file system. You can use the SASHDAT engine to add tables to HDFS. When a table is added to HDFS, it is divided into blocks that are distributed across the machines in the cluster. The server software is designed to read data in parallel from HDFS: when the LASR procedure reads a table from HDFS, each worker node reads the blocks that are local to its machine. (A sketch of this approach also appears after the note below.)
  - Tables can also be read from a third-party vendor database. For distributed databases such as Teradata and Greenplum, the SAS LASR Analytic Server can access the data in the appliance.
Note: The preceding figure shows
a distributed architecture that uses HDFS. For deployments that use
a third-party vendor database, the architecture is also distributed,
but different procedures and software components are used for distributing
the data.
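As a rough illustration of the first approach, the following sketch starts a server instance with the LASR procedure and then loads and appends to a table through the SAS LASR Analytic Server (SASIOLA) engine. The host name, installation path, port, and server tag are placeholder assumptions, not values from this document.

```sas
/* Environment options that identify the cluster; the host name and
   TKGrid location are placeholder assumptions. */
option set=GRIDHOST="grid001.example.com";
option set=GRIDINSTALLLOC="/opt/TKGrid";

/* Start a server instance that listens on port 10010. */
proc lasr create port=10010 path="/tmp" noclass;
   performance nodes=all;
run;

/* Connect with the SAS LASR Analytic Server engine and load a table.
   The rows are transferred to the root node, which distributes them
   to the worker nodes. */
libname lasrlib sasiola host="grid001.example.com" port=10010 tag="hps";

data lasrlib.cars;
   set sashelp.cars;
run;

/* Rows can also be appended to the in-memory table. */
data work.more_cars;
   set sashelp.cars(obs=10);
run;

proc append base=lasrlib.cars data=work.more_cars;
run;
```

For the co-located HDFS case, a sketch along these lines adds a table to HDFS with the SASHDAT engine and then loads it in parallel with the LASR procedure. The HDFS path is likewise a placeholder.

```sas
/* Add a table to HDFS; the SASHDAT engine divides it into blocks that
   are distributed across the machines in the cluster. */
libname hdat sashdat host="grid001.example.com"
        install="/opt/TKGrid" path="/hps";

data hdat.cars;
   set sashelp.cars;
run;

/* Load the table in parallel: each worker node reads the HDFS blocks
   that are local to its machine. */
proc lasr add data=hdat.cars port=10010;
   performance nodes=all;
run;
```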
After the data is loaded into memory on the server, it remains in memory until the table is unloaded or the server terminates. Once a table is in memory, client applications that are authorized to access it can send requests to the server and receive results.
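To make the request step concrete, the following minimal sketch connects from a SAS session and requests summary statistics for the in-memory table with the IMSTAT procedure. The connection details and table name carry over from the loading sketches above and remain assumptions.

```sas
/* Connect to the in-memory server; host, port, and tag are
   placeholder assumptions. */
libname lasrlib sasiola host="grid001.example.com" port=10010 tag="hps";

/* Request summary statistics; the computation runs in parallel on
   the worker nodes and only the results return to the client. */
proc imstat;
   table lasrlib.cars;
   summary msrp invoice;
quit;
```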
In-memory tables can be saved. You can use the SAS LASR Analytic Server engine to save an in-memory table as a SAS data set or in any other format that a SAS engine can write; this method transfers the data across the network connection. For large tables, saving to HDFS is supported with the LASR and IMSTAT procedures; this strategy saves the data in parallel and keeps the data on the cluster. (A sketch of both approaches follows.)
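As a sketch of both save paths (reusing the assumed connection from the earlier examples): the engine-based save reads the in-memory table across the network and writes an ordinary SAS data set, while the SAVE statement of the IMSTAT procedure writes to HDFS in parallel. The output locations are placeholders.

```sas
/* Save across the network as a SAS data set by reading the in-memory
   table through the SAS LASR Analytic Server engine. */
libname outlib "/data/archive";   /* placeholder directory */

data outlib.cars_copy;
   set lasrlib.cars;
run;

/* For large tables, save to HDFS in parallel; the data stays on the
   cluster. The path is a placeholder. */
proc imstat;
   table lasrlib.cars;
   save path="/hps/archive" replace;
quit;
```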