The server provides a client/server environment: the client connects to the server, sends requests, and receives results. The server-side environment is a distributed computing environment, typically deployed on a series of blades in a cluster. Both the hardware profile and the software installation are homogeneous: the same operating system runs throughout, and the same SAS software is installed on each blade that is used for the server. So that the software on each blade can share the workload and still act as a single server, the SAS software installed on each blade implements the Message Passing Interface (MPI), which is used to enable communication between the blades.
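As a hedged sketch of how such a server instance might be started, the following PROC LASR step creates a distributed server. The port, signal-file path, host name, and installation path are placeholder values, not values taken from this document.

   /* Start a distributed SAS LASR Analytic Server instance.       */
   /* Port, path, host, and install values are placeholders.       */
   proc lasr create port=10010 path="/tmp";
      performance host="grid001.example.com" install="/opt/TKGrid";
   run;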
After a client connection is authenticated, the server performs the operations requested by the client. Any authorized request (for example, a request for summary statistics) is executed, and after the server completes the request, no trace of the request remains. Client requests are executed in parallel across the cluster, so communication with the server is fast and appears seamless to the client.
There are two ways to load data into a distributed server:
- Load data from tables and data sets. You can start a server instance and directly load tables into the server by using the SAS LASR Analytic Server engine or the LASR procedure from a SAS session that has a network connection to the cluster. Any data source that can be accessed with a SAS engine can be loaded into memory. The data is transferred to the root node of the server, and the root node distributes the data to the worker nodes. You can also append rows to an in-memory table with the SAS LASR Analytic Server engine (see the first sketch after this list).
- Load tables from a co-located data provider:
  - Tables can be read from the Hadoop Distributed File System (HDFS). You can use the SAS Data in HDFS engine to add tables to HDFS. When a table is added to HDFS, it is divided into blocks that are distributed across the machines in the cluster. The server software is designed to read data in parallel from HDFS: when the LASR procedure reads a table from HDFS, each worker node reads the blocks of data that are local to its machine (see the sketch after the note below).
  - Tables can also be read from a third-party vendor database. For distributed databases such as Teradata and Greenplum, the SAS LASR Analytic Server can access the local data on each machine that is used for the database.
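To make the first loading method concrete, here is a minimal sketch that uses the SAS LASR Analytic Server (SASIOLA) engine to load a table into memory and then append rows to it. The host, port, tag, and table names are hypothetical.

   /* Assign a libref with the SAS LASR Analytic Server engine.    */
   /* Host, port, and tag are placeholder values.                  */
   libname lasr sasiola host="grid001.example.com" port=10010 tag=sales;

   /* Load a SAS data set into server memory. The data goes to the */
   /* root node, which distributes it to the worker nodes.         */
   data lasr.orders;
      set work.orders;
   run;

   /* Append rows to the in-memory table. */
   data lasr.orders(append=yes);
      set work.new_orders;
   run;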
Note: The preceding figure shows a distributed architecture that uses HDFS. For deployments that use a third-party vendor database, the architecture is also distributed, but different procedures and software components are used for distributing and reading the data.
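As a sketch of the HDFS path just described, the following steps add a table to HDFS with the SAS Data in HDFS (SASHDAT) engine and then load it into a running server in parallel with the LASR procedure. The host name, installation path, HDFS path, and port are placeholders.

   /* Add a table to HDFS with the SAS Data in HDFS engine.        */
   libname hdfs sashdat host="grid001.example.com"
           install="/opt/TKGrid" path="/user/sasdemo";

   data hdfs.orders;
      set work.orders;
   run;

   /* Load the table into the server. Each worker node reads the   */
   /* HDFS blocks that are local to its machine.                   */
   proc lasr add data=hdfs.orders port=10010;
      performance host="grid001.example.com";
   run;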
After the data is loaded into memory on the server, it resides there until the table is unloaded or the server terminates. Once a table is in memory, client applications that are authorized to access it can send requests to the server and receive the results.
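As an illustration of a client request such as the summary statistics example mentioned earlier, a SAS session might use the IMSTAT procedure against an in-memory table. The libref, table, and column names are the hypothetical ones from the earlier sketches.

   /* Request summary statistics for a column of an in-memory table. */
   proc imstat;
      table lasr.orders;
      summary amount;
   run;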
In-memory tables can be saved. You can use the SAS LASR Analytic Server engine to save an in-memory table as a SAS data set or as any other output that a SAS engine can write. This method transfers the data across the network connection. For large tables, saving to HDFS is supported with the LASR and IMSTAT procedures; this strategy saves the data in parallel and keeps the data on the cluster.
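The two saving strategies might look like the following sketch. A DATA step with the SAS LASR Analytic Server engine writes a SAS data set across the network, while the IMSTAT SAVE statement writes the table to HDFS in parallel. The librefs, table names, and paths are placeholders.

   /* Save the in-memory table as a SAS data set. This transfers   */
   /* the data across the network connection.                      */
   data mylib.orders_backup;
      set lasr.orders;
   run;

   /* For large tables, save to HDFS in parallel; the data stays   */
   /* on the cluster.                                              */
   proc imstat;
      table lasr.orders;
      save path="/user/sasdemo" copies=1;
   run;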