The SAS LASR Analytic Server
is an analytic platform that provides a secure, multi-user environment
for concurrent access to data that is loaded into memory. The server
can take advantage of a distributed computing environment by distributing
data and the workload among multiple machines and performing massively
parallel processing. The server can also be deployed on a single machine
when the workload and data volumes do not demand a distributed computing
environment.
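
The idea of distributing data and workload can be illustrated with a minimal Python sketch. This is not the server's API: the function names, the round-robin placement rule, and the use of threads as stand-ins for machines are all assumptions made for illustration. Each worker computes a partial result on its own partition, and the partial results are then combined.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_workers):
    """Split rows round-robin into n_workers partitions (hypothetical placement rule)."""
    return [rows[i::n_workers] for i in range(n_workers)]


def partial_stats(part):
    """Work done locally on one partition: return (sum, count)."""
    return sum(part), len(part)


def distributed_mean(rows, n_workers=4):
    """Compute a mean by combining per-partition partial results.

    Threads stand in for machines here; n_workers=1 mimics a
    single-machine deployment.
    """
    parts = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, parts))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


values = list(range(1, 101))
mean_distributed = distributed_mean(values, n_workers=4)
mean_single = distributed_mean(values, n_workers=1)
```

In this sketch, both calls return the same mean; only the way the work is divided changes.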

The server handles both big data and smaller sets of data, and it is built on
high-performance, multi-threaded analytic code. The server processes client
requests quickly because its hardware and software are designed for rapid
access to tables in memory. By loading tables into memory for analytic
processing, the server enables business analysts to explore data and discover
relationships in data at in-memory speeds.

The server can also perform text analysis on unstructured data. The
unstructured data is loaded into memory in the form of a table, with one
document in each row. The TEXTPARSE statement in the IMSTAT procedure can then
provide analysis similar to what is available with the HPTMINE procedure.
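
The one-document-per-row layout can be pictured with a small Python sketch. It is not the TEXTPARSE statement or its algorithm: the tokenizer, the function names, and the sample documents are hypothetical, and the sketch shows only how a table of documents can be turned into term counts for each document.

```python
import re
from collections import Counter

# Hypothetical in-memory table: one document per row as (doc_id, text).
documents = [
    (1, "The server loads tables into memory."),
    (2, "Tables in memory are fast to query."),
]


def parse_terms(text):
    """Very simple tokenizer (an assumption): lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_by_document(rows):
    """Return {term: {doc_id: count}} built from one-document-per-row input."""
    table = {}
    for doc_id, text in rows:
        for term, count in Counter(parse_terms(text)).items():
            table.setdefault(term, {})[doc_id] = count
    return table


tbd = term_by_document(documents)
```

Here `tbd["memory"]` lists both documents, while `tbd["server"]` lists only the first.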

The analytic platform that the server provides can also be used to create a
recommender system. Creating recommender systems introduces the concept of an
application in the server. The recommender system contains the application
and might contain four or five tables. Each of the tables can be used
in different ways, depending on the task and which method you apply.
For example, making an item-based prediction for a nearest-neighbor
method requires different data structures than a singular-value decomposition.
You can associate a particular method or a set of methods with the
application. You can execute one method or an ensemble. The flexibility
provided by the server enables you to add and drop methods from the
application. As a modeler, you want to explore and evaluate different
methods and different parameter configurations for the methods until
you have optimized the system for your purposes. Then, you can deploy
the recommender system in an online scoring environment.
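
The relationship between an application, its tables, and its methods can be sketched in Python. Everything here is hypothetical and is not the server's interface: the class `RecommenderApp`, the table names, and the sample ratings are invented for illustration. An item-based nearest-neighbor method uses a table of item rating vectors, and a simple item-mean baseline (standing in for heavier methods such as a singular-value decomposition) uses a different table.

```python
import math

# Hypothetical ratings as (user, item, rating) rows.
ratings = [
    ("ann", "A", 5), ("ann", "B", 4),
    ("bob", "A", 4), ("bob", "B", 5), ("bob", "C", 2),
    ("cat", "A", 1), ("cat", "C", 5),
]


def build_item_vectors(rows):
    """Table for the nearest-neighbor method: {item: {user: rating}}."""
    vectors = {}
    for user, item, rating in rows:
        vectors.setdefault(item, {})[user] = rating
    return vectors


def build_item_means(rows):
    """Table for the baseline method: {item: mean rating}."""
    return {i: sum(v.values()) / len(v) for i, v in build_item_vectors(rows).items()}


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def knn_predict(tables, user, item, k=2):
    """Item-based prediction: similarity-weighted mean of the user's ratings
    on the k items most similar to the target item."""
    vectors = tables["item_vectors"]
    scored = sorted(
        ((cosine(vectors[item], vec), vec[user])
         for other, vec in vectors.items() if other != item and user in vec),
        reverse=True,
    )[:k]
    weight = sum(s for s, _ in scored)
    if weight == 0:  # no usable neighbors: fall back to the item mean
        return tables["item_means"][item]
    return sum(s * r for s, r in scored) / weight


def mean_predict(tables, user, item):
    """Baseline prediction: the item's mean rating."""
    return tables["item_means"][item]


class RecommenderApp:
    """Hypothetical application: a set of tables plus named methods."""

    def __init__(self, rows):
        self.tables = {"item_vectors": build_item_vectors(rows),
                       "item_means": build_item_means(rows)}
        self.methods = {}

    def add_method(self, name, fn):
        self.methods[name] = fn

    def drop_method(self, name):
        del self.methods[name]

    def predict(self, user, item, method):
        return self.methods[method](self.tables, user, item)

    def ensemble(self, user, item):
        """Average the predictions of all methods currently attached."""
        if not self.methods:
            raise ValueError("no methods attached to the application")
        preds = [fn(self.tables, user, item) for fn in self.methods.values()]
        return sum(preds) / len(preds)


app = RecommenderApp(ratings)
app.add_method("item_knn", knn_predict)
app.add_method("item_mean", mean_predict)
knn_score = app.predict("ann", "C", "item_knn")
ensemble_score = app.ensemble("ann", "C")
```

Dropping a method with `app.drop_method("item_knn")` changes what `ensemble` averages without rebuilding the tables, which is the kind of experimentation a modeler would do before deployment.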

The architecture for the server was originally designed for optimal
performance in a distributed computing environment. A distributed server runs
on multiple machines. A typical distributed configuration uses a series of
blades as a cluster. Each blade contains both local storage and large amounts
of memory. Local storage is used to store large data sets in distributed
form. Data is loaded into memory and made available so that clients
can quickly access that data.

For distributed deployments, local storage on each machine is critical for
storing large data sets in distributed form. The server supports the
Hadoop Distributed File System (HDFS) as a co-located data provider.
HDFS is used because the server can read from and write to HDFS in
parallel. In addition, HDFS provides replication for data redundancy.
HDFS stores data as blocks in distributed form on the blades, and the
replication provides failover capabilities.
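
How block replication supports failover can be shown with a toy Python model. It is an assumption-laden sketch, not HDFS itself: the round-robin placement rule, the blade names, and the function names are hypothetical, and real HDFS uses its own placement policy. The model places each block on several blades and then checks which blocks remain readable after blades fail.

```python
def place_blocks(n_blocks, blades, replication=3):
    """Assign each block to `replication` distinct blades, round-robin.

    Hypothetical placement rule for illustration only.
    """
    if replication > len(blades):
        raise ValueError("replication factor exceeds number of blades")
    return {
        block: [blades[(block + r) % len(blades)] for r in range(replication)]
        for block in range(n_blocks)
    }


def readable_blocks(placement, failed):
    """Blocks that still have at least one copy on a surviving blade."""
    return {
        block for block, holders in placement.items()
        if any(h not in failed for h in holders)
    }


blades = ["blade1", "blade2", "blade3", "blade4"]
placement = place_blocks(8, blades, replication=2)

# One failed blade: every block still has a surviving copy.
after_one_failure = readable_blocks(placement, {"blade2"})

# Two failed blades that hold both copies of some blocks: those blocks are lost.
after_two_failures = readable_blocks(placement, {"blade2", "blade3"})
```

With two copies of each block, the model survives any single blade failure; a higher replication factor would be needed to survive the two-blade failure shown last.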

In a distributed deployment, the server also supports some third-party
vendor databases as co-located data providers. Teradata Data Warehouse
Appliance and Greenplum Data Computing Appliance are massively parallel
processing database appliances. You can install the SAS LASR Analytic Server
software on each of the machines in either appliance. The server can
read in parallel from the local data on each machine.

As of the SAS LASR Analytic Server 1.6 release (concurrent with the
SAS Visual Analytics 6.1 release), the server supports a non-distributed
deployment. A non-distributed server can perform the same in-memory analytic
operations as a distributed server. However, a non-distributed deployment
does not support parallel I/O from HDFS or third-party vendor appliances.