Glossary
- Apache Hadoop
- a framework that allows for the distributed processing
of large data sets across clusters of computers using a simple programming
model.
- BY-group processing
- the process of using the BY statement to process
observations that are ordered, grouped, or indexed according to the
values of one or more variables. Many SAS procedures and the DATA
step support BY-group processing. For example, you can use BY-group
processing with the PRINT procedure to print separate reports for
different groups of observations in a single SAS data set.
- co-located data provider
- a distributed data source, such as SAS Visual
Analytics Hadoop or a third-party vendor database, that has SAS High-Performance
Analytics software installed on the same machines. The SAS software
on each machine processes the data that is local to the machine or
that the data source makes available as the result of a query.
- grid host
- the machine to which the SAS client makes an initial
connection in a SAS High-Performance Analytics application.
- Hadoop Distributed File System
- a framework for managing files as blocks of equal
size, which are replicated across the machines in a Hadoop cluster
to provide fault tolerance.
- HDFS
- See Hadoop Distributed File System.
- Message Passing Interface
- a message-passing library interface specification.
SAS High-Performance Analytics applications implement MPI for use
in high-performance computing environments.
- MPI
- See Message Passing Interface.
- root node
- in a SAS High-Performance Analytics application,
the role of the software that distributes and coordinates the workload
of the worker nodes. In most deployments the root node runs on the
machine that is identified as the grid host. SAS High-Performance
Analytics applications assign the highest MPI rank to the root node.
- SASHDAT file
- the data format used for tables that are added
to HDFS by SAS. SASHDAT files are read in parallel by the server.
- server description file
- a file that is created by a SAS client when the
LASR procedure executes to create a server. The file contains information
about the machines that are used by the server. It also contains the
name of the server signature file that controls access to the server.
- signature file
- a small file that is created by the server to
control access to the server and to the tables loaded in the server.
There is one server signature file for each server instance. There
is one table signature file for each table that is loaded into memory
on a server instance.
- worker node
- in a SAS High-Performance Analytics application,
the role of the software that receives the workload from the root
node.
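
The BY-group processing entry above mentions printing separate reports for different groups of observations with the PRINT procedure. A minimal sketch of that pattern follows. It assumes that the SASHELP.CLASS sample data set is available in your SAS session; the output data set name WORK.CLASS_SORTED is hypothetical and chosen only for illustration. The data is sorted first because BY-group processing expects observations that are ordered, grouped, or indexed by the BY variable.

```sas
/* Sort a copy of the sample data so that observations are   */
/* ordered by the BY variable (Sex).                         */
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

/* Print one report section for each value of Sex.           */
proc print data=work.class_sorted;
   by sex;
run;
```

If the input data set is already sorted or indexed by the BY variable, the SORT step is typically unnecessary.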
Copyright © SAS Institute Inc. All Rights Reserved.