Glossary
- activity
-
See workflow process activity
- analytical model
-
a statistical model that is designed to perform
a specific task or to predict the probability of a specific event.
- attribute
-
See variable attribute
- baseline
-
the initial performance prediction against which
the output data from later tasks are compared.
- bin
-
a grouping of predictor variable values that is
used for frequency analysis.
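As a minimal sketch of binning for frequency analysis, the following Python groups predictor values into equal-width bins and counts the observations in each bin. The function name `bin_frequencies`, the choice of equal-width bins, and the sample values are illustrative assumptions, not part of any SAS product.

```python
from collections import Counter

def bin_frequencies(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when all values are equal
    counts = Counter()
    for v in values:
        # Clamp the index so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [counts[i] for i in range(n_bins)]

ages = [21, 25, 34, 38, 42, 47, 55, 61]  # hypothetical predictor values
freq = bin_frequencies(ages, 4)          # → [2, 2, 2, 2]
```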
- candidate model
-
a predictive model whose predictive power is
evaluated against the champion model's predictive power.
- challenger model
-
a model that is compared and assessed against
a champion model for the purpose of replacing the champion model in
a production scoring environment.
- champion model
-
the best predictive model that is chosen from
a pool of candidate models in a data mining environment.
- classification model
-
a predictive model that has a categorical, ordinal,
or binary target.
- clustering model
-
a model in which data sets are divided into mutually
exclusive groups in such a way that the observations for each group
are as close as possible to one another, and different groups are
as far as possible from one another.
- component files
-
the files that define a predictive model. Component
files can be SAS programs or data sets, XML files, log files, SPK
files, or CSV files.
- data model training
-
the process of building a predictive model from
data.
- data set
-
See SAS data set
- data source
-
a table, view, or file from which you will extract
information. Sources can be in any format that SAS can access, on
any supported hardware platform. The metadata for a source is typically
an input to a job.
- DATA step
-
in a SAS program, a group of statements that begins
with a DATA statement and that ends with either a RUN statement, another
DATA statement, a PROC statement, or the end of the job. The DATA
step enables you to read raw data or other SAS data sets and to create
SAS data sets.
- DATA step fragment
-
a block of SAS code that does not begin with a
DATA statement. In SAS Model Manager, all SAS Enterprise Miner models
use DATA step fragments in their score code.
- diagram
-
See process flow diagram
- file reference
-
See fileref
- fileref
-
a name that is temporarily assigned to an external
file or to an aggregate storage location such as a directory or a
folder. The fileref identifies the file or the storage location to
SAS.
- format
-
See SAS format
- Gini coefficient
-
a benchmark statistic that is a measure of the
inequality of distribution, and that can be used to summarize the
predictive accuracy of a model.
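One common formulation for a binary model, assumed here, relates the Gini coefficient to the area under the ROC curve (AUC) as Gini = 2 * AUC - 1. The sketch below computes AUC by comparing every event score with every non-event score. The function name and sample data are hypothetical, and a given product may compute the statistic differently.

```python
def gini_coefficient(scores, labels):
    """Gini = 2 * AUC - 1, with AUC computed by pairwise comparison."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    # AUC is the share of (event, non-event) pairs in which the event
    # receives the higher score; tied pairs count as half.
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in non_events
    )
    auc = wins / (len(events) * len(non_events))
    return 2 * auc - 1

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]  # hypothetical model scores
labels = [1, 1, 0, 1, 0, 0]              # 1 = event, 0 = non-event
gini = gini_coefficient(scores, labels)  # 8 of 9 pairs ranked correctly → 7/9
```

A value near 1 indicates that the model ranks events above non-events almost perfectly. A value near 0 indicates ranking no better than chance.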
- hold-out data
-
a portion of the historical data that is set aside
during model development. Hold-out data can be used as test data to
benchmark the fit and accuracy of the emerging predictive model.
- informat
-
See SAS informat
- input variable
-
a variable that is used in a data mining process
to predict the value of one or more target variables.
- instance
-
See workflow process instance
- Kolmogorov-Smirnov chart
-
a chart that shows the maximum vertical separation,
or deviation, between the cumulative distributions
of events and non-events.
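The statistic behind such a chart can be sketched as follows: walk through the observations in descending score order, track the cumulative share of events and of non-events, and record the largest gap between the two. The function name and data are hypothetical. The sketch assumes distinct scores and at least one event and one non-event.

```python
def ks_statistic(scores, labels):
    """Maximum gap between the cumulative distributions of events and non-events."""
    n_events = sum(labels)
    n_non_events = len(labels) - n_events
    cum_events = cum_non_events = 0
    max_gap = 0.0
    # Process observations from the highest score to the lowest.
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            cum_events += 1
        else:
            cum_non_events += 1
        gap = abs(cum_events / n_events - cum_non_events / n_non_events)
        max_gap = max(max_gap, gap)
    return max_gap

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]  # hypothetical model scores
labels = [1, 1, 0, 1, 0, 0]              # 1 = event, 0 = non-event
ks = ks_statistic(scores, labels)        # largest gap is 2/3
```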
- library reference
-
See libref
- libref
-
a SAS name that is associated with the location
of a SAS library. For example, in the name MYLIB.MYFILE, MYLIB is
the libref, and MYFILE is a file in the SAS library.
- life cycle phase
-
a collection of milestones that complete a major
step in the process of selecting and monitoring a champion model.
Typical life cycle phases include development, test, production, and
retire.
- logistic regression
-
a form of regression analysis in which the target
variable (response variable) represents a binary-level, categorical,
or ordinal-level response.
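For the binary case, a fitted logistic regression turns a linear combination of inputs into an event probability through the logistic function. The sketch below illustrates that step only. The intercept, coefficients, and input values are hypothetical and do not come from any fitted model.

```python
import math

def predict_probability(intercept, coefficients, inputs):
    """Probability of the target event for one observation (binary target)."""
    linear = intercept + sum(c * x for c, x in zip(coefficients, inputs))
    # The logistic function maps any real number into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical model: linear term is -1.0 + 0.5 * 2.0 + 0.25 * 0.0 = 0.0
p = predict_probability(-1.0, [0.5, 0.25], [2.0, 0.0])  # → 0.5
```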
- macro variable
-
a variable that is part of the SAS macro programming
language. The value of a macro variable is a string that remains constant
until you change it. Macro variables are sometimes referred to as
symbolic variables.
- metadata
-
descriptive data about data that is stored and
managed in a database, in order to facilitate access to captured and
archived data for further use.
- milestone
-
a collection of tasks that complete a significant
event. The significant event can occur either in the process of selecting
a champion model, or in the process of monitoring a champion model
that is in a production environment.
- model assessment
-
the process of determining how well a model predicts
an outcome.
- model function
-
the type of statistical model, such as classification,
prediction, or segmentation.
- model scoring
-
the process of applying a model to new data in
order to compute outputs.
- neural networks
-
a class of flexible nonlinear regression models,
discriminant models, data reduction models, and nonlinear dynamic
systems that often consist of a large number of neurons. These neurons
are usually interconnected in complex ways and are often organized
into layers.
- observation
-
a row in a SAS data set. All of the data values
in an observation are associated with a single entity such as a customer
or a state. Each observation contains either one data value or a missing-value
indicator for each variable.
- organizational folder
-
a folder in the SAS Model Manager Project Tree
that is used to organize project and document resources. An organizational
folder can contain zero or more organizational folders in addition
to other objects.
- output variable
-
in a data mining process, a variable that is computed
from the input variables as a prediction of the value of a target
variable.
- package
-
See SAS package file
- package file
-
See SAS package file
- participant
-
See workflow participant
- performance table
-
a table that contains response data that is collected
over a period of time. Performance tables are used to monitor the
performance of a champion model that is in production.
- PFD
-
See process flow diagram
- PMML
-
See Predictive Modeling Markup Language
- prediction model
-
a model that predicts the outcome of an interval
target.
- Predictive Modeling Markup Language
-
an XML-based standard for representing data
mining results for scoring purposes. PMML enables the sharing and
deployment of data mining results between applications and across
data management systems. Short form: PMML.
- process
-
See workflow process
- process definition
-
See workflow process definition
- process flow diagram
-
a graphical sequence of interconnected symbols
that represent an ordered set of steps or tasks that, when combined,
form a process designed to yield an analytical result.
- profile data
-
information that consists of the model name, type,
length, label, format, level, and role.
- project
-
a collection of models, SAS programs, data tables,
scoring tasks, life cycle data, and reporting documents.
- Project Tree
-
a hierarchical structure made up of folders and
nodes that are related to a single folder or node one level above
it and to zero, one, or more folders or nodes one level below it.
- property
-
any of the characteristics of a component that
collectively determine the component's appearance and behavior. Examples
of types of properties are attributes and methods.
- publication channel
-
an information repository that has been established
using the SAS Publishing Framework and that can be used to publish
information to users and applications.
- Receiver Operating Characteristic chart
-
a chart that plots the sensitivity of binary data
values against 1 - specificity of binary data values. A ROC chart
is used to assess a model's predictive performance. Short form: ROC.
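The points of a ROC curve can be sketched by lowering the score cutoff one observation at a time and recording the sensitivity (true positive rate) against 1 - specificity (false positive rate). The function name and data below are hypothetical. The sketch assumes distinct scores and at least one observation of each class.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per score cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cutoff above every score: nothing is flagged
    tp = fp = 0
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1  # one more event correctly flagged
        else:
            fp += 1  # one more non-event incorrectly flagged
        points.append((fp / negatives, tp / positives))
    return points

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]  # hypothetical model scores
labels = [1, 1, 0, 1, 0, 0]              # 1 = event, 0 = non-event
points = roc_points(scores, labels)      # runs from (0.0, 0.0) to (1.0, 1.0)
```

A curve that rises quickly toward the upper-left corner indicates a model that separates events from non-events well.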
- ROC
-
See Receiver Operating Characteristic chart
- SAS code model
-
a SAS program or a DATA step fragment that computes
output values from input values. An example of a SAS code model is
the LOGISTIC procedure.
- SAS data set
-
a file whose contents are in one of the native
SAS file formats. There are two types of SAS data sets: SAS data files
and SAS data views. SAS data files contain data values in addition
to descriptor information that is associated with the data. SAS data
views contain only the descriptor information plus other information
that is required for retrieving data values from other SAS data sets
or from files whose contents are in other software vendors' file formats.
- SAS format
-
a type of SAS language element that applies a
pattern to or executes instructions for a data value to be displayed
or written as output. Types of formats correspond to the data's type:
numeric, character, date, time, or timestamp. The ability to create
user-defined formats is also supported. Examples of SAS formats are
BINARY and DATE. Short form: format.
- SAS informat
-
a type of SAS language element that applies a
pattern to or executes instructions for a data value to be read as
input. Types of informats correspond to the data's type: numeric,
character, date, time, or timestamp. The ability to create user-defined
informats is also supported. Examples of SAS informats are BINARY
and DATE. Short form: informat.
- SAS Metadata Repository
-
a container for metadata that is managed by the
SAS Metadata Server.
- SAS package file
-
a container for data that has been generated or
collected for delivery to consumers by the SAS Publishing Framework.
Packages can contain SAS files, binary files, HTML files, URLs, text
files, viewer files, and metadata.
- SAS publication channel
-
See publication channel
- SAS variable
-
a column in a SAS data set or in a SAS data view.
The data values for each variable describe a single characteristic
for all observations (rows).
- scoring
-
See model scoring
- scoring function
-
a user-defined function that is created by the
SAS Scoring Accelerator from a scoring model and that is deployed
inside the database.
- scoring task
-
a process that executes a model's score code.
- scoring task input table
-
a table that contains the variables and data that
are used as input in a SAS Model Manager scoring task.
- scoring task output table
-
a table that contains the output variables and
data that result from performing a SAS Model Manager scoring task.
Before a scoring task is executed, the scoring task output table defines
the variables to keep as the scoring results.
- segmentation model
-
a model that identifies and forms segments, or
clusters, of individual observations that are associated with an
attribute of interest.
- source
-
See data source
- SPK
-
See SAS package file
- target event value
-
for binary models, the value of a target variable
that a model attempts to predict. In SAS Model Manager, the target
event value is a property of a model.
- target variable
-
a variable whose values are known in one or more
data sets that are available (in training data, for example) but whose
values are unknown in one or more future data sets (in a score data
set, for example). Data mining models use data from known variables
to predict the values of target variables.
- test table
-
a SAS data set that is used as input to a model
in order to test the accuracy of the model's output.
- training data
-
data that contains input values and target values
that are used to train and build predictive models.
- universal unique identifier
-
a number that is used to uniquely identify information
in distributed systems without significant central coordination. There
are 32 hexadecimal digits in a UUID, and these are divided into five
groups with hyphens between them as follows: 8-4-4-4-12. Altogether
the 16-byte (128-bit) canonical UUID has 32 digits and 4 hyphens,
or 36 characters.
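The layout described above can be checked with Python's standard `uuid` module. The sketch generates a random (version 4) UUID, which is an assumption for illustration; the glossary entry does not say which UUID version is used.

```python
import uuid

identifier = str(uuid.uuid4())             # random (version 4) UUID as text
groups = identifier.split("-")             # five hyphen-separated groups
group_lengths = [len(g) for g in groups]   # → [8, 4, 4, 4, 12]
hex_digits = sum(group_lengths)            # → 32
total_chars = len(identifier)              # → 36 (32 digits + 4 hyphens)
byte_count = len(uuid.UUID(identifier).bytes)  # → 16 bytes (128 bits)
```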
- UUID
-
See universal unique identifier
- variable
-
See SAS variable
- variable attribute
-
any of the following characteristics that are
associated with a particular variable: name, label, format, informat,
data type, and length.
- version folder
-
a folder in the Project Tree that typically represents
a time phase and that contains models, scoring tasks, life cycle data,
reports, documents, resources, and model performance output.
- view
-
a particular representation of a model’s
data.
- workflow
-
a model for a sequence of activities, declared
as work of a person, a group, an organization, or one or more mechanisms.
Workflows are generally designed to enable a work process that can
be documented and learned.
- workflow participant
-
a user, group, or role that is assigned to an
activity of a workflow instance.
- workflow process
-
a set of one or more linked activities which collectively
realize a business objective or policy goal, normally within the context
of an organizational structure defining functional roles and relationships.
- workflow process activity
-
a task or step in a workflow process.
- workflow process definition
-
an activated process template that is available
in the SAS Workflow Engine for use. Process definitions contain the
set of activities, participants, policies, statuses, and operands
that comprise a business task.
- workflow process instance
-
a running process in the SAS Workflow Engine or
a working version of a process definition.
Copyright © SAS Institute Inc. All rights reserved.