| Node Name | Description |
|---|---|
| Append | Use the Append node to append data sets that are exported by two different paths in a single process flow diagram. The Append node can also append train, validation, and test data sets into a new training data set. |
| Data Partition | Use the Data Partition node to partition an input data set into a training, test, and validation data set. The training data set is used for preliminary model fitting. The validation data set is used to monitor and tune the free model parameters during estimation. It is also used for model assessment. The test data set is an additional holdout data set that you can use for model assessment. |
| File Import | Use the File Import node to convert selected external flat files, spreadsheets, and database tables into a format that SAS Enterprise Miner recognizes as a data source and can use in data mining process flow diagrams. |
| Filter | Use the Filter node to create and apply filters to the input data. You can use filters to exclude certain observations, such as extreme outliers and errant data that you do not want to include in a mining analysis. |
| Input Data | The Input Data node represents the data source that you choose for a mining analysis. It provides details (metadata) about the variables in the data source that you want to use. |
| Merge | Use the Merge node to merge observations from two or more data sets into a single observation in a new data set. |
| Sample | Use the Sample node to extract a simple random sample, nth-observation sample, stratified sample, first-n sample, or cluster sample of an input data set. Sampling is recommended for extremely large databases because it can significantly decrease model training time. If the random sample sufficiently represents the source data set, then data relationships that SAS Enterprise Miner finds in the sample can be applied to the complete source data set. |
| Time Series | Use the Time Series node to convert transactional data to time series data to perform seasonal and trend analysis. Transactional data is timestamped data that is collected over time at no particular frequency. By contrast, time series data is timestamped data that is collected over time at a specific frequency. |
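The train/validation/test split that the Data Partition node performs can be illustrated with a short sketch. This is plain Python, not SAS Enterprise Miner code, and the 60/20/20 split fractions and the `partition` helper are illustrative assumptions only:

```python
import random

def partition(observations, train_frac=0.6, valid_frac=0.2, seed=42):
    """Randomly partition observations into train/validation/test sets."""
    rng = random.Random(seed)                       # fixed seed for reproducibility
    shuffled = observations[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)
    train = shuffled[:n_train]                      # preliminary model fitting
    valid = shuffled[n_train:n_train + n_valid]     # tuning free parameters during estimation
    test = shuffled[n_train + n_valid:]             # additional holdout for assessment
    return train, valid, test

train, valid, test = partition(list(range(100)))
print(len(train), len(valid), len(test))  # 60 20 20
```

Because the sets are disjoint, performance measured on the validation and test sets estimates how the model generalizes beyond the data it was fit on.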
| Node Name | Description |
|---|---|
| Association | Use the Association node to identify association and sequence relationships within the data. For example, “If a customer buys cat food, how likely is the customer to also buy cat litter?” In the case of sequence discovery, this question could be extended and posed as, “If a customer buys cat food today, how likely is the customer to buy cat litter within the next week?” |
| Cluster | Use the Cluster node to perform observation clustering, which can be used to segment databases. Clustering places objects into groups or clusters suggested by the data. The objects in each cluster tend to be similar to each other in some sense, and objects in different clusters tend to be dissimilar. |
| DMDB (Data Mining Database) | The DMDB node creates a data mining database that provides summary statistics and factor-level information for class and interval variables in the imported data set. Improvements to SAS Enterprise Miner have eliminated the previous need to use the DMDB node to optimize the performance of nodes. However, the DMDB database can still provide quick summary statistics for class and interval variables at a given point in a process flow diagram. |
| Graph Explore | The Graph Explore node is an advanced visualization tool that enables you to interactively explore large volumes of data to uncover patterns and trends and to reveal extreme values in the database. You can analyze univariate distributions, investigate multivariate distributions, create scatter and box plots, constellation and 3-D charts, and so on. |
| Market Basket | The Market Basket node performs association rule mining over transaction data in conjunction with item taxonomy. Market basket analysis uses the information from the transaction data to give you insight (for example, about which products tend to be purchased together). Market basket analysis is not limited to the retail marketing domain and can be abstracted to other areas such as word co-occurrence relationships in text documents. |
| MultiPlot | Use the MultiPlot node to visualize data from a wide range of perspectives. The MultiPlot node automatically creates bar charts and scatter plots for the input and target variables without requiring you to make several menu or window item selections. |
| Path Analysis | Use the Path Analysis node to analyze web log data and to determine the paths that visitors take as they navigate through a website. You can also use the node to perform sequence analysis. |
| SOM/Kohonen | Use the SOM/Kohonen node to perform unsupervised learning by using Kohonen vector quantization (VQ), Kohonen self-organizing maps (SOMs), or batch SOMs with Nadaraya-Watson or local-linear smoothing. Kohonen VQ is a clustering method, whereas SOMs are primarily dimension-reduction methods. |
| StatExplore | Use the StatExplore node to examine the statistical properties of an input data set. You can use the StatExplore node to compute standard univariate distribution statistics, to compute standard bivariate statistics by class target and class segment, and to compute correlation statistics for interval variables by interval input and target. |
| Variable Clustering | Variable clustering is a useful tool for data reduction and can remove collinearity, decrease variable redundancy, and help reveal the underlying structure of the input variables in a data set. When properly used as a variable-reduction tool, the Variable Clustering node can replace a large set of variables with the set of cluster components with little loss of information. |
| Variable Selection | Use the Variable Selection node to quickly identify input variables that are useful for predicting the target variable. |
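The Association node reports measures such as support and confidence for rules like the cat food/cat litter example above. As a rough illustration of those two measures (plain Python, not how SAS Enterprise Miner computes its rules; the `rule_metrics` helper and the toy baskets are invented for this sketch):

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support and confidence for the rule antecedent => consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    ante = sum(1 for t in transactions if antecedent in t)
    support = both / n        # fraction of all baskets containing both items
    confidence = both / ante  # estimate of P(consequent | antecedent)
    return support, confidence

baskets = [
    {"cat food", "cat litter"},
    {"cat food", "cat litter", "milk"},
    {"cat food", "bread"},
    {"milk", "bread"},
]
s, c = rule_metrics(baskets, "cat food", "cat litter")
print(s, c)  # support 0.5, confidence about 0.667
```

Here two of the four baskets contain both items (support 0.5), and two of the three cat-food baskets also contain cat litter (confidence 2/3).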
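The Cluster node's idea of placing objects into groups suggested by the data can be sketched with a toy one-dimensional k-means loop. This is illustrative Python only; SAS Enterprise Miner's clustering methods are more sophisticated, and the `kmeans_1d` helper and starting centers are assumptions of this sketch:

```python
def kmeans_1d(values, centers, iterations=10):
    """Assign each value to its nearest center, then recompute centers as cluster means."""
    for _ in range(iterations):
        clusters = {c: [] for c in centers}
        for v in values:
            nearest = min(centers, key=lambda c: abs(v - c))
            clusters[nearest].append(v)
        # a cluster keeps its old center if it received no members
        centers = [sum(m) / len(m) if m else c for c, m in clusters.items()]
    return sorted(centers)

values = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
final = kmeans_1d(values, centers=[0.0, 10.0])
print(final)  # approximately [1.0, 9.0]
```

The values near 1 and the values near 9 end up in separate clusters: similar objects within a cluster, dissimilar objects across clusters.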
| Node Name | Description |
|---|---|
| Drop | Use the Drop node to remove variables from data sets or hide variables from the metadata. You can drop specific variables and all variables of a particular type. |
| Impute | Use the Impute node to replace missing values. For example, you could replace missing values of an interval variable with the mean or with an M-estimator such as Andrew's Wave. Missing values for the training, validation, test, and score data sets are replaced using imputation statistics that are calculated from the active training predecessor data set. |
| Interactive Binning | The Interactive Binning node is an interactive grouping tool that you use to model nonlinear functions of multiple modes of continuous distributions. The interactive tool computes initial bins by quantiles. Then you can interactively split and combine the initial bins. This node enables you to select strong characteristics based on the Gini statistic and to group the selected characteristics based on business considerations. The node is helpful in shaping the data to represent risk-ranking trends rather than modeling quirks, which might lead to overfitting. |
| Principal Components | Use the Principal Components node to generate principal components. Principal components are uncorrelated linear combinations of the original input variables that depend on the covariance matrix or correlation matrix of the input variables. In data mining, principal components are usually used as the new set of input variables for subsequent analysis by modeling nodes. |
| Replacement | Use the Replacement node to generate score code to process unknown levels when scoring and also to interactively specify replacement values for class and interval levels. In some cases, you might want to reassign specified nonmissing values before performing imputation calculations for the missing values. |
| Rules Builder | The Rules Builder node opens the Rules Builder window so that you can create ad hoc sets of rules with user-definable outcomes. You can interactively define the values of the outcome variable and the paths to the outcome. This is useful, for example, in applying logic for posterior probabilities and scorecard values. Rules are defined using charts and histograms based on a sample of the data. |
| Transform Variables | Use the Transform Variables node to create new variables or variables that are transformations of existing variables in the data. Transformations are useful when you want to improve the fit of a model to the data. For example, transformations can be used to stabilize variances, remove nonlinearity, improve additivity, and correct non-normality in variables. The Transform Variables node also enables you to create interaction variables. |
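Mean imputation, the simplest of the Impute node's strategies, can be sketched as follows. This is illustrative Python rather than SAS code; the `mean_impute` helper is invented for the sketch. Note that, as the Impute description above specifies, the imputation statistic is computed from the training data only and then applied to the other data sets:

```python
def mean_impute(train, *other_sets):
    """Replace None values using the mean computed from the training set only."""
    observed = [x for x in train if x is not None]
    mean = sum(observed) / len(observed)   # imputation statistic from training data
    def fill(data):
        return [mean if x is None else x for x in data]
    return tuple(fill(d) for d in (train, *other_sets))

train = [1.0, None, 3.0]
valid = [None, 5.0]
train_filled, valid_filled = mean_impute(train, valid)
print(train_filled, valid_filled)  # [1.0, 2.0, 3.0] [2.0, 5.0]
```

Computing the statistic only on the training partition avoids leaking information from the validation, test, or score data into model fitting.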
| Node Name | Description |
|---|---|
| AutoNeural | Use the AutoNeural node as an automated tool to help you find optimal configurations for a neural network model. |
| Decision Tree | Use the Decision Tree node to fit decision tree models to the data. The implementation includes features that are found in a variety of popular decision tree algorithms such as CHAID, CART, and C4.5. The node supports both automatic and interactive training. When you run the Decision Tree node in automatic mode, it automatically ranks the input variables based on the strength of their contribution to the tree. This ranking can be used to select variables for use in subsequent modeling. You can override any automatic step with the option to define a splitting rule and prune explicit nodes or subtrees. Interactive training enables you to explore and evaluate a large set of trees as you develop them. |
| Dmine Regression | Use the Dmine Regression node to compute a forward stepwise least squares regression model. In each step, the independent variable that contributes maximally to the model R-square value is selected. |
| DMNeural | Use the DMNeural node to fit an additive nonlinear model. The additive nonlinear model uses bucketed principal components as inputs to predict a binary or an interval target variable. The algorithm that is used in DMNeural network training was developed to overcome problems that are common to neural networks and especially likely to occur when the data set contains highly collinear variables. |
| Ensemble | Use the Ensemble node to create new models by combining the posterior probabilities (for class targets) or the predicted values (for interval targets) from multiple predecessor models. One common ensemble approach is to use multiple modeling methods, such as a neural network and a decision tree, to obtain separate models from the same training data set. The component models from the two complementary modeling methods are integrated by the Ensemble node to form the final model solution. |
| Gradient Boosting | Gradient boosting creates a series of simple decision trees that together form a single predictive model. Each tree in the series is fit to the residual of the prediction from the earlier trees in the series. Each time the data is used to grow a tree, the accuracy of the tree is computed. The successive samples are adjusted to accommodate previously computed inaccuracies. Because each successive sample is weighted according to the classification accuracy of previous models, this approach is sometimes called stochastic gradient boosting. Boosting is defined for binary, nominal, and interval targets. |
| LARS (Least Angle Regression) | The LARS node can perform both variable selection and model-fitting tasks. When used for variable selection, the LARS node selects variables in a continuous fashion, where coefficients for each selected variable grow from zero to the variable's least squares estimate. With a small modification, you can use LARS to efficiently produce LASSO solutions. |
| MBR (Memory-Based Reasoning) | Use the MBR node to identify similar cases and to apply information that is obtained from these cases to a new record. The MBR node uses k-nearest neighbor algorithms to categorize or predict observations. |
| Model Import | Use the Model Import node to import and assess a model that was not created by one of the SAS Enterprise Miner modeling nodes. You can then use the Model Comparison node to compare the user-defined model with one or more models that you developed with a SAS Enterprise Miner modeling node. This process is called integrated assessment. |
| Neural Network | Use the Neural Network node to construct, train, and validate multilayer, feed-forward neural networks. By default, the Neural Network node automatically constructs a network that has one hidden layer consisting of three neurons. In general, each input is fully connected to the first hidden layer, each hidden layer is fully connected to the next hidden layer, and the last hidden layer is fully connected to the output. The Neural Network node supports many variations of this general form. |
| Partial Least Squares | The Partial Least Squares node is a tool for modeling continuous and binary targets. This node extracts factors called components or latent vectors that can be used to explain response variation or predictor variation in the analyzed data. |
| Regression | Use the Regression node to fit both linear and logistic regression models to the data. You can use continuous, ordinal, and binary target variables, and you can use both continuous and discrete input variables. The node supports the stepwise, forward, and backward selection methods. |
| Rule Induction | Use the Rule Induction node to improve the classification of rare events. The Rule Induction node creates a Rule Induction model that uses split techniques to remove the largest pure split node from the data. Rule Induction also creates binary models for each level of a target variable and ranks the levels from the most rare event to the most common. After all levels of the target variable are modeled, the score code is combined into a SAS DATA step. |
| SVM (Support Vector Machines) | A support vector machine (SVM) is a supervised machine learning method that is used to perform classification and regression analysis. The standard SVM problem solves binary classification problems that produce non-probability output (only the sign, +1/−1) by constructing a set of hyperplanes that maximize the margin between two classes. |
| TwoStage | Use the TwoStage node to build a sequential or concurrent two-stage model for predicting a class variable and an interval target variable at the same time. The interval target variable is usually a value that is associated with a level of the class target. |
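The core boosting idea described above — each simple tree is fit to the residuals of the trees before it — can be sketched for a regression target. This is a minimal illustration in plain Python, not the SAS Enterprise Miner implementation: the depth-1 "stump" trees, the learning rate of 0.5, and the `fit_stump`/`boost` helpers are all assumptions of this sketch, and it omits the subsampling that gives stochastic gradient boosting its name:

```python
def fit_stump(x, residuals):
    """Fit a depth-1 regression tree (stump): one split, one mean per side."""
    best = None
    for threshold in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lmean, rmean)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def boost(x, y, n_trees=20, learning_rate=0.5):
    """Each new stump is fit to the residuals of the ensemble built so far."""
    pred = [0.0] * len(x)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(learning_rate * s(xi) for s in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = boost(x, y)
print(round(model(2), 2), round(model(5), 2))  # close to 1.0 and 5.0
```

Each stump individually is a weak model; the series of residual corrections is what produces an accurate final prediction.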
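The k-nearest neighbor idea behind the MBR node — find the most similar past cases and let them vote on a new record — can be sketched as follows. This is illustrative Python, not the MBR node's implementation; the `knn_classify` helper, the Euclidean distance, and the toy points are assumptions of this sketch:

```python
from collections import Counter

def knn_classify(train_points, query, k=3):
    """Categorize a new record from the k most similar training cases."""
    def dist2(p, q):                       # squared Euclidean distance
        return sum((a - b) ** 2 for a, b in zip(p, q))
    nearest = sorted(train_points, key=lambda pl: dist2(pl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]      # majority vote among neighbors

cases = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_classify(cases, (1, 1)))  # A
print(knn_classify(cases, (5, 4)))  # B
```

For an interval target, the same neighbors would be averaged instead of voting, which is how k-nearest neighbor methods predict rather than categorize.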
| Node Name | Description |
|---|---|
| Cutoff | The Cutoff node provides tabular and graphical information to assist you in determining an appropriate probability cutoff point for decision making with binary target models. The establishment of a cutoff decision point entails the risk of generating false positives and false negatives, but an appropriate use of the Cutoff node can help minimize those risks. You typically run the node at least twice. In the first run, you obtain all the plots and tables. In subsequent runs, you can change the node properties until an optimal cutoff value is obtained. |
| Decisions | Use the Decisions node to define target profiles for a target in order to produce optimal decisions. The decisions are made using a user-specified decision matrix and output from a subsequent modeling procedure. |
| Model Comparison | Use the Model Comparison node to compare models and predictions from any of the modeling nodes. The comparison is based on the expected and actual profits or losses that would result from implementing the model. The node produces charts that help describe the usefulness of the model. |
| Score | Use the Score node to manage SAS scoring code that is generated from a trained model or models, to save the SAS scoring code to a location on the client computer, and to run the SAS scoring code. Scoring is the generation of predicted values for a data set that might not contain a target variable. |
| Segment Profile | Use the Segment Profile node to examine segmented or clustered data and identify factors that differentiate data segments from the population. The node generates various reports that aid in exploring and comparing the distribution of these factors within the segments and population. |
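The trade-off that the Cutoff node visualizes — moving the probability cutoff changes the balance of false positives and false negatives — can be shown in a few lines. This is plain Python for illustration only; the `confusion_at_cutoff` helper and the toy scores are invented for this sketch:

```python
def confusion_at_cutoff(probs, labels, cutoff):
    """Count false positives and false negatives for a binary target at a cutoff."""
    fp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 1)
    return fp, fn

probs  = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]   # predicted event probabilities
labels = [1,   1,   0,   1,   0,   0]     # actual binary outcomes
for cutoff in (0.3, 0.5, 0.7):
    print(cutoff, confusion_at_cutoff(probs, labels, cutoff))
# raising the cutoff trades false positives for false negatives:
# 0.3 -> (2, 0), 0.5 -> (1, 1), 0.7 -> (0, 1)
```

Sweeping the cutoff like this, over real costs for each error type, is essentially the search the Cutoff node supports across its repeated runs.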
| Node Name | Description |
|---|---|
| Control Point | Use the Control Point node to establish a control point within process flow diagrams. A control point simplifies distributing the connections between process flow steps that have multiple interconnected nodes. The Control Point node can reduce the number of connections that are made. |
| End Groups | The End Groups node is used only in conjunction with the Start Groups node. The End Groups node acts as a boundary marker that defines the end of group processing operations in a process flow diagram. Group processing operations are performed on the portion of the process flow diagram that exists between the Start Groups node and the End Groups node. If you specify Stratified, Bagging, or Boosting in the group processing function of the Start Groups node, then the End Groups node functions as a model node and presents the final aggregated model. |
| Ext Demo | The Ext Demo node illustrates the various controls that can be used in SAS Enterprise Miner extension nodes. These controls enable users to pass arguments to an underlying SAS program. By choosing an appropriate user interface control, an extension node developer can specify how information about the node's arguments is presented to the user and place restrictions on the values of the arguments. The Ext Demo node's results also provide examples of the various types of graphs that can be generated by an extension node using the %EM_REPORT macro. |
| Metadata | Use the Metadata node to modify the column metadata information (such as roles, measurement levels, and order) in a process flow diagram. |
| Reporter | The Reporter node uses SAS Output Delivery System (ODS) capability to create a single PDF or RTF file that contains information about the open process flow diagram. The report shows the SAS Enterprise Miner settings, process flow diagram, and detailed information for each node. The report also includes results such as variable selection, model diagnostic tables, and plots from the Results browser. The score code, log, and output listing are not included in the report; those items are found in the SAS Enterprise Miner package folder. |
| SAS Code | Use the SAS Code node to incorporate new or existing SAS code into process flows that you develop using SAS Enterprise Miner. |
| Score Code Export | The Score Code Export node is an extension for SAS Enterprise Miner that exports files that are necessary for score code deployment. Extensions are programmable add-ins for the SAS Enterprise Miner environment. |
| Start Groups | The Start Groups node is useful when the data can be segmented or grouped, and you want to process the grouped data in different ways. The Start Groups node uses BY-group processing as a method to process observations from one or more data sources that are grouped or ordered by values of one or more common variables. |
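BY-group processing, the mechanism the Start Groups node relies on, means sorting or grouping observations by a common variable and then processing each group separately. A rough analogue in plain Python (not SAS code; the `by_group_means` helper, the `region` key, and the toy rows are assumptions of this sketch):

```python
from itertools import groupby
from operator import itemgetter

def by_group_means(rows, by, value):
    """Process observations group by group, keyed on a BY variable."""
    ordered = sorted(rows, key=itemgetter(by))   # BY-group processing needs grouped/ordered input
    result = {}
    for key, group in groupby(ordered, key=itemgetter(by)):
        vals = [r[value] for r in group]
        result[key] = sum(vals) / len(vals)      # the per-group processing step
    return result

rows = [
    {"region": "east", "sales": 10},
    {"region": "west", "sales": 30},
    {"region": "east", "sales": 20},
]
print(by_group_means(rows, "region", "sales"))  # {'east': 15.0, 'west': 30.0}
```

Between Start Groups and End Groups, the same principle applies to whole modeling flows: the enclosed nodes run once per BY group instead of once for the full data.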
| Node Name | Description |
|---|---|
| Incremental Response | The Incremental Response node models the incremental impact of a treatment in order to optimize customer targeting for maximum return on investment. The Incremental Response node can determine the likelihood that a customer purchases a product or uses a coupon. It can predict the incremental revenue that is realized during a promotional period. |
| Survival | Survival data mining is the application of survival analysis to data mining problems concerning customers. The application to the business problem changes the nature of the statistical techniques. The issue in survival data mining is not whether an event will occur in a certain time interval, but when the next event will occur. The SAS Enterprise Miner Survival node performs survival analysis on mining customer databases when there are time-dependent outcomes. |
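The "incremental impact of a treatment" that the Incremental Response node models is, at its simplest, the difference between the response rate of treated customers and that of a comparable untreated (control) group. A minimal sketch in plain Python, not SAS Enterprise Miner code — the `incremental_response` helper and the toy outcome lists are invented for illustration:

```python
def incremental_response(treated, control):
    """Estimate treatment lift as the difference in response rates."""
    def rate(outcomes):
        return sum(outcomes) / len(outcomes)
    return rate(treated) - rate(control)

treated = [1, 1, 1, 0, 0]   # responses among customers who received the promotion
control = [1, 0, 0, 0, 0]   # responses among the holdout (control) group
print(round(incremental_response(treated, control), 2))  # 0.4
```

A 60% response rate among treated customers against a 20% baseline gives a 40-point incremental response, which is the quantity a campaign would want to maximize per targeting dollar.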