Model Nodes

AutoNeural

Use the AutoNeural node
as an automated tool to help you find optimal configurations for a
neural network model.

Decision Tree

Use the Decision Tree
node to fit decision tree models to the data. The implementation includes
features that are found in a variety of popular decision tree algorithms,
such as CHAID, CART, and C4.5. The node supports both automatic and
interactive training. When you run the Decision Tree node in automatic
mode, it automatically ranks the input variables based on the strength
of their contribution to the tree. This ranking can be used to select
variables for use in subsequent modeling. You can override any automatic
step by defining a splitting rule or by pruning explicit
nodes or subtrees. Interactive training enables you to explore and
evaluate a large set of trees as you develop them.
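
The following Python sketch illustrates the automatic ranking step. It scores each input by the largest Gini impurity reduction that input achieves in one binary split, then sorts the inputs by that score. This is a simplified, hypothetical stand-in. The function names, the Gini criterion, and the toy data are assumptions, and the Decision Tree node's actual splitting criteria and ranking statistic may differ.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest Gini reduction from a single binary split 'x <= threshold'."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Candidate thresholds: every distinct value except the largest.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_inputs(columns, labels):
    """Rank inputs (name -> list of numeric values) by their best single-split gain."""
    gains = {name: best_split_gain(vals, labels) for name, vals in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: income separates the classes perfectly, age does not.
data = {
    "income": [20, 25, 30, 60, 65, 70],
    "age": [30, 60, 45, 35, 50, 40],
}
target = ["no", "no", "no", "yes", "yes", "yes"]
ranking = rank_inputs(data, target)
print(ranking)  # income first (gain 0.5), then age (gain about 0.1)
```

Growing a full tree would repeat the same search in each child node until a stopping rule applies. That recursion is omitted here.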

Dmine Regression

Use the Dmine Regression
node to compute a forward stepwise least squares regression
model. In each step, the node selects the independent variable that
contributes the most to the model R-square value.
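
Forward stepwise selection by R-square can be sketched in a few lines of Python with NumPy. At each step, the sketch refits ordinary least squares with every remaining candidate and keeps the candidate that gives the largest R-square. The fixed `max_vars` limit is a hypothetical stopping rule, because the node's own stopping criteria are not described here.

```python
import numpy as np


def r_square(X, y):
    """R-square of an ordinary least squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def forward_stepwise(X, y, max_vars):
    """Greedily add the column that yields the largest R-square at each step."""
    selected, history = [], []
    for _ in range(max_vars):
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        if not candidates:
            break
        scores = {j: r_square(X[:, selected + [j]], y) for j in candidates}
        best = max(scores, key=scores.get)
        selected.append(best)
        history.append((best, scores[best]))
    return history


# Hypothetical data: x2 carries most of the signal, x0 a little, x1 and x3 none.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 2] + 1.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
history = forward_stepwise(X, y, max_vars=3)
for column, r2 in history:
    print(f"added x{column}: R-square = {r2:.3f}")
```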

DMNeural

Use the DMNeural node to
fit an additive nonlinear model. The additive nonlinear model uses
bucketed principal components as inputs to predict a binary or an
interval target variable. The algorithm that is used in DMNeural network
training was developed to overcome problems that common neural
networks are likely to encounter, especially when the data set contains
highly collinear variables.
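
The input side of the model, bucketed principal components, can be sketched as follows. The component count, the equal-frequency bucketing, and the function name are assumptions for illustration. The sketch does not fit the additive nonlinear model itself.

```python
import numpy as np


def bucketed_principal_components(X, n_components=2, n_buckets=4):
    """Project centered inputs onto leading principal components, then bucket each score."""
    Xc = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:n_components].T
    buckets = np.empty_like(scores, dtype=int)
    for k in range(n_components):
        # Interior quantile cut points split each component into equal-count buckets.
        edges = np.quantile(scores[:, k], np.linspace(0, 1, n_buckets + 1)[1:-1])
        buckets[:, k] = np.searchsorted(edges, scores[:, k], side="right")
    return scores, buckets


# Hypothetical data: three nearly collinear inputs that share one underlying signal.
rng = np.random.default_rng(1)
base = rng.normal(size=(100, 1))
X = np.hstack([base, 2 * base, base + 0.01 * rng.normal(size=(100, 1))])
scores, buckets = bucketed_principal_components(X)
print(np.bincount(buckets[:, 0]))  # rows per bucket on the first component
```

Principal components are uncorrelated by construction. Here the shared signal in the three raw inputs collapses onto the first component, which suggests why this kind of input transformation helps with highly collinear data.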

Ensemble

Use the Ensemble node
to create new models by combining the posterior probabilities (for
class targets) or the predicted values (for interval targets) from
multiple predecessor models. One common ensemble approach is to use
multiple modeling methods, such as a neural network and a decision
tree, to obtain separate models from the same training data set. The
Ensemble node then integrates the component models from the two complementary
modeling methods to form the final model solution.
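
One simple way to combine component models is to average their outputs row by row. The sketch below assumes plain averaging, which may not be the only combination function that the Ensemble node offers. The model outputs are hypothetical.

```python
def ensemble_average(*model_outputs):
    """Average component-model outputs row by row.

    Each argument is one model's scored rows: either dicts of posterior
    probabilities (class target) or floats (interval target).
    """
    n_models = len(model_outputs)
    combined = []
    for row in zip(*model_outputs):
        if isinstance(row[0], dict):
            # Class target: average the posterior probability of each level.
            combined.append({level: sum(r[level] for r in row) / n_models for level in row[0]})
        else:
            # Interval target: average the predicted values.
            combined.append(sum(row) / n_models)
    return combined


# Hypothetical scored rows from a neural network and a decision tree.
neural = [{"yes": 0.8, "no": 0.2}, {"yes": 0.3, "no": 0.7}]
tree = [{"yes": 0.6, "no": 0.4}, {"yes": 0.5, "no": 0.5}]
class_ensemble = ensemble_average(neural, tree)
interval_ensemble = ensemble_average([10.0, 20.0], [14.0, 16.0])
print(class_ensemble)     # approximately [{'yes': 0.7, 'no': 0.3}, {'yes': 0.4, 'no': 0.6}]
print(interval_ensemble)  # [12.0, 18.0]
```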

Gradient Boosting

Gradient boosting creates
a series of simple decision trees that together form a single predictive
model. Each tree in the series is fit to the residuals of the predictions
from the earlier trees in the series. Each time the data is used to
grow a tree, the accuracy of the tree is computed. The successive
samples are adjusted to accommodate previously computed inaccuracies.
Because each successive sample is weighted according to the classification
accuracy of previous models, this approach is sometimes called stochastic
gradient boosting. Boosting is defined for binary, nominal, and interval
targets.
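
The residual-fitting loop can be sketched for an interval target under squared-error loss, where the residual is the negative gradient. The sketch draws a random subsample for each tree, as in Friedman's stochastic gradient boosting, and uses one-split trees (stumps). The number of trees, the learning rate, and the subsample fraction are hypothetical values. Binary or nominal targets would need a different loss function.

```python
import numpy as np


def fit_stump(x, r):
    """Fit a one-split regression tree (stump) to residuals r on a 1-D input x."""
    best = None
    for t in np.unique(x)[:-1]:
        left = x <= t
        lval, rval = r[left].mean(), r[~left].mean()
        sse = np.sum((r[left] - lval) ** 2) + np.sum((r[~left] - rval) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, lval, rval)
    if best is None:  # no valid split: fall back to a constant leaf
        mean = r.mean()
        return lambda z: np.full(len(z), mean)
    _, t, lval, rval = best
    return lambda z: np.where(z <= t, lval, rval)


def gradient_boost(x, y, n_trees=200, learning_rate=0.1, subsample=0.5, seed=0):
    """Squared-error gradient boosting: each stump is fit to the current residuals."""
    rng = np.random.default_rng(seed)
    base = y.mean()
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - pred
        # Random subsample per tree, as in Friedman's stochastic gradient boosting.
        idx = rng.choice(len(y), size=int(subsample * len(y)), replace=False)
        stump = fit_stump(x[idx], residual[idx])
        trees.append(stump)
        pred = pred + learning_rate * stump(x)
    return lambda z: base + learning_rate * sum(s(z) for s in trees)


x = np.linspace(0, 10, 200)
y = np.sin(x)
model = gradient_boost(x, y)
mse = np.mean((model(x) - y) ** 2)
print(f"training MSE {mse:.4f} vs. target variance {np.var(y):.4f}")
```

The small learning rate makes each tree correct only part of the remaining residual. The usual trade-off is that more trees are needed for the same fit.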

LARS (Least Angle Regression)

The LARS node can perform
both variable selection and model-fitting tasks. When used for variable
selection, the LARS node selects variables in a continuous fashion:
the coefficients of the selected variables grow from zero toward their
least squares estimates. With a small modification, you
can use LARS to efficiently produce LASSO solutions.
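
The basic LARS path can be sketched in NumPy. At each step, the fit moves along a direction that is equiangular to all active inputs. It continues until another input becomes equally correlated with the residual, and that input then joins the active set. The sketch assumes centered, unit-norm input columns and a centered response. It does not implement the LASSO modification (dropping an input whose coefficient crosses zero), and the function name `lars` is hypothetical.

```python
import numpy as np


def lars(X, y):
    """Basic LARS path (no LASSO modification).

    Assumes X has centered, unit-norm columns and y is centered.
    Returns one coefficient vector per step, starting from all zeros.
    """
    n, p = X.shape
    mu = np.zeros(n)  # current fitted values
    beta = np.zeros(p)
    path = [beta.copy()]
    active = [int(np.argmax(np.abs(X.T @ y)))]  # most correlated input enters first
    for _ in range(p):
        c = X.T @ (y - mu)  # current correlations with the residual
        C = np.max(np.abs(c[active]))
        s = np.sign(c[active])
        XA = X[:, active] * s
        Ginv1 = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        AA = 1.0 / np.sqrt(Ginv1.sum())
        w = AA * Ginv1
        u = XA @ w  # equiangular direction (unit length)
        a = X.T @ u
        # Step until an inactive input is as correlated as the active ones,
        # or, if none remain, all the way to the least squares fit.
        gamma, entering = C / AA, None
        with np.errstate(divide="ignore", invalid="ignore"):
            for j in range(p):
                if j in active:
                    continue
                for g in ((C - c[j]) / (AA - a[j]), (C + c[j]) / (AA + a[j])):
                    if 1e-12 < g < gamma:
                        gamma, entering = g, j
        mu = mu + gamma * u
        beta[active] += gamma * s * w
        path.append(beta.copy())
        if entering is None:
            break
        active.append(entering)
    return np.array(path)


# Hypothetical data with standardized columns and a centered response.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)
y = X @ np.array([4.0, 0.0, -2.0, 1.0]) + 0.1 * rng.normal(size=50)
y -= y.mean()
path = lars(X, y)
print(np.round(path, 3))  # one row per step; the last row is the least squares fit
```

With all inputs active, the final step lands on the ordinary least squares fit. That makes it a useful check on the sketch.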

MBR (Memory-Based Reasoning)

Use the MBR node to
identify similar cases and to apply information that is obtained from
these cases to a new record. The MBR node uses k-nearest neighbor
algorithms to categorize or predict observations.
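
The following plain-Python sketch shows k-nearest neighbor prediction. Classification takes a majority vote among the closest training cases, and interval prediction averages their target values. Euclidean distance, `k=3`, and the function names are assumptions for illustration.

```python
import math
from collections import Counter


def nearest_cases(train_X, train_y, query, k):
    """The k training cases closest to `query` in Euclidean distance."""
    return sorted(zip(train_X, train_y), key=lambda case: math.dist(case[0], query))[:k]


def knn_classify(train_X, train_y, query, k=3):
    """Categorize: majority vote among the k nearest cases."""
    votes = Counter(label for _, label in nearest_cases(train_X, train_y, query, k))
    return votes.most_common(1)[0][0]


def knn_predict(train_X, train_y, query, k=3):
    """Predict an interval value: mean target of the k nearest cases."""
    neighbors = nearest_cases(train_X, train_y, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)


# Hypothetical training cases in two well-separated groups.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]
values = [10.0, 12.0, 14.0, 50.0, 52.0, 54.0]
print(knn_classify(train_X, labels, (1.5, 1.5)))  # A
print(knn_predict(train_X, values, (8.5, 8.5)))   # 52.0
```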

Model Import

Use the Model Import
node to import and assess a model that was not created by one of the
SAS Enterprise Miner modeling nodes. You can then use the Model Comparison
node to compare the user-defined model with one or more models that
you developed with a SAS Enterprise Miner modeling node. This process
is called integrated assessment.

Neural Network

Use the Neural Network
node to construct, train, and validate multilayer, feed-forward neural
networks. By default, the Neural Network node automatically constructs
a network that has one hidden layer consisting of three neurons. In
general, each input is fully connected to the first hidden layer,
each hidden layer is fully connected to the next hidden layer, and
the last hidden layer is fully connected to the output. The Neural
Network node supports many variations of this general form.
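
The default architecture described above (one fully connected hidden layer of three neurons) can be sketched in NumPy. The tanh hidden activation, logistic output, full-batch gradient descent, and learning rate are assumptions, because the section does not state which activation functions or optimizer the node uses. Validation on held-out data is omitted.

```python
import numpy as np


def init_network(n_inputs, n_hidden=3, seed=0):
    """Random weights for a fully connected input -> hidden -> output network."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.5, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.5, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }


def forward(params, X):
    """Return hidden activations (tanh) and output probabilities (logistic)."""
    hidden = np.tanh(X @ params["W1"] + params["b1"])
    prob = 1.0 / (1.0 + np.exp(-(hidden @ params["W2"] + params["b2"])))
    return hidden, prob


def train(params, X, y, lr=0.5, epochs=2000):
    """Full-batch gradient descent on mean binary cross-entropy (backpropagation)."""
    n = len(y)
    for _ in range(epochs):
        hidden, prob = forward(params, X)
        d_out = (prob - y.reshape(-1, 1)) / n                     # gradient w.r.t. output pre-activation
        d_hidden = (d_out @ params["W2"].T) * (1.0 - hidden ** 2)  # tanh derivative
        params["W2"] -= lr * hidden.T @ d_out
        params["b2"] -= lr * d_out.sum(axis=0)
        params["W1"] -= lr * X.T @ d_hidden
        params["b1"] -= lr * d_hidden.sum(axis=0)
    return params


# Hypothetical binary target defined by a linear boundary.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
params = train(init_network(n_inputs=2), X, y)
_, prob = forward(params, X)
accuracy = np.mean((prob[:, 0] > 0.5) == (y == 1))
print(f"training accuracy: {accuracy:.2f}")
```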

Partial Least Squares

The Partial Least Squares
node is a tool for modeling continuous and binary targets. This node
extracts factors called components or latent vectors that can be used
to explain response variation or predictor variation in the analyzed
data.
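
For a single continuous response, component extraction can be sketched with the NIPALS algorithm, one standard way to compute partial least squares. The section does not say whether the node uses NIPALS or another algorithm. Each component is the direction in predictor space with maximal covariance with the (deflated) response. The function names and data are hypothetical.

```python
import numpy as np


def pls1(X, y, n_components):
    """PLS with one response via NIPALS; returns weights W, scores T, X loadings P, y loadings q."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    W, T, P, q = [], [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w = w / np.linalg.norm(w)  # direction of maximal covariance with the response
        t = X @ w                  # component (latent vector) scores
        tt = t @ t
        p = X.T @ t / tt           # X loadings
        q_k = (y @ t) / tt         # y loading
        X = X - np.outer(t, p)     # deflate: remove predictor variation explained by t
        y = y - q_k * t            # deflate: remove response variation explained by t
        W.append(w)
        T.append(t)
        P.append(p)
        q.append(q_k)
    return np.array(W).T, np.array(T).T, np.array(P).T, np.array(q)


def pls_coefficients(W, P, q):
    """Regression coefficients for centered X: B = W (P^T W)^{-1} q."""
    return W @ np.linalg.solve(P.T @ W, q)


rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
W, T, P, q = pls1(X, y, n_components=2)
print(np.round(pls_coefficients(W, P, q), 3))  # coefficients from a two-component model
```

Extracting as many components as there are inputs reproduces the ordinary least squares coefficients. Using fewer components is what gives PLS its dimension reduction.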

Regression

Use the Regression node
to fit both linear and logistic regression models to the data. You
can use continuous, ordinal, and binary target variables, and you
can use both continuous and discrete input variables. The node supports
the stepwise, forward, and backward selection methods.
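
For a binary target, the logistic regression fit can be sketched with Newton-Raphson iterations (equivalently, iteratively reweighted least squares). This is an illustrative estimator, not necessarily the node's optimization method. The simulated data and coefficient values are hypothetical, and the sketch does not cover ordinal targets or the selection methods.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Binary logistic regression by Newton-Raphson (iteratively reweighted least squares)."""
    design = np.column_stack([np.ones(len(y)), X])  # intercept + inputs
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (y - p)                              # score vector
        hessian = design.T @ (design * (p * (1.0 - p))[:, None])  # negative Hessian of log-likelihood
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta


rng = np.random.default_rng(5)
X = rng.normal(size=(2000, 2))
true_beta = np.array([-0.5, 1.5, 0.0])  # intercept, x1, x2 (hypothetical values)
prob = 1.0 / (1.0 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
y = (rng.random(2000) < prob).astype(float)
beta = fit_logistic(X, y)
print(np.round(beta, 2))  # estimates should be near the true values
```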

Rule Induction

Use the Rule Induction
node to improve the classification of rare events. The Rule Induction
node creates a Rule Induction model that uses split techniques to
remove the largest pure split node from the data. Rule Induction also
creates binary models for each level of a target variable and ranks
the levels from the rarest event to the most common. After all
levels of the target variable are modeled, the score code is combined
into a SAS DATA step.
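
The rarest-first ordering and the per-level binary targets can be sketched in plain Python. The threshold rules used as per-level models are hypothetical placeholders for whatever rule models the node actually builds. The first-match cascade is only an assumed way of combining them, loosely analogous to an IF-THEN/ELSE chain in DATA step score code.

```python
from collections import Counter


def rarity_order(labels):
    """Target levels sorted from the rarest to the most common."""
    counts = Counter(labels)
    return sorted(counts, key=lambda level: counts[level])


def binary_targets(labels):
    """One 0/1 indicator target per level, in rarest-first order."""
    return [(level, [1 if y == level else 0 for y in labels]) for level in rarity_order(labels)]


def cascade_score(row, level_models, default):
    """Return the first level (rarest first) whose binary model fires, else the default."""
    for level, fires in level_models:
        if fires(row):
            return level
    return default


labels = ["common"] * 6 + ["rare"] + ["mid"] * 3
print(rarity_order(labels))  # ['rare', 'mid', 'common']

# Hypothetical per-level rules, ordered rarest first; the most common level is the default.
level_models = [
    ("rare", lambda row: row["amount"] > 1000),
    ("mid", lambda row: row["amount"] > 100),
]
for amount in (5000, 500, 50):
    print(amount, cascade_score({"amount": amount}, level_models, default="common"))
```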

SVM (Support Vector Machines)

A support vector machine
(SVM) is a supervised machine learning method that is used to perform
classification and regression analysis. The standard SVM solves
binary classification problems and produces non-probabilistic output
(only the sign, +1 or -1) by constructing a hyperplane (or set of
hyperplanes) that maximizes the margin between the two classes.
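
A linear SVM can be sketched with Pegasos-style stochastic subgradient descent on the regularized hinge loss. The output is only a sign, matching the non-probabilistic output described above. Pegasos is a substitute technique chosen for brevity, not necessarily the node's solver. The regularization value, epoch count, and function names are assumptions, and kernel (nonlinear) SVMs are not covered.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the regularized hinge loss.

    y must contain labels -1 and +1. The bias is handled by appending a constant
    1 feature, so it is regularized too (a common simplification).
    """
    rng = np.random.default_rng(seed)
    Xb = np.column_stack([X, np.ones(len(X))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            if y[i] * (Xb[i] @ w) < 1:
                # Inside the margin or misclassified: hinge loss is active.
                w = (1 - eta * lam) * w + eta * y[i] * Xb[i]
            else:
                w = (1 - eta * lam) * w
    return w


def predict_sign(w, X):
    """Non-probabilistic output: only the sign, +1 or -1."""
    Xb = np.column_stack([X, np.ones(len(X))])
    return np.where(Xb @ w >= 0, 1, -1)


# Hypothetical, well-separated two-class data.
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(loc=2.0, scale=0.5, size=(50, 2)),
               rng.normal(loc=-2.0, scale=0.5, size=(50, 2))])
y = np.array([1] * 50 + [-1] * 50)
w = train_linear_svm(X, y)
accuracy = np.mean(predict_sign(w, X) == y)
print(f"training accuracy: {accuracy:.2f}")
```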

TwoStage

Use the TwoStage node
to build a sequential or concurrent two-stage model for predicting
a class variable and an interval target variable at the same time.
The interval target variable is usually a value that is associated
with a level of the class target.
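
A sequential two-stage model can be sketched in NumPy. A logistic model first predicts the class target. A linear model for the interval target is then fit on the event cases, with the stage-1 probability as an extra input. This is one plausible reading of "sequential," not a description of the node's internals. The function names, the data, and the expected-value combination `P(event) * predicted value` are assumptions.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Stage-1 class model: logistic regression by Newton-Raphson."""
    design = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        hessian = design.T @ (design * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, design.T @ (y - p))
    return beta


def predict_proba(beta, X):
    return 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(X)), X]) @ beta))


def fit_two_stage(X, event, value):
    """Stage 1 predicts the event; stage 2 fits the value on event cases,
    using the stage-1 probability as an additional input."""
    beta_class = fit_logistic(X, event)
    prob = predict_proba(beta_class, X)
    mask = event == 1
    design = np.column_stack([np.ones(mask.sum()), X[mask], prob[mask]])
    beta_value, *_ = np.linalg.lstsq(design, value[mask], rcond=None)
    return beta_class, beta_value


def predict_two_stage(beta_class, beta_value, X):
    prob = predict_proba(beta_class, X)
    amount = np.column_stack([np.ones(len(X)), X, prob]) @ beta_value
    return prob, amount, prob * amount  # expected value = P(event) * predicted value


# Hypothetical data: an event indicator and a value that is zero when no event occurs.
rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 1))
event = (rng.random(1000) < 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(float)
value = np.where(event == 1, 100 + 20 * X[:, 0] + rng.normal(scale=5, size=1000), 0.0)
beta_class, beta_value = fit_two_stage(X, event, value)
prob, amount, expected = predict_two_stage(beta_class, beta_value, X)
print(np.round(expected[:5], 2))
```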