Decision tree models are advantageous because they are conceptually easy to understand,
yet they readily accommodate nonlinear associations between input variables and one
or more target variables. They also handle missing values without the need for
imputation. Therefore, you decide to first
model the data using decision trees. You will compare decision tree models to other models
later in the example.
However, before you add and run the Decision Tree node, you will add a Control Point
node. The Control Point node is used to simplify a
process flow diagram by reducing the number of connections between multiple interconnected nodes. By the
end of this example, you will have created five different models of the input
data set, and two Control Point nodes to connect these nodes. The first Control Point node,
added here, will distribute the input data to each of these models. The second Control
Point node will collect the models and send them to evaluation nodes.
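As a rough illustration of why a hub node reduces connections, the following Python sketch compares the two wiring schemes. It assumes that every upstream node must feed every downstream node, and the count of two evaluation nodes is a hypothetical value chosen for the arithmetic, not a figure from this example.

```python
def direct_links(upstream: int, downstream: int) -> int:
    """Connections needed when every upstream node links to every downstream node."""
    return upstream * downstream


def links_via_control_point(upstream: int, downstream: int) -> int:
    """Connections needed when all traffic passes through one hub node."""
    return upstream + downstream


models = 5            # the five models built in this example
evaluation_nodes = 2  # hypothetical count, for illustration only

print(direct_links(models, evaluation_nodes))             # 10
print(links_via_control_point(models, evaluation_nodes))  # 7
```

The saving grows with the number of nodes on each side, because the direct scheme scales with the product of the two counts and the hub scheme with their sum.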
To use the Control Point node:
- Select the Utility tab on the Toolbar.
- Select the Control Point node icon. Drag the node into the Diagram Workspace.
- Connect the Replacement node to the Control Point node.
SAS Enterprise Miner enables you to build a decision tree in two ways: automatically and
interactively. You will begin by letting SAS Enterprise Miner automatically
train and prune a tree.
To use the Decision Tree node to automatically train and prune a decision tree:
- Select the Model tab on the Toolbar.
- Select the Decision Tree node icon. Drag the node into the Diagram Workspace.
- Connect the Control Point node to the Decision Tree node.
- Select the Decision Tree node. In the Properties Panel, scroll down to view the Train properties:
  - Click the value of the Maximum Depth splitting rule property, and enter 10. This specification enables SAS Enterprise Miner to train a tree that includes up to ten generations of the root node. The final tree in this example, however, will have fewer generations due to pruning.
  - Click the value of the Leaf Size node property, and enter 8. This specification constrains the minimum number of training observations in any leaf to eight.
  - Click the value of the Number of Surrogate Rules node property, and enter 4. This specification enables SAS Enterprise Miner to use up to four surrogate rules in each non-leaf node if the main splitting rule relies on an input whose value is missing.
Note: The Assessment Measure subtree property is automatically set to Decision because
you defined a profit matrix in Create a Data Source. Accordingly, the Decision Tree node will
build a tree that maximizes profit in the validation data.
- In the Diagram Workspace, right-click the Decision Tree node, and select Run from the resulting menu. Click Yes in the Confirmation window that opens.
- In the window that appears when processing completes, click Results. The Results window appears.
- On the View menu, select Model, and then select Node Rules. The English Rules window appears.
- Expand the Node Rules window. This window contains the IF-THEN logic that distributes observations into each leaf node of the decision tree.
In the Output window, the Tree Leaf Report indicates that there are seven leaf nodes in this tree. For each leaf node, the following
information is listed:
- node number
- number of training observations in the node
- percentage of training observations in the node with TARGET_B=1 (did donate), adjusted for prior probabilities
- percentage of training observations in the node with TARGET_B=0 (did not donate), adjusted for prior probabilities
This tree has been automatically pruned to an optimal size. Therefore, the node numbers
that appear in the final tree are not sequential. In fact, they reflect the positions
of the nodes in the full tree, before pruning.
- Close the Results window.
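The surrogate-rule fallback configured in the Train properties, and the IF-THEN routing that the node rules describe, can be sketched in a few lines of Python. This is a minimal illustration, not SAS Enterprise Miner's implementation: the `Rule` class, the input names `input_a`, `input_b`, and `input_c`, the thresholds, and the default branch used when every input is missing are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_SURROGATES = 4  # mirrors the Number of Surrogate Rules value entered in the steps above


@dataclass
class Rule:
    """A hypothetical binary splitting rule: IF input < threshold THEN left ELSE right."""
    input_name: str
    threshold: float

    def branch(self, obs: dict) -> Optional[str]:
        value = obs.get(self.input_name)
        if value is None:
            return None  # the input is missing, so this rule cannot decide
        return "left" if value < self.threshold else "right"


def choose_branch(main: Rule, surrogates: List[Rule], obs: dict,
                  default: str = "left") -> Tuple[str, str]:
    """Try the main rule, then up to MAX_SURROGATES surrogate rules in order.

    Returns the chosen branch and the name of the input that decided it.
    The default branch is an assumption made for this sketch.
    """
    for rule in [main] + surrogates[:MAX_SURROGATES]:
        result = rule.branch(obs)
        if result is not None:
            return result, rule.input_name
    return default, "default"


main_rule = Rule("input_a", 10.0)
surrogate_rules = [Rule("input_b", 3.0), Rule("input_c", 50.0)]

print(choose_branch(main_rule, surrogate_rules, {"input_a": 12.0, "input_b": 1.0}))
# ('right', 'input_a')
print(choose_branch(main_rule, surrogate_rules, {"input_a": None, "input_b": 1.0}))
# ('left', 'input_b')
print(choose_branch(main_rule, surrogate_rules, {}))
# ('left', 'default')
```

In the second call, the main input is missing, so the first surrogate decides the branch. This is the behavior that lets a tree score observations with missing values without imputing them first.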