Working with Nodes That Model
An empirical tree is a segmentation of the data. Enterprise Miner creates an empirical tree by applying a series of simple rules that you specify. Each rule assigns an observation to a segment, based on the value of one input. One rule is applied after another, resulting in a hierarchy of segments within segments. The hierarchy is called a tree, and each segment is called a node. The original segment contains the entire data set and is called the root node of the tree. A node and all its successors form a branch of the node that created it. The final nodes are called leaves. For each leaf, a decision is made and applied to all observations in the leaf. The type of decision depends on the context of the data mining problem. In this example, the decision is simply the predicted value. The path from the root to a leaf constitutes the rule that classifies the observations in that leaf.
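The segmentation described above can be sketched outside of Enterprise Miner. The following is a minimal illustration using scikit-learn's DecisionTreeClassifier as a stand-in; the data, feature names, and depth limit are illustrative assumptions, and Enterprise Miner's split-search and pruning details differ.

```python
# Sketch of an empirical tree as a segmentation of the data, using
# scikit-learn as an illustrative stand-in for the Decision Tree node.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: two inputs and one binary target (e.g., donor = 1, non-donor = 0).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# Each split applies one simple rule on one input; a depth of 2
# yields at most four leaves (segments within segments).
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The printed rules are the root-to-leaf paths. Each leaf carries one
# decision (here, the predicted class) applied to every observation in it.
print(export_text(tree, feature_names=["input_1", "input_2"]))
print("number of leaves:", tree.get_n_leaves())
```

Printing the rules makes the hierarchy concrete: each line of the output is one simple rule on one input, and each root-to-leaf path is the complete rule for one segment.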
Tree models readily accommodate nonlinear associations between the input variables and the target. They offer easy interpretability, accept different data types, and handle missing values without using imputation.
In Enterprise Miner, you use the plots and tables of the Results window to assess how well the tree model fits the training and validation data. You can benchmark the accuracy, profitability, and stability of your model. The Decision Tree node displays the following results:
A standard Cumulative Lift Chart for the training and validation data. This chart not only provides lift values but also serves as a quick check of whether the tree model is reliable. If the tree is unreliable, then none of its numbers or splits is valuable. Trees can be unreliable when they are applied to new data, so you should always evaluate the tree on both the validation and test data.
A Leaf Statistics bar chart in which the height of each bar equals the percentage of donors in the leaf for both the training and validation data. The order of the bars is based on the percentage of donors (1's) in the training data. Use the scroll bar at the top to display additional leaves. Also look for consistency between the training and validation data within each leaf.
The Tree Diagram (in a window labeled Tree) that shows how the data splits into subgroups.
Fit Statistics for both the training and validation data.
The Tree Map is a compact graphical representation of the tree. The nodes have the same top-to-bottom, left-to-right arrangement as the traditional tree diagram. Colored rectangles represent individual nodes, and the width of each rectangle is proportional to the number of training cases in the node, so larger rectangles represent nodes that contain more cases. The nodes are colored according to the value of a statistic. By default, a node's color reflects the percentage of the target event in the node. For a categorical target, color represents the proportion of the target value among the training cases assigned to the node; for an interval target, color represents the average target value of the training cases assigned to the node.
The Output window contains information such as variable importance, a tree leaf report, fit statistics, and a classification matrix.
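The statistics behind two of these displays, cumulative lift and the classification matrix, can be computed directly from held-out predictions. The sketch below uses scikit-learn; the library, the 20% depth, and all variable names are illustrative assumptions rather than Enterprise Miner internals.

```python
# Sketch: cumulative lift at a chosen depth, plus a classification matrix,
# computed from validation-data predictions. The 20% depth is an
# illustrative assumption; Enterprise Miner plots lift across all depths.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.5, random_state=1)

tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_tr, y_tr)

def cumulative_lift(y_true, scores, depth=0.2):
    """Event rate in the top `depth` fraction of scored cases,
    divided by the overall event rate (lift of 1.0 = no better than random)."""
    order = np.argsort(-scores)               # highest scores first
    n_top = max(1, int(len(scores) * depth))
    top_rate = y_true[order][:n_top].mean()
    return top_rate / y_true.mean()

scores = tree.predict_proba(X_va)[:, 1]       # P(target event) for each case
print("cumulative lift at 20% depth:", round(cumulative_lift(y_va, scores), 2))
print("classification matrix:\n", confusion_matrix(y_va, tree.predict(X_va)))
```

Comparing the lift computed on the training data against the same figure on the validation data is the reliability check the Cumulative Lift Chart provides: a large gap between the two suggests the tree will not hold up on new data.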
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.