Working with Nodes That Model
In this task, you use the Decision Tree node to build a decision tree using your partitioned data.
Drag the Decision Tree icon from the Model tab of the toolbar into the Diagram Workspace.
Connect the Replacement node to the Decision Tree node.
Select the Decision Tree node in the Diagram Workspace. The Properties panel indicates how each Decision Tree node property is configured.
Set the Decision Tree node properties as follows:
Set the Maximum Branch to 4 so that the Decision Tree node can create splits with up to four branches. By default, the Decision Tree node creates binary splits.
Set the Leaf Size to 8 in order to ensure that each leaf contains at least 8 observations.
Set the Maximum Depth to 10 in order to allow the tree to grow deeper and potentially bushier.
Set the Number of Surrogate Rules to 4 in order to handle missing values in the data.
Keep the Splitting Rule Criterion property in its Default (Chi-Square) setting.
Note: The Assessment Measure property is set to Decision by default because you have defined a profit matrix. The Decision Tree node will choose the tree that maximizes profit in the validation data.
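The selection described in the note can be pictured with a minimal Python sketch. It assumes a hypothetical list of candidate subtrees, each with an average validation profit; the numbers are invented for illustration and are not Enterprise Miner output. Breaking ties in favor of the smaller tree is also an assumption of this sketch.

```python
# Hypothetical candidate subtrees: (number of leaves, average validation profit).
# The values are illustrative only; they are not output from Enterprise Miner.
candidates = [
    (1, 0.2400),
    (3, 0.2478),
    (5, 0.2500),
    (8, 0.2500),
    (12, 0.2461),
]


def select_by_profit(subtrees):
    """Return the subtree with the highest average validation profit.

    Ties are broken in favor of the smaller tree (an assumption of this sketch).
    """
    return max(subtrees, key=lambda t: (t[1], -t[0]))


best_leaves, best_profit = select_by_profit(candidates)
print(f"selected subtree: {best_leaves} leaves, average profit {best_profit:.4f}")
```

The point of the sketch is only that the assessment is made on the validation partition and that profit, not misclassification rate, is the quantity being maximized.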
Right-click the Decision Tree node in the Diagram Workspace and select Run.
A Run Status window appears when the Decision Tree run has been completed. Choose to view the results from that window; the Results window opens.
The Score Rankings Overlay: TARGET_B chart shows that a consistent trend is achieved on both the training and validation data. Although the lift values decrease quickly, the tree does seem to be stable.
The Fit Statistics table shows that the average profit of the training and validation data is about 0.250382 and 0.25003, respectively.
Move your mouse over the different points of either the training or validation line in order to reveal the various lift values on the Score Rankings Overlay: TARGET_B chart.
Select the Cumulative Lift chart and then click the icon at the top left of the Results window in order to display a table that includes the lift values.
Note: You can highlight rows in the table and then use Copy to paste the contents to another application such as Microsoft Word or Excel. You can also copy graphs the same way. This feature is common to most of the Enterprise Miner tools.
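For readers who want to see what a cumulative lift value represents, the following Python sketch computes it from scratch. It is an illustration under stated assumptions, not the Enterprise Miner calculation: the function name, the equal-sized depth bins, and the tiny sample data are all hypothetical, and the sketch assumes at least one event and at least as many observations as bins.

```python
def cumulative_lift(scores, targets, n_bins=10):
    """Return the cumulative lift at each of n_bins equal-sized depth bins.

    scores  -- predicted probabilities of the event (higher = more likely)
    targets -- 1 for the event (donor), 0 otherwise
    """
    # Rank observations from highest to lowest predicted probability.
    ranked = sorted(zip(scores, targets), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    overall_rate = sum(targets) / n
    lifts = []
    for b in range(1, n_bins + 1):
        depth = round(n * b / n_bins)              # observations included so far
        hits = sum(t for _, t in ranked[:depth])   # events captured so far
        lifts.append((hits / depth) / overall_rate)
    return lifts


# Hypothetical scored data: ten observations, three of them donors.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
targets = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lifts = cumulative_lift(scores, targets, n_bins=5)
for depth_pct, lift in zip(range(20, 101, 20), lifts):
    print(f"top {depth_pct:3d}%: cumulative lift {lift:.2f}")
```

Cumulative lift always ends at 1 when the whole population is included, which is why the curve in the chart falls toward 1 as depth increases.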
Close the Score Rankings Overlay table.
Because you defined a profit matrix for the Donor data, you should base your evaluation of the model on profit. To display profit, rather than lift, on the Score Rankings plot, follow these steps:
Maximize the Score Rankings Overlay: TARGET_B chart.
Right-click the background of the plot and select Data Options.
Scroll down the list of variables and set the Role for PROFIT to Y.
Confirm the change.
Restore the Score Rankings chart to its original size.
Select the Leaf Statistics plot and double-click the bar that corresponds to Leaf Index = 4. When you select the bar, note that the corresponding node is highlighted in both the Tree Map and Tree Diagram. The Leaf Statistics plot, Tree Map, and Tree Diagram are interactive, dynamically linked plots. This feature is especially useful when you are using the Tree Map to isolate interesting nodes in a large tree.
The largest nodes with a high percentage of donors are in the lower left quadrant. Select a node from the lower left quadrant and examine the corresponding node in the Tree view.
Move your mouse pointer over the node to display node statistics for both donors (1) and non-donors (0). By default, each node in the Tree Diagram displays the predicted target value percentages and counts.
Maximize the Tree window and explore the Tree diagram. Note that the line thickness for each split indicates the number of cases in each underlying node. Right-click the plot background and examine the different View menu item settings.
Select View > Model > English Rules from the Results window menu in order to view the English Rules.
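English Rules express each leaf of the tree as an IF ... THEN statement built from the splits on the path to that leaf. The Python sketch below shows the idea on a hypothetical two-level tree; the variable names, cut points, counts, and output wording are invented for illustration and do not reproduce the format that Enterprise Miner prints.

```python
# Hypothetical tree: a split sends observations below the cut point to "left".
tree = {
    "split": ("GIFT_COUNT", 5),
    "left": {"leaf": 1, "n": 120, "p_donor": 0.04},
    "right": {
        "split": ("LAST_GIFT_AMT", 20),
        "left": {"leaf": 2, "n": 60, "p_donor": 0.09},
        "right": {"leaf": 3, "n": 20, "p_donor": 0.15},
    },
}


def english_rules(node, conditions=()):
    """Return one IF ... THEN rule per leaf, in left-to-right order."""
    if "leaf" in node:
        clause = " AND ".join(conditions) if conditions else "(all observations)"
        return [
            f"IF {clause} THEN node {node['leaf']}: "
            f"N = {node['n']}, P(donor) = {node['p_donor']:.2f}"
        ]
    variable, cut = node["split"]
    return (
        english_rules(node["left"], conditions + (f"{variable} < {cut}",))
        + english_rules(node["right"], conditions + (f"{variable} >= {cut}",))
    )


rules = english_rules(tree)
for rule in rules:
    print(rule)
```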
Select View > Assessment > Adjusted Classification Chart: TARGET_B from the Results window menu to view the Adjusted Classification chart.
Notice that none of the donors has been correctly classified in either partitioned data set. However, the goal is centered on isolating the set of candidate donors that will maximize profit. Even small average profit values will result in a significant total profit, especially when applied to a large customer base.
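The difference between classifying a donor and making a profitable decision can be sketched in a few lines of Python. The profit matrix values, the 0.5 classification cutoff, and the customer base size below are assumptions chosen for illustration; only the average validation profit of 0.25003 comes from the Fit Statistics table.

```python
# Hypothetical profit matrix for the "solicit" decision (illustrative values).
PROFIT_IF_DONOR = 14.50       # net gain when a solicited person donates
PROFIT_IF_NON_DONOR = -0.50   # cost of soliciting a person who does not donate


def expected_profit_of_soliciting(p_donor):
    """Expected profit of soliciting one person with donor probability p_donor."""
    return p_donor * PROFIT_IF_DONOR + (1.0 - p_donor) * PROFIT_IF_NON_DONOR


def decide(p_donor):
    """Return (classification at an assumed 0.5 cutoff, profit-based decision)."""
    classification = 1 if p_donor >= 0.5 else 0
    # Not soliciting earns 0, so solicit whenever the expected profit is positive.
    solicit = expected_profit_of_soliciting(p_donor) > 0.0
    return classification, solicit


for p in (0.01, 0.08):
    print(p, decide(p), round(expected_profit_of_soliciting(p), 2))

# Total profit scales linearly with the number of customers scored.
average_profit = 0.25003      # validation value from the Fit Statistics table
customers = 1_000_000         # hypothetical customer base
print(round(average_profit * customers, 2))
```

Under these assumed values, a person with a donor probability of 0.08 is never classified as a donor, yet soliciting that person still has a positive expected profit.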
Examine the Score Code. From the main menu, select View > Scoring. You will notice these entries:
SAS Code, also known as Publish Score Code, is the SAS score code that you can use to score data in applications that run outside the Enterprise Miner environment.
PMML Code is an XML representation of a data mining model. SAS PMML is based on the Data Mining Group PMML Version 2.1, with significant extensions to support the data types, transformations, and model definitions that SAS requires. These files can be used with PMML scoring engines that support PMML Version 2.1.
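Because PMML is XML, any XML parser can inspect a model file. The Python sketch below reads a hand-written, simplified fragment in the style of a PMML tree model. The fragment is an assumption made for illustration: it is not output from SAS, it omits the XML namespace and the SAS extensions, and its field names and values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written illustrative fragment; not generated by Enterprise Miner.
PMML_TEXT = """<PMML version="2.1">
  <Header description="illustrative fragment"/>
  <DataDictionary>
    <DataField name="GIFT_COUNT" optype="continuous"/>
    <DataField name="TARGET_B" optype="categorical"/>
  </DataDictionary>
  <TreeModel modelName="sketch" functionName="classification">
    <MiningSchema>
      <MiningField name="GIFT_COUNT"/>
      <MiningField name="TARGET_B" usageType="predicted"/>
    </MiningSchema>
    <Node score="0">
      <True/>
      <Node score="0">
        <SimplePredicate field="GIFT_COUNT" operator="lessThan" value="5"/>
      </Node>
      <Node score="1">
        <SimplePredicate field="GIFT_COUNT" operator="greaterOrEqual" value="5"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

root = ET.fromstring(PMML_TEXT)
version = root.get("version")

# A leaf is a Node element that has no child Node elements.
leaves = [node for node in root.iter("Node") if node.find("Node") is None]

# Collect (field, operator, value, score) for each leaf's predicate.
leaf_rules = []
for leaf in leaves:
    predicate = leaf.find("SimplePredicate")
    leaf_rules.append(
        (predicate.get("field"), predicate.get("operator"),
         predicate.get("value"), leaf.get("score"))
    )

print("PMML version:", version)
for rule in leaf_rules:
    print(rule)
```

A file exported from Enterprise Miner will generally declare a namespace and carry additional elements, so element lookups would need to be adjusted accordingly.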
Close the Tree Results window.
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.