The “Performance Information” table is created by default. It displays information about the execution mode. For single-machine mode, the table displays the number of threads used. For distributed mode, the table displays the grid mode (symmetric or asymmetric), the number of compute nodes, and the number of threads per node.
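For reference, the sketch below shows how the execution mode is typically requested. This section does not name the procedure, so the PROC name, the data set, and the variable names here are assumptions for illustration; NTHREADS= and NODES= are the usual options of the high-performance PERFORMANCE statement.

```sas
/* Tiny synthetic data set used by the sketches in this section
   (all names are illustrative only). */
data train;
   do i = 1 to 100;
      x1 = ranuni(1);
      x2 = ranuni(1);
      bad = (x1 + x2 > 1);   /* binary target with values 0 and 1 */
      output;
   end;
   drop i;
run;

proc hpsplit data=train;      /* the procedure name is an assumption */
   performance nthreads=4;    /* single-machine mode: thread count   */
   /* performance nodes=8 nthreads=16;  distributed mode: number of
      compute nodes and threads per node (the table then also reports
      the grid mode, symmetric or asymmetric) */
   class bad;
   model bad = x1 x2;
run;
```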
Table 9.3 shows the variables that are contained in an example data set that the SCORE statement produces. In this data set, the variable BAD is the target and has values 0 and 1. A syntax sketch follows the table.
Table 9.3: Example SCORE Statement Data Set Variables
| Variable | Description |
|---|---|
| | Target variable |
| | Leaf number to which this observation is assigned |
| | Node number to which this observation is assigned |
| | Proportion of the training set at this leaf that has BAD=0 |
| | Proportion of the training set at this leaf that has BAD=1 |
| | Proportion of the validation set at this leaf that has BAD=0 |
| | Proportion of the validation set at this leaf that has BAD=1 |
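A minimal sketch of producing such a scored data set follows, reusing the `train` data set defined in the earlier sketch. The SCORE statement is named in this section; the OUT= option and the procedure name are assumptions.

```sas
proc hpsplit data=train;      /* procedure name is an assumption         */
   class bad;                 /* BAD is the binary target (values 0, 1)  */
   model bad = x1 x2;
   score out=scored;          /* OUT= option name is an assumption       */
run;

/* Each scored observation carries its leaf and node assignment and the
   per-leaf target proportions described in Table 9.3. */
proc print data=scored(obs=5);
run;
```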
The variable importance data set contains the importance of the input variables in creating the pruned decision tree. A simple count-based importance metric and two variable importance metrics that are based on the sum of squares error are output. In addition, the number of observations that are used in the training and validation sets, the number of observations that have a missing value, and the number of observations that have a missing target are output. Table 9.4 shows the variables contained in the data set that the OUTPUT statement produces when the IMPORTANCE= option is used. In addition to the variables listed below, a variable that contains the importance of each input variable is included. A usage sketch follows the table.
Table 9.4: Variable Importance Data Set Variables
| Variable | Description |
|---|---|
| | Tree number (always 1) |
| | Criterion used to generate the tree |
| | Importance type (“Count”, “SSE”, “VSSE”, “IMPORT”, or “VIMPORT”) |
| | Number of observations that have a missing value |
| | Number of observations that have a missing target |
| | Number of observations used to build the tree (training set) |
| | Number of observations in the validation set |
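A sketch of requesting the importance data set follows, again reusing the `train` data set. IMPORTANCE= is the OUTPUT statement option named above; the procedure and data set names are assumptions.

```sas
proc hpsplit data=train;      /* procedure name is an assumption            */
   class bad;
   model bad = x1 x2;
   output importance=imp;     /* IMPORTANCE= option of the OUTPUT statement */
run;

/* Rows distinguish the importance types (Count, SSE, VSSE, IMPORT,
   VIMPORT); one column per input variable carries its importance,
   as described above. */
proc print data=imp;
run;
```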
The data set specified in the NODESTATS= option in the OUTPUT statement can be used to visualize the tree. Table 9.5 shows the variables in this data set. A usage sketch follows the table.
Table 9.5: NODESTATS= Data Set Variables
| Variable | Description |
|---|---|
| | Text that describes the split |
| | Which of the three criteria was used |
| | Values of the parent’s splitting variable that lead to this node |
| | Depth of the node |
| | Node number |
| | Fraction of all training observations going to this node |
| | Number of training observations at this node |
| | Number of validation observations at this node |
| | Parent’s node number |
| | Value of the target predicted at this node |
| | Proportion of training observations that have BAD=0 |
| | Proportion of training observations that have BAD=1 |
| | Variable used in the split |
| | Tree number (always 1) |
| | Proportion of validation observations that have BAD=0 |
| | Proportion of validation observations that have BAD=1 |
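A sketch of requesting the node-statistics data set follows (NODESTATS= is the option named above; the remaining names are illustrative). Because each row carries both a node number and the parent’s node number, the rows define the edges of the tree, which is what makes this data set suitable for driving a tree diagram.

```sas
proc hpsplit data=train;      /* procedure name is an assumption           */
   class bad;
   model bad = x1 x2;
   output nodestats=nstats;   /* NODESTATS= option of the OUTPUT statement */
run;

/* One row per node: node number, parent's node number, depth, split
   text, split variable, and the counts and proportions of Table 9.5. */
proc print data=nstats;
run;
```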
During tree growth and pruning, the number of leaves at each growth or pruning iteration is output, along with other optional metrics. The GROWTHSUBTREE= and PRUNESUBTREE= data sets are identical, except that:

- The growth data set reflects statistics of the tree during growth, whereas the pruning data set reflects statistics of the tree during pruning.
- The statistics of the growth data set are always from the training subset. The statistics of the pruning data set are from the validation subset if one is available; otherwise, they are from the training subset.

A usage sketch follows Table 9.6.
Table 9.6: GROWTHSUBTREE= and PRUNESUBTREE= Data Set Variables
| Variable | Description |
|---|---|
| | Iteration number |
| | Number of leaves |
| | Tree number (always 1) |
| | Training set: average square error |
| | Training set: entropy |
| | Training set: Gini |
| | Training set: misclassification rate |
| | Training set: sum of squares error |
| | Validation set: average square error |
| | Validation set: entropy |
| | Validation set: Gini |
| | Validation set: misclassification rate |
| | Validation set: sum of squares error |
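A sketch of requesting both subtree data sets follows, reusing the `train` data set from the first sketch. The PARTITION statement reserves a validation subset so that the pruning statistics come from validation rather than training data; its FRACTION(VALIDATE=) syntax, like the procedure name, is an assumption.

```sas
proc hpsplit data=train;                /* procedure name is an assumption        */
   class bad;
   model bad = x1 x2;
   partition fraction(validate=0.3);    /* assumed syntax for a validation split  */
   output growthsubtree=grow            /* per-iteration statistics during growth */
          prunesubtree=prune;           /* per-iteration statistics during pruning */
run;

/* Each data set has one row per iteration; comparing the error measures
   of Table 9.6 against the number of leaves helps in choosing a subtree. */
proc print data=grow;
run;
proc print data=prune;
run;
```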