The “Performance Information” table is created by default. It displays information about the execution mode. For single-machine mode, the table displays the number of threads used. For distributed mode, the table displays the grid mode (symmetric or asymmetric), the number of compute nodes, and the number of threads per node.
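You can control these settings in the PERFORMANCE statement. The following sketch requests single-machine execution with four threads; the data set, model variables, and thread count are illustrative only.

```sas
/* Sketch: run PROC HPSPLIT on a single machine with four threads.  */
/* The data set and model variables are illustrative only.          */
proc hpsplit data=sashelp.hmeq;
   class Bad Reason Job;
   model Bad = Reason Job Loan Value Delinq;
   performance nthreads=4 details;   /* NODES= would request distributed mode */
run;
```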
Table 13.3 shows the variables that are contained in an example data set that the SCORE statement produces.
Table 13.3: Example SCORE Statement Data Set Variables
| Variable | Target Type | Description |
|---|---|---|
|  | Either | Target variable |
|  | Either | Leaf number to which this observation is assigned |
|  | Either | Node number to which this observation is assigned |
|  | Nominal | Proportion of the training set at this leaf that has the corresponding target level |
|  | Nominal | Proportion of the validation set at this leaf that has the corresponding target level |
|  | Interval | Average value of the target (training set) |
|  | Interval | Average value of the target (validation set) |
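A minimal sketch of producing such a scored data set follows. The OUT= option shown in the SCORE statement is an assumption about the option name, and the variable names in the output can differ, so check the SCORE statement syntax for your release; the data set and model variables are illustrative only.

```sas
/* Sketch: score observations with the fitted tree.              */
/* OUT= is assumed to name the scored data set; verify the SCORE */
/* statement options in your release.                            */
proc hpsplit data=sashelp.hmeq;
   class Bad Reason Job;
   model Bad = Reason Job Loan Value Delinq;
   score out=scored;
run;

/* Examine the first few scored observations. */
proc print data=scored(obs=5);
run;
```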
The variable importance data set contains the importance of the input variables in creating the pruned decision tree. PROC HPSPLIT outputs two simple count-based importance metrics and two variable importance metrics that are based on the sum of squares error. In addition, it outputs the number of observations that are used in the training and validation sets, the number of observations that have a missing value, and the number of observations that have a missing target. Table 13.4 shows the variables that are contained in the data set when you specify the IMPORTANCE= option in the OUTPUT statement. In addition to the variables listed below, the data set includes a variable for each input variable that contains that input's importance.
Table 13.4: Variable Importance Data Set Variables
| Variable | Description |
|---|---|
|  | Criterion used to generate the tree |
|  | Importance type (Count, NSURROGATES, SSE, VSSE, IMPORT, or VIMPORT) |
|  | Number of observations that have a missing value |
|  | Number of observations that have a missing target |
|  | Number of observations that were used to build the tree (training set) |
|  | Number of observations in the validation set |
|  | Tree number (always 1) |
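For example, the following sketch writes the variable importance information to a data set named imp; the input data set and model are illustrative only.

```sas
/* Sketch: write variable importance to WORK.IMP by using the */
/* IMPORTANCE= option in the OUTPUT statement.                 */
proc hpsplit data=sashelp.hmeq;
   class Bad Reason Job;
   model Bad = Reason Job Loan Value Delinq;
   output importance=imp;
run;

/* Inspect the importance data set. */
proc print data=imp;
run;
```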
The data set specified in the NODESTATS= option in the OUTPUT statement can be used to visualize the tree. Table 13.5 shows the variables in this data set.
Table 13.5: NODESTATS= Data Set Variables
| Variable | Target Type | Description |
|---|---|---|
|  | Either | Text that describes the split |
|  | Either | Which of the three criteria was used |
|  | Either | Values of the parent variable's split that lead to this node |
|  | Either | Depth of the node |
|  | Either | Node number |
|  | Either | Leaf number |
|  | Either | Fraction of all training observations that go to this node |
|  | Either | Number of training observations at this node |
|  | Either | Number of validation observations at this node |
|  | Either | Parent's node number |
|  | Either | Value of the target predicted at this node |
|  | Either | Variable used in the split |
|  | Either | Tree number (always 1) |
|  | Nominal | Proportion of the training set at this leaf that has the corresponding target level |
|  | Nominal | Proportion of the validation set at this leaf that has the corresponding target level |
|  | Interval | Average value of the target (training set) |
|  | Interval | Average value of the target (validation set) |
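Because each row of the NODESTATS= data set describes one node (its parent, depth, split variable, and predicted value), you can process or plot it with other procedures to draw the tree. A minimal sketch, with an illustrative data set and model:

```sas
/* Sketch: capture node-level statistics for custom tree displays. */
proc hpsplit data=sashelp.hmeq;
   class Bad Reason Job;
   model Bad = Reason Job Loan Value Delinq;
   output nodestats=nodes;
run;

/* Inspect the node-level information before building a display. */
proc print data=nodes;
run;
```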
During tree growth and pruning, the number of leaves at each growth or pruning iteration and other metrics are output to data sets that are specified in the GROWTHSUBTREE= and PRUNESUBTREE= options, respectively.
The growth and pruning data sets are identical, except that:

- The growth data set reflects statistics of the tree during growth, whereas the pruning data set reflects statistics of the tree during pruning.
- The statistics of the growth data set are always computed from the training subset. The statistics of the pruning data set are computed from the validation subset if one is available; otherwise, they are computed from the training subset.
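For example, the following sketch writes both data sets. The input data, model, and validation split are illustrative only, and the PARTITION statement options might differ between releases.

```sas
/* Sketch: capture per-iteration statistics during growth and pruning. */
proc hpsplit data=sashelp.hmeq;
   class Bad Reason Job;
   model Bad = Reason Job Loan Value Delinq;
   /* Reserve roughly 30% of observations for validation; check the */
   /* PARTITION statement syntax for your release.                  */
   partition fraction(validate=0.3);
   output growthsubtree=grow prunesubtree=prune;
run;

/* Pruning statistics are computed from the validation subset here. */
proc print data=prune;
run;
```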
Table 13.6: GROWTHSUBTREE= and PRUNESUBTREE= Data Set Variables
| Variable | Target Type | Description |
|---|---|---|
|  | Either | Iteration number |
|  | Either | Number of leaves |
|  | Either | Tree number (always 1) |
|  | Either | Training set: average square error |
|  | Nominal | Training set: entropy |
|  | Nominal | Training set: Gini |
|  | Nominal | Training set: misclassification rate |
|  | Either | Training set: sum of squares error |
|  | Either | Validation set: average square error |
|  | Nominal | Validation set: entropy |
|  | Nominal | Validation set: Gini |
|  | Nominal | Validation set: misclassification rate |
|  | Either | Validation set: sum of squares error |
|  | Either | Subtree assessment value (ratio of slopes or change in errors) |
|  | Either | Equals 1 for the chosen subtree |
|  | Either | Number of the intermediate node selected for pruning |
|  | Either | Change in the node selection metric that results from pruning this node |