The “Performance Information” table is created by default. It displays information about the execution mode. For single-machine mode, the table displays the number of threads used. For distributed mode, the table displays the grid mode (symmetric or asymmetric), the number of compute nodes, and the number of threads per node.
Table 13.3 shows the variables that are contained in an example data set that the SCORE statement produces.
Table 13.3: Example SCORE Statement Data Set Variables
| Variable | Target Type | Description |
|---|---|---|
|  | Either | Target variable |
|  | Either | Leaf number to which this observation is assigned |
|  | Either | Node number to which this observation is assigned |
|  | Nominal | Proportion of training set at this leaf that has target |
|  | Nominal | Proportion of validation set at this leaf that has target |
|  | Interval | Average value of target |
|  | Interval | Average value of target |
The variable importance data set contains the importance of the input variables in creating the pruned decision tree. PROC HPSPLIT outputs two simple count-based importance metrics and two variable importance metrics that are based on the sum of squares error. In addition, it outputs the number of observations that are used in the training and validation sets, the number of observations that have a missing value, and the number of observations that have a missing target. Table 13.4 shows the variables contained in the data set when you specify the IMPORTANCE= option in the OUTPUT statement. In addition to the variables listed in the table, the data set includes one variable for each input variable, which contains the importance of that input variable.
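As an illustration of what a count-based importance metric can look like, the following Python sketch tallies how often each input variable is used in a split. It assumes that a count metric is the number of splits that use the variable, which this section does not state explicitly. The list `split_variables` and its values are hypothetical and are not output of PROC HPSPLIT.

```python
from collections import Counter

# Hypothetical data: the variable used in the split at each node of a small
# tree. None marks a leaf, which has no split. These names are illustrative
# only and are not produced by PROC HPSPLIT.
split_variables = ["income", "age", "income", None, None, None, None]

# Assumption: a count-based importance is the number of splits per variable.
importance_counts = Counter(v for v in split_variables if v is not None)

for variable, count in importance_counts.most_common():
    print(f"{variable}: used in {count} split(s)")
```

The metrics that are based on the sum of squares error require the error reduction of each split and are not sketched here.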
Table 13.4: Variable Importance Data Set Variables
| Variable | Description |
|---|---|
|  | Criterion used to generate the tree |
|  | Importance type (Count, NSURROGATES, SSE, VSSE, IMPORT, or VIMPORT) |
|  | Number of observations that have a missing value |
|  | Number of observations that have a missing target |
|  | Number of observations that were used to build the tree (training set) |
|  | Number of observations in the validation set |
|  | Tree number (always 1) |
The data set specified in the NODESTATS= option in the OUTPUT statement can be used to visualize the tree. Table 13.5 shows the variables in this data set.
Table 13.5: NODESTATS= Data Set Variables
| Variable | Target Type | Description |
|---|---|---|
|  | Either | Text that describes the split |
|  | Either | Which of the three criteria was used |
|  | Either | Values of the parent variable’s split to get to this node |
|  | Either | Depth of the node |
|  | Either | Node number |
|  | Either | Leaf number |
|  | Either | Fraction of all training observations going to this node |
|  | Either | Number of training observations at this node |
|  | Either | Number of validation observations at this node |
|  | Either | Parent’s node number |
|  | Either | Value of target predicted at this node |
|  | Either | Variable used in the split |
|  | Either | Tree number (always 1) |
|  | Nominal | Proportion of training set at this leaf that has target |
|  | Nominal | Proportion of validation set at this leaf that has target |
|  | Interval | Average value of target |
|  | Interval | Average value of target |
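Because each node record carries its node number, its parent’s node number, its depth, and the text of its split, the tree structure can be rebuilt from these records alone. The following Python sketch shows one way to do so. The field names (`node`, `parent`, `depth`, `split`, `n_train`) and the sample values are hypothetical, and the use of `None` as the parent of the root node is an assumption, not the documented encoding.

```python
from collections import defaultdict

# Hypothetical node records; field names and values are illustrative and
# are not the variable names that PROC HPSPLIT writes.
nodes = [
    {"node": 0, "parent": None, "depth": 0, "split": "root", "n_train": 100},
    {"node": 1, "parent": 0, "depth": 1, "split": "x < 5", "n_train": 60},
    {"node": 2, "parent": 0, "depth": 1, "split": "x >= 5", "n_train": 40},
    {"node": 3, "parent": 1, "depth": 2, "split": "y < 2", "n_train": 25},
    {"node": 4, "parent": 1, "depth": 2, "split": "y >= 2", "n_train": 35},
]

def render_tree(records):
    """Return the tree as indented text lines, one line per node."""
    children = defaultdict(list)   # parent node number -> child node numbers
    by_id = {}                     # node number -> record
    root = None
    for rec in records:
        by_id[rec["node"]] = rec
        if rec["parent"] is None:  # assumption: the root has no parent
            root = rec["node"]
        else:
            children[rec["parent"]].append(rec["node"])

    lines = []

    def walk(node_id):
        # Depth-first traversal; indentation follows the recorded depth.
        rec = by_id[node_id]
        indent = "  " * rec["depth"]
        lines.append(f"{indent}node {node_id}: {rec['split']} (n={rec['n_train']})")
        for child in sorted(children[node_id]):
            walk(child)

    walk(root)
    return lines

tree_lines = render_tree(nodes)
print("\n".join(tree_lines))
```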
During tree growth and pruning, the number of leaves at each growth or pruning iteration and other metrics are output to data sets that are specified in the GROWTHSUBTREE= and PRUNESUBTREE= options, respectively.
The growth and pruning data sets are identical, except for the following:
- The growth data set reflects statistics of the tree during growth. The pruning data set reflects statistics of the tree during pruning.
- The statistics of the growth data set are always computed from the training subset. The statistics of the pruning data set are computed from the validation subset if one is available. Otherwise, the statistics of the pruning data set are computed from the training subset.
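The rule for which subset supplies the statistics can be sketched in a few lines of Python. The function names and the sample (actual, predicted) pairs are hypothetical, and the misclassification rate stands in for any of the reported statistics.

```python
def growth_subset(train, valid):
    """Growth statistics always come from the training subset."""
    return "training", train

def pruning_subset(train, valid):
    """Pruning statistics come from the validation subset when one is
    available and from the training subset otherwise."""
    if valid:
        return "validation", valid
    return "training", train

def misclassification_rate(rows):
    """Fraction of (actual, predicted) pairs that disagree."""
    wrong = sum(1 for actual, predicted in rows if actual != predicted)
    return wrong / len(rows)

# Hypothetical (actual, predicted) pairs for each subset.
train = [(1, 1), (0, 0), (1, 0), (0, 0)]
valid = [(1, 0), (0, 0)]

name, rows = pruning_subset(train, valid)
print(name, misclassification_rate(rows))

# Without a validation subset, pruning falls back to the training subset.
name_fallback, rows_fallback = pruning_subset(train, [])
print(name_fallback, misclassification_rate(rows_fallback))
```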
Table 13.6: GROWTHSUBTREE= and PRUNESUBTREE= Data Set Variables
| Variable | Target Type | Description |
|---|---|---|
|  | Either | Iteration number |
|  | Either | Number of leaves |
|  | Either | Tree number (always 1) |
|  | Either | Training set: average square error |
|  | Nominal | Training set: entropy |
|  | Nominal | Training set: Gini |
|  | Nominal | Training set: misclassification rate |
|  | Either | Training set: sum of squares error |
|  | Either | Validation set: average square error |
|  | Nominal | Validation set: entropy |
|  | Nominal | Validation set: Gini |
|  | Nominal | Validation set: misclassification rate |
|  | Either | Validation set: sum of squares error |
|  | Either | Subtree assessment value; ratio of slopes or change in errors |
|  | Either | Chosen subtree if 1 |
|  | Either | Intermediate number of node selected for pruning |
|  | Either | Change in node selection metric by pruning node |
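Each row of these data sets describes one iteration, and a value of 1 in the chosen-subtree indicator marks the subtree that was selected. The following Python sketch shows how the selected subtree could be located once the data set has been exported. The field names (`iteration`, `n_leaves`, `valid_misc`, `chosen`) and all values are hypothetical and are not the variable names that PROC HPSPLIT writes.

```python
# Hypothetical pruning records, one per iteration. Field names and values
# are illustrative only.
prune_rows = [
    {"iteration": 0, "n_leaves": 8, "valid_misc": 0.21, "chosen": 0},
    {"iteration": 1, "n_leaves": 6, "valid_misc": 0.18, "chosen": 0},
    {"iteration": 2, "n_leaves": 4, "valid_misc": 0.17, "chosen": 1},
    {"iteration": 3, "n_leaves": 2, "valid_misc": 0.24, "chosen": 0},
]

# Keep the rows whose indicator equals 1.
chosen_rows = [row for row in prune_rows if row["chosen"] == 1]

for row in chosen_rows:
    print(f"iteration {row['iteration']}: {row['n_leaves']} leaves, "
          f"misclassification rate {row['valid_misc']}")
```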