The HPSPLIT Procedure

Pruning Criteria

Pruning criteria are similar to growth criteria, except that they use the global change in a metric rather than the per-leaf change. In addition, if a validation partition is present, the pruning statistics are calculated from the validation data.

Entropy Pruning Criterion

When you prune by entropy, the entropy is calculated as though the entire data set were a single leaf partitioned into the final number of leaves. Consequently, the path taken during pruning might not be the reverse of the path taken during growth, even when the pruning and growth metrics are identical.

The change is then the difference between the global entropy with the node preserved and the global entropy with the node pruned back to a leaf.
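For concreteness, the following is a minimal Python sketch of this calculation. It is an illustration only, not PROC HPSPLIT's implementation; the function names and the representation of leaves as lists of per-target-level counts are assumptions made for the example. The change is the global entropy after pruning minus the global entropy before:

from math import log2

def global_entropy(leaves):
    """Global entropy of a tree whose leaves are lists of per-level
    counts, treating the entire data set as one collection
    partitioned into the final leaves."""
    n_total = sum(sum(leaf) for leaf in leaves)
    h = 0.0
    for leaf in leaves:
        n_leaf = sum(leaf)
        for count in leaf:
            if count > 0:
                # each leaf contributes its entropy weighted by its share of the data
                h -= (n_leaf / n_total) * (count / n_leaf) * log2(count / n_leaf)
    return h

def entropy_prune_change(leaves, subtree):
    """Change in global entropy when the leaves with indices in
    `subtree` are merged back into the single leaf of their parent node."""
    merged = [sum(counts) for counts in zip(*(leaves[i] for i in subtree))]
    pruned = [leaf for i, leaf in enumerate(leaves) if i not in subtree] + [merged]
    return global_entropy(pruned) - global_entropy(leaves)

# Three leaves over a binary target; evaluate pruning the node whose
# children are leaves 1 and 2 back to a single leaf.
leaves = [[40, 10], [5, 20], [15, 10]]
print(entropy_prune_change(leaves, {1, 2}))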

Gini Pruning Criterion

As with entropy, the pruning change is calculated from the change in the global Gini statistic. The equations are otherwise unchanged.
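The only difference from the entropy sketch above is the impurity function. A hedged Python version of the global Gini statistic, under the same assumed count-list layout:

def global_gini(leaves):
    """Global Gini statistic: each leaf's impurity, 1 minus the sum of
    squared level proportions, weighted by the leaf's share of the data."""
    n_total = sum(sum(leaf) for leaf in leaves)
    g = 0.0
    for leaf in leaves:
        n_leaf = sum(leaf)
        g += (n_leaf / n_total) * (1.0 - sum((c / n_leaf) ** 2 for c in leaf))
    return g

Substituting global_gini for global_entropy in the earlier entropy_prune_change sketch gives the corresponding Gini pruning change.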

Misclassification Rate Pruning Criterion

The misclassification rate (MISC) is simply the number of mispredictions divided by the number of predictions. Thus, for a leaf that has a predicted target level, $\tau _ P$, the misclassification rate is

\begin{equation*}  \mathrm{MISC}_\lambda = \sum _{\tau _ i \ne \tau _ P} { \frac{N_{\tau _ i}^\lambda }{N_\lambda } } \end{equation*}

For all the leaves in the tree, with each leaf weighted by its share of the observations, the rate is

\begin{equation*}  \mathrm{MISC} = \sum _\lambda { \frac{N_\lambda }{N_0} \sum _{\tau _ i \ne \tau _ P} { \frac{N_{\tau _ i}^\lambda }{N_\lambda } } } \end{equation*}

The predicted target level is always based on the training data set.
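As a check on the two equations, here is a minimal Python sketch (illustrative names and data layout, not the procedure's implementation) that fixes each leaf's predicted level from the training counts and evaluates the rate on validation counts when they are available:

def misc(train_leaves, eval_leaves):
    """Global misclassification rate.  The predicted level of each leaf
    is the most frequent level in its training counts; the rate itself
    is computed from eval_leaves, which holds validation counts when a
    validation partition exists and the training counts otherwise."""
    n0 = sum(sum(leaf) for leaf in eval_leaves)
    wrong = 0
    for train, ev in zip(train_leaves, eval_leaves):
        predicted = max(range(len(train)), key=train.__getitem__)
        wrong += sum(c for i, c in enumerate(ev) if i != predicted)
    return wrong / n0

train = [[40, 10], [5, 20], [15, 10]]
valid = [[18, 7], [2, 8], [6, 9]]
print(misc(train, valid))  # 18 mispredictions / 50 predictions = 0.36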

Average Square Error Pruning Criterion

The average square error (ASE) is based on the sum of squares error (SSE). For a perfect assignment, the proportion of observations at a leaf $\lambda $ would be 1 for the predicted target level and 0 for the remaining levels. Thus, for a single leaf, the equation for the average of this error is

\begin{equation*}  \mathrm{ASE}_\lambda = 1 - 2 \sum _{\tau _ i} { \frac{N_{\tau _ i}^\Lambda }{N_\Lambda } \frac{N_{\tau _ i}^\lambda }{N_\lambda } } + \sum _{\tau _ i} { \left( \frac{N_{\tau _ i}^\lambda }{N_\lambda } \right)^2 } \end{equation*}

where $\lambda $ denotes the leaf's counts in the training set and $\Lambda $ denotes the same leaf's counts in the validation set. If there is no validation set, the training set is used for both.
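Equivalently, $\mathrm{ASE}_\lambda $ can be read as the average, over the $N_\Lambda $ observations at the leaf, of the squared differences between the 0/1 indicator of each observation's level and the leaf's predicted proportions:

\begin{equation*}  \mathrm{ASE}_\lambda = \frac{1}{N_\Lambda } \sum _{j=1}^{N_\Lambda } \sum _{\tau _ i} { \left( I(\tau _ j = \tau _ i) - \frac{N_{\tau _ i}^\lambda }{N_\lambda } \right)^2 } \end{equation*}

Expanding the square and collecting terms yields the expression above.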

Thus, for an ensemble of leaves, the ASE becomes

\begin{equation*}  \mathrm{ASE} = \sum _{\Lambda } \frac{N_\Lambda }{N_0} {\left[1 - 2 \sum _{\tau _ i} {\frac{N_{\tau _ i}^\Lambda }{N_\Lambda } \frac{N_{\tau _ i}^\lambda }{N_\lambda } } + \sum _{\tau _ i} { \left( \frac{N_{\tau _ i}^\lambda }{N_\lambda } \right)^2 } \right] } \end{equation*}

This summation is over the validation counts at the leaves, $\Lambda $; $N_0$ is the total count over all the leaves.
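Putting the pieces together, a minimal Python sketch of the global ASE (illustrative only; the names and count-list layout are assumptions carried over from the earlier sketches):

def ase(train_leaves, valid_leaves):
    """Global average square error.  Predicted proportions p come from
    the training counts at each leaf; the observed proportions q and
    the leaf weights come from the validation counts (or the training
    counts again when there is no validation partition)."""
    n0 = sum(sum(leaf) for leaf in valid_leaves)
    total = 0.0
    for train, valid in zip(train_leaves, valid_leaves):
        n_tr, n_va = sum(train), sum(valid)
        p = [c / n_tr for c in train]
        q = [c / n_va for c in valid]
        # per-leaf term: 1 - 2*sum(q*p) + sum(p^2)
        term = 1.0 - 2.0 * sum(qi * pi for qi, pi in zip(q, p)) \
               + sum(pi * pi for pi in p)
        total += (n_va / n0) * term
    return total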