Pruning criteria are similar to growth criteria, except that they use the global change of a metric instead of the per-leaf change. In addition, if a validation partition is present, pruning statistics are calculated from that partition.
When you prune by entropy, the entropy is calculated as though the entire data set were a single leaf partitioned into the final number of leaves. Thus the path taken during pruning might not be the reverse of the path taken during growth, even if the pruning and growth metrics are identical.
The change is then the difference between the global entropy with the node preserved and the global entropy with the node pruned back to a leaf.
As with entropy, the change in Gini statistic is calculated based on the change in the global Gini statistic. The equations are otherwise unchanged.
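The global change described above for entropy and Gini can be sketched in a few lines. This is a minimal illustration, not the procedure's implementation: it assumes base-2 Shannon entropy and the usual Gini index as the per-leaf metrics, assumes that the global metric weights each leaf by its share of all observations, and uses hypothetical names (`global_metric`, `pruning_change`). Each leaf is a list of per-target-level counts, taken from the validation partition when one is present.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of one leaf from its per-level counts (assumed form)."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def gini(counts):
    """Gini index of one leaf from its per-level counts (assumed form)."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def global_metric(leaves, impurity):
    """Treat the data set as one leaf split into `leaves`:
    weight each leaf's impurity by its share of all observations."""
    total = sum(sum(c) for c in leaves)
    return sum(sum(c) / total * impurity(c) for c in leaves if sum(c) > 0)

def pruning_change(leaves, node_leaf_ids, impurity):
    """Global metric with the node pruned back to a leaf,
    minus the global metric with the node preserved."""
    preserved = global_metric(leaves, impurity)
    # Pruning merges all leaves under the node into a single leaf.
    merged = [sum(level) for level in zip(*(leaves[i] for i in node_leaf_ids))]
    pruned = [c for i, c in enumerate(leaves) if i not in node_leaf_ids]
    pruned.append(merged)
    return global_metric(pruned, impurity) - preserved

# Three leaves with two target levels; leaves 0 and 1 share a parent node.
leaves = [[8, 2], [1, 9], [5, 5]]
gini_change = pruning_change(leaves, {0, 1}, gini)
entropy_change = pruning_change(leaves, {0, 1}, entropy)
```

Because the weights are relative to the whole data set rather than to the parent node, the candidate that changes the global metric least need not be the split that was made last during growth.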
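The misclassification rate (MISC) is the number of mispredictions divided by the number of predictions. Thus, for a leaf $\lambda$ that has a predicted target level, $\tau_P$, the misclassification rate is
$$\mathrm{MISC}_\lambda = \frac{\sum_{\tau_i \ne \tau_P} N^\lambda_{\tau_i}}{N_\lambda}$$
where $N^\lambda_{\tau_i}$ is the number of observations on leaf $\lambda$ that have target level $\tau_i$, and $N_\lambda$ is the total number of observations on the leaf.
For all the leaves in the tree, it is
$$\mathrm{MISC} = \sum_{\lambda} \frac{N_\lambda}{N_0}\,\mathrm{MISC}_\lambda$$
where $N_0$ is the total number of observations across all leaves.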
The predicted target level is always based on the training data set.
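As a hedged illustration of the misclassification rate, the sketch below takes the predicted level of each leaf from training counts and counts mispredictions on a second set of counts (validation counts, or the training counts again if there is no validation partition). The function names and the tie-breaking rule (the first level with the largest training count wins) are assumptions made for the example.

```python
def predicted_level(train_counts):
    """Index of the most frequent target level in the training counts.
    Ties go to the lowest index (an assumption of this sketch)."""
    return max(range(len(train_counts)), key=train_counts.__getitem__)

def leaf_misc(train_counts, eval_counts):
    """Mispredictions divided by predictions for one leaf."""
    n = sum(eval_counts)
    if n == 0:
        return 0.0  # no observations to score on this leaf
    return (n - eval_counts[predicted_level(train_counts)]) / n

def tree_misc(train_leaves, eval_leaves):
    """Mispredictions over all leaves divided by all predictions."""
    total = sum(sum(e) for e in eval_leaves)
    wrong = sum(
        sum(e) - e[predicted_level(t)]
        for t, e in zip(train_leaves, eval_leaves)
    )
    return wrong / total

# Two leaves, two target levels.
train_leaves = [[8, 2], [1, 9]]
valid_leaves = [[3, 2], [4, 1]]
misc_leaf0 = leaf_misc(train_leaves[0], valid_leaves[0])  # 2 of 5 wrong
misc_leaf1 = leaf_misc(train_leaves[1], valid_leaves[1])  # 4 of 5 wrong
misc_tree = tree_misc(train_leaves, valid_leaves)         # 6 of 10 wrong
```

The second leaf shows why the source of the predicted level matters: the training data predicts level 1 there, so the four validation observations at level 0 all count as mispredictions.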
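The average square error (ASE) is based on the sum of squares error (SSE). For a perfect assignment, you would expect the proportion of observations at a leaf to be 1 for the predicted target level and 0 for the remainder. Thus, for a single leaf, the equation for the average of this error is
$$\mathrm{ASE}_\lambda = 1 - 2\sum_{\tau_i} P^{\Lambda}_{\tau_i} P^{\lambda}_{\tau_i} + \sum_{\tau_i} \left(P^{\lambda}_{\tau_i}\right)^2$$
where $P_{\tau_i}$ is the proportion of observations at the leaf that have target level $\tau_i$, the superscript $\lambda$ is for a leaf in the training set, and the superscript $\Lambda$ is for a leaf in the validation set. If there is no validation set, the training set is used.
Thus, for an ensemble of leaves the ASE becomes
$$\mathrm{ASE} = \sum_{\lambda} \frac{N_\Lambda}{N_0}\,\mathrm{ASE}_\lambda$$
where $N_0$ is the total number of observations across all leaves. This summation is over the validation counts at the leaves, $N_\Lambda$.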
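The following sketch shows one way to compute an average square error of this kind. It rests on an assumption about the per-observation error: for each scored observation, the squared difference between the 0/1 indicator of its actual level and the leaf's training proportion is summed over all target levels. Averaged over a leaf, that gives 1 - 2 * sum(P_eval * P_train) + sum(P_train^2). The function names are hypothetical, and every leaf is assumed to hold at least one training observation.

```python
def proportions(counts):
    """Per-level proportions of a leaf from its counts."""
    n = sum(counts)
    return [c / n for c in counts]

def leaf_ase(train_counts, eval_counts):
    """Average square error for one leaf (assumed form, see lead-in).
    Training proportions are the predictions; eval_counts are scored."""
    p_train = proportions(train_counts)
    p_eval = proportions(eval_counts)
    cross = sum(pe * pt for pe, pt in zip(p_eval, p_train))
    return 1.0 - 2.0 * cross + sum(pt ** 2 for pt in p_train)

def tree_ase(train_leaves, eval_leaves):
    """Leaf ASE values weighted by each leaf's share of scored observations."""
    total = sum(sum(e) for e in eval_leaves)
    return sum(
        sum(e) / total * leaf_ase(t, e)
        for t, e in zip(train_leaves, eval_leaves)
        if sum(e) > 0  # leaves with nothing to score carry no weight
    )

train_leaves = [[8, 2], [1, 9]]
valid_leaves = [[3, 2], [4, 1]]
ase_train_only = leaf_ase(train_leaves[0], train_leaves[0])  # no validation set
ase_leaf0 = leaf_ase(train_leaves[0], valid_leaves[0])
ase_tree = tree_ase(train_leaves, valid_leaves)
```

Under this assumed form, scoring a leaf against its own training counts reduces to 1 minus the sum of squared proportions, which is the Gini index of that leaf.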