The PRUNE statement controls pruning. It has five syntaxes: one for C4.5-style pruning, one for no pruning, one
for cost-complexity pruning, one that prunes by a specified metric and chooses the subtree based on the change in a
specified metric, and one that prunes by a specified metric and chooses the subtree at the minimum of a specified
metric.
The default decision tree pruning method is entropy, using the change-in-entropy subtree selection method. The following PRUNE
statement example is equivalent to having no PRUNE statement for a nominal target:
prune entropy;
The default regression tree pruning method is ASE, using the change-in-ASE subtree selection method. The following PRUNE statement
example is equivalent to having no PRUNE statement for an interval target:
prune ASE;
The preceding statements are also equivalent to the following statements, respectively:
prune entropy / entropy >= 1.0;
prune ASE / ASE >= 1.0;
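The equivalences above amount to filling in defaults for the omitted parts of the statement. The following Python sketch only illustrates that rule; the function name `resolve_prune` and its argument names are hypothetical and are not part of the procedure.

```python
def resolve_prune(target_level, by_metric=None, until_metric=None, op=None, value=None):
    """Fill in the defaults described in the text (illustrative helper only).

    target_level: "nominal" (decision tree) or "interval" (regression tree).
    Returns (by_metric, until_metric, operator, value).
    """
    if target_level not in ("nominal", "interval"):
        raise ValueError("target_level must be 'nominal' or 'interval'")
    # No PRUNE statement: entropy for a nominal target, ASE for an interval target.
    if by_metric is None:
        by_metric = "ENTROPY" if target_level == "nominal" else "ASE"
    by_metric = by_metric.upper()
    # An omitted until-metric defaults to the by-metric.
    until_metric = by_metric if until_metric is None else until_metric.upper()
    # An omitted operator and value default to ">=" and 1.
    op = ">=" if op is None else op
    value = 1.0 if value is None else float(value)
    return (by_metric, until_metric, op, value)


if __name__ == "__main__":
    print(resolve_prune("nominal"))
    print(resolve_prune("interval", "ase"))
```

Under these assumptions, calling the helper with no metric, with only the by-metric, or with all four parts spelled out returns the same tuple, which mirrors the three equivalent forms shown above.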
You can specify the following pruning options:
-
C45 </ confidence>
-
requests C4.5-based pruning (Quinlan, 1993) based on the upper error rate from the binomial distribution (Wilson, 1927; Blyth and Still, 1983; Agresti and Coull, 1998) at the confidence limit. The default confidence is 0.25.
This option is available only for decision trees (nominal targets).
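To give a feel for the upper error rate, the following Python sketch computes the upper limit of the Wilson score interval and applies the textbook C4.5 comparison between a node treated as a leaf and its subtree. This is an assumption-laden illustration: the text does not state which binomial limit the procedure uses or how the confidence maps to a normal quantile, and the names `wilson_upper_error` and `should_collapse` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_upper_error(errors, n, confidence=0.25):
    """Upper limit of the Wilson score interval for an observed error rate.

    Assumption: `confidence` is used as a one-sided tail probability, so a
    smaller value gives a more pessimistic (larger) error estimate.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(1.0 - confidence)
    p = errors / n
    centre = p + z * z / (2 * n)
    spread = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre + spread) / (1 + z * z / n)


def should_collapse(node, leaves, confidence=0.25):
    """Textbook C4.5 test: collapse when the node, treated as a single leaf,
    has no more predicted errors than the leaves of its subtree combined.

    node and each entry of leaves are (errors, n) pairs.
    """
    node_errors, node_n = node
    as_leaf = node_n * wilson_upper_error(node_errors, node_n, confidence)
    as_subtree = sum(n * wilson_upper_error(e, n, confidence) for e, n in leaves)
    return as_leaf <= as_subtree


if __name__ == "__main__":
    # A split that does not reduce the observed error is collapsed.
    print(should_collapse((2, 12), [(1, 6), (1, 6)]))
    # A split that separates the classes perfectly is kept.
    print(should_collapse((5, 12), [(0, 7), (0, 5)]))
```

Because the upper limit grows as the number of observations in a leaf shrinks, small leaves are penalized, which is what makes this kind of pruning conservative about deep splits.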
-
COSTCOMPLEXITY </ >
-
requests cost-complexity pruning (Breiman et al., 1984). The optional argument is the penalty for the size of the tree. The final per-leaf penalty value is this argument multiplied by the sum of squares error (SSE) of the largest tree and divided by the number of leaves in that tree.
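The scaling of the penalty can be sketched in Python as follows. The scaling step follows the description above; the cost formula (SSE plus penalty times number of leaves) is the standard cost-complexity form from Breiman et al. and is an assumption about how the scaled penalty is then applied. All function names and the `(leaves, sse)` representation of a subtree are hypothetical.

```python
def scaled_penalty(user_penalty, sse_largest, leaves_largest):
    """Per-leaf penalty after scaling by the largest tree's SSE per leaf."""
    return user_penalty * sse_largest / leaves_largest


def cost_complexity(sse, leaves, per_leaf_penalty):
    """Standard cost-complexity measure: error plus a charge per leaf."""
    return sse + per_leaf_penalty * leaves


def best_subtree(subtrees, user_penalty):
    """Pick the (leaves, sse) pair with the lowest cost-complexity.

    subtrees: list of (leaves, sse) pairs for the nested candidate subtrees.
    """
    largest_leaves, largest_sse = max(subtrees)  # the pair with the most leaves
    penalty = scaled_penalty(user_penalty, largest_sse, largest_leaves)
    return min(subtrees, key=lambda t: cost_complexity(t[1], t[0], penalty))


if __name__ == "__main__":
    candidates = [(1, 100.0), (2, 60.0), (4, 40.0), (8, 36.0)]
    for penalty in (0.0, 1.0, 10.0):
        print(penalty, best_subtree(candidates, penalty))
```

With these made-up numbers, a penalty of 0 keeps the largest tree, a penalty of 1 selects the 4-leaf subtree, and a penalty of 10 prunes back to a single leaf. Scaling by the largest tree's SSE per leaf makes the argument roughly unit-free, so the same value behaves similarly across data sets with different error scales.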
-
NONE
-
turns off pruning.
-
by-metric < / until-metric operator value>
-
chooses a node to prune back to a leaf by the specified by-metric. Optionally, you can specify an until-metric, operator, and value to control pruning. If you do not specify these arguments, until-metric is set to the same metric as by-metric, operator is set to ">=", and value is set to 1.
You can specify the following value for by-metric for decision trees (nominal target) or for regression trees (interval target):
- ASE
-
chooses the leaf that has the smallest change in the average square error (ASE).
You can specify the following values for by-metric only for decision trees (nominal target):
- ENTROPY
-
chooses the leaf that has the smallest change in the entropy.
- GINI
-
chooses the leaf that has the smallest change in the Gini statistic.
- MISC
-
chooses the leaf that has the smallest change in the misclassification rate.
You can specify the following values for until-metric for decision trees (nominal target) or for regression trees (interval target):
- ASE
-
stops pruning when the per-leaf change in average square error (ASE) is operator value times the per-leaf change in the ASE of pruning the whole initial tree to a leaf.
- N
-
stops pruning when the number of leaves is operator value.
You can specify the following values for until-metric only for decision trees (nominal target):
- ENTROPY
-
stops pruning when the per-leaf change in entropy is operator value times the per-leaf change in the entropy of pruning the whole initial tree to a leaf.
- GINI
-
stops pruning when the per-leaf change in the Gini statistic is operator value times the per-leaf change in the Gini statistic of pruning the whole initial tree to a leaf.
- MISC
-
stops pruning when the per-leaf change in misclassification rate is operator value times the per-leaf change in the misclassification rate of pruning the whole initial tree to a leaf.
You can specify any of the following values for operator for decision trees (nominal target) or for regression trees (interval target):
- <=
-
less than or equal to
- LE
-
less than or equal to
- >=
-
greater than or equal to
- GE
-
greater than or equal to
- <
-
less than
- LT
-
less than
- >
-
greater than
- GT
-
greater than
- =
-
equal to
- EQ
-
equal to
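The stopping rule for this syntax can be read as a single comparison. The following Python sketch is a literal reading of the descriptions above, not the procedure's implementation: `should_stop` and its keyword arguments are hypothetical, and the per-leaf changes are assumed to be computed elsewhere.

```python
import operator

# Symbolic and mnemonic spellings map to the same comparison.
OPERATORS = {
    "<=": operator.le, "LE": operator.le,
    ">=": operator.ge, "GE": operator.ge,
    "<": operator.lt, "LT": operator.lt,
    ">": operator.gt, "GT": operator.gt,
    "=": operator.eq, "EQ": operator.eq,
}


def should_stop(until_metric, op, value, *, per_leaf_change=None,
                baseline_per_leaf_change=None, n_leaves=None):
    """Return True when pruning should stop.

    For N, the current number of leaves is compared with value. For the
    other metrics, the per-leaf change of the candidate prune is compared
    with value times the per-leaf change of pruning the whole initial tree
    to a single leaf (the baseline).
    """
    compare = OPERATORS[op.upper()]
    if until_metric.upper() == "N":
        return compare(n_leaves, value)
    return compare(per_leaf_change, value * baseline_per_leaf_change)


if __name__ == "__main__":
    # Reading of: prune misc / N <= 10;
    print(should_stop("N", "<=", 10, n_leaves=12))
    # Reading of: prune entropy / entropy >= 1.0;
    print(should_stop("ENTROPY", ">=", 1.0,
                      per_leaf_change=0.05, baseline_per_leaf_change=0.04))
```

Under this reading, the default value of 1 with ">=" stops pruning as soon as the next prune would cost at least as much per leaf as collapsing the entire tree would.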
-
by-metric / <until-metric> MIN
-
chooses a node to prune back to a leaf by the specified by-metric and selects the subtree at the minimum of the specified until-metric as the optimal subtree. Optionally, you can specify an until-metric. If you do not specify an until-metric, until-metric is set to the same metric as by-metric.
You can specify the following value for by-metric for decision trees (nominal target) or for regression trees (interval target):
- ASE
-
chooses the leaf that has the smallest change in the average square error (ASE).
You can specify any of the following values for by-metric only for decision trees (nominal target):
- ENTROPY
-
chooses the leaf that has the smallest change in the entropy.
- GINI
-
chooses the leaf that has the smallest change in the Gini statistic.
- MISC
-
chooses the leaf that has the smallest change in the misclassification rate.
You can specify the following value for until-metric for decision trees (nominal target) or for regression trees (interval target):
- ASE
-
selects the subtree with the minimum average square error (ASE) as the optimal subtree.
You can specify any of the following values for until-metric only for decision trees (nominal target):
- ENTROPY
-
selects the subtree with the minimum entropy as the optimal subtree.
- GINI
-
selects the subtree with the minimum Gini statistic as the optimal subtree.
- MISC
-
selects the subtree with the minimum misclassification rate as the optimal subtree.
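Selecting the subtree under the MIN syntax can be pictured as taking the minimum over the sequence of subtrees produced by successive pruning. The Python sketch below is illustrative only: `select_min_subtree` is a hypothetical name, the metric values are made up, and breaking ties in favor of the smaller subtree is an assumption that the text does not state.

```python
def select_min_subtree(candidates):
    """Return the (n_leaves, metric_value) pair with the smallest metric.

    candidates: one pair per subtree in the pruning sequence, where
    metric_value is the until-metric evaluated for that subtree.
    Ties are broken in favor of fewer leaves (an assumption).
    """
    if not candidates:
        raise ValueError("at least one candidate subtree is required")
    return min(candidates, key=lambda c: (c[1], c[0]))


if __name__ == "__main__":
    # Pruning by one metric yields this sequence; the until-metric picks the winner.
    sequence = [(8, 0.21), (6, 0.19), (4, 0.18), (3, 0.18), (1, 0.30)]
    print(select_min_subtree(sequence))
```

The by-metric only determines the order in which nodes are pruned, and therefore which subtrees appear in the sequence; the until-metric decides which member of that sequence is kept.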