The HPSPLIT Procedure

PRUNE Statement

  • PRUNE C45 </ confidence>;

  • PRUNE NONE;

  • PRUNE COSTCOMPLEXITY </ $\gamma $>;

  • PRUNE by-metric </ until-metric operator value>;

  • PRUNE by-metric / <until-metric> MIN;

The PRUNE statement controls pruning. It has five syntaxes: one for C4.5-style pruning, one for no pruning, one for cost-complexity pruning, and two for pruning by a specified metric. The two metric-based syntaxes differ in how they choose the final subtree. One uses the change in a specified metric, and the other uses the minimum of a specified metric.

The default pruning method for decision trees is entropy, with subtree selection based on the change in entropy. For a nominal target, the following PRUNE statement is equivalent to omitting the PRUNE statement:

prune entropy;

The default pruning method for regression trees is ASE, with subtree selection based on the change in ASE. For an interval target, the following PRUNE statement is equivalent to omitting the PRUNE statement:

prune ASE;

The preceding statements are also equivalent to the following statements, respectively:

prune entropy / entropy >= 1.0;
prune ASE / ASE >= 1.0;

You can specify the following pruning options:

C45 </ confidence>

requests C4.5-based pruning (Quinlan, 1993), which uses the upper confidence limit of the binomial error rate (Wilson, 1927; Blyth and Still, 1983; Agresti and Coull, 1998) at the specified confidence. The default confidence is 0.25.

This option is available only for decision trees (nominal targets).
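To make the binomial upper error limit concrete, here is a minimal Python sketch. It is not the HPSPLIT implementation. It assumes three things. First, the upper limit is the upper end of the Wilson score interval. Second, confidence plays the role of C4.5's CF, meaning the upper-tail probability, so z is the (1 - confidence) standard normal quantile. Third, following Quinlan's general scheme, a subtree is collapsed when the leaf's pessimistic error count is no larger than the sum over the subtree's leaves. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_upper(errors: int, n: int, confidence: float = 0.25) -> float:
    """Upper end of the Wilson score interval for the error rate errors/n.

    Assumption: `confidence` is treated like C4.5's CF (upper-tail
    probability), so z is the (1 - confidence) standard normal quantile.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(1.0 - confidence)
    z2 = z * z
    p = errors / n
    center = p + z2 / (2 * n)
    spread = z * sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return (center + spread) / (1 + z2 / n)


def predicted_errors(errors: int, n: int, confidence: float = 0.25) -> float:
    """Pessimistic error count for a leaf: n times the upper error rate."""
    return n * wilson_upper(errors, n, confidence)


def should_prune(parent: tuple[int, int],
                 children: list[tuple[int, int]],
                 confidence: float = 0.25) -> bool:
    """Assumed C4.5-style rule: collapse the subtree into one leaf when the
    leaf's pessimistic error count does not exceed the summed pessimistic
    error counts of the subtree's leaves.

    Each tuple is (misclassified observations, total observations).
    """
    leaf = predicted_errors(*parent, confidence)
    subtree = sum(predicted_errors(e, n, confidence) for e, n in children)
    return leaf <= subtree
```

With the same observed error rate, a larger node gets a tighter upper limit. So a split whose children are no purer than the parent is pruned, while a split into pure, well-populated children is kept.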

COSTCOMPLEXITY </ $\gamma $>

requests cost-complexity pruning (Breiman et al., 1984). The optional argument $\gamma $ is the penalty for the size of the tree. The final per-leaf penalty is $\gamma $ multiplied by the sum of squared errors (SSE) of the largest tree, divided by the number of leaves in that tree.
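The penalty formula can be illustrated with a short Python sketch. The per-leaf penalty follows the description above. The selection rule is assumed, not confirmed for HPSPLIT: it picks the candidate subtree that minimizes SSE plus the penalty times the number of leaves, which is the usual cost-complexity criterion of Breiman et al. The function names and the (leaves, SSE) input format are hypothetical.

```python
def per_leaf_penalty(gamma: float, sse_largest: float, leaves_largest: int) -> float:
    """Per-leaf penalty: gamma * SSE(largest tree) / leaves(largest tree)."""
    return gamma * sse_largest / leaves_largest


def pick_subtree(candidates: list[tuple[int, float]], gamma: float) -> tuple[int, float]:
    """Return the (leaves, sse) candidate minimizing SSE + penalty * leaves.

    Assumption: `candidates` holds the nested pruned subtrees, and the
    largest tree is the candidate with the most leaves.
    """
    leaves_max, sse_max = max(candidates, key=lambda c: c[0])
    alpha = per_leaf_penalty(gamma, sse_max, leaves_max)
    return min(candidates, key=lambda c: c[1] + alpha * c[0])
```

Larger values of $\gamma $ favor smaller trees. With $\gamma $ = 0 there is no size penalty, so the largest (lowest-SSE) tree is chosen.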

NONE

turns off pruning.

by-metric </ until-metric operator value>

uses the specified by-metric to choose a node to prune back to a leaf. Optionally, you can specify an until-metric, operator, and value to control when pruning stops. If you omit these arguments, until-metric is set to the same metric as by-metric, operator is set to >=, and value is set to 1.

You can specify the following value for by-metric for decision trees (nominal target) or for regression trees (interval target):

ASE

chooses the leaf that has the smallest change in the average square error (ASE).

You can specify the following values for by-metric only for decision trees (nominal target):

ENTROPY

chooses the leaf that has the smallest change in the entropy.

GINI

chooses the leaf that has the smallest change in the Gini statistic.

MISC

chooses the leaf that has the smallest change in the misclassification rate.
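The following Python sketch computes the three nominal-only by-metrics from a node's class counts. It also computes the change caused by merging a node's children into a single leaf and picks the candidate with the smallest change. The sketch assumes that entropy uses base-2 logarithms and that child metrics are weighted by each child's share of observations. All helper names are hypothetical. ASE is omitted because its exact definition for nominal targets is not given here.

```python
from math import log2


def entropy(counts: list[int]) -> float:
    """Entropy of a node's class distribution (base 2 assumed)."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)


def gini(counts: list[int]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def misc(counts: list[int]) -> float:
    """Misclassification rate when the node predicts its majority class."""
    return 1.0 - max(counts) / sum(counts)


def change_if_pruned(metric, children: list[list[int]]) -> float:
    """Increase in `metric` when the children are merged into one leaf.

    Children's metrics are weighted by their share of observations.
    """
    parent = [sum(col) for col in zip(*children)]
    n = sum(parent)
    weighted = sum(sum(ch) / n * metric(ch) for ch in children)
    return metric(parent) - weighted


def choose_node(metric, candidates: dict[str, list[list[int]]]) -> str:
    """Name of the candidate node whose pruning changes `metric` least."""
    return min(candidates, key=lambda name: change_if_pruned(metric, candidates[name]))
```

For example, a node whose children have class counts [3, 2] and [2, 3] barely improves on its parent. It is therefore pruned before a node that splits [5, 5] into two pure children.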

You can specify the following values for until-metric for decision trees (nominal target) or for regression trees (interval target):

ASE

stops pruning when the per-leaf change in average square error (ASE) is operator value times the per-leaf change in the ASE of pruning the whole initial tree to a leaf.

N

stops pruning when the number of leaves is operator value.

You can specify the following values for until-metric only for decision trees (nominal target):

ENTROPY

stops pruning when the per-leaf change in entropy is operator value times the per-leaf change in the entropy of pruning the whole initial tree to a leaf.

GINI

stops pruning when the per-leaf change in the Gini statistic is operator value times the per-leaf change in the Gini statistic of pruning the whole initial tree to a leaf.

MISC

stops pruning when the per-leaf change in misclassification rate is operator value times the per-leaf change in the misclassification rate of pruning the whole initial tree to a leaf.
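The following is a minimal Python sketch of the until test for the ASE, ENTROPY, GINI, and MISC until-metrics. It assumes that "per-leaf" means divided by the number of leaves a prune removes. The reference value is the per-leaf change from collapsing the whole initial tree. All names are hypothetical. The default arguments correspond to the documented default of operator >= and value 1.

```python
import operator
from typing import Callable


def reference_per_leaf_change(metric_initial: float, metric_root: float,
                              leaves_initial: int) -> float:
    """Per-leaf change from pruning the whole initial tree to a single leaf.

    Assumption: "per leaf" means per leaf removed (leaves_initial - 1).
    """
    return (metric_root - metric_initial) / (leaves_initial - 1)


def should_stop(change: float, leaves_removed: int, reference: float,
                value: float = 1.0,
                op: Callable[[float, float], bool] = operator.ge) -> bool:
    """Stop when (change / leaves_removed) <op> value * reference.

    Defaults mirror the documented default rule: operator >=, value 1.
    """
    return op(change / leaves_removed, value * reference)
```

Under the default rule, pruning stops once the cheapest remaining prune costs at least as much per leaf as the average cost of collapsing the entire tree.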

You can specify any of the following values for operator for decision trees (nominal target) or for regression trees (interval target):

<=, LE

less than or equal to

>=, GE

greater than or equal to

<, LT

less than

>, GT

greater than

=, EQ

equal to
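Because each symbol and its mnemonic denote the same comparison, you can resolve an operator token with a single lookup table. The following Python sketch is illustrative only, and its names are hypothetical. It also shows the N until-metric test, which compares the number of leaves directly with value. Mnemonics are matched case-insensitively, as SAS keywords are.

```python
import operator

# Each symbol and its mnemonic map to the same comparison function.
OPERATORS = {
    "<=": operator.le, "LE": operator.le,
    ">=": operator.ge, "GE": operator.ge,
    "<": operator.lt, "LT": operator.lt,
    ">": operator.gt, "GT": operator.gt,
    "=": operator.eq, "EQ": operator.eq,
}


def parse_operator(token: str):
    """Return the comparison for `token`; mnemonics are case-insensitive."""
    try:
        return OPERATORS[token.upper()]
    except KeyError:
        raise ValueError(f"unknown operator: {token!r}") from None


def until_n(n_leaves: int, token: str, value: int) -> bool:
    """Stopping test for until-metric N: number of leaves <op> value."""
    return parse_operator(token)(n_leaves, value)
```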

by-metric / <until-metric> MIN

uses the specified by-metric to choose a node to prune back to a leaf, and selects the subtree that has the minimum value of until-metric as the optimal subtree. The until-metric argument is optional. If you omit it, until-metric is set to the same metric as by-metric.

You can specify the following value for by-metric for decision trees (nominal target) or for regression trees (interval target):

ASE

chooses the leaf that has the smallest change in the average square error (ASE).

You can specify any of the following values for by-metric only for decision trees (nominal target):

ENTROPY

chooses the leaf that has the smallest change in the entropy.

GINI

chooses the leaf that has the smallest change in the Gini statistic.

MISC

chooses the leaf that has the smallest change in the misclassification rate.

You can specify the following value for until-metric for decision trees (nominal target) or for regression trees (interval target):

ASE

selects the subtree with the minimum average square error (ASE) as the optimal subtree.

You can specify any of the following values for until-metric only for decision trees (nominal target):

ENTROPY

selects the subtree with the minimum entropy as the optimal subtree.

GINI

selects the subtree with the minimum Gini statistic as the optimal subtree.

MISC

selects the subtree with the minimum misclassification rate as the optimal subtree.
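The MIN form reduces to an argmin over the sequence of subtrees produced by pruning. In the following Python sketch, each subtree is represented by a hypothetical (number of leaves, until-metric value) pair. Ties are assumed to go to the subtree with fewer leaves, because the description above does not state a tie-breaking rule.

```python
def select_min_subtree(sequence: list[tuple[int, float]]) -> tuple[int, float]:
    """Return the (leaves, until_metric) pair with the smallest metric value.

    Assumption: among subtrees with equal metric values, the one with fewer
    leaves is preferred.
    """
    return min(sequence, key=lambda s: (s[1], s[0]))
```

Unlike the until-metric operator value form, this rule does not stop at a threshold. It considers the whole pruning sequence and keeps the best subtree found.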