The HPSPLIT Procedure

PRUNE Statement

PRUNE C45 </ confidence> ;

PRUNE NONE ;

PRUNE by-metric </ until-metric operator value> ;

The PRUNE statement controls pruning. It has three different syntaxes: one for C4.5-style pruning, one for no pruning, and one for pruning by using a specified metric.

For decision trees (nominal targets), the default pruning method is entropy. For a nominal target, the following PRUNE statement is equivalent to omitting the PRUNE statement:

prune entropy;

For regression trees (interval targets), the default pruning method is ASE. For an interval target, the following PRUNE statement is equivalent to omitting the PRUNE statement:

prune ASE;

The preceding statements are also equivalent to the following statements, respectively:

prune entropy / entropy >= 1.0;
prune ASE / ASE >= 1.0;

You can specify the following pruning options:

C45 </ confidence>

requests C4.5-based pruning (Quinlan, 1993) based on the upper error rate from the binomial distribution (Wilson, 1927; Blyth and Still, 1983; Agresti and Coull, 1998) at the confidence limit. The default confidence is 0.25.

This option is available only for decision trees (nominal targets).
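As a minimal sketch of the C45 syntax shown above, the following statements are two alternatives; only the PRUNE statement is shown, and the rest of the PROC HPSPLIT step is omitted. The confidence value 0.1 is an illustrative assumption, not a recommended setting, and the bare numeric form after the slash is inferred from the syntax line.

```sas
/* C4.5-style pruning with the default confidence of 0.25 */
prune C45;

/* Same method with an explicit confidence (0.1 is illustrative only) */
prune C45 / 0.1;
```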

NONE

turns off pruning.

by-metric </ until-metric operator value>

chooses the node to prune back to a leaf according to the specified by-metric. Optionally, you can specify an until-metric, operator, and value to control when pruning stops. If you do not specify these arguments, until-metric is set to the same metric as by-metric, operator is set to >=, and value is set to 1.

You can specify the following value for by-metric for decision trees (nominal target) or for regression trees (interval target):

ASE

chooses the leaf that has the smallest change in the average square error.

You can specify the following values for by-metric only for decision trees (nominal target):

ENTROPY

chooses the leaf that has the smallest change in the entropy.

GINI

chooses the leaf that has the smallest change in the Gini statistic.

MISC

chooses the leaf that has the smallest change in the misclassification rate.
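The by-metric values above can be sketched as the following alternative PRUNE statements. Because no until-metric is given, each one relies on the documented defaults (the same metric, the >= operator, and a value of 1). The rest of the PROC HPSPLIT step is omitted.

```sas
/* Valid for nominal or interval targets */
prune ASE;

/* Valid for nominal targets only */
prune entropy;
prune gini;
prune misc;
```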

You can specify the following values for until-metric for decision trees (nominal target) or for regression trees (interval target):

ASE

stops pruning when the per-leaf change in the average square error (ASE) is operator value times the per-leaf change in the ASE of pruning the whole initial tree to a leaf.

N

stops pruning when the number of leaves is operator value.

You can specify the following values for until-metric only for decision trees (nominal target):

ENTROPY

stops pruning when the per-leaf change in entropy is operator value times the per-leaf change in the entropy of pruning the whole initial tree to a leaf.

GINI

stops pruning when the per-leaf change in the Gini statistic is operator value times the per-leaf change in the Gini statistic of pruning the whole initial tree to a leaf.

MISC

stops pruning when the per-leaf change in misclassification rate is operator value times the per-leaf change in the misclassification rate of pruning the whole initial tree to a leaf.
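The following alternative statements sketch how a by-metric and an until-metric combine. The numeric values (6, 10, and 0.5) are illustrative assumptions, and the comments restate the stopping rules described above. The rest of the PROC HPSPLIT step is omitted.

```sas
/* Nominal target: prune by misclassification rate and stop
   when the number of leaves is <= 6 */
prune misc / N <= 6;

/* Nominal target: prune by the Gini statistic and stop when the
   per-leaf change in Gini is >= 0.5 times the per-leaf change of
   pruning the whole initial tree to a leaf */
prune gini / gini >= 0.5;

/* Interval or nominal target: prune by ASE and stop
   when the number of leaves is <= 10 */
prune ASE / N <= 10;
```

The by-metric and the until-metric do not have to match: the first statement selects nodes by misclassification rate but stops on a leaf count.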

You can specify any of the following values for operator for decision trees (nominal target) or for regression trees (interval target):

<=

less than or equal to

LE

less than or equal to

>=

greater than or equal to

GE

greater than or equal to

<

less than

LT

less than

>

greater than

GT

greater than

=

equal to

EQ

equal to
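Each comparison has a symbolic form and a mnemonic form, so the two statements in this sketch express the same stopping rule. The leaf count of 8 is an illustrative assumption.

```sas
/* Symbolic form: stop when the number of leaves is <= 8 */
prune entropy / N <= 8;

/* Mnemonic form of the same operator */
prune entropy / N LE 8;
```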