The HPSPLIT Procedure

PRUNE Statement

PRUNE C45 </ confidence> ;

PRUNE NONE ;

PRUNE by-metric </ until-metric operator value> ;

The PRUNE statement controls pruning. It has three different syntaxes: one for C4.5-style pruning, one for no pruning, and one for pruning by using a specified metric.

By default, the procedure prunes by entropy. The following PRUNE statement is equivalent to omitting the PRUNE statement:

prune entropy;

The preceding statement is also equivalent to the following statement:

prune entropy / entropy >= 1.0;
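As a minimal sketch of where the statement sits in a procedure step, the following code writes the default pruning rule out explicitly. The data set work.train and all variable names are hypothetical placeholders. The TARGET and INPUT statements are assumed to follow the form used by the same release of the procedure, so check them against the syntax for your release.

```sas
/* Hypothetical data set and variable names. Only the PRUNE statement */
/* is taken from this section.                                        */
proc hpsplit data=work.train;
   target default_flag;
   input region product / level=nom;   /* nominal inputs (assumed form)  */
   input age income / level=int;       /* interval inputs (assumed form) */
   prune entropy / entropy >= 1.0;     /* same effect as omitting PRUNE  */
run;
```

Writing the default out in full can make it easier to see later which stopping rule a model was built with.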

You can specify the following pruning options:

C45 </ confidence>

requests C4.5-style pruning (Quinlan, 1993), which uses the upper error rate from the binomial distribution (Wilson, 1927; Blyth and Still, 1983; Agresti and Coull, 1998) at the specified confidence limit. The default confidence is 0.25.

NONE

turns off pruning.

by-metric </ until-metric operator value>

uses the specified by-metric to choose which node to prune back to a leaf. Optionally, you can specify an until-metric, operator, and value to control when pruning stops. If you do not specify these arguments, until-metric is set to the same metric as by-metric, operator is set to >=, and value is set to 1. You can specify any of the following values for by-metric:

ASE

chooses the node whose pruning back to a leaf produces the smallest change in the average square error.

ENTROPY

chooses the node whose pruning back to a leaf produces the smallest change in the entropy.

GINI

chooses the node whose pruning back to a leaf produces the smallest change in the Gini statistic.

MISC

chooses the node whose pruning back to a leaf produces the smallest change in the misclassification rate.

You can specify any of the following values for until-metric:

ASE

stops pruning when the per-leaf change in average square error (ASE) is operator value times the per-leaf change in the ASE of pruning the whole initial tree to a leaf.

ENTROPY

stops pruning when the per-leaf change in entropy is operator value times the per-leaf change in the entropy of pruning the whole initial tree to a leaf.

GINI

stops pruning when the per-leaf change in the Gini statistic is operator value times the per-leaf change in the Gini statistic of pruning the whole initial tree to a leaf.

MISC

stops pruning when the per-leaf change in misclassification rate is operator value times the per-leaf change in the misclassification rate of pruning the whole initial tree to a leaf.

N

stops pruning when the number of leaves is operator value.
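The two kinds of until-metric described above can be illustrated with a pair of alternative statements. This is a sketch that follows the grammar given in this section. The values 6 and 0.5 are arbitrary illustrations, not recommendations, and a procedure step would contain only one of these statements.

```sas
/* Alternative 1: prune by misclassification rate and stop when the */
/* number of leaves is less than or equal to 6.                     */
prune misc / n <= 6;

/* Alternative 2: prune by the Gini statistic and stop when the      */
/* per-leaf change in Gini is at least 0.5 times the per-leaf change */
/* of pruning the whole initial tree to a leaf (0.5 is illustrative).*/
prune gini / gini >= 0.5;
```

The N form fixes the size of the final tree directly, whereas the metric forms express the stopping point relative to the cost of collapsing the whole initial tree.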

You can specify any of the following values for operator:

<=

less than or equal to

LE

less than or equal to

>=

greater than or equal to

GE

greater than or equal to

<

less than

LT

less than

>

greater than

GT

greater than

=

equal to

EQ

equal to
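To tie the pieces together, the following sketch shows one statement for each of the three syntaxes, plus the symbolic and mnemonic spellings of the same operator. The statements are alternatives rather than a sequence, and the numeric values are illustrative assumptions. The C45 line assumes that the confidence is written directly after the slash, as the syntax line for that option suggests.

```sas
/* By-metric pruning: these two statements are equivalent, because */
/* <= and LE name the same operator.                               */
prune misc / n <= 10;
prune misc / n le 10;

/* C4.5-style pruning with a confidence of 0.1 instead of the */
/* default 0.25 (assumed placement of the value).             */
prune c45 / 0.1;

/* No pruning: the tree is left as grown. */
prune none;
```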