The HPSPLIT Procedure

GROW Statement

GROW criterion <(options)>;

The GROW statement specifies the criterion by which to grow the tree. For more information, see the section Splitting Criteria. For categorical responses, the available criteria are CHAID, CHISQUARE, ENTROPY, FASTCHAID, GINI, and IGR, and the default criterion is ENTROPY. For continuous responses, the available criteria are CHAID, FTEST, and RSS, and the default criterion is RSS.

For either categorical or continuous responses, you can specify the following criterion:

CHAID <(options)>

For categorical predictors, CHAID uses values of a chi-square statistic (in the case of a classification tree) or an F statistic (in the case of a regression tree) to merge similar levels until the number of children in the proposed split reaches the number that you specify in the MAXBRANCH= option. The p-values for the final split determine the variable on which to split.

For continuous predictors, CHAID repeatedly chooses the best single split until the number of children in the proposed split reaches the value that you specify in the MAXBRANCH= option.

You can specify the following options:

ALPHA=value

specifies the maximum p-value for a split to be considered.

By default, ALPHA=0.3.

BONFERRONI

requests a Bonferroni adjustment to the p-value for a variable after the split has been determined.

By default, no adjustment is made.
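The interaction between ALPHA= and BONFERRONI can be illustrated with a minimal Python sketch. It is not the HPSPLIT implementation: the function names and the multiplier `n_comparisons` are hypothetical, and the sketch assumes the common form of the Bonferroni adjustment, in which the p-value is multiplied by the number of comparisons and capped at 1.

```python
def bonferroni_adjust(p_value, n_comparisons):
    """Multiply a p-value by the number of comparisons, capped at 1.

    n_comparisons is a hypothetical multiplier; how the procedure
    counts comparisons is not stated in this section.
    """
    return min(1.0, p_value * n_comparisons)


def passes_alpha(p_value, alpha=0.3):
    """Return True when a candidate split's p-value does not exceed alpha."""
    return p_value <= alpha


raw_p = 0.04
adjusted_p = bonferroni_adjust(raw_p, n_comparisons=10)

# The raw p-value is below the default threshold; the adjusted one is not.
print(passes_alpha(raw_p), passes_alpha(adjusted_p))
```

Under these assumptions, a split that qualifies on its raw p-value can fail the ALPHA= threshold after adjustment.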

For categorical responses only, you can specify the following criteria:

CHISQUARE <(options)>

uses a chi-square statistic to split each variable and then uses the p-values that correspond to the resulting splits to determine the splitting variable.

You can specify the following options:

ALPHA=value

specifies the maximum p-value for a split to be considered.

By default, ALPHA=0.3.

BONFERRONI

requests a Bonferroni adjustment to the p-value for a variable after the split has been determined.

By default, no adjustment is made.
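As an illustration of the statistic behind the CHISQUARE criterion, the following Python sketch computes the Pearson chi-square statistic and its p-value for a binary split of a binary response (a 2x2 table, one degree of freedom). The function name and the example counts are hypothetical, and the sketch assumes that every row and column total is nonzero. It is not how the procedure itself searches for splits.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table.

    table[i][j] is the count of observations in child i with response level j.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = chi_square_2x2([[30, 10], [10, 30]])
print(stat, p_value)
```

A candidate split such as this one would be retained only if its p-value does not exceed the ALPHA= value.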

ENTROPY

uses the gain in information (decrease in entropy) to split each variable and then to determine the split.
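The gain in information for one candidate split can be sketched in Python as follows. The function names and example labels are hypothetical, and the sketch assumes base-2 logarithms; it only evaluates a given split and does not search for the best one.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

gain = information_gain(parent, [left, right])
print(round(gain, 4))
```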

FASTCHAID <(options)>

uses a Kolmogorov-Smirnov splitter to determine splits for each variable. The FastCHAID criterion follows a recursive method similar to that of Friedman (1977) after ordering the levels according to the response variable. The criterion then selects the split variable as the variable that has the smallest p-value.

You can specify the following options:

ALPHA=value

specifies the maximum p-value for a split to be considered.

By default, ALPHA=0.3.

BONFERRONI

requests a Bonferroni adjustment to the p-value for a variable after the split has been determined.

By default, no adjustment is made.

MINDIST=number

specifies the minimum Kolmogorov-Smirnov distance for a candidate split.

By default, MINDIST=0.01.
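The Kolmogorov-Smirnov distance that MINDIST= refers to can be illustrated for a binary response whose predictor levels are already ordered. The following Python sketch is an assumption-laden simplification, not the FastCHAID algorithm: it evaluates a single binary cut (no recursion), and the function name and counts are hypothetical.

```python
def ks_best_cut(counts):
    """Find the cut point with the largest Kolmogorov-Smirnov distance.

    counts is a list of (n_class0, n_class1) pairs, one per ordered level.
    Returns (index of the last level in the left child, distance).
    """
    total0 = sum(c0 for c0, _ in counts)
    total1 = sum(c1 for _, c1 in counts)

    cum0 = cum1 = 0
    best_cut, best_dist = None, 0.0
    # The last level is excluded so that the right child is never empty.
    for i, (c0, c1) in enumerate(counts[:-1]):
        cum0 += c0
        cum1 += c1
        # Distance between the two classes' cumulative distributions.
        dist = abs(cum0 / total0 - cum1 / total1)
        if dist > best_dist:
            best_cut, best_dist = i, dist
    return best_cut, best_dist


cut, dist = ks_best_cut([(8, 1), (6, 2), (3, 7), (3, 10)])
print(cut, dist)
```

In this sketch, a cut whose distance falls below the MINDIST= value would not be considered a candidate.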

GINI

uses the decrease in the Gini index to split each variable and then to determine the split.
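The decrease in the Gini index for one candidate split can be sketched in Python as follows. The function names and example labels are hypothetical, and the sketch evaluates a given split rather than searching for the best one.

```python
from collections import Counter


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, children):
    """Parent Gini index minus the size-weighted Gini index of the children."""
    n = len(parent)
    return gini(parent) - sum(len(child) / n * gini(child) for child in children)


parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

decrease = gini_decrease(parent, [left, right])
print(round(decrease, 4))
```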

IGR

uses the entropy metric to split each variable and then uses the information gain ratio to determine the split.
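The information gain ratio is commonly defined as the information gain divided by the entropy of the child sizes (the split information), which penalizes splits that have many small branches. The following Python sketch assumes that definition and base-2 logarithms; the function names and example labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(parent, children):
    """Information gain divided by the entropy of the child sizes."""
    n = len(parent)
    gain = entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)
    # Split information: entropy of the distribution of observations over children.
    split_info = -sum(len(ch) / n * math.log2(len(ch) / n) for ch in children)
    return gain / split_info


parent = ["yes"] * 6 + ["no"] * 6
children = [["yes"] * 3, ["no"] * 3, ["yes"] * 3 + ["no"] * 3]

ratio = gain_ratio(parent, children)
print(round(ratio, 4))
```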

The default criterion for categorical responses is ENTROPY.

For continuous responses only, you can specify the following criteria:

FTEST <(options)>

uses an F statistic to split each variable and then uses the resulting p-value to determine the split variable.

You can specify the following options:

ALPHA=value

specifies the maximum p-value for a split to be considered.

By default, ALPHA=0.3.

BONFERRONI

requests a Bonferroni adjustment to the p-value for a variable after the split has been determined.

By default, no adjustment is made.
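The F statistic for a candidate split of a continuous response can be sketched as a one-way analysis of variance over the child nodes. The following Python sketch is illustrative only: the function name and data are hypothetical, and it stops at the statistic, because converting it to a p-value requires the F distribution, which the Python standard library does not provide.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic for response values grouped by child node."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-child and within-child sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


f_value = f_statistic([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]])
print(f_value)
```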

RSS | VARIANCE

uses the change in response variance to split each variable and then to determine the split.
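The change that this criterion measures can be sketched in Python as the reduction in the residual sum of squares when a node is split. The function names and data are hypothetical, and the sketch evaluates one given split rather than searching for the best one.

```python
def rss(values):
    """Residual sum of squares of the values about their mean."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)


def rss_reduction(parent, children):
    """Parent RSS minus the total RSS of the children."""
    return rss(parent) - sum(rss(child) for child in children)


parent = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
reduction = rss_reduction(parent, [[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]])
print(reduction)
```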

The default criterion for continuous responses is RSS.