The HPSPLIT Procedure

CRITERION Statement

CRITERION criterion </ options> ;

The CRITERION statement specifies the criterion by which to grow the tree.

You can set the criterion to one of the following:

ENTROPY

uses the gain in information (the decrease in entropy) to find the best split for each variable and then to choose among those splits.

This is the default criterion.
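
As an illustration of the quantity that the ENTROPY criterion describes, the following Python sketch computes the decrease in entropy for two candidate splits of the same node. It is a generic calculation, not the procedure's implementation; the function names and the sample labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with 5 events and 5 nonevents.
parent = ["yes"] * 5 + ["no"] * 5

# Two candidate binary splits of that node (made-up data).
split_a = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
split_b = [["yes"] * 3 + ["no"] * 2, ["yes"] * 2 + ["no"] * 3]

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)

# The split with the larger gain separates the classes better.
print(f"gain_a={gain_a:.4f} gain_b={gain_b:.4f}")
```

In this sketch, split_a produces purer child nodes than split_b, so it yields the larger gain and would be preferred.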

FASTCHAID

uses a Kolmogorov-Smirnov splitter to determine splits for each variable, following a recursive method similar to that of Friedman (1977). Before the splits are determined, the levels of nominal variables are ordered by the level specified in the EVENT= option. The criterion then uses the lowest of each variable’s resulting p-values to determine the variable on which to split.

Note: The FASTCHAID criterion is experimental in this release.
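
To make the idea of a Kolmogorov-Smirnov split concrete, the following Python sketch finds the cut point of one ordered input that maximizes the distance between the empirical distribution functions of the event and nonevent classes. It is a textbook version of the technique under the assumption of a binary target. The ordering of nominal levels, the recursion, and the p-value calculation that the criterion describes are not shown, and the function name and data are hypothetical.

```python
def ks_split(values, is_event):
    """Return (threshold, distance) that maximizes |F_event(t) - F_nonevent(t)|.

    values   -- numeric input values
    is_event -- 1 for the event class, 0 for the nonevent class
    The candidate split sends value <= threshold to one branch.
    """
    n_event = sum(is_event)
    n_non = len(is_event) - n_event
    if n_event == 0 or n_non == 0:
        raise ValueError("both classes must be present")

    pairs = sorted(zip(values, is_event))
    best_t, best_d = None, -1.0
    seen_event = seen_non = 0
    for i, (v, e) in enumerate(pairs):
        if e:
            seen_event += 1
        else:
            seen_non += 1
        # Evaluate a cut only after the last observation that has this value.
        if i + 1 < len(pairs) and pairs[i + 1][0] == v:
            continue
        d = abs(seen_event / n_event - seen_non / n_non)
        if d > best_d:
            best_t, best_d = v, d
    return best_t, best_d

# Made-up data: events are concentrated at larger values.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, distance = ks_split(x, y)
print(threshold, distance)
```

A larger distance indicates a cut point that separates the two classes more sharply.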

GINI

uses the decrease in the Gini statistic to find the best split for each variable and then to choose among those splits.
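
The decrease in the Gini statistic can be illustrated in the same way. The following Python sketch is a generic calculation of Gini impurity and its size-weighted decrease for one candidate split. It is not the procedure's implementation, and the function names and sample labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(parent, children):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * gini(child) for child in children)
    return gini(parent) - weighted

# Hypothetical node with 5 events and 5 nonevents, and one candidate split.
parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

decrease = gini_decrease(parent, children)
print(f"decrease={decrease:.2f}")
```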

You can also specify the following options:

LEVTHRESH1=number

specifies the maximum number of computations to perform for an exhaustive search for a nominal input. When an exhaustive search for a nominal input variable would exceed this number, the splitter tries to fall back to the fast algorithm. Otherwise, it falls back to a greedy algorithm. The LEVTHRESH1= option does not affect interval inputs.

The default is LEVTHRESH1=500,000.
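
The following Python sketch suggests why an exhaustive search over a nominal input needs such a limit. It counts the ways to divide k levels into two nonempty groups, which is 2^(k-1) - 1, and finds the smallest k whose count exceeds a limit. It rests on two assumptions that this section does not confirm: that splits are binary, and that each candidate split counts as one computation. The function names and the constant are hypothetical.

```python
def exhaustive_binary_splits(num_levels):
    """Number of ways to divide num_levels nominal levels into two nonempty groups."""
    if num_levels < 2:
        return 0
    return 2 ** (num_levels - 1) - 1

def smallest_level_count_over(limit):
    """Smallest number of levels whose exhaustive split count exceeds limit."""
    k = 2
    while exhaustive_binary_splits(k) <= limit:
        k += 1
    return k

# Assumption: use the documented default of LEVTHRESH1= as the limit, and
# treat one candidate split as one computation.
ASSUMED_LIMIT = 500_000

for k in (5, 10, 19, 20):
    print(k, exhaustive_binary_splits(k))
print("first level count over the limit:", smallest_level_count_over(ASSUMED_LIMIT))
```

Because the count roughly doubles with each additional level, a nominal input with only a few dozen levels is already far beyond any practical exhaustive search.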

LEVTHRESH2=number

specifies the maximum number of computations to perform in a greedy search for nominal input variables. If the input variable that is being examined is an interval variable, the LEVTHRESH2= option specifies the maximum number of computations to perform for an exhaustive search of all possible split points.

If the number of computations in either case is greater than number, the splitter uses a much faster greedy algorithm.

Although this option is similar to the LEVTHRESH1= option, for nominal variables it limits the computations of the fallback algorithm for finding the best splits, a calculation that has a very different computational complexity.

The default is LEVTHRESH2=1,000,000.