CRITERION Statement :: SAS/STAT(R) 13.1 User's Guide: High-Performance Procedures


CRITERION Statement

CRITERION criterion </ options> ;

The CRITERION statement specifies the criterion that the HPSPLIT procedure uses to grow the tree.

For nominal targets, you can set the criterion to one of the following:

CHISQUARE: uses the p-values of a chi-square test to split each variable and then to determine the split.
ENTROPY: uses the gain in information (decrease in entropy) to split each variable and then to determine the split.
FASTCHAID: uses a Kolmogorov-Smirnov splitter to determine splits for each variable, following a recursive method similar to that of Friedman (1977). Before the splits are determined, the levels of nominal variables are ordered by the level specified in the EVENT= option. The lowest of each variable’s resulting p-values is then used to determine the variable on which to split.
GINI: uses the decrease in Gini statistic to split each variable and then to determine the split.

The default criterion for nominal targets is ENTROPY.
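
The following Python sketch illustrates the quantities behind the ENTROPY and GINI criteria for one candidate split. It is not the HPSPLIT implementation: the function names, the sample labels, and the use of size-weighted child impurities are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(impurity, parent, children):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical node with 4 events and 4 non-events, and one candidate split.
left = ["yes", "yes", "yes", "no"]
right = ["yes", "no", "no", "no"]
parent = left + right

entropy_gain = impurity_decrease(entropy, parent, [left, right])
gini_decrease = impurity_decrease(gini, parent, [left, right])
print(f"entropy gain:  {entropy_gain:.4f}")
print(f"Gini decrease: {gini_decrease:.4f}")
```

Under either criterion, a larger decrease indicates a better candidate split; the two criteria can rank candidate splits differently.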

For interval targets, you can set the criterion to one of the following:

FTEST: uses an F test to split each variable and then to determine the split.
VARIANCE: uses the change in target variance to split each variable and then to determine the split.

The default criterion for interval targets is VARIANCE.
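
As a rough illustration of the two interval-target criteria, the following Python sketch computes the change in target variance and a one-way ANOVA F statistic for one candidate split. The function names and sample values are hypothetical, and the use of population variance is an assumption; the source does not state which variance estimator the procedure uses. The sketch stops at the F statistic and does not convert it to a p-value.

```python
def variance(values):
    """Population variance (mean squared deviation) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def variance_reduction(parent, children):
    """Parent variance minus the size-weighted variance of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * variance(child) for child in children)
    return variance(parent) - weighted


def f_statistic(children):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for child in children for v in child]
    n, k = len(all_values), len(children)
    grand_mean = sum(all_values) / n
    ss_between = sum(
        len(child) * (sum(child) / len(child) - grand_mean) ** 2
        for child in children
    )
    ss_within = sum(len(child) * variance(child) for child in children)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical target values on each side of one candidate split.
left = [1.0, 2.0, 3.0]
right = [7.0, 8.0, 9.0]
parent = left + right

reduction = variance_reduction(parent, [left, right])
f_value = f_statistic([left, right])
print(f"variance reduction: {reduction:.4f}")
print(f"F statistic:        {f_value:.4f}")
```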

You can also specify the following options after a slash (/):

LEVTHRESH1=number

specifies the maximum number of computations to perform in an exhaustive search for a nominal input. When an exhaustive search would exceed this limit, the splitter falls back to a cheaper algorithm: if the input variable being examined is a nominal variable, the splitter tries to fall back to the fast algorithm; otherwise, it falls back to a greedy algorithm. The LEVTHRESH1= option does not affect interval inputs.

By default, LEVTHRESH1=500,000.
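
To see why an exhaustive search over a nominal input needs a limit, the following Python sketch counts candidate splits as the number of levels grows. It rests on two assumptions that the source does not confirm: that splits are binary, and that each candidate partition counts as one computation. The function names and the constant `LEVTHRESH1_DEFAULT` are hypothetical; only the value 500,000 comes from the default stated above.

```python
# Default limit quoted in this section; the constant name is hypothetical.
LEVTHRESH1_DEFAULT = 500_000


def exhaustive_binary_splits(num_levels):
    """Number of ways to divide num_levels levels into two nonempty groups."""
    return 2 ** (num_levels - 1) - 1


def exceeds_threshold(num_levels, threshold=LEVTHRESH1_DEFAULT):
    """True if counting every binary partition would pass the threshold.

    Assumption: one candidate partition equals one computation. The actual
    way the procedure counts computations is not described in the source.
    """
    return exhaustive_binary_splits(num_levels) > threshold


for levels in (5, 10, 19, 20, 30):
    count = exhaustive_binary_splits(levels)
    print(f"{levels:>2} levels: {count:>11,} candidate splits, "
          f"exceeds limit: {exceeds_threshold(levels)}")
```

Under these assumptions the count doubles with each added level, so a nominal input with a few dozen levels is far beyond any fixed limit, which is the situation in which a fallback algorithm becomes necessary.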

LEVTHRESH2=number

specifies the maximum number of computations to perform in a greedy search for nominal input variables. If the input variable that is being examined is an interval variable, the LEVTHRESH2= option specifies the number of computations to perform for an exhaustive search of all possible split points.

If the number of computations in either case is greater than number, the splitter uses a much faster greedy algorithm.

Although this option is similar to the LEVTHRESH1= option, it limits the computations of the fallback algorithm that finds the best splits of a nominal variable, a calculation with very different computational complexity.

By default, LEVTHRESH2=1,000,000.
