The CRITERION statement specifies the criterion by which to grow the tree.
For nominal targets, you can set the criterion to one of the following:
-
CHAID
-
uses values of a chi-square test (decision tree) or an F test (regression tree) to merge similar levels of nominal inputs until the number of children in the proposed split reaches
the value of the MAXBRANCH= option. It then uses the p-values of the final split to determine the variable on which to split. For interval inputs, CHAID chooses the best single
split until the number of children in the proposed split reaches the value of the MAXBRANCH= option.
This criterion is available for both interval and nominal targets.
-
CHISQUARE
-
uses the p-values of a chi-square test to split each variable and then to determine the split.
-
ENTROPY
-
uses the gain in information (decrease in entropy) to split each variable and then to determine the split.
-
FASTCHAID
-
uses a Kolmogorov-Smirnov splitter to determine splits for each variable, following a recursive method similar to that of
Friedman (1977) (after ordering the levels of nominal variables by the level specified in the EVENT= option), and then uses the lowest of
each variable’s resulting p-values to determine the variable on which to split.
-
GINI
-
uses the decrease in Gini statistic to split each variable and then to determine the split.
-
IGR
-
uses the entropy metric to split each variable and then uses the information gain ratio to determine the split.
This criterion is available only for nominal targets.
The default criterion for nominal targets is ENTROPY.
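The ENTROPY, GINI, and IGR criteria described above can be illustrated with a small sketch. It uses the textbook definitions of entropy, Gini impurity, and information gain ratio; the procedure's exact internal formulas are not given here, so treat this as an approximation of the idea rather than the procedure's implementation. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(ch) / n * impurity(ch) for ch in children)
    return impurity(parent) - weighted

def gain_ratio(parent, children):
    """Information gain divided by the entropy of the branch sizes."""
    n = len(parent)
    split_info = -sum(
        len(ch) / n * math.log2(len(ch) / n) for ch in children if ch
    )
    gain = impurity_decrease(parent, children, entropy)
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: a binary target split into two branches.
parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

info_gain = impurity_decrease(parent, children, entropy)  # ENTROPY-style score
gini_drop = impurity_decrease(parent, children, gini)     # GINI-style score
ratio = gain_ratio(parent, children)                      # IGR-style score
```

In each case, a candidate split with a larger score is preferred. The gain ratio divides by the entropy of the branch sizes, which penalizes splits that create many small branches.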
For interval targets, you can set the criterion to one of the following:
-
FTEST
-
uses an F test to split each variable and then to determine the split.
-
VARIANCE
-
uses the change in target variance to split each variable and then to determine the split.
The default criterion for interval targets is VARIANCE.
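As a rough illustration of the two interval-target criteria, the following sketch computes the change in target variance and a one-way ANOVA F statistic for a candidate split. These are the standard textbook quantities and are assumed, not confirmed, to match what the procedure computes. The function names and data are hypothetical, and the p-value step of the F test is omitted.

```python
from statistics import fmean

def sum_sq(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def variance_reduction(parent, children):
    """Parent variance minus the size-weighted variance of the children."""
    within = sum(sum_sq(ch) for ch in children)
    return (sum_sq(parent) - within) / len(parent)

def f_statistic(parent, children):
    """One-way ANOVA F statistic that compares the child means."""
    n, k = len(parent), len(children)
    within = sum(sum_sq(ch) for ch in children)
    between = sum_sq(parent) - within
    if within == 0:
        return float("inf")  # children are perfectly homogeneous
    return (between / (k - 1)) / (within / (n - k))

# Hypothetical toy data: an interval target split into two branches.
parent = [1, 2, 3, 7, 8, 9]
children = [[1, 2, 3], [7, 8, 9]]

var_drop = variance_reduction(parent, children)
f_stat = f_statistic(parent, children)
```

A larger variance reduction or a larger F statistic both indicate a split that separates the target values more cleanly.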
You can also specify the following options after a slash (/):
-
LEVTHRESH1=number
-
specifies the maximum number of computations to perform in an exhaustive search for a nominal input. If an exhaustive search of the
nominal input variable being examined would require more computations than number, the splitter tries to fall back to the fast algorithm.
Otherwise, it falls back to a greedy algorithm. The LEVTHRESH1= option does not affect interval inputs.
By default, LEVTHRESH1=500,000.
-
LEVTHRESH2=number
-
specifies the maximum number of computations to perform in a greedy search for nominal input variables. If the input variable
that is being examined is an interval variable, the LEVTHRESH2= option instead specifies the maximum number of computations to
perform in an exhaustive search of all possible split points.
If the number of computations in either case is greater than number, the splitter uses a much faster greedy algorithm.
Although this option is similar to the LEVTHRESH1= option, for nominal variables it limits the fallback algorithm
rather than the exhaustive search, and the fallback algorithm has a very different computational complexity.
By default, LEVTHRESH2=1,000,000.
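The interaction of the two thresholds can be sketched as a simple decision rule. This is one reading of the descriptions above, not the procedure's actual logic. How the procedure counts "computations" is not stated here, so the cost arguments are hypothetical inputs, and the function names and return labels are illustrative only. The helper binary_partitions gives the standard count of two-way groupings of nominal levels, which shows why an exhaustive search becomes expensive quickly.

```python
def binary_partitions(levels):
    """Number of ways to split `levels` distinct nominal levels
    into two non-empty groups (a standard combinatorial count)."""
    return 2 ** (levels - 1) - 1

def choose_search(input_kind, exhaustive_cost, fallback_cost=0,
                  levthresh1=500_000, levthresh2=1_000_000):
    """Pick a search strategy from assumed computation counts.

    exhaustive_cost and fallback_cost are hypothetical estimates;
    the defaults mirror the documented LEVTHRESH1= and LEVTHRESH2= values.
    """
    if input_kind == "nominal":
        if exhaustive_cost <= levthresh1:
            return "exhaustive"
        if fallback_cost <= levthresh2:
            return "fast"      # fallback algorithm limited by LEVTHRESH2=
        return "greedy"        # much faster greedy algorithm
    # Interval inputs: LEVTHRESH1= does not apply.
    if exhaustive_cost <= levthresh2:
        return "exhaustive"
    return "greedy"

small = choose_search("nominal", binary_partitions(4))
large = choose_search("nominal", binary_partitions(21), fallback_cost=1_000)
dense = choose_search("interval", 2_000_000)
```

Under these assumptions, a nominal input with 4 levels has only 7 two-way groupings and is searched exhaustively, whereas one with 21 levels has more than a million and exceeds the default LEVTHRESH1= value.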