When you specify entropy or the Gini statistic as the splitting criterion, the value of a split is judged by the decrease in the specified criterion, called the gain. Thus, the criterion is computed for the original leaf and for the leaves that result from the split. The per-variable split and then the variable on which to split are chosen based on the gain.
When you specify FastCHAID as the splitting criterion, splitting is based on the Kolmogorov-Smirnov distance of the variables.
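The entropy is related to the amount of information that a split contains. The entropy of a single leaf $\lambda$ is given by the equation
$$\mathrm{Entropy}_\lambda = -\sum_{t} \frac{\tau_t^\lambda}{N_\lambda} \log_2\left(\frac{\tau_t^\lambda}{N_\lambda}\right)$$
where $\tau_t^\lambda$ is the number of observations with the target level $t$ on leaf $\lambda$ and $N_\lambda$ is the number of observations on the leaf (Hastie, Tibshirani, and Friedman, 2001; Quinlan, 1993).
When a leaf is split, the total entropy is then
$$\mathrm{Entropy} = -\sum_{\lambda} \frac{N_\lambda}{N_0} \sum_{t} \frac{\tau_t^\lambda}{N_\lambda} \log_2\left(\frac{\tau_t^\lambda}{N_\lambda}\right)$$
where the outer sum runs over the leaves produced by the split and $N_0$ is the number of observations on the original unsplit leaf.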
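As a rough illustration of the entropy criterion, the following Python sketch computes the entropy of a leaf, the weighted entropy after a split, and the resulting gain. The function names and the example counts are hypothetical, and the base-2 logarithm is an assumption; this is not the procedure's own implementation.

```python
import math

def leaf_entropy(counts):
    """Entropy (base 2, assumed) of one leaf from per-target-level counts."""
    n = sum(counts)
    # Levels with no observations are skipped (0 * log 0 is treated as 0).
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def split_entropy(leaves):
    """Entropy after a split: each new leaf is weighted by its share of observations."""
    n0 = sum(sum(counts) for counts in leaves)  # observations on the unsplit leaf
    return sum(
        (sum(counts) / n0) * leaf_entropy(counts)
        for counts in leaves
        if sum(counts) > 0  # ignore empty leaves
    )

# Hypothetical example: a leaf with 8 observations in each of two target levels,
# split into two leaves with counts [7, 1] and [1, 7].
parent = [8, 8]
children = [[7, 1], [1, 7]]
gain = leaf_entropy(parent) - split_entropy(children)
print(f"entropy gain: {gain:.4f}")
```

A candidate split with a larger gain separates the target levels more cleanly; a split that leaves the class proportions unchanged has a gain of zero.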
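Split Gini is similar to split entropy. First, the per-leaf Gini statistic or index is given by Hastie, Tibshirani, and Friedman (2001) as
$$\mathrm{Gini}_\lambda = 1 - \sum_{t} \left(\frac{\tau_t^\lambda}{N_\lambda}\right)^2$$
where, as for entropy, $\tau_t^\lambda$ counts the observations with target level $t$ on leaf $\lambda$ and $N_\lambda$ counts all observations on the leaf.
When split, the Gini statistic is then
$$\mathrm{Gini} = \sum_{\lambda} \frac{N_\lambda}{N_0} \left(1 - \sum_{t} \left(\frac{\tau_t^\lambda}{N_\lambda}\right)^2\right)$$
where $N_0$ is the number of observations on the original unsplit leaf.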
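The Gini criterion can be sketched the same way. The Python example below is a minimal illustration with hypothetical function names and made-up counts, not the procedure's own code; it computes the per-leaf Gini index and the observation-weighted Gini after a split.

```python
def leaf_gini(counts):
    """Gini index of one leaf from per-target-level observation counts."""
    n = sum(counts)
    if n == 0:
        return 0.0  # assumption: an empty leaf contributes no impurity
    return 1.0 - sum((c / n) ** 2 for c in counts)

def split_gini(leaves):
    """Gini after a split: each new leaf is weighted by its share of observations."""
    n0 = sum(sum(counts) for counts in leaves)  # observations on the unsplit leaf
    return sum((sum(counts) / n0) * leaf_gini(counts) for counts in leaves)

# Hypothetical example: the same parent and child leaves as a two-level target
# with 16 observations split into counts [7, 1] and [1, 7].
parent = [8, 8]
children = [[7, 1], [1, 7]]
gini_gain = leaf_gini(parent) - split_gini(children)
print(f"Gini gain: {gini_gain:.5f}")
```

As with entropy, the decrease from the parent's Gini index to the split Gini is the quantity that is compared across candidate splits.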
The Kolmogorov-Smirnov (K-S) distance is the maximum distance between the cumulative distribution functions (CDFs) of two or more target levels (Friedman, 1977; Rokach and Maimon, 2008; Utgoff and Clouse, 1996). To create a meaningful CDF for nominal inputs, nominal target levels are ordered first by the level that is specified in the EVENT= option in the PROC HPSPLIT statement (if specified) and then by the other levels in internal order.
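After the CDFs have been created, the maximum K-S distance is given by
$$\mathrm{MAXKS} = \max_{i,j,k} \left| \mathrm{CDF}_i^{\tau_j} - \mathrm{CDF}_i^{\tau_k} \right|$$
where $i$ is an interval variable bin or an explanatory variable level, $\tau_j$ is the $j$th target level, $\tau_k$ is the $k$th target level, and $\mathrm{CDF}_i^{\tau}$ is the value at $i$ of the CDF for target level $\tau$.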
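To make the maximum K-S distance concrete, here is a small Python sketch that builds an empirical CDF for each target level from per-bin counts and then takes the largest absolute difference over all bins and all pairs of target levels. The function names, the dictionary layout, and the example counts are assumptions made for illustration, and returning the bin index where the maximum occurs is likewise an illustrative choice rather than documented behavior.

```python
from itertools import accumulate, combinations

def cdf(counts):
    """Empirical CDF over ordered bins (or ordered levels) for one target level."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]

def max_ks(counts_by_level):
    """Largest |CDF_j - CDF_k| over all bins and all pairs of target levels.

    counts_by_level maps each target level to its per-bin observation counts;
    every list must use the same bin order. Returns (distance, bin index).
    """
    cdfs = [cdf(counts) for counts in counts_by_level.values()]
    best_distance, best_bin = 0.0, None
    for cdf_j, cdf_k in combinations(cdfs, 2):
        for i, (a, b) in enumerate(zip(cdf_j, cdf_k)):
            distance = abs(a - b)
            if distance > best_distance:
                best_distance, best_bin = distance, i
    return best_distance, best_bin

# Hypothetical counts for a binary target across four ordered bins.
counts = {"event": [1, 2, 3, 4], "nonevent": [4, 3, 2, 1]}
distance, bin_index = max_ks(counts)
print(f"max K-S distance {distance:.2f} at bin {bin_index}")
```

In this example the two CDFs are furthest apart at the second bin (index 1), which would make the boundary after that bin the natural candidate for a single split.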
At each step of determining each variable’s split, the maximum K-S distance is computed, resulting in a single split. The splitting continues recursively until the value specified in the MAXBRANCH= option has been reached.
After each variable’s split has been determined, the variable that has the lowest p-value is chosen as the variable on which to split. Because this operation is similar to another established tree algorithm (Kass, 1980; Soman, Diwakar, and Ajay, 2010), this overall criterion is called “FastCHAID.”