When the Rapid growth property is enabled, node splits are determined in part by information gain ratio instead of information gain. The information gain and information gain ratio calculations, along with their benefits and drawbacks, are explained in this section. In these explanations, an attribute is any specific measurement level of a classification variable or any bin of a measure variable.
The information gain method chooses a split based on which attribute provides the greatest information gain, measured in bits. Although this method provides good results, it favors splitting on variables that have a large number of attributes. The information gain ratio method incorporates the value of a split to determine what proportion of the information gain is actually valuable for that split. The split with the greatest information gain ratio is chosen.
The information gain calculation starts by determining the information of the training data. The information in a response value, r, is calculated by the following expression:

$-\dfrac{\mathrm{freq}(r, T)}{|T|}\,\log_{2}\!\left(\dfrac{\mathrm{freq}(r, T)}{|T|}\right)$

Here, T represents the training data, |T| is the number of observations, and freq(r, T) is the number of observations in T that take the response value r. To determine the expected information of the training data, sum this expression over every possible response value:

$\mathrm{Info}(T) = -\displaystyle\sum_{i=1}^{n} \dfrac{\mathrm{freq}(r_{i}, T)}{|T|}\,\log_{2}\!\left(\dfrac{\mathrm{freq}(r_{i}, T)}{|T|}\right)$

Here, n is the total number of possible response values. This value is also referred to as the entropy of the training data.
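The entropy calculation can be sketched in a few lines of Python. This is an illustration only, not the product's implementation; the `entropy` helper name is our own.

```python
from collections import Counter
from math import log2

def entropy(responses):
    """Expected information (entropy) of a list of response values, in bits."""
    total = len(responses)
    return -sum((count / total) * log2(count / total)
                for count in Counter(responses).values())

# A balanced two-class response carries exactly 1 bit of information.
print(entropy(["yes", "yes", "no", "no"]))  # 1.0
```

A perfectly pure node (all observations share one response value) has entropy 0, which is why splits that produce purer children reduce the expected information.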
Next, consider a split S on a variable X with m possible attributes. The expected information provided by that split is calculated by the following equation:

$\mathrm{Info}_{S}(T) = \displaystyle\sum_{j=1}^{m} \dfrac{|T_{j}|}{|T|}\,\mathrm{Info}(T_{j})$

In this equation, $T_{j}$ represents the observations that contain the jth attribute.
The information gain of split S is calculated by the following equation:

$\mathrm{Gain}(S) = \mathrm{Info}(T) - \mathrm{Info}_{S}(T)$
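Putting the two steps together, the information gain of a candidate split can be sketched as follows. The helper names (`entropy`, `info_gain`) are illustrative, and the data is a toy example, not taken from the product.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Expected information of a list of values, in bits."""
    total = len(values)
    return -sum((c / total) * log2(c / total)
                for c in Counter(values).values())

def info_gain(responses, attributes):
    """Information gain of splitting `responses` by the parallel `attributes` list."""
    total = len(responses)
    # Group response values by attribute value (the T_j subsets).
    groups = {}
    for attr, resp in zip(attributes, responses):
        groups.setdefault(attr, []).append(resp)
    # Expected information after the split: |T_j|/|T| weighted entropy of each subset.
    split_expected = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(responses) - split_expected

# The split perfectly separates the classes, so the gain equals the full entropy: 1 bit.
print(info_gain(["yes", "yes", "no", "no"], ["a", "a", "b", "b"]))  # 1.0
```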
Information gain ratio attempts to correct the information gain calculation by introducing a split information value, which measures how broadly the split divides the data. The split information is calculated by the following equation:

$\mathrm{SplitInfo}(S) = -\displaystyle\sum_{j=1}^{m} \dfrac{|T_{j}|}{|T|}\,\log_{2}\!\left(\dfrac{|T_{j}|}{|T|}\right)$
As its name suggests, the information gain ratio is the ratio of the information gain to the split information:

$\mathrm{GainRatio}(S) = \dfrac{\mathrm{Gain}(S)}{\mathrm{SplitInfo}(S)}$
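The full gain ratio calculation can be sketched in Python as below. Again, the helper names are our own, and the example data is hypothetical. Note that the split information is simply the entropy of the attribute distribution itself, which is why a single `entropy` helper covers both uses.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Expected information of a list of values, in bits."""
    total = len(values)
    return -sum((c / total) * log2(c / total)
                for c in Counter(values).values())

def gain_ratio(responses, attributes):
    """Information gain ratio: information gain divided by split information."""
    total = len(responses)
    groups = {}
    for attr, resp in zip(attributes, responses):
        groups.setdefault(attr, []).append(resp)
    gain = entropy(responses) - sum(len(g) / total * entropy(g)
                                    for g in groups.values())
    # Split information is the entropy of the attribute values themselves.
    split_information = entropy(attributes)
    return gain / split_information

# A four-way split that perfectly separates two classes: the gain is 1 bit,
# but the split information is 2 bits, so the ratio penalizes the many-valued split.
print(gain_ratio(["yes", "yes", "no", "no"], ["a", "b", "c", "d"]))  # 0.5
```

This example shows the correction in action: pure information gain would score the four-way split as highly as a clean two-way split (1 bit each), while the gain ratio halves the score of the four-way split because much of its "information" comes merely from having many attributes.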