The HPBIN Procedure

Computing the Weight of Evidence and Information Value

PROC HPBIN can compute the weight of evidence and the information value.

Weight of evidence (WOE) is a measure of how much the evidence supports or undermines a hypothesis. WOE measures the relative risk of an attribute or binning level. The value depends on how the non-event and event values of the target variable are distributed within the attribute. An attribute’s WOE is defined as follows:

\[ WOE_{attribute} = \ln\left(\frac{p_{attribute}^{non-event}}{p_{attribute}^{event}}\right) = \ln\left(\frac{\frac{N_{non-event}^{attribute}}{N_{non-event}^{total}}}{\frac{N_{event}^{attribute}}{N_{event}^{total}}}\right) \]

The definitions of the quantities in the preceding formula are as follows:

  • $N_{non-event}^{attribute}$: the number of non-event records that exhibit the attribute

  • $N_{non-event}^{total}$: the total number of non-event records

  • $N_{event}^{attribute}$: the number of event records that exhibit the attribute

  • $N_{event}^{total}$: the total number of event records

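If an attribute contains no event records or no non-event records, the ratio in the preceding formula is zero or has a zero denominator, and its logarithm is undefined. To avoid an undefined WOE, an adjustment factor, x, is added to the attribute counts: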

\[ WOE_{attribute} = \ln\left(\frac{\frac{N_{non-event}^{attribute}+x}{N_{non-event}^{total}}}{\frac{N_{event}^{attribute}+x}{N_{event}^{total}}}\right) \]

You can use the WOEADJUST= option to specify a value in the range [0, 1] for x. By default, x is 0.5.
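The adjusted formula can be checked by hand with a few lines of Python. The following is only an illustrative sketch of the arithmetic, not the procedure's implementation; the function name `woe`, its parameter names, and the record counts are hypothetical.

```python
import math

def woe(n_nonevent_attr, n_event_attr, n_nonevent_total, n_event_total, x=0.5):
    """Adjusted weight of evidence for one attribute (bin).

    x plays the role of the adjustment factor; 0.5 mirrors the documented default.
    """
    nonevent_share = (n_nonevent_attr + x) / n_nonevent_total
    event_share = (n_event_attr + x) / n_event_total
    return math.log(nonevent_share / event_share)

# Hypothetical bin holding 40 of 400 non-events and 10 of 100 events:
# both shares are equal, so the unadjusted WOE is ln(1) = 0.
balanced = woe(40, 10, 400, 100, x=0.0)

# Hypothetical bin with 40 non-events and no events: with x = 0 the ratio
# would have a zero denominator, but the adjusted value is finite.
no_events = woe(40, 0, 400, 100)
```

With x = 0 and an event count of zero, the same call raises `ZeroDivisionError`, which is the undefined case that the adjustment factor is meant to avoid.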

The information value (IV) is a weighted sum of the WOE of the characteristic’s attributes. The weight of each attribute is the conditional probability of that attribute given a non-event minus the conditional probability of that attribute given an event. In the following formula for IV, m is the number of bins of a variable, and the sum runs over those bins:

\[ IV = \sum_{i=1}^{m} \left( \frac{N_{non-event}^{attribute_i}}{N_{non-event}^{total}} - \frac{N_{event}^{attribute_i}}{N_{event}^{total}} \right) \times WOE_{attribute_i} \]

An information value can be any real number. Generally speaking, the higher the information value, the more predictive a characteristic is likely to be.
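As a rough illustration of the IV sum, the following Python sketch computes the information value from per-bin counts. It is not the procedure's implementation: the function name `information_value`, the input layout, and the counts are hypothetical, and it assumes that the weights use the unadjusted counts while only the WOE term uses the adjustment factor x, as the formulas are written.

```python
import math

def information_value(bins, x=0.5):
    """bins: list of (non_event_count, event_count) pairs, one pair per bin."""
    n_nonevent_total = sum(n_nonevent for n_nonevent, _ in bins)
    n_event_total = sum(n_event for _, n_event in bins)
    iv = 0.0
    for n_nonevent, n_event in bins:
        # Adjusted WOE of this bin
        bin_woe = math.log(((n_nonevent + x) / n_nonevent_total)
                           / ((n_event + x) / n_event_total))
        # Weight: non-event share minus event share (unadjusted, by assumption)
        weight = n_nonevent / n_nonevent_total - n_event / n_event_total
        iv += weight * bin_woe
    return iv

# Hypothetical three-bin variable whose bins separate events from non-events
separating = information_value([(300, 10), (80, 30), (20, 60)])

# Hypothetical two-bin variable with identical shares in every bin
uninformative = information_value([(200, 50), (200, 50)], x=0.0)
```

In this sketch the variable whose bins separate events from non-events receives a larger information value than the variable whose bins all have the same mix, for which every weight and every WOE is zero.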