Sample 41683: Gains and Lift plots for binary-response models

Contents: Purpose / History / Requirements / Usage / Details / Limitations / Missing Values
PURPOSE:
The GainLift macro produces a Gains plot and cumulative and noncumulative plots of the following statistics for a binary-response model such as a logistic or probit model:
  • Lift
  • Percent Captured
  • Percent Response

The plots can be displayed together in a single panel or presented individually.

HISTORY:
The version of the GainLift macro that you are using is displayed in the SAS® log when you specify version (or any string) as the first argument. For example:
    %GainLift(version, <other macro options>)

The GainLift macro always attempts to check for a later version of itself. If it cannot do so (for example, because no internet connection is available), the macro issues the following message:

   GainLift: Unable to check for newer version

The computations performed by the macro are not affected by the appearance of this message.

Version 1.1
  • Corrected gain formula to use absolute value.
  • Removed cumulative gain.
  • Added best possible values for each statistic.
  • Added default table of statistics.
  • Added GRAPHOPTS= option that subsumes previous GRAPH, GRID= and PANEL= options and adds capability for displaying baseline and best possible lines.
  • Added TABLEOPTS= option that controls whether the statistics table is presented and the presence of baseline and best possible columns.
  • Replaced ONEPLOT= option with PLOTS= option to specify any subset of the plots for display.
  • Now displays notes in the log for time used and output data set creation.
Version 1.0
  • Initial version
REQUIREMENTS:
Base SAS®
USAGE:
Follow the instructions in the Downloads tab of this sample to save the GainLift macro definition. In your SAS program or in the SAS editor window, specify the following statement to define the GainLift macro and make it available for use. Replace the text within quotes with the location of the GainLift macro definition file on your system.

%inc "<location of your file containing the GainLift macro>";

Following this statement, you can call the GainLift macro using the following syntax. Macro arguments can be listed in any order.

%GainLift(<list of macro arguments separated by commas>)

See the Results tab for an example.

The following macro arguments are required:

response=variable-name
Specifies the name of the variable containing the observed response values. This is typically the variable specified to the left of the equal sign in the MODEL statement of procedures such as LOGISTIC, GENMOD, and PROBIT.
event=value | "value" | 'value'
Specifies the value of the response variable which represents the event level that was modeled. If the RESPONSE= variable is a character variable, value must be enclosed in quotes.
p=variable-name
Specifies the name of the variable containing the predicted event probabilities produced by the model. This variable is typically produced by the P= option in the OUTPUT statement of procedures such as LOGISTIC, GENMOD, and PROBIT.
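
As a minimal sketch of a call that supplies the three required arguments, assume a hypothetical data set MYDATA with a numeric binary response Y (1 = event) and predictors X1 and X2. None of these names come from the sample itself. The OUTPUT statement saves the predicted event probabilities in a variable named PHAT, which is then passed to the macro:

```sas
/* Assumed names: MYDATA, Y, X1, X2, PREDS, and PHAT are hypothetical */
proc logistic data=mydata;
   model y(event="1") = x1 x2;
   output out=preds p=phat;   /* P= saves the predicted event probabilities */
run;

/* Y is numeric here, so the EVENT= value is not quoted.
   DATA= is named explicitly so the call does not depend on
   PREDS being the most recently created data set. */
%GainLift(data=preds, response=y, p=phat, event=1)
```

If Y were a character variable, the event level would be quoted, for example event="Yes".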
 

The following macro arguments are optional:

data=SAS-data-set
Specifies the name of the data set containing observed response values and predicted probabilities from the fitted model. This data set is typically produced by the OUT= option in the OUTPUT statement of procedures such as LOGISTIC, GENMOD, and PROBIT. If not specified, the data set last created is used.
tableopts=values
Specify any of the following, separated by spaces. The default is TABLEOPTS=TABLE NOBASE NOBEST.
  • TABLE produces a table showing statistics for each decile or demi-decile group.
  • NOTABLE omits the table of statistics.
  • BASE shows the baseline case (an uninformative model).
  • NOBASE omits the baseline case.
  • BEST shows the best possible case.
  • NOBEST omits the best possible case.
graphopts=values
Specify any of the following, separated by spaces. The default is GRAPHOPTS=LINE PANEL GRID BEST BASE.
  • NOGRAPH suppresses all plots.
  • LINE produces line plots.
  • BAR produces bar charts.
  • PANEL plots all statistics in a panel.
  • NOPANEL plots all statistics in separate plots.
  • GRID adds grid lines in plots.
  • NOGRID omits grid lines.
  • BASE shows the baseline case (an uninformative model).
  • NOBASE omits the baseline case.
  • BEST shows the best possible case.
  • NOBEST omits the best possible case.
groups=20 | 10
Specifies the number of groups produced from the predicted event probabilities. By default, 20 groups (demi-deciles) are produced. GROUPS=10 produces deciles.
plots=CGAIN | GAIN | CLIFT | LIFT | CCAPT | PCAPT | CRESP | PRESP
Requests individual, full-size plots rather than a panel of all seven plots. Specify one or more of the following, separated by spaces: CGAIN (cumulative gains), GAIN (gains), CLIFT (cumulative lift), LIFT (lift), CCAPT (cumulative percent captured), PCAPT (percent captured), CRESP (cumulative percent response), or PRESP (percent response).
xaxis=SelectedPCT | Percentile | GroupNum
The default XAXIS=SelectedPct displays the selected percentile values of the predicted probability groups on the horizontal axis (labeled Depth). The group with the highest predicted probability has the lowest depth, and the group with depth 100 has the lowest predicted probability. The opposite is true when XAXIS=Percentile, since the group with the lowest predicted probability has the lowest percentile. Group numbers are displayed when XAXIS=GroupNum. Group 1 has the highest predicted probability.
out=SAS-data-set
Names the data set created by the macro which contains the statistics needed to generate the plots. If not specified, the data set is named _GainLift.
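
The optional arguments can be combined in a single call. The following sketch assumes a hypothetical data set PREDS that contains a numeric response Y (1 = event) and predicted probabilities PHAT; these names and the output data set name GLSTATS are illustrative only. It requests deciles, two individual plots with baseline and best possible lines, and a statistics table with baseline and best possible columns:

```sas
/* Assumed names: PREDS, Y, PHAT, and GLSTATS are hypothetical.
   GROUPS=10 requests deciles, PLOTS= requests individual plots,
   and XAXIS=GroupNum labels the axis so that group 1 holds the
   highest predicted probabilities. */
%GainLift(data=preds, response=y, p=phat, event=1,
          groups=10,
          plots=CLIFT CCAPT,
          graphopts=line grid base best,
          tableopts=table base best,
          xaxis=GroupNum,
          out=glstats)

/* Inspect the statistics that were used to draw the plots */
proc print data=glstats;
run;
```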
DETAILS:
The plots produced by the GainLift macro are commonly used to assess the predictive ability of a binary-response model, such as a logistic or probit model, or of a decision tree. Gains and lift plots are popular in the direct marketing and credit scoring fields. Another plot, the ROC plot, also summarizes predictive ability. The ROC plot can be produced directly by the LOGISTIC procedure.

The gains plot shows the percent difference between the overall proportion of events and the observed proportion of events in all groups up to the current group. The lift plot gives, for each group, the ratio of the proportion of observations in the group that are events to the overall proportion of events. The cumulative lift plot gives the event rate up to the current group divided by the overall event rate. The percent captured plot gives, for each group, the percentage of total events in the group. The cumulative percent captured plot gives the percentage of total events up to the current group. The percent response plot gives the percentage of observations in the group that are events. The cumulative percent response plot gives the percentage of observations up to the current group that are events.

The statistics are computed as follows:

   E  = total number of events
   N  = number of observations
   G  = number of groups (10 for deciles or 20 for demi-deciles)
   P  = overall proportion of observations that are events (P = E/N)
   ei = number of events in group i, i=1,2,...,G
   bi = best event count achievable in group i, i=1,2,...,G
      = E - Σj bj if E - Σj bj ≤ ni, = ni otherwise, where j=1,2,...,i-1
   ni = number of observations in group i
   pi = proportion of observations in group i that are events (pi = ei/ni)

   In the cumulative statistics, Σj sums over groups j=1,2,...,i.

   %Response = 100*pi
   Cumulative %Response = 100*(Σj ej)/(Σj nj)
   %Captured = 100*ei/E
   Cumulative %Captured = 100*(Σj ej)/E
   Lift = pi/P
   Cumulative Lift = ((Σj ej)/(Σj nj))/P
   Gain = 100*abs(Cumulative Lift - 1)
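
The formulas above can be traced by hand on a small table of group counts. The following DATA step is a sketch under stated assumptions, not the macro's own code: the data set GRPCOUNTS, its ten decile counts, and all variable names are made up for illustration, with group 1 holding the highest predicted probabilities.

```sas
/* Hypothetical decile counts: n_i observations and e_i events per group */
data grpcounts;
   input grp n_i e_i;
   datalines;
1 100 80
2 100 60
3 100 40
4 100 25
5 100 15
6 100 10
7 100 8
8 100 6
9 100 4
10 100 2
;

data stats;
   /* First pass over the groups: accumulate the totals E and N */
   if _n_ = 1 then do until (done);
      set grpcounts end=done;
      E + e_i;
      N + n_i;
   end;
   /* Second pass: one output row per group */
   set grpcounts;
   P = E / N;
   cum_e + e_i;                           /* events up to this group       */
   cum_n + n_i;                           /* observations up to this group */
   pct_resp     = 100 * e_i / n_i;        /* %Response                     */
   cum_pct_resp = 100 * cum_e / cum_n;    /* Cumulative %Response          */
   pct_capt     = 100 * e_i / E;          /* %Captured                     */
   cum_pct_capt = 100 * cum_e / E;        /* Cumulative %Captured          */
   lift         = (e_i / n_i) / P;        /* Lift                          */
   cum_lift     = (cum_e / cum_n) / P;    /* Cumulative Lift               */
   gain         = 100 * abs(cum_lift - 1);
   drop cum_e cum_n;
run;

proc print data=stats noobs;
run;
```

With these made-up counts, E = 250, N = 1000, and P = 0.25, so group 1 has %Response = 80, %Captured = 32, Lift = 3.2, and Gain = 220.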

The best possible statistic values for a model are computed as above using the best event count achievable, bi, rather than ei. This puts all observed events in the top decile or demi-decile groups, filling the top groups as needed to absorb all observed events.
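
The way the top groups absorb the observed events can be sketched with a running total. This example is illustrative only: the data set GRPCOUNTS, its counts, and the variable names are assumptions, and the step is not taken from the macro. Each group receives the events that have not yet been placed, capped at the group size.

```sas
/* Hypothetical decile counts, group 1 = highest predicted probabilities */
data grpcounts;
   input grp n_i e_i;
   datalines;
1 100 80
2 100 60
3 100 40
4 100 25
5 100 15
6 100 10
7 100 8
8 100 6
9 100 4
10 100 2
;

data best;
   /* First pass: total number of observed events, E */
   if _n_ = 1 then do until (done);
      set grpcounts end=done;
      E + e_i;
   end;
   /* Second pass: fill the groups from the top */
   set grpcounts;
   b_i = min(n_i, E - placed);            /* unplaced events, capped at n_i */
   placed + b_i;                          /* events placed so far           */
   best_pct_resp     = 100 * b_i / n_i;   /* best possible %Response        */
   best_cum_pct_capt = 100 * placed / E;  /* best possible Cum. %Captured   */
   keep grp n_i e_i b_i best_pct_resp best_cum_pct_capt;
run;

proc print data=best noobs;
run;
```

With E = 250 events and 100 observations per group, the best event counts are 100, 100, and 50 for the first three groups and 0 for the rest.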

Assuming an uninformative model which randomly puts observations into groups, baseline values for each statistic can be computed as follows:

   %Response = 100*P
   Cumulative %Response = 100*P
   %Captured = 100*ni/N
   Cumulative %Captured = 100*(Σj nj)/N, j=1,2,...,i
   Lift = 1
   Cumulative Lift = 1
   Gain = 0
LIMITATIONS:
Only unaggregated data can be used. That is, the GainLift macro cannot be used on data which were fitted using events/trials syntax in the MODEL statement of the fitting procedure, or if a FREQ statement was used. To use the macro for such data, the data must be expanded so that each observation represents a single individual or item. This can be done using a DATA step and DO loops containing OUTPUT statements.
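
As a sketch of such an expansion, assume a hypothetical aggregated data set AGG in which each row records EVENTS successes out of TRIALS at one value of a predictor X. These names and values are illustrative only. The DATA step writes one observation per individual, with Y=1 for events and Y=0 for nonevents:

```sas
/* Hypothetical aggregated data: EVENTS out of TRIALS at each X */
data agg;
   input x events trials;
   datalines;
1 2 10
2 5 10
3 9 10
;

data expanded;
   set agg;
   y = 1;                            /* one row per event    */
   do i = 1 to events;
      output;
   end;
   y = 0;                            /* one row per nonevent */
   do i = 1 to trials - events;
      output;
   end;
   drop i events trials;
run;
```

The expanded data set has one row per trial (30 rows here) and can be fitted using a single-trial response such as MODEL Y(EVENT="1") = X. For data that were analyzed with a FREQ statement, the same idea applies with a single loop that runs from 1 to the frequency value.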

If the data were collected by sampling from the response levels, such as to oversample the event or in case-control settings, then see this note on adjusting the predicted probabilities for the sampling before using the macro.

MISSING VALUES:
Observations with a missing predicted probability (such as when a predictor value is missing) are ignored and do not influence the statistics.



These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.