41683 - Gains and Lift plots for binary-response models

Contents:

Purpose / History / Requirements / Usage / Details / Limitations / Missing Values

PURPOSE:

The GainLift macro produces a Gains plot and cumulative and noncumulative plots of the following statistics for a binary-response model such as a logistic or probit model:

Lift
Percent Captured
Percent Response

All plots can be displayed in a single panel or plots can be individually presented.

HISTORY:

The version of the GainLift macro that you are using is displayed in the SAS® log when you specify version (or any string) as the first argument. For example:

    %GainLift(version, <other macro options>)

The GainLift macro always attempts to check for a later version of itself. If it cannot do so (for example, when no internet connection is available), the macro issues the following message:

   GainLift: Unable to check for newer version

The computations performed by the macro are not affected by the appearance of this message.

Version	Update Notes
1.1	Corrected gain formula to use absolute value. Removed cumulative gain. Added best possible values for each statistic. Added default table of statistics. Added GRAPHOPTS= option, which subsumes the previous GRAPH, GRID=, and PANEL= options and adds the capability to display baseline and best possible lines. Added TABLEOPTS= option, which controls whether the statistics table is presented and whether it includes baseline and best possible columns. Replaced ONEPLOT= option with PLOTS= option to specify any subset of the plots for display. Now displays notes in the log for time used and output data set creation.
1.0	Initial version

REQUIREMENTS:

Base SAS®

USAGE:

Follow the instructions in the Downloads tab of this sample to save the GainLift macro definition. In your SAS program or in the SAS editor window, specify the following statement to define the GainLift macro and make it available for use. Replace the text within quotes with the location of the GainLift macro definition file on your system.

%inc "<location of your file containing the GainLift macro>";

Following this statement, you can call the GainLift macro using the following syntax. Macro arguments can be listed in any order.

%GainLift(<list of macro arguments separated by commas>)

See the Results tab for an example.

The following macro arguments are required:

response=variable-name: Specifies the name of the variable containing the observed response values. This is typically the variable specified to the left of the equal sign in the MODEL statement of procedures such as LOGISTIC, GENMOD, and PROBIT.
event=value | "value" | 'value': Specifies the value of the response variable which represents the event level that was modeled. If the RESPONSE= variable is a character variable, value must be enclosed in quotes.
p=variable-name: Specifies the name of the variable containing the predicted event probabilities produced by the model. This variable is typically produced by the P= option in the OUTPUT statement of procedures such as LOGISTIC, GENMOD, and PROBIT.
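
As a minimal sketch of a call that uses only the required arguments, the following fits a logistic model, saves the predicted event probabilities, and then calls the macro. The data set work.train and the variables purchase (assumed numeric, with 1 as the event), income, age, and phat are hypothetical names chosen for illustration. The macro definition is assumed to have been loaded already with %inc.

```sas
/* Hypothetical names: work.train, purchase (numeric 0/1), income, age, phat */
proc logistic data=work.train;
   model purchase(event='1') = income age;
   output out=work.scored p=phat;   /* phat = predicted event probability */
run;

/* Required arguments only. Because no input data set is named, */
/* the most recently created data set (work.scored) is used.    */
%GainLift(response=purchase, event=1, p=phat)
```

If the response were a character variable with the event level Yes, the value would be quoted instead, as in event="Yes".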

The following macro arguments are optional:

data=SAS-data-set: Specifies the name of the data set containing observed response values and predicted probabilities from the fitted model. This data set is typically produced by the OUT= option in the OUTPUT statement of procedures such as LOGISTIC, GENMOD, and PROBIT. If not specified, the data set last created is used.
tableopts=values: Specifies one or more of the following keywords, separated by spaces: TABLE (produce a table of statistics for each decile or demi-decile group), NOTABLE (omit the table of statistics), BASE (show the baseline case, that is, an uninformative model), NOBASE (omit the baseline case), BEST (show the best possible case), NOBEST (omit the best possible case). The default is TABLEOPTS=TABLE NOBASE NOBEST.
graphopts=values: Specifies one or more of the following keywords, separated by spaces: NOGRAPH (suppress all plots), LINE (produce line plots), BAR (produce bar charts), PANEL (plot all statistics in a panel), NOPANEL (plot all statistics in separate plots), GRID (add grid lines to plots), NOGRID (omit grid lines), BASE (show the baseline case, that is, an uninformative model), NOBASE (omit the baseline case), BEST (show the best possible case), NOBEST (omit the best possible case). The default is GRAPHOPTS=LINE PANEL GRID BEST BASE.
groups=20 | 10: Specifies the number of groups produced from the predicted event probabilities. By default, 20 groups (demi-deciles) are produced. GROUPS=10 produces deciles.
plots=CGAIN | GAIN | CLIFT | LIFT | CCAPT | PCAPT | CRESP | PRESP: Requests individual, full-size plots rather than a panel of all seven plots. Specify one or more of the following, separated by spaces: CGAIN (cumulative gains), GAIN (gains), CLIFT (cumulative lift), LIFT (lift), CCAPT (cumulative percent captured), PCAPT (percent captured), CRESP (cumulative percent response), or PRESP (percent response).
xaxis=SelectedPct | Percentile | GroupNum: The default, XAXIS=SelectedPct, displays the selected percentile values of the predicted probability groups on the horizontal axis (labeled Depth). The group with the highest predicted probability has the lowest depth, and the group with depth 100 has the lowest predicted probability. The opposite is true when XAXIS=Percentile, since the group with the lowest predicted probability has the lowest percentile. Group numbers are displayed when XAXIS=GroupNum, with Group 1 having the highest predicted probability.
out=SAS-data-set: Names the data set created by the macro which contains the statistics needed to generate the plots. If not specified, the data set is named _GainLift.
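
The optional arguments can be combined in a single call, as in the following sketch. The data set and variable names (work.scored, purchase, phat, work.gl_stats) are hypothetical. The call requests deciles, two individual plots (cumulative lift and cumulative percent captured), group numbers on the horizontal axis, and a statistics table that includes the baseline and best possible columns.

```sas
/* Hypothetical names: work.scored, purchase (numeric 0/1), phat, work.gl_stats */
%GainLift(data=work.scored, response=purchase, event=1, p=phat,
          groups=10, plots=CLIFT CCAPT, xaxis=GroupNum,
          tableopts=TABLE BASE BEST, out=work.gl_stats)

/* Inspect the statistics data set written by the macro */
proc print data=work.gl_stats;
run;
```

The variables contained in the OUT= data set are not listed in this sample, so the PROC PRINT step simply prints everything.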

DETAILS:

The plots produced by the GainLift macro are commonly used to assess the predictive ability of a binary-response model, such as a logistic or probit model, or of a decision tree. Gains and lift plots are popular in fields such as direct marketing and credit scoring. Another plot, the ROC plot, also summarizes predictive ability. The ROC plot can be produced directly by the LOGISTIC procedure.

The gains plot shows, as a percentage of the overall proportion of events, the absolute difference between that overall proportion and the observed proportion of events in all groups up to the current group. The lift plot gives, for each group, the ratio of the proportion of observations in the group that are events to the overall proportion of events. The cumulative lift plot gives the ratio of the event rate in all groups up to the current group to the overall event rate. The percent captured plot gives, for each group, the percentage of total events that fall in the group. The cumulative percent captured plot gives the percentage of total events that fall in the groups up to the current group. The percent response plot gives the percentage of observations in the group that are events. The cumulative percent response plot gives the percentage of observations up to the current group that are events.

The statistics are computed as follows:

   E   = total number of events
   N   = number of observations
   G   = number of groups (10 for deciles or 20 for demi-deciles)
   P   = overall proportion of observations that are events (P = E/N)
   n_i = number of observations in group i, i=1,2,...,G
   e_i = number of events in group i
   p_i = proportion of observations in group i that are events (p_i = e_i/n_i)
   b_i = best event count achievable in group i
       = min(n_i, E - Σ_{j<i} b_j), that is, the events remaining after groups 1,2,...,i-1 are filled, capped at n_i

   In the cumulative statistics, Σ_{j≤i} denotes a sum over groups j=1,2,...,i.

   %Response = 100*p_i
   Cumulative %Response = 100*(Σ_{j≤i} e_j)/(Σ_{j≤i} n_j)
   %Captured = 100*e_i/E
   Cumulative %Captured = 100*(Σ_{j≤i} e_j)/E
   Lift = p_i/P
   Cumulative Lift = ((Σ_{j≤i} e_j)/(Σ_{j≤i} n_j))/P
   Gain = 100*abs(Cumulative Lift-1)
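
The formulas above can be sketched in a DATA step, starting from per-group counts. This is an illustration only, not the macro's own code. The data set work.groups and its counts are hypothetical, and only four groups are used for brevity, whereas the macro forms 10 or 20 groups. Groups are assumed to be ordered from highest to lowest predicted probability.

```sas
/* Hypothetical per-group counts, ordered from highest to lowest predicted probability */
data work.groups;
   input group n_i e_i;
   datalines;
1 100 60
2 100 40
3 100 30
4 100 20
;

/* Totals over all groups: E = total events, N = total observations */
proc sql noprint;
   select sum(e_i), sum(n_i) into :E, :N
   from work.groups;
quit;

data work.stats;
   set work.groups;
   P = &E / &N;                           /* overall proportion of events */
   cum_e + e_i;                           /* running total of events */
   cum_n + n_i;                           /* running total of observations */
   pct_response     = 100 * e_i / n_i;
   cum_pct_response = 100 * cum_e / cum_n;
   pct_captured     = 100 * e_i / &E;
   cum_pct_captured = 100 * cum_e / &E;
   lift             = (e_i / n_i) / P;
   cum_lift         = (cum_e / cum_n) / P;
   gain             = 100 * abs(cum_lift - 1);
run;

proc print data=work.stats noobs;
run;
```

With these counts, E = 150, N = 400, and P = 0.375, so the first group has a lift of 0.60/0.375 = 1.6 and a gain of 60.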

The best possible statistic values for a model are computed as above using the best event count achievable, b_i, rather than e_i. This puts all observed events in the top decile or demi-decile groups, filling the top groups as needed to absorb all observed events.
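
The filling of the top groups can be sketched as follows. This is an illustration under assumed data, not the macro's own code. The data set work.groups and its counts are hypothetical, and remaining is a helper variable introduced here to track the events not yet assigned to a group.

```sas
/* Hypothetical per-group counts, ordered from highest to lowest predicted probability */
data work.groups;
   input group n_i e_i;
   datalines;
1 100 60
2 100 40
3 100 30
4 100 20
;

/* Total number of observed events */
proc sql noprint;
   select sum(e_i) into :E
   from work.groups;
quit;

data work.best;
   set work.groups;
   retain remaining &E;             /* events not yet assigned to a group */
   b_i = min(n_i, remaining);       /* fill the group, or place whatever is left */
   remaining = remaining - b_i;
   drop remaining;
run;
```

With 150 events and groups of 100 observations, the best event counts are 100, 50, 0, and 0. The statistics for the best possible case then follow by substituting b_i for e_i.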

Assuming an uninformative model which randomly puts observations into groups, baseline values for each statistic can be computed as follows:

   %Response = 100*P
   Cumulative %Response = 100*P
   %Captured = 100*n_i/N
   Cumulative %Captured = 100*(Σ_{j≤i} n_j)/N
   Lift = 1
   Cumulative Lift = 1
   Gain = 0

LIMITATIONS:

Only unaggregated data can be used. That is, the GainLift macro cannot be used on data which were fitted using events/trials syntax in the MODEL statement of the fitting procedure, or if a FREQ statement was used. To use the macro for such data, the data must be expanded so that each observation represents a single individual or item. This can be done using a DATA step and DO loops containing OUTPUT statements.
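
The expansion can be sketched as follows for data in events/trials form. The data set work.agg and the variables dose, events, trials, and y are hypothetical names chosen for illustration.

```sas
/* Hypothetical aggregated data: one row per dose level with events/trials counts */
data work.agg;
   input dose events trials;
   datalines;
1 2 10
2 5 10
3 9 10
;

/* Expand to one row per individual with a binary response y */
data work.expanded;
   set work.agg;
   do _i = 1 to events;             /* one row per event */
      y = 1;
      output;
   end;
   do _i = 1 to trials - events;    /* one row per nonevent */
      y = 0;
      output;
   end;
   drop _i events trials;
run;
```

The model can then be refit to work.expanded with y as the response, and the resulting OUTPUT data set passed to the macro. For data that were fit with a FREQ statement, a single DO loop that runs from 1 to the frequency value and contains an OUTPUT statement serves the same purpose.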

If the data were collected by sampling from the response levels, such as to oversample the event or in case-control settings, then see this note on adjusting the predicted probabilities for the sampling before using the macro.
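
One possible adjustment, offered here as an assumption rather than as the method described in the referenced note, is the PRIOREVENT= option of the SCORE statement in PROC LOGISTIC, which adjusts the predicted probabilities to a specified population event rate. The data set work.sample, the variables y, x1, and x2, and the population event rate of 0.02 are hypothetical.

```sas
/* Hypothetical names and prior: work.sample, y (numeric 0/1), x1, x2, event rate 0.02 */
proc logistic data=work.sample;
   model y(event='1') = x1 x2;
   /* Score the same data, adjusting the probabilities to the assumed population event rate */
   score data=work.sample out=work.adjusted priorevent=0.02;
run;

/* P_1 is the adjusted probability of the level y=1 in the SCORE output data set */
%GainLift(data=work.adjusted, response=y, event=1, p=P_1)
```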

MISSING VALUES:

Observations with a missing predicted probability (such as when a predictor value is missing) are ignored and do not influence the statistics.
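
To see how many observations would be ignored for this reason, the predicted probability variable can be summarized before calling the macro. The names work.scored and phat are hypothetical.

```sas
/* Hypothetical names: work.scored, phat */
/* N = observations with a predicted probability, NMISS = observations the macro would ignore */
proc means data=work.scored n nmiss;
   var phat;
run;
```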

These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.

Type:	Sample
Topic:	Analytics ==> Categorical Data Analysis; Analytics ==> Regression

Date Modified:	2017-02-23 14:36:10
Date Created:	2010-11-24 14:49:04

Product Family	Product	Host	SAS Release (Starting)	SAS Release (Ending)
SAS System	SAS/STAT	z/OS
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows Server 2003 for x64
		Microsoft Windows Server 2008
		Microsoft Windows Server 2008 for x64
		Microsoft Windows XP Professional
		Windows 7 Enterprise 32 bit
		Windows 7 Enterprise x64
		Windows 7 Home Premium 32 bit
		Windows 7 Home Premium x64
		Windows 7 Professional 32 bit
		Windows 7 Professional x64
		Windows 7 Ultimate 32 bit
		Windows 7 Ultimate x64
		Windows Millennium Edition (Me)
		Windows Vista
		Windows Vista for x64
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		ABI+ for Intel Architecture
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX
