The plots can be displayed together in a single panel or presented individually.
%GainLift(version, <other macro options>)
The GainLift macro always attempts to check for a later version of itself. If it is unable to do this (such as if there is no active internet connection available), the macro will issue the following message:
GainLift: Unable to check for newer version
The computations performed by the macro are not affected by the appearance of this message.
Version | Update Notes |
1.1 | Corrected gain formula to use absolute value. Removed cumulative gain. Added Best possible values for each statistic. Added default table of statistics. Added GRAPHOPTS= option that subsumes previous GRAPH, GRID= and PANEL= options and adds capability for displaying baseline and best possible lines. Added TABLEOPTS option that controls whether statistics table is presented and presence of baseline and best possible columns. Replaced ONEPLOT= option with PLOTS= option to specify any subset of the plots for display. Now displays notes in the log for time used and output data set creation. |
1.0 | Initial version |
%inc "<location of your file containing the GainLift macro>";
Following this statement, you can call the GainLift macro using the following syntax. Macro arguments can be listed in any order.
%GainLift(<list of macro arguments separated by commas>)
See the Results tab for an example.
The following macro arguments are required:
The following macro arguments are optional:
The gains plot shows the percent difference, relative to the overall proportion of events, between the overall proportion of events and the observed proportion of events in all groups up to the current group. The lift plot gives, for each group, the ratio of the proportion of observations in the group that are events to the overall proportion of events. The cumulative lift plot gives the event rate up to the current group divided by the overall event rate. The percent captured plot gives, for each group, the percentage of total events that fall in the group. The cumulative percent captured plot gives the percentage of total events captured up to the current group. The percent response plot gives the percentage of observations in the group that are events. The cumulative percent response plot gives the percentage of observations up to the current group that are events.
The statistics are computed as follows:
E = total number of events
N = number of observations
G = number of groups (10 for deciles or 20 for demi-deciles)
P = overall proportion of observations that are events (P = E/N)
e_i = number of events in group i, i = 1, 2, ..., G
b_i = best event count achievable in group i, i = 1, 2, ..., G
    = E - Σ_{j<i} b_j if E - Σ_{j<i} b_j ≤ n_i, and n_i otherwise, where j = 1, 2, ..., i-1
n_i = number of observations in group i
p_i = proportion of observations in group i that are events (p_i = e_i/n_i)

%Response = 100*p_i
Cumulative %Response = 100*(Σ_{j≤i} e_j)/(Σ_{j≤i} n_j)
%Captured = 100*e_i/E
Cumulative %Captured = 100*(Σ_{j≤i} e_j)/E
Lift = p_i/P
Cumulative Lift = ((Σ_{j≤i} e_j)/(Σ_{j≤i} n_j))/P
Gain = 100*abs(Cumulative Lift - 1)
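The formulas above can be sketched outside SAS as a short Python function. This is only an illustration of the arithmetic, not the macro's implementation; the function name `gain_lift_stats`, the dictionary keys, and the example counts are hypothetical, and the groups are assumed to be ordered from highest to lowest predicted probability.

```python
from itertools import accumulate


def gain_lift_stats(events, sizes):
    """Per-group statistics from event counts e_i and group sizes n_i.

    Both lists are assumed to be ordered with the top group first.
    """
    E = sum(events)          # total number of events
    N = sum(sizes)           # total number of observations
    P = E / N                # overall proportion of events
    cum_e = accumulate(events)
    cum_n = accumulate(sizes)
    rows = []
    for e, n, ce, cn in zip(events, sizes, cum_e, cum_n):
        cum_lift = (ce / cn) / P
        rows.append({
            "pct_response": 100 * e / n,
            "cum_pct_response": 100 * ce / cn,
            "pct_captured": 100 * e / E,
            "cum_pct_captured": 100 * ce / E,
            "lift": (e / n) / P,
            "cum_lift": cum_lift,
            "gain": 100 * abs(cum_lift - 1),
        })
    return rows


# Hypothetical counts: 4 groups of 5 observations, 8 events in total.
stats = gain_lift_stats(events=[4, 2, 1, 1], sizes=[5, 5, 5, 5])
for row in stats:
    print(row)
```

In the last group the cumulative lift is always 1 and the gain is 0, because the cumulative event rate over all groups equals the overall rate P.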
The best possible statistic values for a model are computed as above using the best event count achievable, bi, rather than ei. This puts all observed events in the top decile or demi-decile groups, filling the top groups as needed to absorb all observed events.
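The "fill the top groups first" rule can be sketched as follows. This is a minimal illustration, assuming that each group absorbs as many of the not-yet-assigned events as its size allows; the function name `best_event_counts` and the example counts are hypothetical and are not taken from the macro.

```python
def best_event_counts(sizes, total_events):
    """Best achievable event count b_i per group, top group first.

    Each group takes min(n_i, events still unassigned), so all events
    end up packed into the highest-ranked groups.
    """
    remaining = total_events
    best = []
    for n in sizes:
        b = min(n, remaining)
        best.append(b)
        remaining -= b
    return best


# Hypothetical example: 4 groups of 5 observations, 8 events in total.
print(best_event_counts([5, 5, 5, 5], 8))  # prints [5, 3, 0, 0]
```

Substituting these counts for the observed event counts in the statistic formulas gives the best possible curve for each plot.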
Assuming an uninformative model which randomly puts observations into groups, baseline values for each statistic can be computed as follows:
%Response = 100*P
Cumulative %Response = 100*P
%Captured = 100*n_i/N
Cumulative %Captured = 100*(Σ_{j≤i} n_j)/N
Lift = 1
Cumulative Lift = 1
Gain = 0
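As a rough sketch, the baseline values depend only on the group sizes and the overall event rate. The function name `baseline_stats`, the dictionary keys, and the example counts below are hypothetical and serve only to illustrate the formulas.

```python
from itertools import accumulate


def baseline_stats(sizes, total_events):
    """Baseline (uninformative model) statistics per group."""
    N = sum(sizes)
    P = total_events / N     # overall proportion of events
    rows = []
    for n, cn in zip(sizes, accumulate(sizes)):
        rows.append({
            "pct_response": 100 * P,
            "cum_pct_response": 100 * P,
            "pct_captured": 100 * n / N,
            "cum_pct_captured": 100 * cn / N,
            "lift": 1.0,
            "cum_lift": 1.0,
            "gain": 0.0,
        })
    return rows


# Hypothetical example: 4 groups of 5 observations, 8 events in total.
base = baseline_stats([5, 5, 5, 5], 8)
print(base[1])
```

With equal-sized groups, the baseline cumulative percent captured rises in equal steps, which is the straight diagonal reference line in a cumulative captured plot.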
If the data were collected by sampling from the response levels, such as to oversample the event or in case-control settings, then see this note on adjusting the predicted probabilities for the sampling before using the macro.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
The following example uses the cancer remission data presented in the stepwise logistic regression example in the LOGISTIC documentation. The first model uses only a single predictor. These statements fit the model, produce the ROC plot, and create a data set (OUT) for later use in the GainLift macro.
ods graphics on;
title "Cancer remission model (smear)";
proc logistic data=remission plots(only)=roc;
   model remiss(event='1') = smear;
   output out=out p=p;
run;
The performance of the model is assessed by the ROC curve provided by PROC LOGISTIC and by the plots from the GainLift macro. The area under the ROC curve is 0.586, indicating only a slight improvement in the predictive ability of this model over an uninformative model containing only an intercept (with area 0.5).
The GainLift macro is called next to provide additional assessment of the model. The default table of statistics is suppressed by the TABLEOPTS=NOTABLE option.
%inc "<location of your file containing the GainLift macro>";
%GainLift(data=out, response=remiss, p=p, event=1, tableopts=notable)
The cumulative plots from the GainLift macro show curves for the model that do not deviate much from the baseline and are far from the best possible, which again indicates a poorly performing model.
The final model found by stepwise selection is assessed next. These statements fit the model and produce the ROC plot.
title "Cancer remission model (li temp cell)";
proc logistic data=remission plots(only)=roc;
   model remiss(event='1') = li temp cell;
   output out=out p=p;
run;
The area under the ROC curve for this model is 0.889, indicating much better performance.
The GainLift macro is called to evaluate this new model.
%GainLift(data=out, response=remiss, p=p, event=1)
The plots from the GainLift macro are now much closer to the best possible case than to the baseline case. In the table and plot, the cumulative percent of remissions captured is over 55% using only the top 25% of predicted remission probabilities. The cumulative percent response indicates that 80% of the observations in the top 20% of predicted remission probabilities are remissions. This rate is 2.4 times (the cumulative lift) the base (overall) remission rate of 33%, which is a 140% gain.
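The lift and gain figures quoted here can be checked by hand against the statistic formulas, taking the 33% base rate as approximately 1/3 (an assumption made only for this arithmetic):

```latex
\[
\text{Cumulative Lift} = \frac{0.80}{1/3} = 2.4,
\qquad
\text{Gain} = 100 \times \lvert 2.4 - 1 \rvert = 140\%.
\]
```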
The following macro call produces a bar chart version of the cumulative percent captured plot for the final model. The baseline case is omitted by the NOBASE option.
%GainLift(data=out, response=remiss, p=p, event=1, plots=ccapt, graphopts=bar nobase, tableopts=notable)
Right-click on the link below and select Save to save the GainLift macro definition to a file. It is recommended that you name the file GainLift.sas.
Type: | Sample |
Topic: | Analytics ==> Categorical Data Analysis; Analytics ==> Regression |
Date Modified: | 2017-02-23 14:36:10 |
Date Created: | 2010-11-24 14:49:04 |
Product Family | Product | Host | SAS Release (Starting) | SAS Release (Ending) |
SAS System | SAS/STAT | z/OS | | |
 | | OpenVMS VAX | | |
 | | Microsoft® Windows® for 64-Bit Itanium-based Systems | | |
 | | Microsoft Windows Server 2003 Datacenter 64-bit Edition | | |
 | | Microsoft Windows Server 2003 Enterprise 64-bit Edition | | |
 | | Microsoft Windows XP 64-bit Edition | | |
 | | Microsoft® Windows® for x64 | | |
 | | OS/2 | | |
 | | Microsoft Windows 95/98 | | |
 | | Microsoft Windows 2000 Advanced Server | | |
 | | Microsoft Windows 2000 Datacenter Server | | |
 | | Microsoft Windows 2000 Server | | |
 | | Microsoft Windows 2000 Professional | | |
 | | Microsoft Windows NT Workstation | | |
 | | Microsoft Windows Server 2003 Datacenter Edition | | |
 | | Microsoft Windows Server 2003 Enterprise Edition | | |
 | | Microsoft Windows Server 2003 Standard Edition | | |
 | | Microsoft Windows Server 2003 for x64 | | |
 | | Microsoft Windows Server 2008 | | |
 | | Microsoft Windows Server 2008 for x64 | | |
 | | Microsoft Windows XP Professional | | |
 | | Windows 7 Enterprise 32 bit | | |
 | | Windows 7 Enterprise x64 | | |
 | | Windows 7 Home Premium 32 bit | | |
 | | Windows 7 Home Premium x64 | | |
 | | Windows 7 Professional 32 bit | | |
 | | Windows 7 Professional x64 | | |
 | | Windows 7 Ultimate 32 bit | | |
 | | Windows 7 Ultimate x64 | | |
 | | Windows Millennium Edition (Me) | | |
 | | Windows Vista | | |
 | | Windows Vista for x64 | | |
 | | 64-bit Enabled AIX | | |
 | | 64-bit Enabled HP-UX | | |
 | | 64-bit Enabled Solaris | | |
 | | ABI+ for Intel Architecture | | |
 | | AIX | | |
 | | HP-UX | | |
 | | HP-UX IPF | | |
 | | IRIX | | |
 | | Linux | | |
 | | Linux for x64 | | |
 | | Linux on Itanium | | |
 | | OpenVMS Alpha | | |
 | | OpenVMS on HP Integrity | | |
 | | Solaris | | |
 | | Solaris for x64 | | |
 | | Tru64 UNIX | | |