Usage Note 45767: Computing the statistics in "Association of Predicted Probabilities and Observed Responses" table

The section "Rank Correlation of Observed Responses and Predicted Probabilities" in the Details section of the PROC LOGISTIC documentation describes how predicted probabilities are binned and how pairs of observations are determined to be concordant, discordant, or tied. From these counts, the association statistics can be computed using the formulas shown in the documentation: Somers' D (Gini coefficient), gamma, tau-a, and c (the concordance index, which is also the area under the ROC curve).
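
For reference, the formulas as they are implemented in the code later in this note can be written as follows. The symbols are chosen here for illustration and mirror the variables nc, nd, t, and &N in that code: n_c and n_d are the numbers of concordant and discordant pairs, t is the total number of event-nonevent pairs, and N is the number of observations.

```latex
\begin{align*}
\text{Somers' } D &= \frac{n_c - n_d}{t}, &
\gamma &= \frac{n_c - n_d}{n_c + n_d}, \\
\tau_a &= \frac{n_c - n_d}{\tfrac{1}{2} N (N - 1)}, &
c &= \frac{n_c + \tfrac{1}{2}\,(t - n_c - n_d)}{t}.
\end{align*}
```

Written this way, c and Somers' D carry the same information, since c = (1 + D)/2.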

Note that binning the predicted probabilities is more efficient and reduces execution time for large data sets, but produces a rougher approximation to these statistics. Binning can be turned off by specifying the BINWIDTH=0 option in the MODEL statement, or by specifying any of the following:

  • the OUTROC= option in the MODEL or SCORE statement
  • one or more ROC statements
  • the PLOTS=ROC (or PLOTS=ALL) option in the PROC LOGISTIC statement when ODS Graphics is on

Using any of these to turn off binning gives more accurate values of the association statistics, including the area under the ROC curve (c statistic).

Note that, beginning in SAS 9.4 TS1M3, no binning is done if the response is binary and there are fewer than 5,000,000 observations in the input data set.
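
To make the effect of binning concrete, the following is a minimal sketch that applies the same bin computation used in the CONCDISC macro later in this note to three made-up probabilities. The values 0.1234, 0.1239, and 0.1241 are hypothetical and are not taken from the example data.

```sas
/* Sketch: assign hypothetical probabilities to bins of width 0.002 */
data _null_;
  binwidth = 0.002;
  do p = 0.1234, 0.1239, 0.1241;
    /* same computation as pbin in the CONCDISC macro */
    pbin = floor((1/binwidth)*p);
    put p= pbin=;
  end;
  run;
```

With a bin width of 0.002, the first two values fall in the same bin (61) and would be treated as tied, while the third falls in the next bin (62). Without binning, all three values are distinct.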

The following example uses the described method to optionally bin the predicted probabilities and compute the association statistics. The code shown applies only to binary response models.

Example

These statements produce an example data set for which the association statistics will be computed.

      data a; 
        input Seq Score Outcome;
        datalines;
      1   1525    0
      2   1641    0
      3   1706    0
      4   1722    0
      5   1738    1
      6   1758    1
      7   1770    0
      8   1775    0
      9   1798    0
      10  1839    1
      11  1842    0
      12  1848    0
      13  1848    0
      14  1856    0
      15  1864    0
      16  1864    0
      17  1879    0
      18  1895    0
      19  1909    0
      20  1917    0
      21  1944    0
      22  1975    0
      23  2002    0
      ;

These statements fit a binary logistic model to the OUTCOME variable. The EVENT="1" response variable option ensures that the probability of OUTCOME=1 is modeled. Since the BINWIDTH= option is not specified, the default bin width of 1/500 = 0.002 is used in computing the association statistics. The predicted probabilities computed by the PREDPROBS=INDIVIDUAL option are not binned and are saved in data set OUT.

      proc logistic data=a;
        model outcome(event="1") = score;
        output out=out predprobs=individual;
        run;

The resulting "Association of Predicted Probabilities and Observed Responses" table from the model fit is shown below.

Association of Predicted Probabilities and Observed Responses

Percent Concordant  73.3    Somers' D  0.483
Percent Discordant  25.0    Gamma      0.492
Percent Tied         1.7    Tau-a      0.115
Pairs                 60    c          0.742

By specifying the BINWIDTH=0 option (or any of the other options or statements mentioned above), binning is turned off.

      proc logistic data=a;
        model outcome(event="1") = score / binwidth=0;
        run;

Following is the table of statistics when the predicted probabilities are not binned.

Association of Predicted Probabilities and Observed Responses

Percent Concordant  75.0    Somers' D  0.500
Percent Discordant  25.0    Gamma      0.500
Percent Tied         0.0    Tau-a      0.119
Pairs                 60    c          0.750
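
The other ways of turning off binning listed earlier could be sketched as follows. Each step is expected to give the same unbinned table as BINWIDTH=0. The data set name ROC1 and the ROC label 'Score' are arbitrary names chosen for illustration.

```sas
/* OUTROC= option in the MODEL statement (ROC1 is a hypothetical name) */
proc logistic data=a;
  model outcome(event="1") = score / outroc=roc1;
  run;

/* a ROC statement (the label 'Score' is arbitrary) */
proc logistic data=a;
  model outcome(event="1") = score;
  roc 'Score' score;
  run;

/* PLOTS=ROC with ODS Graphics turned on */
ods graphics on;
proc logistic data=a plots=roc;
  model outcome(event="1") = score;
  run;
```

As noted earlier, beginning in SAS 9.4 TS1M3 no binning is done for a binary response with fewer than 5,000,000 observations, so the difference from the default table would be visible only in earlier releases.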

The following statements define the macro CONCDISC which applies the binning method described in the LOGISTIC documentation. It creates the data set _PAIRS which contains an observation for each possible pair of event and nonevent observations and indicates whether each pair is concordant, discordant, or tied. The macro requires the data set of predicted probabilities, the name of the response variable, and the values of the event and nonevent levels of the response. If the BINWIDTH= option was not specified in the PROC LOGISTIC step, then it can be omitted when calling the macro. The macro will then use the same default bin width. Otherwise, specify the same value in the BINWIDTH= macro option as was specified in the PROC LOGISTIC step.

      %macro concdisc(data=, event=, nonevent=, response=, binwidth=0.002);
      %global n;
      /* bin the predicted probabilities and give all in the bin the lowest value */
      data _r&data;
        set &data;
        %if &binwidth ne 0 %then pbin=floor((1/&binwidth)*ip_&event);
        %else                    pbin=ip_&event;
        ;
        run;
      proc sql;
        reset noprint;
        /* number of observations in original data set into macro variable N */
        select count(*) into :N from _r&data;
        /* create data set of nonevents */
        create table _nonevents as select pbin as pbin0 from _r&data where &response=&nonevent;
        /* create data set of events */
        create table _events as select pbin as pbin1 from _r&data where &response=&event;
        /* create data set of all event-nonevent pairs and determine concordance */
        create table _pairs as select *, (pbin1>pbin0) as conc,
                                         (pbin1=pbin0) as tie,
                                         (pbin0>pbin1) as disc
                     from _nonevents, _events;
        quit;
      %mend;

This statement calls the CONCDISC macro using the default bin width of 0.002.

      %concdisc(data=out, event=1, nonevent=0, response=outcome)
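
Before summarizing, it can be useful to see which pairs the binning has turned into ties. A minimal sketch, assuming the _PAIRS data set just created by the macro:

```sas
/* list the event-nonevent pairs whose binned values coincide */
proc print data=_pairs;
  where tie = 1;
  var pbin0 pbin1 conc disc tie;
  run;
```

Here PBIN0 and PBIN1 are bin indices rather than probabilities when binning is in effect. For the example data with the default bin width, the Percent Tied value of 1.7 out of 60 pairs suggests that a single pair should be listed.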

The following statements compute the proportions and counts of concordant, discordant, and tied observations.

      proc summary data=_pairs;
        var conc disc tie;
        output out=cd mean=propconc propdisc proptie sum=nc nd n=t;
        run;

Finally, these statements use the formulas shown in the documentation to compute and display the association statistics.

      data rankcorrs;
        set cd;
        Concordant=100*propconc; 
        Discordant=100*propdisc;
        Tied=100*proptie;
        Pairs=t;
        SomersD=(nc-nd)/t;
        Gamma=(nc-nd)/(nc+nd);
        Tau_a=(nc-nd)/(0.5*&N*(&N-1));
        c=(nc + 0.5*(t-nc-nd))/t;
        run;
      
      proc print noobs;
        var Concordant Discordant Tied Pairs SomersD Gamma Tau_a c;
        format Concordant Discordant Tied Pairs 5.1 SomersD Gamma Tau_a c 6.3;
        run;

Note that the recomputed association statistics match those produced by PROC LOGISTIC when the default binning was used.

Concordant  Discordant  Tied  Pairs  SomersD  Gamma  Tau_a      c
      73.3        25.0   1.7   60.0    0.483  0.492  0.115  0.742
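
As a hand check of these values, the percentages imply counts of 44 concordant pairs, 15 discordant pairs, and 1 tied pair among the 60 pairs. These counts are inferred from the rounded percentages and are not printed by the code. The data set has N = 23 observations. Substituting into the formulas used in the RANKCORRS step gives:

```latex
\begin{align*}
\text{Somers' } D &= \frac{44 - 15}{60} = \frac{29}{60} \approx 0.483 \\
\gamma &= \frac{44 - 15}{44 + 15} = \frac{29}{59} \approx 0.492 \\
\tau_a &= \frac{44 - 15}{\tfrac{1}{2} \cdot 23 \cdot 22} = \frac{29}{253} \approx 0.115 \\
c &= \frac{44 + \tfrac{1}{2}\,(60 - 44 - 15)}{60} = \frac{44.5}{60} \approx 0.742
\end{align*}
```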

The association statistics resulting from not binning the predicted probabilities can be obtained by using the above code with the BINWIDTH=0 option in the CONCDISC macro.

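      %concdisc(data=out, event=1, nonevent=0, response=outcome, binwidth=0)

After the PROC SUMMARY, DATA, and PROC PRINT steps above are rerun, the results match the unbinned PROC LOGISTIC table.

Concordant  Discordant  Tied  Pairs  SomersD  Gamma  Tau_a      c
      75.0        25.0   0.0   60.0    0.500  0.500  0.119  0.750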

