23072 - How the multiple intercepts in an ordinal model are used in computing predicted probabilities in PROC LOGISTIC

Usage Note 23072: How the multiple intercepts in an ordinal model are used in computing predicted probabilities in PROC LOGISTIC

Before illustrating how the intercepts of an ordinal model are involved in the computation of predicted values, first note that you do not need to compute these yourself either by hand or by writing a program. The PREDPROBS= option in the OUTPUT statement does this for you. You can compute predicted values for new observations (also called scoring) as well as for the original observations that are used to fit the model. However, if you need to compute predicted values without the aid of SAS, the following information shows you how the LOGISTIC procedure computes predicted values for ordinal models.

Using the cheese example in the PROC LOGISTIC chapter of the SAS/STAT User's Guide, the following statements fit an ordinal probit model with the single predictor ADDQUANT. The ODS OUTPUT statement creates a data set that contains the parameter estimates from the fitted model. The DESCENDING response variable option is used in order to model the higher levels (most liked) of the taste-rating response variable, Y. The PREDPROBS=(C I) option in the OUTPUT statement requests an output data set that contains the predicted cumulative probabilities, Pr(Y ≤ y_(i)), as well as the predicted probabilities of the individual response levels, Pr(Y = y_(i)), where y_(i) represents the i-th ordered value of the response.

    data Cheese;
       do addquant = 1 to 4;
          do y = 1 to 9;
             input freq @@;
             output;
          end;
       end;
       label y='Taste Rating';
       datalines;
    0  0  1  7  8  8 19  8  1
    6  9 12 11  7  6  1  0  0
    1  1  6  8 23  7  5  1  0
    0  0  0  1  3  7 14 16 11
    ;

    ods output parameterestimates=pe;
    proc logistic data=Cheese;
       freq freq;
       model y(descending) = addquant / link=probit;
       output out=out predprobs=(c i);
       run;
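
As noted earlier, predicted probabilities can also be computed for new observations. One way to do this is with the SCORE statement in PROC LOGISTIC, as in the following minimal sketch. The data set names NewObs and Scored are hypothetical and are not part of the original example, and the sketch assumes that the CUMULATIVE option of the SCORE statement is available in your SAS/STAT release:

    /* hypothetical new observations to score; only the predictor is needed */
    data NewObs;
       input addquant @@;
       datalines;
    1 2 3 4
    ;

    proc logistic data=Cheese;
       freq freq;
       model y(descending) = addquant / link=probit;
       /* CUMULATIVE requests cumulative probabilities in addition to */
       /* the individual probabilities that SCORE writes by default   */
       score data=NewObs out=Scored cumulative;
       run;

    proc print data=Scored noobs;
       run;

The Scored data set should then contain, for each row of NewObs, the predicted probabilities of the individual response levels and, because of the CUMULATIVE option, the cumulative probabilities as well.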

The ordinal model that PROC LOGISTIC or PROC SURVEYLOGISTIC always fits is as follows:

Pr(Y ≤ y_(i)) = F(α_i + β•addquant),  i = 1, 2, ..., k–1

where F is usually the logistic or normal cumulative distribution function (for logistic and probit models, respectively), k is the number of response levels, and i refers to the ordered values in the Response Profile table. The inequality Y ≤ y_(i) is stated in terms of the ordered values, not the raw response values. If you use the DESCENDING option, as in this example, the lower ordered values are associated with the higher response values so that y_(1) = 9, y_(2) = 8, and so on:

Response Profile
Ordered Value	y	Total Frequency
1	9	12
2	8	25
3	7	39
4	6	28
5	5	41
6	4	27
7	3	19
8	2	10
9	1	7

PROC LOGISTIC and PROC SURVEYLOGISTIC always model the probabilities of the lower ordered values, which means that you are modeling probabilities of the higher response levels when you specify the DESCENDING option. Because the values of the response are the integers 1, 2, ..., 9 in this example, the logistic and probit models are, respectively, as follows:

Pr(Y ≥ i) = logistic(α_i + β•addquant),  i = 2, 3, ..., 9

Pr(Y ≥ i) = probnorm(α_i + β•addquant),  i = 2, 3, ..., 9

Here α_i denotes the intercept that is labeled with response value i, which is the intercept for ordered value 10 – i in the Response Profile table, and Pr(Y ≥ 1) = 1. The PROBNORM function evaluates the normal cumulative distribution function, often denoted as Φ. The LOGISTIC function evaluates the logistic distribution function. Equivalently, use the CDF function with 'NORMAL' or 'LOGISTIC' as the first argument, designating the distribution. See SAS Language Reference: Dictionary for details.
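
The equivalence of these functions can be checked with a short DATA step. The following is a minimal sketch in which the evaluation points and variable names are arbitrary choices made for illustration. It assumes a SAS release that provides the LOGISTIC function, and it also evaluates the explicit logistic formula 1/(1 + exp(–x)) for comparison:

    data _null_;
       do x = -1, 0, 1;
          /* two ways to evaluate the standard normal CDF */
          p_norm1  = probnorm(x);
          p_norm2  = cdf('NORMAL', x);
          /* three ways to evaluate the logistic CDF */
          p_logis1 = logistic(x);
          p_logis2 = cdf('LOGISTIC', x);
          p_logis3 = 1 / (1 + exp(-x));
          put x= p_norm1= p_norm2= p_logis1= p_logis2= p_logis3=;
       end;
       run;

Within each group, the values written to the log should agree. At x = 0, both distribution functions equal 0.5.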

The estimates of the probit model's parameters α_i and β are displayed to 10 decimal places by the following statements:

    proc print data=pe;
       var variable classval0 estimate;
       format estimate 15.10;
       run;

Obs	Variable	ClassVal0	Estimate
1	Intercept	9	-2.3623012670
2	Intercept	8	-1.6473020298
3	Intercept	7	-1.0339915746
4	Intercept	6	-0.6797293498
5	Intercept	5	-0.1454111839
6	Intercept	4	0.2946660833
7	Intercept	3	0.7563945741
8	Intercept	2	1.2012019494
9	addquant		0.2757342561

Using the parameters of the fitted probit model, the estimated cumulative probabilities are computed as follows. It is important to use many significant digits of the parameter estimates to avoid rounding errors.

Pr(Y ≥ 9) = probnorm(-2.3623012670 + 0.2757342561*addquant)

Pr(Y ≥ 8) = probnorm(-1.6473020298 + 0.2757342561*addquant)

Pr(Y ≥ 7) = probnorm(-1.0339915746 + 0.2757342561*addquant)

Pr(Y ≥ 6) = probnorm(-0.6797293498 + 0.2757342561*addquant)

Pr(Y ≥ 5) = probnorm(-0.1454111839 + 0.2757342561*addquant)

Pr(Y ≥ 4) = probnorm( 0.2946660833 + 0.2757342561*addquant)

Pr(Y ≥ 3) = probnorm( 0.7563945741 + 0.2757342561*addquant)

Pr(Y ≥ 2) = probnorm( 1.2012019494 + 0.2757342561*addquant)

Pr(Y ≥ 1) = 1

These are the values that are produced by the P= or PREDPROBS=(C) option in the OUTPUT statement. The predicted probability of an individual response value is that value's cumulative probability minus the cumulative probability of the next-lower ordered value. For the first ordered value, the individual probability equals the cumulative probability. When the DESCENDING option is used, the next-lower ordered value is typically the next-higher response value, as in this example, where Pr(Y = 8) = Pr(Y ≥ 8) – Pr(Y ≥ 9). These individual probabilities are the values that are produced by the PREDPROBS=(I) option in the OUTPUT statement.

The following statements compute both the cumulative and individual predicted values for each level of the predictor variable:


    data recomp;
       do addquant=1 to 4;
       /* cumulative probabilities */
          cp9=probnorm(-2.3623012670 + 0.2757342561*addquant);
          cp8=probnorm(-1.6473020298 + 0.2757342561*addquant);
          cp7=probnorm(-1.0339915746 + 0.2757342561*addquant);
          cp6=probnorm(-0.6797293498 + 0.2757342561*addquant);
          cp5=probnorm(-0.1454111839 + 0.2757342561*addquant);
          cp4=probnorm( 0.2946660833 + 0.2757342561*addquant);
          cp3=probnorm( 0.7563945741 + 0.2757342561*addquant);
          cp2=probnorm( 1.2012019494 + 0.2757342561*addquant);
          cp1=1;
       /* individual probabilities */
          ip9=cp9-0;
          ip8=cp8-cp9;
          ip7=cp7-cp8;
          ip6=cp6-cp7;
          ip5=cp5-cp6;
          ip4=cp4-cp5;
          ip3=cp3-cp4;
          ip2=cp2-cp3;
          ip1=cp1-cp2;
          output;
       end;
       run;
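
Typing the estimates by hand invites transcription and rounding errors. As an alternative, the following sketch reads the estimates directly from the PE data set that the ODS OUTPUT statement created and repeats the computation with arrays. The data set name RECOMP2 and the helper variable names are illustrative assumptions. The code also assumes that the Variable, ClassVal0, and Estimate columns of PE have the layout shown in the PROC PRINT output above:

    data recomp2;
       array alpha[2:9] _temporary_;   /* intercepts, indexed by response value   */
       array cp[9] cp1-cp9;            /* cumulative probabilities Pr(Y >= level) */
       array ip[9] ip1-ip9;            /* individual probabilities Pr(Y = level)  */

       /* read all parameter estimates in a single pass through PE */
       do until (done);
          set pe end=done;
          if upcase(Variable) = 'INTERCEPT' then
             alpha[input(ClassVal0, 8.)] = Estimate;
          else if upcase(Variable) = 'ADDQUANT' then beta = Estimate;
       end;

       do addquant = 1 to 4;
          cp[1] = 1;
          do level = 2 to 9;
             cp[level] = probnorm(alpha[level] + beta*addquant);
          end;
          ip[9] = cp[9];               /* first ordered value: nothing to subtract */
          do level = 1 to 8;
             ip[level] = cp[level] - cp[level+1];
          end;
          output;
       end;
       stop;
       keep addquant cp1-cp9 ip1-ip9;
       run;

Because the estimates are used at full stored precision, this version does not depend on how many decimal places are displayed.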

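The following tables show that the previous program reproduces the values from the PREDPROBS= option in PROC LOGISTIC. Because the predicted probabilities depend only on ADDQUANT, the WHERE statement keeps a single observation (the one with Y=1) for each value of ADDQUANT. First, the cumulative values that are produced by the PREDPROBS=(C) option are displayed, followed by the cumulative values that are computed and stored in the data set RECOMP: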

    proc print data=out noobs;
       where y = 1;
       var addquant cp:;
       run;

addquant	CP_9	CP_8	CP_7	CP_6	CP_5	CP_4	CP_3	CP_2	CP_1
1	0.01846	0.08510	0.22415	0.34311	0.55184	0.71580	0.84899	0.93015	1
2	0.03508	0.13658	0.31472	0.44897	0.65765	0.80126	0.90454	0.96017	1
3	0.06238	0.20608	0.41809	0.55862	0.75231	0.86904	0.94336	0.97874	1
4	0.10395	0.29310	0.52748	0.66393	0.83085	0.91888	0.96851	0.98939	1

    proc print data=recomp noobs;
       var addquant cp:;
       run;

addquant	cp9	cp8	cp7	cp6	cp5	cp4	cp3	cp2	cp1
1	0.01846	0.08510	0.22415	0.34311	0.55184	0.71580	0.84899	0.93015	1
2	0.03508	0.13658	0.31472	0.44897	0.65765	0.80126	0.90454	0.96017	1
3	0.06238	0.20608	0.41809	0.55862	0.75231	0.86904	0.94336	0.97874	1
4	0.10395	0.29310	0.52748	0.66393	0.83085	0.91888	0.96851	0.98939	1

Next, the individual values that are produced by the PREDPROBS=(I) option are displayed, followed by the individual values that are computed and stored in the data set RECOMP:

    proc print data=out noobs;
       where y = 1;
       var addquant ip:;
       run;

addquant	IP_9	IP_8	IP_7	IP_6	IP_5	IP_4	IP_3	IP_2	IP_1
1	0.01846	0.06664	0.13905	0.11896	0.20874	0.16395	0.13320	0.081160	0.069846
2	0.03508	0.10149	0.17814	0.13425	0.20868	0.14361	0.10328	0.055631	0.039829
3	0.06238	0.14370	0.21201	0.14053	0.19369	0.11673	0.07432	0.035383	0.021259
4	0.10395	0.18915	0.23439	0.13644	0.16692	0.08803	0.04963	0.020883	0.010607

    proc print data=recomp noobs;
       var addquant ip:;
       run;

addquant	ip9	ip8	ip7	ip6	ip5	ip4	ip3	ip2	ip1
1	0.01846	0.06664	0.13905	0.11896	0.20874	0.16395	0.13320	0.081160	0.069846
2	0.03508	0.10149	0.17814	0.13425	0.20868	0.14361	0.10328	0.055631	0.039829
3	0.06238	0.14370	0.21201	0.14053	0.19369	0.11673	0.07432	0.035383	0.021259
4	0.10395	0.18915	0.23439	0.13644	0.16692	0.08803	0.04963	0.020883	0.010607
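
The visual comparison of the tables above can also be done programmatically. The following is a minimal sketch that uses PROC COMPARE to check the values in OUT against those in RECOMP. The tolerance of 1E-8 is an assumption, chosen because the hand-typed estimates are rounded to 10 decimal places:

    proc compare base=out(where=(y=1)) compare=recomp
                 method=absolute criterion=1e-8;
       id addquant;
       /* pair each PROC LOGISTIC variable with its recomputed counterpart */
       var  CP_1-CP_9 IP_1-IP_9;
       with cp1-cp9   ip1-ip9;
       run;

If the recomputation is correct, PROC COMPARE should report no values that differ by more than the specified criterion.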

Operating System and Release Information

Product Family	Product	System	SAS Release
			Reported	Fixed*
SAS System	SAS/STAT	All	n/a

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

Type:	Usage Note
Priority:	low
Topic:	Analytics ==> Survey Sampling and Analysis; SAS Reference ==> Procedures ==> SURVEYLOGISTIC; Analytics ==> Categorical Data Analysis; SAS Reference ==> Procedures ==> LOGISTIC

Date Modified:	2019-05-02 14:31:38
Date Created:	2002-12-16 10:56:37
