Before illustrating how the intercepts of an ordinal model are involved in the computation of predicted values, first note that you do not need to compute these yourself either by hand or by writing a program. The PREDPROBS= option in the OUTPUT statement does this for you. You can compute predicted values for new observations (also called scoring) as well as for the original observations that are used to fit the model. However, if you need to compute predicted values without the aid of SAS, the following information shows you how the LOGISTIC procedure computes predicted values for ordinal models.
Using the cheese example in the PROC LOGISTIC chapter of the SAS/STAT User's Guide, the following statements fit an ordinal probit model with the single predictor ADDQUANT. The ODS OUTPUT statement creates a data set that contains the parameter estimates from the fitted model. The DESCENDING response variable option is used in order to model the higher levels (most-liked) of the taste-rating response variable, Y. The PREDPROBS=(C I) option in the OUTPUT statement requests an output data set that contains the predicted cumulative probabilities, Pr(Y ≤ y(i)), as well as the predicted probabilities of the individual response levels, Pr(Y = y(i)), where y(i) represents the ith ordered value of the response and the inequality refers to the order of the ordered values rather than to the numeric values of Y.
data Cheese;
   /* one row per additive (addquant) and taste rating (y); freq is the cell count */
   do addquant = 1 to 4;
      do y = 1 to 9;
         input freq @@;
         output;
      end;
   end;
   label y='Taste Rating';
   datalines;
0 0 1 7 8 8 19 8 1
6 9 12 11 7 6 1 0 0
1 1 6 8 23 7 5 1 0
0 0 0 1 3 7 14 16 11
;
/* save the parameter estimates of the following procedure in data set PE */
ods output parameterestimates=pe;
proc logistic data=Cheese;
   freq freq;
   model y(descending) = addquant / link=probit;
   output out=out predprobs=(c i);
run;
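For scoring new observations, as mentioned in the introduction, one possible sketch uses the SCORE statement of PROC LOGISTIC. The data set names NewCheese and Scored are hypothetical and are not part of the original example; the CUMULATIVE option is included on the assumption that cumulative probabilities are wanted in addition to the individual ones.

```sas
/* Hypothetical new observations: only the predictor is required */
data NewCheese;
   do addquant = 1 to 4;
      output;
   end;
run;

proc logistic data=Cheese;
   freq freq;
   model y(descending) = addquant / link=probit;
   /* apply the fitted model to each row of NewCheese */
   score data=NewCheese out=Scored cumulative;
run;

proc print data=Scored noobs;
run;
```

Refitting the model in the same step keeps the sketch self-contained; the predicted probabilities in Scored should agree with those in the OUT= data set for the same values of ADDQUANT.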
The ordinal model that PROC LOGISTIC or PROC SURVEYLOGISTIC always fits is as follows:
Pr(Y ≤ y(i) | x) = F(αi + x′β), i = 1, ..., k − 1
where F is usually the logistic or normal cumulative distribution function (for logistic and probit models, respectively), x is the vector of predictor values, αi is the ith intercept, β is the vector of slope parameters, k is the number of response levels, and i refers to the ordered values in the Response Profile table. The event Y ≤ y(i) means that the response takes one of the first i ordered values. If you use the DESCENDING option, as in this example, the lower ordered values are associated with the higher response values so that y(1) = 9, y(2) = 8, etc.:
Response Profile
| Ordered Value | y | Total Frequency |
| 1 | 9 | 12 |
| 2 | 8 | 25 |
| 3 | 7 | 39 |
| 4 | 6 | 28 |
| 5 | 5 | 41 |
| 6 | 4 | 27 |
| 7 | 3 | 19 |
| 8 | 2 | 10 |
| 9 | 1 | 7 |
PROC LOGISTIC and PROC SURVEYLOGISTIC always model the probabilities of the lower ordered values, which means that you are modeling probabilities of the higher response levels when you specify the DESCENDING option. Because the values of the response are the integers 1, 2, ..., 9 in this example, the ith ordered value is the response value 10 − i, and the logistic and probit models are, respectively, as follows:
Pr(Y ≥ 10 − i) = LOGISTIC(αi + β × ADDQUANT), i = 1, ..., 8
Pr(Y ≥ 10 − i) = PROBNORM(αi + β × ADDQUANT), i = 1, ..., 8
The PROBNORM function evaluates the normal cumulative distribution function, often denoted as Φ. The LOGISTIC function evaluates the logistic distribution function. Equivalently, use the CDF function with 'NORMAL' or 'LOGISTIC' as the first argument, designating the distribution. See SAS Language Reference: Dictionary for details.
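The equivalence of these functions can be checked with a short DATA step. This is only an illustrative sketch: the argument x is the first intercept plus the slope estimate from this example (the linear predictor for ordered value 1 at ADDQUANT=1), and the variable names are arbitrary.

```sas
data _null_;
   /* linear predictor for ordered value 1 (y=9) at addquant=1 */
   x = -2.3623012670 + 0.2757342561*1;

   /* two ways to evaluate the standard normal CDF */
   p_probnorm = probnorm(x);
   p_cdf_norm = cdf('NORMAL', x);

   /* two ways to evaluate the logistic distribution function */
   p_logistic = logistic(x);
   p_cdf_logi = cdf('LOGISTIC', x);

   put p_probnorm= p_cdf_norm= p_logistic= p_cdf_logi=;
run;
```

The two normal results should agree with each other and, to the displayed precision, with the CP_9 value of 0.01846 that PROC LOGISTIC reports for ADDQUANT=1. The two logistic results should also agree with each other, but they do not correspond to the fitted probit model.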
The estimates of the probit model's parameters αi and β are displayed to 10 decimal places by the following statements:
proc print data=pe;
   var variable classval0 estimate;
   format estimate 15.10;
run;
| Obs | Variable | ClassVal0 | Estimate |
| 1 | Intercept | 9 | -2.3623012670 |
| 2 | Intercept | 8 | -1.6473020298 |
| 3 | Intercept | 7 | -1.0339915746 |
| 4 | Intercept | 6 | -0.6797293498 |
| 5 | Intercept | 5 | -0.1454111839 |
| 6 | Intercept | 4 | 0.2946660833 |
| 7 | Intercept | 3 | 0.7563945741 |
| 8 | Intercept | 2 | 1.2012019494 |
| 9 | addquant | | 0.2757342561 |
Using the parameters of the fitted probit model, the estimated cumulative probability for the ith ordered value is computed as PROBNORM(ai + b × ADDQUANT), where ai and b are the estimates of αi and β shown above. It is important to use many significant digits of the parameter estimates to avoid rounding errors.
These are the values that are produced by the P= or PREDPROBS=(C) option in the OUTPUT statement. Note that the predicted probability of an individual response value is that value's cumulative probability minus the cumulative probability of the next-lower ordered value. When the DESCENDING option is used, the next-lower ordered value might be a logically higher value. These individual values are the values that are produced by the PREDPROBS=(I) option in the OUTPUT statement.
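As a worked illustration of this differencing, the cumulative values that PROC LOGISTIC reports for ADDQUANT=1 (0.01846, 0.08510, and 0.93015 for CP_9, CP_8, and CP_2) give the following individual probabilities. The values are rounded to five decimal places, so the last digit can differ slightly from the procedure's output.

```latex
\begin{aligned}
\Pr(Y = 9) &= \Pr(Y \ge 9) = 0.01846,\\
\Pr(Y = 8) &= \Pr(Y \ge 8) - \Pr(Y \ge 9) = 0.08510 - 0.01846 = 0.06664,\\
\Pr(Y = 1) &= 1 - \Pr(Y \ge 2) = 1 - 0.93015 = 0.06985.
\end{aligned}
```

Because Y=9 is the first ordered value under the DESCENDING option, its individual probability equals its cumulative probability, and the last ordered value, Y=1, receives whatever probability remains.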
The following statements compute both the cumulative and individual predicted values for each level of the predictor variable:
data recomp;
   do addquant=1 to 4;
      /* cumulative probabilities */
      cp9=probnorm(-2.3623012670 + 0.2757342561*addquant);
      cp8=probnorm(-1.6473020298 + 0.2757342561*addquant);
      cp7=probnorm(-1.0339915746 + 0.2757342561*addquant);
      cp6=probnorm(-0.6797293498 + 0.2757342561*addquant);
      cp5=probnorm(-0.1454111839 + 0.2757342561*addquant);
      cp4=probnorm( 0.2946660833 + 0.2757342561*addquant);
      cp3=probnorm( 0.7563945741 + 0.2757342561*addquant);
      cp2=probnorm( 1.2012019494 + 0.2757342561*addquant);
      cp1=1;
      /* individual probabilities */
      ip9=cp9-0;
      ip8=cp8-cp9;
      ip7=cp7-cp8;
      ip6=cp6-cp7;
      ip5=cp5-cp6;
      ip4=cp4-cp5;
      ip3=cp3-cp4;
      ip2=cp2-cp3;
      ip1=cp1-cp2;
      output;
   end;
run;
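Typing the estimates by hand invites transcription and rounding errors. As an alternative sketch, the estimates can be read directly from the PE data set at full precision. This version assumes that the intercept rows of PE appear in ordered-value order, as in the listing above; the data set name RECOMP2 and the array names are hypothetical.

```sas
data recomp2(keep=addquant cp: ip:);
   array a[8] _temporary_;   /* intercepts, in ordered-value order */
   array cumul[9] cp9 cp8 cp7 cp6 cp5 cp4 cp3 cp2 cp1;
   array indiv[9] ip9 ip8 ip7 ip6 ip5 ip4 ip3 ip2 ip1;

   /* read every row of PE once and store the estimates */
   n = 0;
   do until (last);
      set pe end=last;
      if upcase(variable) = 'INTERCEPT' then do;
         n = n + 1;
         a[n] = estimate;
      end;
      else if upcase(variable) = 'ADDQUANT' then beta = estimate;
   end;

   do addquant = 1 to 4;
      /* cumulative probabilities for ordered values 1 through 8 */
      do i = 1 to 8;
         cumul[i] = probnorm(a[i] + beta*addquant);
      end;
      cumul[9] = 1;           /* last ordered value */

      /* individual probabilities by differencing */
      indiv[1] = cumul[1];
      do i = 2 to 9;
         indiv[i] = cumul[i] - cumul[i-1];
      end;
      output;
   end;
   stop;
run;
```

The arrays are declared in ordered-value order (cp9 first), so element i always corresponds to the ith row of the Response Profile table.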
The following tables show that the previous program reproduces the values from the PREDPROBS= option in PROC LOGISTIC. First, the cumulative values that are produced by the PREDPROBS=(C) option are displayed, followed by the cumulative values that are computed and stored in the data set RECOMP:
proc print data=out noobs;
   where y = 1;
   var addquant cp:;
run;
| addquant | CP_9 | CP_8 | CP_7 | CP_6 | CP_5 | CP_4 | CP_3 | CP_2 | CP_1 |
| 1 | 0.01846 | 0.08510 | 0.22415 | 0.34311 | 0.55184 | 0.71580 | 0.84899 | 0.93015 | 1 |
| 2 | 0.03508 | 0.13658 | 0.31472 | 0.44897 | 0.65765 | 0.80126 | 0.90454 | 0.96017 | 1 |
| 3 | 0.06238 | 0.20608 | 0.41809 | 0.55862 | 0.75231 | 0.86904 | 0.94336 | 0.97874 | 1 |
| 4 | 0.10395 | 0.29310 | 0.52748 | 0.66393 | 0.83085 | 0.91888 | 0.96851 | 0.98939 | 1 |
proc print data=recomp noobs;
   var addquant cp:;
run;
| addquant | cp9 | cp8 | cp7 | cp6 | cp5 | cp4 | cp3 | cp2 | cp1 |
| 1 | 0.01846 | 0.08510 | 0.22415 | 0.34311 | 0.55184 | 0.71580 | 0.84899 | 0.93015 | 1 |
| 2 | 0.03508 | 0.13658 | 0.31472 | 0.44897 | 0.65765 | 0.80126 | 0.90454 | 0.96017 | 1 |
| 3 | 0.06238 | 0.20608 | 0.41809 | 0.55862 | 0.75231 | 0.86904 | 0.94336 | 0.97874 | 1 |
| 4 | 0.10395 | 0.29310 | 0.52748 | 0.66393 | 0.83085 | 0.91888 | 0.96851 | 0.98939 | 1 |
Next, the individual values that are produced by the PREDPROBS=(I) option are displayed, followed by the individual values that are computed and stored in the data set RECOMP:
proc print data=out noobs;
   where y = 1;
   var addquant ip:;
run;
| addquant | IP_9 | IP_8 | IP_7 | IP_6 | IP_5 | IP_4 | IP_3 | IP_2 | IP_1 |
| 1 | 0.01846 | 0.06664 | 0.13905 | 0.11896 | 0.20874 | 0.16395 | 0.13320 | 0.081160 | 0.069846 |
| 2 | 0.03508 | 0.10149 | 0.17814 | 0.13425 | 0.20868 | 0.14361 | 0.10328 | 0.055631 | 0.039829 |
| 3 | 0.06238 | 0.14370 | 0.21201 | 0.14053 | 0.19369 | 0.11673 | 0.07432 | 0.035383 | 0.021259 |
| 4 | 0.10395 | 0.18915 | 0.23439 | 0.13644 | 0.16692 | 0.08803 | 0.04963 | 0.020883 | 0.010607 |
proc print data=recomp noobs;
   var addquant ip:;
run;
| addquant | ip9 | ip8 | ip7 | ip6 | ip5 | ip4 | ip3 | ip2 | ip1 |
| 1 | 0.01846 | 0.06664 | 0.13905 | 0.11896 | 0.20874 | 0.16395 | 0.13320 | 0.081160 | 0.069846 |
| 2 | 0.03508 | 0.10149 | 0.17814 | 0.13425 | 0.20868 | 0.14361 | 0.10328 | 0.055631 | 0.039829 |
| 3 | 0.06238 | 0.14370 | 0.21201 | 0.14053 | 0.19369 | 0.11673 | 0.07432 | 0.035383 | 0.021259 |
| 4 | 0.10395 | 0.18915 | 0.23439 | 0.13644 | 0.16692 | 0.08803 | 0.04963 | 0.020883 | 0.010607 |
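Rather than comparing the printed tables by eye, the agreement can also be checked numerically. The following is a minimal sketch, assuming the OUT and RECOMP data sets created above; the data set name CHECK and the variables TOTAL and MAXDIFF are hypothetical.

```sas
data check(keep=addquant total maxdiff);
   /* one row per addquant from each data set */
   merge out(where=(y=1) keep=addquant y ip_:)
         recomp(keep=addquant ip:);
   by addquant;

   array p_out[9] ip_1-ip_9;   /* from PREDPROBS=(I) */
   array p_new[9] ip1-ip9;     /* recomputed by hand */

   /* individual probabilities should sum to 1 */
   total = sum(of p_new[*]);

   /* largest absolute discrepancy across the nine levels */
   maxdiff = 0;
   do j = 1 to 9;
      maxdiff = max(maxdiff, abs(p_out[j] - p_new[j]));
   end;
run;

proc print data=check noobs;
run;
```

TOTAL should equal 1 for every row, and MAXDIFF should be very small, limited mainly by the ten decimal places used for the hand-entered estimates.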
| Product Family | Product | System | SAS Release Reported | SAS Release Fixed* |
| SAS System | SAS/STAT | All | n/a | |
| Type: | Usage Note |
| Priority: | low |
| Topic: | Analytics ==> Survey Sampling and Analysis; SAS Reference ==> Procedures ==> SURVEYLOGISTIC; Analytics ==> Categorical Data Analysis; SAS Reference ==> Procedures ==> LOGISTIC |
| Date Modified: | 2019-05-02 14:31:38 |
| Date Created: | 2002-12-16 10:56:37 |


