This example illustrates how you use the GEE procedure to analyze nominal multinomial data. A two-year study was conducted
to assess the impact of access to Section 8 housing as a means of providing independent housing to the severely mentally ill
homeless (Hurlbut, Wood, and Hough 1996). In this study, half of the 362 clients received Section 8 housing certificates. The assignment of Section 8 housing certificates
is recorded in the variable Sec; 0 indicates clients who did not receive a certificate, and 1 indicates clients who received a certificate.
Every six months during the study, research staff interviewed all 362 clients, who provided data about their living arrangements
in the previous 60 days. Clients’ living arrangements were also recorded during a baseline interview. The time of interviews
is recorded in the variable Time, whose value is 0, 6, 12, or 24 (the number of months since the study began). There were a total of 159 missed interviews.
The variable Housing records the living arrangement of a client and is coded as 0 (street living), 1 (community living), or 2 (independent living).
The following statements create the data set Housing:
    data Housing;
       input ID Housing Time Sec;
       datalines;
    1 1 0 1
    1 2 6 1
    1 2 12 1
    1 2 24 1
    2 1 0 1
    2 2 6 1

       ... more lines ...

    362 1 0 0
    362 1 6 0
    362 1 12 0
    362 1 24 0
    ;
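As a rough sanity check on the size of this long-format data set, the figures quoted above can be combined in a few lines of Python (used here only for arithmetic, outside SAS). This sketch assumes that each missed interview corresponds to an absent row rather than to a row with a missing Housing value; the text does not say which convention the data set uses.

```python
# Study design figures quoted in the text.
n_clients = 362
interview_months = [0, 6, 12, 24]  # possible values of Time
missed_interviews = 159

# One row per client per scheduled interview.
scheduled = n_clients * len(interview_months)

# Assumption: a missed interview is simply an absent row.
expected_rows = scheduled - missed_interviews

print(scheduled, expected_rows)
```

Under that assumption, the number of observations that PROC GEE reads should match `expected_rows`.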
The following SAS statements use PROC GEE to fit a model to nominal multinomial data:
    proc gee data=Housing;
       class ID Housing Time SEC;
       model Housing=Sec / dist=multinomial link=glogit;
       repeated subject=ID / within=Time;
    run;
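An ordinary GEE that has an independent working correlation structure is fit. The independent working correlation structure is the only option supported for data that have nominal multinomial responses. In the MODEL statement, you specify LINK=GLOGIT to indicate that the responses are nominal. In the generalized logit model, you model baseline category logits. By default, the GEE procedure chooses the last response category as the baseline category. If your nominal response has J categories, then the baseline logit for category j and subject i is

$$\log\left(\frac{\pi_{ij}}{\pi_{iJ}}\right) = \alpha_j + \mathbf{x}_i'\boldsymbol{\beta}_j, \quad j = 1, \ldots, J-1$$

and the corresponding response probability is

$$\pi_{ij} = \frac{\exp(\alpha_j + \mathbf{x}_i'\boldsymbol{\beta}_j)}{1 + \sum_{k=1}^{J-1} \exp(\alpha_k + \mathbf{x}_i'\boldsymbol{\beta}_k)}, \quad j = 1, \ldots, J-1$$

where $\pi_{ij}$ is the probability that the response of subject i falls in category j, $\mathbf{x}_i$ is the covariate vector, and $\alpha_j$ and $\boldsymbol{\beta}_j$ are the intercept and regression parameters for category j. The baseline probability $\pi_{iJ}$ has the same denominator and a numerator of 1.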
The results of fitting the model are displayed in Output 43.6.1.
Output 43.6.1: Results of Model Fitting
Parameter Estimates for Response Model with Empirical Standard Error Estimates

| Parameter | Level | Housing | Estimate | Standard Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit | Z | Pr > \|Z\| |
|---|---|---|---|---|---|---|---|---|
| Intercept | | 0 | -0.9532 | 0.1266 | -1.2013 | -0.7051 | -7.53 | <.0001 |
| Intercept | | 1 | -0.6562 | 0.1064 | -0.8647 | -0.4477 | -6.17 | <.0001 |
| Sec | 0 | 0 | 0.9226 | 0.1850 | 0.5599 | 1.2853 | 4.99 | <.0001 |
| Sec | 0 | 1 | 1.2645 | 0.1642 | 0.9426 | 1.5863 | 7.70 | <.0001 |
| Sec | 1 | 0 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
| Sec | 1 | 1 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
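The positive estimates for the classification variable Sec = 0 at each response category, Housing = 0 and 1, indicate that clients who did not receive a certificate have higher odds of street living or community living, relative to independent living, than clients who did. Equivalently, access to Section 8 housing is associated with an increased probability that a client will live independently.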
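To make this interpretation concrete, the following Python sketch (an arithmetic check outside SAS, not PROC GEE output) converts the estimates in Output 43.6.1 into odds ratios and fitted marginal probabilities. It assumes the usual baseline-category form with Housing = 2 as the baseline and Sec = 1 as the reference level; the names `housing_probs` and `odds_ratios` are illustrative only.

```python
import math

# Estimates copied from Output 43.6.1, keyed by Housing category.
# Housing = 2 (independent living) is the baseline category.
INTERCEPT = {0: -0.9532, 1: -0.6562}
SEC0_EFFECT = {0: 0.9226, 1: 1.2645}  # effect of Sec = 0 relative to Sec = 1


def housing_probs(sec):
    """Fitted probabilities of Housing = 0, 1, 2 for a given Sec level."""
    etas = [INTERCEPT[j] + (SEC0_EFFECT[j] if sec == 0 else 0.0) for j in (0, 1)]
    denom = 1.0 + sum(math.exp(eta) for eta in etas)
    probs = [math.exp(eta) / denom for eta in etas]
    probs.append(1.0 / denom)  # baseline category
    return probs


# Odds of each category versus independent living, no certificate vs. certificate.
odds_ratios = {j: math.exp(b) for j, b in SEC0_EFFECT.items()}

print({j: round(v, 2) for j, v in odds_ratios.items()})
for sec in (0, 1):
    print("Sec =", sec, [round(p, 3) for p in housing_probs(sec)])
```

Under these assumptions, the fitted probability of independent living is roughly 0.53 for clients with a certificate and roughly 0.26 for clients without one, which matches the direction of the effect described above.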
The model fit criteria are shown in Output 43.6.2.
For comparison, the following SAS statements treat the responses as ordinal and use PROC GEE to fit a marginal model by using an independent working correlation structure:
    proc gee data=Housing;
       class ID Housing Time SEC;
       model Housing=Sec / dist=multinomial;
       repeated subject=ID / within=Time;
    run;
The cumulative logit link function is the default option that is used to fit the model. Because the generalized logit link function is not specified, the responses are treated as ordinal multinomial data. The results for the model that is fit by treating the responses as ordinal are displayed in Output 43.6.3.
Output 43.6.3: Results of Model Fitting
Parameter Estimates for Response Model with Empirical Standard Error Estimates

| Parameter | Level | Estimate | Standard Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit | Z | Pr > \|Z\| |
|---|---|---|---|---|---|---|---|
| Intercept1 | | -1.6917 | 0.1242 | -1.9352 | -1.4481 | -13.62 | <.0001 |
| Intercept2 | | 0.0112 | 0.0960 | -0.1770 | 0.1994 | 0.12 | 0.9072 |
| Sec | 0 | 0.8224 | 0.1327 | 0.5624 | 1.0824 | 6.20 | <.0001 |
| Sec | 1 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
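
The ordinal estimates in Output 43.6.3 can be turned into fitted category probabilities in the same spirit. The Python sketch below is an arithmetic illustration outside SAS. It assumes that the cumulative logits are taken over the lower-ordered categories, that is, logit P(Housing <= j) = Intercept(j+1) + effect of Sec for j = 0, 1. This form is consistent with the increasing intercepts but is not stated in the text, so treat it as an assumption.

```python
import math


def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))


# Estimates copied from Output 43.6.3.
INTERCEPTS = [-1.6917, 0.0112]  # Intercept1, Intercept2
SEC0_EFFECT = 0.8224            # effect of Sec = 0 relative to Sec = 1


def ordinal_housing_probs(sec):
    """Fitted probabilities of Housing = 0, 1, 2 under the assumed cumulative logit form."""
    shift = SEC0_EFFECT if sec == 0 else 0.0
    # Assumed: logit P(Housing <= j) = INTERCEPTS[j] + shift, for j = 0, 1.
    cumulative = [expit(a + shift) for a in INTERCEPTS] + [1.0]
    return [
        cumulative[0],
        cumulative[1] - cumulative[0],
        cumulative[2] - cumulative[1],
    ]


for sec in (0, 1):
    print("Sec =", sec, [round(p, 3) for p in ordinal_housing_probs(sec)])
```

Because a single Sec coefficient shifts both cumulative logits by the same amount, the ordinal model is more constrained than the nominal model, which estimates a separate Sec effect for each non-baseline category.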
Treating the responses as ordinal results in a single parameter estimate that is related to the classification variable Sec. The QIC for the model that is fit by treating the responses as nominal (shown in Output 43.6.2) is 2675.21, whereas the QIC for the model that is fit by treating the responses as ordinal (shown in Output 43.6.4) is 2710.50. Because smaller QIC values are preferred, this comparison indicates a slightly better fit when the responses are treated as nominal.