Generalized Estimating Equations
This section illustrates the use of the REPEATED statement to fit a GEE model, using repeated measures data from the "Six Cities" study of the health effects of air pollution (Ware et al. 1984). The data analyzed are the 16 selected cases in Lipsitz et al. (1994). The binary response is the wheezing status of each of the 16 children at ages 9, 10, 11, and 12 years. The mean response is modeled with a logistic regression on the explanatory variables city of residence, age, and maternal smoking status at the particular age. The binary responses within each child are assumed to be equally correlated, implying an exchangeable correlation structure.
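In symbols, the model just described can be sketched as follows. The notation is introduced here for illustration and is not taken from the source: $\mu_{ij}$ is the modeled response probability for child $i$ at measurement $j$ (which response level the procedure treats as the event depends on its response ordering), $\mathrm{city}_i$ is an indicator for one of the two cities, and $\alpha$ is the common within-child correlation.

```latex
\begin{aligned}
\operatorname{logit}(\mu_{ij}) &= \beta_0 + \beta_1\,\mathrm{city}_i
   + \beta_2\,\mathrm{age}_{ij} + \beta_3\,\mathrm{smoke}_{ij},
   \qquad i = 1,\dots,16,\; j = 1,\dots,4, \\
\operatorname{Corr}(Y_{ij}, Y_{ik}) &= \alpha \quad \text{for } j \neq k .
\end{aligned}
```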
The data set and SAS statements that fit the model by the GEE method are as follows:
   data six;
      input case city$ @@;
      do i=1 to 4;
         input age smoke wheeze @@;
         output;
      end;
      datalines;
    1 portage   9 0 1  10 0 1  11 0 1  12 0 0
    2 kingston  9 1 1  10 2 1  11 2 0  12 2 0
    3 kingston  9 0 1  10 0 0  11 1 0  12 1 0
    4 portage   9 0 0  10 0 1  11 0 1  12 1 0
    5 kingston  9 0 0  10 1 0  11 1 0  12 1 0
    6 portage   9 0 0  10 1 0  11 1 0  12 1 0
    7 kingston  9 1 0  10 1 0  11 0 0  12 0 0
    8 portage   9 1 0  10 1 0  11 1 0  12 2 0
    9 portage   9 2 1  10 2 0  11 1 0  12 1 0
   10 kingston  9 0 0  10 0 0  11 0 0  12 1 0
   11 kingston  9 1 1  10 0 0  11 0 1  12 0 1
   12 portage   9 1 0  10 0 0  11 0 0  12 0 0
   13 kingston  9 1 0  10 0 1  11 1 1  12 1 1
   14 portage   9 1 0  10 2 0  11 1 0  12 2 1
   15 kingston  9 1 0  10 1 0  11 1 0  12 2 1
   16 portage   9 1 1  10 1 1  11 2 0  12 1 0
   ;
   proc genmod data=six;
      class case city;
      model wheeze = city age smoke / dist=bin;
      repeated subject=case / type=exch covb corrw;
   run;
The CLASS statement and the MODEL statement specify the model for the mean of the response variable wheeze as a logistic regression with city, age, and smoke as independent variables, just as for an ordinary logistic regression.
The REPEATED statement invokes the GEE method, specifies the correlation structure, and controls the displayed output from the GEE model. The option SUBJECT=CASE specifies that individual subjects be identified in the input data set by the variable case. The SUBJECT= variable case must be listed in the CLASS statement. Measurements on individual subjects at ages 9, 10, 11, and 12 are in the proper order in the data set, so the WITHINSUBJECT= option is not required. The TYPE=EXCH option specifies an exchangeable working correlation structure, the COVB option specifies that the parameter estimate covariance matrix be displayed, and the CORRW option specifies that the final working correlation be displayed.
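If the measurements were not already in order within each subject, the WITHINSUBJECT= option mentioned above could be used to supply the ordering. The following is a minimal sketch, not part of the original example. It assumes that the within-subject effect must be a CLASS variable, so it copies age into a separate classification variable and leaves age itself as a continuous covariate. The names six2 and visit are hypothetical.

```sas
/* Hypothetical copy of the data with a classification variable   */
/* (visit) that records the measurement occasion for each child.  */
data six2;
   set six;
   visit = age;   /* copy of age, used only to order the repeated measures */
run;

proc genmod data=six2;
   class case city visit;
   model wheeze = city age smoke / dist=bin;
   /* WITHINSUBJECT= identifies the position of each measurement  */
   /* within a subject, so the input order no longer matters.     */
   repeated subject=case / withinsubject=visit type=exch covb corrw;
run;
```

With the data as entered in this example, this sketch should describe the same model as the original statements, because the observations are already in the proper order.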
Initial parameter estimates for iterative fitting of the GEE model are computed as in an ordinary generalized linear model, as described previously. Results of the initial model fit displayed as part of the generated output are not shown here. Statistics for the initial model fit such as parameter estimates, standard errors, deviances, and Pearson chi-squares do not apply to the GEE model and are valid only for the initial model fit. The following figures display information that applies to the GEE model fit.
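As a point of comparison, the initial fit described above corresponds to an ordinary logistic regression that treats all 64 observations as independent. A minimal sketch, assuming the same data set six, is to run the same MODEL statement without a REPEATED statement:

```sas
/* Ordinary generalized linear model: no REPEATED statement, so   */
/* no within-child correlation is modeled. The variable case is   */
/* not needed in the CLASS statement here.                        */
proc genmod data=six;
   class city;
   model wheeze = city age smoke / dist=bin;
run;
```

The standard errors from this fit ignore the repeated measures design, which is why they should not be used to draw inferences about the GEE model.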
Figure 39.27 displays general information about the GEE model fit.
GEE Model Information | |
---|---|
Correlation Structure | Exchangeable |
Subject Effect | case (16 levels) |
Number of Clusters | 16 |
Correlation Matrix Dimension | 4 |
Maximum Cluster Size | 4 |
Minimum Cluster Size | 4 |
Figure 39.28 displays the parameter estimate covariance matrices specified by the COVB option. Both model-based and empirical covariances are produced.
Covariance Matrix (Model-Based) | ||||
---|---|---|---|---|
 | Prm1 | Prm2 | Prm4 | Prm5 |
Prm1 | 5.74947 | -0.22257 | -0.53472 | 0.01655 |
Prm2 | -0.22257 | 0.45478 | -0.002410 | 0.01876 |
Prm4 | -0.53472 | -0.002410 | 0.05300 | -0.01658 |
Prm5 | 0.01655 | 0.01876 | -0.01658 | 0.19104 |
Covariance Matrix (Empirical) | ||||
---|---|---|---|---|
 | Prm1 | Prm2 | Prm4 | Prm5 |
Prm1 | 9.33994 | -0.85104 | -0.83253 | -0.16534 |
Prm2 | -0.85104 | 0.47368 | 0.05736 | 0.04023 |
Prm4 | -0.83253 | 0.05736 | 0.07778 | -0.002364 |
Prm5 | -0.16534 | 0.04023 | -0.002364 | 0.13051 |
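The two covariance estimators can be summarized in standard GEE notation. This is a sketch under stated assumptions, and none of the symbols appear in the source: $\mathbf{D}_i = \partial \boldsymbol{\mu}_i / \partial \boldsymbol{\beta}$ is the derivative matrix for subject $i$, and $\mathbf{V}_i$ is the working covariance matrix implied by the working correlation structure.

```latex
\begin{aligned}
\boldsymbol{\Sigma}_m &= \mathbf{I}_0^{-1},
  & \mathbf{I}_0 &= \sum_{i=1}^{K} \mathbf{D}_i' \mathbf{V}_i^{-1} \mathbf{D}_i, \\
\boldsymbol{\Sigma}_e &= \mathbf{I}_0^{-1}\,\mathbf{I}_1\,\mathbf{I}_0^{-1},
  & \mathbf{I}_1 &= \sum_{i=1}^{K} \mathbf{D}_i' \mathbf{V}_i^{-1}
     (\mathbf{Y}_i - \boldsymbol{\mu}_i)(\mathbf{Y}_i - \boldsymbol{\mu}_i)'
     \mathbf{V}_i^{-1} \mathbf{D}_i .
\end{aligned}
```

The model-based estimator $\boldsymbol{\Sigma}_m$ relies on the working correlation being correctly specified, whereas the empirical ("sandwich") estimator $\boldsymbol{\Sigma}_e$ replaces the assumed covariance of $\mathbf{Y}_i$ with the observed residual cross-products. Here $K = 16$ is the number of clusters.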
The exchangeable working correlation matrix specified by the CORRW option is displayed in Figure 39.29.
Working Correlation Matrix | ||||
---|---|---|---|---|
 | Col1 | Col2 | Col3 | Col4 |
Row1 | 1.0000 | 0.1648 | 0.1648 | 0.1648 |
Row2 | 0.1648 | 1.0000 | 0.1648 | 0.1648 |
Row3 | 0.1648 | 0.1648 | 1.0000 | 0.1648 |
Row4 | 0.1648 | 0.1648 | 0.1648 | 1.0000 |
The parameter estimates table, displayed in Figure 39.30, contains parameter estimates, standard errors, confidence intervals, Z scores, and p-values for the parameter estimates. Empirical standard error estimates are used in this table. A table that displays model-based standard errors can be created by using the REPEATED statement option MODELSE.
Analysis Of GEE Parameter Estimates | |||||||
---|---|---|---|---|---|---|---|
Empirical Standard Error Estimates | |||||||
Parameter | Level | Estimate | Standard Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit | Z | Pr > \|Z\| |
Intercept | | -1.2751 | 3.0561 | -7.2650 | 4.7148 | -0.42 | 0.6765 |
city | kingston | -0.1223 | 0.6882 | -1.4713 | 1.2266 | -0.18 | 0.8589 |
city | portage | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
age | | 0.2036 | 0.2789 | -0.3431 | 0.7502 | 0.73 | 0.4655 |
smoke | | 0.0935 | 0.3613 | -0.6145 | 0.8016 | 0.26 | 0.7957 |
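The entries in this table can be cross-checked against the covariance matrices in Figure 39.28. The following DATA step is a hedged sketch, not part of the original example. It assumes that Prm1, Prm2, Prm4, and Prm5 correspond to the intercept, the kingston level of city, age, and smoke, respectively, and that the confidence limits and p-values are Wald-type quantities based on the normal distribution. The data set and variable names are hypothetical.

```sas
/* Rebuild standard errors, Z scores, Wald limits, and p-values   */
/* from the estimates in Figure 39.30 and the diagonal elements   */
/* of the empirical and model-based covariance matrices.          */
data gee_check;
   input parameter :$12. estimate var_emp var_model;
   se_emp   = sqrt(var_emp);     /* empirical standard error       */
   se_model = sqrt(var_model);   /* model-based standard error     */
   z        = estimate / se_emp;
   lower    = estimate - probit(0.975) * se_emp;
   upper    = estimate + probit(0.975) * se_emp;
   p_value  = 2 * (1 - probnorm(abs(z)));
   datalines;
Intercept  -1.2751  9.33994  5.74947
kingston   -0.1223  0.47368  0.45478
age         0.2036  0.07778  0.05300
smoke       0.0935  0.13051  0.19104
;

proc print data=gee_check noobs;
run;
```

Up to rounding of the displayed inputs, the columns se_emp, z, lower, upper, and p_value should agree with Figure 39.30. The column se_model gives the standard errors that a table based on the model-based covariance matrix would be expected to report.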