In Release 6.12, the GENMOD procedure in SAS/STAT Software introduced support for generalized estimating equations (GEEs). The GEE methodology, introduced by Liang and Zeger (1986), provides a method of analyzing correlated data that otherwise could be modeled as a generalized linear model. GEEs have become an important strategy in the analysis of correlated data. Such data may arise from longitudinal studies, in which subjects are measured at different points in time, or from clustering, in which measurements are taken on subjects who share a common characteristic, such as belonging to the same litter.
The GEE analysis is implemented with a REPEATED statement in which you specify clustering information and the working correlation matrix. The generalized linear model estimates are used as the starting values. Both model-based and empirical standard errors of the parameter estimates are produced. Many correlation structures are available, including autoregressive(1), exchangeable, independent, m-dependent, and unstructured. In addition, you can input your own correlation structures.
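For orientation, the working correlation structures named above can be written out as follows. This is a sketch in standard GEE notation, which the original text does not define: $Y_{ij}$ denotes the $j$-th measurement in cluster $i$, and $\alpha$ (with subscripts where needed) denotes the working correlation parameters.

```latex
\operatorname{Corr}(Y_{ij}, Y_{ik}) =
\begin{cases}
0 & \text{independent} \\
\alpha & \text{exchangeable} \\
\alpha^{|j-k|} & \text{autoregressive(1)} \\
\alpha_{|j-k|} \text{ if } |j-k| \le m,\ 0 \text{ otherwise} & \text{$m$-dependent} \\
\alpha_{jk} & \text{unstructured}
\end{cases}
\qquad (j \ne k)
```

In every case the diagonal entries equal 1. The exchangeable structure uses a single parameter regardless of cluster size, whereas the unstructured form needs one parameter for each pair of measurement occasions.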
Version 8
In Version 8, the GEE capabilities have been enhanced in several ways, including:
- type III tests for model effects
- CONTRAST, LSMEANS, and ESTIMATE statements
- alternating logistic regression estimation
- models for ordinal data
The proportional odds model is a popular method for GEE analysis of ordinal data and is based on modeling cumulative logit functions. The GENMOD procedure also models cumulative probits and cumulative complementary log-log functions.
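A proportional odds GEE might be requested along the following lines. This is a minimal sketch, not taken from the original article: the data set `ratings` and the variables `score`, `trt`, `visit`, and `id` are hypothetical, and TYPE=IND is used on the assumption that the independence working correlation is the structure supported for multinomial responses.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc genmod data=ratings;
   class id trt;
   /* ordinal response: multinomial distribution with a cumulative logit link */
   model score = trt visit / dist=multinomial link=cumlogit;
   /* independence working correlation (assumed here for the multinomial case) */
   repeated subject=id / type=ind;
run;
```

Replacing LINK=CUMLOGIT with LINK=CUMPROBIT or LINK=CUMCLL would request the cumulative probit or cumulative complementary log-log model instead.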
Example
A study on the effects of pollution on children produced the following data. The binary response is whether children exhibited symptoms during the period of study at ages 8, 9, 10, and 11. A logistic regression is fit to the data with explanatory variables age, city of residence, and a passive smoking index. The correlations among the binary outcomes are modeled as exchangeable.
data children;
   /* each input line: id and city, then four (age, smoke, symptom) triples */
   input id city$ @@;
   do i=1 to 4;
      input age smoke symptom @@;
      output;   /* one observation per child per age */
   end;
   datalines;
1 steelcity 8 0 1 9 0 1 10 0 1 11 0 0
2 steelcity 8 2 1 9 2 1 10 2 1 11 1 0
3 steelcity 8 2 1 9 2 0 10 1 0 11 0 0
4 greenhills 8 0 0 9 1 1 10 1 1 11 0 0
. . .
;
run;
proc genmod data=children;
   class id city smoke;
   model symptom = city age smoke / dist=bin type3;
   repeated subject=id / type=exch covb corrw;
   /* coefficients 1 -1 0 compare the first and second levels of smoke */
   contrast 'Smoke=0 vs Smoke=1' smoke 1 -1 0;
run;
The REPEATED statement requests a GEE analysis. The SUBJECT=ID option identifies ID as the clustering variable, and the TYPE=EXCH option specifies an exchangeable correlation structure. The TYPE3 option in the MODEL statement requests Type 3 statistics for each effect in the model. The CONTRAST statement requests a test comparing the first and second levels of the SMOKE effect. Note that HTML formatted results are produced with the Output Delivery System in Version 8.
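A comparison of the first two levels of SMOKE uses coefficients that sum to zero (1 -1 0). If an estimate of the difference is wanted in addition to the test, an ESTIMATE statement with the same coefficients is one option. The following is a sketch under that assumption rather than part of the original example; the EXP option asks for the exponentiated estimate as well.

```sas
proc genmod data=children;
   class id city smoke;
   model symptom = city age smoke / dist=bin type3;
   repeated subject=id / type=exch;
   /* 1 -1 0: smoke level 0 minus smoke level 1; EXP adds the exponentiated estimate */
   estimate 'Smoke=0 vs Smoke=1' smoke 1 -1 0 / exp;
run;
```

The CONTRAST statement reports a test of the hypothesis that the difference is zero, while ESTIMATE reports the estimated difference on the logit scale along with its standard error and confidence limits.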
| GEE Model Information | |
| Correlation Structure | Exchangeable |
| Subject Effect | id (25 levels) |
| Number of Clusters | 25 |
| Correlation Matrix Dimension | 4 |
| Maximum Cluster Size | 4 |
| Minimum Cluster Size | 4 |
Analysis Of GEE Parameter Estimates (Empirical Standard Error Estimates)
| Parameter | Level | Estimate | Standard Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit | Z | Pr > |Z| |
| Intercept | | 4.2569 | 1.9577 | 0.4199 | 8.0938 | 2.17 | 0.0297 |
| city | greenhil | 0.0287 | 0.5365 | -1.0227 | 1.0802 | 0.05 | 0.9573 |
| city | steelcit | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
| age | | -0.3330 | 0.1937 | -0.7126 | 0.0467 | -1.72 | 0.0856 |
| smoke | 0 | -1.6781 | 0.6123 | -2.8783 | -0.4780 | -2.74 | 0.0061 |
| smoke | 1 | -1.7418 | 0.6588 | -3.0330 | -0.4507 | -2.64 | 0.0082 |
| smoke | 2 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
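The Z statistics and confidence limits in this table can be checked by hand, assuming the usual Wald construction in which Z is the estimate divided by its empirical standard error and the limits are the estimate plus or minus 1.96 standard errors. The symbols $\hat\beta$ and $\mathrm{SE}$ are notation introduced here for illustration. For the age effect:

```latex
Z = \frac{\hat\beta}{\mathrm{SE}(\hat\beta)} = \frac{-0.3330}{0.1937} \approx -1.72,
\qquad
\hat\beta \pm 1.96\,\mathrm{SE}(\hat\beta) = -0.3330 \pm 1.96 \times 0.1937
\approx (-0.7127,\ 0.0467)
```

These agree with the reported values up to rounding of the displayed estimate and standard error. Because the interval for age includes zero, the age effect is not significant at the 0.05 level, consistent with the reported p-value of 0.0856.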