This section illustrates some of the basic features of the GEE procedure by analyzing longitudinal data from Stokes, Davis, and Koch (2012).
In this study, researchers followed 25 children at ages 8, 9, 10, and 11 years to investigate the health effects of air pollution on children. The binary response is the wheezing status of each child at each of the four ages. The explanatory variables are age, city, and a passive smoking index (with values 0, 1, and 2) that represents the degree of smoking in the home. The responses for an individual child are assumed to be equally correlated, implying an exchangeable correlation structure.
The following statements create the data set Children:
    data Children;
       input ID City$ @@;
       do i=1 to 4;
          input Age Smoke Symptom @@;
          output;
       end;
       datalines;
    1 steelcity 8 0 1 9 0 1 10 0 1 11 0 0
    2 steelcity 8 2 1 9 2 1 10 2 1 11 1 0
    3 steelcity 8 2 1 9 2 0 10 1 0 11 0 0
    4 greenhills 8 0 0 9 1 1 10 1 1 11 0 0
    5 steelcity 8 0 0 9 1 0 10 1 0 11 1 0
    6 greenhills 8 0 1 9 0 0 10 0 0 11 0 1
    7 steelcity 8 1 1 9 1 1 10 0 1 11 0 0
    8 greenhills 8 1 0 9 1 0 10 1 0 11 2 0
    9 greenhills 8 2 1 9 2 0 10 1 1 11 1 0
    10 steelcity 8 0 0 9 0 0 10 0 0 11 1 0
    11 steelcity 8 1 1 9 0 0 10 0 0 11 0 1
    12 greenhills 8 0 0 9 0 0 10 0 0 11 0 0
    13 steelcity 8 2 1 9 2 1 10 1 0 11 0 1
    14 greenhills 8 0 1 9 0 1 10 0 0 11 0 0
    15 steelcity 8 2 0 9 0 0 10 0 0 11 2 1
    16 greenhills 8 1 0 9 1 0 10 0 0 11 1 0
    17 greenhills 8 0 0 9 0 1 10 0 1 11 1 1
    18 steelcity 8 1 1 9 2 1 10 0 0 11 1 0
    19 steelcity 8 2 1 9 1 0 10 0 1 11 0 0
    20 greenhills 8 0 0 9 0 1 10 0 1 11 0 0
    21 steelcity 8 1 0 9 1 0 10 1 0 11 2 1
    22 greenhills 8 0 1 9 0 1 10 0 0 11 0 0
    23 steelcity 8 1 1 9 1 0 10 0 1 11 0 0
    24 greenhills 8 1 0 9 1 1 10 1 1 11 2 1
    25 greenhills 8 0 1 9 0 0 10 0 0 11 0 0
    ;
The following statements fit the model by the GEE method:
    proc gee data=Children descending;
       class ID City;
       model Symptom = City Age Smoke / dist=bin link=logit;
       repeated subject=ID / type=exch covb corrw;
    run;
Both the MODEL statement and the REPEATED statement are required.
The DIST=BIN and LINK=LOGIT options in the MODEL statement request a logistic regression with the variable Symptom as the response and City, Age, and Smoke as explanatory variables.
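Written out, the model that these options request can be sketched as follows. The symbols $\pi_{ij}$ and $\beta_0, \dots, \beta_3$ and the indicator notation are introduced here for illustration only. The sketch assumes that the DESCENDING option makes Symptom = 1 the modeled event and that steelcity is the reference level of City, as the zero estimate for that level in the parameter estimates table suggests.

```latex
% i indexes the 25 children, j indexes the four measurement ages
\[
\operatorname{logit}(\pi_{ij})
  = \log\frac{\pi_{ij}}{1-\pi_{ij}}
  = \beta_0
    + \beta_1\,\mathbf{1}[\mathrm{City}_i = \text{greenhills}]
    + \beta_2\,\mathrm{Age}_{ij}
    + \beta_3\,\mathrm{Smoke}_{ij},
\qquad
\pi_{ij} = \Pr(\mathrm{Symptom}_{ij} = 1)
\]
```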
The REPEATED statement specifies the correlation structure and requests various tables in the output. The SUBJECT=ID option requests that individual subjects be identified in the input data set by the variable ID, which must be listed in the CLASS statement. Measurements of individual subjects at ages 8, 9, 10, and 11 are in the proper order in the data set, so the WITHIN= option is not required. The TYPE=EXCH option specifies an exchangeable working correlation structure, the COVB option requests the parameter estimate covariance matrix, and the CORRW option requests the working correlation matrix.
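For reference, an exchangeable working correlation structure assumes a single common correlation between any two measurements on the same subject. With four measurements per child, it can be sketched as the following matrix. The symbols $Y_{ij}$, $\alpha$, and $R(\alpha)$ are notation assumed here for illustration.

```latex
% Exchangeable structure: one common correlation alpha for every
% pair of distinct measurements j, k on the same subject i
\[
\operatorname{Corr}(Y_{ij}, Y_{ik}) =
\begin{cases}
1,      & j = k \\
\alpha, & j \neq k
\end{cases}
\qquad
R(\alpha) =
\begin{pmatrix}
1      & \alpha & \alpha & \alpha \\
\alpha & 1      & \alpha & \alpha \\
\alpha & \alpha & 1      & \alpha \\
\alpha & \alpha & \alpha & 1
\end{pmatrix}
\]
```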
Figure 43.1 shows the "Model Information" table, which provides information about the specified logistic regression model and the input data set.
Figure 43.1: Model Information
Figure 43.2 displays general information about the GEE analysis. Each subject has four measurements.
Figure 43.2: GEE Model Information
Figure 43.3 displays the model-based and empirical covariance matrices of the parameter estimates.
Figure 43.3: Covariance Matrices of Parameter Estimates
The exchangeable working correlation matrix is displayed in Figure 43.4.
Figure 43.4: Working Correlation Matrix
The parameter estimates table, shown in Figure 43.5, contains parameter estimates, standard errors, confidence intervals, Z scores, and p-values for the parameter estimates. Empirical standard error estimates are used in this table. You can create a table that uses model-based standard errors by specifying the MODELSE option in the REPEATED statement. The results indicate that smoking exposure is significant with a p-value of 0.0211, Age is marginally influential with a p-value of 0.0893, and City shows no significant effect on wheezing (p-value of 0.9387). The parameter estimate for Age is -0.3201, which indicates that the estimated odds of wheezing are multiplied by exp(-0.3201) ≈ 0.726 for each one-year increase in age.
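The odds ratio interpretation follows from exponentiating a logistic regression coefficient. As a worked sketch that uses only the estimates and confidence limits reported in Figure 43.5 (the $\mathrm{OR}$ notation is introduced here for illustration, and the values are rounded to three decimals):

```latex
% Odds ratio per one-unit increase = exp(coefficient);
% confidence limits are the exponentiated limits of the coefficient
\[
\mathrm{OR}_{\mathrm{Age}} = e^{-0.3201} \approx 0.726,
\qquad
\left(e^{-0.6894},\; e^{0.0492}\right) \approx (0.502,\; 1.050)
\]
\[
\mathrm{OR}_{\mathrm{Smoke}} = e^{0.6506} \approx 1.917,
\qquad
\left(e^{0.0978},\; e^{1.2035}\right) \approx (1.103,\; 3.332)
\]
```

The interval for Age includes 1, which is consistent with its p-value of 0.0893. The interval for Smoke excludes 1.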
Figure 43.5: GEE Parameter Estimates Table
Parameter Estimates for Response Model with Empirical Standard Error Estimates

| Parameter | | Estimate | Standard Error | 95% Confidence Limit (Lower) | 95% Confidence Limit (Upper) | Z | Pr > \|Z\| |
|---|---|---|---|---|---|---|---|
| Intercept | | 2.2615 | 2.0243 | -1.7060 | 6.2290 | 1.12 | 0.2639 |
| City | greenhil | 0.0418 | 0.5435 | -1.0234 | 1.1070 | 0.08 | 0.9387 |
| City | steelcit | 0.0000 | 0.0000 | 0.0000 | 0.0000 | . | . |
| Age | | -0.3201 | 0.1884 | -0.6894 | 0.0492 | -1.70 | 0.0893 |
| Smoke | | 0.6506 | 0.2821 | 0.0978 | 1.2035 | 2.31 | 0.0211 |
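To request the table that uses model-based standard errors, a minimal sketch is to rerun the statements from this section with the MODELSE option added to the REPEATED statement. Dropping COVB and CORRW here is only to shorten the output and is a choice made for this illustration.

```sas
proc gee data=Children descending;
   class ID City;
   model Symptom = City Age Smoke / dist=bin link=logit;
   /* MODELSE requests a parameter estimates table that uses
      model-based standard errors */
   repeated subject=ID / type=exch modelse;
run;
```

Comparing the two sets of standard errors is one informal way to see how much the inference depends on the assumed working correlation structure.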
Goodness-of-fit criteria for the model are displayed in Figure 43.6. For more information about the quasi-likelihood information criterion (QIC), see the section Quasi-likelihood Information Criterion.