The GEE Procedure

Getting Started

This section illustrates some of the basic features of the GEE procedure by analyzing longitudinal data from Stokes, Davis, and Koch (2012).

In this study, researchers followed 25 children at ages 8, 9, 10, and 11 years to investigate the health effects of air pollution on children. The binary response is each child's wheezing status at each of the four ages. The explanatory variables are age, city, and a passive smoking index (with values 0, 1, and 2) that represents the degree of smoking in the home. The responses for an individual child are assumed to be equally correlated, which implies an exchangeable correlation structure.

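The following statements create the data set Children. Because City is read with the default character length of 8, its values are stored as steelcit and greenhil, which is how they appear in the output: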

data Children;
   input ID City$ @@;
   do i=1 to 4;
      input Age Smoke Symptom @@;
      output;
   end;
   datalines;
 1 steelcity  8 0 1  9 0 1  10 0 1  11 0 0  
 2 steelcity  8 2 1  9 2 1  10 2 1  11 1 0
 3 steelcity  8 2 1  9 2 0  10 1 0  11 0 0
 4 greenhills 8 0 0  9 1 1  10 1 1  11 0 0
 5 steelcity  8 0 0  9 1 0  10 1 0  11 1 0
 6 greenhills 8 0 1  9 0 0  10 0 0  11 0 1
 7 steelcity  8 1 1  9 1 1  10 0 1  11 0 0
 8 greenhills 8 1 0  9 1 0  10 1 0  11 2 0
 9 greenhills 8 2 1  9 2 0  10 1 1  11 1 0
10 steelcity  8 0 0  9 0 0  10 0 0  11 1 0
11 steelcity  8 1 1  9 0 0  10 0 0  11 0 1
12 greenhills 8 0 0  9 0 0  10 0 0  11 0 0
13 steelcity  8 2 1  9 2 1  10 1 0  11 0 1
14 greenhills 8 0 1  9 0 1  10 0 0  11 0 0
15 steelcity  8 2 0  9 0 0  10 0 0  11 2 1
16 greenhills 8 1 0  9 1 0  10 0 0  11 1 0
17 greenhills 8 0 0  9 0 1  10 0 1  11 1 1  
18 steelcity  8 1 1  9 2 1  10 0 0  11 1 0
19 steelcity  8 2 1  9 1 0  10 0 1  11 0 0
20 greenhills 8 0 0  9 0 1  10 0 1  11 0 0
21 steelcity  8 1 0  9 1 0  10 1 0  11 2 1
22 greenhills 8 0 1  9 0 1  10 0 0  11 0 0
23 steelcity  8 1 1  9 1 0  10 0 1  11 0 0
24 greenhills 8 1 0  9 1 1  10 1 1  11 2 1
25 greenhills 8 0 1  9 0 0  10 0 0  11 0 0
;  

The following statements fit the model by the GEE method:


proc gee data=Children descending;
   class ID City;
   model Symptom = City Age Smoke / dist=bin link=logit;
   repeated subject=ID / type=exch covb corrw;
run;

Both the MODEL statement and the REPEATED statement are required.

The DIST=BIN and LINK=LOGIT options in the MODEL statement request a logistic regression with the variable Symptom as the response and City, Age, and Smoke as explanatory variables.
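
Written out, the requested model can be sketched as follows. The symbols $\pi_{ij}$ and $\beta_0, \ldots, \beta_3$ are notation introduced here for illustration, and the sketch assumes that the DESCENDING option makes Symptom=1 the modeled event:

$$
\operatorname{logit}(\pi_{ij}) = \log\frac{\pi_{ij}}{1-\pi_{ij}} = \beta_0 + \beta_1\,\mathrm{City}_{i} + \beta_2\,\mathrm{Age}_{ij} + \beta_3\,\mathrm{Smoke}_{ij}
$$

Here $\pi_{ij} = \Pr(\mathrm{Symptom}_{ij} = 1)$ for child $i$ at measurement $j$, and $\mathrm{City}_i$ is an indicator for the non-reference level of City.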

The REPEATED statement specifies the correlation structure and requests various tables in the output. The SUBJECT=ID option requests that individual subjects be identified in the input data set by the variable ID, which must be listed in the CLASS statement. Measurements of individual subjects at ages 8, 9, 10, and 11 are in the proper order in the data set, so the WITHIN= option is not required. The TYPE=EXCH option specifies an exchangeable working correlation structure, the COVB option requests the parameter estimate covariance matrix, and the CORRW option requests the working correlation matrix.
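
If the measurements were not already in order within each child, the WITHIN= option could supply the ordering. The following is a minimal sketch and not part of the original example. The data set Children2 and the variable Visit are hypothetical names. The sketch assumes that the WITHIN= variable must be listed in the CLASS statement, so it uses a copy of Age and leaves Age itself continuous in the model:

```sas
/* Hypothetical copy of the data with a separate ordering variable */
data Children2;
   set Children;
   Visit = Age;   /* used only to order measurements within a child */
run;

proc gee data=Children2 descending;
   class ID City Visit;
   model Symptom = City Age Smoke / dist=bin link=logit;
   repeated subject=ID / within=Visit type=exch covb corrw;
run;
```

Because an exchangeable structure treats every pair of measurements alike, the ordering is likely to matter more for working correlation structures that depend on the order of the measurements.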

Figure 43.1 shows the "Model Information" table, which provides information about the specified logistic regression model and the input data set.

Figure 43.1: Model Information

The GEE Procedure

Model Information
Data Set WORK.CHILDREN
Distribution Binomial
Link Function Logit
Dependent Variable Symptom



Figure 43.2 displays general information about the GEE analysis. Each subject has four measurements.

Figure 43.2: GEE Model Information

GEE Model Information
Correlation Structure Exchangeable
Subject Effect ID (25 levels)
Number of Clusters 25
Correlation Matrix Dimension 4
Maximum Cluster Size 4
Minimum Cluster Size 4



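Figure 43.3 displays the model-based and empirical covariance matrices of the parameter estimates. Prm3, which corresponds to the reference level of City, is omitted because its estimate is fixed at 0.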

Figure 43.3: Covariance Matrices of Parameter Estimates

Covariance Matrix (Model-Based)
  Prm1 Prm2 Prm4 Prm5
Prm1 3.26069 -0.16313 -0.32274 -0.12257
Prm2 -0.16313 0.24015 0.002520 0.03422
Prm4 -0.32274 0.002520 0.03379 0.004471
Prm5 -0.12257 0.03422 0.004471 0.09533

Covariance Matrix (Empirical)
  Prm1 Prm2 Prm4 Prm5
Prm1 4.09770 -0.55261 -0.37280 -0.29397
Prm2 -0.55261 0.29538 0.03719 0.09143
Prm4 -0.37280 0.03719 0.03550 0.02064
Prm5 -0.29397 0.09143 0.02064 0.07957
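
The empirical standard errors reported in the parameter estimates table can be recovered as the square roots of the diagonal of the empirical covariance matrix. As a worked check, assuming that Prm1, Prm2, Prm4, and Prm5 correspond to Intercept, City (greenhil), Age, and Smoke, in that order:

$$
\sqrt{4.09770} \approx 2.0243, \quad
\sqrt{0.29538} \approx 0.5435, \quad
\sqrt{0.03550} \approx 0.1884, \quad
\sqrt{0.07957} \approx 0.2821
$$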



The exchangeable working correlation matrix is displayed in Figure 43.4.

Figure 43.4: Working Correlation Matrix

Working Correlation Matrix
  Obs 1 Obs 2 Obs 3 Obs 4
Obs 1 1.0000 0.0883 0.0883 0.0883
Obs 2 0.0883 1.0000 0.0883 0.0883
Obs 3 0.0883 0.0883 1.0000 0.0883
Obs 4 0.0883 0.0883 0.0883 1.0000
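
In symbols, an exchangeable structure assumes one common correlation between any two measurements on the same child. With $\alpha$ as notation introduced here for that common correlation, and $Y_{ij}$ for the response of child $i$ at measurement $j$, the matrix in Figure 43.4 corresponds to:

$$
\operatorname{Corr}(Y_{ij}, Y_{ik}) =
\begin{cases}
1, & j = k \\
\alpha, & j \ne k
\end{cases}
\qquad \hat{\alpha} = 0.0883
$$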



The parameter estimates table, shown in Figure 43.5, contains parameter estimates, standard errors, confidence intervals, Z scores, and p-values. Empirical standard error estimates are used in this table. You can create a table that uses model-based standard errors by specifying the MODELSE option in the REPEATED statement. The results indicate that smoking exposure is significant (p-value of 0.0211), Age is marginally influential (p-value of 0.0893), and City shows no significant association with wheezing (p-value of 0.9387). The parameter estimate for Age is -0.3201. Because Age enters the model as a continuous variable, this estimate indicates that each additional year of age multiplies the estimated odds of wheezing by $e^{-0.3201} = 0.726$.
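
Exponentiating applies equally to the other estimates and to the confidence limits in Figure 43.5. As a worked sketch (the rounded values are computed here and are not part of the procedure's output):

$$
\mathrm{OR}_{\mathrm{Age}} = e^{-0.3201} \approx 0.726, \qquad
95\%\ \text{limits: } \left(e^{-0.6894},\; e^{0.0492}\right) \approx (0.502,\; 1.050)
$$

$$
\mathrm{OR}_{\mathrm{Smoke}} = e^{0.6506} \approx 1.917, \qquad
95\%\ \text{limits: } \left(e^{0.0978},\; e^{1.2035}\right) \approx (1.103,\; 3.332)
$$

The interval for Age contains 1 and the interval for Smoke does not, which agrees with the p-values in the table.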

Figure 43.5: GEE Parameter Estimates Table

Parameter Estimates for Response Model
with Empirical Standard Error Estimates
Parameter   Estimate Standard Error 95% Confidence Limits Z Pr > |Z|
Intercept   2.2615 2.0243 -1.7060 6.2290 1.12 0.2639
City greenhil 0.0418 0.5435 -1.0234 1.1070 0.08 0.9387
City steelcit 0.0000 0.0000 0.0000 0.0000 . .
Age   -0.3201 0.1884 -0.6894 0.0492 -1.70 0.0893
Smoke   0.6506 0.2821 0.0978 1.2035 2.31 0.0211
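
A companion table based on model-based standard errors could be requested as in the following sketch. It adds the MODELSE option to the REPEATED statement of the original example and changes nothing else:

```sas
proc gee data=Children descending;
   class ID City;
   model Symptom = City Age Smoke / dist=bin link=logit;
   /* MODELSE requests a parameter estimates table that uses
      model-based standard errors */
   repeated subject=ID / type=exch covb corrw modelse;
run;
```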



Goodness-of-fit criteria for the model are displayed in Figure 43.6. For more information about the quasi-likelihood information criterion (QIC), see the section Quasi-likelihood Information Criterion.

Figure 43.6: Model Fit Criteria

GEE Fit Criteria
QIC 137.1373
QICu 136.2173
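
As a rough sketch of how the two criteria relate, assume the usual definitions, in which $Q$ is the quasi-likelihood evaluated under the independence working correlation, $\hat{\Omega}_I$ is the inverse of the model-based covariance estimate under independence, $\hat{V}_R$ is the empirical covariance estimate, and $p$ is the number of regression parameters. This notation is introduced here and should be checked against the referenced section:

$$
\mathrm{QIC} = -2Q + 2\,\operatorname{trace}\!\left(\hat{\Omega}_I \hat{V}_R\right), \qquad
\mathrm{QIC}_u = -2Q + 2p
$$

Under these definitions, and assuming $p = 4$ for this model (Intercept, City, Age, and Smoke), the values in Figure 43.6 imply:

$$
-2Q = 136.2173 - 2(4) = 128.2173, \qquad
\operatorname{trace}\!\left(\hat{\Omega}_I \hat{V}_R\right) = \frac{137.1373 - 128.2173}{2} = 4.46
$$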