The GEE Procedure

Example 43.2 Log-Linear Model for Count Data

The following example demonstrates how you can fit a GEE model to count data. The data are analyzed by Diggle, Liang, and Zeger (1994). The response is the number of epileptic seizures, which was measured at the end of each of eight two-week treatment periods over sixteen weeks. The first eight weeks were the baseline period (during which no treatment was given), and the second eight weeks were the treatment period, during which patients received either a placebo or the drug progabide. The question of scientific interest is whether progabide is effective in reducing the rate of epileptic seizures.

The following DATA step creates the data set Seizure:

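data Seizure;
   /* ID=patient, Count=seizures in the interval, Visit=0 (baseline) or 1-4, */
   /* Trt=1 for progabide and 0 for placebo, Weeks=length of the interval    */
   input ID Count Visit Trt Age Weeks;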
   datalines;
104 11 0 0 31 8 
104 5 1 0 31 2 
104 3 2 0 31 2 
104 3 3 0 31 2 
104 3 4 0 31 2 
106 11 0 0 30 8 

   ... more lines ...   

236 12 0 1 37 8 
236 1 1 1 37 2 
236 4 2 1 37 2 
236 3 3 1 37 2 
236 2 4 1 37 2 
;

The following DATA step creates a log time interval variable for use as an offset and an indicator variable for whether the observation is a baseline measurement or a visit measurement. Patient 207 is deleted as an outlier, as was done in the Diggle, Liang, and Zeger (1994) analysis:

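data Seizure;
   set Seizure;
   if ID ne 207;            /* subsetting IF: drop patient 207 (outlier) */
   if Visit = 0 then do;    /* baseline: one eight-week interval         */
      X1 = 0;
      Ltime = log(8);
   end;
   else do;                 /* visits 1-4: two-week intervals            */
      X1 = 1;
      Ltime = log(2);
   end;
run;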
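
One way to confirm that the recoding behaved as intended is to summarize the new variables by visit. The following statements are a minimal sketch and are not part of the original analysis; they assume only the variables that the preceding DATA step creates:

```sas
/* Illustrative check: X1 and Ltime should be constant within each visit */
proc means data=Seizure n min max;
   class Visit;
   var X1 Ltime Weeks;
run;
```

For Visit 0, the minimum and maximum of Ltime should both equal log(8) and X1 should be 0; for visits 1 through 4, Ltime should equal log(2) and X1 should be 1.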

Poisson regression is commonly used to model count data. In this example, the log-linear Poisson model is specified by $V(\mu )=\mu $ (the Poisson variance function) and a log link function,

\[ \log (E(Y_{ij}))=\beta _{0}+x_{i1}\beta _{1}+x_{i2}\beta _{2}+ x_{i1}x_{i2}\beta _{3} + \log (t_{ij}) \]

where

$Y_{ij}=$ number of epileptic seizures for patient i in interval j

$t_{ij}=$ length (in weeks) of interval j

$x_{i1}= \left\{  \begin{array}{l} 1: \mbox{ weeks 8--16 (treatment)} \\ 0: \mbox{ weeks 0--8 (baseline)} \end{array} \right. $

$x_{i2}= \left\{ \begin{array}{l} 1: \mbox{ progabide group } \\ 0: \mbox{ placebo group } \end{array} \right. $
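
Because the offset $\log (t_{ij})$ enters the linear predictor with its coefficient fixed at 1, moving it to the left side shows that the model is linear in the log of the seizure rate per week. As a brief worked step that uses only the symbols defined above:

\[ \log \left( \frac{E(Y_{ij})}{t_{ij}} \right) = \log (E(Y_{ij})) - \log (t_{ij}) = \beta _{0}+x_{i1}\beta _{1}+x_{i2}\beta _{2}+ x_{i1}x_{i2}\beta _{3} \]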

Because the visits represent repeated measurements, the responses from the same individual are correlated and inferences need to take this into account. In this example, the correlations between the counts are modeled with an unstructured working correlation matrix, in which each pair of measurements $j \neq k$ on the same patient has its own correlation $\alpha _{jk}$.

In this model, the regression parameters are interpreted in terms of the log seizure rate that is displayed in Table 43.14.

Table 43.14: Interpretation of Regression Parameters

Treatment    Visit       $\log (E(Y_{ij})/t_{ij})$
Placebo      Baseline    $\beta _{0}$
             1–4         $\beta _{0}+\beta _{1}$
Progabide    Baseline    $\beta _{0}+\beta _{2}$
             1–4         $\beta _{0}+\beta _{1}+\beta _{2}+\beta _{3}$

The difference between the log seizure rates in the pretreatment (baseline) period and the treatment periods is $\beta _{1}$ for the placebo group and $\beta _{1}+\beta _{3}$ for the progabide group. A value of $\beta _{3} < 0$ therefore indicates that the seizure rate changed more favorably from baseline in the progabide group than in the placebo group.
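
Exponentiating these differences expresses them on the rate scale. As a short worked step, write $\lambda $ for the seizure rate $E(Y_{ij})/t_{ij}$ (a symbol introduced here only for illustration). The ratio of the treatment-period rate to the baseline rate is $e^{\beta _{1}}$ for the placebo group and $e^{\beta _{1}+\beta _{3}}$ for the progabide group, so $e^{\beta _{3}}$ is the ratio of these two ratios:

\[ \frac{\lambda _{\mbox{progabide, visits}} / \lambda _{\mbox{progabide, baseline}}}{\lambda _{\mbox{placebo, visits}} / \lambda _{\mbox{placebo, baseline}}} = \frac{e^{\beta _{1}+\beta _{3}}}{e^{\beta _{1}}} = e^{\beta _{3}} \]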

The following statements perform the analysis:

proc gee data = Seizure;
   class ID Visit;
   model Count = X1 Trt X1*Trt / dist=poisson link=log offset= Ltime;
   repeated subject = ID / within = Visit type=unstr covb corrw;
run;

In the MODEL statement, Count is the response variable, and X1, Trt, and the interaction X1*Trt are the explanatory variables. You request Poisson regression with the DIST=POISSON and the LINK=LOG options. The offset variable is often used in Poisson regression to account for different exposures. In this case, the OFFSET= option specifies Ltime as the offset variable representing different time intervals.

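In the REPEATED statement, the SUBJECT= option indicates that the variable ID identifies the observations from a single cluster, the WITHIN= option specifies Visit as the variable that orders the measurements within each cluster, and the TYPE=UNSTR option specifies the unstructured working correlation structure. The COVB option requests the covariance matrices of the parameter estimates, and the CORRW option requests that the working correlation matrix be displayed.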
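
With five measurements per patient (the baseline and four visits), the unstructured working correlation matrix has a separate parameter for each of the $5 \times 4 / 2 = 10$ pairs of measurements. Schematically, with $\alpha _{jk}$ as illustrative notation for the correlation between measurements j and k of the same patient:

\[ R(\alpha ) = \left( \begin{array}{ccccc} 1 & \alpha _{12} & \alpha _{13} & \alpha _{14} & \alpha _{15} \\ \alpha _{12} & 1 & \alpha _{23} & \alpha _{24} & \alpha _{25} \\ \alpha _{13} & \alpha _{23} & 1 & \alpha _{34} & \alpha _{35} \\ \alpha _{14} & \alpha _{24} & \alpha _{34} & 1 & \alpha _{45} \\ \alpha _{15} & \alpha _{25} & \alpha _{35} & \alpha _{45} & 1 \end{array} \right) \]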

The "Model Information" table that is displayed in Output 43.2.1 provides information about the specified model and the input data set.

Output 43.2.1: Model Information

The GEE Procedure

Model Information
Data Set              WORK.SEIZURE
Distribution          Poisson
Link Function         Log
Dependent Variable    Count
Offset Variable       Ltime



Output 43.2.2 displays general information about the GEE model analysis.

Output 43.2.2: GEE Model Information

GEE Model Information
Correlation Structure           Unstructured
Within-Subject Effect           Visit (5 levels)
Subject Effect                  ID (58 levels)
Number of Clusters              58
Correlation Matrix Dimension    5
Maximum Cluster Size            5
Minimum Cluster Size            5



Output 43.2.3 displays the parameter estimate covariance matrices, which are requested by the COVB option. Both model-based and empirical covariances are produced.

Output 43.2.3: Covariance Matrices of Parameter Estimate

Covariance Matrix (Model-Based)
        Prm1         Prm2         Prm3         Prm4
Prm1    0.01210      0.004902     -0.01210     -0.004902
Prm2    0.004902     0.006660     -0.004902    -0.006660
Prm3    -0.01210     -0.004902    0.02461      0.01299
Prm4    -0.004902    -0.006660    0.01299      0.01852

Covariance Matrix (Empirical)
        Prm1         Prm2         Prm3         Prm4
Prm1    0.02597      -0.003069    -0.02597     0.003069
Prm2    -0.003069    0.008597     0.003069     -0.008597
Prm3    -0.02597     0.003069     0.03841      -0.006196
Prm4    0.003069     -0.008597    -0.006196    0.02237



The estimated unstructured working correlation matrix is displayed in Output 43.2.4. The off-diagonal estimates range from 0.4171 to 0.8111, which shows that counts from the same patient are noticeably correlated across visits.

Output 43.2.4: Working Correlation Matrix

Working Correlation Matrix
         Obs 1     Obs 2     Obs 3     Obs 4     Obs 5
Obs 1    1.0000    0.7920    0.7190    0.8111    0.6582
Obs 2    0.7920    1.0000    0.4859    0.6552    0.4566
Obs 3    0.7190    0.4859    1.0000    0.6988    0.4171
Obs 4    0.8111    0.6552    0.6988    1.0000    0.6464
Obs 5    0.6582    0.4566    0.4171    0.6464    1.0000



The parameter estimates table, shown in Output 43.2.5, contains parameter estimates, standard errors, confidence intervals, Z scores, and p-values for the parameter estimates. Empirical standard error estimates are used in this table.

Output 43.2.5: Parameter Estimates Table

Parameter Estimates for Response Model
with Empirical Standard Error Estimates
Parameter    Estimate    Standard Error    95% Confidence Limits    Z        Pr > |Z|
Intercept    1.3309      0.1612            1.0151     1.6468        8.26     <.0001
X1           0.1128      0.0927            -0.0689    0.2945        1.22     0.2237
Trt          -0.1034     0.1960            -0.4875    0.2807        -0.53    0.5978
X1*Trt       -0.3162     0.1496            -0.6093    -0.0231       -2.11    0.0345
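
The entries in each row are related by the usual Wald formulas: the standard error is the square root of the corresponding diagonal element of the empirical covariance matrix, Z is the estimate divided by its standard error, and the confidence limits are the estimate plus or minus the 0.975 standard normal quantile times the standard error. The following DATA step is an illustrative sketch that recomputes the X1*Trt row from the rounded values printed in Output 43.2.3 and Output 43.2.5. The data set name WaldCheck and its variable names are hypothetical, and the results should match the table only up to rounding.

```sas
data WaldCheck;
   Estimate = -0.3162;                     /* X1*Trt estimate (rounded)           */
   EmpVar   = 0.02237;                     /* Prm4 diagonal, empirical covariance */
   StdErr   = sqrt(EmpVar);                /* empirical standard error            */
   Z        = Estimate / StdErr;           /* Wald Z statistic                    */
   Zcrit    = quantile('NORMAL', 0.975);   /* about 1.96                          */
   Lower    = Estimate - Zcrit * StdErr;   /* lower 95% confidence limit          */
   Upper    = Estimate + Zcrit * StdErr;   /* upper 95% confidence limit          */
   ProbZ    = 2 * (1 - probnorm(abs(Z)));  /* two-sided p-value                   */
run;

proc print data=WaldCheck noobs;
run;
```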



The estimate of $\beta _{3}$ is -0.3162, with a p-value of 0.0345 and a 95% confidence interval (-0.6093, -0.0231) that excludes zero. This indicates that progabide is effective in reducing the rate of epileptic seizures.
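
Because the model uses a log link, exponentiating the estimate and its confidence limits expresses the effect as a ratio of rates. The DATA step below is a minimal sketch that uses the rounded values from Output 43.2.5; the data set name RateRatio and its variable names are hypothetical.

```sas
data RateRatio;
   Estimate = -0.3162;      /* X1*Trt estimate            */
   Lower    = -0.6093;      /* lower 95% confidence limit */
   Upper    = -0.0231;      /* upper 95% confidence limit */
   RR       = exp(Estimate);
   RRLower  = exp(Lower);
   RRUpper  = exp(Upper);
run;

proc print data=RateRatio noobs;
run;
```

The resulting ratio is approximately 0.73, with limits of roughly 0.54 and 0.98. That is, the change in the seizure rate from baseline in the progabide group is estimated to be about 0.73 times the corresponding change in the placebo group.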

Model fit criteria for the model are displayed in Output 43.2.6. These criteria are used in selecting regression models and working correlations.

Output 43.2.6: Model Fit Criteria

GEE Fit Criteria
QIC     -1036.2837
QICu    -1041.8041
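
To use these criteria for choosing a working correlation structure, you can refit the same mean model with a different TYPE= value and compare the QIC values, with smaller values preferred. The following statements are a sketch of one such refit that uses the exchangeable structure (TYPE=EXCH); its results are not shown here.

```sas
/* Same mean model as above, refit with an exchangeable working correlation */
proc gee data = Seizure;
   class ID Visit;
   model Count = X1 Trt X1*Trt / dist=poisson link=log offset=Ltime;
   repeated subject = ID / within = Visit type=exch corrw;
run;
```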