The GEE Procedure

Generalized Estimating Equations

The marginal model is commonly used in analyzing longitudinal data when the population-averaged effect is of interest. To estimate the regression parameters in the marginal model, Liang and Zeger (1986) proposed the generalized estimating equations method, which is widely used.

Suppose $y_{ij}, j=1,\ldots ,n_ i, i=1,\ldots , K$, represent the jth response of the ith subject, which has a vector of covariates ${x}_{ij}$. There are $n_ i$ measurements on subject i, and the maximum number of measurements per subject is T.

Suppose the responses of the ith subject be $\mb{Y}_{i}=[y_{i1}, \ldots , y_{i n_ i}]^{\prime }$ with corresponding means $\bmu _{i}=[\mu _{i1}, \ldots , \mu _{i n_ i}]^{\prime }$. For generalized linear models, the marginal mean $\mu _{ij}$ of the response $y_{ij}$ is related to a linear predictor through a link function $g(\mu _{ij}) = \mb{x}_{ij}^\prime \bbeta $, and the variance of $y_{ij}$ depends on the mean through a variance function $v(\mu _{ij})$.

An estimate of the parameter $\bbeta $ in the marginal model can be obtained by solving the generalized estimating equations,

\[ \mb{S}(\bbeta ) = \sum _{i=1}^ K \frac{\partial \bmu _{i}^{\prime }}{\partial \bbeta } \mb{V}_{i}^{-1} (\mb{Y}_{i}-\bmu _{i}(\bbeta )) = \mb{0} \]

where $\mb{V}_{i}$ is the working covariance matrix of $\mb{Y}_{i}$.

Only the mean and the covariance of $\mb{Y}_{i}$ are required in the GEE method; a full specification of the joint distribution of the correlated responses is not needed. This is particularly convenient because the joint distribution for noncontinuous responses involves high-order associations and is complicated to specify. Moreover, the regression parameter estimates are consistent even when the working covariance is incorrectly specified. Because of these properties, the GEE method is popular in situations where the marginal effect is of interest and the responses are not continuous. However, the GEE approach can lead to biased estimates when missing responses depend on previous responses. The weighted GEE method, which is described in the section Weighted Generalized Estimating Equations under the MAR Assumption, can provide unbiased estimates.

Working Correlation Matrix

Suppose $\mb{R}_ i(\balpha )$ is an $n_ i \times n_ i$ "working" correlation matrix that is fully specified by the vector of parameters $\balpha $. The covariance matrix of $\mb{Y}_{i}$ is modeled as

\[ \mb{V}_{i} = \phi \mb{A}_ i^{\frac{1}{2}} \mb{W}_ i^{-\frac{1}{2}} \mb{R}(\balpha ) \mb{W}_ i^{-\frac{1}{2}} \mb{A}_ i^{\frac{1}{2}} \]

where $\mb{A}_ i$ is an $n_ i \times n_ i$ diagonal matrix whose jth diagonal element is $v(\mu _{ij})$ and $\mb{W}_ i$ is an $n_ i \times n_ i$ diagonal matrix whose jth diagonal is $w_{ij}$, where $w_{ij}$ is a weight variable that is specified in the WEIGHT statement. If there is no WEIGHT statement, $w_{ij} = 1$ for all i and j. If $\mb{R}_ i(\balpha )$ is the true correlation matrix of $\mb{Y}_{i}$, then $\mb{V}_{i}$ is the true covariance matrix of $\mb{Y}_{i}$.

In practice, the working correlation matrix is usually unknown and must be estimated. It is estimated in the iterative fitting process by using the current value of the parameter vector $\bbeta $ to compute appropriate functions of the Pearson residual:

\[ e_{ij}=\frac{y_{ij}-\mu _{ij}}{\sqrt {v(\mu _{ij})/w_{ij}}} \]

If you specify the working correlation matrix as $\mb{R}_{0} = \mb{I}$, which is the identity matrix, the GEE reduces to the independence estimating equation.

Table 43.11 shows the working correlation structures that are supported by the GEE procedure and the estimators that are used to estimate the working correlations.

Table 43.11: Working Correlation Structures and Estimators

Working Correlation Structure

Estimator

Fixed

 

$\text {Corr}(Y_{ij},Y_{ik}) = r_{jk}$where $r_{jk}$ is the jkth element of a constant, user-specified correlation matrix $\mb{R}_{0}$

The working correlation is not estimated in this case.

Independent

 

$\text {Corr}(Y_{ij},Y_{ik})= \left\{ \begin{array}{ll} 1 &  j = k \\ 0 &  j \ne k \end{array} \right. $

The working correlation is not estimated in this case.

m-dependent

 

$\text {Corr}(Y_{ij},Y_{i,j+t})= \left\{ \begin{array}{ll} 1 &  t = 0 \\ \alpha _{t} &  t=1,2,\ldots ,m \\ 0 &  t > m \end{array} \right.$

$\hat{\alpha }_{t} = \frac{1}{(K_ t-p)\phi }\sum _{i=1}^{K} \sum _{j\leq n_ i-t}e_{ij}e_{i,j+t}$$K_ t = \sum _{i=1}^{K} (n_ i - t)$

Exchangeable

 

$\text {Corr}(Y_{ij},Y_{ik})=\left\{  \begin{array}{ll} 1 &  j = k \\ \alpha &  j \neq k \\ \end{array}\right.$

$\hat{\alpha } = \frac{1}{(N^{*}-p)\phi }\sum _{i=1}^{K} \sum _{j < k}e_{ij}e_{ik}$$N^{*}=0.5\sum _{i=1}^{K} n_ i(n_ i-1)$

Unstructured

 

$\text {Corr}(Y_{ij},Y_{ik})= \left\{  \begin{array}{ll} 1 &  j = k \\ \alpha _{jk} &  j \neq k \\ \end{array}\right. $

$\hat{\alpha }_{jk} = \frac{1}{(K-p)\phi }\sum _{i=1}^{K}e_{ij}e_{ik}$

Autoregressive AR(1)

 

$\mr{Corr}(Y_{ij},Y_{i,j+t})= \alpha ^{t}$$\text {for } t=0,1,2,\ldots ,n_ i-j $

$\hat{\alpha } = \frac{1}{(K_1-p)\phi }\sum _{i=1}^ K\sum _{j\leq n_ i-1}e_{ij}e_{i,j+1}$ $K_1 = \sum _{i=1}^{K} (n_ i - 1)$


Dispersion Parameter

The dispersion parameter $\phi $ is estimated by

\[ \hat{\phi } = \frac{1}{N-p} \sum _{i=1}^ K \sum _{j=1}^{n_ i} e_{ij}^2 \]

where $N = \sum _{i=1}^ K n_ i$ is the total number of measurements and p is the number of regression parameters.

The square root of $\hat{\phi }$ is reported by PROC GEE as the scale parameter in the "Parameter Estimates for Response Model with Model-Based Standard Error" output table. If a fixed scale parameter is specified by using the NOSCALE option in the MODEL statement, then the fixed value is used in estimating the model-based covariance matrix and standard errors.

Quasi-likelihood Information Criterion

The quasi-likelihood information criterion (QIC) was developed by Pan (2001) as a modification of Akaike’s information criterion (AIC) to apply to models fit by the GEE approach.

Define the quasi-likelihood under the independent working correlation assumption, evaluated with the parameter estimates under the working correlation of interest as

\[ Q(\hat{\bbeta }(R), \phi ) = \sum _{i=1}^ K\sum _{j=1}^{n_ i} Q(\hat{\bbeta }(R), \phi ; (Y_{ij}, \mb{X}_{ij})) \]

where the quasi-likelihood contribution of the jth observation in the ith cluster is defined in the section Quasi-likelihood Functions and $\hat{\bbeta }(R)$ are the parameter estimates that are obtained by using the GEE approach with the working correlation of interest R.

QIC is defined as

\[ \mr{QIC}(R) = -2Q(\hat{\bbeta }(R), \phi ) + 2\mr{trace}(\hat{\Omega }_ I\hat{V}_ R) \]

where $\hat{V}_ R$ is the robust covariance estimate and $\hat{\Omega }_ I$ is the inverse of the model-based covariance estimate under the independent working correlation assumption, evaluated at $\hat{\bbeta }(R)$, which are the parameter estimates that are obtained by using the GEE approach with the working correlation of interest R.

PROC GEE also computes an approximation to $\mr{QIC}(R)$, which is defined by Pan (2001) as

\[ \mr{QIC}_ u(R) = -2Q(\hat{\bbeta }(R), \phi ) + 2p \]

where p is the number of regression parameters.

Pan (2001) notes that QIC is appropriate for selecting regression models and working correlations, whereas $\mr{QIC}_ u$ is appropriate only for selecting regression models.

Quasi-likelihood Functions

See McCullagh and Nelder (1989) and Hardin and Hilbe (2003) for discussions of quasi-likelihood functions. The contribution of observation j in cluster i to the quasi-likelihood function that is evaluated at the regression parameters $\bbeta $ is expressed by $Q(\bbeta , \phi ; (Y_{ij}, \mb{X}_{ij})) = \frac{Q_{ij}}{\phi }$, where $Q_{ij}$ is defined in the following list. These definitions are used in the computation of the quasi-likelihood information criteria (QIC) for goodness of fit of models that are fit by the GEE approach. The $w_{ij}$ are prior weights, if any, that are specified in the WEIGHT or FREQ statement. Note that the definition of the quasi-likelihood for the negative binomial differs from that given in McCullagh and Nelder (1989). The definition used here allows the negative binomial quasi-likelihood to approach the Poisson as $k\rightarrow 0$.

  • Normal:

    \[ Q_{ij} = -\frac{1}{2} w_{ij}(y_{ij}-\mu _{ij})^2 \]
  • Inverse Gaussian:

    \[ Q_{ij} = \frac{w_{ij}(\mu _{ij} - .5y_{ij})}{\mu _{ij}^2 } \]
  • Gamma:

    \[ Q_{ij} = -w_{ij}\left[ \frac{y_{ij}}{\mu _{ij}} + \log (\mu _{ij}) \right] \]
  • Negative binomial:

    \[ Q_{ij} = w_{ij}\left[ \log \Gamma \left( y_{ij} + \frac{1}{k}\right) - \log \Gamma \left(\frac{1}{k}\right) + y_{ij}\log \left(\frac{k\mu _{ij}}{1+k\mu _{ij}}\right) + \frac{1}{k}\log \left(\frac{1}{1+k\mu _{ij}}\right)\right] \]
  • Poisson:

    \[ Q_{ij} = w_{ij}(y_{ij} \log (\mu _{ij}) - \mu _{ij} ) \]
  • Binomial:

    \[ Q_{ij} = w_{ij}[r_{ij} \log (p_{ij}) + (n_{ij}-r_{ij}) \log (1-p_{ij})] \]
  • Multinomial (s categories):

    \[ Q_{ij} = w_{ij}\sum _{k=1}^ s y_{ijk}\log (\mu _{ijk}) \]