Introduction to Mixed Modeling Procedures


Generalized Linear Mixed Model

In a generalized linear mixed model (GLMM) the G-side random effects are part of the linear predictor, $\bm {\eta } = \bX \bbeta + \bZ \bgamma $, and the predictor is related nonlinearly to the conditional mean of the data

\[  \mr{E}[\bY |\bgamma ] = g^{-1}(\bm {\eta }) = g^{-1}(\bX \bbeta + \bZ \bgamma )  \]

where $g^{-1}(\cdot )$ is the inverse link function. The conditional distribution of the data, given the random effects, is a member of the exponential family of distributions, such as the binary, binomial, Poisson, gamma, beta, or chi-square distribution. Because the normal distribution is also a member of the exponential family, the class of linear mixed models is a subset of the class of generalized linear mixed models. In order to completely specify a GLMM, you need to do the following (each step maps to a GLIMMIX statement or option, as sketched after the list):

  1. Formulate the linear predictor, including fixed and random effects.

  2. Choose a link function.

  3. Choose the distribution of the response, conditional on the random effects, from the exponential family.
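
Each of these three choices corresponds to a statement or option of the GLIMMIX procedure: the fixed and random effects of the linear predictor are listed in the MODEL and RANDOM statements, and the link and conditional distribution are requested with the LINK= and DIST= options of the MODEL statement. The following skeleton is only a sketch; the data set and variable names (MyData, Y, X, Subject) are hypothetical, and DIST=POISSON with LINK=LOG stands in for whatever distribution and link your application requires.

   proc glimmix data=MyData;
      class Subject;
      model Y = X /                         /* step 1: fixed effects            */
            dist=poisson                    /* step 3: conditional distribution */
            link=log                        /* step 2: link function            */
            solution;
      random intercept / subject=Subject;   /* step 1: G-side random effects    */
   run;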

As an example, suppose that $s$ pairs of twins are randomly selected in a matched-pair design. One of the twins in each pair receives a treatment, and the outcome variable is a binary measure. This is a study with $s$ clusters (subjects), and each cluster is of size 2. If $Y_{ij}$ denotes the binary response of twin $j=1,2$ in cluster $i$, then a linear predictor for this experiment could be

\[  \eta _{ij} = \beta _0 + \tau x_{ij} + \gamma _ i  \]

where $x_{ij}$ denotes a regressor variable that takes on the value 1 for the treated observation in each pair and 0 otherwise. The $\gamma _ i$ are pair-specific random effects that model heterogeneity across twin pairs and induce a correlation between the members of each pair. Because the twin pairs are randomly sampled, it is reasonable to assume that the $\gamma _ i$ are independent and have equal variance. This leads to a diagonal $\bG $ matrix,

\[  \mr{Var}[\bgamma ] = \mr{Var} \left[ \begin{array}{c} \gamma _1\cr \gamma _2\cr \gamma _3\cr \vdots \cr \gamma _ s \end{array} \right] = \left[ \begin{array}{ccccc} \sigma ^2_\gamma &  0 &  0 &  \cdots &  0 \cr 0 &  \sigma ^2_\gamma &  0 &  \cdots &  0 \cr 0 &  0 &  \sigma ^2_\gamma &  \cdots &  0 \cr \vdots &  \vdots &  \vdots &  \ddots &  \vdots \cr 0 &  0 &  0 &  \cdots &  \sigma ^2_\gamma \end{array}\right]  \]

A common link function for binary data is the logit link, which leads in the second step of model formulation to

\begin{align*}  \mr{E}\left[Y_{ij} | \gamma _ i\right] = \mu _{ij}|\gamma _ i =&  \frac{1}{1+\exp \{ -\eta _{ij}\} }\\ \log \left\{ \frac{\mu _{ij}|\gamma _ i}{1-\mu _{ij}|\gamma _ i}\right\}  =&  \eta _{ij} \end{align*}

The final step, choosing a distribution from the exponential family, is automatic in this example; only the binary distribution comes into play to model the distribution of $Y_{ij}|\gamma _ i$.
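
In the GLIMMIX procedure, the twin-pair model can be written as sketched below. The data set and variable names (Twins, Pair, Trt, Y) are hypothetical; the RANDOM statement with SUBJECT=Pair produces the diagonal $\bG $ matrix displayed above, and the DIST= and LINK= options supply the second and third steps of the specification.

   proc glimmix data=Twins;
      class Pair;
      model Y = Trt / dist=binary link=logit solution;
      random intercept / subject=Pair;   /* gamma_i: one random intercept per pair */
   run;

On the logit scale the treatment effect is additive, so $\exp \{ \tau \} $ is the treatment odds ratio conditional on the pair effect $\gamma _ i$.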

As for the linear mixed model, there is a marginal model in the case of a generalized linear mixed model that results from integrating the joint distribution over the random effects. This marginal distribution is elusive for many GLMMs, and parameter estimation proceeds either by approximating the model or by approximating the marginal integral. Details of these approaches are described in the section Generalized Linear Mixed Models Theory, in Chapter 44: The GLIMMIX Procedure.
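
In PROC GLIMMIX, these two routes are selected with the METHOD= option in the PROC GLIMMIX statement: the default pseudo-likelihood methods (for example, METHOD=RSPL) approximate the model by linearization, whereas METHOD=LAPLACE and METHOD=QUAD approximate the marginal integral itself. The following sketch, which reuses the hypothetical twin-pair names from the previous example, requests adaptive quadrature:

   proc glimmix data=Twins method=quad;   /* integrate out gamma_i by adaptive quadrature */
      class Pair;
      model Y = Trt / dist=binary link=logit solution;
      random intercept / subject=Pair;
   run;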

A marginal model, one that models correlation through the $\bR $ matrix and does not involve G-side random effects, can also be formulated in the GLMM family; such models are the extension of the correlated-error models in the linear mixed model family. Because the nonnormal distributions in the exponential family tie the variance to the mean and do not extend to a convenient joint distribution for correlated responses, fully parametric estimation is generally not possible in such models. Instead, estimating equations are formed based on first-moment (mean) and second-moment (covariance) assumptions for the marginal data. The approaches for modeling correlated nonnormal data via generalized estimating equations (GEE) fall into this category (see, for example, Liang and Zeger 1986; Zeger and Liang 1986).
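
In PROC GLIMMIX, such marginal (R-side) correlation structures are declared with the _RESIDUAL_ keyword in the RANDOM statement; classical GEE fits of the same flavor are available through the REPEATED statement of PROC GENMOD. The sketch below, again with hypothetical names, imposes a compound-symmetry working correlation within twin pairs and requests empirical (sandwich) standard errors:

   proc glimmix data=Twins empirical;
      class Pair;
      model Y = Trt / dist=binary link=logit solution;
      random _residual_ / subject=Pair type=cs;   /* R-side working correlation */
   run;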