The GLIMMIX Procedure

Pseudo-likelihood Estimation for Weighted Multilevel Models

Multilevel models provide a flexible and powerful tool for the analysis of data that are observed in nested units at multiple levels. A multilevel model is a special case of the generalized linear mixed models that the GLIMMIX procedure handles. In PROC GLIMMIX, the SUBJECT= option in the RANDOM statement identifies the clustering structure for the random effects. When the subjects of multiple RANDOM statements are nested, the model is a multilevel model, and each RANDOM statement corresponds to one level.

Using a pseudo-maximum-likelihood approach, you can extend the multilevel model framework to accommodate weights at different levels. Such an approach is very useful in analyzing survey data that arise from multistage sampling. In these sampling designs, survey weights are often constructed to account for unequal sampling probabilities, nonresponse adjustments, and poststratification.

The following survey example from a three-stage sampling design illustrates the use of multiple levels of weighting in a multilevel model. Extending this example to models that have more than three levels is straightforward. Let $i=1,\dots ,n^{(3)}$, $j=1,\dots ,n_ i^{(2)}$, and $k=1,\dots ,n_ j^{(1)}$ denote the indices of units at level 3, level 2, and level 1, respectively. Let superscript $(l)$ denote the lth level and $n^{(l)}$ denote the number of level-l units in the sample. Assume that the first-stage cluster (level-3 unit) i is selected with probability $\pi _ i$; the second-stage cluster (level-2 unit) j is selected with probability $\pi _{j|i}$, given that the first-stage cluster i is already selected in the sample; and the third-stage unit (level-1 unit) k is selected with probability $\pi _{k|ij}$, given that the second-stage cluster j within the first-stage cluster i is already selected in the sample.

If you use the inverse selection probability weights $w_{j|i}=1/\pi _{j|i}$ and $w_{k|ij}=1/\pi _{k|ij}$, the conditional log-likelihood contribution of the first-stage cluster i is

\[ \log {(p(y_{i}|\gamma _{i}^{(2)},\gamma _{i}^{(3)}))}=\sum _{j=1}^{n_ i^{(2)}} {w_{j|i}} \sum _{k=1}^{n_ j^{(1)}} {w_{k|ij}} \log {( p(y_{ijk} |\gamma _{ij}^{(2)},\gamma _{i}^{(3)} ))} \]
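As a concrete illustration of this doubly weighted sum, the following minimal Python sketch evaluates the conditional log-likelihood contribution of one first-stage cluster for a Poisson response with a log link (the distribution used in the example below). All numbers (linear predictors, counts, and weights) are hypothetical, and the random effects are treated as given, fixed values:

```python
import math

# Hypothetical first-stage cluster i with two second-stage clusters (j = 0, 1),
# each containing two level-1 observations (k = 0, 1). All numbers are made up.
eta = [[1.2, 0.9], [1.5, 1.1]]   # linear predictors, including the given random effects
y   = [[3, 2],     [5, 4]]       # Poisson counts y_ijk
w2  = [5.3, 7.3]                 # level-2 weights w_{j|i}
w1  = [[7.1, 3.9], [6.3, 3.9]]   # level-1 weights w_{k|ij}

def poisson_logpmf(y, mu):
    """log p(y | mu) for a Poisson response with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

# Conditional contribution of cluster i:
#   sum_j w_{j|i} * sum_k w_{k|ij} * log p(y_ijk | gamma)
loglik = sum(
    w2[j] * sum(w1[j][k] * poisson_logpmf(y[j][k], math.exp(eta[j][k]))
                for k in range(len(y[j])))
    for j in range(len(y))
)
```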

where $\gamma _{ij}^{(2)}$ is the random-effects vector for the jth second-stage cluster within the ith first-stage cluster, $\gamma _{i}^{(2)} = (\gamma ’_{i1},\gamma ’_{i2},\dots ,\gamma ’_{i n_ i^{(2)}})’ $ stacks these vectors, and $\gamma _ i^{(3)}$ is the random-effects vector for the ith first-stage cluster.

As with unweighted multilevel models, the adaptive quadrature method is used to compute the pseudo-likelihood of the first-stage cluster i:

\[ p(y_{i}) = \int p(y_{i}|\gamma _{i}^{(2)},\gamma _{i}^{(3)}) p( \gamma _{i}^{(2)})p( \gamma _{i}^{(3)})d(\gamma _{i}^{(2)}) d(\gamma _{i}^{(3)}) \]
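PROC GLIMMIX uses adaptive quadrature for this integral. As a simplified illustration of the underlying idea, the following Python sketch approximates the marginal likelihood of a single observation with one normally distributed random intercept by ordinary (non-adaptive) three-point Gauss-Hermite quadrature. The response value, linear predictor, and standard deviation are hypothetical:

```python
import math

# Three-point Gauss-Hermite rule: integrates f(t)*exp(-t^2) dt exactly for
# polynomials f up to degree 5.
GH_NODES   = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
GH_WEIGHTS = [math.sqrt(math.pi) / 6, 2 * math.sqrt(math.pi) / 3, math.sqrt(math.pi) / 6]

def marginal(cond, sigma):
    """Approximate the integral of cond(g) * N(g; 0, sigma^2) dg.

    Substituting g = sqrt(2)*sigma*t turns the integral into
    (1/sqrt(pi)) * integral of cond(sqrt(2)*sigma*t) * exp(-t^2) dt.
    """
    return sum(w * cond(math.sqrt(2.0) * sigma * t)
               for t, w in zip(GH_NODES, GH_WEIGHTS)) / math.sqrt(math.pi)

# Hypothetical conditional likelihood: one Poisson count y = 2 with linear
# predictor 0.5 + g (log link), where g is a random intercept with SD 0.4.
y = 2
def cond_lik(g):
    mu = math.exp(0.5 + g)
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

p_y = marginal(cond_lik, sigma=0.4)   # approximate marginal likelihood p(y)
```

Adaptive quadrature additionally centers and scales the nodes around the mode of each cluster's integrand; this sketch omits that step.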

The total log pseudo-likelihood is

\[ \log {(p(y))} = \sum _{i=1}^{n^{(3)}} w_ i \log {(p(y_ i))} \]

where $w_{i} = 1/\pi _{i}$.

To illustrate weighting in a multilevel model, consider the following data set. In these simulated data, the response y is a Poisson-distributed count, w3 is the weight for the first-stage clusters, w2 is the weight for the second-stage clusters, and w1 is the weight for the observation-level units.

data d;
   input A w3 AB w2 w1 y x;
   datalines;
1    6.1     1    5.3     7.1    56    -.214
1    6.1     1    5.3     3.9    41    0.732
1    6.1     2    7.3     6.3    50    0.372
1    6.1     2    7.3     3.9    36    -.892
1    6.1     3    4.6     8.4    39    0.424
1    6.1     3    4.6     6.3    35    -.200
2    8.5     1    4.8     7.4    30    0.868
2    8.5     1    4.8     6.7    25    0.110
2    8.5     2    8.4     3.5    36    0.004
2    8.5     2    8.4     4.1    28    0.755
2    8.5     3    .80     3.8    33    -.600
2    8.5     3    .80     7.4    30    -.525
3    9.5     1    8.2     6.7    32    -.454
3    9.5     1    8.2     7.1    24    0.458
3    9.5     2     11     4.8    31    0.162
3    9.5     2     11     7.9    27    1.099
3    9.5     3    3.9     3.8    15    -1.57
3    9.5     3    3.9     5.5    19    -.448
4    4.5     1    8.0     5.7    30    -.468
4    4.5     1    8.0     2.9    25    0.890
4    4.5     2    6.0     5.0    35    0.635
4    4.5     2    6.0     3.0    30    0.743
4    4.5     3    6.8     7.3    17    -.015
4    4.5     3    6.8     3.1    18    -.560
;

You can use the following statements to fit a weighted three-level model:

proc glimmix method=quadrature empirical=classical;
   class A AB;
   model y = x / dist=poisson link=log obsweight=w1;
   random int / subject=A weight=w3;
   random int / subject=AB(A) weight=w2;
run;

The SUBJECT= options in the first and second RANDOM statements specify the first-stage and second-stage clusters, A and AB(A), respectively. The OBSWEIGHT= option in the MODEL statement specifies the variable for the weight at the observation level. The WEIGHT= option in each RANDOM statement specifies the variable for the weight at the level that the SUBJECT= option in that statement identifies.

For inference about the fixed-effects and covariance parameters that are estimated by pseudo-likelihood, you can use the empirical (sandwich) variance estimator. For weighted multilevel models, the only empirical estimator available in PROC GLIMMIX is EMPIRICAL=CLASSICAL. The EMPIRICAL=CLASSICAL variance estimator can be described as follows.

Let $\alpha =(\beta ’, \theta ’)’$, where $\beta $ is the vector of fixed-effects parameters and $\theta $ is the vector of covariance parameters. For an L-level model, Rabe-Hesketh and Skrondal (2006) show that the gradient of the log pseudo-likelihood can be written as a weighted sum of the gradients of the top-level units:

\[ \sum _{i=1}^{n^{(L)}} w_ i \frac{\partial \log (p(y_{i};\alpha ))}{\partial \alpha } \equiv \sum _{i=1}^{n^{(L)}} S_ i(\alpha ) \]

where $n^{(L)}$ is the number of level-L units and $S_ i(\alpha )$ is the weighted score vector of the top-level unit i. The estimator of the "meat" of the sandwich estimator can be written as

\[ J= \frac{n^{(L)}}{n^{(L)}-1} \sum _{i=1}^{n^{(L)}} S_ i(\hat{\alpha })S_ i(\hat{\alpha })’ \]

The empirical estimator of the covariance matrix of $\hat{\alpha }$ can be constructed as

\[ H(\hat{\alpha })^{-1}JH(\hat{\alpha })^{-1} \]

where $H(\alpha )$ is the second derivative matrix of the log pseudo-likelihood with respect to $\alpha $.
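The following Python sketch assembles this estimator for a hypothetical two-parameter model. The score vectors and the Hessian are made-up numbers, not output from any fit; the sketch only shows how the "bread" and "meat" combine:

```python
# Hypothetical 2-parameter example of the sandwich V = H^{-1} J H^{-1}.
# Weighted score vectors S_i(alpha_hat) of the n top-level units (made up;
# they sum to zero, as scores do at the pseudo-likelihood estimate).
S = [[0.3, -0.1], [-0.2, 0.4], [0.1, -0.2], [-0.2, -0.1]]
n = len(S)

# "Meat": J = n/(n-1) * sum_i S_i S_i'
J = [[0.0, 0.0], [0.0, 0.0]]
for s in S:
    for a in range(2):
        for b in range(2):
            J[a][b] += s[a] * s[b]
J = [[J[a][b] * n / (n - 1) for b in range(2)] for a in range(2)]

# Hypothetical Hessian H(alpha_hat) of the log pseudo-likelihood; the "bread"
# is its inverse.
H = [[-5.0, 1.0], [1.0, -4.0]]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Hinv = inv2(H)
V = matmul(matmul(Hinv, J), Hinv)   # empirical covariance of alpha_hat
```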

The covariance parameter estimators that are obtained by the pseudo-maximum-likelihood method can be biased when the sample size is small. Pfeffermann et al. (1998) and Rabe-Hesketh and Skrondal (2006) discuss two weight-scaling methods for reducing the bias of the covariance parameter estimators in a two-level model. To derive the scale factor $\lambda $ for a two-level model, let $n_ i$ denote the number of level-1 units in level-2 unit i and let $w_{j|i}$ denote the weight of the jth level-1 unit in level-2 unit i. The first method sets the "apparent" cluster size equal to the "effective" sample size:

\[ \sum _{j=1}^{n_ i} \lambda w_{j|i} = \frac{(\sum _{j=1}^{n_ i} w_{j|i})^2}{\sum _{j=1}^{n_ i} w^2_{j|i} } \]

Therefore the scale factor is

\[ \lambda = \frac{\sum _{j=1}^{n_ i} w_{j|i}}{\sum _{j=1}^{n_ i} w^2_{j|i}} \]

The second method sets the apparent cluster size equal to the actual cluster size so that the scale factor is

\[ \lambda =\frac{n_ i}{\sum _{j=1}^{n_ i} w_{j|i}} \]

PROC GLIMMIX does not perform weight scaling; it uses the weights in the data set exactly as they are provided. To use scaled weights, you must compute them yourself and supply them as the weight variables in the input data set.
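For example, the following Python sketch computes both scale factors and the corresponding scaled weights for one hypothetical level-2 cluster (the weight values are illustrative):

```python
# Level-1 weights w_{j|i} within one hypothetical level-2 cluster.
w = [7.1, 3.9, 6.3]
n = len(w)

# Method 1: choose lambda so that sum(lambda * w) equals the effective
# sample size (sum w)^2 / sum(w^2).
lam1 = sum(w) / sum(x * x for x in w)

# Method 2: choose lambda so that sum(lambda * w) equals the actual
# cluster size n.
lam2 = n / sum(w)

scaled1 = [lam1 * x for x in w]   # weights to store in the data set for method 1
scaled2 = [lam2 * x for x in w]   # weights to store in the data set for method 2
```

Either set of scaled weights would then be merged back into the input data set and named in the OBSWEIGHT= or WEIGHT= option in place of the raw weights.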