# The GLIMMIX Procedure

Quadrature methods, like the Laplace approximation, approximate integrals. If you choose METHOD= QUAD for a generalized linear mixed model, the GLIMMIX procedure approximates the marginal log likelihood with an adaptive Gauss-Hermite quadrature rule. Gaussian quadrature is particularly well suited to the numerical evaluation of integrals against probability measures (Lange 1999, Ch. 16). Gauss-Hermite quadrature is appropriate when the density has kernel $\exp\{-x^2\}$ and integration extends over the real line, as is the case for the normal distribution. Suppose that $f(x)$ is a probability density function and the function $g(x)$ is to be integrated against it. Then the quadrature rule is

$$\int_{-\infty}^{\infty} g(x)\, f(x)\, dx \approx \sum_{i=1}^{N} w_i\, g(x_i)$$

where $N$ denotes the number of quadrature points, the $w_i$ are the quadrature weights, and the $x_i$ are the abscissas. Gaussian quadrature chooses abscissas in areas of high density, and if $f(x)$ is continuous, the quadrature rule is exact if $g(x)$ is a polynomial of degree at most $2N - 1$. In the generalized linear mixed model, the roles of $g(x)$ and $f(x)$ are played by the conditional distribution of the data given the random effects and by the random-effects distribution, respectively. Quadrature abscissas and weights are those of the standard Gauss-Hermite quadrature (Golub and Welsch 1969; see also Table 25.10 of Abramowitz and Stegun 1972; Evans 1993).
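
To make the exactness property concrete, the following Python sketch computes Gauss-Hermite abscissas and weights for the kernel $\exp\{-x^2\}$. It uses Newton's method on the Hermite recurrence, which is the classic `gauher` scheme from *Numerical Recipes*, in place of the Golub-Welsch eigenvalue method cited above. It then checks the rule on even monomials, whose exact integrals are $\int x^{2k} e^{-x^2}\,dx = \Gamma(k + 1/2)$. The function names are illustrative only and are not part of PROC GLIMMIX.

```python
import math


def gauss_hermite(n, tol=3e-14, max_iter=100):
    """Nodes x_i and weights w_i with  int g(x) exp(-x^2) dx ~= sum_i w_i g(x_i).

    Newton's method on the orthonormal Hermite recurrence (the classic
    "gauher" routine from Numerical Recipes); roots come in +/- pairs.
    """
    nodes, weights = [0.0] * n, [0.0] * n
    pim4 = math.pi ** -0.25
    z = pp = 0.0
    for i in range((n + 1) // 2):
        # Empirical starting guesses for the i-th largest root.
        if i == 0:
            z = math.sqrt(2 * n + 1) - 1.85575 * (2 * n + 1) ** (-1 / 6)
        elif i == 1:
            z -= 1.14 * n ** 0.426 / z
        elif i == 2:
            z = 1.86 * z - 0.86 * nodes[0]
        elif i == 3:
            z = 1.91 * z - 0.91 * nodes[1]
        else:
            z = 2.0 * z - nodes[i - 2]
        for _ in range(max_iter):
            # p1 = normalized H_n(z), p2 = normalized H_{n-1}(z)
            p1, p2 = pim4, 0.0
            for j in range(1, n + 1):
                p3, p2 = p2, p1
                p1 = z * math.sqrt(2.0 / j) * p2 - math.sqrt((j - 1) / j) * p3
            pp = math.sqrt(2.0 * n) * p2  # derivative of the normalized H_n
            z_old, z = z, z - p1 / pp
            if abs(z - z_old) <= tol:
                break
        nodes[i], nodes[n - 1 - i] = z, -z
        weights[i] = weights[n - 1 - i] = 2.0 / (pp * pp)
    return nodes, weights


def gh_rule(g, n):
    """Apply the n-point rule: sum_i w_i g(x_i) ~= int g(x) exp(-x^2) dx."""
    x, w = gauss_hermite(n)
    return sum(wi * g(xi) for xi, wi in zip(x, w))


for n in (3, 5):
    for k in range(n + 1):
        approx = gh_rule(lambda x, k=k: x ** (2 * k), n)
        print(f"N={n}  degree={2 * k}  rule={approx:.12f}  exact={math.gamma(k + 0.5):.12f}")
```

With $N = 3$, the rule reproduces $\Gamma(k + 1/2)$ for degrees 0, 2, and 4 but not for $x^6$, whose degree exceeds $2N - 1 = 5$.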

A numerical integration rule is called adaptive when it uses a variable step size to control the error of the approximation. For example, an adaptive trapezoidal rule uses serial splitting of intervals at midpoints until a desired tolerance is achieved. The quadrature rule in the GLIMMIX procedure is adaptive in the following sense: if you do not specify the number of quadrature points (nodes) with the QPOINTS= suboption of the METHOD= QUAD option, then the number of quadrature points is determined by evaluating the log likelihood at the starting values at a successively larger number of nodes until a tolerance is met (for more details see the text under the heading "Starting Values" in the next section). Furthermore, the GLIMMIX procedure centers and scales the quadrature points by using the empirical Bayes estimates (EBEs) of the random effects and the Hessian (second derivative) matrix from the EBE suboptimization. This centering and scaling improves the likelihood approximation by placing the abscissas according to the density function of the random effects. It is not, however, adaptiveness in the previously stated sense.
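
The node-count search can be sketched as follows. This is a hypothetical illustration and not the procedure's actual criterion, which is described under "Starting Values". The sketch approximates $E[\mathrm{expit}(\mu + \sigma X)]$ for $X \sim N(0, 1)$ with plain, uncentered Gauss-Hermite rules of increasing size. It stops when the log of the approximation changes by less than an assumed tolerance `tol`. The values of `mu`, `sigma`, `tol`, and `n_max` are arbitrary choices for the example.

```python
import math


def gauss_hermite(n, tol=3e-14, max_iter=100):
    """Nodes/weights for int g(x) exp(-x^2) dx (Numerical Recipes "gauher" scheme)."""
    nodes, weights = [0.0] * n, [0.0] * n
    pim4 = math.pi ** -0.25
    z = pp = 0.0
    for i in range((n + 1) // 2):
        if i == 0:
            z = math.sqrt(2 * n + 1) - 1.85575 * (2 * n + 1) ** (-1 / 6)
        elif i == 1:
            z -= 1.14 * n ** 0.426 / z
        elif i == 2:
            z = 1.86 * z - 0.86 * nodes[0]
        elif i == 3:
            z = 1.91 * z - 0.91 * nodes[1]
        else:
            z = 2.0 * z - nodes[i - 2]
        for _ in range(max_iter):
            p1, p2 = pim4, 0.0
            for j in range(1, n + 1):
                p3, p2 = p2, p1
                p1 = z * math.sqrt(2.0 / j) * p2 - math.sqrt((j - 1) / j) * p3
            pp = math.sqrt(2.0 * n) * p2
            z_old, z = z, z - p1 / pp
            if abs(z - z_old) <= tol:
                break
        nodes[i], nodes[n - 1 - i] = z, -z
        weights[i] = weights[n - 1 - i] = 2.0 / (pp * pp)
    return nodes, weights


def expit(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def logistic_normal_mean(mu, sigma, n):
    """n-point approximation of E[expit(mu + sigma * X)] for X ~ N(0, 1).

    Substituting x = sqrt(2) * z turns the N(0, 1) density into exp(-z^2) / sqrt(pi).
    """
    z, w = gauss_hermite(n)
    s = sum(wi * expit(mu + sigma * math.sqrt(2.0) * zi) for zi, wi in zip(z, w))
    return s / math.sqrt(math.pi)


def choose_qpoints(approx, tol=1e-6, n_max=40):
    """Hypothetical stopping rule: grow n until log(approx(n)) moves by less than tol."""
    prev = math.log(approx(1))
    for n in range(2, n_max + 1):
        cur = math.log(approx(n))
        if abs(cur - prev) < tol:
            return n, math.exp(cur)
        prev = cur
    return n_max, math.exp(prev)


n_used, value = choose_qpoints(lambda n: logistic_normal_mean(0.5, 2.0, n))
print(n_used, value)
```

Unlike an adaptive trapezoidal rule, this loop does not subdivide intervals. It refines only the number of nodes.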

##### Objective Function

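Let $\boldsymbol{\beta}$ denote the vector of fixed-effects parameters and $\boldsymbol{\theta}$ the vector of covariance parameters. For quadrature estimation in the GLIMMIX procedure, $\boldsymbol{\theta}$ includes the G-side parameters and a possible scale parameter $\phi$, provided that the conditional distribution of the data contains such a scale parameter. $\boldsymbol{\theta}^*$ is the vector of the G-side parameters. The marginal distribution of the data for subject $i$ in a mixed model can be expressed as

$$p(\mathbf{y}_i) = \int \cdots \int p(\mathbf{y}_i \mid \boldsymbol{\gamma}_i, \boldsymbol{\beta}, \phi)\, p(\boldsymbol{\gamma}_i \mid \boldsymbol{\theta}^*)\, d\boldsymbol{\gamma}_i$$
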
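Suppose $N_q$ denotes the number of quadrature points in each dimension (for each random effect) and $r$ denotes the number of random effects. For each subject, obtain the empirical Bayes estimates of $\boldsymbol{\gamma}_i$ as the vector $\widehat{\boldsymbol{\gamma}}_i$ that minimizes

$$-\log\left\{ p(\mathbf{y}_i \mid \boldsymbol{\gamma}_i, \boldsymbol{\beta}, \phi)\, p(\boldsymbol{\gamma}_i \mid \boldsymbol{\theta}^*) \right\} = f(\mathbf{y}_i, \boldsymbol{\beta}, \boldsymbol{\theta}; \boldsymbol{\gamma}_i)$$
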
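If $\mathbf{z} = [z_1, \ldots, z_{N_q}]$ are the standard abscissas for Gauss-Hermite quadrature, and $\mathbf{z}_j = [z_{j_1}, \ldots, z_{j_r}]$ is a point on the $r$-dimensional quadrature grid, then the centered and scaled abscissas are

$$\mathbf{a}_j = \widehat{\boldsymbol{\gamma}}_i + 2^{1/2}\, f''(\mathbf{y}_i, \boldsymbol{\beta}, \boldsymbol{\theta}; \widehat{\boldsymbol{\gamma}}_i)^{-1/2}\, \mathbf{z}_j$$
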
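As for the Laplace approximation, $f''$ is the second derivative matrix with respect to the random effects,

$$f''(\mathbf{y}_i, \boldsymbol{\beta}, \boldsymbol{\theta}; \widehat{\boldsymbol{\gamma}}_i) = \left. \frac{\partial^2 f(\mathbf{y}_i, \boldsymbol{\beta}, \boldsymbol{\theta}; \boldsymbol{\gamma}_i)}{\partial \boldsymbol{\gamma}_i\, \partial \boldsymbol{\gamma}_i'} \right|_{\widehat{\boldsymbol{\gamma}}_i}$$
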
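These centered and scaled abscissas, along with the Gauss-Hermite quadrature weights $\mathbf{w} = [w_1, \ldots, w_{N_q}]$, are used to construct the $r$-dimensional integral by a sequence of one-dimensional rules

$$\begin{aligned}
p(\mathbf{y}_i) &= \int \cdots \int p(\mathbf{y}_i \mid \boldsymbol{\gamma}_i, \boldsymbol{\beta}, \phi)\, p(\boldsymbol{\gamma}_i \mid \boldsymbol{\theta}^*)\, d\boldsymbol{\gamma}_i \\
&\approx 2^{r/2} \left| f''(\mathbf{y}_i, \boldsymbol{\beta}, \boldsymbol{\theta}; \widehat{\boldsymbol{\gamma}}_i) \right|^{-1/2} \sum_{j_1=1}^{N_q} \cdots \sum_{j_r=1}^{N_q} \left[ p(\mathbf{y}_i \mid \mathbf{a}_j, \boldsymbol{\beta}, \phi)\, p(\mathbf{a}_j \mid \boldsymbol{\theta}^*) \prod_{k=1}^{r} w_{j_k} \exp\{z_{j_k}^2\} \right]
\end{aligned}$$
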
The right-hand side of this expression, properly accumulated across subjects, is the objective function for adaptive quadrature estimation in the GLIMMIX procedure. The preceding expression constitutes a one-level adaptive Gaussian quadrature approximation.
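
The following Python sketch applies this expression to a single subject with one random effect ($r = 1$). It assumes a hypothetical Poisson model with log link, $y_j \sim \mathrm{Poisson}(\exp\{\beta + \gamma\})$ and $\gamma \sim N(0, \sigma^2)$. It finds the EBE $\widehat{\gamma}$ by Newton's method and then evaluates the centered and scaled rule. For comparison, the function `log_marginal_plain` uses the same number of nodes without centering and scaling. With one node, the centered rule reduces to the Laplace approximation. The data, parameter values, and function names are made up for illustration.

```python
import math


def gauss_hermite(n, tol=3e-14, max_iter=100):
    """Nodes/weights for int g(x) exp(-x^2) dx (Numerical Recipes "gauher" scheme)."""
    nodes, weights = [0.0] * n, [0.0] * n
    pim4 = math.pi ** -0.25
    z = pp = 0.0
    for i in range((n + 1) // 2):
        if i == 0:
            z = math.sqrt(2 * n + 1) - 1.85575 * (2 * n + 1) ** (-1 / 6)
        elif i == 1:
            z -= 1.14 * n ** 0.426 / z
        elif i == 2:
            z = 1.86 * z - 0.86 * nodes[0]
        elif i == 3:
            z = 1.91 * z - 0.91 * nodes[1]
        else:
            z = 2.0 * z - nodes[i - 2]
        for _ in range(max_iter):
            p1, p2 = pim4, 0.0
            for j in range(1, n + 1):
                p3, p2 = p2, p1
                p1 = z * math.sqrt(2.0 / j) * p2 - math.sqrt((j - 1) / j) * p3
            pp = math.sqrt(2.0 * n) * p2
            z_old, z = z, z - p1 / pp
            if abs(z - z_old) <= tol:
                break
        nodes[i], nodes[n - 1 - i] = z, -z
        weights[i] = weights[n - 1 - i] = 2.0 / (pp * pp)
    return nodes, weights


def poisson_loglik(y, eta):
    """log p(y | eta) for independent Poisson counts sharing the log mean eta."""
    return sum(v * eta - math.exp(eta) - math.lgamma(v + 1) for v in y)


def normal_logpdf(g, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - g * g / (2 * sigma ** 2)


def f(g, y, beta, sigma):
    """f(y, beta, theta; gamma) = -log{ p(y | gamma, beta) p(gamma | theta*) }."""
    return -(poisson_loglik(y, beta + g) + normal_logpdf(g, sigma))


def ebe(y, beta, sigma):
    """EBE gamma_hat by Newton's method, plus f'' evaluated at gamma_hat."""
    n, s, g = len(y), sum(y), 0.0
    for _ in range(200):
        d1 = -s + n * math.exp(beta + g) + g / sigma ** 2  # f'
        d2 = n * math.exp(beta + g) + 1 / sigma ** 2       # f'' > 0: f is convex
        step = d1 / d2
        g -= step
        if abs(step) < 1e-12:
            break
    return g, n * math.exp(beta + g) + 1 / sigma ** 2


def log_marginal_adaptive(y, beta, sigma, nq):
    """log p(y) by the centered and scaled Gauss-Hermite rule with r = 1."""
    g_hat, h = ebe(y, beta, sigma)
    z, w = gauss_hermite(nq)
    scale = math.sqrt(2.0 / h)  # 2^{1/2} f''^{-1/2}
    total = sum(wj * math.exp(zj * zj - f(g_hat + scale * zj, y, beta, sigma))
                for zj, wj in zip(z, w))
    return math.log(scale) + math.log(total)


def log_marginal_plain(y, beta, sigma, nq):
    """Non-adaptive rule: nodes placed by the prior N(0, sigma^2) only."""
    z, w = gauss_hermite(nq)
    total = sum(wj * math.exp(poisson_loglik(y, beta + math.sqrt(2.0) * sigma * zj))
                for zj, wj in zip(z, w))
    return math.log(total / math.sqrt(math.pi))


y, beta, sigma = [3, 5, 4, 6], 0.5, 1.0
for nq in (1, 3, 10):
    print(nq, log_marginal_adaptive(y, beta, sigma, nq), log_marginal_plain(y, beta, sigma, nq))
```

Here the posterior of $\gamma$ is much narrower than its prior. The uncentered rule therefore places its nodes according to the prior, away from where the integrand is concentrated, while the centered rule places them around $\widehat{\gamma}$. Accumulating such per-subject values over independent subjects corresponds to the accumulation that the preceding paragraph describes.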

As the number of random effects grows, the dimension of the integral increases accordingly. Nested random effects make this especially likely, and in that case the one-level quadrature approximation described earlier quickly becomes computationally infeasible. The following scenarios illustrate how the computational effort depends on the dimension of the random effects and on the number of quadrature nodes. Suppose that the A effect has four levels, and consider the following statements:

```sas
proc glimmix method=quad(qpoints=5);
   class A id;
   model y = / dist=negbin;
   random A / subject=id;
run;
```


For each subject, computing the marginal log likelihood requires the numerical evaluation of a four-dimensional integral. With the number of quadrature points set to five by the QPOINTS=5 option, each marginal log-likelihood evaluation therefore requires $5^4 = 625$ conditional log likelihoods to be computed for each observation on each pass through the data. As the number of quadrature points or the number of random effects increases, this becomes a sizable computational effort. Suppose, for example, that just one additional random effect, B, with two levels is added as an interaction, as in the following statements:

```sas
proc glimmix method=quad(qpoints=5);
   class A B id;
   model y = / dist=negbin;
   random A A*B / subject=id;
run;
```


The four levels of A and the eight levels of A*B now make the integral 12-dimensional, so a single marginal likelihood calculation requires $5^{12} = 244{,}140{,}625$ conditional log likelihoods for each observation on each pass through the data.

You can reduce the dimension of the random effects in the preceding PROC GLIMMIX code by factoring A out of the two random effects in the RANDOM statement, as shown in the following statements:

```sas
proc glimmix method=quad(qpoints=5);
   class A B id;
   model y = / dist=negbin;
   random int B / subject=id*A;
run;
```


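With the random effects int and B, the integral for each id*A subject is three-dimensional (one intercept plus two levels of B). The preceding PROC GLIMMIX code therefore requires the evaluation of only $5^3 = 125$ conditional log likelihoods for each observation on each pass through the data.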

This idea of reducing the dimension of random effects is the key to the multilevel adaptive Gaussian quadrature algorithm described in Pinheiro and Chao (2006). By exploiting the sparseness in the random-effects design matrix $\mathbf{Z}$, the multilevel quadrature algorithm reduces the dimension of the random effects to the sum of the dimensions of the random effects from each level. You can use the FASTQUAD suboption in the METHOD= QUAD option to prompt PROC GLIMMIX to compute this multilevel quadrature approximation.

To see the effect of the FASTQUAD option, consider the following model for the preceding example:

```sas
proc glimmix method=quad(qpoints=5);
   class A B id;
   model y = / dist=negbin;
   random A A*B B / subject=id;
run;
```


In this case, it is not possible to factor a single SUBJECT= variable out of all the random effects. Formulated in this one-level way, the integral for each subject has $4 + 8 + 2 = 14$ dimensions. A single evaluation of the marginal likelihood therefore requires computing $5^{14} = 6{,}103{,}515{,}625$ conditional log likelihoods for each observation on each pass through the data.

Alternatively, to take advantage of the multilevel quadrature approximation, you need to use the FASTQUAD option and explicitly specify the two-level structure by including one RANDOM statement for each level:

```sas
proc glimmix method=quad(qpoints=5 fastquad);
   class A B id;
   model y = / dist=negbin;
   random B     / subject=id;
   random int B / subject=id*A;
run;
```
run;


The first RANDOM statement specifies the random effect B for the level that corresponds to id. The second RANDOM statement specifies the random effects int and B for the level that corresponds to id*A. With this specification, the multilevel quadrature approximation computes only $5^5 = 3{,}125$ conditional log likelihoods for each observation on each pass through the data. The exponent $5 = 2 + 3$ is the sum of the numbers of random effects in the two RANDOM statements.
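
The evaluation counts in these examples follow one rule: $N_q$ raised to the number of random effects that the quadrature integrates over. The following Python sketch reproduces that arithmetic. As in the text, it assumes that A has four levels and B has two. The helper names are hypothetical. The function encodes only the counting rule stated in this section and says nothing about how PROC GLIMMIX organizes its computations.

```python
from math import prod

LEVELS = {"A": 4, "B": 2}  # numbers of levels assumed in the examples above


def n_effects(term):
    """Number of random effects contributed by one term of a RANDOM statement."""
    if term == "int":
        return 1
    return prod(LEVELS[factor] for factor in term.split("*"))


def evals_per_obs(qpoints, statements):
    """qpoints ** (total number of random effects over the given RANDOM statements)."""
    return qpoints ** sum(n_effects(t) for stmt in statements for t in stmt)


examples = {
    "random A / subject=id": [["A"]],
    "random A A*B / subject=id": [["A", "A*B"]],
    "random int B / subject=id*A": [["int", "B"]],
    "random A A*B B / subject=id": [["A", "A*B", "B"]],
    "FASTQUAD: random B; random int B": [["B"], ["int", "B"]],
}
for label, stmts in examples.items():
    print(f"{label:36s} {evals_per_obs(5, stmts):>15,}")
```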

In general, consider a two-level model in which $m$ level-2 units are nested within each level-1 unit. The marginal likelihood is then an integral over $r_1$ level-1 random effects and $r_2$ level-2 random effects. The one-level $N_q$-point adaptive quadrature approximation to it requires $N_q^{\,r_1 + m r_2}$ evaluations of the conditional log likelihoods for each observation. However, the two-level adaptive quadrature approximation requires only $N_q^{\,r_1 + r_2}$ evaluations of the conditional log likelihoods. Because the cost grows exponentially with $r_1 + r_2$ instead of with $r_1 + m r_2$, the multilevel quadrature algorithm significantly reduces the computational and memory requirements. In the preceding example, $m = 4$, $r_1 = 2$, and $r_2 = 3$, so the exponent drops from $2 + 4 \cdot 3 = 14$ to $2 + 3 = 5$.
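
A non-adaptive special case shows why the nested formulation is cheaper. With fixed Gauss-Hermite nodes, the sum over the full $(1 + m)$-dimensional product grid factors into an outer sum over the level-1 node. Inside that outer sum is a product of $m$ independent one-dimensional sums, one per level-2 unit. The Python sketch below assumes a hypothetical Poisson model with $r_1 = r_2 = 1$ (a level-1 and a level-2 random intercept) and made-up data. It omits the per-level centering and scaling that the algorithm of Pinheiro and Chao (2006) uses. It therefore illustrates only the factorization and the evaluation counts, not the FASTQUAD implementation.

```python
import math
from itertools import product


def gauss_hermite(n, tol=3e-14, max_iter=100):
    """Nodes/weights for int g(x) exp(-x^2) dx (Numerical Recipes "gauher" scheme)."""
    nodes, weights = [0.0] * n, [0.0] * n
    pim4 = math.pi ** -0.25
    z = pp = 0.0
    for i in range((n + 1) // 2):
        if i == 0:
            z = math.sqrt(2 * n + 1) - 1.85575 * (2 * n + 1) ** (-1 / 6)
        elif i == 1:
            z -= 1.14 * n ** 0.426 / z
        elif i == 2:
            z = 1.86 * z - 0.86 * nodes[0]
        elif i == 3:
            z = 1.91 * z - 0.91 * nodes[1]
        else:
            z = 2.0 * z - nodes[i - 2]
        for _ in range(max_iter):
            p1, p2 = pim4, 0.0
            for j in range(1, n + 1):
                p3, p2 = p2, p1
                p1 = z * math.sqrt(2.0 / j) * p2 - math.sqrt((j - 1) / j) * p3
            pp = math.sqrt(2.0 * n) * p2
            z_old, z = z, z - p1 / pp
            if abs(z - z_old) <= tol:
                break
        nodes[i], nodes[n - 1 - i] = z, -z
        weights[i] = weights[n - 1 - i] = 2.0 / (pp * pp)
    return nodes, weights


def poisson_loglik(ys, eta):
    return sum(v * eta - math.exp(eta) - math.lgamma(v + 1) for v in ys)


def one_level(data, beta, s1, s2, nq):
    """Product rule over all 1 + m random effects: nq ** (1 + m) grid points."""
    z, w = gauss_hermite(nq)
    c, m = math.sqrt(2.0), len(data)
    total, evals = 0.0, 0
    for idx in product(range(nq), repeat=1 + m):
        g1 = c * s1 * z[idx[0]]
        weight, loglik = w[idx[0]], 0.0
        for k, ys in enumerate(data):
            weight *= w[idx[k + 1]]
            loglik += poisson_loglik(ys, beta + g1 + c * s2 * z[idx[k + 1]])
            evals += len(ys)  # one conditional log likelihood per observation
        total += weight * math.exp(loglik)
    return total / math.pi ** ((1 + m) / 2), evals


def two_level(data, beta, s1, s2, nq):
    """Nested rule: for each level-1 node, one inner nq-point rule per level-2 unit."""
    z, w = gauss_hermite(nq)
    c = math.sqrt(2.0)
    total, evals = 0.0, 0
    for j0 in range(nq):
        g1 = c * s1 * z[j0]
        outer = w[j0]
        for ys in data:
            inner = 0.0
            for j in range(nq):
                inner += w[j] * math.exp(poisson_loglik(ys, beta + g1 + c * s2 * z[j]))
                evals += len(ys)
            outer *= inner / math.sqrt(math.pi)
        total += outer
    return total / math.sqrt(math.pi), evals


data = [[2, 3], [1, 0, 2], [4]]  # hypothetical counts for m = 3 level-2 units
p1, e1 = one_level(data, 0.3, 0.8, 0.5, 4)
p2, e2 = two_level(data, 0.3, 0.8, 0.5, 4)
print(p1, p2, e1 // 6, e2 // 6)  # evaluations per observation (6 observations)
```

Both functions return the same likelihood up to rounding. Per observation, however, the product rule evaluates the conditional log likelihood $N_q^{1+m} = 4^4 = 256$ times, while the nested rule needs only $N_q^{2} = 16$.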

proc glimmix method=quad(qpoints=1);