The NLMIXED Procedure

Integral Approximations

Subsections:

Adaptive Gaussian Quadrature
First-Order Method

An important part of the marginal maximum likelihood method described previously is the computation of the integral over the random effects. The default method in PROC NLMIXED for computing this integral is adaptive Gaussian quadrature as described in Pinheiro and Bates (1995). Another approximation method is the first-order method of Beal and Sheiner (1982, 1988). A description of these two methods follows.

Adaptive Gaussian Quadrature

A quadrature method approximates a given integral by a weighted sum over predefined abscissas for the random effects. A good approximation can usually be obtained with an adequate number of quadrature points as well as appropriate centering and scaling of the abscissas. Adaptive Gaussian quadrature for the integral over $\mb {u}_ i$ centers the integral at the empirical Bayes estimate of $\mb {u}_ i$ , defined as the vector $\widehat{\mb {u}}_ i$ that minimizes

$-\log \left[ p(\mb {y}_ i | \mb {X}_ i,\bphi , \mb {u}_ i) q(\mb {u}_ i | \bxi ) \right]$

with $\bphi$ and $\bxi$ set equal to their current estimates. The final Hessian matrix from this optimization can be used to scale the quadrature abscissas.

Suppose $(z_ j, w_ j; j=1,\ldots ,p)$ denote the standard Gauss-Hermite abscissas and weights (Golub and Welsch 1969, or Table 25.10 of Abramowitz and Stegun 1972). The adaptive Gaussian quadrature integral approximation is as follows:

$\begin{align*} \lefteqn{ \int p(\mb {y}_ i | \mb {X}_ i, \bphi , \mb {u}_ i) q(\mb {u}_ i | \bxi ) d \mb {u}_ i \approx } \\ & 2^{r/2} \left| \bGamma (\mb {X}_ i,\btheta ) \right|^{-1/2} \sum _{j_1=1}^ p \cdots \sum _{j_ r=1}^ p \left[ p(\mb {y}_ i | \mb {X}_ i, \bphi , \mb {a}_{j_1,\ldots ,j_ r}) q(\mb {a}_{j_1,\ldots ,j_ r} | \bxi ) \prod _{k=1}^ r w_{j_ k} \exp {z^2_{j_ k}} \right] \\ \end{align*}$

where r is the dimension of $u_ i$ , $\bGamma (\mb {X}_ i,\btheta )$ is the Hessian matrix from the empirical Bayes minimization, $\mb {z}_{j_1,\ldots ,j_ r}$ is a vector with elements $(z_{j_1},\ldots ,z_{j_ r})$ , and

$\mb {a}_{j_1,\ldots ,j_ r} = \widehat{\mb {u}}_ i + 2^{1/2} \bGamma (\mb {X}_ i,\btheta )^{-1/2} \mb {z}_{j_1,\ldots ,j_ r}$

PROC NLMIXED selects the number of quadrature points adaptively by evaluating the log-likelihood function at the starting values of the parameters until two successive evaluations have a relative difference less than the value of the QTOL= option. The specific search sequence is described under the QFAC= option. Using the QPOINTS= option, you can adjust the number of quadrature points p to obtain different levels of accuracy. Setting p = 1 results in the Laplacian approximation as described in: Beal and Sheiner (1992); Wolfinger (1993); Vonesh (1992, 1996); Vonesh and Chinchilli (1997); Wolfinger and Lin (1997).

The NOAD option in the PROC NLMIXED statement requests nonadaptive Gaussian quadrature. Here all $\widehat{\mb {u}}_ i$ are set equal to zero, and the Cholesky root of the estimated variance matrix of the random effects is substituted for $\bGamma (\mb {X}_ i,\btheta )^{-1/2}$ in the preceding expression for $\mb {a}_{j_1,\ldots ,j_ r}$ . In this case derivatives are computed using the algorithm of Smith (1995). The NOADSCALE option requests the same scaling substitution but with the empirical Bayes $\widehat{\mb {u}}_ i$ .

PROC NLMIXED computes the derivatives of the adaptive Gaussian quadrature approximation when carrying out the default dual quasi-Newton optimization.

First-Order Method

Another integral approximation available in PROC NLMIXED is the first-order method of Beal and Sheiner (1982, 1988) and Sheiner and Beal (1985). This approximation is used only in the case where $p(\mb {y}_ i | \mb {X}_ i, \bphi , \mb {u}_ i)$ is normal—that is,

$\begin{align*} p(\mb {y}_ i | \mb {X}_ i, \bphi , \mb {u}_ i) & = (2 \pi )^{-n_ i/2} \left| \mb {R}_ i(\mb {X}_ i,\bphi ) \right| ^{-1/2} \\ & \exp \left\{ -(1/2) \left[ \mb {y}_ i - \mb {m}_ i(\mb {X}_ i,\bphi ,\mb {u}_ i) \right]^\prime \mb {R}_ i(\mb {X}_ i,\bphi )^{-1} \left[ \mb {y}_ i - \mb {m}_ i(\mb {X}_ i,\bphi ,\mb {u}_ i) \right] \right\} \end{align*}$

where $n_ i$ is the dimension of $\mb {y}_ i$ , $\mb {R}_ i$ is a diagonal variance matrix, and $\mb {m}_ i$ is the conditional mean vector of $\mb {y}_ i$ .

The first-order approximation is obtained by expanding $\mb {m}_ i(\mb {X}_ i, \bphi , \mb {u}_ i)$ with a one-term Taylor series expansion about $\mb {u}_ i = \mb {0}$ , resulting in the approximation

$\begin{align*} p(\mb {y}_ i | \mb {X}_ i, \bphi , \mb {u}_ i) & \approx (2 \pi )^{-n_ i/2} \left| \mb {R}_ i(\mb {X}_ i,\bphi ) \right| ^{-1/2} \\ & \exp \left( -(1/2) \left[ \mb {y}_ i - \mb {m}_ i(\mb {X}_ i,\bphi ,\mb {0}) - \mb {Z}_ i(\mb {X}_ i,\bphi ) \mb {u}_ i \right]^\prime \right. \\ & \left. \vphantom {-(1/2) \left[ \mb {y}_ i - \mb {m}_ i(\mb {X}_ i,\bphi ,\mb {0}) - \mb {Z}_ i(\mb {X}_ i,\bphi ) \mb {u}_ i \right]^\prime }\mb {R}_ i(\mb {X}_ i,\bphi )^{-1} \left[ \mb {y}_ i - \mb {m}_ i(\mb {X}_ i,\bphi ,\mb {0}) - \mb {Z}_ i(\mb {X}_ i,\bphi ) \mb {u}_ i \right] \right) \end{align*}$

where $\mb {Z}_ i(\mb {X}_ i,\bphi )$ is the Jacobian matrix $\partial \mb {m}_ i(\mb {X}_ i,\bphi ,\mb {u}_ i) / \partial \mb {u}_ i$ evaluated at $\mb {u}_ i = \mb {0}$ .

Assuming that $q(\mb {u}_ i | \bxi )$ is normal with mean $\mb {0}$ and variance matrix $\mb {G}(\bxi )$ , the first-order integral approximation is computable in closed form after completing the square:

$\begin{align*} \int p(\mb {y}_ i | \mb {X}_ i, \bphi , \mb {u}_ i) q(\mb {u}_ i | \bxi )\, d \mb {u}_ i & \approx (2 \pi )^{-n_ i/2} \left| \mb {V}_ i(\mb {X}_ i,\btheta ) \right|^{-1/2} \\ & \exp \left( -(1/2) \left[ \mb {y}_ i - \mb {m}_ i(\mb {X}_ i,\bphi ,\mb {0}) \right]^\prime \mb {V}_ i(\mb {X}_ i,\btheta )^{-1} \left[ \mb {y}_ i - \mb {m}_ i(\mb {X}_ i,\bphi ,\mb {0}) \right] \right) \end{align*}$

where $\mb {V}_ i(\mb {X}_ i,\btheta ) = \mb {Z}_ i(\mb {X}_ i,\bphi ) \mb {G}(\bxi ) \mb {Z}_ i(\mb {X}_ i,\bphi )^\prime + \mb {R}_ i(\mb {X}_ i,\bphi )$ . The resulting approximation for $f(\btheta )$ is then minimized over $\btheta = [\bphi , \bxi ]$ to obtain the first-order estimates. PROC NLMIXED uses finite-difference derivatives of the first-order integral approximation when carrying out the default dual quasi-Newton optimization.