The NLMIXED Procedure

Modeling Assumptions and Notation

PROC NLMIXED operates under the following general framework for nonlinear mixed models. Assume that you have an observed data vector $\mb{y}_ i$ for each of $s$ subjects, $i=1,\ldots ,s$. The $\mb{y}_ i$ are assumed to be independent across $i$, but within-subject covariance is likely to exist because the elements of $\mb{y}_ i$ are measured on the same subject. As a statistical mechanism for modeling this within-subject covariance, assume that there exist latent random-effect vectors $\mb{u}_ i$ of small dimension (typically one or two) that are also independent across $i$. Assume also that an appropriate model linking $\mb{y}_ i$ and $\mb{u}_ i$ exists, leading to the joint probability density function

\[ p(\mb{y}_ i | \mb{X}_ i, \bphi , \mb{u}_ i) q(\mb{u}_ i | \bxi ) \]

where $\mb{X}_ i$ is a matrix of observed explanatory variables and $\bphi $ and $\bxi $ are vectors of unknown parameters.

Let $\btheta = [\bphi , \bxi ]$ and assume that it is of dimension n. Then inferences about $\btheta $ are based on the marginal likelihood function

\[ m(\btheta ) = \prod _{i=1}^ s \int p(\mb{y}_ i | \mb{X}_ i, \bphi , \mb{u}_ i) q(\mb{u}_ i | \bxi ) d \mb{u}_ i \]

In particular, the function

\[ f(\btheta ) = - \log m(\btheta ) \]

is minimized over $\btheta $ numerically in order to estimate $\btheta $, and the inverse Hessian (second derivative) matrix at the estimates provides an approximate variance-covariance matrix for the estimate of $\btheta $. The function $f(\btheta )$ is referred to both as the negative log likelihood function and as the objective function for optimization.
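In symbols, writing $\hat{\btheta }$ for the resulting estimate (a notation used here only for exposition), the estimation step can be summarized as

\[ \hat{\btheta } = \arg \min _{\btheta } f(\btheta ), \qquad \widehat{\mathrm{Cov}}(\hat{\btheta }) \approx \left[ \nabla ^2 f(\hat{\btheta }) \right]^{-1} \]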

As an example of the preceding general framework, consider the nonlinear growth curve example in the section Getting Started: NLMIXED Procedure. Here, the conditional distribution $p(\mb{y}_ i | \mb{X}_ i, \bphi , u_ i)$ is normal with mean

\[ \frac{b_1 + u_{i1}}{1 + \exp [-(d_{ij} - b_2)/ b_3]} \]

and variance $\sigma ^2_ e$, where $d_{ij}$ is the time (in days) of the $j$th measurement on subject $i$; thus $\bphi = [b_1,b_2,b_3,\sigma ^2_ e]$. Also, $u_ i$ is a scalar and $q(u_ i | \xi )$ is normal with mean 0 and variance $\sigma ^2_ u$; thus $\xi = \sigma ^2_ u$.
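For reference, the following is a minimal PROC NLMIXED sketch of this model along the lines of the Getting Started example. The data set and variable names (tree, day, height) and the starting values in the PARMS statement are assumptions patterned on that example's setting, not a prescription:

   proc nlmixed data=tree;
      parms b1=190 b2=700 b3=350 s2u=1000 s2e=60;  /* starting values; adjust to your data */
      num = b1 + u1;                               /* asymptote b1 shifted by random effect u_i */
      den = 1 + exp(-(day - b2)/b3);               /* logistic growth denominator */
      model height ~ normal(num/den, s2e);         /* p(y_i | X_i, phi, u_i): conditional normal */
      random u1 ~ normal(0, s2u) subject=tree;     /* q(u_i | xi): u_i ~ N(0, sigma^2_u) */
   run;

The MODEL statement specifies the conditional distribution $p(\mb{y}_ i | \mb{X}_ i, \bphi , u_ i)$, and the RANDOM statement specifies $q(u_ i | \xi )$; PROC NLMIXED then minimizes the corresponding $f(\btheta )$ numerically.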

The following additional notation is also found in this chapter. The quantity $\btheta ^{(k)}$ refers to the parameter vector at the kth iteration, the vector $\mb{g}(\btheta )$ refers to the gradient vector $\nabla f(\btheta )$, and the matrix $\mb{H}(\btheta )$ refers to the Hessian $\nabla ^2 f(\btheta )$. Other symbols are used to denote various constants or option values.

Nested Multilevel Nonlinear Mixed Models

The general framework for nested multilevel nonlinear mixed models can be explained in the two-level case as follows. Let $\mb{y}_{j(i)}$ be the response vector observed on subject $j$ nested within subject $i$, where $j$ is commonly referred to as the second-level subject and $i$ as the first-level subject. There are $s$ first-level subjects, and the $i$th of them has $s_ i$ second-level subjects nested within it. For example, $\mb{y}_{j(i)}$ might contain the heights of the students in class $j$ of school $i$, where $j=1,\ldots ,s_ i$ for each $i$ and $i=1,\ldots ,s$. Suppose that there exist latent random-effect vectors $\mb{v}_{j(i)}$ and $\mb{v}_ i$ of small dimension for modeling the within-subject covariance. Assume also that an appropriate model linking $\mb{y}_{j(i)}$ and $(\mb{v}_{j(i)}, \mb{v}_ i)$ exists. If you use the notation $\mb{y}_ i = (\mb{y}_{1(i)},\ldots ,\mb{y}_{s_ i(i)})$, $\mb{u}_ i = (\mb{v}_ i,\mb{v}_{1(i)},\ldots ,\mb{v}_{s_ i(i)})$, and $\bxi = (\bxi _1,\bxi _2)$, then the joint density function in terms of the first-level subject can be expressed as

\[ p(\mb{y}_ i | \mb{X}_ i, \bphi , \mb{u}_ i) q(\mb{u}_ i | \bxi ) = \left( \prod _{j=1}^{s_ i} p(\mb{y}_{j(i)} | \mb{X}_ i,\bphi ,\mb{v}_ i,\mb{v}_{j(i)}) q_2(\mb{v}_{j(i)} | \bxi _2) \right) q_1(\mb{v}_ i | \bxi _1) \]

As in the previous section, with $\btheta = [\bphi , \bxi ]$, the marginal likelihood function is

\[ m(\btheta ) = \prod _{i=1}^ s \int p(\mb{y}_ i | \mb{X}_ i, \bphi , \mb{u}_ i) q(\mb{u}_ i | \bxi ) d \mb{u}_ i \]

Again, the function

\[ f(\btheta ) = - \log m(\btheta ) \]

is minimized over $\btheta $ numerically in order to estimate $\btheta $. Models that have more than two levels follow similar notation.
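As a sketch of the two-level case, the following PROC NLMIXED step uses two RANDOM statements whose SUBJECT= values are nested, in the spirit of the school/class example above. The data set name, the variable names (school, class, height), the simple linear mean, and the starting values are all illustrative assumptions:

   proc nlmixed data=heights;
      parms b0=150 s2v1=25 s2v2=10 s2e=40;               /* illustrative starting values */
      mean = b0 + v1 + v2;                               /* overall mean plus school and class effects */
      model height ~ normal(mean, s2e);                  /* p(y_j(i) | X_i, phi, v_i, v_j(i)) */
      random v1 ~ normal(0, s2v1) subject=school;        /* first-level effect v_i with density q1 */
      random v2 ~ normal(0, s2v2) subject=class(school); /* second-level effect v_j(i) with density q2 */
   run;

Here the first RANDOM statement specifies $q_1(\mb{v}_ i | \bxi _1)$ and the second specifies $q_2(\mb{v}_{j(i)} | \bxi _2)$, matching the factorization of the joint density given above.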