The GLIMMIX Procedure

Maximum Likelihood

The GLIMMIX procedure forms the log likelihoods of generalized linear models as

$L(\bmu ,\phi ;\mb {y}) = \sum _{i=1}^{n} f_ i \, l(\mu _ i,\phi ;y_ i,w_ i)$

where $l(\mu _ i,\phi ;y_ i,w_ i)$ is the log likelihood contribution of the ith observation with weight $w_ i$ and $f_ i$ is the value of the frequency variable. For the determination of $w_ i$ and $f_ i$ , see the WEIGHT and FREQ statements. The individual log likelihood contributions for the various distributions are as follows.

Beta: $\begin{align*} l(\mu _ i,\phi ;y_ i,w_ i) & = \log \left\{ \frac{\Gamma (\phi /w_ i)}{\Gamma (\mu \phi /w_ i)\Gamma ((1-\mu )\phi /w_ i)}\right\} \\ & \mbox{} + \, (\mu \phi /w_ i - 1)\log \{ y_ i\} \\ & \mbox{} + \, ((1-\mu )\phi /w_ i - 1)\log \{ 1-y_ i\} \end{align*}$

$\mr {Var}[Y] = \mu (1-\mu )/(1+\phi ), \, \phi > 0$ . See Ferrari and Cribari-Neto (2004).
Binary: $l(\mu _ i,\phi ;y_ i,w_ i) = w_ i(y_ i \log \{ \mu _ i\} + (1-y_ i)\log \{ 1-\mu _ i\} )$

$\mr {Var}[Y] = \mu (1-\mu ), \, \phi \equiv 1$ .
Binomial: $\begin{align*} l(\mu _ i,\phi ;y_ i,w_ i) & = w_ i \left( y_ i \log \{ \mu _ i\} + (n_ i - y_ i) \log \{ 1-\mu _ i\} \right) \\ & + w_ i \left( \log \{ \Gamma (n_ i+1)\} - \log \{ \Gamma (y_ i+1)\} - \log \{ \Gamma (n_ i-y_ i+1)\} \right) \end{align*}$

where $y_ i$ and $n_ i$ are the events and trials in the events/trials syntax, and $0 < \mu < 1$ . $\mr {Var}[Y/n] = \mu (1-\mu )/n, \, \phi \equiv 1$ .
Exponential: $l(\mu _ i,\phi ;y_ i,w_ i) = \left\{ \begin{array}{ll} -\log \{ \mu _ i\} - y_ i/\mu _ i & w_ i = 1 \\ w_ i\log \left\{ \frac{w_ iy_ i}{\mu _ i}\right\} - \frac{w_ iy_ i}{\mu _ i} - \log \{ y_ i \Gamma (w_ i)\} & w_ i \not= 1 \end{array} \right.$

$\mr {Var}[Y] = \mu ^2 , \, \phi \equiv 1$ .
Gamma: $l(\mu _ i,\phi ;y_ i,w_ i) = w_ i\phi \log \left\{ \frac{w_ iy_ i\phi }{\mu _ i}\right\} - \frac{w_ iy_ i\phi }{\mu _ i} - \log \{ y_ i\} - \log \left\{ \Gamma (w_ i\phi )\right\}$

$\mr {Var}[Y] = \phi \mu ^2, \, \phi > 0$ .
Geometric: $\begin{align*} l(\mu _ i,\phi ;y_ i,w_ i) & = y_ i \log \left\{ \frac{\mu _ i}{w_ i}\right\} - (y_ i + w_ i) \log \left\{ 1 + \frac{\mu _ i}{w_ i}\right\} \\ & + \log \left\{ \frac{\Gamma (y_ i + w_ i)}{\Gamma (w_ i) \Gamma (y_ i + 1)} \right\} \end{align*}$

$\mr {Var}[Y] = \mu + \mu ^2, \, \phi \equiv 1$ .
Inverse Gaussian: $l(\mu _ i,\phi ;y_ i,w_ i) = -\frac{1}{2} \left[ \frac{w_ i(y_ i-\mu _ i)^2}{y_ i\phi \mu _ i^2} + \log \left\{ \frac{\phi y_ i^3}{w_ i} \right\} + \log \{ 2\pi \} \right]$

$\mr {Var}[Y] = \phi \mu ^3, \, \phi > 0$ .
“Lognormal”: $l(\mu _ i,\phi ;\log \{ y_ i\} ,w_ i) = -\frac{1}{2} \left[ \frac{w_ i(\log \{ y_ i\} -\mu _ i)^2}{\phi } + \log \left\{ \frac{\phi }{w_ i}\right\} + \log \{ 2\pi \} \right]$

$\mr {Var}[\log \{ Y\} ] = \phi , \, \phi > 0$ .If you specify DIST=LOGNORMAL with response variable Y, the GLIMMIX procedure assumes that $\log \{ Y\} \sim N(\mu ,\sigma ^2)$ . Note that the preceding density is not the density of Y.
Multinomial: $l(\bmu _ i,\phi ;\mb {y}_ i,w_ i) = w_ i \sum _{j=1}^{J} y_{ij}\log \{ \mu _{ij}\}$

$\phi \equiv 1$ .
Negative Binomial: $\begin{align*} l(\mu _ i,\phi ;y_ i,w_ i) & = y_ i \log \left\{ \frac{k\mu _ i}{w_ i}\right\} - (y_ i + w_ i / k)\log \left\{ 1 + \frac{k\mu _ i}{w_ i}\right\} \\ & + \log \left\{ \frac{\Gamma (y_ i + w_ i/k)}{\Gamma (w_ i/k) \Gamma (y_ i + 1)} \right\} \end{align*}$

$\mr {Var}[Y] = \mu + k\mu ^2, \, k > 0, \phi \equiv 1$ . For a given k, the negative binomial distribution is a member of the exponential family. The parameter k is related to the scale of the data, because it is part of the variance function. However, it cannot be factored from the variance, as is the case with the $\phi$ parameter in many other distributions. The parameter k is designated as “Scale” in the “Parameter Estimates” table of the GLIMMIX procedure.
Normal (Gaussian): $l(\mu _ i,\phi ;y_ i,w_ i) = -\frac{1}{2} \left[ \frac{w_ i(y_ i-\mu _ i)^2}{\phi } + \log \left\{ \frac{\phi }{w_ i}\right\} + \log \{ 2\pi \} \right]$

$\mr {Var}[Y] = \phi , \, \phi > 0$ .
Poisson: $l(\mu _ i,\phi ;y_ i,w_ i) = w_ i (y_ i \log \{ \mu _ i\} - \mu _ i - \log \{ \Gamma (y_ i + 1)\} )$

$\mr {Var}[Y] = \mu , \, \phi \equiv 1$ .
Shifted T: $\begin{align*} z_ i & = -0.5\log \{ \phi /\sqrt {w_ i}\} + \log \left\{ \Gamma (0.5(\nu +1)\right\} \\ & \mbox{ } -\log \left\{ \Gamma (0.5\nu )\right\} - 0.5\times \log \left\{ \pi \nu \right\} \\ l(\mu _ i,\phi ;y_ i,w_ i) & = -(\nu /2+0.5) \log \left\{ 1+\frac{w_ i}{\nu } \frac{(y_ i-\mu _ i)^2}{\phi }\right\} + z_ i \end{align*}$

$\phi > 0, \nu > 0, \, \mr {Var}[Y] = \phi \nu /(\nu -2)$ .

Define the parameter vector for the generalized linear model as $\btheta = \bbeta$ , if $\phi \equiv 1$ , and as $\btheta = [\bbeta ’,\phi ]’$ otherwise. $\bbeta$ denotes the fixed-effects parameters in the linear predictor. For the negative binomial distribution, the relevant parameter vector is $\btheta = [\bbeta ’,k]’$ . The gradient and Hessian of the negative log likelihood are then

$\mb {g} = - \frac{\partial L(\btheta ;\mb {y})}{\partial \btheta } \quad \quad \quad \mb {H} = - \frac{\partial ^2 L(\btheta ;\mb {y})}{\partial \btheta \, \partial \btheta }$

The GLIMMIX procedure computes the gradient vector and Hessian matrix analytically, unless your programming statements involve functions whose derivatives are determined by finite differences. If the procedure is in scoring mode, $\bH$ is replaced by its expected value. PROC GLIMMIX is in scoring mode when the number n of SCORING=n iterations has not been exceeded and the optimization technique uses second derivatives, or when the Hessian is computed at convergence and the EXPHESSIAN option is in effect. Note that the objective function is the negative log likelihood when the GLIMMIX procedure fits a GLM model. The procedure performs a minimization problem in this case.

In models for independent data with known distribution, parameter estimates are obtained by the method of maximum likelihood. No parameters are profiled from the optimization. The default optimization technique for GLMs is the Newton-Raphson algorithm, except for Gaussian models with identity link, which do not require iterative model fitting. In the case of a Gaussian model, the scale parameter is estimated by restricted maximum likelihood, because this estimate is unbiased. The results from the GLIMMIX procedure agree with those from the GLM and REG procedure for such models. You can obtain the maximum likelihood estimate of the scale parameter with the NOREML option in the PROC GLIMMIX statement. To change the optimization algorithm, use the TECHNIQUE= option in the NLOPTIONS statement.

Standard errors of the parameter estimates are obtained from the inverse of the (observed or expected) second derivative matrix $\bH$ .