The Likelihood Function and Maximum-Likelihood Estimation

The log-likelihood function

$l(\theta , \phi ; y) = \log f(y ; \theta, \phi) = \frac{y \theta-b(\theta)}{a(\phi)} + c(y , \phi)$

can be expressed in terms of the mean $\mu$ and the dispersion parameter $\phi$:

Normal: $l(\mu, \phi; y) = -\frac{1}{2}\log(\phi) - \frac{1}{2\phi}(y-\mu)^2$ for $-\infty < y < \infty$
Inverse Gaussian: $l(\mu, \phi; y) = -\frac{1}{2}\log(y^3 \phi) - \frac{(y-\mu)^2}{2 y \mu^2 \phi}$ for $y > 0$
Gamma: $l(\mu, \phi; y) = -\log\left(y\,\Gamma\left(\frac{1}{\phi}\right)\right) + \frac{1}{\phi}\log\left(\frac{y}{\mu\phi}\right) - \frac{y}{\mu\phi}$ for $y > 0$
Poisson: $l(\mu, \phi; y) = y\log(\mu) - \mu$ for $y = 0, 1, 2, \ldots$
Binomial: $l(\mu, \phi; y) = r\log(\mu) + (m-r)\log(1-\mu)$ for $y = r/m$, $r = 0, 1, 2, \ldots, m$
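The log-likelihood contributions above translate directly into code. The following is a minimal Python sketch of four of them; the function names (`ll_normal`, `ll_gamma`, and so on) are hypothetical and not part of SAS/INSIGHT. Like the formulas, the functions omit the constant terms that do not affect estimation, and the gamma version uses `math.lgamma` for $\log\Gamma(1/\phi)$ to avoid overflow for small $\phi$.

```python
import math

# Per-observation log-likelihoods l(mu, phi; y), matching the formulas above
# (constant terms that do not affect estimation are omitted, as in the text).

def ll_normal(y, mu, phi):
    return -0.5 * math.log(phi) - (y - mu) ** 2 / (2 * phi)

def ll_gamma(y, mu, phi):
    # -log(y * Gamma(1/phi)) split as -log(y) - lgamma(1/phi)
    return (-math.log(y) - math.lgamma(1 / phi)
            + (1 / phi) * math.log(y / (mu * phi)) - y / (mu * phi))

def ll_poisson(y, mu):
    return y * math.log(mu) - mu

def ll_binomial(r, m, mu):
    # r successes out of m trials, so y = r / m
    return r * math.log(mu) + (m - r) * math.log(1 - mu)
```

For example, with $\phi = 1$ the gamma case reduces to the exponential distribution, and `ll_gamma(2.0, 2.0, 1.0)` equals $-\log 2 - 1$.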

Note	Some terms in the density function have been dropped in the log-likelihood function since they do not affect the estimation of the mean and scale parameters.

SAS/INSIGHT software uses a ridge-stabilized Newton-Raphson algorithm to maximize the log-likelihood function $l(\mu, \phi; y)$ with respect to the regression parameters $\beta$. On the $r$th iteration, the algorithm updates the parameter vector $b$ by

$b_{(r)} = b_{(r-1)} - H_{(r-1)}^{-1} u_{(r-1)}$

where $H$ is the Hessian matrix and $u$ is the gradient vector, both evaluated at $\beta = b_{(r-1)}$:

$H = ( h_{jk} ) = ( \frac{\partial^2l}{\partial \beta_{j} \partial \beta_{k}} )$

$u = ( u_{j} ) = ( \frac{\partial l}{\partial \beta_{j} } ).$
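To illustrate the update $b_{(r)} = b_{(r-1)} - H_{(r-1)}^{-1} u_{(r-1)}$ in the simplest setting, the sketch below applies plain Newton-Raphson (without the ridge stabilization that SAS/INSIGHT adds) to a hypothetical intercept-only Poisson model with log link, $\mu_i = e^{\beta}$. Here the gradient and Hessian are scalars, $u = \sum_i y_i - n e^{\beta}$ and $H = -n e^{\beta}$, and the maximum is known in closed form, $\hat\beta = \log(\bar y)$, which makes the result easy to check. The function name and starting value are assumptions for illustration.

```python
import math

def newton_poisson_intercept(y, b=0.0, tol=1e-12, max_iter=50):
    """Plain Newton-Raphson (no ridge stabilization) for an intercept-only
    Poisson model with log link: l(beta) = sum(y_i * beta - exp(beta))."""
    n, total = len(y), sum(y)
    for _ in range(max_iter):
        mu = math.exp(b)
        u = total - n * mu      # gradient dl/dbeta at beta = b
        h = -n * mu             # Hessian d2l/dbeta2 at beta = b (always < 0)
        step = u / h
        b = b - step            # b_(r) = b_(r-1) - H^{-1} u
        if abs(step) < tol:
            break
    return b
```

With `y = [1, 2, 4, 7, 12]` the iterations converge to $\log(5.2)$, the log of the sample mean.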

The Hessian matrix H can be expressed as

$H = -X' W_o X$

where $X$ is the design matrix and $W_o$ is a diagonal matrix with $i$th diagonal element

$w_{oi} = w_{ei} + (y_i - \mu_i) \frac{V_i g_i'' + V_i' g_i'}{V_i^2 (g_i')^3 a_i(\phi)}$

$w_{ei} = E( w_{oi}) = \frac{1}{a_{i}(\phi) V_{i} ({g_{i}'})^2 }$

where $g_i$ is the link function, $V_i$ is the variance function, and the primes denote derivatives of $g$ and $V$ with respect to $\mu$. All values are evaluated at the current mean estimate $\mu_i$, and $a_i(\phi) = \phi / w_i$, where $w_i$ is the prior weight for the $i$th observation.
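The two weight formulas can be computed per observation once the link and variance derivatives are known at $\mu_i$. The sketch below is a minimal illustration; the function name `glm_weights` and its argument names are hypothetical. It takes the derivative values as numbers rather than deriving them symbolically.

```python
def glm_weights(y, mu, g1, g2, V, V1, phi, prior_weight=1.0):
    """Return (w_o, w_e) for one observation.

    g1, g2: first and second derivatives of the link g at mu
    V, V1:  variance function and its first derivative at mu
    """
    a = phi / prior_weight                    # a_i(phi) = phi / w_i
    w_e = 1.0 / (a * V * g1 ** 2)             # expected weight E(w_o)
    w_o = w_e + (y - mu) * (V * g2 + V1 * g1) / (V ** 2 * g1 ** 3 * a)
    return w_o, w_e
```

Two quick checks with a Poisson response ($V = \mu$, $V' = 1$), $y = 3$, $\mu = 2$, $\phi = 1$: with the identity link ($g' = 1$, $g'' = 0$) the call returns $w_{oi} = 0.75$ and $w_{ei} = 0.5$, and $0.75 = y/\mu^2$ agrees with differentiating $y\log\mu - \mu$ twice directly. With the log link ($g' = 1/\mu$, $g'' = -1/\mu^2$) the correction term vanishes and $w_{oi} = w_{ei} = \mu = 2$, because the log link is canonical for the Poisson.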

SAS/INSIGHT software uses either the full Hessian matrix $H = -X' W_o X$ or the Fisher scoring method in maximum-likelihood estimation. In the Fisher scoring method, $W_o$ is replaced by its expected value $W_e$, whose $i$th diagonal element is $w_{ei}$:

$H = -X' W_e X$
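As a minimal sketch of Fisher scoring, the code below fits a hypothetical Poisson regression with log link, an intercept, and one covariate, assuming $\phi = 1$ and unit prior weights. With the log link, $w_{ei} = \mu_i$ and $u = X'(y - \mu)$, so each step solves $(X' W_e X)\,\delta = u$ and sets $b \leftarrow b + \delta$, which is the same as $b - H^{-1}u$ with $H = -X' W_e X$. The 2×2 system is solved in closed form to keep the example dependency-free; this is not SAS/INSIGHT's implementation and omits its ridge stabilization.

```python
import math

def fit_poisson_log(x, y, tol=1e-10, max_iter=50):
    """Fisher scoring for Poisson regression with log link (intercept + slope).

    Because the log link is canonical for the Poisson, w_oi == w_ei, so Fisher
    scoring and full-Hessian Newton-Raphson take identical steps here.
    Assumes phi = 1 and unit prior weights.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start at the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # gradient u = X'(y - mu)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # X' W_e X = [[a, c], [c, d]] with w_ei = mu_i; the Hessian is its negative
        a = sum(mu)
        c = sum(xi * mi for xi, mi in zip(x, mu))
        d = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = a * d - c * c
        # step = (X' W_e X)^{-1} u, i.e. -H^{-1} u
        step0 = (d * u0 - c * u1) / det
        step1 = (-c * u0 + a * u1) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1
```

At convergence the score equations $\sum_i (y_i - \mu_i) = 0$ and $\sum_i x_i (y_i - \mu_i) = 0$ hold to within rounding. For a noncanonical link, such as the identity link with Poisson data, $w_{oi}$ would include the $(y_i - \mu_i)$ correction term, and the two methods would take different steps.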

The estimated variance-covariance matrix of the parameter estimates is

$\hat{{{\Sigma}}} = - H^{-1}$

where H is the Hessian matrix evaluated at the model parameter estimates.
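A minimal sketch of $\hat\Sigma = -H^{-1}$: assuming the Hessian at the estimates is available as a small list-of-lists matrix (a hypothetical representation), the function below inverts $-H$ by Gauss-Jordan elimination with partial pivoting, using $(-H)^{-1} = -H^{-1}$. In practice a linear-algebra library would do this; the hand-written version only makes the step explicit.

```python
def covariance_from_hessian(H):
    """Return Sigma_hat = -H^{-1} by Gauss-Jordan elimination on [-H | I]."""
    n = len(H)
    # Augment -H with the identity; reducing the left half to I leaves (-H)^{-1}.
    A = [[-float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(H)]
    for col in range(n):
        # partial pivoting: bring the largest remaining entry into the pivot row
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        # eliminate this column from every other row
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]
```

For example, $H = \begin{pmatrix} -2 & -1 \\ -1 & -2 \end{pmatrix}$ gives $\hat\Sigma = \frac{1}{3}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$.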

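The estimated correlation matrix of the parameter estimates is obtained by scaling the estimated variance-covariance matrix so that its diagonal elements are 1; that is, its $(j, k)$ element is $\hat\sigma_{jk} / \sqrt{\hat\sigma_{jj}\hat\sigma_{kk}}$.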
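The scaling step is short enough to show directly. This sketch assumes the covariance matrix is a list of lists with positive diagonal entries; the function name is hypothetical.

```python
import math

def correlation_from_covariance(S):
    """Scale a covariance matrix to unit diagonal: r_jk = s_jk / sqrt(s_jj * s_kk)."""
    sd = [math.sqrt(S[j][j]) for j in range(len(S))]
    return [[S[j][k] / (sd[j] * sd[k]) for k in range(len(S))]
            for j in range(len(S))]
```

For instance, a covariance matrix with variances 4 and 9 and covariance 2 yields an off-diagonal correlation of $2 / (2 \cdot 3) = 1/3$.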

Note	A warning message appears when the specified model fails to converge. The output tables, graphs, and variables are based on the results from the last iteration.
