Introduction to Statistical Modeling with SAS/STAT Software

Maximum Likelihood Estimation

To estimate the parameters in a linear model with mean function $\mr{E}[\bY ] = \bX \bbeta$ by maximum likelihood, you need to specify the distribution of the response vector $\bY$. In the linear model with a continuous response variable, it is commonly assumed that the response is normally distributed. In that case, the estimation problem is completely defined by specifying the mean and variance of $\bY$ in addition to the normality assumption. The model can be written as $\bY \sim N(\bX \bbeta , \sigma ^2\bI )$, where the notation $N(\mb{a},\mb{V})$ indicates a multivariate normal distribution with mean vector $\mb{a}$ and variance matrix $\mb{V}$. The log likelihood for $\bY$ can then be written as

$l(\bbeta ,\sigma ^2; \mb{y}) = -\frac{n}{2}\log \{ 2\pi \} -\frac{n}{2}\log \{ \sigma ^2\} - \frac{1}{2\sigma ^2} \left(\mb{y}-\bX \bbeta \right)'\left(\mb{y}-\bX \bbeta \right)$

For any fixed value of $\sigma ^2$, this function is maximized in $\bbeta$ when the sum of squares $(\mb{y}-\bX \bbeta )'(\mb{y}-\bX \bbeta )$ is minimized. The maximum likelihood estimator of $\bbeta$ is thus identical to the ordinary least squares estimator. To maximize $l(\bbeta ,\sigma ^2; \mb{y})$ with respect to $\sigma ^2$, note that
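The equivalence between the maximum likelihood and ordinary least squares estimators of $\bbeta$ can be checked numerically. The following is a minimal sketch in Python with NumPy, not SAS code. The function name `normal_loglik`, the simulated data, the seed, and the fixed value of `sigma2` are all assumptions made for illustration. The sketch evaluates the log likelihood at the least squares solution and at a few randomly perturbed coefficient vectors.

```python
import numpy as np

def normal_loglik(beta, sigma2, y, X):
    """Log likelihood of y under the model Y ~ N(X beta, sigma2 * I)."""
    n = y.shape[0]
    resid = y - X @ beta
    return (-0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * n * np.log(sigma2)
            - 0.5 * (resid @ resid) / sigma2)

# Simulated data (illustrative values only, not from the documentation)
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Ordinary least squares solution
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Hold sigma2 fixed and compare the log likelihood at the OLS solution
# with its value at perturbed coefficient vectors
sigma2 = 0.25
ll_ols = normal_loglik(beta_ols, sigma2, y, X)
ll_perturbed = [normal_loglik(beta_ols + d, sigma2, y, X)
                for d in rng.normal(scale=0.1, size=(5, 2))]

print(ll_ols, max(ll_perturbed))
```

Because the last term of the log likelihood is the only one that involves $\bbeta$, any move away from the least squares solution increases the sum of squares and lowers the log likelihood, whatever positive value of `sigma2` you hold fixed.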

$\frac{\partial l(\bbeta ,\sigma ^2; \mb{y})}{\partial \sigma ^2} = -\frac{n}{2\sigma ^2} + \frac{1}{2\sigma ^4} \left(\mb{y}-\bX \bbeta \right)'\left(\mb{y}-\bX \bbeta \right)$

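Setting this derivative to zero, solving for $\sigma ^2$, and replacing $\bbeta$ with its maximum likelihood estimator $\widehat{\bbeta }$ shows that the MLE of $\sigma ^2$ is the estimator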

$\begin{align*} \widehat{\sigma }^2_M & = \frac{1}{n} \left(\bY -\bX \widehat{\bbeta }\right)' \left(\bY -\bX \widehat{\bbeta }\right) \\ & = \mr{SSR}/n \end{align*}$

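Here $\mr{SSR}$ denotes the residual sum of squares. This is a biased estimator of $\sigma ^2$. Its expected value is $\frac{n-p}{n}\sigma ^2$, where $p$ is the rank of $\bX$, so the bias $-p\sigma ^2/n$ shrinks toward zero as $n$ increases.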
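The variance estimator and its bias can also be illustrated with a short computation. The following is a minimal sketch in Python with NumPy, not SAS code. The simulated data, the seed, and the variable names are assumptions made for illustration, and `p` is used here for the number of columns of a full-rank $\bX$. The sketch computes the residual sum of squares, forms the MLE by dividing by $n$, checks that the derivative with respect to $\sigma ^2$ vanishes there, and compares the result with the familiar estimator that divides by $n-p$ instead (a standard result that is not part of the derivation above).

```python
import numpy as np

# Simulated data (illustrative values only, not from the documentation)
rng = np.random.default_rng(1)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=1.5, size=n)

# Least squares (equivalently, maximum likelihood) estimate of beta
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
ssr = resid @ resid  # residual sum of squares

# MLE of sigma^2 divides by n; the degrees-of-freedom version divides by n - p
sigma2_mle = ssr / n
sigma2_df = ssr / (n - p)

# Derivative of the log likelihood with respect to sigma^2, evaluated at the MLE
score = -n / (2.0 * sigma2_mle) + ssr / (2.0 * sigma2_mle**2)

print(sigma2_mle, sigma2_df, score)
```

The ratio of the two estimates is always $(n-p)/n$, so the MLE is systematically the smaller of the two, and the gap narrows as $n$ grows relative to $p$.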