Introduction to Statistical Modeling with SAS/STAT Software


Maximum Likelihood Estimation

To estimate the parameters in a linear model with mean function $\mr{E}[\bY ] = \bX \bbeta $ by maximum likelihood, you need to specify the distribution of the response vector $\bY $. In the linear model with a continuous response variable, it is commonly assumed that the response is normally distributed. In that case, the estimation problem is completely defined by specifying the mean and variance of $\bY $ in addition to the normality assumption. The model can be written as $\bY \sim N(\bX \bbeta , \sigma ^2\bI )$, where the notation $N(\mb{a},\mb{V})$ indicates a multivariate normal distribution with mean vector $\mb{a}$ and variance matrix $\mb{V}$. The log likelihood for $\bY $ can then be written as

\[ l(\bbeta ,\sigma ^2; \mb{y}) = -\frac{n}{2}\log \{ 2\pi \} -\frac{n}{2}\log \{ \sigma ^2\} - \frac{1}{2\sigma ^2} \left(\mb{y}-\bX \bbeta \right)'\left(\mb{y}-\bX \bbeta \right) \]
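As a concrete illustration, the following Python/NumPy sketch evaluates this log likelihood directly. The function name `normal_loglik` and the use of NumPy are choices made here for illustration; they are not part of SAS/STAT.

```python
import numpy as np

def normal_loglik(beta, sigma2, y, X):
    """Log likelihood of y ~ N(X beta, sigma2 * I), as in the display above."""
    n = len(y)
    resid = y - X @ beta
    return (-n / 2) * np.log(2 * np.pi) \
         - (n / 2) * np.log(sigma2) \
         - (resid @ resid) / (2 * sigma2)
```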

For any value of $\sigma ^2$, this function is maximized in $\bbeta $ when the sum of squares $(\mb{y}-\bX \bbeta )'(\mb{y}-\bX \bbeta )$ is minimized. The maximum likelihood estimator of $\bbeta $ is thus identical to the ordinary least squares estimator. To maximize $l(\bbeta ,\sigma ^2; \mb{y})$ with respect to $\sigma ^2$, note that

\[ \frac{\partial l(\bbeta ,\sigma ^2; \mb{y})}{\partial \sigma ^2} = -\frac{n}{2\sigma ^2} + \frac{1}{2\sigma ^4} \left(\mb{y}-\bX \bbeta \right)'\left(\mb{y}-\bX \bbeta \right) \]

Setting this derivative to zero and solving for $\sigma ^2$ shows that the MLE of $\sigma ^2$ is the estimator

\begin{align*} \widehat{\sigma }^2_ M & = \frac{1}{n} \left(\bY -\bX \widehat{\bbeta }\right)' \left(\bY -\bX \widehat{\bbeta }\right) \\ & = \mr{SSR}/n \end{align*}

This is a biased estimator of $\sigma ^2$: its expected value is $(n-\mr{rank}(\bX ))\sigma ^2/n$, so the bias decreases as $n$ increases.
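To make the bias concrete, here is a small Python/NumPy sketch under assumed inputs (the names `mle_fit`, `beta_true`, and the simulated design are illustrative, not part of SAS/STAT). It computes $\widehat{\bbeta }$ by least squares, forms $\widehat{\sigma }^2_ M = \mr{SSR}/n$, and averages that estimate over repeated samples; the average lands near $(n-\mr{rank}(\bX ))\sigma ^2/n$ rather than $\sigma ^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a fixed design for y ~ N(X beta, sigma2 * I).
n, k = 50, 3                      # sample size and number of columns of X
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])   # illustrative values, not from the text
sigma2_true = 4.0

def mle_fit(y, X):
    """ML estimates: beta_hat by least squares, sigma2_hat = SSR / n."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    return beta_hat, (resid @ resid) / len(y)

# Average sigma2_hat over many simulated responses; with X of full rank k,
# E[sigma2_hat] = (n - k) * sigma2 / n, which falls short of sigma2_true.
sims = [mle_fit(X @ beta_true + rng.normal(scale=np.sqrt(sigma2_true), size=n), X)[1]
        for _ in range(2000)]
print(np.mean(sims))                   # close to the theoretical mean below
print((n - k) / n * sigma2_true)       # (50 - 3) / 50 * 4.0 = 3.76
```

Dividing $\mr{SSR}$ by $n - \mr{rank}(\bX )$ instead of $n$ removes this bias, which is why that divisor is the conventional choice in least squares estimation.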