Maximum Likelihood Estimation

To estimate the parameters in a linear model with mean function $\mr {E}[\bY ] = \bX \bbeta $ by maximum likelihood, you need to specify the distribution of the response vector $\bY $. In the linear model with a continuous response variable, it is commonly assumed that the response is normally distributed. In that case, the estimation problem is completely defined by specifying the mean and variance of $\bY $ in addition to the normality assumption. The model can be written as $\bY \sim N(\bX \bbeta , \sigma ^2\bI )$, where the notation $N(\mb {a},\mb {V})$ indicates a multivariate normal distribution with mean vector $\mb {a}$ and variance matrix $\mb {V}$. The log likelihood for $\bY $ can then be written as

\[  l(\bbeta ,\sigma ^2; \mb {y}) = -\frac{n}{2}\log \{ 2\pi \}  -\frac{n}{2}\log \{ \sigma ^2\}  - \frac{1}{2\sigma ^2} \left(\mb {y}-\bX \bbeta \right)'\left(\mb {y}-\bX \bbeta \right)  \]
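
As a quick numerical check, the following is a minimal Python sketch that evaluates this log likelihood term by term; the function name `gaussian_loglik` and the use of NumPy are illustrative assumptions, not part of the model specification.

```python
import numpy as np

def gaussian_loglik(beta, sigma2, X, y):
    """Evaluate l(beta, sigma2; y) for the model y ~ N(X beta, sigma2 * I).

    Mirrors the display above: the constant term, the log-variance term,
    and the scaled sum of squares of the residuals y - X beta.
    """
    n = len(y)
    resid = y - X @ beta
    return (-0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * n * np.log(sigma2)
            - 0.5 * (resid @ resid) / sigma2)
```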

This function is maximized in $\bbeta $ when the sum of squares $(\mb {y}-\bX \bbeta )'(\mb {y}-\bX \bbeta )$ is minimized. The maximum likelihood estimator of $\bbeta $ is thus identical to the ordinary least squares estimator: provided $\bX '\bX $ is nonsingular, it is

\[  \widehat{\bbeta } = \left(\bX '\bX \right)^{-1}\bX '\mb {y}  \]

To maximize $l(\bbeta ,\sigma ^2; \mb {y})$ with respect to $\sigma ^2$, note that

\[  \frac{\partial l(\bbeta ,\sigma ^2; \mb {y})}{\partial \sigma ^2} = -\frac{n}{2\sigma ^2} + \frac{1}{2\sigma ^4} \left(\mb {y}-\bX \bbeta \right)'\left(\mb {y}-\bX \bbeta \right)  \]

Setting this derivative to zero and solving for $\sigma ^2$ shows that the MLE of $\sigma ^2$ is the estimator

\[  \widehat{\sigma }^2_ M = \frac{1}{n} \left(\bY -\bX \widehat{\bbeta }\right)'\left(\bY -\bX \widehat{\bbeta }\right) = \mr {SSR}/n  \]

This is a biased estimator of $\sigma ^2$: because $\mr {E}[\widehat{\sigma }^2_ M] = \sigma ^2\left(n - \mr {rank}(\bX )\right)/n$, the bias decreases as $n$ increases. Dividing $\mr {SSR}$ by $n - \mr {rank}(\bX )$ rather than $n$ yields the customary unbiased estimator of $\sigma ^2$.
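
To make the comparison concrete, here is a self-contained Python sketch on simulated data; the sample size, design matrix, and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(12345)       # hypothetical simulated data
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# The ML estimator of beta coincides with ordinary least squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta_hat
ssr = resid @ resid                      # SSR = (y - X beta_hat)'(y - X beta_hat)

sigma2_mle = ssr / n                                  # biased ML estimator: SSR / n
sigma2_unb = ssr / (n - np.linalg.matrix_rank(X))     # customary unbiased estimator

print(sigma2_mle, sigma2_unb)  # the MLE is smaller by the factor (n - rank(X)) / n
```

Because $\mr {rank}(\bX )$ stays fixed while $n$ grows, the two estimates converge, which is the sense in which the bias vanishes with increasing sample size.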