Robust Regression Examples

Combining Robust Residual and Robust Distance

This section is based entirely on Rousseeuw and Van Zomeren (1990). Observations x_i that are far away from most of the other observations are called leverage points. One classical method inspects the Mahalanobis distances md_i to find outliers x_i:

md_i = \sqrt{(x_i - \bar{x}) \, c^{-1} (x_i - \bar{x})^t}
where \bar{x} is the sample mean and c is the classical sample covariance matrix.
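As a concrete illustration, here is a minimal Python sketch of this computation using NumPy; the names mahalanobis_distances, xbar, and c are chosen to mirror the notation above and are not part of the original text:

    import numpy as np

    def mahalanobis_distances(x):
        """Classical Mahalanobis distance md_i of each row of x from the
        sample mean xbar, using the classical sample covariance matrix c."""
        xbar = x.mean(axis=0)                 # sample mean
        c = np.cov(x, rowvar=False)           # classical sample covariance matrix
        c_inv = np.linalg.inv(c)
        diff = x - xbar                       # rows are (x_i - xbar)
        # md_i = sqrt((x_i - xbar) c^{-1} (x_i - xbar)^t)
        return np.sqrt(np.einsum("ij,jk,ik->i", diff, c_inv, diff))

    rng = np.random.default_rng(0)
    x = rng.normal(size=(20, 2))              # 20 bivariate observations
    print(mahalanobis_distances(x))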

Note that the MVE subroutine prints the classical Mahalanobis distances md_i together with the robust distances rd_i; the rd_i are computed in the same way as the md_i but use the robust location and robust scatter matrix estimated by MVE in place of the sample mean and classical covariance matrix. In classical linear regression, the diagonal elements h_{ii} of the hat matrix

h = x(x^t x)^{-1} x^t
are used to identify leverage points. Rousseeuw and Van Zomeren (1990) report the following monotone relationship between the h_{ii} and md_i:
h_{ii} = \frac{(md_i)^2}{n-1} + \frac{1}{n}
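The following Python sketch verifies this relationship numerically. It assumes a regression model with an intercept term (the design matrix x is augmented with a column of ones), which is the setting in which the identity holds; all names are illustrative:

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 25, 3
    x = rng.normal(size=(n, p))               # regressor data (no intercept column)

    # hat matrix h = X(X^t X)^{-1} X^t for the design matrix with intercept
    X = np.column_stack([np.ones(n), x])
    h = X @ np.linalg.inv(X.T @ X) @ X.T
    h_ii = np.diag(h)

    # classical Mahalanobis distances of the rows of x
    xbar = x.mean(axis=0)
    c_inv = np.linalg.inv(np.cov(x, rowvar=False))
    diff = x - xbar
    md = np.sqrt(np.einsum("ij,jk,ik->i", diff, c_inv, diff))

    # h_ii = (md_i)^2/(n-1) + 1/n
    print(np.allclose(h_ii, md**2 / (n - 1) + 1.0 / n))   # True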
They point out that neither the md_i nor the h_{ii} are entirely reliable for detecting leverage points: multiple outliers do not necessarily have large md_i values because of the masking effect.

The definition of a leverage point is, therefore, based entirely on the outlyingness of x_i and is not related to the response value y_i. By including the y_i value in the definition, Rousseeuw and Van Zomeren (1990) distinguish between the following:

Good leverage points are points (x_i, y_i) that are close to the regression plane; good leverage points improve the precision of the regression coefficients.

Bad leverage points are points (x_i, y_i) that are far from the regression plane; bad leverage points reduce the precision of the regression coefficients.

Rousseeuw and Van Zomeren (1990) propose plotting the standardized residuals of robust regression (LMS or LTS) against the robust distances rd_i obtained from MVE. Two horizontal lines corresponding to residual values of +2.5 and -2.5 are useful for distinguishing between small and large residuals, and one vertical line corresponding to \sqrt{\chi^2_{p,0.975}}, where p is the number of regressor variables, is used to distinguish between small and large distances.
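A sketch of such a diagnostic plot in Python is shown below. Since MVE and LMS are SAS/IML subroutines, this sketch substitutes scikit-learn's MinCovDet (an MCD estimator) for MVE and HuberRegressor (an M-estimator) for LMS; these are stand-ins under that assumption, not the estimators used in the text:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import chi2
    from sklearn.covariance import MinCovDet
    from sklearn.linear_model import HuberRegressor

    rng = np.random.default_rng(2)
    n, p = 50, 2
    x = rng.normal(size=(n, p))
    y = x @ np.array([1.0, -2.0]) + rng.normal(scale=0.5, size=n)
    x[:3] += 6.0                                # plant a few bad leverage points

    # robust distances rd_i (MCD used here as a stand-in for MVE)
    mcd = MinCovDet(random_state=0).fit(x)
    rd = np.sqrt(mcd.mahalanobis(x))            # mahalanobis() returns squared distances

    # standardized robust residuals (Huber M-estimator as a stand-in for LMS)
    reg = HuberRegressor().fit(x, y)
    res = (y - reg.predict(x)) / reg.scale_

    plt.scatter(rd, res)
    plt.axhline(2.5)                            # residual cutoff +2.5
    plt.axhline(-2.5)                           # residual cutoff -2.5
    plt.axvline(np.sqrt(chi2.ppf(0.975, df=p))) # distance cutoff sqrt(chi^2_{p,0.975})
    plt.xlabel("robust distance rd_i")
    plt.ylabel("standardized robust residual")
    plt.show()

Points beyond the vertical line but inside the horizontal band correspond to good leverage points; points beyond both cutoffs correspond to bad leverage points.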


Example 9.6: Hawkins-Bradu-Kass Data

Example 9.7: Stackloss Data
