The ROBUSTREG Procedure

Overview: ROBUSTREG Procedure

The main purpose of robust regression is to detect outliers and to provide resistant (stable) results in their presence. To achieve this stability, robust regression limits the influence of outliers. Historically, robust regression techniques have addressed three classes of problems:

problems with outliers in the Y direction (response direction)
problems with multivariate outliers in the X space (that is, outliers in the covariate space, which are also referred to as leverage points)
problems with outliers in both the Y direction and the X space

Many methods have been developed in response to these problems. However, in statistical applications of outlier detection and robust regression, the methods that are most commonly used today are Huber M estimation, high breakdown value estimation, and combinations of these two methods. The ROBUSTREG procedure provides four such methods: M estimation, LTS estimation, S estimation, and MM estimation.

M estimation, introduced by Huber (1973), is the simplest approach both computationally and theoretically. Although it is not robust with respect to leverage points, it is still used extensively in data analysis when contamination can be assumed to be mainly in the response direction.
Least trimmed squares (LTS) estimation is a high breakdown value method that was introduced by Rousseeuw (1984). The breakdown value measures the proportion of contamination that an estimation method can withstand while still maintaining its robustness. The computational performance of this method was improved by the FAST-LTS algorithm of Rousseeuw and Van Driessen (2000).
S estimation is a high breakdown value method that was introduced by Rousseeuw and Yohai (1984). Given the same breakdown value, S estimation has a higher statistical efficiency than LTS estimation.
MM estimation, introduced by Yohai (1987), combines high breakdown value estimation and M estimation. It has the same high breakdown property as S estimation but achieves a higher statistical efficiency than S estimation.
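
The difference between contamination in the response direction and contamination by a leverage point can be illustrated with a small sketch. The Python code below is not the ROBUSTREG procedure's algorithm and does not reflect its defaults. It is a minimal illustration that assumes a single regressor, Huber weights with the conventional tuning constant 1.345, a scale estimate based on the median absolute deviation, and made-up data. All function and variable names are hypothetical.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_line_fit(x, y, c=1.345, iterations=50):
    """Huber M estimate of a line by iteratively reweighted least squares."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from ordinary least squares
    for _ in range(iterations):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute deviation, rescaled for normal errors.
        med = statistics.median(resid)
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745
        if scale == 0:
            break
        cutoff = c * scale
        # Huber weights: 1 inside the cutoff, shrinking as cutoff/|r| outside it.
        w = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in resid]
        a, b = weighted_line_fit(x, y, w)
    return a, b


# Made-up data: y = 1 + 2x plus small fixed perturbations (assumed values).
noise = [0.1, -0.2, 0.0, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1]
x = list(range(10))
y = [1 + 2 * xi + e for xi, e in zip(x, noise)]
ones = [1.0] * len(x)

# Case 1: a single outlier in the Y direction.
y_out = list(y)
y_out[2] += 40
_, b_ols = weighted_line_fit(x, y_out, ones)
_, b_huber = huber_line_fit(x, y_out)

# Case 2: a single leverage point (one x value moved far from the rest).
x_lev = list(x)
x_lev[9] = 30
_, b_lev = huber_line_fit(x_lev, y)

print("true slope: 2")
print("OLS slope with Y outlier:       ", round(b_ols, 3))
print("Huber slope with Y outlier:     ", round(b_huber, 3))
print("Huber slope with leverage point:", round(b_lev, 3))
```

In this sketch, the Huber fit stays close to the true slope when the contamination is in the response direction, whereas the least squares fit does not. With the leverage point, the contaminated observation pulls the initial fit toward itself, so its residual looks unremarkable and it is not downweighted. This is the situation in which a high breakdown value method such as LTS, S, or MM estimation would be considered instead.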