Robust Regression Examples

Example 12.3 LMS and LTS Univariate (Location) Problem: Barnett and Lewis Data

If you omit the matrix $X$ of regressors from the input arguments, the regression problem reduces to estimating the location parameter of the data. The following example is described in Rousseeuw and Leroy (1987):

title2 "*** Barnett and Lewis (1978) ***";
b = { 3, 4, 7, 8, 10, 949, 951 };

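optn = j(9,1,.);   /* missing values request the defaults */
optn[2]= 3;    /* ipri: amount of printed output */
optn[3]= 3;    /* ilsq: also compute LS and weighted LS */
optn[8]= 3;    /* icov: covariance matrix and standard errors */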

call lms(sc,coef,wgt,optn,b);

Output 12.3.1 shows the results of the unweighted LS regression.

Output 12.3.1 Table of Unweighted LS Regression

*** Barnett and Lewis (1978) ***

LS Residuals
N	Observed	Residual	Res / S
1	3.000000	-273.000000	-0.592916
2	4.000000	-272.000000	-0.590744
3	7.000000	-269.000000	-0.584229
4	8.000000	-268.000000	-0.582057
5	10.000000	-266.000000	-0.577713
6	949.000000	673.000000	1.461658
7	951.000000	675.000000	1.466002

MinRes	1st Qu.	Median	Mean	3rd Qu.	MaxRes
-273	-272	-268	0	-266	675
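
The LS columns above can be checked by hand. The following Python sketch is an independent cross-check, not part of SAS/IML. It assumes that the scale S is the square root of the residual sum of squares divided by n - 1; that assumption is not stated in this example, but it reproduces the printed Res / S column.

```python
import math

# Data from the example (Barnett and Lewis, 1978).
b = [3, 4, 7, 8, 10, 949, 951]

n = len(b)
ls_estimate = sum(b) / n                     # unweighted mean = LS location estimate
residuals = [y - ls_estimate for y in b]

# Assumed scale: sqrt(RSS / (n - 1)), one degree of freedom spent on the mean.
scale = math.sqrt(sum(r * r for r in residuals) / (n - 1))
scaled = [r / scale for r in residuals]

# Print rows in the same layout as the LS Residuals table.
for i, (y, r, z) in enumerate(zip(b, residuals, scaled), start=1):
    print(f"{i}\t{y:.6f}\t{r:.6f}\t{z:.6f}")
```

The two large observations pull the mean to 276, so every residual is large and no scaled residual stands out as an outlier.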

Output 12.3.2 shows the results for LMS regression.

Output 12.3.2 Table of LMS Results

LMS Residuals
N	Observed	Residual	Res / S
1	3.000000	-2.500000	-0.819232
2	4.000000	-1.500000	-0.491539
3	7.000000	1.500000	0.491539
4	8.000000	2.500000	0.819232
5	10.000000	4.500000	1.474617
6	949.000000	943.500000	309.178127
7	951.000000	945.500000	309.833512

MinRes	1st Qu.	Median	Mean	3rd Qu.	MaxRes
-2.5	-1.5	2.5	270.5	4.5	945.5

You obtain the LMS location estimate $5.5$, compared with the mean $276$ (which is the LS estimate of the location parameter) and the median $8$. The scale estimate in the univariate problem is a resistant (high-breakdown) estimator of the dispersion of the data (see Rousseeuw and Leroy (1987)).
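
For a univariate sample, the LMS location estimate is the midpoint of the shortest interval that covers h of the sorted observations. The following Python sketch illustrates this outside SAS/IML. The function name `lms_location` is hypothetical, and the coverage h = floor(n/2) + 1 is an assumption; it is not necessarily how the LMS call is implemented.

```python
import statistics

def lms_location(values, h=None):
    """Midpoint of the shortest interval covering h sorted values."""
    ys = sorted(values)
    n = len(ys)
    if h is None:
        h = n // 2 + 1          # assumed coverage; 4 when n = 7
    # Window i spans ys[i] .. ys[i + h - 1]; pick the narrowest one.
    start = min(range(n - h + 1), key=lambda i: ys[i + h - 1] - ys[i])
    return (ys[start] + ys[start + h - 1]) / 2

b = [3, 4, 7, 8, 10, 949, 951]
lms = lms_location(b)
print(lms)                      # 5.5
print(statistics.mean(b), statistics.median(b))
```

The shortest half here is {3, 4, 7, 8}, whose midpoint is (3 + 8)/2 = 5.5, so the two extreme observations have no influence on the estimate.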

For weighted LS regression, the last two observations are ignored (that is, given zero weights), as shown in Output 12.3.3.

Output 12.3.3 Table of Weighted LS Regression

Weighted LS Residuals
N	Observed	Residual	Res / S	Weight
1	3.000000	-3.400000	-1.180157	1.000000
2	4.000000	-2.400000	-0.833052	1.000000
3	7.000000	0.600000	0.208263	1.000000
4	8.000000	1.600000	0.555368	1.000000
5	10.000000	3.600000	1.249578	1.000000
6	949.000000	942.600000	327.181236	0
7	951.000000	944.600000	327.875447	0

MinRes	1st Qu.	Median	Mean	3rd Qu.	MaxRes
-3.4	-2.4	1.6	269.6	3.6	944.6
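
The weighted LS values can be reproduced from the Weight column alone. This Python sketch is an illustration outside SAS/IML. It takes the 0/1 weights from Output 12.3.3 as given and assumes that the scale is the square root of the weighted residual sum of squares divided by (sum of weights - 1), which matches the printed Res / S column.

```python
import math

b = [3, 4, 7, 8, 10, 949, 951]
w = [1, 1, 1, 1, 1, 0, 0]       # weights copied from the Weight column

# Weighted mean: with 0/1 weights this is the mean of the retained points.
wls_estimate = sum(wi * yi for wi, yi in zip(w, b)) / sum(w)
residuals = [y - wls_estimate for y in b]

# Assumed scale: sqrt( sum(w * r^2) / (sum(w) - 1) ).
scale = math.sqrt(
    sum(wi * r * r for wi, r in zip(w, residuals)) / (sum(w) - 1)
)
scaled = [r / scale for r in residuals]
print(wls_estimate, scale)
```

Residuals are still reported for the two zero-weight observations; they simply do not contribute to the estimate or to the scale.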

Use the following code to obtain results from LTS:

title2 "*** Barnett and Lewis (1978) ***";
b = { 3, 4, 7, 8, 10, 949, 951 };

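optn = j(9,1,.);   /* missing values request the defaults */
optn[2]= 3;    /* ipri: amount of printed output */
optn[3]= 3;    /* ilsq: also compute LS and weighted LS */
optn[8]= 3;    /* icov: covariance matrix and standard errors */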

call lts(sc,coef,wgt,optn,b);

The results for LTS are similar to those reported for LMS in Rousseeuw and Leroy (1987), as shown in Output 12.3.4.

Output 12.3.4 Table of LTS Results

*** Barnett and Lewis (1978) ***

LTS Residuals
N	Observed	Residual	Res / S
1	3.000000	-2.500000	-0.819232
2	4.000000	-1.500000	-0.491539
3	7.000000	1.500000	0.491539
4	8.000000	2.500000	0.819232
5	10.000000	4.500000	1.474617
6	949.000000	943.500000	309.178127
7	951.000000	945.500000	309.833512

MinRes	1st Qu.	Median	Mean	3rd Qu.	MaxRes
-2.5	-1.5	2.5	270.5	4.5	945.5

Because LTS assigns nonzero weights to the same observations as LMS, the WLS results based on LTS agree with those based on LMS (shown previously in Output 12.3.3).
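
For a univariate sample, the LTS location estimate is the mean of the h contiguous sorted observations with the smallest sum of squared deviations. The Python sketch below is a brute-force illustration outside SAS/IML. The name `lts_location` is hypothetical and h = floor(n/2) + 1 is an assumed coverage, so treat it as a cross-check rather than a description of the LTS call.

```python
def lts_location(values, h=None):
    """Mean of the h contiguous sorted values with the least sum of squares."""
    ys = sorted(values)
    n = len(ys)
    if h is None:
        h = n // 2 + 1          # assumed coverage; 4 when n = 7
    best_mean, best_ss = None, float("inf")
    for i in range(n - h + 1):
        window = ys[i:i + h]
        m = sum(window) / h
        ss = sum((y - m) ** 2 for y in window)   # trimmed sum of squares
        if ss < best_ss:
            best_mean, best_ss = m, ss
    return best_mean

b = [3, 4, 7, 8, 10, 949, 951]
lts = lts_location(b)
print(lts)                      # 5.5
```

The winning subset is again {3, 4, 7, 8}, with mean 5.5 and sum of squares 17, compared with 18.75 for {4, 7, 8, 10}. That is why the LTS and LMS residual tables coincide for these data.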

In summary, you obtain the following estimates for the location parameter:

LS estimate (unweighted mean) = 276
Median = 8
LMS estimate = 5.5
LTS estimate = 5.5
WLS estimate (weighted mean based on LMS or LTS) = 6.4