Robust Regression Examples
Example 9.3: LMS and LTS Univariate (Location) Problem: Barnett and Lewis Data
If you omit the last input argument (the matrix of
regressors), the regression problem reduces to the
problem of estimating the location parameter.
The following example is described in
Rousseeuw and Leroy (1987, p. 175):
print "*** Barnett and Lewis (1978) ***";
b = { 3, 4, 7, 8, 10, 949, 951 };
optn = j(9,1,.);
optn[2]= 3; /* ipri */
optn[3]= 3; /* ilsq */
optn[8]= 3; /* icov */
call lms(sc,coef,wgt,optn,b);
Output 9.3.1 shows the results of the unweighted LS regression.
Output 9.3.1: Table of Unweighted LS Regression
Robust Estimation of Location and Scale |
LMS: The 4th ordered squared residual will be minimized. |
Unweighted Least-Squares Estimation |
Median = 8 MAD ( * 1.4826) = 5.930408874 |
Mean = 276 Standard Deviation = 460.43602523 |
LS Residuals |
N | Observed | Residual | Res / S |
1 | 3.000000 | -273.000000 | -0.592916 |
2 | 4.000000 | -272.000000 | -0.590744 |
3 | 7.000000 | -269.000000 | -0.584229 |
4 | 8.000000 | -268.000000 | -0.582057 |
5 | 10.000000 | -266.000000 | -0.577713 |
6 | 949.000000 | 673.000000 | 1.461658 |
7 | 951.000000 | 675.000000 | 1.466002 |
Distribution of Residuals |
MinRes | 1st Qu. | Median | Mean | 3rd Qu. | MaxRes |
-273 | -272 | -268 | 0 | -266 | 675 |
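The summary statistics in Output 9.3.1 can be cross-checked outside SAS/IML. The following Python sketch is not part of the original example. It uses only the standard library and rests on two assumptions: that the standard deviation uses the divisor n - 1, and that the constant labeled 1.4826 is the rounded value of the reciprocal of the 0.75 quantile of the standard normal distribution. Both assumptions reproduce the printed values.

```python
import statistics

b = [3, 4, 7, 8, 10, 949, 951]

# LS estimate of location: the arithmetic mean.
mean = statistics.mean(b)
# Sample standard deviation (divisor n - 1; an assumption that matches the output).
sd = statistics.stdev(b)

# Robust counterparts: the median and the scaled median absolute deviation.
median = statistics.median(b)
mad = statistics.median([abs(x - median) for x in b])
# Assumption: 1.4826 is 1 / (0.75 quantile of the standard normal), rounded.
scaled_mad = mad / statistics.NormalDist().inv_cdf(0.75)

# Raw and standardized LS residuals, as in the "LS Residuals" table.
residuals = [x - mean for x in b]
standardized = [r / sd for r in residuals]

print(f"Mean = {mean}  Standard Deviation = {sd:.8f}")
print(f"Median = {median}  MAD (* 1.4826) = {scaled_mad:.9f}")
for i, (x, r, z) in enumerate(zip(b, residuals, standardized), start=1):
    print(f"{i}  {x:.6f}  {r:.6f}  {z:.6f}")
```

The two outliers pull the mean to 276 and inflate the standard deviation so much that no standardized LS residual exceeds 1.5 in absolute value, which is why the LS fit alone does not flag observations 6 and 7.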
Output 9.3.2 shows the results for LMS regression.
Output 9.3.2: Table of LMS Results
Least Median of Squares (LMS) Method |
Minimizing 4th Ordered Squared Residual. |
Highest Possible Breakdown Value = 57.14 % |
LMS Objective Function = 2.5 |
Preliminary LMS Scale = 5.4137257125 |
Final LMS Scale = 3.0516389039 |
LMS Residuals |
N | Observed | Residual | Res / S |
1 | 3.000000 | -2.500000 | -0.819232 |
2 | 4.000000 | -1.500000 | -0.491539 |
3 | 7.000000 | 1.500000 | 0.491539 |
4 | 8.000000 | 2.500000 | 0.819232 |
5 | 10.000000 | 4.500000 | 1.474617 |
6 | 949.000000 | 943.500000 | 309.178127 |
7 | 951.000000 | 945.500000 | 309.833512 |
Distribution of Residuals |
MinRes | 1st Qu. | Median | Mean | 3rd Qu. | MaxRes |
-2.5 | -1.5 | 2.5 | 270.5 | 4.5 | 945.5 |
You obtain the LMS location estimate 5.5, compared
with the mean 276 (which is the LS estimate
of the location parameter) and the median 8.
The scale estimate in the univariate problem is
a resistant (high breakdown) estimator for the dispersion
of the data (see Rousseeuw and Leroy 1987, p. 178).
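In the univariate case, the LMS location estimate is the midpoint of the shortest interval that covers h of the sorted observations. The following Python sketch is not part of the original example and illustrates that search for h = 4, the value reported in Output 9.3.2. The function name lms_location is hypothetical. The final-scale computation is an assumption inferred from the output: it uses the observations that Output 9.3.3 reports with weight 1 and the divisor (number of retained observations - 1).

```python
import math

def lms_location(values, h):
    """Return (estimate, objective) for the univariate LMS problem.

    Sketch: slide a window over h consecutive sorted values and keep the
    shortest one. Its midpoint is the estimate; its half-length is the
    square root of the h-th ordered squared residual.
    """
    xs = sorted(values)
    best_mid, best_half = None, math.inf
    for i in range(len(xs) - h + 1):
        lo, hi = xs[i], xs[i + h - 1]
        half = (hi - lo) / 2
        if half < best_half:
            best_mid, best_half = (lo + hi) / 2, half
    return best_mid, best_half

b = [3, 4, 7, 8, 10, 949, 951]
h = 4  # from the output: the 4th ordered squared residual is minimized

estimate, objective = lms_location(b, h)
residuals = [x - estimate for x in b]

# Assumption: the final scale uses only the observations with weight 1
# (the first five) and the divisor (count - 1).
weights = [1, 1, 1, 1, 1, 0, 0]
kept = sum(weights)
final_scale = math.sqrt(
    sum(w * r * r for w, r in zip(weights, residuals)) / (kept - 1)
)

print(estimate, objective, round(final_scale, 10))
```

The shortest window is [3, 8], so the sketch returns the midpoint 5.5 and the half-length 2.5, in agreement with the residuals and the objective function value in Output 9.3.2.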
For weighted LS regression, the last two
observations are ignored (that is, given zero weights),
as shown in Output 9.3.3.
Output 9.3.3: Table of Weighted LS Regression
Weighted Least-Squares Estimation |
Weighted Standard Deviation = 2.8809720582 |
There are 5 points with nonzero weight. |
Average Weight = 0.7142857143 |
Weighted LS Residuals |
N | Observed | Residual | Res / S | Weight |
1 | 3.000000 | -3.400000 | -1.180157 | 1.000000 |
2 | 4.000000 | -2.400000 | -0.833052 | 1.000000 |
3 | 7.000000 | 0.600000 | 0.208263 | 1.000000 |
4 | 8.000000 | 1.600000 | 0.555368 | 1.000000 |
5 | 10.000000 | 3.600000 | 1.249578 | 1.000000 |
6 | 949.000000 | 942.600000 | 327.181236 | 0 |
7 | 951.000000 | 944.600000 | 327.875447 | 0 |
Distribution of Residuals |
MinRes | 1st Qu. | Median | Mean | 3rd Qu. | MaxRes |
-3.4 | -2.4 | 1.6 | 269.6 | 3.6 | 944.6 |
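The weighted LS quantities in Output 9.3.3 follow directly from the 0/1 weights. The following Python sketch is not part of the original example. It takes the weights from the output and assumes that the weighted standard deviation uses the divisor (sum of weights - 1), which reproduces the printed value.

```python
import math

b = [3, 4, 7, 8, 10, 949, 951]
weights = [1, 1, 1, 1, 1, 0, 0]  # weights reported in Output 9.3.3

total_weight = sum(weights)
# Weighted mean: with 0/1 weights this is the mean of the retained points.
weighted_mean = sum(w * x for w, x in zip(weights, b)) / total_weight
residuals = [x - weighted_mean for x in b]

# Assumption: divisor is (sum of weights - 1).
weighted_sd = math.sqrt(
    sum(w * r * r for w, r in zip(weights, residuals)) / (total_weight - 1)
)
average_weight = total_weight / len(b)

print(f"Weighted mean = {weighted_mean}")
print(f"Weighted Standard Deviation = {weighted_sd:.10f}")
print(f"Average Weight = {average_weight:.10f}")
```

Residuals are still reported for the two zero-weight observations. Standardized by the weighted standard deviation, they exceed 327, so the outliers are unmistakable once they no longer influence the fit.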
Use the following code to obtain results from LTS:
optn = j(9,1,.);
optn[2]= 3; /* ipri */
optn[3]= 3; /* ilsq */
optn[8]= 3; /* icov */
call lts(sc,coef,wgt,optn,b);
The results for LTS, shown in Output 9.3.4, are
similar to those reported for LMS in
Rousseeuw and Leroy (1987).
Output 9.3.4: Table of LTS Results
Least Trimmed Squares (LTS) Method |
Minimizing Sum of 4 Smallest Squared Residuals. |
Highest Possible Breakdown Value = 57.14 % |
LTS Objective Function = 2.0615528128 |
Preliminary LTS Scale = 4.7050421234 |
Final LTS Scale = 3.0516389039 |
LTS Residuals |
N | Observed | Residual | Res / S |
1 | 3.000000 | -2.500000 | -0.819232 |
2 | 4.000000 | -1.500000 | -0.491539 |
3 | 7.000000 | 1.500000 | 0.491539 |
4 | 8.000000 | 2.500000 | 0.819232 |
5 | 10.000000 | 4.500000 | 1.474617 |
6 | 949.000000 | 943.500000 | 309.178127 |
7 | 951.000000 | 945.500000 | 309.833512 |
Distribution of Residuals |
MinRes | 1st Qu. | Median | Mean | 3rd Qu. | MaxRes |
-2.5 | -1.5 | 2.5 | 270.5 | 4.5 | 945.5 |
Since nonzero weights are chosen for the same
observations as with LMS, the WLS results based on
LTS agree with those based on LMS (shown previously in Output 9.3.3).
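In the univariate case, the LTS location estimate is the mean of the h consecutive sorted observations that have the smallest sum of squared deviations about their own mean. The following Python sketch is not part of the original example and illustrates that search for h = 4. The function name lts_location is hypothetical, and reporting the objective as the square root of the trimmed sum of squares divided by h is an assumption inferred from the value printed in Output 9.3.4.

```python
import math

def lts_location(values, h):
    """Return (estimate, objective) for the univariate LTS problem.

    Sketch: for every window of h consecutive sorted values, compute the
    window mean and the sum of squared deviations about it, and keep the
    window with the smallest sum.
    """
    xs = sorted(values)
    best_mean, best_ssq = None, math.inf
    for i in range(len(xs) - h + 1):
        window = xs[i:i + h]
        m = sum(window) / h
        ssq = sum((x - m) ** 2 for x in window)
        if ssq < best_ssq:
            best_mean, best_ssq = m, ssq
    # Assumption: the reported objective is sqrt(trimmed sum of squares / h).
    return best_mean, math.sqrt(best_ssq / h)

b = [3, 4, 7, 8, 10, 949, 951]
h = 4  # from the output: sum of the 4 smallest squared residuals

estimate, objective = lts_location(b, h)
print(estimate, round(objective, 10))
```

The best window is again {3, 4, 7, 8}, with mean 5.5 and trimmed sum of squares 17, so the LTS and LMS estimates coincide for these data.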
In summary, you obtain the following
estimates for the location parameter:
- LS estimate (unweighted mean) = 276
- Median = 8
- LMS estimate = 5.5
- LTS estimate = 5.5
- WLS estimate (weighted mean based on LMS or LTS) = 6.4