Robust Regression Examples
The first 14 observations of the following data set (see Hawkins, Bradu, and Kass 1984) are leverage points; however, only observations 12, 13, and 14 have large hat-matrix diagonal values (h_ii), and only observations 12 and 14 have large classical Mahalanobis distances.
title "Hawkins, Bradu, Kass (1984) Data";
/* columns: observation number, three explanatory variables, response */
aa = {  1 10.1 19.6 28.3  9.7,
        2  9.5 20.5 28.9 10.1,
        3 10.7 20.2 31.0 10.3,
        4  9.9 21.5 31.7  9.5,
        5 10.3 21.1 31.1 10.0,
        6 10.8 20.4 29.2 10.0,
        7 10.5 20.9 29.1 10.8,
        8  9.9 19.6 28.8 10.3,
        9  9.7 20.7 31.0  9.6,
       10  9.3 19.7 30.3  9.9,
       11 11.0 24.0 35.0 -0.2,
       12 12.0 23.0 37.0 -0.4,
       13 12.0 26.0 34.0  0.7,
       14 11.0 34.0 34.0  0.1,
       15  3.4  2.9  2.1 -0.4,
       16  3.1  2.2  0.3  0.6,
       17  0.0  1.6  0.2 -0.2,
       18  2.3  1.6  2.0  0.0,
       19  0.8  2.9  1.6  0.1,
       20  3.1  3.4  2.2  0.4,
       21  2.6  2.2  1.9  0.9,
       22  0.4  3.2  1.9  0.3,
       23  2.0  2.3  0.8 -0.8,
       24  1.3  2.3  0.5  0.7,
       25  1.0  0.0  0.4 -0.3,
       26  0.9  3.3  2.5 -0.8,
       27  3.3  2.5  2.9 -0.7,
       28  1.8  0.8  2.0  0.3,
       29  1.2  0.9  0.8  0.3,
       30  1.2  0.7  3.4 -0.3,
       31  3.1  1.4  1.0  0.0,
       32  0.5  2.4  0.3 -0.4,
       33  1.5  3.1  1.5 -0.6,
       34  0.4  0.0  0.7 -0.7,
       35  3.1  2.4  3.0  0.3,
       36  1.1  2.2  2.7 -1.0,
       37  0.1  3.0  2.6 -0.6,
       38  1.5  1.2  0.2  0.9,
       39  2.1  0.0  1.2 -0.7,
       40  0.5  2.0  1.2 -0.5,
       41  3.4  1.6  2.9 -0.1,
       42  0.3  1.0  2.7 -0.7,
       43  0.1  3.3  0.9  0.6,
       44  1.8  0.5  3.2 -0.7,
       45  1.9  0.1  0.6 -0.5,
       46  1.8  0.5  3.0 -0.4,
       47  3.0  0.1  0.8 -0.9,
       48  3.1  1.6  3.0  0.1,
       49  3.1  2.5  1.9  0.9,
       50  2.1  2.8  2.9 -0.4,
       51  2.3  1.5  0.4  0.7,
       52  3.3  0.6  1.2 -0.5,
       53  0.3  0.4  3.3  0.7,
       54  1.1  3.0  0.3  0.7,
       55  0.5  2.4  0.9  0.0,
       56  1.8  3.2  0.9  0.1,
       57  1.8  0.7  0.7  0.7,
       58  2.4  3.4  1.5 -0.1,
       59  1.6  2.1  3.0 -0.3,
       60  0.3  1.5  3.3 -0.9,
       61  0.4  3.4  3.0 -0.3,
       62  0.9  0.1  0.3  0.6,
       63  1.1  2.7  0.2 -0.3,
       64  2.8  3.0  2.9 -0.5,
       65  2.0  0.7  2.7  0.6,
       66  0.2  1.8  0.8 -0.9,
       67  1.6  2.0  1.2 -0.7,
       68  0.1  0.0  1.1  0.6,
       69  2.0  0.6  0.3  0.2,
       70  1.0  2.2  2.9  0.7,
       71  2.2  2.5  2.3  0.2,
       72  0.6  2.0  1.5 -0.2,
       73  0.3  1.7  2.2  0.4,
       74  0.0  2.2  1.6 -0.9,
       75  0.3  0.4  2.6  0.2 };
a = aa[,2:4];   /* explanatory variables (columns 2 through 4) */
b = aa[,5];     /* response (column 5) */
The data are also listed in Rousseeuw and Leroy (1987, p. 94).
Complete enumeration must inspect all 1,215,450 subsets of four observations that can be drawn from the 75 observations.
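That count is the binomial coefficient C(75, 4): with three variables, each subset holds four observations. A quick check in Python, offered as a sketch outside the SAS/IML program (the function name is illustrative only):

```python
import math

def count_elemental_subsets(n_obs: int, n_vars: int) -> int:
    """Number of subsets of size n_vars + 1 that can be drawn from n_obs observations."""
    return math.comb(n_obs, n_vars + 1)

# 75 observations, 3 variables -> subsets of size 4
n_subsets = count_elemental_subsets(75, 3)
print(f"{n_subsets:,}")
```

The count grows quickly with the number of observations, which is why complete enumeration is practical only for small data sets such as this one.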
Output 9.6.1 displays the iteration history for MVE.
Output 9.6.1: Iteration History for MVE
Output 9.6.2 reports the robust parameter estimates for MVE.
Output 9.6.2: Robust Location Estimates
Robust MVE Location Estimates
Variable | Estimates |
VAR1 | 1.513333333 |
VAR2 | 1.808333333 |
VAR3 | 1.701666667 |
Robust MVE Scatter Matrix
Variable | VAR1 | VAR2 | VAR3 |
VAR1 | 1.114395480 | 0.093954802 | 0.141672316 |
VAR2 | 0.093954802 | 1.123149718 | 0.117443503 |
VAR3 | 0.141672316 | 0.117443503 | 1.074742938 |
Output 9.6.3 reports the eigenvalues of the robust scatter matrix and the robust correlation matrix.
Output 9.6.3: MVE Scatter Matrix
Eigenvalues of Robust Scatter Matrix
Variable | Estimates |
VAR1 | 1.339637154 |
VAR2 | 1.028124757 |
VAR3 | 0.944526224 |
Robust Correlation Matrix
Variable | VAR1 | VAR2 | VAR3 |
VAR1 | 1.000000000 | 0.083980892 | 0.129453270 |
VAR2 | 0.083980892 | 1.000000000 | 0.106895118 |
VAR3 | 0.129453270 | 0.106895118 | 1.000000000 |
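Both tables in Output 9.6.3 can be recomputed from the scatter matrix in Output 9.6.2. The Python sketch below is not SAS/IML code and its helper names are illustrative. It scales each scatter entry by the square roots of the matching diagonal entries to obtain the correlations, and it uses the standard trigonometric closed form for the eigenvalues of a symmetric 3x3 matrix.

```python
import math

# Robust MVE scatter matrix as printed in Output 9.6.2.
SCATTER = [
    [1.114395480, 0.093954802, 0.141672316],
    [0.093954802, 1.123149718, 0.117443503],
    [0.141672316, 0.117443503, 1.074742938],
]

def correlation_from_scatter(s):
    """Divide each entry s[i][j] by sqrt(s[i][i] * s[j][j])."""
    n = len(s)
    return [[s[i][j] / math.sqrt(s[i][i] * s[j][j]) for j in range(n)]
            for i in range(n)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def symmetric_3x3_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix, largest first."""
    off = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0          # mean of the eigenvalues
    if off == 0.0:                                    # already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    p = math.sqrt((sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * off) / 6.0)
    # Shift and scale so the eigenvalues of b are 2*cos(angle).
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    r = max(-1.0, min(1.0, det3(b) / 2.0))            # guard against rounding
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    middle = 3.0 * q - largest - smallest             # eigenvalues sum to the trace
    return [largest, middle, smallest]

corr = correlation_from_scatter(SCATTER)
eigenvalues = symmetric_3x3_eigenvalues(SCATTER)
for row in corr:
    print(" ".join(f"{value:.9f}" for value in row))
print(" ".join(f"{value:.9f}" for value in eigenvalues))
```

The small off-diagonal correlations and the three eigenvalues of similar size indicate that, once the first 14 observations are downweighted, the remaining points form a roughly spherical cloud.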
Output 9.6.4 shows the classical Mahalanobis and robust distances obtained by complete enumeration. The first 14 observations are recognized as outliers (leverage points).
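The robust distances appear to be Mahalanobis-type distances in which the robust location and scatter estimates of Output 9.6.2 replace the classical mean and covariance. The Python sketch below works under that assumption; it is not SAS/IML code and the helper names are illustrative. It evaluates the distance for observations 1 and 15, which should come out close to the values 29.54 and 1.99 listed for those rows in Output 9.6.4.

```python
import math

# Robust MVE estimates as printed in Output 9.6.2.
LOCATION = [1.513333333, 1.808333333, 1.701666667]
SCATTER = [
    [1.114395480, 0.093954802, 0.141672316],
    [0.093954802, 1.123149718, 0.117443503],
    [0.141672316, 0.117443503, 1.074742938],
]

def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def robust_distance(x, location, scatter):
    """sqrt((x - T)' C^{-1} (x - T)) for location T and scatter C."""
    diff = [xi - ti for xi, ti in zip(x, location)]
    weighted = solve_linear_system(scatter, diff)   # C^{-1} (x - T)
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))

# Columns 2 through 4 of observations 1 and 15 in the data matrix aa.
rd_obs1 = robust_distance([10.1, 19.6, 28.3], LOCATION, SCATTER)
rd_obs15 = robust_distance([3.4, 2.9, 2.1], LOCATION, SCATTER)
print(round(rd_obs1, 4), round(rd_obs15, 4))
```

Because the robust estimates are determined by the bulk of the data, the first 14 observations lie far from the robust center, whereas the classical estimates are pulled toward them and mask most of these points.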
Output 9.6.4: Mahalanobis and Robust Distances
Classical and Robust Distances
N | Mahalanobis Distances | Robust Distances | Weight |
1 | 1.916821 | 29.541649 | 0 |
2 | 1.855757 | 30.344481 | 0 |
3 | 2.313658 | 31.985694 | 0 |
4 | 2.229655 | 33.011768 | 0 |
5 | 2.100114 | 32.404938 | 0 |
6 | 2.146169 | 30.683153 | 0 |
7 | 2.010511 | 30.794838 | 0 |
8 | 1.919277 | 29.905756 | 0 |
9 | 2.221249 | 32.092048 | 0 |
10 | 2.333543 | 31.072200 | 0 |
11 | 2.446542 | 36.808021 | 0 |
12 | 3.108335 | 38.071382 | 0 |
13 | 2.662380 | 37.094539 | 0 |
14 | 6.381624 | 41.472255 | 0 |
15 | 1.815487 | 1.994672 | 1.000000 |
16 | 2.151357 | 2.202278 | 1.000000 |
17 | 1.384915 | 1.918208 | 1.000000 |
18 | 0.848155 | 0.819163 | 1.000000 |
19 | 1.148941 | 1.288387 | 1.000000 |
20 | 1.591431 | 2.046703 | 1.000000 |
21 | 1.089981 | 1.068327 | 1.000000 |
22 | 1.548776 | 1.768905 | 1.000000 |
23 | 1.085421 | 1.166951 | 1.000000 |
24 | 0.971195 | 1.304648 | 1.000000 |
25 | 0.799268 | 2.030417 | 1.000000 |
26 | 1.168373 | 1.727131 | 1.000000 |
27 | 1.449625 | 1.983831 | 1.000000 |
28 | 0.867789 | 1.073856 | 1.000000 |
29 | 0.576399 | 1.168060 | 1.000000 |
30 | 1.568868 | 2.091386 | 1.000000 |
Output 9.6.4: (continued)
Classical and Robust Distances
N | Mahalanobis Distances | Robust Distances | Weight |
61 | 1.674945 | 2.286045 | 1.000000 |
62 | 0.759533 | 2.024702 | 1.000000 |
63 | 1.292259 | 1.783035 | 1.000000 |
64 | 0.973868 | 1.835207 | 1.000000 |
65 | 1.148208 | 1.562278 | 1.000000 |
66 | 1.296746 | 1.444491 | 1.000000 |
67 | 0.629827 | 0.552899 | 1.000000 |
68 | 1.549548 | 2.101580 | 1.000000 |
69 | 1.070511 | 1.827919 | 1.000000 |
70 | 0.997761 | 1.354151 | 1.000000 |
71 | 0.642927 | 0.988770 | 1.000000 |
72 | 1.053395 | 0.908316 | 1.000000 |
73 | 1.472178 | 1.314779 | 1.000000 |
74 | 1.646461 | 1.516083 | 1.000000 |
75 | 1.899178 | 2.042560 | 1.000000 |
Distribution of Robust Distances
MinRes | 1st Qu. | Median | Mean | 3rd Qu. | MaxRes |
0.55289874 | 1.44449066 | 1.88493749 | 7.56960939 | 2.16610046 | 41.4722551 |
Cutoff Value = 3.0575159206
The cutoff value is the square root of the 0.975 quantile of the chi-square distribution with 3 degrees of freedom.
There are 14 points with large robust distances receiving zero weights. These may include boundary cases. Only points whose robust distances are substantially larger than the cutoff value should be considered outliers.
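The cutoff and the zero-or-one weights can be reproduced with a short calculation. The Python sketch below is not SAS/IML code and its function names are illustrative. It uses the closed-form distribution function of the chi-square distribution with 3 degrees of freedom, inverts it by bisection, and assumes that a point receives weight 1 when its robust distance does not exceed the cutoff and weight 0 otherwise.

```python
import math

def chi2_cdf_3df(x):
    """CDF of the chi-square distribution with 3 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    return (math.erf(math.sqrt(x / 2.0))
            - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

def chi2_quantile_3df(prob, tol=1e-12):
    """Invert the CDF by bisection; prob must lie strictly between 0 and 1."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_3df(hi) < prob:      # expand until the quantile is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_cdf_3df(mid) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def weight(distance, cutoff):
    """Assumed rule: 1 if the distance is within the cutoff, else 0."""
    return 1 if distance <= cutoff else 0

cutoff = math.sqrt(chi2_quantile_3df(0.975))
print(f"cutoff = {cutoff:.10f}")

# A few robust distances taken from Output 9.6.4 (observations 1, 14, 15, 67).
sample = {1: 29.541649, 14: 41.472255, 15: 1.994672, 67: 0.552899}
weights = {obs: weight(rd, cutoff) for obs, rd in sample.items()}
print(weights)
```

All 14 flagged observations have robust distances of roughly 30 or more, about ten times the cutoff, so none of them is a boundary case.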
The graphs in Figure 9.6.5 and Figure 9.6.6 show the following:
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.