Robust Regression Examples

Example 9.6: Hawkins-Bradu-Kass Data

The first 14 observations of the following data set (see Hawkins, Bradu, and Kass 1984) are leverage points; however, only observations 12, 13, and 14 have large hat-matrix diagonals h_{ii}, and only observations 12 and 14 have large Mahalanobis distances md_i.

  
    title "Hawkins, Bradu, Kass (1984) Data"; 
       aa = { 1  10.1  19.6  28.3   9.7, 
              2   9.5  20.5  28.9  10.1, 
              3  10.7  20.2  31.0  10.3, 
              4   9.9  21.5  31.7   9.5, 
              5  10.3  21.1  31.1  10.0, 
              6  10.8  20.4  29.2  10.0, 
              7  10.5  20.9  29.1  10.8, 
              8   9.9  19.6  28.8  10.3, 
              9   9.7  20.7  31.0   9.6, 
             10   9.3  19.7  30.3   9.9, 
             11  11.0  24.0  35.0  -0.2, 
             12  12.0  23.0  37.0  -0.4, 
             13  12.0  26.0  34.0   0.7, 
             14  11.0  34.0  34.0   0.1, 
             15   3.4   2.9   2.1  -0.4, 
             16   3.1   2.2   0.3   0.6, 
             17   0.0   1.6   0.2  -0.2, 
             18   2.3   1.6   2.0   0.0, 
             19   0.8   2.9   1.6   0.1, 
             20   3.1   3.4   2.2   0.4, 
             21   2.6   2.2   1.9   0.9, 
             22   0.4   3.2   1.9   0.3, 
             23   2.0   2.3   0.8  -0.8, 
             24   1.3   2.3   0.5   0.7, 
             25   1.0   0.0   0.4  -0.3, 
             26   0.9   3.3   2.5  -0.8, 
             27   3.3   2.5   2.9  -0.7, 
             28   1.8   0.8   2.0   0.3, 
             29   1.2   0.9   0.8   0.3, 
             30   1.2   0.7   3.4  -0.3, 
             31   3.1   1.4   1.0   0.0, 
             32   0.5   2.4   0.3  -0.4, 
             33   1.5   3.1   1.5  -0.6, 
             34   0.4   0.0   0.7  -0.7, 
             35   3.1   2.4   3.0   0.3, 
             36   1.1   2.2   2.7  -1.0, 
             37   0.1   3.0   2.6  -0.6, 
             38   1.5   1.2   0.2   0.9, 
             39   2.1   0.0   1.2  -0.7, 
             40   0.5   2.0   1.2  -0.5, 
             41   3.4   1.6   2.9  -0.1, 
             42   0.3   1.0   2.7  -0.7, 
             43   0.1   3.3   0.9   0.6, 
             44   1.8   0.5   3.2  -0.7, 
             45   1.9   0.1   0.6  -0.5, 
             46   1.8   0.5   3.0  -0.4, 
             47   3.0   0.1   0.8  -0.9, 
             48   3.1   1.6   3.0   0.1, 
             49   3.1   2.5   1.9   0.9, 
             50   2.1   2.8   2.9  -0.4, 
             51   2.3   1.5   0.4   0.7, 
             52   3.3   0.6   1.2  -0.5, 
             53   0.3   0.4   3.3   0.7, 
             54   1.1   3.0   0.3   0.7, 
             55   0.5   2.4   0.9   0.0, 
             56   1.8   3.2   0.9   0.1, 
             57   1.8   0.7   0.7   0.7, 
             58   2.4   3.4   1.5  -0.1, 
             59   1.6   2.1   3.0  -0.3, 
             60   0.3   1.5   3.3  -0.9, 
             61   0.4   3.4   3.0  -0.3, 
             62   0.9   0.1   0.3   0.6, 
             63   1.1   2.7   0.2  -0.3, 
             64   2.8   3.0   2.9  -0.5, 
             65   2.0   0.7   2.7   0.6, 
             66   0.2   1.8   0.8  -0.9, 
             67   1.6   2.0   1.2  -0.7, 
             68   0.1   0.0   1.1   0.6, 
             69   2.0   0.6   0.3   0.2, 
             70   1.0   2.2   2.9   0.7, 
             71   2.2   2.5   2.3   0.2, 
             72   0.6   2.0   1.5  -0.2, 
             73   0.3   1.7   2.2   0.4, 
             74   0.0   2.2   1.6  -0.9, 
             75   0.3   0.4   2.6   0.2 }; 
  
        a = aa[,2:4];  b = aa[,5];   /* a: the three regressors, b: the response */
 

The data are also listed in Rousseeuw and Leroy (1987, p. 94).

Because the MVE algorithm draws subsets of p + 1 = 4 observations from the n = 75 cases, the complete enumeration must inspect all C(75,4) = 1,215,450 subsets.
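
The iteration history and the estimates that follow can be produced with the CALL MVE subroutine. The following is a minimal sketch rather than the verbatim program: the option indices follow the CALL MVE documentation, and optn[5] = -1 is assumed here to request complete enumeration of all subsets instead of random subsampling.

      optn = j(9,1,.);       /* missing entries take the defaults */
      optn[1] = 3;           /* assumed: amount of printed output */
      optn[5] = -1;          /* assumed: enumerate all subsets    */

      /* sc receives scalar results, xmve the location and
         scatter estimates, and dist the distances and weights
         reported in Output 9.6.4 (assumed output layout)         */
      call mve(sc, xmve, dist, optn, a);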

Output 9.6.1 displays the iteration history for MVE.

Output 9.6.1: Iteration History for MVE
Random Subsampling for MVE

  Subset   Singular   Best Criterion   Percent
  121545          0        51.104276        10
  243090          2        51.104276        20
  364635          4        51.104276        30
  486180          7        51.104276        40
  607725          9        51.104276        50
  729270         22         6.271725        60
  850815         67         6.271725        70
  972360        104         5.912308        80
 1093905        135         5.912308        90
 1215450        185         5.912308       100

Minimum Criterion= 5.9123076564

Among 1215450 subsets 185 are singular.



Output 9.6.2 reports the robust location and scatter estimates computed by MVE.

Output 9.6.2: Robust Location and Scatter Estimates

Robust MVE Location Estimates
Estimates
VAR1 1.513333333
VAR2 1.808333333
VAR3 1.701666667

Robust MVE Scatter Matrix
  VAR1 VAR2 VAR3
VAR1 1.114395480 0.093954802 0.141672316
VAR2 0.093954802 1.123149718 0.117443503
VAR3 0.141672316 0.117443503 1.074742938



Output 9.6.3 reports the eigenvalues of the robust scatter matrix and the robust correlation matrix.

Output 9.6.3: Eigenvalues and Robust Correlation Matrix

Eigenvalues of Robust Scatter Matrix
Estimates
VAR1 1.339637154
VAR2 1.028124757
VAR3 0.944526224

Robust Correlation Matrix
  VAR1 VAR2 VAR3
VAR1 1.000000000 0.083980892 0.129453270
VAR2 0.083980892 1.000000000 0.106895118
VAR3 0.129453270 0.106895118 1.000000000
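
The robust correlation matrix in Output 9.6.3 is the robust scatter matrix of Output 9.6.2 rescaled by its diagonal, and the eigenvalues can be verified with the EIGVAL function. The following is a minimal sketch, assuming the robust scatter matrix has been copied into a matrix named s (a hypothetical name):

      /* correlation: r[i,j] = s[i,j] / sqrt(s[i,i]*s[j,j])    */
      sd = sqrt(vecdiag(s));   /* robust standard deviations   */
      r  = s / (sd * sd`);     /* elementwise division         */
      ev = eigval(s);          /* eigenvalues of s             */
      print r ev;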



Output 9.6.4 shows the classical Mahalanobis and robust distances obtained by complete enumeration. The robust distance of observation x_i is RD_i = sqrt((x_i - T)' C^{-1} (x_i - T)), where T and C are the MVE location and scatter estimates; the classical distance md_i uses the sample mean and covariance instead. The first 14 observations are recognized as outliers (leverage points).

Output 9.6.4: Mahalanobis and Robust Distances

Classical and Robust Distances
N Mahalanobis Distances Robust Distances Weight
1 1.916821 29.541649 0
2 1.855757 30.344481 0
3 2.313658 31.985694 0
4 2.229655 33.011768 0
5 2.100114 32.404938 0
6 2.146169 30.683153 0
7 2.010511 30.794838 0
8 1.919277 29.905756 0
9 2.221249 32.092048 0
10 2.333543 31.072200 0
11 2.446542 36.808021 0
12 3.108335 38.071382 0
13 2.662380 37.094539 0
14 6.381624 41.472255 0
15 1.815487 1.994672 1.000000
16 2.151357 2.202278 1.000000
17 1.384915 1.918208 1.000000
18 0.848155 0.819163 1.000000
19 1.148941 1.288387 1.000000
20 1.591431 2.046703 1.000000
21 1.089981 1.068327 1.000000
22 1.548776 1.768905 1.000000
23 1.085421 1.166951 1.000000
24 0.971195 1.304648 1.000000
25 0.799268 2.030417 1.000000
26 1.168373 1.727131 1.000000
27 1.449625 1.983831 1.000000
28 0.867789 1.073856 1.000000
29 0.576399 1.168060 1.000000
30 1.568868 2.091386 1.000000



Output 9.6.4: (continued; observations 31-60 are omitted)

Classical and Robust Distances
N Mahalanobis Distances Robust Distances Weight
61 1.674945 2.286045 1.000000
62 0.759533 2.024702 1.000000
63 1.292259 1.783035 1.000000
64 0.973868 1.835207 1.000000
65 1.148208 1.562278 1.000000
66 1.296746 1.444491 1.000000
67 0.629827 0.552899 1.000000
68 1.549548 2.101580 1.000000
69 1.070511 1.827919 1.000000
70 0.997761 1.354151 1.000000
71 0.642927 0.988770 1.000000
72 1.053395 0.908316 1.000000
73 1.472178 1.314779 1.000000
74 1.646461 1.516083 1.000000
75 1.899178 2.042560 1.000000

Distribution of Robust Distances

    MinRes     1st Qu.      Median        Mean     3rd Qu.      MaxRes
0.55289874  1.44449066  1.88493749  7.56960939  2.16610046  41.4722551

Cutoff Value = 3.0575159206

The cutoff value is the square root of the 0.975 quantile of the chi square distribution
with 3 degrees of freedom.

There are 14 points with large robust distances receiving zero weights. These may include
boundary cases. Only points whose robust distances are substantially larger than the
cutoff value should be considered outliers.
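
The cutoff can be reproduced with the chi-square quantile function CINV, and the 0/1 weights follow by comparing each robust distance against it. A minimal sketch, where rd stands for a hypothetical column vector holding the 75 robust distances:

      cutoff = sqrt(cinv(0.975, 3));   /* = 3.0575159206          */
      wgt    = (rd <= cutoff);         /* 0/1 weights; the 14
                                          flagged points get 0    */
      print cutoff;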



The graphs in Output 9.6.5 and Output 9.6.6 show the following:

The plot of standardized LMS residuals versus robust distances (Output 9.6.5) identifies the four good leverage points 11, 12, 13, and 14, which have small standardized LMS residuals but large robust distances, and the ten bad leverage points 1, ..., 10, which have large standardized LMS residuals and large robust distances. In contrast, the classical plot (Output 9.6.6) separates only observations 12 and 14, whose Mahalanobis distances exceed the cutoff value.

Output 9.6.5: Hawkins-Bradu-Kass Data: LMS Residuals vs. Robust Distances

Output 9.6.6: Hawkins-Bradu-Kass Data: LS Residuals vs. Mahalanobis Distances
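
A residual-distance plot like Output 9.6.5 can be drawn by writing the plotted quantities to a data set and using PROC SGPLOT. The following sketch assumes hypothetical IML vectors rd (robust distances) and rlms (standardized LMS residuals), with the customary cutoffs of ±2.5 for the residuals and 3.0575 for the distances:

      create hbkplot var {"rd" "rlms"};   /* still inside PROC IML */
      append;
      close hbkplot;
   quit;

   proc sgplot data=hbkplot;
      scatter x=rd y=rlms;
      refline 3.0575159 / axis=x;      /* robust distance cutoff */
      refline -2.5 2.5  / axis=y;      /* residual cutoffs       */
      xaxis label="Robust Distance";
      yaxis label="Standardized LMS Residual";
   run;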
