| Language Reference |
finds the minimum volume ellipsoid estimator
| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| 500 | 50 | 22 | 17 | 15 | 14 | 0 | 0 | 0 | 0 | |
| 1414 | 182 | 71 | 43 | 32 | 27 | 24 | 23 | 22 | ||
| 500 | 1000 | 1500 | 2000 | 2500 | 3000 | 3000 | 3000 | 3000 | 3000 |
| n | 11 | 12 | 13 | 14 | 15 |
| 0 | 0 | 0 | 0 | 0 | |
| 22 | 22 | 22 | 23 | 23 | |
| 3000 | 3000 | 3000 | 3000 | 3000 |
/* X1 X2 X3 Y Stackloss data */
aa = { 1 80 27 89 42,
1 80 27 88 37,
1 75 25 90 37,
1 62 24 87 28,
1 62 22 87 18,
1 62 23 87 18,
1 62 24 93 19,
1 62 24 93 20,
1 58 23 87 15,
1 58 18 80 14,
1 58 18 89 14,
1 58 17 88 13,
1 58 18 82 11,
1 58 19 93 12,
1 50 18 89 8,
1 50 18 86 7,
1 50 19 72 8,
1 50 19 79 8,
1 50 20 80 9,
1 56 20 82 15,
1 70 20 91 15 };
Rousseeuw and Leroy (1987, p. 76) cite a large number of papers where this data set was analyzed and state that most researchers ``concluded that observations 1, 3, 4, and 21 were outliers''; some people also reported observation 2 as an outlier.
By default, subroutine MVE chooses only 2,000 randomly selected subsets in its search. There are in total 5,985 subsets of 4 cases out of 21 cases. Here is the code:
a = aa[,2:4]; optn = j(8,1,.); optn[1]= 2; /* ipri */ optn[2]= 1; /* pcov: print COV */ optn[3]= 1; /* pcor: print CORR */ optn[5]= -1; /* nrep: use all subsets */ CALL MVE(sc,xmve,dist,optn,a);
The first part of the output shows the classical scatter and correlation matrix:
Minimum Volume Ellipsoid (MVE) Estimation
Consider Ellipsoids Containing 12 Cases.
Classical Covariance Matrix
VAR1 VAR2 VAR3
VAR1 84.057142857 22.657142857 24.571428571
VAR2 22.657142857 9.9904761905 6.6214285714
VAR3 24.571428571 6.6214285714 28.714285714
Classical Correlation Matrix
VAR1 VAR2 VAR3
VAR1 1 0.781852333 0.5001428749
VAR2 0.781852333 1 0.3909395378
VAR3 0.5001428749 0.3909395378 1
Classical Mean
VAR1 60.428571429
VAR2 21.095238095
VAR3 86.285714286
There are 5985 subsets of 4 cases out of 21 cases.
All 5985 subsets will be considered.
The second part of the output shows the results of the optimization (complete subset sampling):
Complete Enumeration for MVE
Best
Subset Singular Criterion Percent
1497 22 253.312431 25
2993 46 224.084073 50
4489 77 165.830053 75
5985 156 165.634363 100
Minimum Criterion= 165.63436284
Among 5985 subsets 156 are singular.
Observations of Best Subset
7 10 14 20
Initial MVE Location
Estimates
VAR1 58.5
VAR2 20.25
VAR3 87
Initial MVE Scatter Matrix
VAR1 VAR2 VAR3
VAR1 34.829014749 28.413143611 62.32560534
VAR2 28.413143611 38.036950318 58.659393261
VAR3 62.32560534 58.659393261 267.63348175
The third part of the output shows the optimization results after local improvement:
Final MVE Estimates (Using Local Improvement)
Number of Points with Nonzero Weight=17
Robust MVE Location
Estimates
VAR1 56.705882353
VAR2 20.235294118
VAR3 85.529411765
Robust MVE Scatter Matrix
VAR1 VAR2 VAR3
VAR1 23.470588235 7.5735294118 16.102941176
VAR2 7.5735294118 6.3161764706 5.3676470588
VAR3 16.102941176 5.3676470588 32.389705882
Eigenvalues of Robust
Scatter Matrix
VAR1 46.597431018
VAR2 12.155938483
VAR3 3.423101087
Robust Correlation Matrix
VAR1 VAR2 VAR3
VAR1 1 0.6220269501 0.5840361335
VAR2 0.6220269501 1 0.375278187
VAR3 0.5840361335 0.375278187 1
The final output presents a table containing the classical
Mahalanobis distances, the robust distances, and the weights
identifying the outlying observations (that is leverage points
when explaining
with these three regressor variables):
Classical Distances and Robust (Rousseeuw) Distances
Unsquared Mahalanobis Distance and
Unsquared Rousseeuw Distance of Each Observation
Mahalanobis Robust
N Distances Distances Weight
1 2.253603 5.528395 0
2 2.324745 5.637357 0
3 1.593712 4.197235 0
4 1.271898 1.588734 1.000000
5 0.303357 1.189335 1.000000
6 0.772895 1.308038 1.000000
7 1.852661 1.715924 1.000000
8 1.852661 1.715924 1.000000
9 1.360622 1.226680 1.000000
10 1.745997 1.936256 1.000000
11 1.465702 1.493509 1.000000
12 1.841504 1.913079 1.000000
13 1.482649 1.659943 1.000000
14 1.778785 1.689210 1.000000
15 1.690241 2.230109 1.000000
16 1.291934 1.767582 1.000000
17 2.700016 2.431021 1.000000
18 1.503155 1.523316 1.000000
19 1.593221 1.710165 1.000000
20 0.807054 0.675124 1.000000
21 2.176761 3.657281 0
Distribution of Robust Distances
MinRes 1st Qu. Median
0.6751244996 1.5084120761 1.7159242054
Mean 3rd Qu. MaxRes
2.2282960174 2.0831826658 5.6373573538
Cutoff Value = 3.0575159206
The cutoff value is the square root of
the 0.975 quantile of the chi square
distribution with 3 degrees of freedom.
There are 4 points with large robust distances receiving
zero weights. These may include boundary cases.
Only points whose robust distances are substantially larger
than the cutoff value should be considered outliers.
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.