This section analyzes the three regressors in the stack loss data of Brownlee (1965), which are defined in Example 12.2. As in the previous section, the LOAD MODULES=_ALL_ statement loads modules that are defined in the RobustMC.sas
file.
%include "C:\<path>\RobustMC.sas"; proc iml; load module=_all_;
By default, the MVE subroutine generates 2,000 randomly selected subsets in its search. The following call to the MVE subroutine uses all 5,985 subsets of four observations that can be chosen from the 21 observations:
/* Obs X1 X2 X3 Y Stack Loss data */ SL = { 1 80 27 89 42, 2 80 27 88 37, 3 75 25 90 37, 4 62 24 87 28, 5 62 22 87 18, 6 62 23 87 18, 7 62 24 93 19, 8 62 24 93 20, 9 58 23 87 15, 10 58 18 80 14, 11 58 18 89 14, 12 58 17 88 13, 13 58 18 82 11, 14 58 19 93 12, 15 50 18 89 8, 16 50 18 86 7, 17 50 19 72 8, 18 50 19 79 8, 19 50 20 80 9, 20 56 20 82 15, 21 70 20 91 15 }; x = SL[, 2:4]; y = SL[, 5]; optn = j(5,1,.); optn[1] = 1; /* print basic output */ optn[2] = 1; /* print covariance matrices */ optn[5]= -1; /* nrep: use all subsets */ ods select ClassicalMean ClassicalCov RobustMVELoc RobustMVEScatter; call mve(sc, xmve, dist, optn, x); ods select all;
Output 12.5.1 shows the classical and robust estimates of the location. Output 12.5.2 shows the classical and robust estimates of the scatter.
Output 12.5.1: Classical and Robust Estimates of the Location
Classical Mean | |
---|---|
VAR1 | 60.428571429 |
VAR2 | 21.095238095 |
VAR3 | 86.285714286 |
Robust MVE Location Estimates | |
---|---|
VAR1 | 56.705882353 |
VAR2 | 20.235294118 |
VAR3 | 85.529411765 |
Output 12.5.2: Classical and Robust Estimates of the Scatter
Classical Covariance Matrix | |||
---|---|---|---|
VAR1 | VAR2 | VAR3 | |
VAR1 | 84.057142857 | 22.657142857 | 24.571428571 |
VAR2 | 22.657142857 | 9.9904761905 | 6.6214285714 |
VAR3 | 24.571428571 | 6.6214285714 | 28.714285714 |
Robust MVE Scatter Matrix | |||
---|---|---|---|
VAR1 | VAR2 | VAR3 | |
VAR1 | 23.470588235 | 7.5735294118 | 16.102941176 |
VAR2 | 7.5735294118 | 6.3161764706 | 5.3676470588 |
VAR3 | 16.102941176 | 5.3676470588 | 32.389705882 |
The following statements generate three bivariate scatter plots of the classical and robust tolerance ellipsoids. The plots are shown in Output 12.5.3, Output 12.5.4, and Output 12.5.5, one plot for each pair of variables.
optn = j(5,1,.); optn[5]= -1; vnam = {"Rate", "Temperature", "AcidConcent"}; titl = "Stack Loss Data: Use All Subsets"; call MVEScatter(x, optn, 0.9, vnam, titl);
You can also use the MCD method for the stack loss data as follows:
optn = j(5,1,.); optn[1]= 2; /* print distances */ optn[2]= 1; /* print covariance matrices */ optn[5]= -1 ; /* nrep: use all subsets */ call mcd(sc, xmcd, dist, optn, x);
The optimization results are displayed in Output 12.5.6. The reweighted results are displayed in Output 12.5.7.
Output 12.5.6: MCD Results of Optimization
4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 20 |
MCD Location Estimate | ||
---|---|---|
VAR1 | VAR2 | VAR3 |
59.5 | 20.833333333 | 87.333333333 |
MCD Scatter Matrix Estimate | |||
---|---|---|---|
VAR1 | VAR2 | VAR3 | |
VAR1 | 5.1818181818 | 4.8181818182 | 4.7272727273 |
VAR2 | 4.8181818182 | 7.6060606061 | 5.0606060606 |
VAR3 | 4.7272727273 | 5.0606060606 | 19.151515152 |
Consistent Scatter Matrix | |||
---|---|---|---|
VAR1 | VAR2 | VAR3 | |
VAR1 | 8.6578437815 | 8.0502757968 | 7.8983838007 |
VAR2 | 8.0502757968 | 12.708297013 | 8.4553211199 |
VAR3 | 7.8983838007 | 8.4553211199 | 31.998580526 |
Output 12.5.7: Final Reweighted MCD Results
Reweighted Location Estimate | ||
---|---|---|
VAR1 | VAR2 | VAR3 |
59.5 | 20.833333333 | 87.333333333 |
Reweighted Scatter Matrix | |||
---|---|---|---|
VAR1 | VAR2 | VAR3 | |
VAR1 | 5.1818181818 | 4.8181818182 | 4.7272727273 |
VAR2 | 4.8181818182 | 7.6060606061 | 5.0606060606 |
VAR3 | 4.7272727273 | 5.0606060606 | 19.151515152 |
Eigenvalues | ||
---|---|---|
VAR1 | VAR2 | VAR3 |
23.191069268 | 7.3520037086 | 1.3963209628 |
The MCD robust distances and outlying diagnostic are displayed in Output 12.5.8. MCD identifies more leverage points than MVE identifies.
Output 12.5.8: MCD Robust Distances
Classical Distances and Robust (Rousseeuw) Distances | |||
---|---|---|---|
Unsquared Mahalanobis Distance and | |||
Unsquared Rousseeuw Distance of Each Observation | |||
N | Mahalanobis Distances | Robust Distances | Weight |
1 | 2.253603 | 12.173282 | 0 |
2 | 2.324745 | 12.255677 | 0 |
3 | 1.593712 | 9.263990 | 0 |
4 | 1.271898 | 1.401368 | 1.000000 |
5 | 0.303357 | 1.420020 | 1.000000 |
6 | 0.772895 | 1.291188 | 1.000000 |
7 | 1.852661 | 1.460370 | 1.000000 |
8 | 1.852661 | 1.460370 | 1.000000 |
9 | 1.360622 | 2.120590 | 1.000000 |
10 | 1.745997 | 1.809708 | 1.000000 |
11 | 1.465702 | 1.362278 | 1.000000 |
12 | 1.841504 | 1.667437 | 1.000000 |
13 | 1.482649 | 1.416724 | 1.000000 |
14 | 1.778785 | 1.988240 | 1.000000 |
15 | 1.690241 | 5.874858 | 0 |
16 | 1.291934 | 5.606157 | 0 |
17 | 2.700016 | 6.133319 | 0 |
18 | 1.503155 | 5.760432 | 0 |
19 | 1.593221 | 6.156248 | 0 |
20 | 0.807054 | 2.172300 | 1.000000 |
21 | 2.176761 | 7.622769 | 0 |
The following statements generate three bivariate scatter plots of the classical and robust tolerance ellipsoids:
optn = j(5,1,.); optn[5]= -1; vnam = {"Rate", "Temperature", "AcidConcent"}; titl = "Stack Loss Data: Use All Subsets"; call MCDScatter(x, optn, 0.9, vnam, titl);
Output 12.5.9, Output 12.5.10, and Output 12.5.11 display these plots, one plot for each pair of variables: