Robust Regression Examples

Example 9.5: Stackloss Data

The following example analyzes the three regressors of Brownlee (1965) stackloss data. By default, the MVE subroutine tries only 2000 randomly selected subsets in its search. There are, in total, 5985 subsets of 4 cases out of 21 cases. Here is the code:

  
    title2 "***MVE for Stackloss Data***"; 
    title3 "*** Use All Subsets***"; 
       a = aa[,2:4]; 
       optn = j(9,1,.); 
       optn[1]= 2;              /* ipri */ 
       optn[2]= 1;              /* pcov: print COV */ 
       optn[3]= 1;              /* pcor: print CORR */ 
       optn[5]= -1;             /* nrep: use all subsets */ 
    call mve(sc,xmve,dist,optn,a);
 
Output 9.5.1 of the output shows the classical scatter and correlation matrix.

Output 9.5.1: Some Simple Statistics
Minimum Volume Ellipsoid (MVE) Estimation

Consider Ellipsoids Containing 12 Cases.

Classical Covariance Matrix
  VAR1 VAR2 VAR3
VAR1 84.057142857 22.657142857 24.571428571
VAR2 22.657142857 9.9904761905 6.6214285714
VAR3 24.571428571 6.6214285714 28.714285714

Classical Correlation Matrix
  VAR1 VAR2 VAR3
VAR1 1 0.781852333 0.5001428749
VAR2 0.781852333 1 0.3909395378
VAR3 0.5001428749 0.3909395378 1

Classical Mean
VAR1 60.428571429
VAR2 21.095238095
VAR3 86.285714286



Output 9.5.2 shows the results of the optimization (complete subset sampling).

Output 9.5.2: Iteration History
 
***MVE for Stackloss Data***

Subset Singular Best
Criterion
Percent
1497 22 253.312431 25
2993 46 224.084073 50
4489 77 165.830053 75
5985 156 165.634363 100
 
Observations of Best Subset
7 10 14 20
 
Initial MVE Location
Estimates
VAR1 58.5
VAR2 20.25
VAR3 87
 
Initial MVE Scatter Matrix
  VAR1 VAR2 VAR3
VAR1 34.829014749 28.413143611 62.32560534
VAR2 28.413143611 38.036950318 58.659393261
VAR3 62.32560534 58.659393261 267.63348175



Output 9.5.3 shows the optimization results after local improvement.

Output 9.5.3: Table of MVE Results
 
***MVE for Stackloss Data***

Robust MVE Location Estimates
VAR1 56.705882353
VAR2 20.235294118
VAR3 85.529411765
 
Robust MVE Scatter Matrix
  VAR1 VAR2 VAR3
VAR1 23.470588235 7.5735294118 16.102941176
VAR2 7.5735294118 6.3161764706 5.3676470588
VAR3 16.102941176 5.3676470588 32.389705882
 
Eigenvalues of Robust
Scatter Matrix
VAR1 46.597431018
VAR2 12.155938483
VAR3 3.423101087
 
Robust Correlation Matrix
  VAR1 VAR2 VAR3
VAR1 1 0.6220269501 0.5840361335
VAR2 0.6220269501 1 0.375278187
VAR3 0.5840361335 0.375278187 1



Output 9.5.4 presents a table containing the classical Mahalanobis distances, the robust distances, and the weights identifying the outlying observations (that is, the leverage points when explaining y with these three regressor variables).

Output 9.5.4: Mahalanobis and Robust Distances
 
***MVE for Stackloss Data***

Classical Distances and Robust (Rousseeuw) Distances
Unsquared Mahalanobis Distance and
Unsquared Rousseeuw Distance of Each Observation
N Mahalanobis Distances Robust Distances Weight
1 2.253603 5.528395 0
2 2.324745 5.637357 0
3 1.593712 4.197235 0
4 1.271898 1.588734 1.000000
5 0.303357 1.189335 1.000000
6 0.772895 1.308038 1.000000
7 1.852661 1.715924 1.000000
8 1.852661 1.715924 1.000000
9 1.360622 1.226680 1.000000
10 1.745997 1.936256 1.000000
11 1.465702 1.493509 1.000000
12 1.841504 1.913079 1.000000
13 1.482649 1.659943 1.000000
14 1.778785 1.689210 1.000000
15 1.690241 2.230109 1.000000
16 1.291934 1.767582 1.000000
17 2.700016 2.431021 1.000000
18 1.503155 1.523316 1.000000
19 1.593221 1.710165 1.000000
20 0.807054 0.675124 1.000000
21 2.176761 3.657281 0



The following specification generates three bivariate plots of the classical and robust tolerance ellipsoids. They are shown in Figure 9.5.5, Figure 9.5.6, and Figure 9.5.7, one plot for each pair of variables.

  
       optn = j(9,1,.); optn[5]= -1; 
       vnam = { "Rate", "Temperature", "AcidConcent" }; 
       filn = "stlmve"; 
       titl = "Stackloss Data: Use All Subsets"; 
    call scatmve(2,optn,.9,a,vnam,titl,1,filn);
 

Output 9.5.5: Stackloss Data: Rate vs. Temperature (MVE)
stlmve21.gif (5600 bytes)

Output 9.5.6: Stackloss Data: Rate vs. Acid Concentration (MVE)
stlmve31.gif (6059 bytes)

Output 9.5.7: Stackloss Data: Temperature vs. Acid Concentration (MVE)
stlmve32.gif (5967 bytes)

You can also use the MCD method for the stackloss data as follows:

  
   title2 "***MCD for Stackloss Data***"; 
   title3 "*** Use 500 Random Subsets***"; 
   a = aa[,2:4]; 
   optn = j(8,1,.); 
   optn[1]= 2;              /* ipri */ 
   optn[2]= 1;              /* pcov: print COV */ 
   optn[3]= 1;              /* pcor: print CORR */ 
   optn[5]= -1 ;            /* nrep: use all subsets */ 
   CALL MCD(sc,xmcd,dist,optn,a);
 

The optimization results are displayed in Output 9.5.8. The reweighted results are displayed in Output 9.5.9.

Output 9.5.8: MCD Results of Optimization
 
***MCD for Stackloss Data***
*** Use 500 Random Subsets***

4 5 6 7 8 9 10 11 12 13 14 20
 
MCD Location Estimate
VAR1 VAR2 VAR3
59.5 20.833333333 87.333333333
 
MCD Scatter Matrix Estimate
  VAR1 VAR2 VAR3
VAR1 5.1818181818 4.8181818182 4.7272727273
VAR2 4.8181818182 7.6060606061 5.0606060606
VAR3 4.7272727273 5.0606060606 19.151515152
 
MCD Correlation Matrix
  VAR1 VAR2 VAR3
VAR1 1 0.7674714142 0.4745347313
VAR2 0.7674714142 1 0.4192963398
VAR3 0.4745347313 0.4192963398 1
 
Consistent Scatter Matrix
  VAR1 VAR2 VAR3
VAR1 8.6578437815 8.0502757968 7.8983838007
VAR2 8.0502757968 12.708297013 8.4553211199
VAR3 7.8983838007 8.4553211199 31.998580526



Output 9.5.9: Final Reweighted MCD Results
 
***MCD for Stackloss Data***
*** Use 500 Random Subsets***

Reweighted Location Estimate
VAR1 VAR2 VAR3
59.5 20.833333333 87.333333333
 
Reweighted Scatter Matrix
  VAR1 VAR2 VAR3
VAR1 5.1818181818 4.8181818182 4.7272727273
VAR2 4.8181818182 7.6060606061 5.0606060606
VAR3 4.7272727273 5.0606060606 19.151515152
 
Eigenvalues
VAR1 VAR2 VAR3
23.191069268 7.3520037086 1.3963209628
 
Reweighted Correlation Matrix
  VAR1 VAR2 VAR3
VAR1 1 0.7674714142 0.4745347313
VAR2 0.7674714142 1 0.4192963398
VAR3 0.4745347313 0.4192963398 1



The MCD robust distances and outlying diagnostic are displayed in Output 9.5.10. MCD identifies more leverage points than MVE.

Output 9.5.10: MCD Robust Distances
 
***MCD for Stackloss Data***
*** Use 500 Random Subsets***

Classical Distances and Robust (Rousseeuw) Distances
Unsquared Mahalanobis Distance and
Unsquared Rousseeuw Distance of Each Observation
N Mahalanobis Distances Robust Distances Weight
1 2.253603 12.173282 0
2 2.324745 12.255677 0
3 1.593712 9.263990 0
4 1.271898 1.401368 1.000000
5 0.303357 1.420020 1.000000
6 0.772895 1.291188 1.000000
7 1.852661 1.460370 1.000000
8 1.852661 1.460370 1.000000
9 1.360622 2.120590 1.000000
10 1.745997 1.809708 1.000000
11 1.465702 1.362278 1.000000
12 1.841504 1.667437 1.000000
13 1.482649 1.416724 1.000000
14 1.778785 1.988240 1.000000
15 1.690241 5.874858 0
16 1.291934 5.606157 0
17 2.700016 6.133319 0
18 1.503155 5.760432 0
19 1.593221 6.156248 0
20 0.807054 2.172300 1.000000
21 2.176761 7.622769 0



Similarly, you can use the SCATMCD routine to generate three bivariate plots of the classical and robust tolerance ellipsoids, one plot for each pair of variables. Here is the code:

  
       optn = j(9,1,.); optn[5]= -1; 
       vnam = { "Rate", "Temperature", "AcidConcent" }; 
       filn = "stlmcd"; 
       titl = "Stackloss Data: Use All Subsets"; 
    call scatmcd(2,optn,.9,a,vnam,titl,1,filn);
 

Figure 9.5.11, Figure 9.5.12, and Figure 9.5.13 display these plots.

Output 9.5.11: Stackloss Data: Rate vs. Temperature (MCD)
stlmcd21.gif (5386 bytes)

Output 9.5.12: Stackloss Data: Rate vs. Acid Concentration (MCD)
stlmcd31.gif (5499 bytes)

Output 9.5.13: Stackloss Data: Temperature vs. Acid Concentration (MCD)
stlmcd32.gif (5299 bytes)

Previous Page | Next Page | Top of Page