The MIXED Procedure

Example 59.7 Influence in Heterogeneous Variance Model

In this example from Snedecor and Cochran (1976, p. 256), a one-way classification model with heterogeneous variances is fit. The data, shown in the following DATA step, represent amounts of different types of fat absorbed by batches of doughnuts during cooking, measured in grams.

data absorb;
   input FatType Absorbed @@;
   datalines;
 1 164  1 172  1 168  1 177  1 156  1 195
 2 178  2 191  2 197  2 182  2 185  2 177
 3 175  3 193  3 178  3 171  3 163  3 176
 4 155  4 166  4 149  4 164  4 170  4 168
;

The statistical model for these data can be written as

$$\begin{aligned}
Y_{ij} &= \mu + \tau_i + \epsilon_{ij} \\
i &= 1,\ldots,t=4 \\
j &= 1,\ldots,r=6 \\
\epsilon_{ij} &\sim N(0,\sigma^2_i)
\end{aligned}$$

where $Y_{ij}$ is the amount of fat absorbed by the $j$th batch of the $i$th fat type, $\tau_i$ denotes the fat-type effects, and $\sigma^2_i$ is the residual variance for the $i$th fat type. A quick glance at the data suggests that observations 6, 9, 14, and 21 might be influential on the analysis, because these are extreme observations for the respective fat types.
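One way to make this quick glance concrete is to number the observations and summarize each fat type. The following statements are a minimal sketch and are not part of the original example; the data set name absorb_idx and the variable ObsNum are hypothetical names introduced here.

```sas
/* Number the observations in input order (ObsNum is a hypothetical name) */
data absorb_idx;
   set absorb;
   ObsNum = _n_;
run;

/* Per-group summary: the minimum and maximum show the extremes */
proc means data=absorb_idx n mean var min max maxdec=1;
   class FatType;
   var Absorbed;
run;

/* List the four observations named in the text */
proc print data=absorb_idx noobs;
   where ObsNum in (6, 9, 14, 21);
run;
```

Observations 6, 9, and 14 (195, 197, and 193 grams) are the largest values for fat types 1, 2, and 3, and observation 21 (149 grams) is the smallest value for fat type 4.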

The following SAS statements fit this model and request influence diagnostics for the fixed effects and covariance parameters. ODS Graphics is used to create plots of the influence diagnostics in addition to the tabular output. The ESTIMATES suboption requests plots of leave-one-out estimates for the fixed effects and group variances.

ods graphics on;

proc mixed data=absorb asycov;
   class FatType;
   model Absorbed = FatType / s
                    influence(iter=10 estimates);
   repeated / group=FatType;
   ods output Influence=inf;
run;

ods graphics off;

The Influence table is output to the SAS data set inf so that parameter estimates can be printed subsequently. Results from this analysis are shown in Output 59.7.1.

Output 59.7.1: Heterogeneous Variance Analysis

The Mixed Procedure

Model Information
Data Set WORK.ABSORB
Dependent Variable Absorbed
Covariance Structure Variance Components
Group Effect FatType
Estimation Method REML
Residual Variance Method None
Fixed Effects SE Method Model-Based
Degrees of Freedom Method Between-Within

Covariance Parameter Estimates
Cov Parm Group Estimate
Residual FatType 1 178.00
Residual FatType 2 60.4000
Residual FatType 3 97.6000
Residual FatType 4 67.6000

Solution for Fixed Effects
Effect FatType Estimate Standard Error DF t Value Pr > |t|
Intercept   162.00 3.3566 20 48.26 <.0001
FatType 1 10.0000 6.3979 20 1.56 0.1337
FatType 2 23.0000 4.6188 20 4.98 <.0001
FatType 3 14.0000 5.2472 20 2.67 0.0148
FatType 4 0 . . . .


The fixed-effects solutions correspond to estimates of the following parameters:

$$\begin{aligned}
\mathrm{Intercept} &: \mu + \tau_4 \\
\mathrm{FatType\ 1} &: \tau_1 - \tau_4 \\
\mathrm{FatType\ 2} &: \tau_2 - \tau_4 \\
\mathrm{FatType\ 3} &: \tau_3 - \tau_4 \\
\mathrm{FatType\ 4} &: 0
\end{aligned}$$

You can easily verify that these estimates are simple functions of the arithmetic means $\overline{y}_{i.}$ in the groups. For example, $\widehat{\mu + \tau _4} = \overline{y}_{4.} = 162.0$, $\widehat{\tau _1-\tau _4} = \overline{y}_{1.} - \overline{y}_{4.} = 10.0$, and so forth. The covariance parameter estimates are the sample variances in the groups and are uncorrelated.
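As a worked check, the means of the first and fourth groups and the sample variance of the fourth group can be computed by hand from the data. The two standard-error lines assume that each standard error in Output 59.7.1 is the square root of the sum of $s_i^2/r$ over the groups involved, with $s_1^2 = 178.0$ taken from the Covariance Parameter Estimates table; this assumption reproduces the tabled values.

```latex
\begin{aligned}
\overline{y}_{1.} &= \tfrac{1}{6}(164+172+168+177+156+195) = \tfrac{1032}{6} = 172.0 \\
\overline{y}_{4.} &= \tfrac{1}{6}(155+166+149+164+170+168) = \tfrac{972}{6} = 162.0 \\
\widehat{\tau_1 - \tau_4} &= 172.0 - 162.0 = 10.0 \\
s_4^2 &= \tfrac{1}{5}\left(7^2+4^2+13^2+2^2+8^2+6^2\right) = \tfrac{338}{5} = 67.6 \\
\mathrm{SE}(\widehat{\mu+\tau_4}) &= \sqrt{s_4^2/6} = \sqrt{11.267} \approx 3.357 \\
\mathrm{SE}(\widehat{\tau_1-\tau_4}) &= \sqrt{(s_1^2 + s_4^2)/6} = \sqrt{245.6/6} \approx 6.398
\end{aligned}
```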

The variances in the four groups are shown in the Covariance Parameter Estimates table (Output 59.7.1). The estimated variance in the first group (178.0) is about 1.8 to 2.9 times as large as the estimated variances in the other groups (60.4, 97.6, and 67.6).

Output 59.7.2: Asymptotic Variances of Group Variance Estimates

Asymptotic Covariance Matrix of Estimates
Row Cov Parm CovP1 CovP2 CovP3 CovP4
1 Residual 12674      
2 Residual   1459.26    
3 Residual     3810.30  
4 Residual       1827.90


In groups where the residual variance estimate is large, the estimate is also less precise: its asymptotic variance is correspondingly large (Output 59.7.2).
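The values in Output 59.7.2 are consistent with the familiar large-sample variance of a sample variance from normal data, $2\sigma^4/(r-1)$, evaluated at the estimates. This formula is offered as an assumption that happens to reproduce the table, not as a description of how the MIXED procedure computes the matrix. It makes the relationship explicit: the variance of the estimate grows with the square of the estimate itself.

```latex
\begin{aligned}
\widehat{\mathrm{Var}}(\widehat{\sigma}^2_i) &\approx \frac{2\,\widehat{\sigma}^4_i}{r-1}, \qquad r = 6 \\
i=1: &\quad 2(178.0)^2/5 = 12673.6 \\
i=2: &\quad 2(60.4)^2/5 \approx 1459.26 \\
i=3: &\quad 2(97.6)^2/5 \approx 3810.30 \\
i=4: &\quad 2(67.6)^2/5 \approx 1827.90
\end{aligned}
```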

The following statements print the leave-one-out estimates for fixed effects and covariance parameters that were written to the inf data set with the ESTIMATES suboption (Output 59.7.3):

proc print data=inf label;
   var parm1-parm5 covp1-covp4;
run;

Output 59.7.3: Leave-One-Out Estimates

Obs Intercept FatType 1 FatType 2 FatType 3 FatType 4 Residual FatType 1 Residual FatType 2 Residual FatType 3 Residual FatType 4
1 162.00 11.600 23.000 14.000 0 203.30 60.400 97.60 67.600
2 162.00 10.000 23.000 14.000 0 222.47 60.400 97.60 67.600
3 162.00 10.800 23.000 14.000 0 217.68 60.400 97.60 67.600
4 162.00 9.000 23.000 14.000 0 214.99 60.400 97.60 67.600
5 162.00 13.200 23.000 14.000 0 145.70 60.400 97.60 67.600
6 162.00 5.400 23.000 14.000 0 63.80 60.400 97.60 67.600
7 162.00 10.000 24.400 14.000 0 178.00 60.795 97.60 67.600
8 162.00 10.000 21.800 14.000 0 178.00 64.691 97.60 67.600
9 162.00 10.000 20.600 14.000 0 178.00 32.296 97.60 67.600
10 162.00 10.000 23.600 14.000 0 178.00 72.797 97.60 67.600
11 162.00 10.000 23.000 14.000 0 178.00 75.490 97.60 67.600
12 162.00 10.000 24.600 14.000 0 178.00 56.285 97.60 67.600
13 162.00 10.000 23.000 14.200 0 178.00 60.400 121.68 67.600
14 162.00 10.000 23.000 10.600 0 178.00 60.400 35.30 67.600
15 162.00 10.000 23.000 13.600 0 178.00 60.400 120.79 67.600
16 162.00 10.000 23.000 15.000 0 178.00 60.400 114.50 67.600
17 162.00 10.000 23.000 16.600 0 178.00 60.400 71.30 67.600
18 162.00 10.000 23.000 14.000 0 178.00 60.400 121.98 67.600
19 163.40 8.600 21.600 12.600 0 178.00 60.400 97.60 69.799
20 161.20 10.800 23.800 14.800 0 178.00 60.400 97.60 79.698
21 164.60 7.400 20.400 11.400 0 178.00 60.400 97.60 33.800
22 161.60 10.400 23.400 14.400 0 178.00 60.400 97.60 83.292
23 160.40 11.600 24.600 15.600 0 178.00 60.400 97.60 65.299
24 160.80 11.200 24.200 15.200 0 178.00 60.400 97.60 73.677
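
The names parm1-parm5 and covp1-covp4 in the VAR statement come from the Influence data set itself. A quick way to confirm which variables are available, and to look at a single deletion, might be the following sketch (not part of the original example):

```sas
/* List the variables in the Influence data set in creation order */
proc contents data=inf varnum;
run;

/* Print only the row for the deletion of observation 6 */
proc print data=inf(firstobs=6 obs=6) label;
   var parm1-parm5 covp1-covp4;
run;
```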


The graphical displays in Output 59.7.4 and Output 59.7.5 are created when ODS Graphics is enabled. For general information about ODS Graphics, see Chapter 21: Statistical Graphics Using ODS. For specific information about the graphics available in the MIXED procedure, see the section ODS Graphics.

Output 59.7.4: Fixed-Effects Deletion Estimates

[Figure: Fixed-Effects Deletion Estimates]


Output 59.7.5: Covariance Parameter Deletion Estimates

[Figure: Covariance Parameter Deletion Estimates]


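The estimate of the intercept is affected only when observations from the last group are removed, because the intercept estimates $\mu + \tau_4$, the mean of the fourth group. The estimate of the FatType 1 effect, which is the difference between the means of the first and fourth groups, reacts to removal of observations in the first and last groups (Output 59.7.4).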

While observations can affect one or more fixed-effects solutions in this model, they can affect only one covariance parameter, the variance in their group (Output 59.7.5). Removing any of observations 6, 9, 14, or 21, each of which is extreme in its group, reduces the corresponding group variance considerably.
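
A single leave-one-out estimate can be checked by refitting the model without the observation in question. The following statements are a sketch of that check for observation 6; the data set name absorb_no6 is a hypothetical name introduced here, and the refit is assumed to agree with the iterative influence analysis only up to convergence tolerance.

```sas
/* Drop observation 6 (the sixth record read in the DATA step) */
data absorb_no6;
   set absorb;
   if _n_ ne 6;
run;

/* Refit the heterogeneous variance model without influence diagnostics */
proc mixed data=absorb_no6;
   class FatType;
   model Absorbed = FatType / s;
   repeated / group=FatType;
run;
```

The result should be close to row 6 of Output 59.7.3. By hand, the five remaining observations for the first fat type have mean 837/5 = 167.4, so the FatType 1 solution becomes 167.4 - 162.0 = 5.4, and their sample variance is 255.2/4 = 63.8.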

Diagnostics related to residuals and predicted values are printed with the following statements:

proc print data=inf label;
   var observed predicted residual pressres
       student Rstudent;
run;

Output 59.7.6: Residual Diagnostics

Obs  Observed Value  Predicted Mean  Residual  PRESS Residual  Internally Studentized Residual  Externally Studentized Residual
1 164 172.0 -8.000 -9.600 -0.6569 -0.6146
2 172 172.0 0.000 0.000 0.0000 0.0000
3 168 172.0 -4.000 -4.800 -0.3284 -0.2970
4 177 172.0 5.000 6.000 0.4105 0.3736
5 156 172.0 -16.000 -19.200 -1.3137 -1.4521
6 195 172.0 23.000 27.600 1.8885 3.1544
7 178 185.0 -7.000 -8.400 -0.9867 -0.9835
8 191 185.0 6.000 7.200 0.8457 0.8172
9 197 185.0 12.000 14.400 1.6914 2.3131
10 182 185.0 -3.000 -3.600 -0.4229 -0.3852
11 185 185.0 0.000 -0.000 0.0000 0.0000
12 177 185.0 -8.000 -9.600 -1.1276 -1.1681
13 175 176.0 -1.000 -1.200 -0.1109 -0.0993
14 193 176.0 17.000 20.400 1.8850 3.1344
15 178 176.0 2.000 2.400 0.2218 0.1993
16 171 176.0 -5.000 -6.000 -0.5544 -0.5119
17 163 176.0 -13.000 -15.600 -1.4415 -1.6865
18 176 176.0 0.000 0.000 0.0000 0.0000
19 155 162.0 -7.000 -8.400 -0.9326 -0.9178
20 166 162.0 4.000 4.800 0.5329 0.4908
21 149 162.0 -13.000 -15.600 -1.7321 -2.4495
22 164 162.0 2.000 2.400 0.2665 0.2401
23 170 162.0 8.000 9.600 1.0659 1.0845
24 168 162.0 6.000 7.200 0.7994 0.7657


Observations 6, 9, 14, and 21 have large studentized residuals (Output 59.7.6). For these observations, the externally studentized residuals are much larger in magnitude than the internally studentized residuals, which indicates that the variance estimate in the group shrinks when the observation is removed. Note also that comparisons based on raw residuals in models with heterogeneous variances can be misleading. Observation 5, for example, has a larger raw residual in absolute value (-16.0) than observation 21 (-13.0) but a smaller internally studentized residual (-1.31 versus -1.73), because the variance for the first fat type is much larger than the variance for the fourth. A large residual is more surprising in a group with small variance.
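
The entries for observation 6 in Output 59.7.6 can be reproduced with the usual definitions, sketched here under the assumption that the leverage is $h = 1/6$ (each group has six observations), that $e_6 = 23$ is the raw residual, and that the leave-one-out variance for the first group is the value 63.8 from Output 59.7.3:

```latex
\begin{aligned}
\text{PRESS residual:} &\quad \frac{e_6}{1-h} = \frac{23}{5/6} = 27.6 \\
\text{internally studentized:} &\quad \frac{e_6}{\sqrt{\widehat{\sigma}^2_1\,(1-h)}} = \frac{23}{\sqrt{178.0 \times 5/6}} \approx 1.889 \\
\text{externally studentized:} &\quad \frac{e_6}{\sqrt{\widehat{\sigma}^2_{1(-6)}\,(1-h)}} = \frac{23}{\sqrt{63.8 \times 5/6}} \approx 3.154
\end{aligned}
```

The only difference between the last two lines is the variance estimate in the denominator, which is why a large gap between them signals that the observation inflates its group variance.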

A measure of the overall influence on the analysis is the (restricted) likelihood distance, shown in Output 59.7.7. Observations 6, 9, 14, and 21 clearly displace the REML solution more than any other observations.

Output 59.7.7: Restricted Likelihood Distance

[Figure: Restricted Likelihood Distance]
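
For a tabular counterpart to this plot, the rows of the inf data set can be ranked by restricted likelihood distance. The following statements are a sketch: the names inf_idx, inf_sorted, and ObsNum are hypothetical, and the rows of inf are assumed to be in the original observation order.

```sas
/* Keep the original observation number before sorting (ObsNum is hypothetical) */
data inf_idx;
   set inf;
   ObsNum = _n_;
run;

proc sort data=inf_idx out=inf_sorted;
   by descending RLD;
run;

/* Show the four largest restricted likelihood distances */
proc print data=inf_sorted(obs=4) label noobs;
   var ObsNum observed RLD;
run;
```

Given the values listed in Output 59.7.8, the four rows should correspond to observations 6, 14, 21, and 9, in that order.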


The following statements list the restricted likelihood distance and various diagnostics related to the fixed-effects estimates (Output 59.7.8):

proc print data=inf label;
   var leverage observed CookD DFFITS CovRatio RLD;
run;

Output 59.7.8: Restricted Likelihood Distance and Fixed-Effects Diagnostics

Obs  Leverage  Observed Value  Cook's D  DFFITS  COVRATIO  Restr. Likelihood Distance
1 0.167 164 0.02157 -0.27487 1.3706 0.1178
2 0.167 172 0.00000 -0.00000 1.4998 0.1156
3 0.167 168 0.00539 -0.13282 1.4675 0.1124
4 0.167 177 0.00843 0.16706 1.4494 0.1117
5 0.167 156 0.08629 -0.64938 0.9822 0.5290
6 0.167 195 0.17831 1.41069 0.4301 5.8101
7 0.167 178 0.04868 -0.43982 1.2078 0.1935
8 0.167 191 0.03576 0.36546 1.2853 0.1451
9 0.167 197 0.14305 1.03446 0.6416 2.2909
10 0.167 182 0.00894 -0.17225 1.4463 0.1116
11 0.167 185 0.00000 -0.00000 1.4998 0.1156
12 0.167 177 0.06358 -0.52239 1.1183 0.2856
13 0.167 175 0.00061 -0.04441 1.4961 0.1151
14 0.167 193 0.17766 1.40175 0.4340 5.7044
15 0.167 178 0.00246 0.08915 1.4851 0.1139
16 0.167 171 0.01537 -0.22892 1.4078 0.1129
17 0.167 163 0.10389 -0.75423 0.8766 0.8433
18 0.167 176 0.00000 0.00000 1.4998 0.1156
19 0.167 155 0.04349 -0.41047 1.2390 0.1710
20 0.167 166 0.01420 0.21950 1.4148 0.1124
21 0.167 149 0.15000 -1.09545 0.6000 2.7343
22 0.167 164 0.00355 0.10736 1.4786 0.1133
23 0.167 170 0.05680 0.48500 1.1592 0.2383
24 0.167 168 0.03195 0.34245 1.3079 0.1353


In this example, observations with large likelihood distances also have large values for Cook’s D and values of CovRatio far less than one (Output 59.7.8). The latter indicates that the fixed effects are estimated more precisely when these observations are removed from the analysis.
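
For observation 6 these statistics can be reproduced by hand. The sketch below assumes that, in this balanced one-way layout, removing observation 6 changes only the mean of the first group (from 172.0 to 167.4) and its estimated variance (from 178.0 to 63.8), so that the general matrix definitions reduce to scalars. These reductions reproduce the tabled values but are not taken from the procedure's documentation.

```latex
\begin{aligned}
\text{Cook's } D &= \frac{(172.0 - 167.4)^2}{178.0/6} \cdot \frac{1}{\mathrm{rank}(\mathbf{X})}
   = \frac{21.16}{29.667} \cdot \frac{1}{4} \approx 0.1783 \\
\text{CovRatio} &= \frac{63.8/5}{178.0/6} = \frac{12.76}{29.667} \approx 0.4301 \\
\text{DFFITS} &= 3.1544 \times \sqrt{\frac{h}{1-h}} = 3.1544 \times \sqrt{\frac{1}{5}} \approx 1.4107
\end{aligned}
```

Here 3.1544 is the externally studentized residual from Output 59.7.6 and $h = 1/6$ is the leverage.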

The following statements print the number of iterations, Cook’s D statistic, and the CovRatio for the covariance parameters:

proc print data=inf label;
   var iter CookDCP CovRatioCP;
run;

The same conclusions as for the fixed-effects estimates hold for the covariance parameter estimates. Observations 6, 9, 14, and 21 change the estimates and their precision considerably (Output 59.7.9, Output 59.7.10). All iterative updates converged within four iterations.

Output 59.7.9: Covariance Parameter Diagnostics

Obs Iterations Cook's D CovParms COVRATIO CovParms
1 3 0.05050 1.6306
2 3 0.15603 1.9520
3 3 0.12426 1.8692
4 3 0.10796 1.8233
5 4 0.08232 0.8375
6 4 1.02909 0.1606
7 1 0.00011 1.2662
8 2 0.01262 1.4335
9 3 0.54126 0.3573
10 3 0.10531 1.8156
11 3 0.15603 1.9520
12 2 0.01160 1.0849
13 3 0.15223 1.9425
14 4 1.01865 0.1635
15 3 0.14111 1.9141
16 3 0.07494 1.7203
17 3 0.18154 0.6671
18 3 0.15603 1.9520
19 2 0.00265 1.3326
20 3 0.08008 1.7374
21 1 0.62500 0.3125
22 3 0.13472 1.8974
23 2 0.00290 1.1663
24 2 0.02020 1.4839
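
The row for observation 21 in Output 59.7.9 can be checked by hand. The sketch below assumes that only the fourth group variance changes (from 67.6 to 33.8, per Output 59.7.3), that the asymptotic variance of a group variance estimate is $2\widehat{\sigma}^4/(n-1)$ with $n$ the number of observations in the group, and that the statistic is not divided by the number of covariance parameters. These assumptions reproduce the tabled values.

```latex
\begin{aligned}
\text{Cook's } D \text{ (CovParms)} &= \frac{(67.6 - 33.8)^2}{2(67.6)^2/5}
   = \frac{1142.44}{1827.90} \approx 0.6250 \\
\text{CovRatio (CovParms)} &= \frac{2(33.8)^2/4}{2(67.6)^2/5}
   = \frac{571.22}{1827.90} \approx 0.3125
\end{aligned}
```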


Output 59.7.10 displays the standard panel of influence diagnostics that is obtained when influence analysis is iterative. The Cook’s D and CovRatio statistics are displayed for each deletion set for both fixed-effects and covariance parameter estimates. This provides a convenient summary of the impact on the analysis for each deletion set, since Cook’s D statistic measures impact on the estimates and the CovRatio statistic measures impact on the precision of the estimates.

Output 59.7.10: Influence Diagnostics

[Figure: Influence Diagnostics]


In this example, observations 6, 9, 14, and 21 have considerable impact on both the estimates and the precision of the fixed effects and covariance parameters. Such across-the-board influence is not necessarily the case in general: an observation can be influential on only some aspects of the analysis, as shown in the next example.