Previous Page | Next Page

The GLIMMIX Procedure

Example 38.9 Testing Equality of Covariance and Correlation Matrices

Fisher’s iris data are widely used in multivariate statistics. They comprise measurements in millimeters of four flower attributes, the length and width of sepals and petals for 50 specimens from each of three species, Iris setosa, I. versicolor, and I. virginica (Fisher 1936).

When modeling multiple attributes from the same specimen, correlations among measurements from the same flower must be taken into account. Unstructured covariance matrices are common in this multivariate setting. Species comparisons can focus on comparisons of mean response, but comparisons of the variation and covariation are also of interest. In this example, the equivalence of covariance and correlation matrices among the species are examined.

The following DATA step creates the iris data in multivariate format—that is, each observation contains multiple response variables. The subsequent DATA step creates a data set in univariate form, where each observation corresponds to a single response variable. This is the form needed by the GLIMMIX procedure.

proc format;
   value specname
      1='Setosa    '
      2='Versicolor'
      3='Virginica ';
   value variable
      1='Sepal Length'
      2='Sepal Width '
      3='Petal Length'
      4='Petal Width';
run;

data iris;
   input Y1 Y2 Y3 Y4 Species @@;
   format Species specname.;
   label Y1 = 'Sepal Length (mm)'
         Y2 = 'Sepal Width (mm)'
         Y3 = 'Petal Length (mm)'
         Y4 = 'Petal Width (mm)';
   datalines;
50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3
63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2
59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2
65 30 52 20 3 56 25 39 11 2 65 30 55 18 3 58 27 51 19 3
68 32 59 23 3 51 33 17 05 1 57 28 45 13 2 62 34 54 23 3
77 38 67 22 3 63 33 47 16 2 67 33 57 25 3 76 30 66 21 3
49 25 45 17 3 55 35 13 02 1 67 30 52 23 3 70 32 47 14 2
64 32 45 15 2 61 28 40 13 2 48 31 16 02 1 59 30 51 18 3
55 24 38 11 2 63 25 50 19 3 64 32 53 23 3 52 34 14 02 1
49 36 14 01 1 54 30 45 15 2 79 38 64 20 3 44 32 13 02 1
67 33 57 21 3 50 35 16 06 1 58 26 40 12 2 44 30 13 02 1
77 28 67 20 3 63 27 49 18 3 47 32 16 02 1 55 26 44 12 2
50 23 33 10 2 72 32 60 18 3 48 30 14 03 1 51 38 16 02 1
61 30 49 18 3 48 34 19 02 1 50 30 16 02 1 50 32 12 02 1
61 26 56 14 3 64 28 56 21 3 43 30 11 01 1 58 40 12 02 1
51 38 19 04 1 67 31 44 14 2 62 28 48 18 3 49 30 14 02 1
51 35 14 02 1 56 30 45 15 2 58 27 41 10 2 50 34 16 04 1
46 32 14 02 1 60 29 45 15 2 57 26 35 10 2 57 44 15 04 1
50 36 14 02 1 77 30 61 23 3 63 34 56 24 3 58 27 51 19 3
57 29 42 13 2 72 30 58 16 3 54 34 15 04 1 52 41 15 01 1
71 30 59 21 3 64 31 55 18 3 60 30 48 18 3 63 29 56 18 3
49 24 33 10 2 56 27 42 13 2 57 30 42 12 2 55 42 14 02 1
49 31 15 02 1 77 26 69 23 3 60 22 50 15 3 54 39 17 04 1
66 29 46 13 2 52 27 39 14 2 60 34 45 16 2 50 34 15 02 1
44 29 14 02 1 50 20 35 10 2 55 24 37 10 2 58 27 39 12 2
47 32 13 02 1 46 31 15 02 1 69 32 57 23 3 62 29 43 13 2
74 28 61 19 3 59 30 42 15 2 51 34 15 02 1 50 35 13 03 1
56 28 49 20 3 60 22 40 10 2 73 29 63 18 3 67 25 58 18 3
49 31 15 01 1 67 31 47 15 2 63 23 44 13 2 54 37 15 02 1
56 30 41 13 2 63 25 49 15 2 61 28 47 12 2 64 29 43 13 2
51 25 30 11 2 57 28 41 13 2 65 30 58 22 3 69 31 54 21 3
54 39 13 04 1 51 35 14 03 1 72 36 61 25 3 65 32 51 20 3
61 29 47 14 2 56 29 36 13 2 69 31 49 15 2 64 27 53 19 3
68 30 55 21 3 55 25 40 13 2 48 34 16 02 1 48 30 14 01 1
45 23 13 03 1 57 25 50 20 3 57 38 17 03 1 51 38 15 03 1
55 23 40 13 2 66 30 44 14 2 68 28 48 14 2 54 34 17 02 1
51 37 15 04 1 52 35 15 02 1 58 28 51 24 3 67 30 50 17 2
63 33 60 25 3 53 37 15 02 1
;
data iris_univ;
   set iris;
   retain id 0;
   array y{4};
   format var variable.;
   id+1;
   do var=1 to 4;
      response = y{var};
      output;
   end;
   drop y:;
run;

The following GLIMMIX statements fit a model with separate unstructured covariance matrices for each species:

ods select FitStatistics CovParms CovTests;
proc glimmix data=iris_univ;
   class species var id;
   model response = species*var;
   random _residual_ / type=un group=species subject=id;
   covtest homogeneity;
run;

The mean function is modeled as a cell-means model that allows for different means for each species and outcome variable. The covariances are modeled directly (R-side) rather than through random effects. The ID variable identifies the individual plant, so that responses from different plants are independent. The GROUP=SPECIES option varies the parameters of the unstructured covariance matrix by species. Hence, this model has covariance parameters: unique parameters for a covariance matrix for each of three species.

The COVTEST statement requests a test of homogeneity—that is, it tests whether varying the covariance parameters by the group effect provides a significantly better fit compared to a model in which different groups share the same parameter.

Output 38.9.1 Fit Statistics for Analysis of Fisher’s Iris Data
The GLIMMIX Procedure

Fit Statistics
-2 Res Log Likelihood 2812.89
AIC (smaller is better) 2872.89
AICC (smaller is better) 2876.23
BIC (smaller is better) 2963.21
CAIC (smaller is better) 2993.21
HQIC (smaller is better) 2909.58
Generalized Chi-Square 588.00
Gener. Chi-Square / DF 1.00

The "Fit Statistics" table shows the restricted (residual) log likelihood in the full model and other fit statistics (Output 38.9.1). The "-2 Res Log Likelihood" sets the benchmark against which a model with homogeneity constraint is compared. Output 38.9.2 displays the 30 covariance parameters in this model.

There appear to be substantial differences among the covariance parameters from different groups. For example, the residual variability of the petal length of the three species is 12.4249, 26.6433, and 40.4343, respectively. The homogeneity hypothesis restricts these variances to be equal and similarly for the other covariance parameters. The results from the COVTEST statement are shown in Output 38.9.3.

Output 38.9.2 Covariance Parameters Varied by Species (TYPE=UN)
Covariance Parameter Estimates
Cov Parm Subject Group Estimate Standard Error
UN(1,1) id Species Setosa 12.4249 2.5102
UN(2,1) id Species Setosa 9.9216 2.3775
UN(2,2) id Species Setosa 14.3690 2.9030
UN(3,1) id Species Setosa 1.6355 0.9052
UN(3,2) id Species Setosa 1.1698 0.9552
UN(3,3) id Species Setosa 3.0159 0.6093
UN(4,1) id Species Setosa 1.0331 0.5508
UN(4,2) id Species Setosa 0.9298 0.5859
UN(4,3) id Species Setosa 0.6069 0.2755
UN(4,4) id Species Setosa 1.1106 0.2244
UN(1,1) id Species Versicolor 26.6433 5.3828
UN(2,1) id Species Versicolor 8.5184 2.6144
UN(2,2) id Species Versicolor 9.8469 1.9894
UN(3,1) id Species Versicolor 18.2898 4.3398
UN(3,2) id Species Versicolor 8.2653 2.4149
UN(3,3) id Species Versicolor 22.0816 4.4612
UN(4,1) id Species Versicolor 5.5780 1.6617
UN(4,2) id Species Versicolor 4.1204 1.0641
UN(4,3) id Species Versicolor 7.3102 1.6891
UN(4,4) id Species Versicolor 3.9106 0.7901
UN(1,1) id Species Virginica 40.4343 8.1690
UN(2,1) id Species Virginica 9.3763 3.2213
UN(2,2) id Species Virginica 10.4004 2.1012
UN(3,1) id Species Virginica 30.3290 6.6262
UN(3,2) id Species Virginica 7.1380 2.7395
UN(3,3) id Species Virginica 30.4588 6.1536
UN(4,1) id Species Virginica 4.9094 2.5916
UN(4,2) id Species Virginica 4.7629 1.4367
UN(4,3) id Species Virginica 4.8824 2.2750
UN(4,4) id Species Virginica 7.5433 1.5240

Output 38.9.3 Likelihood Ratio Test of Homogeneity
Tests of Covariance Parameters
Based on the Restricted Likelihood
Label DF -2 Res Log Like ChiSq Pr > ChiSq Note
Homogeneity 20 2959.55 146.66 <.0001 DF

DF: P-value based on a chi-square with DF degrees of freedom.


Denote as the covariance matrix for species with elements . In processing the COVTEST hypothesis , the GLIMMIX procedure fits a model that satistfies the constraints

     
     
     
     
     

where is the covariance between the th and th variable for the th species. The 2 restricted log likelihood of this restricted model is 2959.55 (Output 38.9.3). The change of 146.66 compared to the full model is highly significant. There is sufficient evidence to reject the notion of equal covariance matrices among the three iris species.

Equality of covariance matrices implies equality of correlation matrices, but the reverse is not true. Fewer constraints are needed to equate correlations because the diagonal entries of the covariance matrices are free to vary. In order to test the equality of the correlation matrices among the three species, you can parameterize the unstructured covariance matrix in terms of the correlations and use a COVTEST statement with general contrasts, as shown in the following statements:

ods select FitStatistics CovParms CovTests;
proc glimmix data=iris_univ;
   class species var id;
   model response = species*var;
   random _residual_ / type=unr group=species subject=id;
   covtest 'Equal Covariance Matrices'  homogeneity;
   covtest 'Equal Correlation Matrices' general
        0  0  0  0  1  0  0  0  0  0
        0  0  0  0 -1  0  0  0  0  0,
        0  0  0  0  1  0  0  0  0  0
        0  0  0  0  0  0  0  0  0  0
        0  0  0  0 -1  0  0  0  0  0,
        0  0  0  0  0  1  0  0  0  0
        0  0  0  0  0 -1  0  0  0  0,
        0  0  0  0  0  1  0  0  0  0
        0  0  0  0  0  0  0  0  0  0
        0  0  0  0  0 -1  0  0  0  0,
        0  0  0  0  0  0  1  0  0  0
        0  0  0  0  0  0 -1  0  0  0,
        0  0  0  0  0  0  1  0  0  0
        0  0  0  0  0  0  0  0  0  0
        0  0  0  0  0  0 -1  0  0  0,
        0  0  0  0  0  0  0  1  0  0
        0  0  0  0  0  0  0 -1  0  0,
        0  0  0  0  0  0  0  1  0  0
        0  0  0  0  0  0  0  0  0  0
        0  0  0  0  0  0  0 -1  0  0,
        0  0  0  0  0  0  0  0  1  0
        0  0  0  0  0  0  0  0 -1  0,
        0  0  0  0  0  0  0  0  1  0
        0  0  0  0  0  0  0  0  0  0
        0  0  0  0  0  0  0  0  -1 0,
        0  0  0  0  0  0  0  0  0  1
        0  0  0  0  0  0  0  0  0 -1,
        0  0  0  0  0  0  0  0  0  1
        0  0  0  0  0  0  0  0  0  0
        0  0  0  0  0  0  0  0  0 -1 / estimates;
run;

The TYPE=UNR structure is a reparameterization of TYPE=UN. The models provide the same fit, as seen by comparison of the "Fit Statistics" tables in Output 38.9.1 and Output 38.9.4. The covariance parameters are ordered differently, however. In each group, the four variances precede the six correlations (Output 38.9.4). The first COVTEST statement tests the homogeneity hypothesis in terms of the UNR parameterization, and the result is identical to the test in Output 38.9.3. The second COVTEST statement restricts the correlations to be equal across groups. If is the correlation between the th and th variable for the th species, the 12 restrictions are

     
     
     
     
     
     

The ESTIMATES option in the COVTEST statement requests that the GLIMMIX procedure display the covariance parameter estimates in the restricted model (Output 38.9.4).

Output 38.9.4 Fit Statistics, Covariance Parameters (TYPE=UNR), and Likelihood Ratio Tests for Equality of Covariance and Correlation Matrices
The GLIMMIX Procedure

Fit Statistics
-2 Res Log Likelihood 2812.89
AIC (smaller is better) 2872.89
AICC (smaller is better) 2876.23
BIC (smaller is better) 2963.21
CAIC (smaller is better) 2993.21
HQIC (smaller is better) 2909.58
Generalized Chi-Square 588.00
Gener. Chi-Square / DF 1.00

Covariance Parameter Estimates
Cov Parm Subject Group Estimate Standard Error
Var(1) id Species Setosa 12.4249 2.5102
Var(2) id Species Setosa 14.3690 2.9030
Var(3) id Species Setosa 3.0159 0.6093
Var(4) id Species Setosa 1.1106 0.2244
Corr(2,1) id Species Setosa 0.7425 0.06409
Corr(3,1) id Species Setosa 0.2672 0.1327
Corr(3,2) id Species Setosa 0.1777 0.1383
Corr(4,1) id Species Setosa 0.2781 0.1318
Corr(4,2) id Species Setosa 0.2328 0.1351
Corr(4,3) id Species Setosa 0.3316 0.1271
Var(1) id Species Versicolor 26.6433 5.3828
Var(2) id Species Versicolor 9.8469 1.9894
Var(3) id Species Versicolor 22.0816 4.4612
Var(4) id Species Versicolor 3.9106 0.7901
Corr(2,1) id Species Versicolor 0.5259 0.1033
Corr(3,1) id Species Versicolor 0.7540 0.06163
Corr(3,2) id Species Versicolor 0.5605 0.09797
Corr(4,1) id Species Versicolor 0.5465 0.1002
Corr(4,2) id Species Versicolor 0.6640 0.07987
Corr(4,3) id Species Versicolor 0.7867 0.05445
Var(1) id Species Virginica 40.4343 8.1690
Var(2) id Species Virginica 10.4004 2.1012
Var(3) id Species Virginica 30.4588 6.1536
Var(4) id Species Virginica 7.5433 1.5240
Corr(2,1) id Species Virginica 0.4572 0.1130
Corr(3,1) id Species Virginica 0.8642 0.03616
Corr(3,2) id Species Virginica 0.4010 0.1199
Corr(4,1) id Species Virginica 0.2811 0.1316
Corr(4,2) id Species Virginica 0.5377 0.1015
Corr(4,3) id Species Virginica 0.3221 0.1280

Tests of Covariance Parameters
Based on the Restricted Likelihood
Label DF -2 Res Log Like ChiSq Pr > ChiSq Estimates H0 Note
Est1 Est2 Est3 Est4 Est5 Est6 Est7 Est8 Est9 Est10 Est11 Est12 Est13 Est14 Est15 Est16 Est17 Est18 Est19 Est20 Est21 Est22 Est23 Est24 Est25 Est26 Est27 Est28 Est29 Est30
Equal Covariance Matrices 20 2959.55 146.66 <.0001 26.5 11.5 18.52 4.19 0.530 0.756 0.378 0.365 0.471 0.484 26.5 11.54 18.5 4.19 0.530 0.756 0.378 0.365 0.471 0.484 26.5 11.5 18.5 4.19 0.530 0.756 0.378 0.365 0.471 0.484 DF
Equal Correlation Matrices 12 2876.38 63.49 <.0001 16.5 14.9 4.84 1.44 0.561 0.683 0.402 0.384 0.498 0.522 24.4 9.16 17.4 3.00 0.561 0.683 0.402 0.384 0.498 0.522 35.1 10.8 27.4 8.14 0.561 0.683 0.402 0.384 0.498 0.522 DF

DF: P-value based on a chi-square with DF degrees of freedom.


The result of the homogeneity test is identical to that in Output 38.9.3. The hypothesis of equality of the correlation matrices is also rejected with a chi-square value of and a p-value of . Notice, however, that the chi-square statistic is smaller than in the test of homogeneity due to the smaller number of restrictions imposed on the full model. The estimate of the common correlation matrix in the restricted model is

     
Previous Page | Next Page | Top of Page