Fisher’s iris data are widely used in multivariate statistics. They comprise measurements in millimeters of four flower attributes, the length and width of sepals and petals for 50 specimens from each of three species, Iris setosa, I. versicolor, and I. virginica (Fisher, 1936).
When modeling multiple attributes from the same specimen, correlations among measurements from the same flower must be taken into account. Unstructured covariance matrices are common in this multivariate setting. Species comparisons can focus on comparisons of mean response, but comparisons of the variation and covariation are also of interest. In this example, the equivalence of covariance and correlation matrices among the species are examined.
The iris
data set is available in the Sashelp
library. The following step displays the first 10 observations of the iris
data in multivariate format—that is, each observation contains multiple response variables. The DATA step that follows creates
a data set in univariate form, where each observation corresponds to a single response variable. This is the form needed by
the GLIMMIX procedure.
proc print data=Sashelp.iris(obs=10); run;
Output 41.9.1: Fisher (1936) Iris Data
Obs | Species | SepalLength | SepalWidth | PetalLength | PetalWidth |
---|---|---|---|---|---|
1 | Setosa | 50 | 33 | 14 | 2 |
2 | Setosa | 46 | 34 | 14 | 3 |
3 | Setosa | 46 | 36 | 10 | 2 |
4 | Setosa | 51 | 33 | 17 | 5 |
5 | Setosa | 55 | 35 | 13 | 2 |
6 | Setosa | 48 | 31 | 16 | 2 |
7 | Setosa | 52 | 34 | 14 | 2 |
8 | Setosa | 49 | 36 | 14 | 1 |
9 | Setosa | 44 | 32 | 13 | 2 |
10 | Setosa | 50 | 35 | 16 | 6 |
data iris_univ; set sashelp.iris; retain id 0; array y (4) SepalLength SepalWidth PetalLength PetalWidth; id+1; do var=1 to 4; response = y{var}; output; end; drop SepalLength SepalWidth PetalLength PetalWidth:; run;
The following GLIMMIX statements fit a model with separate unstructured covariance matrices for each species:
ods select FitStatistics CovParms CovTests; proc glimmix data=iris_univ; class species var id; model response = species*var; random _residual_ / type=un group=species subject=id; covtest homogeneity; run;
The mean function is modeled as a cell-means model that allows for different means for each species and outcome variable.
The covariances are modeled directly (R-side) rather than through random effects. The ID
variable identifies the individual plant, so that responses from different plants are independent. The GROUP=SPECIES option varies the parameters of the unstructured covariance matrix by species. Hence, this model has 30 covariance
parameters: 10 unique parameters for a covariance matrix for each of three species.
The COVTEST statement requests a test of homogeneity—that is, it tests whether varying the covariance parameters by the group effect provides a significantly better fit compared to a model in which different groups share the same parameter.
Output 41.9.2: Fit Statistics for Analysis of Fisher’s Iris Data
Fit Statistics | |
---|---|
-2 Res Log Likelihood | 2812.89 |
AIC (smaller is better) | 2872.89 |
AICC (smaller is better) | 2876.23 |
BIC (smaller is better) | 2963.21 |
CAIC (smaller is better) | 2993.21 |
HQIC (smaller is better) | 2909.58 |
Generalized Chi-Square | 588.00 |
Gener. Chi-Square / DF | 1.00 |
The “Fit Statistics” table shows the –2 restricted (residual) log likelihood in the full model and other fit statistics (Output 41.9.2). The “-2 Res Log Likelihood” sets the benchmark against which a model with homogeneity constraint is compared. Output 41.9.3 displays the 30 covariance parameters in this model.
There appear to be substantial differences among the covariance parameters from different groups. For example, the residual variability of the petal length of the three species is 12.4249, 26.6433, and 40.4343, respectively. The homogeneity hypothesis restricts these variances to be equal and similarly for the other covariance parameters. The results from the COVTEST statement are shown in Output 41.9.4.
Output 41.9.3: Covariance Parameters Varied by Species (TYPE=UN)
Covariance Parameter Estimates | ||||
---|---|---|---|---|
Cov Parm | Subject | Group | Estimate | Standard Error |
UN(1,1) | id | Species Setosa | 12.4249 | 2.5102 |
UN(2,1) | id | Species Setosa | 9.9216 | 2.3775 |
UN(2,2) | id | Species Setosa | 14.3690 | 2.9030 |
UN(3,1) | id | Species Setosa | 1.6355 | 0.9052 |
UN(3,2) | id | Species Setosa | 1.1698 | 0.9552 |
UN(3,3) | id | Species Setosa | 3.0159 | 0.6093 |
UN(4,1) | id | Species Setosa | 1.0331 | 0.5508 |
UN(4,2) | id | Species Setosa | 0.9298 | 0.5859 |
UN(4,3) | id | Species Setosa | 0.6069 | 0.2755 |
UN(4,4) | id | Species Setosa | 1.1106 | 0.2244 |
UN(1,1) | id | Species Versicolor | 26.6433 | 5.3828 |
UN(2,1) | id | Species Versicolor | 8.5184 | 2.6144 |
UN(2,2) | id | Species Versicolor | 9.8469 | 1.9894 |
UN(3,1) | id | Species Versicolor | 18.2898 | 4.3398 |
UN(3,2) | id | Species Versicolor | 8.2653 | 2.4149 |
UN(3,3) | id | Species Versicolor | 22.0816 | 4.4612 |
UN(4,1) | id | Species Versicolor | 5.5780 | 1.6617 |
UN(4,2) | id | Species Versicolor | 4.1204 | 1.0641 |
UN(4,3) | id | Species Versicolor | 7.3102 | 1.6891 |
UN(4,4) | id | Species Versicolor | 3.9106 | 0.7901 |
UN(1,1) | id | Species Virginica | 40.4343 | 8.1690 |
UN(2,1) | id | Species Virginica | 9.3763 | 3.2213 |
UN(2,2) | id | Species Virginica | 10.4004 | 2.1012 |
UN(3,1) | id | Species Virginica | 30.3290 | 6.6262 |
UN(3,2) | id | Species Virginica | 7.1380 | 2.7395 |
UN(3,3) | id | Species Virginica | 30.4588 | 6.1536 |
UN(4,1) | id | Species Virginica | 4.9094 | 2.5916 |
UN(4,2) | id | Species Virginica | 4.7629 | 1.4367 |
UN(4,3) | id | Species Virginica | 4.8824 | 2.2750 |
UN(4,4) | id | Species Virginica | 7.5433 | 1.5240 |
Output 41.9.4: Likelihood Ratio Test of Homogeneity
Tests of Covariance Parameters Based on the Restricted Likelihood |
|||||
---|---|---|---|---|---|
Label | DF | -2 Res Log Like | ChiSq | Pr > ChiSq | Note |
Homogeneity | 20 | 2959.55 | 146.66 | <.0001 | DF |
Denote as the covariance matrix for species with elements . In processing the COVTEST hypothesis , the GLIMMIX procedure fits a model that satisfies the constraints
|
|
|
|
|
|
|
|
|
|
where is the covariance between the ith and jth variable for the kth species. The –2 restricted log likelihood of this restricted model is 2959.55 (Output 41.9.4). The change of 146.66 compared to the full model is highly significant. There is sufficient evidence to reject the notion of equal covariance matrices among the three iris species.
Equality of covariance matrices implies equality of correlation matrices, but the reverse is not true. Fewer constraints are needed to equate correlations because the diagonal entries of the covariance matrices are free to vary. In order to test the equality of the correlation matrices among the three species, you can parameterize the unstructured covariance matrix in terms of the correlations and use a COVTEST statement with general contrasts, as shown in the following statements:
ods select FitStatistics CovParms CovTests; proc glimmix data=iris_univ; class species var id; model response = species*var; random _residual_ / type=unr group=species subject=id; covtest 'Equal Covariance Matrices' homogeneity; covtest 'Equal Correlation Matrices' general 0 0 0 0 1 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0, 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0 0, 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 -1 0 0 0 0, 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0 0, 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 -1 0 0 0, 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0 0, 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 -1 0 0, 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0 0, 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 -1 0, 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 0, 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 -1, 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1 / estimates; run;
The TYPE=UNR structure is a reparameterization of TYPE=UN. The models provide the same fit, as seen by comparison of the “Fit Statistics” tables in Output 41.9.2 and Output 41.9.5. The covariance parameters are ordered differently, however. In each group, the four variances precede the six correlations (Output 41.9.5). The first COVTEST statement tests the homogeneity hypothesis in terms of the UNR parameterization, and the result is identical to the test in Output 41.9.4. The second COVTEST statement restricts the correlations to be equal across groups. If is the correlation between the ith and jth variable for the kth species, the 12 restrictions are
|
|
|
|
|
|
|
|
|
|
|
|
The ESTIMATES option in the COVTEST statement requests that the GLIMMIX procedure display the covariance parameter estimates in the restricted model (Output 41.9.5).
Output 41.9.5: Fit Statistics, Covariance Parameters (TYPE=UNR), and Likelihood Ratio Tests for Equality of Covariance and Correlation Matrices
Fit Statistics | |
---|---|
-2 Res Log Likelihood | 2812.89 |
AIC (smaller is better) | 2872.89 |
AICC (smaller is better) | 2876.23 |
BIC (smaller is better) | 2963.21 |
CAIC (smaller is better) | 2993.21 |
HQIC (smaller is better) | 2909.58 |
Generalized Chi-Square | 588.00 |
Gener. Chi-Square / DF | 1.00 |
Covariance Parameter Estimates | ||||
---|---|---|---|---|
Cov Parm | Subject | Group | Estimate | Standard Error |
Var(1) | id | Species Setosa | 12.4249 | 2.5102 |
Var(2) | id | Species Setosa | 14.3690 | 2.9030 |
Var(3) | id | Species Setosa | 3.0159 | 0.6093 |
Var(4) | id | Species Setosa | 1.1106 | 0.2244 |
Corr(2,1) | id | Species Setosa | 0.7425 | 0.06409 |
Corr(3,1) | id | Species Setosa | 0.2672 | 0.1327 |
Corr(3,2) | id | Species Setosa | 0.1777 | 0.1383 |
Corr(4,1) | id | Species Setosa | 0.2781 | 0.1318 |
Corr(4,2) | id | Species Setosa | 0.2328 | 0.1351 |
Corr(4,3) | id | Species Setosa | 0.3316 | 0.1271 |
Var(1) | id | Species Versicolor | 26.6433 | 5.3828 |
Var(2) | id | Species Versicolor | 9.8469 | 1.9894 |
Var(3) | id | Species Versicolor | 22.0816 | 4.4612 |
Var(4) | id | Species Versicolor | 3.9106 | 0.7901 |
Corr(2,1) | id | Species Versicolor | 0.5259 | 0.1033 |
Corr(3,1) | id | Species Versicolor | 0.7540 | 0.06163 |
Corr(3,2) | id | Species Versicolor | 0.5605 | 0.09797 |
Corr(4,1) | id | Species Versicolor | 0.5465 | 0.1002 |
Corr(4,2) | id | Species Versicolor | 0.6640 | 0.07987 |
Corr(4,3) | id | Species Versicolor | 0.7867 | 0.05445 |
Var(1) | id | Species Virginica | 40.4343 | 8.1690 |
Var(2) | id | Species Virginica | 10.4004 | 2.1012 |
Var(3) | id | Species Virginica | 30.4588 | 6.1536 |
Var(4) | id | Species Virginica | 7.5433 | 1.5240 |
Corr(2,1) | id | Species Virginica | 0.4572 | 0.1130 |
Corr(3,1) | id | Species Virginica | 0.8642 | 0.03616 |
Corr(3,2) | id | Species Virginica | 0.4010 | 0.1199 |
Corr(4,1) | id | Species Virginica | 0.2811 | 0.1316 |
Corr(4,2) | id | Species Virginica | 0.5377 | 0.1015 |
Corr(4,3) | id | Species Virginica | 0.3221 | 0.1280 |
Tests of Covariance Parameters Based on the Restricted Likelihood |
|||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Label | DF | -2 Res Log Like | ChiSq | Pr > ChiSq | Estimates H0 | Note | |||||||||||||||||||||||||||||
Est1 | Est2 | Est3 | Est4 | Est5 | Est6 | Est7 | Est8 | Est9 | Est10 | Est11 | Est12 | Est13 | Est14 | Est15 | Est16 | Est17 | Est18 | Est19 | Est20 | Est21 | Est22 | Est23 | Est24 | Est25 | Est26 | Est27 | Est28 | Est29 | Est30 | ||||||
Equal Covariance Matrices | 20 | 2959.55 | 146.66 | <.0001 | 26.5004 | 11.5395 | 18.5179 | 4.1883 | 0.5302 | 0.7562 | 0.3779 | 0.3645 | 0.4705 | 0.4845 | 26.5004 | 11.5395 | 18.5179 | 4.1883 | 0.5302 | 0.7562 | 0.3779 | 0.3645 | 0.4705 | 0.4845 | 26.5004 | 11.5395 | 18.5179 | 4.1883 | 0.5302 | 0.7562 | 0.3779 | 0.3645 | 0.4705 | 0.4845 | DF |
Equal Correlation Matrices | 12 | 2876.38 | 63.49 | <.0001 | 16.4715 | 14.8656 | 4.8427 | 1.4392 | 0.5612 | 0.6827 | 0.4016 | 0.3844 | 0.4976 | 0.5219 | 24.4020 | 9.1566 | 17.4434 | 3.0021 | 0.5612 | 0.6827 | 0.4016 | 0.3844 | 0.4976 | 0.5219 | 35.0544 | 10.8350 | 27.3593 | 8.1395 | 0.5612 | 0.6827 | 0.4016 | 0.3844 | 0.4976 | 0.5219 | DF |
The result of the homogeneity test is identical to that in Output 41.9.4. The hypothesis of equality of the correlation matrices is also rejected with a chi-square value of 63.49 and a p-value of . Notice, however, that the chi-square statistic is smaller than in the test of homogeneity due to the smaller number of restrictions imposed on the full model. The estimate of the common correlation matrix in the restricted model is