Example 41.6 Multivariate Analysis of Variance

This example employs multivariate analysis of variance (MANOVA) to measure differences in the chemical characteristics of ancient pottery found at four kiln sites in Great Britain. The data are from Tubb, Parker, and Nickless (1980), as reported in Hand et al. (1994).

For each of 26 samples of pottery, the percentages of oxides of five metals are measured. The following statements create the data set and invoke the GLM procedure to perform a one-way MANOVA. Additionally, it is of interest to know whether the pottery from one site in Wales (Llanederyn) differs from the samples from other sites; a CONTRAST statement is used to test this hypothesis.

title "Romano-British Pottery";
data pottery;
   input Site $12. Al Fe Mg Ca Na;
   datalines;
Llanederyn   14.4 7.00 4.30 0.15 0.51
Llanederyn   13.8 7.08 3.43 0.12 0.17
Llanederyn   14.6 7.09 3.88 0.13 0.20
Llanederyn   11.5 6.37 5.64 0.16 0.14
Llanederyn   13.8 7.06 5.34 0.20 0.20
Llanederyn   10.9 6.26 3.47 0.17 0.22
Llanederyn   10.1 4.26 4.26 0.20 0.18
Llanederyn   11.6 5.78 5.91 0.18 0.16
Llanederyn   11.1 5.49 4.52 0.29 0.30
Llanederyn   13.4 6.92 7.23 0.28 0.20
Llanederyn   12.4 6.13 5.69 0.22 0.54
Llanederyn   13.1 6.64 5.51 0.31 0.24
Llanederyn   12.7 6.69 4.45 0.20 0.22
Llanederyn   12.5 6.44 3.94 0.22 0.23
Caldicot     11.8 5.44 3.94 0.30 0.04
Caldicot     11.6 5.39 3.77 0.29 0.06
IslandThorns 18.3 1.28 0.67 0.03 0.03
IslandThorns 15.8 2.39 0.63 0.01 0.04
IslandThorns 18.0 1.50 0.67 0.01 0.06
IslandThorns 18.0 1.88 0.68 0.01 0.04
IslandThorns 20.8 1.51 0.72 0.07 0.10
AshleyRails  17.7 1.12 0.56 0.06 0.06
AshleyRails  18.3 1.14 0.67 0.06 0.05
AshleyRails  16.7 0.92 0.53 0.01 0.05
AshleyRails  14.8 2.74 0.67 0.03 0.05
AshleyRails  19.1 1.64 0.60 0.10 0.03
;
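proc glm data=pottery;
   class Site;
   model Al Fe Mg Ca Na = Site;
   /* Contrast coefficients follow the sorted class levels:
      AshleyRails, Caldicot, IslandThorns, Llanederyn */
   contrast 'Llanederyn vs. the rest' Site 1 1 1 -3;
   /* H=_ALL_ tests Site and the contrast; PRINTE and PRINTH
      display the E and H matrices */
   manova h=_all_ / printe printh;
run;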

After the summary information, displayed in Output 41.6.1, PROC GLM produces the univariate analyses for each of the dependent variables, as shown in Output 41.6.2 through Output 41.6.6. These analyses show that sites are significantly different for all oxides individually. You can suppress these univariate analyses by specifying the NOUNI option in the MODEL statement.
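As a minimal sketch of the NOUNI option mentioned above, the following statements repeat the analysis but request only the multivariate results. It assumes the pottery data set created earlier is still available in the session; nothing else about the analysis changes.

```sas
proc glm data=pottery;
   class Site;
   /* NOUNI suppresses the univariate ANOVA tables for each oxide */
   model Al Fe Mg Ca Na = Site / nouni;
   contrast 'Llanederyn vs. the rest' Site 1 1 1 -3;
   manova h=_all_ / printe printh;
run;
```

This form is convenient when there are many dependent variables and only the MANOVA tests are of interest.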

Output 41.6.1 Summary Information about Groups
Romano-British Pottery

The GLM Procedure

Class Level Information
Class Levels Values
Site 4 AshleyRails Caldicot IslandThorns Llanederyn

Number of Observations Read 26
Number of Observations Used 26

Output 41.6.2 Univariate Analysis of Variance for Aluminum Oxide
Romano-British Pottery

The GLM Procedure
 
Dependent Variable: Al

Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 175.6103187 58.5367729 26.67 <.0001
Error 22 48.2881429 2.1949156    
Corrected Total 25 223.8984615      

R-Square Coeff Var Root MSE Al Mean
0.784330 10.22284 1.481525 14.49231

Source DF Type I SS Mean Square F Value Pr > F
Site 3 175.6103187 58.5367729 26.67 <.0001

Source DF Type III SS Mean Square F Value Pr > F
Site 3 175.6103187 58.5367729 26.67 <.0001

Contrast DF Contrast SS Mean Square F Value Pr > F
Llanederyn vs. the rest 1 58.58336640 58.58336640 26.69 <.0001

Output 41.6.3 Univariate Analysis of Variance for Iron Oxide
Romano-British Pottery

The GLM Procedure
 
Dependent Variable: Fe

Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 134.2216158 44.7405386 89.88 <.0001
Error 22 10.9508457 0.4977657    
Corrected Total 25 145.1724615      

R-Square Coeff Var Root MSE Fe Mean
0.924567 15.79171 0.705525 4.467692

Source DF Type I SS Mean Square F Value Pr > F
Site 3 134.2216158 44.7405386 89.88 <.0001

Source DF Type III SS Mean Square F Value Pr > F
Site 3 134.2216158 44.7405386 89.88 <.0001

Contrast DF Contrast SS Mean Square F Value Pr > F
Llanederyn vs. the rest 1 71.15144132 71.15144132 142.94 <.0001

Output 41.6.4 Univariate Analysis of Variance for Calcium Oxide
Romano-British Pottery

The GLM Procedure
 
Dependent Variable: Ca

Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 0.20470275 0.06823425 29.16 <.0001
Error 22 0.05148571 0.00234026    
Corrected Total 25 0.25618846      

R-Square Coeff Var Root MSE Ca Mean
0.799032 33.01265 0.048376 0.146538

Source DF Type I SS Mean Square F Value Pr > F
Site 3 0.20470275 0.06823425 29.16 <.0001

Source DF Type III SS Mean Square F Value Pr > F
Site 3 0.20470275 0.06823425 29.16 <.0001

Contrast DF Contrast SS Mean Square F Value Pr > F
Llanederyn vs. the rest 1 0.03531688 0.03531688 15.09 0.0008

Output 41.6.5 Univariate Analysis of Variance for Magnesium Oxide
Romano-British Pottery

The GLM Procedure
 
Dependent Variable: Mg

Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 103.3505270 34.4501757 49.12 <.0001
Error 22 15.4296114 0.7013460    
Corrected Total 25 118.7801385      

R-Square Coeff Var Root MSE Mg Mean
0.870099 26.65777 0.837464 3.141538

Source DF Type I SS Mean Square F Value Pr > F
Site 3 103.3505270 34.4501757 49.12 <.0001

Source DF Type III SS Mean Square F Value Pr > F
Site 3 103.3505270 34.4501757 49.12 <.0001

Contrast DF Contrast SS Mean Square F Value Pr > F
Llanederyn vs. the rest 1 56.59349339 56.59349339 80.69 <.0001

Output 41.6.6 Univariate Analysis of Variance for Sodium Oxide
Romano-British Pottery

The GLM Procedure
 
Dependent Variable: Na

Source DF Sum of Squares Mean Square F Value Pr > F
Model 3 0.25824560 0.08608187 9.50 0.0003
Error 22 0.19929286 0.00905877    
Corrected Total 25 0.45753846      

R-Square Coeff Var Root MSE Na Mean
0.564424 60.06350 0.095178 0.158462

Source DF Type I SS Mean Square F Value Pr > F
Site 3 0.25824560 0.08608187 9.50 0.0003

Source DF Type III SS Mean Square F Value Pr > F
Site 3 0.25824560 0.08608187 9.50 0.0003

Contrast DF Contrast SS Mean Square F Value Pr > F
Llanederyn vs. the rest 1 0.23344446 0.23344446 25.77 <.0001

The PRINTE option in the MANOVA statement displays the elements of the error matrix, also called the Error Sums of Squares and Crossproducts matrix. (See Output 41.6.7.) The diagonal elements of this matrix are the error sums of squares from the corresponding univariate analyses.

The PRINTE option also displays the partial correlation matrix associated with the E matrix. In this example, none of the oxides are very strongly correlated; the strongest correlation (r = 0.488) is between magnesium oxide and calcium oxide.
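As a check on the values in Output 41.6.7, a partial correlation can be recovered from the E matrix by dividing an off-diagonal element by the square roots of the two corresponding diagonal elements. The worked equation below is a sketch that uses the printed values for magnesium oxide and calcium oxide; the symbols $e_{ij}$ and $r_{ij}$ are notation introduced here for illustration and do not appear in the output.

```latex
r_{ij} = \frac{e_{ij}}{\sqrt{e_{ii}\, e_{jj}}},
\qquad
r_{\mathrm{Mg},\mathrm{Ca}}
  = \frac{0.4353771}{\sqrt{15.4296114 \times 0.0514857}}
  \approx 0.488
```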

Output 41.6.7 Error SSCP Matrix and Partial Correlations
Romano-British Pottery

The GLM Procedure
Multivariate Analysis of Variance

E = Error SSCP Matrix
  Al Fe Mg Ca Na
Al 48.288142857 7.0800714286 0.6080142857 0.1064714286 0.5889571429
Fe 7.0800714286 10.950845714 0.5270571429 -0.155194286 0.0667585714
Mg 0.6080142857 0.5270571429 15.429611429 0.4353771429 0.0276157143
Ca 0.1064714286 -0.155194286 0.4353771429 0.0514857143 0.0100785714
Na 0.5889571429 0.0667585714 0.0276157143 0.0100785714 0.1992928571

Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
DF = 22          Al         Fe         Mg         Ca         Na
Al         1.000000   0.307889   0.022275   0.067526   0.189853
                        0.1529     0.9196     0.7595     0.3856
Fe         0.307889   1.000000   0.040547  -0.206685   0.045189
             0.1529                0.8543     0.3440     0.8378
Mg         0.022275   0.040547   1.000000   0.488478   0.015748
             0.9196     0.8543                0.0180     0.9431
Ca         0.067526  -0.206685   0.488478   1.000000   0.099497
             0.7595     0.3440     0.0180                0.6515
Na         0.189853   0.045189   0.015748   0.099497   1.000000
             0.3856     0.8378     0.9431     0.6515

The PRINTH option produces the SSCP matrix for the hypotheses being tested (Site and the contrast); see Output 41.6.8 and Output 41.6.9. Since the Type III SS are the highest-level SS produced by PROC GLM by default, and since the HTYPE= option is not specified, the SSCP matrix for Site gives the Type III matrix. The diagonal elements of this matrix are the model sums of squares from the corresponding univariate analyses.

Four multivariate tests are computed, all based on the characteristic roots and vectors of $E^{-1}H$. These roots and vectors are displayed along with the tests. All four tests can be transformed to variates that have F distributions, exactly or approximately, under the null hypothesis. Note that the four tests all give the same results for the contrast, since it has only one degree of freedom. In this case, the multivariate analysis matches the univariate results: there is an overall difference in the chemical composition of samples from different sites, and the samples from Llanederyn are different from the average of the other sites.
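For reference, the four statistics are standard functions of the nonzero characteristic roots of $E^{-1}H$. The sketch below writes those roots as $\lambda_i$ (notation introduced here, not taken from the output) and evaluates each statistic with the three nonzero roots printed in Output 41.6.8. The results agree with the reported values to the precision shown.

```latex
\begin{aligned}
\text{Wilks' lambda:} \quad
  \Lambda &= \prod_i \frac{1}{1+\lambda_i}
           = \frac{1}{(35.1611140)(2.2500994)(1.0275396)} \approx 0.0123 \\
\text{Pillai's trace:} \quad
  V &= \sum_i \frac{\lambda_i}{1+\lambda_i}
     \approx 0.97156 + 0.55558 + 0.02680 \approx 1.5539 \\
\text{Hotelling-Lawley trace:} \quad
  U &= \sum_i \lambda_i
     = 34.1611140 + 1.2500994 + 0.0275396 = 35.4387530 \\
\text{Roy's greatest root:} \quad
  \theta &= \max_i \lambda_i = 34.1611140
\end{aligned}
```

For the contrast there is a single nonzero root, $\lambda_1 = 16.1251646$, so every statistic is a function of that one number, which is why the four tests coincide. With the degrees of freedom reported in Output 41.6.9 (5 and 18):

```latex
\Lambda = \frac{1}{1+\lambda_1} \approx 0.0584,
\qquad
V = \frac{\lambda_1}{1+\lambda_1} \approx 0.9416,
\qquad
F = \lambda_1 \cdot \frac{18}{5} \approx 58.05
```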

Output 41.6.8 Hypothesis SSCP Matrix and Multivariate Tests for Overall Site Effect
Romano-British Pottery

The GLM Procedure
Multivariate Analysis of Variance

H = Type III SSCP Matrix for Site
  Al Fe Mg Ca Na
Al 175.61031868 -149.295533 -130.8097066 -5.889163736 -5.372264835
Fe -149.295533 134.22161582 117.74503516 4.8217865934 5.3259491209
Mg -130.8097066 117.74503516 103.35052703 4.2091613187 4.7105458242
Ca -5.889163736 4.8217865934 4.2091613187 0.2047027473 0.154782967
Na -5.372264835 5.3259491209 4.7105458242 0.154782967 0.2582456044

Characteristic Roots and Vectors of: E Inverse * H, where
H = Type III SSCP Matrix for Site
E = Error SSCP Matrix
Characteristic Root Percent Characteristic Vector V'EV=1
Al Fe Mg Ca Na
34.1611140 96.39 0.09562211 -0.26330469 -0.05305978 -1.87982100 -0.47071123
1.2500994 3.53 0.02651891 -0.01239715 0.17564390 -4.25929785 1.23727668
0.0275396 0.08 0.09082220 0.13159869 0.03508901 -0.15701602 -1.39364544
0.0000000 0.00 0.03673984 -0.15129712 0.20455529 0.54624873 -0.17402107
0.0000000 0.00 0.06862324 0.03056912 -0.10662399 2.51151978 1.23668841

MANOVA Test Criteria and F Approximations for the Hypothesis of No Overall Site Effect
H = Type III SSCP Matrix for Site
E = Error SSCP Matrix

S=3 M=0.5 N=8
Statistic Value F Value Num DF Den DF Pr > F
Wilks' Lambda 0.01230091 13.09 15 50.091 <.0001
Pillai's Trace 1.55393619 4.30 15 60 <.0001
Hotelling-Lawley Trace 35.43875302 40.59 15 29.13 <.0001
Roy's Greatest Root 34.16111399 136.64 5 20 <.0001
NOTE: F Statistic for Roy's Greatest Root is an upper bound.

Output 41.6.9 Hypothesis SSCP Matrix and Multivariate Tests for Differences between Llanederyn and the Other Sites
H = Contrast SSCP Matrix for Llanederyn vs. the rest
  Al Fe Mg Ca Na
Al 58.583366402 -64.56230291 -57.57983466 -1.438395503 -3.698102513
Fe -64.56230291 71.151441323 63.456352116 1.5851961376 4.0755256878
Mg -57.57983466 63.456352116 56.593493386 1.4137558201 3.6347541005
Ca -1.438395503 1.5851961376 1.4137558201 0.0353168783 0.0907993915
Na -3.698102513 4.0755256878 3.6347541005 0.0907993915 0.2334444577

Characteristic Roots and Vectors of: E Inverse * H, where
H = Contrast SSCP Matrix for Llanederyn vs. the rest
E = Error SSCP Matrix
Characteristic Root Percent Characteristic Vector V'EV=1
Al Fe Mg Ca Na
16.1251646 100.00 -0.08883488 0.25458141 0.08723574 0.98158668 0.71925759
0.0000000 0.00 -0.00503538 0.03825743 -0.17632854 5.16256699 -0.01022754
0.0000000 0.00 0.00162771 -0.08885364 -0.01774069 -0.83096817 2.17644566
0.0000000 0.00 0.04450136 -0.15722494 0.22156791 0.00000000 0.00000000
0.0000000 0.00 0.11939206 0.10833549 0.00000000 0.00000000 0.00000000

MANOVA Test Criteria and Exact F Statistics for the Hypothesis of No Overall Llanederyn vs. the rest Effect
H = Contrast SSCP Matrix for Llanederyn vs. the rest
E = Error SSCP Matrix

S=1 M=1.5 N=8
Statistic Value F Value Num DF Den DF Pr > F
Wilks' Lambda 0.05839360 58.05 5 18 <.0001
Pillai's Trace 0.94160640 58.05 5 18 <.0001
Hotelling-Lawley Trace 16.12516462 58.05 5 18 <.0001
Roy's Greatest Root 16.12516462 58.05 5 18 <.0001