Example 32.3 Normal-Theory Discriminant Analysis of Iris Data

In this example, PROC DISCRIM uses normal-theory methods to classify the iris data used in Example 32.1. The POOL=TEST option tests the homogeneity of the within-group covariance matrices (Output 32.3.3). Because the resulting test statistic is significant at the 0.10 level, the within-group covariance matrices, rather than the pooled covariance matrix, are used to derive the quadratic discriminant criterion. The WCOV and PCOV options display the within-group covariance matrices and the pooled covariance matrix (Output 32.3.2). The DISTANCE option displays squared distances between classes (Output 32.3.4). The ANOVA and MANOVA options test the hypothesis that the class means are equal by using univariate and multivariate statistics; all statistics are significant at the 0.0001 level (Output 32.3.5). The LISTERR option lists the observations that are misclassified under resubstitution (Output 32.3.6). The CROSSLISTERR option lists the observations that are misclassified under cross validation and displays cross validation error-rate estimates (Output 32.3.7). The resubstitution error count estimate, 0.02, is smaller than the cross validation error count estimate, 0.0267, as would be expected because the resubstitution estimate is optimistically biased. The OUTSTAT= option generates a TYPE=MIXED output data set (MIXED because POOL=TEST is specified) that contains various statistics such as means, covariances, and coefficients of the discriminant function (Output 32.3.8).

The following statements produce Output 32.3.1 through Output 32.3.8:

title 'Discriminant Analysis of Fisher (1936) Iris Data';
title2 'Using Quadratic Discriminant Function';

proc discrim data=sashelp.iris outstat=irisstat
             wcov pcov method=normal pool=test
             distance anova manova listerr crosslisterr;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

proc print data=irisstat;
   title2 'Output Discriminant Statistics';
run;

Output 32.3.1 Quadratic Discriminant Analysis of Iris Data
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure

Total Sample Size 150 DF Total 149
Variables 4 DF Within Classes 147
Classes 3 DF Between Classes 2

Number of Observations Read 150
Number of Observations Used 150

Class Level Information
Species      Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa       Setosa                 50   50.0000     0.333333            0.333333
Versicolor   Versicolor             50   50.0000     0.333333            0.333333
Virginica    Virginica              50   50.0000     0.333333            0.333333

Output 32.3.2 Covariance Matrices
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure
Within-Class Covariance Matrices

Species = Setosa, DF = 49
Variable      Label                SepalLength   SepalWidth  PetalLength   PetalWidth
SepalLength   Sepal Length (mm)    12.42489796   9.92163265   1.63551020   1.03306122
SepalWidth    Sepal Width (mm)      9.92163265  14.36897959   1.16979592   0.92979592
PetalLength   Petal Length (mm)     1.63551020   1.16979592   3.01591837   0.60693878
PetalWidth    Petal Width (mm)      1.03306122   0.92979592   0.60693878   1.11061224

Species = Versicolor, DF = 49
Variable      Label                SepalLength   SepalWidth  PetalLength   PetalWidth
SepalLength   Sepal Length (mm)    26.64326531   8.51836735  18.28979592   5.57795918
SepalWidth    Sepal Width (mm)      8.51836735   9.84693878   8.26530612   4.12040816
PetalLength   Petal Length (mm)    18.28979592   8.26530612  22.08163265   7.31020408
PetalWidth    Petal Width (mm)      5.57795918   4.12040816   7.31020408   3.91061224

Species = Virginica, DF = 49
Variable      Label                SepalLength   SepalWidth  PetalLength   PetalWidth
SepalLength   Sepal Length (mm)    40.43428571   9.37632653  30.32897959   4.90938776
SepalWidth    Sepal Width (mm)      9.37632653  10.40040816   7.13795918   4.76285714
PetalLength   Petal Length (mm)    30.32897959   7.13795918  30.45877551   4.88244898
PetalWidth    Petal Width (mm)      4.90938776   4.76285714   4.88244898   7.54326531

Pooled Within-Class Covariance Matrix, DF = 147
Variable      Label                SepalLength   SepalWidth  PetalLength   PetalWidth
SepalLength   Sepal Length (mm)    26.50081633   9.27210884  16.75142857   3.84013605
SepalWidth    Sepal Width (mm)      9.27210884  11.53877551   5.52435374   3.27102041
PetalLength   Petal Length (mm)    16.75142857   5.52435374  18.51877551   4.26653061
PetalWidth    Petal Width (mm)      3.84013605   3.27102041   4.26653061   4.18816327

Within Covariance Matrix Information
                                      Natural Log of the Determinant
Species      Covariance Matrix Rank         of the Covariance Matrix
Setosa                            4                          5.35332
Versicolor                        4                          7.54636
Virginica                         4                          9.49362
Pooled                            4                          8.46214
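
The relationship between the within-class matrices, the pooled matrix, and the log determinants in Output 32.3.2 can be cross-checked outside SAS. The following Python sketch is not part of the SAS example; it assumes that the pooled matrix is the degrees-of-freedom-weighted average of the within-class matrices, and the function names `pooled_covariance` and `log_determinant` are illustrative only. The matrix entries are copied from the output above.

```python
import math

# Within-class covariance matrices copied from Output 32.3.2 (DF = 49 each).
within = {
    "Setosa": [
        [12.42489796, 9.92163265, 1.63551020, 1.03306122],
        [9.92163265, 14.36897959, 1.16979592, 0.92979592],
        [1.63551020, 1.16979592, 3.01591837, 0.60693878],
        [1.03306122, 0.92979592, 0.60693878, 1.11061224],
    ],
    "Versicolor": [
        [26.64326531, 8.51836735, 18.28979592, 5.57795918],
        [8.51836735, 9.84693878, 8.26530612, 4.12040816],
        [18.28979592, 8.26530612, 22.08163265, 7.31020408],
        [5.57795918, 4.12040816, 7.31020408, 3.91061224],
    ],
    "Virginica": [
        [40.43428571, 9.37632653, 30.32897959, 4.90938776],
        [9.37632653, 10.40040816, 7.13795918, 4.76285714],
        [30.32897959, 7.13795918, 30.45877551, 4.88244898],
        [4.90938776, 4.76285714, 4.88244898, 7.54326531],
    ],
}
degrees_of_freedom = {"Setosa": 49, "Versicolor": 49, "Virginica": 49}


def pooled_covariance(matrices, dfs):
    """Average the class matrices, weighting each by its degrees of freedom."""
    total_df = sum(dfs.values())
    size = len(next(iter(matrices.values())))
    return [
        [
            sum(dfs[name] * matrices[name][i][j] for name in matrices) / total_df
            for j in range(size)
        ]
        for i in range(size)
    ]


def log_determinant(matrix):
    """Natural log of the determinant by Gaussian elimination.

    Assumes a symmetric positive definite matrix, so every pivot is positive.
    """
    work = [row[:] for row in matrix]
    size = len(work)
    total = 0.0
    for k in range(size):
        pivot = work[k][k]
        total += math.log(pivot)
        for i in range(k + 1, size):
            factor = work[i][k] / pivot
            for j in range(k, size):
                work[i][j] -= factor * work[k][j]
    return total


pooled = pooled_covariance(within, degrees_of_freedom)
log_dets = {name: log_determinant(m) for name, m in within.items()}
log_dets["Pooled"] = log_determinant(pooled)

for name, value in log_dets.items():
    print(f"{name:<12}{value:10.5f}")
```

Up to rounding in the printed entries, the results should agree with the pooled matrix and the "Natural Log of the Determinant" column shown above.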

Output 32.3.3 Homogeneity Test
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure
Test of Homogeneity of Within Covariance Matrices

Chi-Square DF Pr > ChiSq
140.943050 20 <.0001

Since the Chi-Square value is significant at the 0.1 level, the within covariance matrices will be used in the discriminant function.
Reference: Morrison, D.F. (1976) Multivariate Statistical Methods p252.
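
As a rough cross-check, the chi-square value in Output 32.3.3 can be approximated from the log determinants in Output 32.3.2. The Python sketch below is not part of the SAS example. It assumes the usual Bartlett-corrected form of the likelihood ratio statistic (often written as Box's M multiplied by a scaling factor, here called `rho`); the variable names are illustrative, and the inputs are rounded to the five decimals that the output displays.

```python
# Log determinants and degrees of freedom copied from Output 32.3.2.
log_det_within = {"Setosa": 5.35332, "Versicolor": 7.54636, "Virginica": 9.49362}
log_det_pooled = 8.46214
df_within = {"Setosa": 49, "Versicolor": 49, "Virginica": 49}

n_variables = 4
n_classes = len(df_within)
df_pooled = sum(df_within.values())  # 147

# Unscaled statistic: pooled log determinant against the class log determinants.
box_m = df_pooled * log_det_pooled - sum(
    df_within[name] * log_det_within[name] for name in df_within
)

# Bartlett-type scaling factor (assumed form; see the lead-in).
rho = 1 - (sum(1 / df for df in df_within.values()) - 1 / df_pooled) * (
    2 * n_variables**2 + 3 * n_variables - 1
) / (6 * (n_variables + 1) * (n_classes - 1))

chi_square = rho * box_m
test_df = (n_classes - 1) * n_variables * (n_variables + 1) // 2

print(f"Chi-Square = {chi_square:.3f}, DF = {test_df}")
```

Under these assumptions the result agrees with the printed statistic to about two decimal places, and the degrees of freedom match the DF column. The small remaining difference comes from rounding in the displayed log determinants.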


Output 32.3.4 Squared Distances
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure

Squared Distance to Species
From Species Setosa Versicolor Virginica
Setosa 0 103.19382 168.76759
Versicolor 323.06203 0 13.83875
Virginica 706.08494 17.86670 0

Generalized Squared Distance to Species
From Species Setosa Versicolor Virginica
Setosa 5.35332 110.74017 178.26121
Versicolor 328.41535 7.54636 23.33238
Virginica 711.43826 25.41306 9.49362
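
The two tables in Output 32.3.4 differ only by the log determinant of the target class covariance matrix. The Python sketch below is not part of the SAS example; it assumes that, with within-class covariance matrices and equal priors, the generalized squared distance to a class is the squared distance plus the natural log of that class's covariance determinant (no prior term is added because the priors are equal). All numbers are copied from Output 32.3.2 and Output 32.3.4.

```python
species = ["Setosa", "Versicolor", "Virginica"]

# Squared distances, indexed as squared_distance[from_species][to_species].
squared_distance = {
    "Setosa": {"Setosa": 0.0, "Versicolor": 103.19382, "Virginica": 168.76759},
    "Versicolor": {"Setosa": 323.06203, "Versicolor": 0.0, "Virginica": 13.83875},
    "Virginica": {"Setosa": 706.08494, "Versicolor": 17.86670, "Virginica": 0.0},
}

# Natural log of the determinant of each within-class covariance matrix.
log_det = {"Setosa": 5.35332, "Versicolor": 7.54636, "Virginica": 9.49362}

# Add the log determinant of the class that the distance is measured to.
generalized = {
    source: {
        target: squared_distance[source][target] + log_det[target]
        for target in species
    }
    for source in species
}

for source in species:
    row = "  ".join(f"{generalized[source][target]:10.5f}" for target in species)
    print(f"{source:<12}{row}")
```

The distances are not symmetric because each one is computed with the covariance matrix of the class in the column heading.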

Output 32.3.5 Tests of Equal Class Means
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure

Univariate Test Statistics
F Statistics, Num DF=2, Den DF=147
                                      Total     Pooled    Between
                                   Standard   Standard   Standard               R-Square
Variable      Label               Deviation  Deviation  Deviation   R-Square   / (1-RSq)   F Value   Pr > F
SepalLength   Sepal Length (mm)      8.2807     5.1479     7.9506     0.6187      1.6226    119.26   <.0001
SepalWidth    Sepal Width (mm)       4.3587     3.3969     3.3682     0.4008      0.6688     49.16   <.0001
PetalLength   Petal Length (mm)     17.6530     4.3033    20.9070     0.9414     16.0566   1180.16   <.0001
PetalWidth    Petal Width (mm)       7.6224     2.0465     8.9673     0.9289     13.0613    960.01   <.0001

Average R-Square
Unweighted 0.7224358
Weighted by Variance 0.8689444
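
The columns of the univariate table are related by simple arithmetic. The Python sketch below is not part of the SAS example; it assumes that R-square equals one minus the ratio of the within-class sum of squares to the total sum of squares, and that the F value is R-Square / (1-RSq) multiplied by Den DF / Num DF. The standard deviations are copied from Output 32.3.5, so the results agree with the printed values only up to rounding.

```python
n_obs = 150
num_df = 2    # classes - 1
den_df = 147  # observations - classes

# (total standard deviation, pooled standard deviation) from Output 32.3.5.
std_devs = {
    "SepalLength": (8.2807, 5.1479),
    "SepalWidth": (4.3587, 3.3969),
    "PetalLength": (17.6530, 4.3033),
    "PetalWidth": (7.6224, 2.0465),
}

results = {}
for variable, (total_sd, pooled_sd) in std_devs.items():
    within_ss = den_df * pooled_sd**2      # within-class sum of squares
    total_ss = (n_obs - 1) * total_sd**2   # corrected total sum of squares
    r_square = 1 - within_ss / total_ss
    ratio = r_square / (1 - r_square)
    f_value = ratio * den_df / num_df
    results[variable] = (r_square, ratio, f_value)
    print(f"{variable:<12} R-Square={r_square:.4f}  Ratio={ratio:.4f}  F={f_value:.2f}")

# Unweighted average R-square across the four variables.
average_r_square = sum(r for r, _, _ in results.values()) / len(results)
print(f"Average R-Square (unweighted) = {average_r_square:.4f}")
```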

Multivariate Statistics and F Approximations
S=2 M=0.5 N=71
Statistic Value F Value Num DF Den DF Pr > F
Wilks' Lambda 0.02343863 199.15 8 288 <.0001
Pillai's Trace 1.19189883 53.47 8 290 <.0001
Hotelling-Lawley Trace 32.47732024 582.20 8 203.4 <.0001
Roy's Greatest Root 32.19192920 1166.96 4 145 <.0001
NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
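
Three of the F values in the multivariate table can be reproduced from the statistic values and the S, M, and N parameters shown above. The Python sketch below is not part of the SAS example; it assumes the standard textbook conversions (the exact Wilks' lambda transformation that applies when S=2, the usual Pillai's trace approximation, and the upper-bound F for Roy's greatest root). The Hotelling-Lawley approximation is omitted because its fractional denominator degrees of freedom come from a more involved formula.

```python
import math

# Values copied from Output 32.3.5.
s, m, n = 2, 0.5, 71
wilks_lambda = 0.02343863
pillai_trace = 1.19189883
roy_root = 32.19192920

# Wilks' lambda: exact F when S = 2, with the printed Num DF = 8 and Den DF = 288.
root_lambda = math.sqrt(wilks_lambda)
wilks_f = (1 - root_lambda) / root_lambda * 288 / 8

# Pillai's trace: F approximation with S(2M+S+1) and S(2N+S+1) degrees of freedom.
pillai_num_df = s * (2 * m + s + 1)
pillai_den_df = s * (2 * n + s + 1)
pillai_f = (2 * n + s + 1) / (2 * m + s + 1) * pillai_trace / (s - pillai_trace)

# Roy's greatest root: upper-bound F with the printed Num DF = 4 and Den DF = 145.
roy_f = roy_root * 145 / 4

print(f"Wilks' Lambda       F = {wilks_f:.2f}")
print(f"Pillai's Trace      F = {pillai_f:.2f}  (DF {pillai_num_df:g}, {pillai_den_df:g})")
print(f"Roy's Greatest Root F = {roy_f:.2f}")
```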

Output 32.3.6 Misclassified Observations: Resubstitution
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure
Classification Results for Calibration Data: SASHELP.IRIS
Resubstitution Results using Quadratic Discriminant Function

Posterior Probability of Membership in Species
Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
 53   Versicolor     Virginica *               0.0000       0.3359      0.6641
 55   Versicolor     Virginica *               0.0000       0.1543      0.8457
103   Virginica      Versicolor *              0.0000       0.6050      0.3950

* Misclassified observation


Classification Summary for Calibration Data: SASHELP.IRIS
Resubstitution Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Species
From Species     Setosa   Versicolor   Virginica     Total
Setosa               50            0           0        50
                 100.00         0.00        0.00    100.00
Versicolor            0           48           2        50
                   0.00        96.00        4.00    100.00
Virginica             0            1          49        50
                   0.00         2.00       98.00    100.00
Total                50           49          51       150
                  33.33        32.67       34.00    100.00
Priors          0.33333      0.33333     0.33333

Error Count Estimates for Species
  Setosa Versicolor Virginica Total
Rate 0.0000 0.0400 0.0200 0.0200
Priors 0.3333 0.3333 0.3333  

Output 32.3.7 Misclassified Observations: Cross Validation
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure
Classification Results for Calibration Data: SASHELP.IRIS
Cross-validation Results using Quadratic Discriminant Function

Posterior Probability of Membership in Species
Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
 52   Versicolor     Virginica *               0.0000       0.3134      0.6866
 53   Versicolor     Virginica *               0.0000       0.1616      0.8384
 55   Versicolor     Virginica *               0.0000       0.0713      0.9287
103   Virginica      Versicolor *              0.0000       0.6632      0.3368

* Misclassified observation


Classification Summary for Calibration Data: SASHELP.IRIS
Cross-validation Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Species
From Species     Setosa   Versicolor   Virginica     Total
Setosa               50            0           0        50
                 100.00         0.00        0.00    100.00
Versicolor            0           47           3        50
                   0.00        94.00        6.00    100.00
Virginica             0            1          49        50
                   0.00         2.00       98.00    100.00
Total                50           48          52       150
                  33.33        32.00       34.67    100.00
Priors          0.33333      0.33333     0.33333

Error Count Estimates for Species
  Setosa Versicolor Virginica Total
Rate 0.0000 0.0600 0.0200 0.0267
Priors 0.3333 0.3333 0.3333  
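
The error count estimates in Output 32.3.6 and Output 32.3.7 follow directly from the classification summaries. The Python sketch below is not part of the SAS example; it assumes that each class rate is the fraction of that class's observations assigned to another class, and that the total is the prior-weighted sum of the class rates. The function name `error_count_estimates` is illustrative.

```python
species = ["Setosa", "Versicolor", "Virginica"]
priors = {name: 1 / 3 for name in species}

# Counts indexed as counts[from_species][classified_into], from the summaries.
resubstitution = {
    "Setosa": {"Setosa": 50, "Versicolor": 0, "Virginica": 0},
    "Versicolor": {"Setosa": 0, "Versicolor": 48, "Virginica": 2},
    "Virginica": {"Setosa": 0, "Versicolor": 1, "Virginica": 49},
}
cross_validation = {
    "Setosa": {"Setosa": 50, "Versicolor": 0, "Virginica": 0},
    "Versicolor": {"Setosa": 0, "Versicolor": 47, "Virginica": 3},
    "Virginica": {"Setosa": 0, "Versicolor": 1, "Virginica": 49},
}


def error_count_estimates(counts, priors):
    """Return per-class error rates and their prior-weighted total."""
    rates = {}
    for name, row in counts.items():
        class_total = sum(row.values())
        rates[name] = 1 - row[name] / class_total
    total = sum(priors[name] * rates[name] for name in rates)
    return rates, total


resub_rates, resub_total = error_count_estimates(resubstitution, priors)
cv_rates, cv_total = error_count_estimates(cross_validation, priors)

print(f"Resubstitution total error rate:   {resub_total:.4f}")
print(f"Cross validation total error rate: {cv_total:.4f}")
```

The cross validation estimate is the larger of the two because observation 52 is classified correctly only when it is included in the data used to derive the criterion.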

Output 32.3.8 Output Statistics from Iris Data
Discriminant Analysis of Fisher (1936) Iris Data
Output Discriminant Statistics

Obs Species _TYPE_ _NAME_ SepalLength SepalWidth PetalLength PetalWidth
1   N   150.00 150.00 150.00 150.00
2 Setosa N   50.00 50.00 50.00 50.00
3 Versicolor N   50.00 50.00 50.00 50.00
4 Virginica N   50.00 50.00 50.00 50.00
5   MEAN   58.43 30.57 37.58 11.99
6 Setosa MEAN   50.06 34.28 14.62 2.46
7 Versicolor MEAN   59.36 27.70 42.60 13.26
8 Virginica MEAN   65.88 29.74 55.52 20.26
9 Setosa PRIOR   0.33 0.33 0.33 0.33
10 Versicolor PRIOR   0.33 0.33 0.33 0.33
11 Virginica PRIOR   0.33 0.33 0.33 0.33
12 Setosa CSSCP SepalLength 608.82 486.16 80.14 50.62
13 Setosa CSSCP SepalWidth 486.16 704.08 57.32 45.56
14 Setosa CSSCP PetalLength 80.14 57.32 147.78 29.74
15 Setosa CSSCP PetalWidth 50.62 45.56 29.74 54.42
16 Versicolor CSSCP SepalLength 1305.52 417.40 896.20 273.32
17 Versicolor CSSCP SepalWidth 417.40 482.50 405.00 201.90
18 Versicolor CSSCP PetalLength 896.20 405.00 1082.00 358.20
19 Versicolor CSSCP PetalWidth 273.32 201.90 358.20 191.62
20 Virginica CSSCP SepalLength 1981.28 459.44 1486.12 240.56
21 Virginica CSSCP SepalWidth 459.44 509.62 349.76 233.38
22 Virginica CSSCP PetalLength 1486.12 349.76 1492.48 239.24
23 Virginica CSSCP PetalWidth 240.56 233.38 239.24 369.62
24   PSSCP SepalLength 3895.62 1363.00 2462.46 564.50
25   PSSCP SepalWidth 1363.00 1696.20 812.08 480.84
26   PSSCP PetalLength 2462.46 812.08 2722.26 627.18
27   PSSCP PetalWidth 564.50 480.84 627.18 615.66
28   BSSCP SepalLength 6321.21 -1995.27 16524.84 7127.93
29   BSSCP SepalWidth -1995.27 1134.49 -5723.96 -2293.27
30   BSSCP PetalLength 16524.84 -5723.96 43710.28 18677.40
31   BSSCP PetalWidth 7127.93 -2293.27 18677.40 8041.33
32   CSSCP SepalLength 10216.83 -632.27 18987.30 7692.43
33   CSSCP SepalWidth -632.27 2830.69 -4911.88 -1812.43
34   CSSCP PetalLength 18987.30 -4911.88 46432.54 19304.58
35   CSSCP PetalWidth 7692.43 -1812.43 19304.58 8656.99
36   RSQUARED   0.62 0.40 0.94 0.93
37 Setosa COV SepalLength 12.42 9.92 1.64 1.03
38 Setosa COV SepalWidth 9.92 14.37 1.17 0.93
39 Setosa COV PetalLength 1.64 1.17 3.02 0.61
40 Setosa COV PetalWidth 1.03 0.93 0.61 1.11
41 Versicolor COV SepalLength 26.64 8.52 18.29 5.58
42 Versicolor COV SepalWidth 8.52 9.85 8.27 4.12
43 Versicolor COV PetalLength 18.29 8.27 22.08 7.31
44 Versicolor COV PetalWidth 5.58 4.12 7.31 3.91
45 Virginica COV SepalLength 40.43 9.38 30.33 4.91
46 Virginica COV SepalWidth 9.38 10.40 7.14 4.76
47 Virginica COV PetalLength 30.33 7.14 30.46 4.88
48 Virginica COV PetalWidth 4.91 4.76 4.88 7.54

Discriminant Analysis of Fisher (1936) Iris Data
Output Discriminant Statistics

Obs Species _TYPE_ _NAME_ SepalLength SepalWidth PetalLength PetalWidth
49   PCOV SepalLength 26.501 9.2721 16.751 3.840
50   PCOV SepalWidth 9.272 11.5388 5.524 3.271
51   PCOV PetalLength 16.751 5.5244 18.519 4.267
52   PCOV PetalWidth 3.840 3.2710 4.267 4.188
53   BCOV SepalLength 63.212 -19.9527 165.248 71.279
54   BCOV SepalWidth -19.953 11.3449 -57.240 -22.933
55   BCOV PetalLength 165.248 -57.2396 437.103 186.774
56   BCOV PetalWidth 71.279 -22.9327 186.774 80.413
57   COV SepalLength 68.569 -4.2434 127.432 51.627
58   COV SepalWidth -4.243 18.9979 -32.966 -12.164
59   COV PetalLength 127.432 -32.9656 311.628 129.561
60   COV PetalWidth 51.627 -12.1639 129.561 58.101
61 Setosa STD   3.525 3.7906 1.737 1.054
62 Versicolor STD   5.162 3.1380 4.699 1.978
63 Virginica STD   6.359 3.2250 5.519 2.747
64   PSTD   5.148 3.3969 4.303 2.047
65   BSTD   7.951 3.3682 20.907 8.967
66   STD   8.281 4.3587 17.653 7.622
67 Setosa CORR SepalLength 1.000 0.7425 0.267 0.278
68 Setosa CORR SepalWidth 0.743 1.0000 0.178 0.233
69 Setosa CORR PetalLength 0.267 0.1777 1.000 0.332
70 Setosa CORR PetalWidth 0.278 0.2328 0.332 1.000
71 Versicolor CORR SepalLength 1.000 0.5259 0.754 0.546
72 Versicolor CORR SepalWidth 0.526 1.0000 0.561 0.664
73 Versicolor CORR PetalLength 0.754 0.5605 1.000 0.787
74 Versicolor CORR PetalWidth 0.546 0.6640 0.787 1.000
75 Virginica CORR SepalLength 1.000 0.4572 0.864 0.281
76 Virginica CORR SepalWidth 0.457 1.0000 0.401 0.538
77 Virginica CORR PetalLength 0.864 0.4010 1.000 0.322
78 Virginica CORR PetalWidth 0.281 0.5377 0.322 1.000
79   PCORR SepalLength 1.000 0.5302 0.756 0.365
80   PCORR SepalWidth 0.530 1.0000 0.378 0.471
81   PCORR PetalLength 0.756 0.3779 1.000 0.484
82   PCORR PetalWidth 0.365 0.4705 0.484 1.000
83   BCORR SepalLength 1.000 -0.7451 0.994 1.000
84   BCORR SepalWidth -0.745 1.0000 -0.813 -0.759
85   BCORR PetalLength 0.994 -0.8128 1.000 0.996
86   BCORR PetalWidth 1.000 -0.7593 0.996 1.000
87   CORR SepalLength 1.000 -0.1176 0.872 0.818
88   CORR SepalWidth -0.118 1.0000 -0.428 -0.366
89   CORR PetalLength 0.872 -0.4284 1.000 0.963
90   CORR PetalWidth 0.818 -0.3661 0.963 1.000
91 Setosa STDMEAN   -1.011 0.8504 -1.301 -1.251
92 Versicolor STDMEAN   0.112 -0.6592 0.284 0.166
93 Virginica STDMEAN   0.899 -0.1912 1.016 1.085
94 Setosa PSTDMEAN   -1.627 1.0912 -5.335 -4.658
95 Versicolor PSTDMEAN   0.180 -0.8459 1.167 0.619
96 Virginica PSTDMEAN   1.447 -0.2453 4.169 4.039

Discriminant Analysis of Fisher (1936) Iris Data
Output Discriminant Statistics

Obs Species _TYPE_ _NAME_ SepalLength SepalWidth PetalLength PetalWidth
97   LNDETERM   8.462 8.462 8.462 8.462
98 Setosa LNDETERM   5.353 5.353 5.353 5.353
99 Versicolor LNDETERM   7.546 7.546 7.546 7.546
100 Virginica LNDETERM   9.494 9.494 9.494 9.494
101 Setosa QUAD SepalLength -0.095 0.062 0.023 0.024
102 Setosa QUAD SepalWidth 0.062 -0.078 -0.006 0.011
103 Setosa QUAD PetalLength 0.023 -0.006 -0.194 0.090
104 Setosa QUAD PetalWidth 0.024 0.011 0.090 -0.530
105 Setosa QUAD _LINEAR_ 4.455 -0.762 3.356 -3.126
106 Setosa QUAD _CONST_ -121.826 -121.826 -121.826 -121.826
107 Versicolor QUAD SepalLength -0.048 0.018 0.043 -0.032
108 Versicolor QUAD SepalWidth 0.018 -0.099 -0.011 0.097
109 Versicolor QUAD PetalLength 0.043 -0.011 -0.099 0.135
110 Versicolor QUAD PetalWidth -0.032 0.097 0.135 -0.436
111 Versicolor QUAD _LINEAR_ 1.801 1.596 0.327 -1.471
112 Versicolor QUAD _CONST_ -76.549 -76.549 -76.549 -76.549
113 Virginica QUAD SepalLength -0.053 0.017 0.050 -0.009
114 Virginica QUAD SepalWidth 0.017 -0.079 -0.006 0.042
115 Virginica QUAD PetalLength 0.050 -0.006 -0.067 0.014
116 Virginica QUAD PetalWidth -0.009 0.042 0.014 -0.097
117 Virginica QUAD _LINEAR_ 0.737 1.325 0.623 0.966
118 Virginica QUAD _CONST_ -75.821 -75.821 -75.821 -75.821
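
The QUAD rows at the end of Output 32.3.8 hold the coefficients of the quadratic discriminant function for each species. The Python sketch below is not part of the SAS example and rests on two assumptions: that the score for a class is x'Qx + l'x + c, where Q is the 4 x 4 block of QUAD rows, l is the _LINEAR_ row, and c is the _CONST_ value; and that posterior probabilities are proportional to the exponentiated scores. Because the printed coefficients are rounded to three decimals, the scores are only rough, so the sketch is applied to a point that lies far from the class boundaries (the Setosa mean from observation 6).

```python
import math

# Coefficients copied from the QUAD rows of Output 32.3.8 (three decimals only).
quad = {
    "Setosa": {
        "quadratic": [
            [-0.095, 0.062, 0.023, 0.024],
            [0.062, -0.078, -0.006, 0.011],
            [0.023, -0.006, -0.194, 0.090],
            [0.024, 0.011, 0.090, -0.530],
        ],
        "linear": [4.455, -0.762, 3.356, -3.126],
        "const": -121.826,
    },
    "Versicolor": {
        "quadratic": [
            [-0.048, 0.018, 0.043, -0.032],
            [0.018, -0.099, -0.011, 0.097],
            [0.043, -0.011, -0.099, 0.135],
            [-0.032, 0.097, 0.135, -0.436],
        ],
        "linear": [1.801, 1.596, 0.327, -1.471],
        "const": -76.549,
    },
    "Virginica": {
        "quadratic": [
            [-0.053, 0.017, 0.050, -0.009],
            [0.017, -0.079, -0.006, 0.042],
            [0.050, -0.006, -0.067, 0.014],
            [-0.009, 0.042, 0.014, -0.097],
        ],
        "linear": [0.737, 1.325, 0.623, 0.966],
        "const": -75.821,
    },
}


def discriminant_score(x, coef):
    """Assumed form of the score: x'Qx + l'x + c."""
    size = len(x)
    q = coef["quadratic"]
    quadratic_term = sum(x[i] * q[i][j] * x[j] for i in range(size) for j in range(size))
    linear_term = sum(l * value for l, value in zip(coef["linear"], x))
    return quadratic_term + linear_term + coef["const"]


def posterior(x):
    """Normalize exponentiated scores; the maximum is subtracted for stability."""
    scores = {name: discriminant_score(x, coef) for name, coef in quad.items()}
    top = max(scores.values())
    weights = {name: math.exp(score - top) for name, score in scores.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# SepalLength, SepalWidth, PetalLength, PetalWidth (mm): the Setosa class means.
setosa_mean = [50.06, 34.28, 14.62, 2.46]
post = posterior(setosa_mean)
predicted = max(post, key=post.get)
print(predicted, {name: round(p, 4) for name, p in post.items()})
```

For observations near the Versicolor and Virginica boundary, the rounding in the printed coefficients can change the outcome, so full-precision values from the OUTSTAT= data set itself would be needed for reliable scoring.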