
The DISCRIM Procedure

Example 31.3 Normal-Theory Discriminant Analysis of Iris Data

In this example, PROC DISCRIM uses normal-theory methods to classify the iris data used in Example 31.1. The POOL=TEST option tests the homogeneity of the within-group covariance matrices (Output 31.3.3). Because the resulting test statistic (chi-square = 140.94 with 20 DF, p < 0.0001) is significant at the 0.10 level, the individual within-group covariance matrices, rather than the pooled covariance matrix, are used to derive the quadratic discriminant criterion.

The WCOV and PCOV options display the within-group covariance matrices and the pooled covariance matrix (Output 31.3.2). The DISTANCE option displays squared distances between classes (Output 31.3.4). The ANOVA and MANOVA options test the hypothesis that the class means are equal, by using univariate and multivariate statistics, respectively; all statistics are significant at the 0.0001 level (Output 31.3.5).

The LISTERR option lists the observations that are misclassified under resubstitution (Output 31.3.6). The CROSSLISTERR option lists the observations that are misclassified under cross validation and displays cross validation error-rate estimates (Output 31.3.7). The resubstitution error count estimate, 0.02, is smaller than the cross validation error count estimate, 0.0267, as would be expected because the resubstitution estimate is optimistically biased. The OUTSTAT= option generates an output data set containing various statistics such as means, covariances, and coefficients of the discriminant function (Output 31.3.8); the data set is TYPE=MIXED because POOL=TEST is specified.

The following statements produce Output 31.3.1 through Output 31.3.8:

   title 'Discriminant Analysis of Fisher (1936) Iris Data';
   title2 'Using Quadratic Discriminant Function';
   
   proc discrim data=iris outstat=irisstat
                wcov pcov method=normal pool=test
                distance anova manova listerr crosslisterr;
      class Species;
      var SepalLength SepalWidth PetalLength PetalWidth;
   run;
   
   proc print data=irisstat;
      title2 'Output Discriminant Statistics';
   run;
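
The 0.10 level used by POOL=TEST is the default value of the SLPOOL= option. As a minimal sketch, assuming the same iris data set and, purely for illustration, a stricter 0.05 level, the threshold of the homogeneity test could be changed as follows. The title text is a made-up label and not part of the original example.

```sas
title2 'Homogeneity Test at the 0.05 Level';   /* illustrative title */

proc discrim data=iris method=normal
             pool=test slpool=0.05;            /* test at 0.05 instead of 0.10 */
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

Because the p-value of the test in this example is below 0.0001, the within-group covariance matrices would still be used at the 0.05 level.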
   

Output 31.3.1 Quadratic Discriminant Analysis of Iris Data
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure

Total Sample Size 150 DF Total 149
Variables 4 DF Within Classes 147
Classes 3 DF Between Classes 2

Number of Observations Read 150
Number of Observations Used 150

Class Level Information
Species     Variable Name  Frequency   Weight  Proportion  Prior Probability
Setosa      Setosa                50  50.0000    0.333333           0.333333
Versicolor  Versicolor            50  50.0000    0.333333           0.333333
Virginica   Virginica             50  50.0000    0.333333           0.333333

Output 31.3.2 Covariance Matrices
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure
Within-Class Covariance Matrices

Species = Setosa, DF = 49
Variable Label SepalLength SepalWidth PetalLength PetalWidth
SepalLength Sepal Length in mm. 12.42489796 9.92163265 1.63551020 1.03306122
SepalWidth Sepal Width in mm. 9.92163265 14.36897959 1.16979592 0.92979592
PetalLength Petal Length in mm. 1.63551020 1.16979592 3.01591837 0.60693878
PetalWidth Petal Width in mm. 1.03306122 0.92979592 0.60693878 1.11061224


 

Species = Versicolor, DF = 49
Variable Label SepalLength SepalWidth PetalLength PetalWidth
SepalLength Sepal Length in mm. 26.64326531 8.51836735 18.28979592 5.57795918
SepalWidth Sepal Width in mm. 8.51836735 9.84693878 8.26530612 4.12040816
PetalLength Petal Length in mm. 18.28979592 8.26530612 22.08163265 7.31020408
PetalWidth Petal Width in mm. 5.57795918 4.12040816 7.31020408 3.91061224


 

Species = Virginica, DF = 49
Variable Label SepalLength SepalWidth PetalLength PetalWidth
SepalLength Sepal Length in mm. 40.43428571 9.37632653 30.32897959 4.90938776
SepalWidth Sepal Width in mm. 9.37632653 10.40040816 7.13795918 4.76285714
PetalLength Petal Length in mm. 30.32897959 7.13795918 30.45877551 4.88244898
PetalWidth Petal Width in mm. 4.90938776 4.76285714 4.88244898 7.54326531


 


Pooled Within-Class Covariance Matrix, DF = 147
Variable Label SepalLength SepalWidth PetalLength PetalWidth
SepalLength Sepal Length in mm. 26.50081633 9.27210884 16.75142857 3.84013605
SepalWidth Sepal Width in mm. 9.27210884 11.53877551 5.52435374 3.27102041
PetalLength Petal Length in mm. 16.75142857 5.52435374 18.51877551 4.26653061
PetalWidth Petal Width in mm. 3.84013605 3.27102041 4.26653061 4.18816327
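
Because each species contributes 49 degrees of freedom, the pooled matrix in this example works out to the DF-weighted average of the three within-class matrices. The following is a minimal sketch that checks this for the SepalLength variance, with values copied from the matrices above; the array and variable names are illustrative only.

```sas
data _null_;
   /* SepalLength variances from the three within-class matrices */
   array v[3]  _temporary_ (12.42489796 26.64326531 40.43428571);
   /* within-class degrees of freedom (n - 1 for each species)   */
   array df[3] _temporary_ (49 49 49);
   num = 0;
   den = 0;
   do k = 1 to 3;
      num = num + df[k]*v[k];
      den = den + df[k];
   end;
   pooled = num/den;          /* den is 147, the pooled DF */
   put pooled= 12.8;          /* written to the SAS log    */
run;
```

The value written to the log should agree with the 26.50081633 shown for SepalLength in the pooled matrix.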

Within Covariance Matrix Information
            Covariance   Natural Log of the Determinant
Species     Matrix Rank  of the Covariance Matrix
Setosa                4                         5.35332
Versicolor            4                         7.54636
Virginica             4                         9.49362
Pooled                4                         8.46214

Output 31.3.3 Homogeneity Test
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure
Test of Homogeneity of Within Covariance Matrices

Chi-Square DF Pr > ChiSq
140.943050 20 <.0001

Since the Chi-Square value is significant at the 0.1 level, the within covariance matrices will be used in the discriminant function.
Reference: Morrison, D.F. (1976) Multivariate Statistical Methods p252.


Output 31.3.4 Squared Distances
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure

Squared Distance to Species
From Species Setosa Versicolor Virginica
Setosa 0 103.19382 168.76759
Versicolor 323.06203 0 13.83875
Virginica 706.08494 17.86670 0

Generalized Squared Distance to Species
From Species Setosa Versicolor Virginica
Setosa 5.35332 110.74017 178.26121
Versicolor 328.41535 7.54636 23.33238
Virginica 711.43826 25.41306 9.49362
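
Judging from the numbers in Output 31.3.2 and Output 31.3.4, each generalized squared distance here equals the squared distance plus the natural log of the determinant of the target species' covariance matrix; with equal priors, no prior-probability term seems to be added. The following sketch, with values copied from the tables and with illustrative data set and variable names, checks this for a few pairs.

```sas
data gendist;
   input FromSpecies :$10. ToSpecies :$10. SqDist LnDet;
   /* LnDet belongs to the covariance matrix of ToSpecies */
   GenSqDist = SqDist + LnDet;
   datalines;
Setosa     Versicolor 103.19382 7.54636
Versicolor Setosa     323.06203 5.35332
Virginica  Versicolor  17.86670 7.54636
Setosa     Setosa       0       5.35332
;

proc print data=gendist noobs;
run;
```

The computed values should match the second table up to rounding in the last printed digit.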

Output 31.3.5 Tests of Equal Class Means
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure

Univariate Test Statistics
F Statistics, Num DF=2, Den DF=147
                                       Total    Pooled   Between
                                    Standard  Standard  Standard             R-Square
Variable     Label                 Deviation Deviation Deviation  R-Square  / (1-RSq)  F Value  Pr > F
SepalLength  Sepal Length in mm.      8.2807    5.1479    7.9506    0.6187     1.6226   119.26  <.0001
SepalWidth   Sepal Width in mm.       4.3587    3.3969    3.3682    0.4008     0.6688    49.16  <.0001
PetalLength  Petal Length in mm.     17.6530    4.3033   20.9070    0.9414    16.0566  1180.16  <.0001
PetalWidth   Petal Width in mm.       7.6224    2.0465    8.9673    0.9289    13.0613   960.01  <.0001
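
The univariate F values can be recovered from the R-Square / (1-RSq) column by multiplying by the ratio of denominator to numerator degrees of freedom (147/2). The following is a minimal sketch with the ratios copied from the table; the data set and variable names are illustrative.

```sas
data ftest;
   input Variable :$11. Ratio;     /* Ratio = R-Square / (1-RSq)  */
   NumDF = 2;                      /* classes - 1                 */
   DenDF = 147;                    /* observations - classes      */
   F = Ratio * DenDF / NumDF;
   PValue = 1 - probf(F, NumDF, DenDF);
   datalines;
SepalLength 1.6226
SepalWidth  0.6688
PetalLength 16.0566
PetalWidth  13.0613
;

proc print data=ftest noobs;
   format F 8.2 PValue pvalue6.4;
run;
```

Because the ratios are rounded to four decimals, the recomputed F values agree with the table only to about two decimals.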

Average R-Square
Unweighted 0.7224358
Weighted by Variance 0.8689444

Multivariate Statistics and F Approximations
S=2 M=0.5 N=71
Statistic Value F Value Num DF Den DF Pr > F
Wilks' Lambda 0.02343863 199.15 8 288 <.0001
Pillai's Trace 1.19189883 53.47 8 290 <.0001
Hotelling-Lawley Trace 32.47732024 582.20 8 203.4 <.0001
Roy's Greatest Root 32.19192920 1166.96 4 145 <.0001
NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.

Output 31.3.6 Misclassified Observations: Resubstitution
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure
Classification Results for Calibration Data: WORK.IRIS
Resubstitution Results using Quadratic Discriminant Function

Posterior Probability of Membership in Species
Obs  From Species  Classified into Species  Setosa  Versicolor  Virginica
  5  Virginica     Versicolor *             0.0000      0.6050     0.3950
  9  Versicolor    Virginica *              0.0000      0.3359     0.6641
 12  Versicolor    Virginica *              0.0000      0.1543     0.8457

* Misclassified observation


Classification Summary for Calibration Data: WORK.IRIS
Resubstitution Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Species
From Species      Setosa  Versicolor   Virginica       Total
Setosa                50           0           0          50
                  100.00        0.00        0.00      100.00
Versicolor             0          48           2          50
                    0.00       96.00        4.00      100.00
Virginica              0           1          49          50
                    0.00        2.00       98.00      100.00
Total                 50          49          51         150
                   33.33       32.67       34.00      100.00
Priors           0.33333     0.33333     0.33333

Error Count Estimates for Species
  Setosa Versicolor Virginica Total
Rate 0.0000 0.0400 0.0200 0.0200
Priors 0.3333 0.3333 0.3333  

Output 31.3.7 Misclassified Observations: Cross Validation
Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure
Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Quadratic Discriminant Function

Posterior Probability of Membership in Species
Obs  From Species  Classified into Species  Setosa  Versicolor  Virginica
  5  Virginica     Versicolor *             0.0000      0.6632     0.3368
  8  Versicolor    Virginica *              0.0000      0.3134     0.6866
  9  Versicolor    Virginica *              0.0000      0.1616     0.8384
 12  Versicolor    Virginica *              0.0000      0.0713     0.9287

* Misclassified observation


Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Species
From Species      Setosa  Versicolor   Virginica       Total
Setosa                50           0           0          50
                  100.00        0.00        0.00      100.00
Versicolor             0          47           3          50
                    0.00       94.00        6.00      100.00
Virginica              0           1          49          50
                    0.00        2.00       98.00      100.00
Total                 50          48          52         150
                   33.33       32.00       34.67      100.00
Priors           0.33333     0.33333     0.33333

Error Count Estimates for Species
  Setosa Versicolor Virginica Total
Rate 0.0000 0.0600 0.0200 0.0267
Priors 0.3333 0.3333 0.3333  
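
The Total entry in each Error Count Estimates table is consistent with a prior-weighted sum of the class-specific error rates. As a minimal sketch, assuming that definition and using the rates from Output 31.3.6 and Output 31.3.7 (the array and variable names are illustrative), both totals can be reproduced in a DATA step:

```sas
data _null_;
   array resub[3] _temporary_ (0.00 0.04 0.02);   /* Output 31.3.6 */
   array cv[3]    _temporary_ (0.00 0.06 0.02);   /* Output 31.3.7 */
   prior = 1/3;                                   /* equal priors  */
   resub_total = 0;
   cv_total    = 0;
   do k = 1 to 3;
      resub_total = resub_total + prior*resub[k];
      cv_total    = cv_total    + prior*cv[k];
   end;
   put resub_total= 6.4 cv_total= 6.4;            /* written to the log */
run;
```

The two totals work out to 0.0200 and 0.0267. With equal priors and equal class sizes they also equal the raw misclassification proportions, 3/150 and 4/150.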

Output 31.3.8 Output Statistics from Iris Data
Discriminant Analysis of Fisher (1936) Iris Data
Output Discriminant Statistics

Obs Species _TYPE_ _NAME_ SepalLength SepalWidth PetalLength PetalWidth
1 . N   150.00 150.00 150.00 150.00
2 Setosa N   50.00 50.00 50.00 50.00
3 Versicolor N   50.00 50.00 50.00 50.00
4 Virginica N   50.00 50.00 50.00 50.00
5 . MEAN   58.43 30.57 37.58 11.99
6 Setosa MEAN   50.06 34.28 14.62 2.46
7 Versicolor MEAN   59.36 27.70 42.60 13.26
8 Virginica MEAN   65.88 29.74 55.52 20.26
9 Setosa PRIOR   0.33 0.33 0.33 0.33
10 Versicolor PRIOR   0.33 0.33 0.33 0.33
11 Virginica PRIOR   0.33 0.33 0.33 0.33
12 Setosa CSSCP SepalLength 608.82 486.16 80.14 50.62
13 Setosa CSSCP SepalWidth 486.16 704.08 57.32 45.56
14 Setosa CSSCP PetalLength 80.14 57.32 147.78 29.74
15 Setosa CSSCP PetalWidth 50.62 45.56 29.74 54.42
16 Versicolor CSSCP SepalLength 1305.52 417.40 896.20 273.32
17 Versicolor CSSCP SepalWidth 417.40 482.50 405.00 201.90
18 Versicolor CSSCP PetalLength 896.20 405.00 1082.00 358.20
19 Versicolor CSSCP PetalWidth 273.32 201.90 358.20 191.62
20 Virginica CSSCP SepalLength 1981.28 459.44 1486.12 240.56
21 Virginica CSSCP SepalWidth 459.44 509.62 349.76 233.38
22 Virginica CSSCP PetalLength 1486.12 349.76 1492.48 239.24
23 Virginica CSSCP PetalWidth 240.56 233.38 239.24 369.62
24 . PSSCP SepalLength 3895.62 1363.00 2462.46 564.50
25 . PSSCP SepalWidth 1363.00 1696.20 812.08 480.84
26 . PSSCP PetalLength 2462.46 812.08 2722.26 627.18
27 . PSSCP PetalWidth 564.50 480.84 627.18 615.66
28 . BSSCP SepalLength 6321.21 -1995.27 16524.84 7127.93
29 . BSSCP SepalWidth -1995.27 1134.49 -5723.96 -2293.27
30 . BSSCP PetalLength 16524.84 -5723.96 43710.28 18677.40
31 . BSSCP PetalWidth 7127.93 -2293.27 18677.40 8041.33
32 . CSSCP SepalLength 10216.83 -632.27 18987.30 7692.43
33 . CSSCP SepalWidth -632.27 2830.69 -4911.88 -1812.43
34 . CSSCP PetalLength 18987.30 -4911.88 46432.54 19304.58
35 . CSSCP PetalWidth 7692.43 -1812.43 19304.58 8656.99
36 . RSQUARED   0.62 0.40 0.94 0.93
37 Setosa COV SepalLength 12.42 9.92 1.64 1.03
38 Setosa COV SepalWidth 9.92 14.37 1.17 0.93
39 Setosa COV PetalLength 1.64 1.17 3.02 0.61
40 Setosa COV PetalWidth 1.03 0.93 0.61 1.11
41 Versicolor COV SepalLength 26.64 8.52 18.29 5.58
42 Versicolor COV SepalWidth 8.52 9.85 8.27 4.12
43 Versicolor COV PetalLength 18.29 8.27 22.08 7.31
44 Versicolor COV PetalWidth 5.58 4.12 7.31 3.91
45 Virginica COV SepalLength 40.43 9.38 30.33 4.91
46 Virginica COV SepalWidth 9.38 10.40 7.14 4.76
47 Virginica COV PetalLength 30.33 7.14 30.46 4.88
48 Virginica COV PetalWidth 4.91 4.76 4.88 7.54


Obs Species _TYPE_ _NAME_ SepalLength SepalWidth PetalLength PetalWidth
49 . PCOV SepalLength 26.501 9.2721 16.751 3.840
50 . PCOV SepalWidth 9.272 11.5388 5.524 3.271
51 . PCOV PetalLength 16.751 5.5244 18.519 4.267
52 . PCOV PetalWidth 3.840 3.2710 4.267 4.188
53 . BCOV SepalLength 63.212 -19.9527 165.248 71.279
54 . BCOV SepalWidth -19.953 11.3449 -57.240 -22.933
55 . BCOV PetalLength 165.248 -57.2396 437.103 186.774
56 . BCOV PetalWidth 71.279 -22.9327 186.774 80.413
57 . COV SepalLength 68.569 -4.2434 127.432 51.627
58 . COV SepalWidth -4.243 18.9979 -32.966 -12.164
59 . COV PetalLength 127.432 -32.9656 311.628 129.561
60 . COV PetalWidth 51.627 -12.1639 129.561 58.101
61 Setosa STD   3.525 3.7906 1.737 1.054
62 Versicolor STD   5.162 3.1380 4.699 1.978
63 Virginica STD   6.359 3.2250 5.519 2.747
64 . PSTD   5.148 3.3969 4.303 2.047
65 . BSTD   7.951 3.3682 20.907 8.967
66 . STD   8.281 4.3587 17.653 7.622
67 Setosa CORR SepalLength 1.000 0.7425 0.267 0.278
68 Setosa CORR SepalWidth 0.743 1.0000 0.178 0.233
69 Setosa CORR PetalLength 0.267 0.1777 1.000 0.332
70 Setosa CORR PetalWidth 0.278 0.2328 0.332 1.000
71 Versicolor CORR SepalLength 1.000 0.5259 0.754 0.546
72 Versicolor CORR SepalWidth 0.526 1.0000 0.561 0.664
73 Versicolor CORR PetalLength 0.754 0.5605 1.000 0.787
74 Versicolor CORR PetalWidth 0.546 0.6640 0.787 1.000
75 Virginica CORR SepalLength 1.000 0.4572 0.864 0.281
76 Virginica CORR SepalWidth 0.457 1.0000 0.401 0.538
77 Virginica CORR PetalLength 0.864 0.4010 1.000 0.322
78 Virginica CORR PetalWidth 0.281 0.5377 0.322 1.000
79 . PCORR SepalLength 1.000 0.5302 0.756 0.365
80 . PCORR SepalWidth 0.530 1.0000 0.378 0.471
81 . PCORR PetalLength 0.756 0.3779 1.000 0.484
82 . PCORR PetalWidth 0.365 0.4705 0.484 1.000
83 . BCORR SepalLength 1.000 -0.7451 0.994 1.000
84 . BCORR SepalWidth -0.745 1.0000 -0.813 -0.759
85 . BCORR PetalLength 0.994 -0.8128 1.000 0.996
86 . BCORR PetalWidth 1.000 -0.7593 0.996 1.000
87 . CORR SepalLength 1.000 -0.1176 0.872 0.818
88 . CORR SepalWidth -0.118 1.0000 -0.428 -0.366
89 . CORR PetalLength 0.872 -0.4284 1.000 0.963
90 . CORR PetalWidth 0.818 -0.3661 0.963 1.000
91 Setosa STDMEAN   -1.011 0.8504 -1.301 -1.251
92 Versicolor STDMEAN   0.112 -0.6592 0.284 0.166
93 Virginica STDMEAN   0.899 -0.1912 1.016 1.085
94 Setosa PSTDMEAN   -1.627 1.0912 -5.335 -4.658
95 Versicolor PSTDMEAN   0.180 -0.8459 1.167 0.619
96 Virginica PSTDMEAN   1.447 -0.2453 4.169 4.039


Obs Species _TYPE_ _NAME_ SepalLength SepalWidth PetalLength PetalWidth
97 . LNDETERM   8.462 8.462 8.462 8.462
98 Setosa LNDETERM   5.353 5.353 5.353 5.353
99 Versicolor LNDETERM   7.546 7.546 7.546 7.546
100 Virginica LNDETERM   9.494 9.494 9.494 9.494
101 Setosa QUAD SepalLength -0.095 0.062 0.023 0.024
102 Setosa QUAD SepalWidth 0.062 -0.078 -0.006 0.011
103 Setosa QUAD PetalLength 0.023 -0.006 -0.194 0.090
104 Setosa QUAD PetalWidth 0.024 0.011 0.090 -0.530
105 Setosa QUAD _LINEAR_ 4.455 -0.762 3.356 -3.126
106 Setosa QUAD _CONST_ -121.826 -121.826 -121.826 -121.826
107 Versicolor QUAD SepalLength -0.048 0.018 0.043 -0.032
108 Versicolor QUAD SepalWidth 0.018 -0.099 -0.011 0.097
109 Versicolor QUAD PetalLength 0.043 -0.011 -0.099 0.135
110 Versicolor QUAD PetalWidth -0.032 0.097 0.135 -0.436
111 Versicolor QUAD _LINEAR_ 1.801 1.596 0.327 -1.471
112 Versicolor QUAD _CONST_ -76.549 -76.549 -76.549 -76.549
113 Virginica QUAD SepalLength -0.053 0.017 0.050 -0.009
114 Virginica QUAD SepalWidth 0.017 -0.079 -0.006 0.042
115 Virginica QUAD PetalLength 0.050 -0.006 -0.067 0.014
116 Virginica QUAD PetalWidth -0.009 0.042 0.014 -0.097
117 Virginica QUAD _LINEAR_ 0.737 1.325 0.623 0.966
118 Virginica QUAD _CONST_ -75.821 -75.821 -75.821 -75.821
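
The QUAD rows appear to hold, for each species, a symmetric matrix of quadratic coefficients, a row of linear coefficients (_LINEAR_), and a constant (_CONST_). The following sketch assumes that the score of a point x for a species is x'Qx + l'x + c and that the point is assigned to the species with the largest score. That reading is consistent with the printed values but is not stated in this example. The point scored is the Setosa class mean (Obs 6), and all names in the DATA step are illustrative.

```sas
data _null_;
   /* Point to score: the Setosa class mean (MEAN row, Obs 6) */
   array pt[4] _temporary_ (50.06 34.28 14.62 2.46);

   /* Quadratic coefficients, one 4x4 block per species
      (Obs 101-104, 107-110, 113-116), filled row by row */
   array q[3,4,4] _temporary_ (
      -0.095  0.062  0.023  0.024
       0.062 -0.078 -0.006  0.011
       0.023 -0.006 -0.194  0.090
       0.024  0.011  0.090 -0.530
      -0.048  0.018  0.043 -0.032
       0.018 -0.099 -0.011  0.097
       0.043 -0.011 -0.099  0.135
      -0.032  0.097  0.135 -0.436
      -0.053  0.017  0.050 -0.009
       0.017 -0.079 -0.006  0.042
       0.050 -0.006 -0.067  0.014
      -0.009  0.042  0.014 -0.097);

   /* _LINEAR_ rows (Obs 105, 111, 117) */
   array lin[3,4] _temporary_ (
       4.455 -0.762  3.356 -3.126
       1.801  1.596  0.327 -1.471
       0.737  1.325  0.623  0.966);

   /* _CONST_ rows (Obs 106, 112, 118) */
   array c0[3] _temporary_ (-121.826 -76.549 -75.821);

   /* one score per species, in the order of the blocks above */
   array score[3] Setosa Versicolor Virginica;

   do k = 1 to 3;
      score[k] = c0[k];
      do i = 1 to 4;
         score[k] = score[k] + lin[k,i]*pt[i];
         do j = 1 to 4;
            score[k] = score[k] + pt[i]*q[k,i,j]*pt[j];
         end;
      end;
   end;

   put Setosa= 8.2 Versicolor= 8.2 Virginica= 8.2;
run;
```

The coefficients are copied from the printed table, which rounds them to three decimals, so the scores are only approximate. Even so, the Setosa score should be by far the largest of the three, which assigns the Setosa mean to its own species.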
