The DISCRIM Procedure

Example 35.3 Normal-Theory Discriminant Analysis of Iris Data

In this example, PROC DISCRIM uses normal-theory methods to classify the iris data used in Example 35.1. The POOL=TEST option tests the homogeneity of the within-group covariance matrices (Output 35.3.3). Because the resulting test statistic is significant at the 0.10 level, the within-group covariance matrices, rather than the pooled covariance matrix, are used to derive the quadratic discriminant criterion.

The WCOV and PCOV options display the within-group covariance matrices and the pooled covariance matrix (Output 35.3.2). The DISTANCE option displays squared distances between classes (Output 35.3.4). The ANOVA and MANOVA options test the hypothesis that the class means are equal by using univariate and multivariate statistics, respectively; all statistics are significant at the 0.0001 level (Output 35.3.5).

The LISTERR option lists the misclassified observations under resubstitution (Output 35.3.6). The CROSSLISTERR option lists the observations that are misclassified under cross validation and displays cross validation error-rate estimates (Output 35.3.7). The resubstitution error count estimate, 0.02, is smaller than the cross validation error count estimate, 0.0267, as would be expected because the resubstitution estimate is optimistically biased.

The OUTSTAT= option generates an output data set that contains statistics such as means, covariances, and coefficients of the discriminant function (Output 35.3.8). The data set is TYPE=MIXED because POOL=TEST is specified.

The following statements produce Output 35.3.1 through Output 35.3.8:

title 'Discriminant Analysis of Fisher (1936) Iris Data';
title2 'Using Quadratic Discriminant Function';

proc discrim data=sashelp.iris outstat=irisstat
             wcov pcov method=normal pool=test
             distance anova manova listerr crosslisterr;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

proc print data=irisstat;
   title2 'Output Discriminant Statistics';
run;

Output 35.3.1: Quadratic Discriminant Analysis of Iris Data

Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure

Total Sample Size 150 DF Total 149
Variables 4 DF Within Classes 147
Classes 3 DF Between Classes 2

Number of Observations Read 150
Number of Observations Used 150

Class Level Information
Species     Variable Name  Frequency  Weight   Proportion  Prior Probability
Setosa      Setosa         50         50.0000  0.333333    0.333333
Versicolor  Versicolor     50         50.0000  0.333333    0.333333
Virginica   Virginica      50         50.0000  0.333333    0.333333



Output 35.3.2: Covariance Matrices

Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure
Within-Class Covariance Matrices

Species = Setosa, DF = 49
Variable Label SepalLength SepalWidth PetalLength PetalWidth
SepalLength Sepal Length (mm) 12.42489796 9.92163265 1.63551020 1.03306122
SepalWidth Sepal Width (mm) 9.92163265 14.36897959 1.16979592 0.92979592
PetalLength Petal Length (mm) 1.63551020 1.16979592 3.01591837 0.60693878
PetalWidth Petal Width (mm) 1.03306122 0.92979592 0.60693878 1.11061224


 

Species = Versicolor, DF = 49
Variable Label SepalLength SepalWidth PetalLength PetalWidth
SepalLength Sepal Length (mm) 26.64326531 8.51836735 18.28979592 5.57795918
SepalWidth Sepal Width (mm) 8.51836735 9.84693878 8.26530612 4.12040816
PetalLength Petal Length (mm) 18.28979592 8.26530612 22.08163265 7.31020408
PetalWidth Petal Width (mm) 5.57795918 4.12040816 7.31020408 3.91061224


 

Species = Virginica, DF = 49
Variable Label SepalLength SepalWidth PetalLength PetalWidth
SepalLength Sepal Length (mm) 40.43428571 9.37632653 30.32897959 4.90938776
SepalWidth Sepal Width (mm) 9.37632653 10.40040816 7.13795918 4.76285714
PetalLength Petal Length (mm) 30.32897959 7.13795918 30.45877551 4.88244898
PetalWidth Petal Width (mm) 4.90938776 4.76285714 4.88244898 7.54326531


 

Pooled Within-Class Covariance Matrix, DF = 147
Variable Label SepalLength SepalWidth PetalLength PetalWidth
SepalLength Sepal Length (mm) 26.50081633 9.27210884 16.75142857 3.84013605
SepalWidth Sepal Width (mm) 9.27210884 11.53877551 5.52435374 3.27102041
PetalLength Petal Length (mm) 16.75142857 5.52435374 18.51877551 4.26653061
PetalWidth Petal Width (mm) 3.84013605 3.27102041 4.26653061 4.18816327



Within Covariance Matrix Information
Species     Covariance Matrix Rank  Natural Log of the Determinant of the Covariance Matrix
Setosa      4                       5.35332
Versicolor  4                       7.54636
Virginica   4                       9.49362
Pooled      4                       8.46214


Output 35.3.3: Homogeneity Test

Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure
Test of Homogeneity of Within Covariance Matrices

Chi-Square DF Pr > ChiSq
140.943050 20 <.0001

Since the Chi-Square value is significant at the 0.1 level, the within covariance matrices will be used in the discriminant function.
Reference: Morrison, D.F. (1976) Multivariate Statistical Methods p252.
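
As a rough hand check, the chi-square value in Output 35.3.3 can be approximated from the log determinants in Output 35.3.2. The following Python sketch (run outside SAS) assumes the Bartlett-corrected likelihood ratio form of the test; the function name homogeneity_chi_square and its argument names are illustrative, not part of PROC DISCRIM. Because the log determinants are displayed to only five decimals, the result agrees with the reported 140.943050 to about three decimals.

```python
def homogeneity_chi_square(ln_dets, ln_det_pooled, class_dfs, n_vars):
    """Bartlett-corrected likelihood ratio statistic for equal covariance matrices.

    Assumed form: rho * [df_pooled * ln|S_pooled| - sum(df_i * ln|S_i|)].
    """
    k = len(class_dfs)
    pooled_df = sum(class_dfs)
    # Uncorrected statistic: pooled log determinant against the class log determinants
    m = pooled_df * ln_det_pooled - sum(
        df * ld for df, ld in zip(class_dfs, ln_dets)
    )
    # Small-sample correction factor
    rho = 1.0 - (sum(1.0 / df for df in class_dfs) - 1.0 / pooled_df) * (
        (2 * n_vars**2 + 3 * n_vars - 1) / (6.0 * (n_vars + 1) * (k - 1))
    )
    # Degrees of freedom: (K - 1) * P * (P + 1) / 2
    dof = (k - 1) * n_vars * (n_vars + 1) // 2
    return rho * m, dof


# Inputs copied from Output 35.3.2 (each class has 50 - 1 = 49 degrees of freedom)
chi_square, dof = homogeneity_chi_square(
    ln_dets=[5.35332, 7.54636, 9.49362],
    ln_det_pooled=8.46214,
    class_dfs=[49, 49, 49],
    n_vars=4,
)
print(round(chi_square, 2), dof)  # prints 140.94 20
```

The degrees of freedom, (3 - 1) x 4 x 5 / 2 = 20, match the DF column in the output.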




Output 35.3.4: Squared Distances

Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure

Squared Distance to Species
From Species Setosa Versicolor Virginica
Setosa 0 103.19382 168.76759
Versicolor 323.06203 0 13.83875
Virginica 706.08494 17.86670 0

Generalized Squared Distance to Species
From Species Setosa Versicolor Virginica
Setosa 5.35332 110.74017 178.26121
Versicolor 328.41535 7.54636 23.33238
Virginica 711.43826 25.41306 9.49362
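
The two tables in Output 35.3.4 are related in a simple way when the prior probabilities are equal, as they are here: each generalized squared distance is the squared distance plus the log determinant of the covariance matrix of the target class (Output 35.3.2). The following Python sketch (run outside SAS, with illustrative variable names) reproduces the second table from the first; small differences in the last digit come from rounding in the displayed values.

```python
# Values copied from Output 35.3.2 (log determinants) and Output 35.3.4 (squared distances)
species = ["Setosa", "Versicolor", "Virginica"]
ln_det = {"Setosa": 5.35332, "Versicolor": 7.54636, "Virginica": 9.49362}
squared = {
    "Setosa":     {"Setosa": 0.0,       "Versicolor": 103.19382, "Virginica": 168.76759},
    "Versicolor": {"Setosa": 323.06203, "Versicolor": 0.0,       "Virginica": 13.83875},
    "Virginica":  {"Setosa": 706.08494, "Versicolor": 17.86670,  "Virginica": 0.0},
}

# With equal priors, add the log determinant of the target class covariance matrix
generalized = {
    src: {dst: squared[src][dst] + ln_det[dst] for dst in species}
    for src in species
}

for src in species:
    print(src, [round(generalized[src][dst], 5) for dst in species])
```

The diagonal entries equal the log determinants themselves, which is why the distance from a class to itself is not zero in the generalized table.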



Output 35.3.5: Tests of Equal Class Means

Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure

Univariate Test Statistics
F Statistics, Num DF=2, Den DF=147
Variable     Label              Total      Pooled     Between    R-Square  R-Square   F Value  Pr > F
                                Standard   Standard   Standard             / (1-RSq)
                                Deviation  Deviation  Deviation
SepalLength  Sepal Length (mm)  8.2807     5.1479     7.9506     0.6187    1.6226     119.26   <.0001
SepalWidth   Sepal Width (mm)   4.3587     3.3969     3.3682     0.4008    0.6688     49.16    <.0001
PetalLength  Petal Length (mm)  17.6530    4.3033     20.9070    0.9414    16.0566    1180.16  <.0001
PetalWidth   Petal Width (mm)   7.6224     2.0465     8.9673     0.9289    13.0613    960.01   <.0001

Average R-Square
Unweighted 0.7224358
Weighted by Variance 0.8689444
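
The columns of the univariate table are tied together by the usual one-way ANOVA identity F = [R-Square / (1-RSq)] x (Den DF / Num DF). The following Python sketch (run outside SAS, with illustrative variable names) recomputes the F values, the R-Square values, and the unweighted average R-Square from the displayed ratios; agreement is limited by the four-decimal rounding of the inputs.

```python
num_df, den_df = 2, 147

# "R-Square / (1-RSq)" column copied from Output 35.3.5
ratio = {
    "SepalLength": 1.6226,
    "SepalWidth": 0.6688,
    "PetalLength": 16.0566,
    "PetalWidth": 13.0613,
}

# F = ratio * (denominator df / numerator df)
f_value = {name: r * den_df / num_df for name, r in ratio.items()}
# Invert ratio = R2 / (1 - R2) to recover R-Square
r_square = {name: r / (1.0 + r) for name, r in ratio.items()}
unweighted_average = sum(r_square.values()) / len(r_square)

for name in ratio:
    print(name, round(f_value[name], 2), round(r_square[name], 4))
print("Unweighted average R-Square:", round(unweighted_average, 4))
```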

Multivariate Statistics and F Approximations
S=2 M=0.5 N=71
Statistic Value F Value Num DF Den DF Pr > F
Wilks' Lambda 0.02343863 199.15 8 288 <.0001
Pillai's Trace 1.19189883 53.47 8 290 <.0001
Hotelling-Lawley Trace 32.47732024 582.20 8 203.4 <.0001
Roy's Greatest Root 32.19192920 1166.96 4 145 <.0001
NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.
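
The note that the F statistic for Wilks' Lambda is exact reflects a standard result: when the hypothesis has 2 degrees of freedom (three classes), the square root of Lambda transforms exactly to an F statistic. The following Python sketch (run outside SAS) applies that textbook transformation, which is assumed here to be the one behind the displayed value; the variable names are illustrative.

```python
import math

wilks_lambda = 0.02343863  # from Output 35.3.5
n_vars = 4                 # number of variables
error_df = 147             # DF Within Classes

# Exact transformation when the hypothesis (between-class) df is 2
root = math.sqrt(wilks_lambda)
num_df = 2 * n_vars
den_df = 2 * (error_df - n_vars + 1)
f_value = (1.0 - root) / root * den_df / num_df

print(num_df, den_df, round(f_value, 1))
```

The degrees of freedom come out as 8 and 288, matching the Num DF and Den DF columns.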



Output 35.3.6: Misclassified Observations: Resubstitution

Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure
Classification Results for Calibration Data: SASHELP.IRIS
Resubstitution Results using Quadratic Discriminant Function

Posterior Probability of Membership in Species
Obs  From Species  Classified into Species  Setosa  Versicolor  Virginica
53   Versicolor    Virginica *              0.0000  0.3359      0.6641
55   Versicolor    Virginica *              0.0000  0.1543      0.8457
103  Virginica     Versicolor *             0.0000  0.6050      0.3950

* Misclassified observation


Classification Summary for Calibration Data: SASHELP.IRIS
Resubstitution Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Species
From Species  Setosa   Versicolor  Virginica  Total
Setosa        50       0           0          50
              100.00   0.00        0.00       100.00
Versicolor    0        48          2          50
              0.00     96.00       4.00       100.00
Virginica     0        1           49         50
              0.00     2.00        98.00      100.00
Total         50       49          51         150
              33.33    32.67       34.00      100.00
Priors        0.33333  0.33333     0.33333

Error Count Estimates for Species
  Setosa Versicolor Virginica Total
Rate 0.0000 0.0400 0.0200 0.0200
Priors 0.3333 0.3333 0.3333  



Output 35.3.7: Misclassified Observations: Cross Validation

Discriminant Analysis of Fisher (1936) Iris Data
Using Quadratic Discriminant Function

The DISCRIM Procedure
Classification Results for Calibration Data: SASHELP.IRIS
Cross-validation Results using Quadratic Discriminant Function

Posterior Probability of Membership in Species
Obs  From Species  Classified into Species  Setosa  Versicolor  Virginica
52   Versicolor    Virginica *              0.0000  0.3134      0.6866
53   Versicolor    Virginica *              0.0000  0.1616      0.8384
55   Versicolor    Virginica *              0.0000  0.0713      0.9287
103  Virginica     Versicolor *             0.0000  0.6632      0.3368

* Misclassified observation


Classification Summary for Calibration Data: SASHELP.IRIS
Cross-validation Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Species
From Species  Setosa   Versicolor  Virginica  Total
Setosa        50       0           0          50
              100.00   0.00        0.00       100.00
Versicolor    0        47          3          50
              0.00     94.00       6.00       100.00
Virginica     0        1           49         50
              0.00     2.00        98.00      100.00
Total         50       48          52         150
              33.33    32.00       34.67      100.00
Priors        0.33333  0.33333     0.33333

Error Count Estimates for Species
  Setosa Versicolor Virginica Total
Rate 0.0000 0.0600 0.0200 0.0267
Priors 0.3333 0.3333 0.3333  
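
The Total entries in the error count tables are prior-weighted averages of the per-class misclassification rates. The following Python sketch (run outside SAS) recomputes both totals from the counts in Output 35.3.6 and Output 35.3.7; the function name error_count_estimate is illustrative.

```python
def error_count_estimate(misclassified, class_sizes, priors):
    """Prior-weighted average of the per-class misclassification rates."""
    return sum(
        priors[name] * misclassified[name] / class_sizes[name]
        for name in class_sizes
    )


class_sizes = {"Setosa": 50, "Versicolor": 50, "Virginica": 50}
priors = {name: 1.0 / 3.0 for name in class_sizes}

# Misclassified counts read from the two classification summaries
resubstitution = {"Setosa": 0, "Versicolor": 2, "Virginica": 1}
cross_validation = {"Setosa": 0, "Versicolor": 3, "Virginica": 1}

resub_rate = error_count_estimate(resubstitution, class_sizes, priors)
cv_rate = error_count_estimate(cross_validation, class_sizes, priors)
print(round(resub_rate, 4), round(cv_rate, 4))  # prints 0.02 0.0267
```

The only difference between the two estimates is observation 52, which is classified correctly under resubstitution but misclassified when it is left out of the calibration data.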



Output 35.3.8: Output Statistics from Iris Data

Discriminant Analysis of Fisher (1936) Iris Data
Output Discriminant Statistics

Obs Species _TYPE_ _NAME_ SepalLength SepalWidth PetalLength PetalWidth
1   N   150.00 150.00 150.00 150.00
2 Setosa N   50.00 50.00 50.00 50.00
3 Versicolor N   50.00 50.00 50.00 50.00
4 Virginica N   50.00 50.00 50.00 50.00
5   MEAN   58.43 30.57 37.58 11.99
6 Setosa MEAN   50.06 34.28 14.62 2.46
7 Versicolor MEAN   59.36 27.70 42.60 13.26
8 Virginica MEAN   65.88 29.74 55.52 20.26
9 Setosa PRIOR   0.33 0.33 0.33 0.33
10 Versicolor PRIOR   0.33 0.33 0.33 0.33
11 Virginica PRIOR   0.33 0.33 0.33 0.33
12 Setosa CSSCP SepalLength 608.82 486.16 80.14 50.62
13 Setosa CSSCP SepalWidth 486.16 704.08 57.32 45.56
14 Setosa CSSCP PetalLength 80.14 57.32 147.78 29.74
15 Setosa CSSCP PetalWidth 50.62 45.56 29.74 54.42
16 Versicolor CSSCP SepalLength 1305.52 417.40 896.20 273.32
17 Versicolor CSSCP SepalWidth 417.40 482.50 405.00 201.90
18 Versicolor CSSCP PetalLength 896.20 405.00 1082.00 358.20
19 Versicolor CSSCP PetalWidth 273.32 201.90 358.20 191.62
20 Virginica CSSCP SepalLength 1981.28 459.44 1486.12 240.56
21 Virginica CSSCP SepalWidth 459.44 509.62 349.76 233.38
22 Virginica CSSCP PetalLength 1486.12 349.76 1492.48 239.24
23 Virginica CSSCP PetalWidth 240.56 233.38 239.24 369.62
24   PSSCP SepalLength 3895.62 1363.00 2462.46 564.50
25   PSSCP SepalWidth 1363.00 1696.20 812.08 480.84
26   PSSCP PetalLength 2462.46 812.08 2722.26 627.18
27   PSSCP PetalWidth 564.50 480.84 627.18 615.66
28   BSSCP SepalLength 6321.21 -1995.27 16524.84 7127.93
29   BSSCP SepalWidth -1995.27 1134.49 -5723.96 -2293.27
30   BSSCP PetalLength 16524.84 -5723.96 43710.28 18677.40
31   BSSCP PetalWidth 7127.93 -2293.27 18677.40 8041.33
32   CSSCP SepalLength 10216.83 -632.27 18987.30 7692.43
33   CSSCP SepalWidth -632.27 2830.69 -4911.88 -1812.43
34   CSSCP PetalLength 18987.30 -4911.88 46432.54 19304.58
35   CSSCP PetalWidth 7692.43 -1812.43 19304.58 8656.99
36   RSQUARED   0.62 0.40 0.94 0.93
37 Setosa COV SepalLength 12.42 9.92 1.64 1.03
38 Setosa COV SepalWidth 9.92 14.37 1.17 0.93
39 Setosa COV PetalLength 1.64 1.17 3.02 0.61
40 Setosa COV PetalWidth 1.03 0.93 0.61 1.11
41 Versicolor COV SepalLength 26.64 8.52 18.29 5.58
42 Versicolor COV SepalWidth 8.52 9.85 8.27 4.12
43 Versicolor COV PetalLength 18.29 8.27 22.08 7.31
44 Versicolor COV PetalWidth 5.58 4.12 7.31 3.91
45 Virginica COV SepalLength 40.43 9.38 30.33 4.91
46 Virginica COV SepalWidth 9.38 10.40 7.14 4.76
47 Virginica COV PetalLength 30.33 7.14 30.46 4.88
48 Virginica COV PetalWidth 4.91 4.76 4.88 7.54
49   PCOV SepalLength 26.50 9.27 16.75 3.84
50   PCOV SepalWidth 9.27 11.54 5.52 3.27
51   PCOV PetalLength 16.75 5.52 18.52 4.27
52   PCOV PetalWidth 3.84 3.27 4.27 4.19
53   BCOV SepalLength 63.21 -19.95 165.25 71.28
54   BCOV SepalWidth -19.95 11.34 -57.24 -22.93
55   BCOV PetalLength 165.25 -57.24 437.10 186.77
56   BCOV PetalWidth 71.28 -22.93 186.77 80.41
57   COV SepalLength 68.57 -4.24 127.43 51.63
58   COV SepalWidth -4.24 19.00 -32.97 -12.16
59   COV PetalLength 127.43 -32.97 311.63 129.56
60   COV PetalWidth 51.63 -12.16 129.56 58.10
61 Setosa STD   3.52 3.79 1.74 1.05
62 Versicolor STD   5.16 3.14 4.70 1.98
63 Virginica STD   6.36 3.22 5.52 2.75
64   PSTD   5.15 3.40 4.30 2.05
65   BSTD   7.95 3.37 20.91 8.97
66   STD   8.28 4.36 17.65 7.62
67 Setosa CORR SepalLength 1.00 0.74 0.27 0.28
68 Setosa CORR SepalWidth 0.74 1.00 0.18 0.23
69 Setosa CORR PetalLength 0.27 0.18 1.00 0.33
70 Setosa CORR PetalWidth 0.28 0.23 0.33 1.00
71 Versicolor CORR SepalLength 1.00 0.53 0.75 0.55
72 Versicolor CORR SepalWidth 0.53 1.00 0.56 0.66
73 Versicolor CORR PetalLength 0.75 0.56 1.00 0.79
74 Versicolor CORR PetalWidth 0.55 0.66 0.79 1.00
75 Virginica CORR SepalLength 1.00 0.46 0.86 0.28
76 Virginica CORR SepalWidth 0.46 1.00 0.40 0.54
77 Virginica CORR PetalLength 0.86 0.40 1.00 0.32
78 Virginica CORR PetalWidth 0.28 0.54 0.32 1.00
79   PCORR SepalLength 1.00 0.53 0.76 0.36
80   PCORR SepalWidth 0.53 1.00 0.38 0.47
81   PCORR PetalLength 0.76 0.38 1.00 0.48
82   PCORR PetalWidth 0.36 0.47 0.48 1.00
83   BCORR SepalLength 1.00 -0.75 0.99 1.00
84   BCORR SepalWidth -0.75 1.00 -0.81 -0.76
85   BCORR PetalLength 0.99 -0.81 1.00 1.00
86   BCORR PetalWidth 1.00 -0.76 1.00 1.00
87   CORR SepalLength 1.00 -0.12 0.87 0.82
88   CORR SepalWidth -0.12 1.00 -0.43 -0.37
89   CORR PetalLength 0.87 -0.43 1.00 0.96
90   CORR PetalWidth 0.82 -0.37 0.96 1.00
91 Setosa STDMEAN   -1.01 0.85 -1.30 -1.25
92 Versicolor STDMEAN   0.11 -0.66 0.28 0.17
93 Virginica STDMEAN   0.90 -0.19 1.02 1.08
94 Setosa PSTDMEAN   -1.63 1.09 -5.34 -4.66
95 Versicolor PSTDMEAN   0.18 -0.85 1.17 0.62
96 Virginica PSTDMEAN   1.45 -0.25 4.17 4.04
97   LNDETERM   8.46 8.46 8.46 8.46
98 Setosa LNDETERM   5.35 5.35 5.35 5.35
99 Versicolor LNDETERM   7.55 7.55 7.55 7.55
100 Virginica LNDETERM   9.49 9.49 9.49 9.49
101 Setosa QUAD SepalLength -0.09 0.06 0.02 0.02
102 Setosa QUAD SepalWidth 0.06 -0.08 -0.01 0.01
103 Setosa QUAD PetalLength 0.02 -0.01 -0.19 0.09
104 Setosa QUAD PetalWidth 0.02 0.01 0.09 -0.53
105 Setosa QUAD _LINEAR_ 4.46 -0.76 3.36 -3.13
106 Setosa QUAD _CONST_ -121.83 -121.83 -121.83 -121.83
107 Versicolor QUAD SepalLength -0.05 0.02 0.04 -0.03
108 Versicolor QUAD SepalWidth 0.02 -0.10 -0.01 0.10
109 Versicolor QUAD PetalLength 0.04 -0.01 -0.10 0.13
110 Versicolor QUAD PetalWidth -0.03 0.10 0.13 -0.44
111 Versicolor QUAD _LINEAR_ 1.80 1.60 0.33 -1.47
112 Versicolor QUAD _CONST_ -76.55 -76.55 -76.55 -76.55
113 Virginica QUAD SepalLength -0.05 0.02 0.05 -0.01
114 Virginica QUAD SepalWidth 0.02 -0.08 -0.01 0.04
115 Virginica QUAD PetalLength 0.05 -0.01 -0.07 0.01
116 Virginica QUAD PetalWidth -0.01 0.04 0.01 -0.10
117 Virginica QUAD _LINEAR_ 0.74 1.32 0.62 0.97
118 Virginica QUAD _CONST_ -75.82 -75.82 -75.82 -75.82
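
The QUAD rows in Output 35.3.8 can be related to the covariance matrices in Output 35.3.2. The following Python sketch (run outside SAS) assumes that, for each class, the quadratic coefficients are -0.5 times the inverse covariance matrix, the _LINEAR_ row is the inverse covariance matrix times the class mean, and _CONST_ is -0.5 x (mean' x inverse x mean) - 0.5 x (log determinant), with no prior term because the priors are equal. This reading is inferred from the displayed values for Setosa rather than stated in this example, and the helper name invert is illustrative.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Setosa within-class covariance matrix (Output 35.3.2), mean (Output 35.3.8),
# and log determinant (Output 35.3.2)
cov_setosa = [
    [12.42489796, 9.92163265, 1.63551020, 1.03306122],
    [9.92163265, 14.36897959, 1.16979592, 0.92979592],
    [1.63551020, 1.16979592, 3.01591837, 0.60693878],
    [1.03306122, 0.92979592, 0.60693878, 1.11061224],
]
mean_setosa = [50.06, 34.28, 14.62, 2.46]
ln_det_setosa = 5.35332

inv = invert(cov_setosa)
quad = [[-0.5 * v for v in row] for row in inv]          # assumed QUAD rows
linear = [sum(a * m for a, m in zip(row, mean_setosa)) for row in inv]  # assumed _LINEAR_
const = -0.5 * sum(l * m for l, m in zip(linear, mean_setosa)) - 0.5 * ln_det_setosa

print([round(v, 2) for v in linear])
print(round(const, 2))
```

Under these assumptions, the score x'Qx + L'x + const for a class equals -0.5 times the generalized squared distance from x to that class, so the class with the largest score is the one with the smallest generalized squared distance.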