
The DISCRIM Procedure

Example 31.2 Bivariate Density Estimates and Posterior Probabilities

In this example, four more discriminant analyses of iris data are run with two quantitative variables: petal width and petal length. The following statements produce Output 31.2.1 through Output 31.2.5:

proc template;
   define statgraph scatter;
      begingraph;
         entrytitle 'Fisher (1936) Iris Data';
         layout overlayequated / equatetype=fit;
            scatterplot x=petallength y=petalwidth /
                        group=species name='iris';
            layout gridded / autoalign=(topleft);
               discretelegend 'iris' / border=false opaque=false;
            endlayout;
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=iris template=scatter;
run;

The scatter plot in Output 31.2.1 shows the joint sample distribution.

Output 31.2.1 Joint Sample Distribution of Petal Width and Petal Length in Three Species

Another data set is created for plotting, containing a grid of points suitable for contour plots. The following statements create the data set:

data plotdata;
   do PetalLength = -2 to 72 by 0.5;
      do PetalWidth = -5 to 32 by 0.5;
         output;
      end;
   end;
run;
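
As a quick check on the grid, the two DO loops yield 149 values of PetalLength and 75 values of PetalWidth, or 11,175 points in all, which agrees with the number of test observations that PROC DISCRIM reports for WORK.PLOTDATA. The following minimal sketch confirms the arithmetic; the variable names nLength, nWidth, and nPoints are illustrative and are not part of the example:

```sas
/* Count grid points analytically: (stop - start) / step + 1 per axis */
data _null_;
   nLength = (72 - (-2)) / 0.5 + 1;   /* 149 values of PetalLength */
   nWidth  = (32 - (-5)) / 0.5 + 1;   /* 75 values of PetalWidth   */
   nPoints = nLength * nWidth;        /* 11175 grid points         */
   put nLength= nWidth= nPoints=;     /* written to the SAS log    */
run;
```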


Three macros are defined as follows to make contour plots of density estimates, posterior probabilities, and classification results:

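/* Axis options that remove padding so the contour fill reaches the plot edges */
%let close = thresholdmin=0 thresholdmax=0 offsetmin=0 offsetmax=0;
/* Apply the same options to both the X and the Y axis */
%let close = xaxisopts=(&close) yaxisopts=(&close);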

proc template;
   define statgraph contour;
      begingraph;
         layout overlayequated / equatetype=equate &close;
            contourplotparm x=petallength y=petalwidth z=z /
                            contourtype=fill nhint=30;
            scatterplot x=pl y=pw / group=species name='iris'
                        includemissinggroup=false primary=true;
            layout gridded / autoalign=(topleft);
               discretelegend 'iris' / border=false opaque=false;
            endlayout;
         endlayout;
      endgraph;
   end;
run;

%macro contden;
   data contour(keep=PetalWidth PetalLength species z pl pw);
      merge plotd(in=d) iris(keep=PetalWidth PetalLength species
                             rename=(PetalWidth=pw PetalLength=pl));
      if d then z = max(setosa,versicolor,virginica);
   run;

   title3 'Plot of Estimated Densities';

   proc sgrender data=contour template=contour;
   run;
%mend;

%macro contprob;
   data posterior(keep=PetalWidth PetalLength species z pl pw _into_);
      merge plotp(in=d) iris(keep=PetalWidth PetalLength species
                             rename=(PetalWidth=pw PetalLength=pl));
      if d then z = max(setosa,versicolor,virginica);
   run;

   title3 'Plot of Posterior Probabilities';

   proc sgrender data=posterior template=contour;
   run;
%mend;

%macro contclass;
   title3 'Plot of Classification Results';

   proc sgrender data=posterior(drop=z rename=(_into_=z)) template=contour;
   run;
%mend;
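
The %contden and %contprob macros rely on a one-to-one MERGE (there is no BY statement): grid rows and iris rows are paired by position, and once the shorter data set (iris, with 150 rows) is exhausted, its variables are missing on the remaining grid rows. That is presumably why the contour template specifies includemissinggroup=false for the scatter plot. The following toy sketch shows the behavior; the data sets grid, pts, and both are hypothetical and are not part of the example:

```sas
/* Toy grid with four rows */
data grid;
   do x = 1 to 4;
      output;
   end;
run;

/* Toy point data with two rows */
data pts;
   input px group $;
   datalines;
10 a
20 b
;
run;

/* One-to-one merge: rows are paired by position, not by key.
   Rows 3 and 4 have missing px and group because pts has run out. */
data both;
   merge grid(in=g) pts;
   fromGrid = g;   /* 1 on every row, because grid contributes to all four */
run;

proc print data=both;
run;
```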

A normal-theory analysis (METHOD=NORMAL) assuming equal covariance matrices (POOL=YES) illustrates the linearity of the classification boundaries. These statements produce Output 31.2.2:

title2 'Using Normal Density Estimates with Equal Variance';

proc discrim data=iris method=normal pool=yes
             testdata=plotdata testout=plotp testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var Petal:;
run;

%contden;
%contprob;
%contclass;
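
The reason POOL=YES yields straight boundaries can be seen from a short derivation. This is the standard normal-theory argument, written in notation that is introduced here and is not part of the example: $q_t$ is the prior probability of class $t$, $\mathbf{m}_t$ is the class mean, and $\mathbf{S}$ is the pooled covariance matrix.

```latex
\begin{align*}
p(t \mid \mathbf{x}) &= \frac{q_t f_t(\mathbf{x})}{\sum_u q_u f_u(\mathbf{x})},
\qquad
f_t(\mathbf{x}) \propto \exp\!\left(-\tfrac12 (\mathbf{x}-\mathbf{m}_t)' \mathbf{S}^{-1} (\mathbf{x}-\mathbf{m}_t)\right) \\
\log\frac{q_t f_t(\mathbf{x})}{q_u f_u(\mathbf{x})}
&= (\mathbf{m}_t-\mathbf{m}_u)' \mathbf{S}^{-1}\mathbf{x}
 - \tfrac12\left(\mathbf{m}_t' \mathbf{S}^{-1}\mathbf{m}_t - \mathbf{m}_u' \mathbf{S}^{-1}\mathbf{m}_u\right)
 + \log\frac{q_t}{q_u}
\end{align*}
```

The quadratic term in $\mathbf{x}$ cancels because both classes share $\mathbf{S}$, so the set where the log ratio equals zero is a straight line in the (PetalLength, PetalWidth) plane. With equal priors, as in this example, the last term is zero.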

Output 31.2.2 Normal Density Estimates with Equal Variance
Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Equal Variance

The DISCRIM Procedure

Total Sample Size 150 DF Total 149
Variables 2 DF Within Classes 147
Classes 3 DF Between Classes 2

Number of Observations Read 150
Number of Observations Used 150

Class Level Information
Species      Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa       Setosa                 50   50.0000     0.333333            0.333333
Versicolor   Versicolor             50   50.0000     0.333333            0.333333
Virginica    Virginica              50   50.0000     0.333333            0.333333

Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Equal Variance

The DISCRIM Procedure
Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Linear Discriminant Function

Posterior Probability of Membership in Species
Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.8453      0.1547
  9   Versicolor     Virginica *               0.0000       0.2130      0.7870
 25   Virginica      Versicolor *              0.0000       0.8322      0.1678
 57   Virginica      Versicolor *              0.0000       0.8057      0.1943
 91   Virginica      Versicolor *              0.0000       0.8903      0.1097
148   Versicolor     Virginica *               0.0000       0.3118      0.6882

* Misclassified observation


Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Equal Variance

The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Linear Discriminant Function

Number of Observations and Percent Classified into Species
From Species      Setosa   Versicolor    Virginica        Total
Setosa                50            0            0           50
                  100.00         0.00         0.00       100.00
Versicolor             0           48            2           50
                    0.00        96.00         4.00       100.00
Virginica              0            4           46           50
                    0.00         8.00        92.00       100.00
Total                 50           52           48          150
                   33.33        34.67        32.00       100.00
Priors           0.33333      0.33333      0.33333

Error Count Estimates for Species
  Setosa Versicolor Virginica Total
Rate 0.0000 0.0400 0.0800 0.0400
Priors 0.3333 0.3333 0.3333  
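
The Total entry in the error count table is the prior-weighted average of the class error rates. The following worked check uses only the numbers in the table; $q_t$ and $e_t$ are notation introduced here for the prior probability and the cross validation error rate of class $t$:

```latex
\[
e_{\text{total}} = \sum_t q_t\, e_t
= \tfrac13(0.0000) + \tfrac13(0.0400) + \tfrac13(0.0800) = 0.0400
\]
```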

Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Equal Variance

The DISCRIM Procedure
Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Linear Discriminant Function

Observation Profile for Test Data
Number of Observations Read 11175
Number of Observations Used 11175

Number of Observations and Percent Classified into Species
                  Setosa   Versicolor    Virginica        Total
Total               3670         4243         3262        11175
                   32.84        37.97        29.19       100.00
Priors           0.33333      0.33333      0.33333

[Plots: Estimated Densities, Posterior Probabilities, and Classification Results]

A normal-theory analysis assuming unequal covariance matrices (POOL=NO) illustrates quadratic classification boundaries. These statements produce Output 31.2.3:

title2 'Using Normal Density Estimates with Unequal Variance';

proc discrim data=iris method=normal pool=no
             testdata=plotdata testout=plotp testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var Petal:;
run;

%contden;
%contprob;
%contclass;
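
With POOL=NO, each class has its own covariance matrix, so the quadratic terms in the log ratio of two classes no longer cancel. The following expression is the standard normal-theory result, in notation introduced here rather than taken from the example: $q_t$ is the prior probability, $\mathbf{m}_t$ the class mean, and $\mathbf{S}_t$ the within-class covariance matrix of class $t$.

```latex
\[
\log\frac{q_t f_t(\mathbf{x})}{q_u f_u(\mathbf{x})}
= -\tfrac12 (\mathbf{x}-\mathbf{m}_t)' \mathbf{S}_t^{-1} (\mathbf{x}-\mathbf{m}_t)
  +\tfrac12 (\mathbf{x}-\mathbf{m}_u)' \mathbf{S}_u^{-1} (\mathbf{x}-\mathbf{m}_u)
  -\tfrac12 \log\frac{|\mathbf{S}_t|}{|\mathbf{S}_u|}
  + \log\frac{q_t}{q_u}
\]
```

The term $\tfrac12\,\mathbf{x}'(\mathbf{S}_u^{-1}-\mathbf{S}_t^{-1})\mathbf{x}$ survives, so the boundary where the log ratio equals zero is a conic section rather than a straight line.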

Output 31.2.3 Normal Density Estimates with Unequal Variance
Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Unequal Variance

The DISCRIM Procedure

Total Sample Size 150 DF Total 149
Variables 2 DF Within Classes 147
Classes 3 DF Between Classes 2

Number of Observations Read 150
Number of Observations Used 150

Class Level Information
Species      Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa       Setosa                 50   50.0000     0.333333            0.333333
Versicolor   Versicolor             50   50.0000     0.333333            0.333333
Virginica    Virginica              50   50.0000     0.333333            0.333333

Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Unequal Variance

The DISCRIM Procedure
Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Quadratic Discriminant Function

Posterior Probability of Membership in Species
Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.7288      0.2712
  9   Versicolor     Virginica *               0.0000       0.0903      0.9097
 25   Virginica      Versicolor *              0.0000       0.5196      0.4804
 91   Virginica      Versicolor *              0.0000       0.8335      0.1665
148   Versicolor     Virginica *               0.0000       0.4675      0.5325

* Misclassified observation


Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Unequal Variance

The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Species
From Species      Setosa   Versicolor    Virginica        Total
Setosa                50            0            0           50
                  100.00         0.00         0.00       100.00
Versicolor             0           48            2           50
                    0.00        96.00         4.00       100.00
Virginica              0            3           47           50
                    0.00         6.00        94.00       100.00
Total                 50           51           49          150
                   33.33        34.00        32.67       100.00
Priors           0.33333      0.33333      0.33333

Error Count Estimates for Species
  Setosa Versicolor Virginica Total
Rate 0.0000 0.0400 0.0600 0.0333
Priors 0.3333 0.3333 0.3333  

Discriminant Analysis of Fisher (1936) Iris Data
Using Normal Density Estimates with Unequal Variance

The DISCRIM Procedure
Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Quadratic Discriminant Function

Observation Profile for Test Data
Number of Observations Read 11175
Number of Observations Used 11175

Number of Observations and Percent Classified into Species
                  Setosa   Versicolor    Virginica        Total
Total               1382         1345         8448        11175
                   12.37        12.04        75.60       100.00
Priors           0.33333      0.33333      0.33333

[Plots: Estimated Densities, Posterior Probabilities, and Classification Results]

A nonparametric analysis (METHOD=NPAR) follows, using normal kernels (KERNEL=NORMAL) and equal bandwidths (POOL=YES) in each class. The value of the radius parameter r that, assuming normality, minimizes an approximate mean integrated square error is approximately 0.50, the value specified with R=.5 below (see the section Nonparametric Methods). These statements produce Output 31.2.4:

title2 'Using Kernel Density Estimates with Equal Bandwidth';

proc discrim data=iris method=npar kernel=normal
             r=.5 pool=yes testoutd=plotd
             testdata=plotdata testout=plotp
             short noclassify crosslisterr;
   class Species;
   var Petal:;
run;

%contden;
%contprob;
%contclass;
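
The R=.5 setting can be checked against the rule described in the section Nonparametric Methods. The sketch below assumes that the rule has the form r = (A(K)/n)**(1/(p+4)), with A(K) = 4/(2p+1) for a normal kernel; that form is recalled from the PROC DISCRIM documentation rather than stated in this example, so it should be verified there. The variable names p, n, A, and r are illustrative.

```sas
data _null_;
   p = 2;                          /* number of variables: petal length and width */
   n = 50;                         /* observations per species                    */
   A = 4 / (2*p + 1);              /* assumed normal-kernel constant A(K)         */
   r = (A / n) ** (1 / (p + 4));   /* assumed optimal-radius rule                 */
   put r= 6.3;                     /* writes r to the log, about 0.50             */
run;
```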

Output 31.2.4 Kernel Density Estimates with Equal Bandwidth
Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Equal Bandwidth

The DISCRIM Procedure

Total Sample Size 150 DF Total 149
Variables 2 DF Within Classes 147
Classes 3 DF Between Classes 2

Number of Observations Read 150
Number of Observations Used 150

Class Level Information
Species      Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa       Setosa                 50   50.0000     0.333333            0.333333
Versicolor   Versicolor             50   50.0000     0.333333            0.333333
Virginica    Virginica              50   50.0000     0.333333            0.333333

Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Equal Bandwidth

The DISCRIM Procedure
Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Normal Kernel Density

Posterior Probability of Membership in Species
Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.7474      0.2526
  9   Versicolor     Virginica *               0.0000       0.0800      0.9200
 25   Virginica      Versicolor *              0.0000       0.5863      0.4137
 91   Virginica      Versicolor *              0.0000       0.8358      0.1642
148   Versicolor     Virginica *               0.0000       0.4123      0.5877

* Misclassified observation


Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Equal Bandwidth

The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species
From Species      Setosa   Versicolor    Virginica        Total
Setosa                50            0            0           50
                  100.00         0.00         0.00       100.00
Versicolor             0           48            2           50
                    0.00        96.00         4.00       100.00
Virginica              0            3           47           50
                    0.00         6.00        94.00       100.00
Total                 50           51           49          150
                   33.33        34.00        32.67       100.00
Priors           0.33333      0.33333      0.33333

Error Count Estimates for Species
  Setosa Versicolor Virginica Total
Rate 0.0000 0.0400 0.0600 0.0333
Priors 0.3333 0.3333 0.3333  

Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Equal Bandwidth

The DISCRIM Procedure
Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Normal Kernel Density

Observation Profile for Test Data
Number of Observations Read 11175
Number of Observations Used 11175

Number of Observations and Percent Classified into Species
                  Setosa   Versicolor    Virginica        Total
Total               3195         2492         5488        11175
                   28.59        22.30        49.11       100.00
Priors           0.33333      0.33333      0.33333

[Plots: Estimated Densities, Posterior Probabilities, and Classification Results]

Another nonparametric analysis is run with unequal bandwidths (POOL=NO). These statements produce Output 31.2.5:

title2 'Using Kernel Density Estimates with Unequal Bandwidth';

proc discrim data=iris method=npar kernel=normal
             r=.5 pool=no testoutd=plotd
             testdata=plotdata testout=plotp
             short noclassify crosslisterr;
   class Species;
   var Petal:;
run;

%contden;
%contprob;
%contclass;
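
The difference between POOL=YES and POOL=NO in the kernel analyses lies in the matrix that shapes the kernel. The following sketch of the normal-kernel estimate uses notation assumed here, following the usual form given for PROC DISCRIM: $\mathbf{V}_t$ is the pooled covariance matrix under POOL=YES and the within-class covariance matrix of class $t$ under POOL=NO, $n_t$ is the class size, $p$ is the number of variables, and $r$ is the radius.

```latex
\[
\hat f_t(\mathbf{x}) = \frac{1}{n_t} \sum_{\mathbf{y} \in \text{class } t} K_t(\mathbf{x}-\mathbf{y}),
\qquad
K_t(\mathbf{z}) = \frac{1}{(2\pi)^{p/2}\, r^{p}\, |\mathbf{V}_t|^{1/2}}
\exp\!\left(-\frac{\mathbf{z}' \mathbf{V}_t^{-1} \mathbf{z}}{2 r^{2}}\right)
\]
```

Under POOL=NO each species therefore gets kernels whose size and orientation follow its own scatter, even though the same value of $r$ is specified for all three.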

Output 31.2.5 Kernel Density Estimates with Unequal Bandwidth
Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Unequal Bandwidth

The DISCRIM Procedure

Total Sample Size 150 DF Total 149
Variables 2 DF Within Classes 147
Classes 3 DF Between Classes 2

Number of Observations Read 150
Number of Observations Used 150

Class Level Information
Species      Variable Name   Frequency    Weight   Proportion   Prior Probability
Setosa       Setosa                 50   50.0000     0.333333            0.333333
Versicolor   Versicolor             50   50.0000     0.333333            0.333333
Virginica    Virginica              50   50.0000     0.333333            0.333333

Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Unequal Bandwidth

The DISCRIM Procedure
Classification Results for Calibration Data: WORK.IRIS
Cross-validation Results using Normal Kernel Density

Posterior Probability of Membership in Species
Obs   From Species   Classified into Species   Setosa   Versicolor   Virginica
  5   Virginica      Versicolor *              0.0000       0.7826      0.2174
  9   Versicolor     Virginica *               0.0000       0.0506      0.9494
 91   Virginica      Versicolor *              0.0000       0.8802      0.1198
148   Versicolor     Virginica *               0.0000       0.3726      0.6274

* Misclassified observation


Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Unequal Bandwidth

The DISCRIM Procedure
Classification Summary for Calibration Data: WORK.IRIS
Cross-validation Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species
From Species      Setosa   Versicolor    Virginica        Total
Setosa                50            0            0           50
                  100.00         0.00         0.00       100.00
Versicolor             0           48            2           50
                    0.00        96.00         4.00       100.00
Virginica              0            2           48           50
                    0.00         4.00        96.00       100.00
Total                 50           50           50          150
                   33.33        33.33        33.33       100.00
Priors           0.33333      0.33333      0.33333

Error Count Estimates for Species
  Setosa Versicolor Virginica Total
Rate 0.0000 0.0400 0.0400 0.0267
Priors 0.3333 0.3333 0.3333  

Discriminant Analysis of Fisher (1936) Iris Data
Using Kernel Density Estimates with Unequal Bandwidth

The DISCRIM Procedure
Classification Summary for Test Data: WORK.PLOTDATA
Classification Summary using Normal Kernel Density

Observation Profile for Test Data
Number of Observations Read 11175
Number of Observations Used 11175

Number of Observations and Percent Classified into Species
                  Setosa   Versicolor    Virginica        Total
Total               1370         1505         8300        11175
                   12.26        13.47        74.27       100.00
Priors           0.33333      0.33333      0.33333

[Plots: Estimated Densities, Posterior Probabilities, and Classification Results]
