Example 85.1 Performing a Stepwise Discriminant Analysis
The iris data published by Fisher (1936) have been widely used for examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length, and petal width are measured in millimeters on 50 iris specimens from each of three species: Iris setosa, I. versicolor, and I. virginica. The iris data set is available from the Sashelp library.
PROC STEPDISC is used here with stepwise selection (METHOD=STEPWISE, the default) to choose a subset of the four measurements.
In the PROC STEPDISC statement, the BSSCP and TSSCP options display the between-class SSCP matrix and the total-sample corrected SSCP matrix. By default, the significance level of an F test from an analysis of covariance is used as the selection criterion. The variable under consideration is the dependent variable, and the variables already chosen act as covariates. The following SAS statements produce Output 85.1.1 through Output 85.1.8:
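title 'Fisher (1936) Iris Data';
%let _stdvar = ;   /* clear any previous value of the macro variable that PROC STEPDISC fills */
proc stepdisc data=sashelp.iris bsscp tsscp;   /* BSSCP, TSSCP: print between-class and total-sample SSCP matrices */
   class Species;                                      /* classification variable */
   var SepalLength SepalWidth PetalLength PetalWidth;  /* candidate variables */
run;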
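The entry and removal criterion can be written in terms of the between-class, total-sample, and pooled within-class SSCP matrices B, T, and W = T − B. In this notation, a subscript S selects the rows and columns of the variables already in the model, t_jj is the jth diagonal element of T, c is the number of classes, and n is the number of observations. The following formulas sketch the standard one-way analysis-of-covariance results in our own notation, not the procedure's. The tolerance is 1 minus the squared multiple correlation of X_j with the variables in S, computed from the total-sample matrix. This is an assumption on our part, but it is consistent with the tolerance of 0.8164 reported for SepalWidth at step 2.

```latex
\begin{aligned}
\Lambda(S) &= \frac{\lvert \mathbf{W}_S \rvert}{\lvert \mathbf{T}_S \rvert},
  \qquad \mathbf{W} = \mathbf{T} - \mathbf{B}, \qquad \Lambda(\varnothing) = 1,\\
R^2_{j \mid S} &= 1 - \frac{\Lambda(S \cup \{j\})}{\Lambda(S)},\\
F_{j \mid S} &= \frac{R^2_{j \mid S} / (c - 1)}{\bigl(1 - R^2_{j \mid S}\bigr) / (n - c - \lvert S \rvert)},
  \qquad \text{df} = \bigl(c - 1,\; n - c - \lvert S \rvert\bigr),\\
\operatorname{Tol}_{j \mid S} &= \frac{\lvert \mathbf{T}_{S \cup \{j\}} \rvert}{\lvert \mathbf{T}_S \rvert \, t_{jj}}.
\end{aligned}
```

At step 1, S is empty, so the R-square reduces to b_jj / t_jj and every tolerance is 1. With n = 150 and c = 3, each step 1 F test has 2 and 147 degrees of freedom.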
Output 85.1.1
Iris Data: Summary Information
Species | Frequency | Weight | Proportion
Setosa | 50 | 50.0000 | 0.333333
Versicolor | 50 | 50.0000 | 0.333333
Virginica | 50 | 50.0000 | 0.333333
Output 85.1.2
Iris Data: Between-Class and Total-Sample SSCP Matrices
The STEPDISC Procedure
Between-Class SSCP Matrix
Variable | SepalLength | SepalWidth | PetalLength | PetalWidth
Sepal Length (mm) | 6321.21333 | -1995.26667 | 16524.84000 | 7127.93333
Sepal Width (mm) | -1995.26667 | 1134.49333 | -5723.96000 | -2293.26667
Petal Length (mm) | 16524.84000 | -5723.96000 | 43710.28000 | 18677.40000
Petal Width (mm) | 7127.93333 | -2293.26667 | 18677.40000 | 8041.33333

Total-Sample SSCP Matrix
Variable | SepalLength | SepalWidth | PetalLength | PetalWidth
Sepal Length (mm) | 10216.83333 | -632.26667 | 18987.30000 | 7692.43333
Sepal Width (mm) | -632.26667 | 2830.69333 | -4911.88000 | -1812.42667
Petal Length (mm) | 18987.30000 | -4911.88000 | 46432.54000 | 19304.58000
Petal Width (mm) | 7692.43333 | -1812.42667 | 19304.58000 | 8656.99333
In step 1, the tolerance is 1.0 for each variable under consideration because no variables have yet entered the model. The variable PetalLength is selected because its F statistic, 1180.161, is the largest among all variables.
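At step 1 no covariates are in the model. Each variable's R-square is therefore just its diagonal element of the between-class SSCP matrix divided by the matching element of the total-sample SSCP matrix. The following plain-Python sketch (not SAS code; the names `B_DIAG`, `T_DIAG`, and `step1_stats` are hypothetical) recomputes the step 1 R-square and F values from Output 85.1.2 and picks the variable with the largest F. It assumes n = 150 observations and c = 3 classes, so each F test has 2 and 147 degrees of freedom.

```python
# Hypothetical illustration: recompute the step 1 statistics from the SSCP
# matrices in Output 85.1.2 (variable order as in the VAR statement).
NAMES = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]

# Diagonal elements only: between-class (B) and total-sample (T) sums of squares.
B_DIAG = [6321.21333, 1134.49333, 43710.28000, 8041.33333]
T_DIAG = [10216.83333, 2830.69333, 46432.54000, 8656.99333]

N_OBS, N_CLASSES = 150, 3


def step1_stats(j):
    """Return (R-square, F) for variable j when no variables are in the model yet."""
    r2 = B_DIAG[j] / T_DIAG[j]      # between-class share of the total sum of squares
    df1 = N_CLASSES - 1             # 2
    df2 = N_OBS - N_CLASSES         # 147
    f = (r2 / df1) / ((1.0 - r2) / df2)
    return r2, f


stats = {name: step1_stats(j) for j, name in enumerate(NAMES)}
for name, (r2, f) in stats.items():
    print(f"{name:<12} R-square={r2:.4f}  F={f:8.2f}")

# The candidate with the largest F (here also the smallest p-value) enters first.
best = max(stats, key=lambda name: stats[name][1])
print("Entered at step 1:", best)
```

Run as is, the values should agree with the Statistics for Entry table of Output 85.1.3 to the displayed precision, and the sketch should select PetalLength.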
Output 85.1.3
Iris Data: Stepwise Selection Step 1
Statistics for Entry, DF = 2, 147
Label | R-Square | F Value | Pr > F | Tolerance
Sepal Length (mm) | 0.6187 | 119.26 | <.0001 | 1.0000
Sepal Width (mm) | 0.4008 | 49.16 | <.0001 | 1.0000
Petal Length (mm) | 0.9414 | 1180.16 | <.0001 | 1.0000
Petal Width (mm) | 0.9289 | 960.01 | <.0001 | 1.0000
Variable PetalLength will be entered.
Multivariate Statistics
Statistic | Value | F Value | Num DF | Den DF | Pr > F
Wilks' Lambda | 0.058628 | 1180.16 | 2 | 147 | <.0001
Pillai's Trace | 0.941372 | 1180.16 | 2 | 147 | <.0001
Average Squared Canonical Correlation | 0.470686 | | | |
In step 2, with the variable PetalLength already in the model, PetalLength is tested for removal before a new variable is selected for entry. Since PetalLength meets the criterion to stay, it is used as a covariate in the analysis of covariance for variable selection. The variable SepalWidth is selected because its F statistic, 43.035, is the largest among all variables not in the model and because its associated tolerance, 0.8164, meets the criterion to enter. The process is repeated in steps 3 and 4. The variable PetalWidth is entered in step 3, and the variable SepalLength is entered in step 4.
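The step 2 quantities can also be recomputed from the SSCP matrices in Output 85.1.2. The following Python sketch is an illustration, not SAS code, and all function names are hypothetical. It computes the partial R-square of a candidate given the variables already in the model as 1 minus a ratio of two Wilks' lambdas, each computed as |W_S| / |T_S| with W = T − B. The tolerance is computed as 1 minus the squared multiple correlation of the candidate with the model variables, using the total-sample matrix. That last choice is our assumption, made because it reproduces the 0.8164 shown for SepalWidth.

```python
# Hypothetical illustration: step 2 partial R-square, F, and tolerance from the
# SSCP matrices in Output 85.1.2 (order: SepalLength, SepalWidth, PetalLength, PetalWidth).
NAMES = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
B = [[6321.21333, -1995.26667, 16524.84000, 7127.93333],     # between-class SSCP
     [-1995.26667, 1134.49333, -5723.96000, -2293.26667],
     [16524.84000, -5723.96000, 43710.28000, 18677.40000],
     [7127.93333, -2293.26667, 18677.40000, 8041.33333]]
T = [[10216.83333, -632.26667, 18987.30000, 7692.43333],     # total-sample SSCP
     [-632.26667, 2830.69333, -4911.88000, -1812.42667],
     [18987.30000, -4911.88000, 46432.54000, 19304.58000],
     [7692.43333, -1812.42667, 19304.58000, 8656.99333]]
W = [[T[i][j] - B[i][j] for j in range(4)] for i in range(4)]  # pooled within-class SSCP
N_OBS, N_CLASSES = 150, 3


def det(m):
    """Determinant by Gaussian elimination; a 0x0 matrix has determinant 1."""
    a, d = [row[:] for row in m], 1.0
    for c in range(len(a)):
        p = max(range(c, len(a)), key=lambda r: abs(a[r][c]))  # partial pivoting
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p], d = a[p], a[c], -d
        d *= a[c][c]
        for r in range(c + 1, len(a)):
            factor = a[r][c] / a[c][c]
            for k in range(c, len(a)):
                a[r][k] -= factor * a[c][k]
    return d


def sub(m, idx):
    """Submatrix of m for the variable indices in idx."""
    return [[m[i][j] for j in idx] for i in idx]


def wilks(idx):
    """Wilks' lambda |W_S| / |T_S| (equals 1.0 for an empty set)."""
    return det(sub(W, idx)) / det(sub(T, idx))


def partial_stats(j, model):
    """Partial R-square, F, and denominator df for j, adjusting for the variables in model."""
    r2 = 1.0 - wilks(model + [j]) / wilks(model)
    df1, df2 = N_CLASSES - 1, N_OBS - N_CLASSES - len(model)
    return r2, (r2 / df1) / ((1.0 - r2) / df2), df2


def tolerance(j, model):
    """1 minus the squared multiple correlation of j with model (total-sample matrix)."""
    return det(sub(T, model + [j])) / (det(sub(T, model)) * T[j][j])


model = [NAMES.index("PetalLength")]
for j, name in enumerate(NAMES):
    if j not in model:
        r2, f, df2 = partial_stats(j, model)
        print(f"{name:<12} partial R-square={r2:.4f}  F(2,{df2})={f:6.2f}  "
              f"tolerance={tolerance(j, model):.4f}")
```

The printed values should match the Statistics for Entry table of Output 85.1.4 to the displayed precision, with SepalWidth having the largest F.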
Output 85.1.4
Iris Data: Stepwise Selection Step 2
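Statistics for Removal, DF = 2, 147
Label | Partial R-Square | F Value | Pr > F
Petal Length (mm) | 0.9414 | 1180.16 | <.0001
No variables can be removed.
Statistics for Entry, DF = 2, 146
Label | Partial R-Square | F Value | Pr > F | Tolerance
Sepal Length (mm) | 0.3198 | 34.32 | <.0001 | 0.2400
Sepal Width (mm) | 0.3709 | 43.04 | <.0001 | 0.8164
Petal Width (mm) | 0.2533 | 24.77 | <.0001 | 0.0729
Variable SepalWidth will be entered.
Multivariate Statistics
Statistic | Value | F Value | Num DF | Den DF | Pr > F
Wilks' Lambda | 0.036884 | 307.10 | 4 | 292 | <.0001
Pillai's Trace | 1.119908 | 93.53 | 4 | 294 | <.0001
Average Squared Canonical Correlation | 0.559954 | | | |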
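The F values in the Multivariate Statistics tables are approximations. The following Python sketch assumes two standard textbook forms: Rao's F approximation for Wilks' lambda and the usual F approximation for Pillai's trace. It uses p variables in the model, q = c − 1 = 2 hypothesis degrees of freedom, and v = n − c = 147 error degrees of freedom, and the function names are hypothetical. With the step 2 values as input (Wilks' lambda as printed to eight decimals in the selection summary), these formulas agree with the tabled F values and degrees of freedom. The average squared canonical correlation is Pillai's trace divided by q.

```python
import math


def wilks_rao_f(lam, p, q, v):
    """Rao's F approximation for Wilks' lambda; returns (F, num df, den df).

    p: variables in the model, q: hypothesis df (classes - 1), v: error df (n - classes).
    """
    denom = p * p + q * q - 5
    t = math.sqrt((p * p * q * q - 4) / denom) if denom > 0 else 1.0
    r = v - (p - q + 1) / 2
    u = (p * q - 2) / 4
    df1, df2 = p * q, r * t - 2 * u
    root = lam ** (1.0 / t)
    return (1.0 - root) / root * df2 / df1, df1, df2


def pillai_f(trace, p, q, v):
    """F approximation for Pillai's trace; returns (F, num df, den df)."""
    s = min(p, q)
    m = (abs(p - q) - 1) / 2
    n = (v - p - 1) / 2
    df1, df2 = s * (2 * m + s + 1), s * (2 * n + s + 1)
    return (2 * n + s + 1) / (2 * m + s + 1) * trace / (s - trace), df1, df2


# Step 2 of this example: p = 2 (PetalLength, SepalWidth), q = 2, v = 147.
wilks_lambda, pillai_trace = 0.03688411, 1.119908
print("Wilks' lambda:  F=%.2f  df=(%g, %g)" % wilks_rao_f(wilks_lambda, 2, 2, 147))
print("Pillai's trace: F=%.2f  df=(%g, %g)" % pillai_f(pillai_trace, 2, 2, 147))
print("ASCC: %.6f" % (pillai_trace / 2))
```

The same functions can be applied to the other steps by changing p (1 at step 1, 3 at step 3, 4 at step 4).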
Output 85.1.5
Iris Data: Stepwise Selection Step 3
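Statistics for Removal, DF = 2, 146
Label | Partial R-Square | F Value | Pr > F
Sepal Width (mm) | 0.3709 | 43.04 | <.0001
Petal Length (mm) | 0.9384 | 1112.95 | <.0001
No variables can be removed.
Statistics for Entry, DF = 2, 145
Label | Partial R-Square | F Value | Pr > F | Tolerance
Sepal Length (mm) | 0.1447 | 12.27 | <.0001 | 0.1323
Petal Width (mm) | 0.3229 | 34.57 | <.0001 | 0.0662
Variable PetalWidth will be entered.
Variables in the model: SepalWidth PetalLength PetalWidth
Multivariate Statistics
Statistic | Value | F Value | Num DF | Den DF | Pr > F
Wilks' Lambda | 0.024976 | 257.50 | 6 | 290 | <.0001
Pillai's Trace | 1.189914 | 71.49 | 6 | 292 | <.0001
Average Squared Canonical Correlation | 0.594957 | | | |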
Output 85.1.6
Iris Data: Stepwise Selection Step 4
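Statistics for Removal, DF = 2, 145
Label | Partial R-Square | F Value | Pr > F
Sepal Width (mm) | 0.4295 | 54.58 | <.0001
Petal Length (mm) | 0.3482 | 38.72 | <.0001
Petal Width (mm) | 0.3229 | 34.57 | <.0001
No variables can be removed.
Statistics for Entry, DF = 2, 144
Label | Partial R-Square | F Value | Pr > F | Tolerance
Sepal Length (mm) | 0.0615 | 4.72 | 0.0103 | 0.0320
Variable SepalLength will be entered.
All variables have been entered.
Multivariate Statistics
Statistic | Value | F Value | Num DF | Den DF | Pr > F
Wilks' Lambda | 0.023439 | 199.15 | 8 | 288 | <.0001
Pillai's Trace | 1.191899 | 53.47 | 8 | 290 | <.0001
Average Squared Canonical Correlation | 0.595949 | | | |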
Since no more variables can be added to or removed from the model, the procedure stops at step 5 and displays a summary of the selection process.
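The stepwise logic in this example can be sketched in plain Python: try to remove a variable, otherwise try to enter one, and stop when neither is possible. This is an illustration under stated assumptions, not the PROC STEPDISC implementation. It assumes significance levels of 0.15 for entry and for staying, intended to match the defaults of the SLENTRY= and SLSTAY= options. It also uses a hypothetical tolerance cutoff of 1e-8. P-values come from the closed-form upper tail of an F(2, d) distribution, (1 + 2f/d)^(-d/2). That formula applies here only because three classes give every partial F test 2 numerator degrees of freedom. All helper names are ours.

```python
# Hypothetical sketch of stepwise selection with partial F tests (plain Python, not SAS).
NAMES = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
B = [[6321.21333, -1995.26667, 16524.84000, 7127.93333],     # between-class SSCP
     [-1995.26667, 1134.49333, -5723.96000, -2293.26667],
     [16524.84000, -5723.96000, 43710.28000, 18677.40000],
     [7127.93333, -2293.26667, 18677.40000, 8041.33333]]
T = [[10216.83333, -632.26667, 18987.30000, 7692.43333],     # total-sample SSCP
     [-632.26667, 2830.69333, -4911.88000, -1812.42667],
     [18987.30000, -4911.88000, 46432.54000, 19304.58000],
     [7692.43333, -1812.42667, 19304.58000, 8656.99333]]
W = [[T[i][j] - B[i][j] for j in range(4)] for i in range(4)]  # pooled within-class SSCP
N_OBS, N_CLASSES = 150, 3
SLENTRY, SLSTAY = 0.15, 0.15  # assumed significance levels for entry and for staying
MIN_TOLERANCE = 1e-8          # hypothetical tolerance cutoff for entry


def det(m):
    """Determinant by Gaussian elimination; a 0x0 matrix has determinant 1."""
    a, d = [row[:] for row in m], 1.0
    for c in range(len(a)):
        p = max(range(c, len(a)), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p], d = a[p], a[c], -d
        d *= a[c][c]
        for r in range(c + 1, len(a)):
            factor = a[r][c] / a[c][c]
            for k in range(c, len(a)):
                a[r][k] -= factor * a[c][k]
    return d


def sub(m, idx):
    return [[m[i][j] for j in idx] for i in idx]


def wilks(idx):
    return det(sub(W, idx)) / det(sub(T, idx))


def partial_f(j, others):
    """Partial F for variable j adjusting for others; returns (F, denominator df)."""
    r2 = 1.0 - wilks(others + [j]) / wilks(others)
    df2 = N_OBS - N_CLASSES - len(others)
    return (r2 / (N_CLASSES - 1)) / ((1.0 - r2) / df2), df2


def p_value(f, df2):
    """Upper tail of F(2, df2); this closed form holds only for 2 numerator df."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


def tolerance(j, model):
    return det(sub(T, model + [j])) / (det(sub(T, model)) * T[j][j])


def stepwise(max_steps=8):
    model, history = [], []
    for _ in range(max_steps):
        # Removal: the weakest variable in the model leaves if its p-value exceeds SLSTAY.
        removal = [(p_value(*partial_f(j, [k for k in model if k != j])), j) for j in model]
        if removal and max(removal)[0] > SLSTAY:
            j = max(removal)[1]
            model.remove(j)
            history.append(("removed", NAMES[j]))
            continue
        # Entry: the strongest candidate passing the tolerance check enters if p < SLENTRY.
        entry = [(p_value(*partial_f(j, model)), j) for j in range(len(NAMES))
                 if j not in model and tolerance(j, model) >= MIN_TOLERANCE]
        if entry and min(entry)[0] < SLENTRY:
            j = min(entry)[1]
            model.append(j)
            history.append(("entered", NAMES[j]))
            continue
        break  # nothing can be removed or entered, so selection stops
    return [NAMES[j] for j in model], history


selected, history = stepwise()
for step, (action, name) in enumerate(history, start=1):
    print(f"Step {step}: {name} {action}")
print("Selected:", " ".join(selected))
```

For these data, the sketch should enter PetalLength, SepalWidth, PetalWidth, and SepalLength in that order and then stop. It stops because the largest removal p-value (about 0.0103, for SepalLength) is below the assumed SLSTAY level and no candidates remain. The `max_steps` cap is only a crude guard against a variable cycling in and out of the model.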
Output 85.1.7
Iris Data: Stepwise Selection Step 5
Statistics for Removal, DF = 2, 144
Label | Partial R-Square | F Value | Pr > F
Sepal Length (mm) | 0.0615 | 4.72 | 0.0103
Sepal Width (mm) | 0.2335 | 21.94 | <.0001
Petal Length (mm) | 0.3308 | 35.59 | <.0001
Petal Width (mm) | 0.2570 | 24.90 | <.0001
No variables can be removed.
No further steps are possible.
Output 85.1.8
Iris Data: Stepwise Selection Summary
Step | Entered | Removed | Label | Partial R-Square | F Value | Pr > F | Wilks' Lambda | Pr < Lambda | Average Squared Canonical Correlation | Pr > ASCC
1 | PetalLength | | Petal Length (mm) | 0.9414 | 1180.16 | <.0001 | 0.05862828 | <.0001 | 0.47068586 | <.0001
2 | SepalWidth | | Sepal Width (mm) | 0.3709 | 43.04 | <.0001 | 0.03688411 | <.0001 | 0.55995394 | <.0001
3 | PetalWidth | | Petal Width (mm) | 0.3229 | 34.57 | <.0001 | 0.02497554 | <.0001 | 0.59495691 | <.0001
4 | SepalLength | | Sepal Length (mm) | 0.0615 | 4.72 | 0.0103 | 0.02343863 | <.0001 | 0.59594941 | <.0001
PROC STEPDISC automatically creates a list of the selected variables and stores it in the macro variable _StdVar. You can submit the following statement to see the list of selected variables:
* print the macro variable list;
%put &_stdvar;
The macro variable _StdVar contains the following variable list:
SepalLength SepalWidth PetalLength PetalWidth
You can use this macro variable to analyze the selected variables in subsequent steps, as follows:
proc discrim data=sashelp.iris;
class Species;
var &_stdvar;
run;
The results of this step are not shown.