Example 85.1 Performing a Stepwise Discriminant Analysis

The iris data published by Fisher (1936) have been widely used for examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length, and petal width are measured in millimeters on 50 iris specimens from each of three species: Iris setosa, I. versicolor, and I. virginica. The iris data set is available from the Sashelp library.

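This example selects variables for discriminating among the three species by using stepwise selection, which is the method that PROC STEPDISC uses when no other method is requested.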

In the PROC STEPDISC statement, the BSSCP and TSSCP options display the between-class SSCP matrix and the total-sample corrected SSCP matrix. By default, the significance level of an F test from an analysis of covariance is used as the selection criterion. The variable under consideration is the dependent variable, and the variables already chosen act as covariates. The following SAS statements produce Output 85.1.1 through Output 85.1.8:

title 'Fisher (1936) Iris Data';

%let _stdvar = ;
proc stepdisc data=sashelp.iris bsscp tsscp;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
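The preceding statements rely on the procedure defaults. As a sketch, the same analysis can be requested with the selection settings written out; the values shown are the ones that the summary information for this run reports (stepwise selection, with a significance level of 0.15 both to enter and to stay), so this variant is expected to produce the same selection:

```sas
title 'Fisher (1936) Iris Data';

/* Same analysis, with the default selection settings made explicit */
proc stepdisc data=sashelp.iris
              method=stepwise   /* selection method                 */
              slentry=0.15      /* significance level to enter      */
              slstay=0.15       /* significance level to stay       */
              bsscp tsscp;      /* display the two SSCP matrices    */
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

Changing SLENTRY= or SLSTAY= changes which variables can enter or remain, so results from such a run would not necessarily match the output shown in this example.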

Output 85.1.1 Iris Data: Summary Information
Fisher (1936) Iris Data

The STEPDISC Procedure

The Method for Selecting Variables is STEPWISE
Total Sample Size 150 Variable(s) in the Analysis 4
Class Levels 3 Variable(s) Will Be Included 0
    Significance Level to Enter 0.15
    Significance Level to Stay 0.15

Number of Observations Read 150
Number of Observations Used 150

Class Level Information
Species Variable Name Frequency Weight Proportion
Setosa Setosa 50 50.0000 0.333333
Versicolor Versicolor 50 50.0000 0.333333
Virginica Virginica 50 50.0000 0.333333

Output 85.1.2 Iris Data: Between-Class and Total-Sample SSCP Matrices
Fisher (1936) Iris Data

The STEPDISC Procedure

Between-Class SSCP Matrix
Variable Label SepalLength SepalWidth PetalLength PetalWidth
SepalLength Sepal Length (mm) 6321.21333 -1995.26667 16524.84000 7127.93333
SepalWidth Sepal Width (mm) -1995.26667 1134.49333 -5723.96000 -2293.26667
PetalLength Petal Length (mm) 16524.84000 -5723.96000 43710.28000 18677.40000
PetalWidth Petal Width (mm) 7127.93333 -2293.26667 18677.40000 8041.33333

Total-Sample SSCP Matrix
Variable Label SepalLength SepalWidth PetalLength PetalWidth
SepalLength Sepal Length (mm) 10216.83333 -632.26667 18987.30000 7692.43333
SepalWidth Sepal Width (mm) -632.26667 2830.69333 -4911.88000 -1812.42667
PetalLength Petal Length (mm) 18987.30000 -4911.88000 46432.54000 19304.58000
PetalWidth Petal Width (mm) 7692.43333 -1812.42667 19304.58000 8656.99333


In step 1, the tolerance is 1.0 for each variable under consideration because no variables have yet entered the model. The variable PetalLength is selected because its F statistic, 1180.161, is the largest among all variables.
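As a check on the step 1 statistics, the R-square and F value for PetalLength can be recovered from the diagonal entries of the SSCP matrices in Output 85.1.2. The following is a sketch of that arithmetic. The symbols are notation introduced here rather than procedure output: $b_{jj}$ and $t_{jj}$ are the between-class and total-sample diagonal entries for the variable, $n$ is the total sample size, $g$ is the number of classes, and $p$ is the number of variables already in the model.

```latex
\begin{align*}
R^2 &= \frac{b_{jj}}{t_{jj}} = \frac{43710.28}{46432.54} \approx 0.9414, \\[4pt]
F   &= \frac{R^2}{1 - R^2} \cdot \frac{n - g - p}{g - 1}
     = \frac{0.941372}{0.058628} \cdot \frac{150 - 3 - 0}{3 - 1}
     \approx 1180.2 .
\end{align*}
```

The degrees of freedom, $g - 1 = 2$ and $n - g - p = 147$, agree with the "DF = 2, 147" heading of the entry table. In later steps the same relationship holds with the partial R-square and a larger $p$.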

Output 85.1.3 Iris Data: Stepwise Selection Step 1
Fisher (1936) Iris Data

The STEPDISC Procedure
Stepwise Selection: Step 1

Statistics for Entry, DF = 2, 147
Variable Label R-Square F Value Pr > F Tolerance
SepalLength Sepal Length (mm) 0.6187 119.26 <.0001 1.0000
SepalWidth Sepal Width (mm) 0.4008 49.16 <.0001 1.0000
PetalLength Petal Length (mm) 0.9414 1180.16 <.0001 1.0000
PetalWidth Petal Width (mm) 0.9289 960.01 <.0001 1.0000

Variable PetalLength will be entered.

Variable(s) That Have Been Entered
PetalLength

Multivariate Statistics
Statistic Value F Value Num DF Den DF Pr > F
Wilks' Lambda 0.058628 1180.16 2 147 <.0001
Pillai's Trace 0.941372 1180.16 2 147 <.0001
Average Squared Canonical Correlation 0.470686        

In step 2, with the variable PetalLength already in the model, PetalLength is tested for removal before a new variable is selected for entry. Since PetalLength meets the criterion to stay, it is used as a covariate in the analysis of covariance for variable selection. The variable SepalWidth is selected because its F statistic, 43.035, is the largest among all variables not in the model and because its associated tolerance, 0.8164, meets the criterion to enter. The process is repeated in steps 3 and 4. The variable PetalWidth is entered in step 3, and the variable SepalLength is entered in step 4.
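The analysis of covariance behind the step 2 entry statistic can be sketched with PROC GLM. This step is not part of the original example. It assumes that the test of interest is the Type III F test for Species, with the candidate variable SepalWidth as the dependent variable and the previously entered PetalLength as the covariate:

```sas
/* Analysis of covariance for the step 2 candidate SepalWidth:
   SepalWidth is the dependent variable, Species is the class effect,
   and the already entered PetalLength is the covariate. */
proc glm data=sashelp.iris;
   class Species;
   model SepalWidth = PetalLength Species / ss3;
run;
quit;
```

Under these assumptions, the F test for Species (2 and 146 degrees of freedom) should correspond to the F value that PROC STEPDISC reports for SepalWidth in the step 2 entry table.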


Output 85.1.4 Iris Data: Stepwise Selection Step 2
Fisher (1936) Iris Data

The STEPDISC Procedure
Stepwise Selection: Step 2

Statistics for Removal, DF = 2, 147
Variable Label R-Square F Value Pr > F
PetalLength Petal Length (mm) 0.9414 1180.16 <.0001

No variables can be removed.

Statistics for Entry, DF = 2, 146
Variable Label Partial R-Square F Value Pr > F Tolerance
SepalLength Sepal Length (mm) 0.3198 34.32 <.0001 0.2400
SepalWidth Sepal Width (mm) 0.3709 43.04 <.0001 0.8164
PetalWidth Petal Width (mm) 0.2533 24.77 <.0001 0.0729

Variable SepalWidth will be entered.

Variable(s) That Have Been Entered
SepalWidth PetalLength

Multivariate Statistics
Statistic Value F Value Num DF Den DF Pr > F
Wilks' Lambda 0.036884 307.10 4 292 <.0001
Pillai's Trace 1.119908 93.53 4 294 <.0001
Average Squared Canonical Correlation 0.559954        


Output 85.1.5 Iris Data: Stepwise Selection Step 3
Fisher (1936) Iris Data

The STEPDISC Procedure
Stepwise Selection: Step 3

Statistics for Removal, DF = 2, 146
Variable Label Partial R-Square F Value Pr > F
SepalWidth Sepal Width (mm) 0.3709 43.04 <.0001
PetalLength Petal Length (mm) 0.9384 1112.95 <.0001

No variables can be removed.

Statistics for Entry, DF = 2, 145
Variable Label Partial R-Square F Value Pr > F Tolerance
SepalLength Sepal Length (mm) 0.1447 12.27 <.0001 0.1323
PetalWidth Petal Width (mm) 0.3229 34.57 <.0001 0.0662

Variable PetalWidth will be entered.

Variable(s) That Have Been Entered
SepalWidth PetalLength PetalWidth

Multivariate Statistics
Statistic Value F Value Num DF Den DF Pr > F
Wilks' Lambda 0.024976 257.50 6 290 <.0001
Pillai's Trace 1.189914 71.49 6 292 <.0001
Average Squared Canonical Correlation 0.594957        


Output 85.1.6 Iris Data: Stepwise Selection Step 4
Fisher (1936) Iris Data

The STEPDISC Procedure
Stepwise Selection: Step 4

Statistics for Removal, DF = 2, 145
Variable Label Partial R-Square F Value Pr > F
SepalWidth Sepal Width (mm) 0.4295 54.58 <.0001
PetalLength Petal Length (mm) 0.3482 38.72 <.0001
PetalWidth Petal Width (mm) 0.3229 34.57 <.0001

No variables can be removed.

Statistics for Entry, DF = 2, 144
Variable Label Partial R-Square F Value Pr > F Tolerance
SepalLength Sepal Length (mm) 0.0615 4.72 0.0103 0.0320

Variable SepalLength will be entered.

All variables have been entered.

Multivariate Statistics
Statistic Value F Value Num DF Den DF Pr > F
Wilks' Lambda 0.023439 199.15 8 288 <.0001
Pillai's Trace 1.191899 53.47 8 290 <.0001
Average Squared Canonical Correlation 0.595949        

Since no more variables can be added to or removed from the model, the procedure stops at step 5 and displays a summary of the selection process.

Output 85.1.7 Iris Data: Stepwise Selection Step 5
Fisher (1936) Iris Data

The STEPDISC Procedure
Stepwise Selection: Step 5

Statistics for Removal, DF = 2, 144
Variable Label Partial R-Square F Value Pr > F
SepalLength Sepal Length (mm) 0.0615 4.72 0.0103
SepalWidth Sepal Width (mm) 0.2335 21.94 <.0001
PetalLength Petal Length (mm) 0.3308 35.59 <.0001
PetalWidth Petal Width (mm) 0.2570 24.90 <.0001

No variables can be removed.

No further steps are possible.

Output 85.1.8 Iris Data: Stepwise Selection Summary

Fisher (1936) Iris Data

The STEPDISC Procedure

Stepwise Selection Summary
Step Number In Entered Removed Label Partial R-Square F Value Pr > F Wilks' Lambda Pr < Lambda Average Squared Canonical Correlation Pr > ASCC
1 1 PetalLength   Petal Length (mm) 0.9414 1180.16 <.0001 0.05862828 <.0001 0.47068586 <.0001
2 2 SepalWidth   Sepal Width (mm) 0.3709 43.04 <.0001 0.03688411 <.0001 0.55995394 <.0001
3 3 PetalWidth   Petal Width (mm) 0.3229 34.57 <.0001 0.02497554 <.0001 0.59495691 <.0001
4 4 SepalLength   Sepal Length (mm) 0.0615 4.72 0.0103 0.02343863 <.0001 0.59594941 <.0001
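The Wilks' lambda column of the summary can be followed from one step to the next by using the partial R-square of the variable that entered. The following is a sketch of that relationship with the values from this example. The notation is introduced here: $\Lambda_k$ is Wilks' lambda after step $k$, $R^2_k$ is the partial R-square of the variable entered at step $k$, $V$ is Pillai's trace, and $g$ is the number of classes.

```latex
\begin{align*}
\Lambda_k &= \Lambda_{k-1}\,\bigl(1 - R^2_k\bigr), \qquad \Lambda_0 = 1, \\
\Lambda_1 &= 1 - 0.9414 \approx 0.0586, \\
\Lambda_2 &\approx 0.058628 \times (1 - 0.3709) \approx 0.0369, \\
\Lambda_3 &\approx 0.036884 \times (1 - 0.3229) \approx 0.0250, \\
\Lambda_4 &\approx 0.024976 \times (1 - 0.0615) \approx 0.0234, \\[4pt]
\text{ASCC} &= \frac{V}{g - 1} = \frac{1.191899}{3 - 1} \approx 0.5959 .
\end{align*}
```

Small differences in the last digit arise because the partial R-square values are displayed to only four decimal places.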


PROC STEPDISC automatically creates a list of the selected variables and stores it in a macro variable. You can submit the following statement to see the list of selected variables:

* print the macro variable list;
%put &_stdvar;

The macro variable _StdVar contains the following variable list:

   SepalLength SepalWidth PetalLength PetalWidth

You could use this macro variable if you want to analyze these variables in subsequent steps as follows:

proc discrim data=sashelp.iris;
   class Species;
   var &_stdvar;
run;

The results of this step are not shown.
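If the selection process enters no variables, the list can be empty, and a VAR statement with no variables would not restrict PROC DISCRIM to a selected subset. One way to guard against this is sketched here. The macro name discrim_selected and its parameters are hypothetical, and the sketch assumes that _StdVar already exists as a global macro variable, as it does after the %LET statement and PROC STEPDISC step shown earlier:

```sas
/* Hypothetical wrapper: run PROC DISCRIM only when PROC STEPDISC
   left a nonempty variable list in the _StdVar macro variable. */
%macro discrim_selected(data=sashelp.iris, class=Species);
   %if %length(&_stdvar) > 0 %then %do;
      proc discrim data=&data;
         class &class;
         var &_stdvar;
      run;
   %end;
   %else %do;
      %put NOTE: _StdVar is empty, so PROC DISCRIM was not run.;
   %end;
%mend discrim_selected;

%discrim_selected()
```

Resetting _StdVar to an empty value before PROC STEPDISC, as the %LET statement in this example does, keeps a list from an earlier run from being reused by mistake.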