The iris data published by Fisher (1936) have been widely used for examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length,
and petal width are measured in millimeters on 50 iris specimens from each of three species: Iris setosa, I. versicolor, and I. virginica. The iris data set is available from the Sashelp library.
This example performs a stepwise discriminant analysis. Because no METHOD= option is specified, PROC STEPDISC uses its default method, stepwise selection.
In the PROC STEPDISC statement, the BSSCP and TSSCP options display the between-class SSCP matrix and the total-sample corrected SSCP matrix. By default, the significance level of an F test from an analysis of covariance is used as the selection criterion. The variable under consideration is the dependent variable, and the variables already chosen act as covariates. The following SAS statements produce Output 96.1.1 through Output 96.1.8:
    title 'Fisher (1936) Iris Data';
    %let _stdvar = ;
    proc stepdisc data=sashelp.iris bsscp tsscp;
       class Species;
       var SepalLength SepalWidth PetalLength PetalWidth;
    run;
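The call above relies on default settings for the selection criterion. As a sketch, the same analysis can be written with those settings spelled out. This assumes that METHOD=STEPWISE, SLENTRY=0.15, and SLSTAY=0.15 are the defaults in your SAS/STAT release, which you should confirm before relying on it:

```sas
/* Same analysis with the selection settings written out explicitly.   */
/* Assumption: METHOD=STEPWISE, SLENTRY=0.15, and SLSTAY=0.15 match    */
/* the defaults of the installed SAS/STAT release.                     */
proc stepdisc data=sashelp.iris
              method=stepwise
              slentry=0.15
              slstay=0.15
              bsscp tsscp;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

Writing the significance levels explicitly makes it easier to see which thresholds govern entry and removal when you later tighten or relax them.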
Output 96.1.2: Iris Data: Between-Class and Total-Sample SSCP Matrices
| Fisher (1936) Iris Data |
| Between-Class SSCP Matrix | |||||
|---|---|---|---|---|---|
| Variable | Label | SepalLength | SepalWidth | PetalLength | PetalWidth |
| SepalLength | Sepal Length (mm) | 6321.21333 | -1995.26667 | 16524.84000 | 7127.93333 |
| SepalWidth | Sepal Width (mm) | -1995.26667 | 1134.49333 | -5723.96000 | -2293.26667 |
| PetalLength | Petal Length (mm) | 16524.84000 | -5723.96000 | 43710.28000 | 18677.40000 |
| PetalWidth | Petal Width (mm) | 7127.93333 | -2293.26667 | 18677.40000 | 8041.33333 |
| Total-Sample SSCP Matrix | |||||
|---|---|---|---|---|---|
| Variable | Label | SepalLength | SepalWidth | PetalLength | PetalWidth |
| SepalLength | Sepal Length (mm) | 10216.83333 | -632.26667 | 18987.30000 | 7692.43333 |
| SepalWidth | Sepal Width (mm) | -632.26667 | 2830.69333 | -4911.88000 | -1812.42667 |
| PetalLength | Petal Length (mm) | 18987.30000 | -4911.88000 | 46432.54000 | 19304.58000 |
| PetalWidth | Petal Width (mm) | 7692.43333 | -1812.42667 | 19304.58000 | 8656.99333 |
In step 1, the tolerance is 1.0 for each variable under consideration because no variables have yet entered the model. The
variable PetalLength is selected because its F statistic, 1180.161, is the largest among all variables.
Output 96.1.3: Iris Data: Stepwise Selection Step 1
| Fisher (1936) Iris Data |
| Statistics for Entry, DF = 2, 147 | |||||
|---|---|---|---|---|---|
| Variable | Label | R-Square | F Value | Pr > F | Tolerance |
| SepalLength | Sepal Length (mm) | 0.6187 | 119.26 | <.0001 | 1.0000 |
| SepalWidth | Sepal Width (mm) | 0.4008 | 49.16 | <.0001 | 1.0000 |
| PetalLength | Petal Length (mm) | 0.9414 | 1180.16 | <.0001 | 1.0000 |
| PetalWidth | Petal Width (mm) | 0.9289 | 960.01 | <.0001 | 1.0000 |
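As a check on the step 1 statistics, each R-Square value can be recovered from the SSCP matrices, assuming the usual one-way analysis of variance relationships (the output itself does not state this derivation). R-Square is the ratio of the corresponding diagonal entries of the between-class and total-sample SSCP matrices, written here as $b_{jj}$ and $t_{jj}$. The F value follows from R-Square and the degrees of freedom in the table heading. For PetalLength:

$$
R^2 = \frac{b_{jj}}{t_{jj}} = \frac{43710.28}{46432.54} \approx 0.9414,
\qquad
F = \frac{R^2 / 2}{(1 - R^2) / 147} = \frac{43710.28 / 2}{2722.26 / 147} \approx 1180.16
$$

The same ratio gives $6321.21333 / 10216.83333 \approx 0.6187$ for SepalLength, which agrees with the table.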
In step 2, with the variable PetalLength already in the model, PetalLength is tested for removal before a new variable is selected for entry. Since PetalLength meets the criterion to stay, it is used as a covariate in the analysis of covariance for variable selection. The variable
SepalWidth is selected because its F statistic, 43.035, is the largest among all variables not in the model and because its associated tolerance, 0.8164, meets
the criterion to enter. The process is repeated in steps 3 and 4. The variable PetalWidth is entered in step 3, and the variable SepalLength is entered in step 4.
Output 96.1.6: Iris Data: Stepwise Selection Step 4
| Fisher (1936) Iris Data |
| Statistics for Removal, DF = 2, 145 | ||||
|---|---|---|---|---|
| Variable | Label | Partial R-Square | F Value | Pr > F |
| SepalWidth | Sepal Width (mm) | 0.4295 | 54.58 | <.0001 |
| PetalLength | Petal Length (mm) | 0.3482 | 38.72 | <.0001 |
| PetalWidth | Petal Width (mm) | 0.3229 | 34.57 | <.0001 |
Since no more variables can be added to or removed from the model, the procedure stops at step 5 and displays a summary of the selection process.
Output 96.1.7: Iris Data: Stepwise Selection Step 5
| Fisher (1936) Iris Data |
| Statistics for Removal, DF = 2, 144 | ||||
|---|---|---|---|---|
| Variable | Label | Partial R-Square | F Value | Pr > F |
| SepalLength | Sepal Length (mm) | 0.0615 | 4.72 | 0.0103 |
| SepalWidth | Sepal Width (mm) | 0.2335 | 21.94 | <.0001 |
| PetalLength | Petal Length (mm) | 0.3308 | 35.59 | <.0001 |
| PetalWidth | Petal Width (mm) | 0.2570 | 24.90 | <.0001 |
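The F values in the removal tables are consistent with the standard relationship between a partial R-square and its F statistic, using the degrees of freedom shown in each table heading. The following is a worked check, not a statement of how the procedure computes the values internally. Small discrepancies are expected because the partial R-square values are displayed to only four decimals:

$$
F = \frac{R^2_{\text{partial}} / \mathrm{df}_1}{(1 - R^2_{\text{partial}}) / \mathrm{df}_2}
$$

$$
\text{Step 4, SepalWidth: } \frac{0.4295 / 2}{0.5705 / 145} \approx 54.58,
\qquad
\text{Step 5, SepalLength: } \frac{0.0615 / 2}{0.9385 / 144} \approx 4.72
$$

The denominator degrees of freedom drop from 147 at step 1 to 145 and 144 here because each variable that is already in the model and acts as a covariate uses one degree of freedom.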
Output 96.1.8: Iris Data: Stepwise Selection Summary
| Fisher (1936) Iris Data |
| Stepwise Selection Summary | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Step | Number In | Entered | Removed | Label | Partial R-Square | F Value | Pr > F | Wilks' Lambda | Pr < Lambda | Average Squared Canonical Correlation | Pr > ASCC |
| 1 | 1 | PetalLength | | Petal Length (mm) | 0.9414 | 1180.16 | <.0001 | 0.05862828 | <.0001 | 0.47068586 | <.0001 |
| 2 | 2 | SepalWidth | | Sepal Width (mm) | 0.3709 | 43.04 | <.0001 | 0.03688411 | <.0001 | 0.55995394 | <.0001 |
| 3 | 3 | PetalWidth | | Petal Width (mm) | 0.3229 | 34.57 | <.0001 | 0.02497554 | <.0001 | 0.59495691 | <.0001 |
| 4 | 4 | SepalLength | | Sepal Length (mm) | 0.0615 | 4.72 | 0.0103 | 0.02343863 | <.0001 | 0.59594941 | <.0001 |
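The Wilks' lambda column in the summary can be reproduced from the partial R-square column. Under the usual factorization of Wilks' lambda, each entered variable multiplies the previous value by one minus its partial R-square. The notation $\Lambda_k$ for the value after step $k$ is introduced here for illustration:

$$
\Lambda_k = \Lambda_{k-1}\,\bigl(1 - R^2_{\text{partial},k}\bigr), \qquad \Lambda_0 = 1
$$

$$
\begin{aligned}
\Lambda_1 &= 1 - 0.9414 \approx 0.0586 \\
\Lambda_2 &= 0.05862828 \times (1 - 0.3709) \approx 0.03688 \\
\Lambda_3 &= 0.03688411 \times (1 - 0.3229) \approx 0.02497 \\
\Lambda_4 &= 0.02497554 \times (1 - 0.0615) \approx 0.02344
\end{aligned}
$$

These agree with the tabulated values to the precision that the rounded partial R-square values allow. At step 1, where only one variable is in the model, the average squared canonical correlation equals $(1 - \Lambda_1)/(g - 1)$ with $g = 3$ classes: $(1 - 0.05862828)/2 = 0.47068586$.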
PROC STEPDISC automatically creates a list of the selected variables and stores it in a macro variable. You can submit the following statement to see the list of selected variables:
    * print the macro variable list;
    %put &_stdvar;
The macro variable _StdVar contains the following variable list:
SepalLength SepalWidth PetalLength PetalWidth
You can use this macro variable to analyze the selected variables in subsequent steps, as follows:
    proc discrim data=sashelp.iris;
       class Species;
       var &_stdvar;
    run;
The results of this step are not shown.
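If a selection run might end with no variables chosen, one way to guard the follow-up step is to wrap it in a small macro that checks whether the list is empty. The sketch below is illustrative only. The macro name discrim_selected and its parameters are hypothetical, and it assumes that _StdVar exists as a global macro variable, as it does after the %LET statement and the PROC STEPDISC step in this example:

```sas
/* Hypothetical helper: run PROC DISCRIM only when _StdVar is non-empty. */
%macro discrim_selected(data=, class=);
   /* %LENGTH returns 0 when the macro variable holds no text */
   %if %length(&_stdvar) = 0 %then %do;
      %put NOTE: _StdVar is empty, so PROC DISCRIM is skipped.;
      %return;
   %end;

   proc discrim data=&data;
      class &class;
      var &_stdvar;   /* variables selected by PROC STEPDISC */
   run;
%mend discrim_selected;

%discrim_selected(data=sashelp.iris, class=Species)
```

Initializing the macro variable with %LET before the selection step, as this example does, ensures that the check sees an empty value instead of an unresolved reference if no list is stored.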