The STEPDISC Procedure
The iris data published by Fisher (1936) have been widely used for examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length, and petal width are measured in millimeters on 50 iris specimens from each of three species: Iris setosa, I. versicolor, and I. virginica.
title 'Fisher (1936) Iris Data';
proc format;
   value specname
      1='Setosa '
      2='Versicolor'
      3='Virginica ';
run;
data iris;
   input SepalLength SepalWidth PetalLength PetalWidth Species @@;
   format Species specname.;
   label SepalLength='Sepal Length in mm.'
         SepalWidth ='Sepal Width in mm.'
         PetalLength='Petal Length in mm.'
         PetalWidth ='Petal Width in mm.';
   datalines;
50 33 14 02 1 64 28 56 22 3 65 28 46 15 2 67 31 56 24 3
63 28 51 15 3 46 34 14 03 1 69 31 51 23 3 62 22 45 15 2
59 32 48 18 2 46 36 10 02 1 61 30 46 14 2 60 27 51 16 2
... more lines ...
63 33 60 25 3 53 37 15 02 1
;
A stepwise discriminant analysis is performed by using stepwise selection.
In the PROC STEPDISC statement, the BSSCP and TSSCP options display the between-class SSCP matrix and the total-sample corrected SSCP matrix. By default, the significance level of an F test from an analysis of covariance is used as the selection criterion. The variable under consideration is the dependent variable, and the variables already chosen act as covariates. The following SAS statements produce Output 82.1.1 through Output 82.1.8:
%let _stdvar = ;
proc stepdisc data=iris bsscp tsscp;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
The Method for Selecting Variables is STEPWISE | |||
---|---|---|---|
Total Sample Size | 150 | Variable(s) in the Analysis | 4 |
Class Levels | 3 | Variable(s) Will Be Included | 0 |
Significance Level to Enter | 0.15 | ||
Significance Level to Stay | 0.15 |
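The settings reported in this table are the values the procedure used without being asked. As a minimal sketch, the same analysis could be requested with the selection method and both significance levels written out. The METHOD=, SLENTRY=, and SLSTAY= options are not part of the original example; they are shown here only to make those settings visible.

```sas
/* Sketch: the same analysis with the reported settings written explicitly. */
/* METHOD=, SLENTRY=, and SLSTAY= are not in the original example; their    */
/* values are taken from the summary table above.                           */
proc stepdisc data=iris method=stepwise slentry=0.15 slstay=0.15
              bsscp tsscp;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

Writing the levels out makes it easier to see which thresholds the entry and removal tests in the later steps are compared against.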
Fisher (1936) Iris Data |
Between-Class SSCP Matrix | |||||
---|---|---|---|---|---|
Variable | Label | SepalLength | SepalWidth | PetalLength | PetalWidth |
SepalLength | Sepal Length in mm. | 6321.21333 | -1995.26667 | 16524.84000 | 7127.93333 |
SepalWidth | Sepal Width in mm. | -1995.26667 | 1134.49333 | -5723.96000 | -2293.26667 |
PetalLength | Petal Length in mm. | 16524.84000 | -5723.96000 | 43710.28000 | 18677.40000 |
PetalWidth | Petal Width in mm. | 7127.93333 | -2293.26667 | 18677.40000 | 8041.33333 |
Total-Sample SSCP Matrix | |||||
---|---|---|---|---|---|
Variable | Label | SepalLength | SepalWidth | PetalLength | PetalWidth |
SepalLength | Sepal Length in mm. | 10216.83333 | -632.26667 | 18987.30000 | 7692.43333 |
SepalWidth | Sepal Width in mm. | -632.26667 | 2830.69333 | -4911.88000 | -1812.42667 |
PetalLength | Petal Length in mm. | 18987.30000 | -4911.88000 | 46432.54000 | 19304.58000 |
PetalWidth | Petal Width in mm. | 7692.43333 | -1812.42667 | 19304.58000 | 8656.99333 |
In step 1, the tolerance is 1.0 for each variable under consideration because no variables have yet entered the model. The variable PetalLength is selected because its F statistic, 1180.161, is the largest among all variables.
Statistics for Entry, DF = 2, 147 | |||||
---|---|---|---|---|---|
Variable | Label | R-Square | F Value | Pr > F | Tolerance |
SepalLength | Sepal Length in mm. | 0.6187 | 119.26 | <.0001 | 1.0000 |
SepalWidth | Sepal Width in mm. | 0.4008 | 49.16 | <.0001 | 1.0000 |
PetalLength | Petal Length in mm. | 0.9414 | 1180.16 | <.0001 | 1.0000 |
PetalWidth | Petal Width in mm. | 0.9289 | 960.01 | <.0001 | 1.0000 |
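The step 1 values in the entry table above can be checked by hand against the two SSCP matrices. With no covariates in the model, the R-Square for a variable matches the ratio of its diagonal entry in the between-class SSCP matrix to its diagonal entry in the total-sample SSCP matrix, and the F value follows from the R-Square and the degrees of freedom (2 and 147). The symbols B_jj, T_jj, n (sample size, 150), and g (number of classes, 3) are notation introduced here for illustration. For PetalLength:

```latex
\[
R^2 = \frac{B_{jj}}{T_{jj}} = \frac{43710.28}{46432.54} \approx 0.9414,
\qquad
F = \frac{R^2}{1 - R^2}\cdot\frac{n - g}{g - 1}
  = \frac{43710.28}{46432.54 - 43710.28}\cdot\frac{147}{2}
  \approx 1180.16 .
\]
```

The same ratio for SepalLength, 6321.21333 / 10216.83333, gives approximately 0.6187, which agrees with the first row of the table.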
Multivariate Statistics | |||||
---|---|---|---|---|---|
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.058628 | 1180.16 | 2 | 147 | <.0001 |
Pillai's Trace | 0.941372 | 1180.16 | 2 | 147 | <.0001 |
Average Squared Canonical Correlation | 0.470686 |
In step 2, with the variable PetalLength already in the model, PetalLength is tested for removal before a new variable is selected for entry. Since PetalLength meets the criterion to stay, it is used as a covariate in the analysis of covariance for variable selection. The variable SepalWidth is selected because its F statistic, 43.035, is the largest among all variables not in the model and because its associated tolerance, 0.8164, meets the criterion to enter. The process is repeated in steps 3 and 4. The variable PetalWidth is entered in step 3, and the variable SepalLength is entered in step 4.
Statistics for Removal, DF = 2, 147 | ||||
---|---|---|---|---|
Variable | Label | R-Square | F Value | Pr > F |
PetalLength | Petal Length in mm. | 0.9414 | 1180.16 | <.0001 |
Statistics for Entry, DF = 2, 146 | |||||
---|---|---|---|---|---|
Variable | Label | Partial R-Square | F Value | Pr > F | Tolerance |
SepalLength | Sepal Length in mm. | 0.3198 | 34.32 | <.0001 | 0.2400 |
SepalWidth | Sepal Width in mm. | 0.3709 | 43.04 | <.0001 | 0.8164 |
PetalWidth | Petal Width in mm. | 0.2533 | 24.77 | <.0001 | 0.0729 |
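The analysis of covariance behind a step 2 entry test can be sketched with PROC GLM, which is not used in the original example. In this sketch the candidate variable SepalWidth is the dependent variable, PetalLength (already in the model) is the covariate, and Species is the class effect. Under that reading, the Type III F test for Species, which has 2 and 146 degrees of freedom, is expected to correspond to the F Value reported for SepalWidth in the entry table above. The step has not been run here, so its output is not shown.

```sas
/* Sketch: ANCOVA corresponding to the step 2 entry test for SepalWidth.   */
/* PROC GLM is an illustration only; it is not part of the original        */
/* example. PetalLength is the covariate; Species is the class effect.     */
proc glm data=iris;
   class Species;
   model SepalWidth = PetalLength Species / ss3;  /* Type III tests only */
run;
quit;
```

Replacing SepalWidth with SepalLength or PetalWidth in the MODEL statement gives the analogous test for the other two candidates.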
Multivariate Statistics | |||||
---|---|---|---|---|---|
Statistic | Value | F Value | Num DF | Den DF | Pr > F |
Wilks' Lambda | 0.036884 | 307.10 | 4 | 292 | <.0001 |
Pillai's Trace | 1.119908 | 93.53 | 4 | 294 | <.0001 |
Average Squared Canonical Correlation | 0.559954 |
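The multivariate statistics at step 2 are tied to the step 1 values and to the partial R-Square of the variable just entered. The relations below are a worked check rather than a statement of how the procedure computes them. The subscripts on Wilks' lambda and the symbol g for the number of classes are notation introduced here.

```latex
\[
\Lambda_2 = \Lambda_1 \,\bigl(1 - R^2_{\text{partial}}\bigr)
          = 0.058628 \times (1 - 0.3709) \approx 0.03688,
\qquad
\text{ASCC} = \frac{\text{Pillai's trace}}{g - 1}
            = \frac{1.119908}{3 - 1} = 0.559954 .
\]
```

Both results agree with the table to the precision shown. The same division applied to the step 1 Pillai's trace, 0.941372 / 2, gives 0.470686.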
Statistics for Removal, DF = 2, 146 | ||||
---|---|---|---|---|
Variable | Label | Partial R-Square | F Value | Pr > F |
SepalWidth | Sepal Width in mm. | 0.3709 | 43.04 | <.0001 |
PetalLength | Petal Length in mm. | 0.9384 | 1112.95 | <.0001 |
Statistics for Removal, DF = 2, 145 | ||||
---|---|---|---|---|
Variable | Label | Partial R-Square | F Value | Pr > F |
SepalWidth | Sepal Width in mm. | 0.4295 | 54.58 | <.0001 |
PetalLength | Petal Length in mm. | 0.3482 | 38.72 | <.0001 |
PetalWidth | Petal Width in mm. | 0.3229 | 34.57 | <.0001 |
Since no more variables can be added to or removed from the model, the procedure stops at step 5 and displays a summary of the selection process.
Statistics for Removal, DF = 2, 144 | ||||
---|---|---|---|---|
Variable | Label | Partial R-Square | F Value | Pr > F |
SepalLength | Sepal Length in mm. | 0.0615 | 4.72 | 0.0103 |
SepalWidth | Sepal Width in mm. | 0.2335 | 21.94 | <.0001 |
PetalLength | Petal Length in mm. | 0.3308 | 35.59 | <.0001 |
PetalWidth | Petal Width in mm. | 0.2570 | 24.90 | <.0001 |
Stepwise Selection Summary | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Step | Number In | Entered | Removed | Label | Partial R-Square | F Value | Pr > F | Wilks' Lambda | Pr < Lambda | Average Squared Canonical Correlation | Pr > ASCC |
1 | 1 | PetalLength | | Petal Length in mm. | 0.9414 | 1180.16 | <.0001 | 0.05862828 | <.0001 | 0.47068586 | <.0001 |
2 | 2 | SepalWidth | | Sepal Width in mm. | 0.3709 | 43.04 | <.0001 | 0.03688411 | <.0001 | 0.55995394 | <.0001 |
3 | 3 | PetalWidth | | Petal Width in mm. | 0.3229 | 34.57 | <.0001 | 0.02497554 | <.0001 | 0.59495691 | <.0001 |
4 | 4 | SepalLength | | Sepal Length in mm. | 0.0615 | 4.72 | 0.0103 | 0.02343863 | <.0001 | 0.59594941 | <.0001 |
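The summary shows that SepalLength enters at step 4 with Pr > F = 0.0103, which is below the 0.15 level used in this example. As a sketch of how the significance levels control the result, the following variation tightens both levels to 0.01. The SLENTRY= and SLSTAY= values are assumptions chosen for illustration and are not part of the original example. Because 0.0103 exceeds 0.01, SepalLength would be expected not to enter, leaving a three-variable model. The step has not been run here.

```sas
/* Sketch: stricter entry and stay levels (illustrative values only).    */
/* With both levels at 0.01, the step 4 entry test for SepalLength       */
/* (Pr > F = 0.0103 in the summary above) would no longer pass.          */
proc stepdisc data=iris slentry=0.01 slstay=0.01;
   class Species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```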
PROC STEPDISC automatically creates a list of the selected variables and stores it in a macro variable. You can submit the following statement to see the list of selected variables:
* print the macro variable list;
%put &_stdvar;
The macro variable _StdVar contains the following variable list:
SepalLength SepalWidth PetalLength PetalWidth
You could use this macro variable if you want to analyze these variables in subsequent steps as follows:
proc discrim data=iris;
   class Species;
   var &_stdvar;
run;
The results of this step are not shown.
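If a selection run leaves the macro variable empty (as it is after the `%let _stdvar = ;` initialization, before PROC STEPDISC fills it), a VAR statement with no variables does not describe the intended analysis. The following sketch guards against that case. The macro name discrim_selected is hypothetical and is not part of the original example.

```sas
/* Sketch: run PROC DISCRIM only when the selected-variable list is       */
/* non-empty. The macro name discrim_selected is hypothetical.            */
%macro discrim_selected;
   %if %length(&_stdvar) > 0 %then %do;
      proc discrim data=iris;
         class Species;
         var &_stdvar;
      run;
   %end;
   %else %do;
      %put NOTE: No variables were selected by PROC STEPDISC.;
   %end;
%mend discrim_selected;

%discrim_selected
```

Wrapping the step in a macro keeps the %IF logic valid in SAS releases that do not allow %IF in open code.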
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.