Example 36.1 Fisher’s Iris Data
The iris data published by Fisher (1936) have been widely used for examples in discriminant analysis and cluster analysis. The sepal length, sepal width, petal length, and petal width are measured in millimeters on 50 iris specimens from each of three species: Iris setosa, I. versicolor, and I. virginica. Mezzich and Solomon (1980) discuss a variety of cluster analyses of the iris data.
In this example, the FASTCLUS procedure is used to find two and then three clusters. In the following code, each PROC FASTCLUS step creates an output data set (OUT=Clus), and PROC FREQ is invoked to compare the clusters with the species classification. See Output 36.1.1 and Output 36.1.2 for these results.
For three clusters, you can use the CANDISC procedure to compute canonical variables for plotting the clusters. See Output 36.1.3 and Output 36.1.4 for the results.
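title 'Fisher (1936) Iris Data';

/* Two clusters: OUT=clus adds the variables CLUSTER and DISTANCE */
proc fastclus data=sashelp.iris maxc=2 maxiter=10 out=clus;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Cross-tabulate cluster membership against species */
proc freq data=clus;
   tables cluster*species;
run;

/* Three clusters: the OUT= data set clus is overwritten */
proc fastclus data=sashelp.iris maxc=3 maxiter=10 out=clus;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

proc freq data=clus;
   tables cluster*species;
run;

/* Canonical variables (Can1, Can2) that separate the three clusters */
proc candisc data=clus anova out=can;
   class cluster;
   var SepalLength SepalWidth PetalLength PetalWidth;
   title2 'Canonical Discriminant Analysis of Iris Clusters';
run;

/* Plot the clusters in the space of the first two canonical variables */
proc sgplot data=can;
   scatter y=Can2 x=Can1 / group=cluster;
   title2 'Plot of Canonical Variables Identified by Cluster';
run;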
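As a cross-check on the three-cluster solution, the per-cluster means and standard deviations can be recomputed from the OUT= data set. The following is a minimal sketch, assuming it runs after the second PROC FASTCLUS step so that the data set clus holds the three-cluster assignment; the choice of statistics and the MAXDEC= setting are illustrative and not part of the original example.

```sas
/* Sketch: recompute cluster means and standard deviations from OUT=clus. */
/* Assumes clus was created by the MAXC=3 PROC FASTCLUS step.             */
proc means data=clus n mean std maxdec=4;
   class cluster;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

The resulting means and standard deviations should agree, up to rounding, with the cluster means and cluster standard deviations reported by PROC FASTCLUS.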
Output 36.1.1: Fisher’s Iris Data: PROC FASTCLUS with MAXC=2 and PROC FREQ
The FASTCLUS Procedure
Replace=FULL Radius=0 Maxclusters=2 Maxiter=10 Converge=0.02
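Initial Seeds

| Cluster | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| 1 | 77.00000000 | 26.00000000 | 69.00000000 | 23.00000000 |
| 2 | 45.00000000 | 23.00000000 | 13.00000000 | 3.00000000 |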
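Iteration History

| Iteration | Criterion | Relative Change in Seed, Cluster 1 | Relative Change in Seed, Cluster 2 |
|---|---|---|---|
| 1 | 11.0045 | 0.3169 | 0.2164 |
| 2 | 5.6161 | 0.0379 | 0.0791 |
| 3 | 5.1042 | 0.0133 | 0.0306 |
| 4 | 5.0417 | 0.00348 | 0.00679 |

Convergence criterion is satisfied.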
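Cluster Summary

| Cluster | Frequency | RMS Std Deviation | Maximum Distance from Seed to Observation | Radius Exceeded | Nearest Cluster | Distance Between Cluster Centroids |
|---|---|---|---|---|---|---|
| 1 | 97 | 5.6779 | 24.8448 | | 2 | 39.2879 |
| 2 | 53 | 3.7050 | 21.6197 | | 1 | 39.2879 |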
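Statistics for Variables

| Variable | Total STD | Within STD | R-Square | RSQ/(1-RSQ) |
|---|---|---|---|---|
| SepalLength | 8.28066 | 5.49313 | 0.562896 | 1.287784 |
| SepalWidth | 4.35866 | 3.70393 | 0.282710 | 0.394137 |
| PetalLength | 17.65298 | 6.80331 | 0.852470 | 5.778291 |
| PetalWidth | 7.62238 | 3.57200 | 0.781868 | 3.584390 |
| OVER-ALL | 10.69224 | 5.07291 | 0.776410 | 3.472463 |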
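The R-square and ratio values for each variable can be checked by hand. The following is a worked sketch for sepal length in the two-cluster solution, assuming n = 150 observations and c = 2 clusters, with the total standard deviation based on n - 1 degrees of freedom and the within-cluster standard deviation on n - c degrees of freedom. The symbols s_T, s_W, n, and c are notation introduced here and do not appear in the output.

```latex
\[
R^2 = 1 - \frac{(n-c)\,s_W^2}{(n-1)\,s_T^2}
    = 1 - \frac{148 \times 5.49313^2}{149 \times 8.28066^2}
    \approx 0.5629,
\qquad
\frac{R^2}{1-R^2} = \frac{0.562896}{1 - 0.562896} \approx 1.2878 .
\]
```

Both results agree with the reported values 0.562896 and 1.287784 for the first variable.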
WARNING: The two values above are invalid for correlated variables.
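Cluster Means

| Cluster | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| 1 | 63.01030928 | 28.86597938 | 49.58762887 | 16.95876289 |
| 2 | 50.05660377 | 33.69811321 | 15.60377358 | 2.90566038 |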
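Cluster Standard Deviations

| Cluster | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| 1 | 6.336887455 | 3.267991438 | 7.800577673 | 4.155612484 |
| 2 | 3.427350930 | 4.396611045 | 4.404279486 | 2.105525249 |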
Output 36.1.2: Fisher’s Iris Data: PROC FASTCLUS with MAXC=3 and PROC FREQ
The FASTCLUS Procedure
Replace=FULL Radius=0 Maxclusters=3 Maxiter=10 Converge=0.02
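Initial Seeds

| Cluster | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| 1 | 77.00000000 | 38.00000000 | 67.00000000 | 22.00000000 |
| 2 | 57.00000000 | 44.00000000 | 15.00000000 | 4.00000000 |
| 3 | 49.00000000 | 25.00000000 | 45.00000000 | 17.00000000 |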
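Iteration History

| Iteration | Criterion | Relative Change in Seed, Cluster 1 | Relative Change in Seed, Cluster 2 | Relative Change in Seed, Cluster 3 |
|---|---|---|---|---|
| 1 | 7.0151 | 0.3205 | 0.3151 | 0.2985 |
| 2 | 3.7097 | 0.0459 | 0 | 0.0317 |
| 3 | 3.6427 | 0.0182 | 0 | 0.0124 |

Convergence criterion is satisfied.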
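Cluster Summary

| Cluster | Frequency | RMS Std Deviation | Maximum Distance from Seed to Observation | Radius Exceeded | Nearest Cluster | Distance Between Cluster Centroids |
|---|---|---|---|---|---|---|
| 1 | 38 | 4.0168 | 14.9736 | | 3 | 17.9718 |
| 2 | 50 | 2.7803 | 12.4803 | | 3 | 33.5693 |
| 3 | 62 | 4.0398 | 16.9272 | | 1 | 17.9718 |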
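Statistics for Variables

| Variable | Total STD | Within STD | R-Square | RSQ/(1-RSQ) |
|---|---|---|---|---|
| SepalLength | 8.28066 | 4.39488 | 0.722096 | 2.598359 |
| SepalWidth | 4.35866 | 3.24816 | 0.452102 | 0.825156 |
| PetalLength | 17.65298 | 4.21431 | 0.943773 | 16.784895 |
| PetalWidth | 7.62238 | 2.45244 | 0.897872 | 8.791618 |
| OVER-ALL | 10.69224 | 3.66198 | 0.884275 | 7.641194 |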
WARNING: The two values above are invalid for correlated variables.
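Cluster Means

| Cluster | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| 1 | 68.50000000 | 30.73684211 | 57.42105263 | 20.71052632 |
| 2 | 50.06000000 | 34.28000000 | 14.62000000 | 2.46000000 |
| 3 | 59.01612903 | 27.48387097 | 43.93548387 | 14.33870968 |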
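Cluster Standard Deviations

| Cluster | SepalLength | SepalWidth | PetalLength | PetalWidth |
|---|---|---|---|---|
| 1 | 4.941550255 | 2.900924461 | 4.885895746 | 2.798724562 |
| 2 | 3.524896872 | 3.790643691 | 1.736639965 | 1.053855894 |
| 3 | 4.664100551 | 2.962840548 | 5.088949673 | 2.974997167 |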
Output 36.1.3: Fisher’s Iris Data using PROC CANDISC
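Class Level Information

| Cluster | Variable Name | Frequency | Weight | Proportion |
|---|---|---|---|---|
| 1 | _1 | 38 | 38.0000 | 0.253333 |
| 2 | _2 | 50 | 50.0000 | 0.333333 |
| 3 | _3 | 62 | 62.0000 | 0.413333 |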
The CANDISC Procedure
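Univariate Test Statistics

| Label | Total Standard Deviation | Pooled Standard Deviation | Between Standard Deviation | R-Square | R-Square / (1-RSq) | F Value | Pr > F |
|---|---|---|---|---|---|---|---|
| Sepal Length (mm) | 8.2807 | 4.3949 | 8.5893 | 0.7221 | 2.5984 | 190.98 | <.0001 |
| Sepal Width (mm) | 4.3587 | 3.2482 | 3.5774 | 0.4521 | 0.8252 | 60.65 | <.0001 |
| Petal Length (mm) | 17.6530 | 4.2143 | 20.9336 | 0.9438 | 16.7849 | 1233.69 | <.0001 |
| Petal Width (mm) | 7.6224 | 2.4524 | 8.8164 | 0.8979 | 8.7916 | 646.18 | <.0001 |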
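The univariate F values follow from the R-square ratios. The following is a worked sketch for sepal length, assuming n = 150 observations and c = 3 clusters, so that the numerator and denominator degrees of freedom are c - 1 = 2 and n - c = 147; the symbols n and c are notation introduced here.

```latex
\[
F = \frac{R^2}{1-R^2}\cdot\frac{n-c}{c-1}
  = 2.5984 \times \frac{147}{2}
  \approx 190.98 .
\]
```

The same calculation gives about 60.65 for sepal width (0.8252 × 73.5) and about 1233.69 for petal length (16.7849 × 73.5), matching the reported F values.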
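Multivariate Statistics and F Approximations

| Statistic | Value | F Value | Num DF | Den DF | Pr > F |
|---|---|---|---|---|---|
| Wilks' Lambda | 0.03222337 | 164.55 | 8 | 288 | <.0001 |
| Pillai's Trace | 1.25669612 | 61.29 | 8 | 290 | <.0001 |
| Hotelling-Lawley Trace | 21.06722883 | 377.66 | 8 | 203.4 | <.0001 |
| Roy's Greatest Root | 20.63266809 | 747.93 | 4 | 145 | <.0001 |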
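Canonical Correlations and Eigenvalues of Inv(E)*H = CanRsq/(1-CanRsq)

| | Canonical Correlation | Adjusted Canonical Correlation | Approximate Standard Error | Squared Canonical Correlation | Eigenvalue | Difference | Proportion | Cumulative |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.976613 | 0.976123 | 0.003787 | 0.953774 | 20.6327 | 20.1981 | 0.9794 | 0.9794 |
| 2 | 0.550384 | 0.543354 | 0.057107 | 0.302923 | 0.4346 | | 0.0206 | 1.0000 |

Test of H0: The canonical correlations in the current row and all that follow are zero

| | Likelihood Ratio | Approximate F Value | Num DF | Den DF | Pr > F |
|---|---|---|---|---|---|
| 1 | 0.03222337 | 164.55 | 8 | 288 | <.0001 |
| 2 | 0.69707749 | 21.00 | 3 | 145 | <.0001 |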
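The eigenvalues, likelihood ratios, and multivariate statistics are all functions of the two squared canonical correlations, 0.953774 and 0.302923. The following worked sketch shows the relationships; the symbols r_i, lambda_i, Lambda, and V are notation introduced here, and small differences in the last digits come from rounding the squared correlations.

```latex
\[
\lambda_i = \frac{r_i^2}{1-r_i^2}:
\qquad
\lambda_1 = \frac{0.953774}{0.046226} \approx 20.63,
\qquad
\lambda_2 = \frac{0.302923}{0.697077} \approx 0.4346
\]
\[
\Lambda = \prod_{i=1}^{2} (1 - r_i^2) = 0.046226 \times 0.697077 \approx 0.03222
\qquad \text{(Wilks' lambda, the likelihood ratio in row 1)}
\]
\[
V = \sum_{i=1}^{2} r_i^2 = 0.953774 + 0.302923 \approx 1.2567
\qquad \text{(Pillai's trace)}
\]
\[
\sum_{i=1}^{2} \lambda_i = 20.6327 + 0.4346 = 21.0673
\qquad \text{(Hotelling-Lawley trace)}
\]
```

The likelihood ratio in the second row, 0.69707749, is 1 minus the second squared canonical correlation, and the proportion 0.9794 is the first eigenvalue divided by the sum of the eigenvalues (20.6327 / 21.0673).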
Total Canonical Structure

| Label | Can1 | Can2 |
|---|---|---|
| Sepal Length (mm) | 0.831965 | 0.452137 |
| Sepal Width (mm) | -0.515082 | 0.810630 |
| Petal Length (mm) | 0.993520 | 0.087514 |
| Petal Width (mm) | 0.966325 | 0.154745 |

Between Canonical Structure

| Label | Can1 | Can2 |
|---|---|---|
| Sepal Length (mm) | 0.956160 | 0.292846 |
| Sepal Width (mm) | -0.748136 | 0.663545 |
| Petal Length (mm) | 0.998770 | 0.049580 |
| Petal Width (mm) | 0.995952 | 0.089883 |

Pooled Within Canonical Structure

| Label | Can1 | Can2 |
|---|---|---|
| Sepal Length (mm) | 0.339314 | 0.716082 |
| Sepal Width (mm) | -0.149614 | 0.914351 |
| Petal Length (mm) | 0.900839 | 0.308136 |
| Petal Width (mm) | 0.650123 | 0.404282 |
Total-Sample Standardized Canonical Coefficients

| Label | Can1 | Can2 |
|---|---|---|
| Sepal Length (mm) | 0.047747341 | 1.021487262 |
| Sepal Width (mm) | -0.577569244 | 0.864455153 |
| Petal Length (mm) | 3.341309573 | -1.283043758 |
| Petal Width (mm) | 0.996451144 | 0.900476563 |

Pooled Within-Class Standardized Canonical Coefficients

| Label | Can1 | Can2 |
|---|---|---|
| Sepal Length (mm) | 0.0253414487 | 0.5421446856 |
| Sepal Width (mm) | -0.4304161258 | 0.6442092294 |
| Petal Length (mm) | 0.7976741592 | -0.3063023132 |
| Petal Width (mm) | 0.3205998034 | 0.2897207865 |

Raw Canonical Coefficients

| Label | Can1 | Can2 |
|---|---|---|
| Sepal Length (mm) | 0.0057661265 | 0.1233581748 |
| Sepal Width (mm) | -0.1325106494 | 0.1983303556 |
| Petal Length (mm) | 0.1892773419 | -0.0726814163 |
| Petal Width (mm) | 0.1307270927 | 0.1181359305 |
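The three sets of coefficients differ only in scaling. The following worked sketch, for the Can1 coefficient of sepal length, shows that the reported values are consistent with dividing the total-sample standardized coefficient by the total standard deviation (8.2807) to obtain the raw coefficient, and multiplying the raw coefficient by the pooled within-class standard deviation (4.3949) to obtain the pooled within-class standardized coefficient. The symbols b, s_T, and s_W are notation introduced here.

```latex
\[
b_{\text{raw}} = \frac{b_{\text{total}}}{s_T}
             = \frac{0.047747341}{8.2807}
             \approx 0.005766,
\qquad
b_{\text{pooled}} = b_{\text{raw}} \times s_W
                = 0.0057661265 \times 4.3949
                \approx 0.02534 .
\]
```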
Class Means on Canonical Variables

| Cluster | Can1 | Can2 |
|---|---|---|
| 1 | 4.931414018 | 0.861972277 |
| 2 | -6.131527227 | 0.244761516 |
| 3 | 1.922300462 | -0.725693908 |
Output 36.1.4: Plot of Fisher’s Iris Data using PROC CANDISC