The TREE Procedure

Example 105.2 Iris Data

Fisher's (1936) iris data give sepal and petal dimensions for three species of iris. The data, which are available in the Sashelp library, are clustered by kth-nearest-neighbor density linkage by using the CLUSTER procedure with K=8. Observations are identified by species ('Setosa', 'Versicolor', or 'Virginica') in the tree diagram, which is oriented with the height axis horizontal.

The following statements produce the results shown in Output 105.2.1:

title 'Fisher (1936) Iris Data';
ods graphics on;

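/* Two-stage density linkage with kth-nearest-neighbor density estimation (K=8). */
/* PRINT=10 shows the last 10 generations of the cluster history;               */
/* OUTTREE= saves the tree for PROC TREE, and COPY carries Species into it.     */
proc cluster data=sashelp.iris method=twostage print=10
             outtree=tree k=8 noeigen;
   var SepalLength SepalWidth PetalLength PetalWidth;
   copy Species;
run;

/* Line-printer tree with a horizontal height axis, labeled by species. */
proc tree data=tree horizontal lineprinter pages=1 maxh=10;
   id Species;
run;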

The PAGES=1 option specifies that the tree diagram extend over one page from leaves to root. Because the HORIZONTAL option is also specified, the horizontal extent of the diagram is one page. The number of vertical pages required for the diagram is dictated by the number of leaves in the tree.

The MAXH=10 option limits the values displayed on the height axis to a maximum of 10. This prunes the tree diagram so that only the portion from the leaves to level 10 is produced. The line printer plot is not displayed.

Output 105.2.1: Clustering of Fisher’s Iris Data

Fisher (1936) Iris Data

The CLUSTER Procedure
Two-Stage Density Linkage Clustering


K = 8

Root-Mean-Square Total-Sample Standard Deviation 10.69224

Cluster History

  Number                         Normalized  Maximum Density
      of                             Fusion  in Each Cluster
Clusters  Clusters Joined  Freq     Density   Lesser  Greater  Tie
      10  CL11     OB79      48      0.2879   0.1479   8.3678
       9  CL13     OB112     46      0.2802   0.2005   3.5156
       8  CL10     OB113     49      0.2699   0.1372   8.3678
       7  CL8      OB91      50      0.2586   0.1372   8.3678
       6  CL9      OB120     47      0.1412   0.0832   3.5156
       5  CL6      OB118     48       0.107   0.0605   3.5156
       4  CL5      OB110     49      0.0969   0.0541   3.5156
       3  CL4      OB135     50      0.0715   0.0370   3.5156
       2  CL7      CL3      100      2.6277   3.5156   8.3678


3 modal clusters have been formed.
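
To work with the three modal clusters as data rather than as a diagram, the tree can be cut at three clusters and the resulting assignments compared with the species labels. The following statements are a minimal sketch and are not part of the original example. They assume the OUTTREE= data set tree created above, which contains Species because of the COPY statement in PROC CLUSTER; the output data set name Clus3 is hypothetical.

```sas
/* Sketch only: Clus3 is a hypothetical output data set name. */
/* NOPRINT suppresses the diagram; NCLUSTERS=3 cuts the tree  */
/* at three clusters and OUT= saves the cluster assignments.  */
proc tree data=tree noprint nclusters=3 out=Clus3;
   copy Species;   /* carry the species label into the OUT= data set */
run;

/* Cross-tabulate cluster membership (variable CLUSTER) against species. */
proc freq data=Clus3;
   tables Cluster*Species / nopercent norow nocol;
run;
```

The crosstabulation shows how closely the clusters recovered from the density linkage correspond to the three species.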