The TREE Procedure
The TREE procedure produces a tree diagram, also known as a dendrogram or phenogram, from a data set created by the CLUSTER or VARCLUS procedure that contains the results of hierarchical clustering as a tree structure. The TREE procedure uses the data set to produce a diagram of the tree structure in the style of Johnson (1967), with the root at the top. Alternatively, the diagram can be oriented horizontally, with the root at the left. Any numeric variable in the output data set can be used to specify the heights of the clusters. PROC TREE can also create an output data set containing a variable to indicate the disjoint clusters at a specified level in the tree.
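The idea of reading disjoint clusters off a tree at a specified level can be sketched in a few lines of Python. This is only an illustration of the concept, not PROC TREE's implementation: the `tree` dictionary, the node names, the heights, and the `cut` function are all hypothetical, and the sketch assumes that a parent's height is never lower than its children's.

```python
# Hypothetical tree: node name -> (height, list of children).
# Leaves have height 0 and no children.
tree = {
    "CL1": (3.0, ["CL2", "C"]),
    "CL2": (1.0, ["A", "B"]),
    "A": (0.0, []),
    "B": (0.0, []),
    "C": (0.0, []),
}


def leaves_under(tree, node):
    """Return the names of all leaves below (or equal to) the given node."""
    _, kids = tree[node]
    if not kids:
        return [node]
    found = []
    for kid in kids:
        found.extend(leaves_under(tree, kid))
    return found


def cut(tree, root, level):
    """Split the tree into disjoint clusters of leaves at the given level.

    A node becomes one cluster as soon as its height is at or below the
    level; otherwise the search descends into its children.
    """
    height, kids = tree[root]
    if height <= level or not kids:
        return [sorted(leaves_under(tree, root))]
    clusters = []
    for kid in kids:
        clusters.extend(cut(tree, kid, level))
    return clusters


clusters = cut(tree, "CL1", 2.0)
```

With the hypothetical heights above, cutting at level 2.0 separates the leaf C from the cluster that joins A and B; raising the level to 3.0 or higher returns a single cluster containing all three leaves.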
Tree diagrams are discussed in the context of cluster analysis by Duran and Odell (1974), Hartigan (1975), and Everitt (1980). Knuth (1973) provides a general treatment of tree diagrams in computer programming.
The literature on tree diagrams contains a mixture of botanical and genealogical terminology. The objects that are clustered are leaves. The cluster containing all objects is the root. A cluster containing at least two objects but not all of them is a branch. The general term for leaves, branches, and roots is node. If a cluster A is the union of clusters B and C, then A is the parent of B and C, and B and C are children of A. A leaf is thus a node with no children, and a root is a node with no parent. If every cluster has at most two children, the tree diagram is a binary tree. The CLUSTER procedure always produces binary trees. The VARCLUS procedure can produce tree diagrams with clusters that have many children.
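The terminology above maps directly onto a small data structure. The following Python sketch is an illustration only: the `(name, parent)` pairs, the node names, and the `classify` function are hypothetical and are not taken from any SAS data set layout. It labels each node as a root, branch, or leaf and checks whether the tree is binary.

```python
from collections import defaultdict

# Hypothetical (name, parent) pairs; the root is the node whose parent is None.
edges = [
    ("A", "CL2"),
    ("B", "CL2"),
    ("C", "CL1"),
    ("CL2", "CL1"),
    ("CL1", None),
]


def classify(edges):
    """Label every node and report whether the tree is binary."""
    children = defaultdict(list)
    parent = {}
    for name, par in edges:
        parent[name] = par
        if par is not None:
            children[par].append(name)

    kinds = {}
    for name, par in parent.items():
        if par is None:
            kinds[name] = "root"      # a node with no parent
        elif not children.get(name):
            kinds[name] = "leaf"      # a node with no children
        else:
            kinds[name] = "branch"    # has a parent and children

    # Binary tree: every cluster has at most two children.
    is_binary = all(len(kids) <= 2 for kids in children.values())
    return kinds, dict(children), is_binary


kinds, children, is_binary = classify(edges)
```

Here CL1 is the parent of C and CL2, and CL2 is the parent of A and B, so every cluster has at most two children and the tree is binary. Adding a third child to either cluster would make `is_binary` false, which corresponds to the many-children case described for the VARCLUS procedure.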
Copyright © SAS Institute, Inc. All Rights Reserved.