SAS/STAT Software

CLUSTER Procedure

The CLUSTER procedure hierarchically clusters the observations in a SAS data set by using one of 11 methods. The data can be coordinates or distances. If the data are coordinates, PROC CLUSTER computes (possibly squared) Euclidean distances. The following are highlights of the CLUSTER procedure's features:

  • supports the following clustering methods:
    • average linkage
    • centroid method
    • complete linkage
    • density linkage (including Wong's hybrid and kth-nearest-neighbor methods)
    • maximum likelihood for mixtures of spherical multivariate normal distributions with equal variances but possibly unequal mixing proportions
    • flexible-beta method
    • McQuitty's similarity analysis
    • median method
    • single linkage
    • two-stage density linkage
    • Ward's minimum-variance method
  • displays a history of the clustering process, showing statistics useful for estimating the number of clusters in the population from which the data are sampled
  • creates a data set that can be used by the TREE procedure to draw a tree diagram of the cluster hierarchy or to output the cluster membership at any desired level
  • performs BY group processing, which enables you to obtain separate analyses on grouped observations
  • creates a data set that corresponds to any output table
  • automatically produces graphs by using ODS Graphics
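
As a rough illustration of how several of these features fit together, the following sketch clusters a small set of coordinates with Ward's method, saves the hierarchy, and then cuts it into three clusters with PROC TREE. The data values and the data set names (points, history, tree, members) are hypothetical and chosen only for this example. The PLOTS=DENDROGRAM option and the ODS table name ClusterHistory are assumed to be available in your SAS/STAT release.

```sas
/* Hypothetical example data: nine points in two dimensions */
data points;
   input name $ x y;
   datalines;
a 1.0 1.2
b 1.1 0.9
c 0.8 1.0
d 5.0 5.1
e 5.2 4.8
f 4.9 5.3
g 9.0 1.0
h 9.2 1.3
i 8.8 0.9
;

/* Enable ODS Graphics so that the procedure can produce plots */
ods graphics on;

/* Save the cluster history table as a data set
   (table name ClusterHistory is assumed here) */
ods output ClusterHistory=work.history;

/* Ward's minimum-variance clustering on coordinate data.
   CCC and PSEUDO request statistics that help in judging
   the number of clusters; OUTTREE= saves the hierarchy. */
proc cluster data=points method=ward ccc pseudo
             outtree=work.tree plots=dendrogram;
   var x y;
   id name;
run;

/* Cut the hierarchy at three clusters and save the memberships */
proc tree data=work.tree nclusters=3 out=work.members noprint;
   id name;
   copy x y;
run;

proc print data=work.members;
run;
```

Changing the METHOD= value selects a different clustering method from the list above. Adding a BY statement to PROC CLUSTER, with the input data sorted by the BY variable, would produce a separate analysis for each group.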

For further details, see the CLUSTER Procedure documentation.