SAS/STAT Software

FASTCLUS Procedure

The FASTCLUS procedure performs a disjoint cluster analysis on the basis of distances computed from one or more quantitative variables. The observations are divided into clusters such that every observation belongs to one and only one cluster. The following are highlights of the procedure's features:

  • uses Euclidean distances, so the cluster centers are based on least squares estimation (k-means model)
  • is designed to find good clusters (but not necessarily the best possible clusters) with only two or three passes through the data set
  • can be an effective procedure for detecting outliers because outliers often appear as clusters with only one member
  • can use an Lp (least pth powers) clustering criterion
  • is intended for use with large data sets, with 100 or more observations
  • uses algorithms that give greater influence to variables with larger variance
  • produces brief summaries of the clusters
  • produces an output data set containing a cluster membership variable
  • performs BY-group processing, which enables you to obtain separate analyses of grouped observations
  • computes weighted cluster means
  • creates a SAS data set that corresponds to any output table
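
As an illustration of several of the features listed above, the following is a minimal sketch rather than a recommended analysis. It assumes the SASHELP.IRIS sample data set is available; the output data set names (iris_std, iris_clus) and the choice of three clusters are arbitrary and purely illustrative. Because variables with larger variance carry more influence, the sketch standardizes the variables before clustering.

```sas
/* Assumption: SASHELP.IRIS is installed. The data set names below
   (iris_std, iris_clus) are hypothetical and chosen for illustration. */

/* Standardize each variable to mean 0 and standard deviation 1 so that
   no variable dominates the Euclidean distances through its variance. */
proc stdize data=sashelp.iris out=iris_std method=std;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Request at most three clusters and allow up to 10 iterations.
   OUT= holds the input observations plus the CLUSTER membership
   variable and the DISTANCE to the assigned cluster seed. */
proc fastclus data=iris_std maxclusters=3 maxiter=10 out=iris_clus;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Cross-tabulate cluster membership against the known species. */
proc freq data=iris_clus;
   tables cluster*Species;
run;
```

The same PROC FASTCLUS step accepts a BY statement for separate analyses of grouped observations, and the LEAST= option selects an Lp criterion in place of the default least squares criterion.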

For further details, see the FASTCLUS procedure documentation.