The purpose of cluster analysis is to place objects into groups, or clusters, suggested by the data,
not defined a priori, such that objects in a given cluster tend to be similar to each other in some
sense, and objects in different clusters tend to be dissimilar. You can also use cluster analysis to
summarize data rather than to find "natural" or "real" clusters; this use of clustering is sometimes
called dissection. The SAS/STAT procedures for clustering are oriented toward disjoint or hierarchical
clusters from coordinate data, distance data, or a correlation or covariance matrix.
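As a rough illustration of disjoint clustering from coordinate data, the sketch below runs a basic K-means loop in plain Python. It is not SAS code and does not reproduce how any SAS/STAT procedure works internally; the function name `k_means`, the sample points, and the choice of the first k points as seeds are all assumptions made for this example.

```python
from math import dist

def k_means(points, k, max_iter=100):
    """Basic K-means; seeds are the first k points (an illustrative assumption)."""
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the partition is stable
        labels = new_labels
        # Update step: move each center to the mean of its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Hypothetical coordinate data: two visibly separated groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
labels, centers = k_means(points, k=2)
```

Each point ends up with exactly one label, which is what makes the result a disjoint clustering: points near (1, 1) share one label and points near (5, 5) share the other.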
Below are highlights of the capabilities of the SAS/STAT procedures that perform cluster analysis:
- hierarchical clustering of multivariate data or distance data
- hierarchical methods including average linkage, centroid method, complete linkage,
  density linkage, flexible-beta method, median method, single linkage, and
  Ward's minimum-variance method
- K-means and hybrid clustering for large multivariate data sets
- disjoint and hierarchical clustering of variables by oblique multiple-group
  component analysis providing a least squares fit to the data
- approximate covariance estimation for clustering
- disjoint or hierarchical clustering based on a correlation or covariance matrix
- clustering based on nonparametric density estimates
- input given as numeric coordinates or distance data
- approximate significance tests for the number of clusters
- hierarchical joins of nonsignificant clusters
- tree diagrams
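To make the difference between some of the linkage methods listed above concrete, here is a minimal Python sketch of agglomerative (hierarchical) clustering with single, complete, and average linkage. It is a naive illustration, not the algorithm used by any SAS/STAT procedure; the names `LINKAGES` and `agglomerate` and the sample points are hypothetical, and Euclidean distance is an assumption.

```python
from itertools import combinations
from math import dist

# Each linkage reduces the list of pairwise point distances between two clusters.
LINKAGES = {
    "single": min,                             # closest pair of points
    "complete": max,                           # farthest pair of points
    "average": lambda ds: sum(ds) / len(ds),   # mean over all pairs
}

def agglomerate(points, n_clusters, linkage="average"):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point

    def cluster_distance(a, b):
        return combine([dist(points[i], points[j]) for i in a for j in b])

    while len(clusters) > n_clusters:
        # Find the pair of clusters (a < b) with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical data: two tight pairs and one distant point.
points = [(0, 0), (0, 1), (4, 0), (4, 1), (10, 0)]
three = agglomerate(points, 3, linkage="single")
two = agglomerate(points, 2, linkage="complete")
```

Recording the sequence of merges, rather than stopping at a fixed number of clusters, yields the hierarchy that a tree diagram displays. The other listed methods (centroid, median, Ward's, flexible-beta, density linkage) use different between-cluster distance definitions and are not covered by this sketch.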