
# Cluster Analysis Procedures

• ACECLUS — Obtains approximate estimates of the pooled within-cluster covariance matrix when the clusters are assumed to be multivariate normal with equal covariance matrices
• CLUSTER — Hierarchically clusters the observations in a SAS data set
• FASTCLUS — Performs a disjoint cluster analysis on the basis of distances computed from one or more quantitative variables
• MODECLUS — Clusters observations in a SAS data set by using nonparametric density estimates
• TREE — Produces a tree diagram, also known as a dendrogram or phenogram, from a data set created by the CLUSTER or VARCLUS procedure
• VARCLUS — Divides a set of numeric variables into disjoint or hierarchical clusters

# Cluster Analysis

The purpose of cluster analysis is to place objects into groups, or clusters, suggested by the data, not defined a priori, such that objects in a given cluster tend to be similar to each other in some sense, and objects in different clusters tend to be dissimilar. You can also use cluster analysis to summarize data rather than to find "natural" or "real" clusters; this use of clustering is sometimes called dissection. The SAS/STAT procedures for clustering are oriented toward disjoint or hierarchical clusters from coordinate data, distance data, or a correlation or covariance matrix.

Below are highlights of the capabilities of the SAS/STAT procedures that perform cluster analysis:

• hierarchical clustering of multivariate data or distance data
• K-means and hybrid clustering for large multivariate data sets
• disjoint and hierarchical clustering of variables by oblique multiple-group component analysis providing a least squares fit to the data
• approximate covariance estimation for clustering
• disjoint or hierarchical clustering based on correlation or covariance matrix
• clustering based on nonparametric density estimates
• input as either numeric coordinates or distance data
• approximate significance tests for number of clusters
• hierarchical joins of nonsignificant clusters
• tree diagrams
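
To make the idea of disjoint clustering from coordinate data more concrete, here is a minimal sketch of a textbook K-means loop in plain Python. It is only an illustration of the general technique and is not the algorithm implemented by FASTCLUS or any other SAS/STAT procedure; the function name `kmeans`, its parameters, and the sample points are all hypothetical and chosen for this example.

```python
import math

def kmeans(points, seeds, max_iter=100):
    """Textbook K-means on coordinate tuples; `seeds` are the initial centers.

    Hypothetical helper for illustration only, not a SAS procedure.
    """
    centers = [tuple(s) for s in seeds]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its current members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centers.append(old)  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # centers stopped moving, so the assignments are stable
        centers = new_centers
    return labels, centers

# Two visibly separated groups of made-up two-dimensional points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = kmeans(points, seeds=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each observation ends up in exactly one cluster, which is what "disjoint" means here. A hierarchical method would instead produce a nested sequence of merges that can be drawn as a tree diagram.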