SAS Institute. The Power to Know

Applied Clustering Techniques


The course looks at the theoretical and practical implications of a wide array of clustering techniques currently available in SAS. The techniques considered include cluster preprocessing, variable clustering, k-means clustering, and hierarchical clustering.

Learn how to

  • prepare and explore data for a cluster analysis
  • distinguish among many different clustering techniques, making informed choices about which to use
  • evaluate the results of a cluster analysis
  • determine the appropriate number of clusters to retain
  • profile and describe clustered observations
  • score observations into clusters.

Who should attend

Intermediate- or senior-level statisticians, data analysts, and data miners

Duration:  2 days

Before attending this course, you should
  • be able to execute SAS programs and create SAS data sets. You can gain this experience by completing the SAS Programming 1: Essentials course.
  • have completed a graduate-level course in statistics or the Statistics 1: Introduction to ANOVA, Regression, and Logistic Regression course.
  • have an understanding of matrix algebra.


This course addresses SAS/STAT software.

Introduction to Clustering

  • identifying types of clustering
  • measuring similarity
  • assessing multivariate normality
  • using classification matrices
Preparation for Clustering

  • using variable clustering for variable selection
  • using graphical clustering aids
  • making elongated clusters more spherical
  • viewing the impact of input standardization
Partitive Clustering

  • using k-means clustering for segmentation
  • outlining the advantages of nonparametric clustering
Hierarchical Clustering

  • comparing hierarchical clustering methods
Assessing Clustering Results

  • determining the number of clusters
  • profiling a cluster solution
  • scoring new observations
Cluster Analysis Case Study

  • variable selection
  • graphical exploration of selected variables
  • hierarchical clustering and determining the number of clusters
  • profiling the seven-cluster solution
  • modeling cluster membership
  • scoring the customer database
Canonical Discriminant Analysis (CDA) Plots

  • using canonical discriminant analysis to summarize multivariate data
  • interpreting CANDISC procedure output
Fuzzy Clustering

  • performing fuzzy clustering using the FACTOR procedure
  • interpreting the PROC FACTOR output in terms of fuzzy clustering membership
Assessing Multivariate Normality

  • defining multivariate normality
  • exploring the implications of univariate and multivariate normality in the context of clustering
  • illustrating the calculation of multivariate normality
This course description was created using SAS software. CLUS93