The CLUSTER, FASTCLUS, and MODECLUS procedures treat all numeric variables as continuous. To cluster binary, ordinal, or nominal data, you can use PROC DISTANCE to create a distance matrix that can be read by PROC CLUSTER or PROC MODECLUS. The VAR statement in PROC DISTANCE lets you assign each variable a measurement level: ratio, interval, ordinal, nominal, or asymmetric nominal. Nominal variables can be binary or polytomous, numeric or character. PROC FASTCLUS does not accept distance data as input.
Another option is the KCLUS procedure, which uses the k-prototypes algorithm for clustering when interval and nominal variables are both used. PROC KCLUS is available in SAS® Viya® when you license SAS® Visual Statistics. See "Clustering Mixed Variables" in the Examples section in the PROC KCLUS documentation.
This example creates a distance matrix from a set of numeric interval variables and a set of character-valued categorical variables, and PROC CLUSTER then generates a dendrogram from that matrix. The following statements select a subset of observations in the SASHELP.BASEBALL data set, create a distance matrix, and save it in a TYPE=DISTANCE data set named DIST. METHOD=DGOWER (Gower's dissimilarity) is used because it accommodates all measurement levels. For more information on the available methods, see "Proximity Measures" in the Details section of the PROC DISTANCE documentation.
proc distance data=sashelp.baseball method=dgower out=dist;
   where league='National' and division='East';
   id name;
   var interval (CrAtBat CrHits CrRuns CrRbi CrBB)
       nominal (Team League Division Position Div);
run;
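As a rough guide to what METHOD=DGOWER computes in this step, the dissimilarity between two players x and y can be written as one minus Gower's similarity. The sketch below assumes unit weights, range standardization of the interval variables, and no asymmetric nominal variables. The symbols p, R_j, and s_j are notation introduced for this illustration only; the "Proximity Measures" section of the PROC DISTANCE documentation remains the authoritative definition.

```latex
d(x, y) = 1 - \frac{1}{p} \sum_{j=1}^{p} s_j(x, y),
\qquad
s_j(x, y) =
\begin{cases}
1 - \dfrac{|x_j - y_j|}{R_j} & \text{if variable } j \text{ is interval} \\[1ex]
\mathbf{1}[x_j = y_j] & \text{if variable } j \text{ is nominal}
\end{cases}
```

Here p = 10 is the number of variables in the VAR statement and R_j is the range of interval variable j. Because the WHERE statement holds League and Division constant, those variables match for every pair of players and shift every dissimilarity by the same amount, so they do not affect which players are judged closest.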
These statements perform the cluster analysis using the distance matrix and display a dendrogram that summarizes the clustering. The ODS TRACE statements before and following the CLUSTER step display information in the SAS® log about the tables and graphs produced by the procedure. This information will be used to alter the title in a redisplayed dendrogram.
ods trace on;
proc cluster data=dist method=Ward plots=dendrogram(height=rsq);
   id name;
run;
ods trace off;
In the SAS log, the following is displayed, showing the name of the template that PROC CLUSTER used for the dendrogram.
Output Added:
-------------
Name:       Dendrogram
Label:      Dendrogram
Template:   Stat.Cluster.Graphics.Dendrogram
Path:       Cluster.Dendrogram
-------------
Following is the dendrogram from the cluster analysis step above.
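If you also need a cluster assignment for each player rather than only the picture, one common approach is to save the tree with the OUTTREE= option and cut it with PROC TREE. The following is a minimal sketch and is not part of the original example: the data set names TREE and CLUSTERS are hypothetical, and the choice of three clusters is arbitrary and should be guided by the dendrogram.

```sas
/* Rerun the clustering and save the tree; NOPRINT suppresses displayed output */
proc cluster data=dist method=ward outtree=tree noprint;
   id name;
run;

/* Cut the tree at three clusters (an arbitrary, illustrative choice) */
proc tree data=tree nclusters=3 out=clusters noprint;
run;

/* Count how many players fall into each cluster */
proc freq data=clusters;
   tables cluster;
run;
```

The OUT= data set from PROC TREE contains a CLUSTER variable that holds the cluster number for each observation, which is what the PROC FREQ step tabulates.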
As described in the "Examples of ODS Graphics Template Modification" chapter in the SAS/STAT® User's Guide, you can use the GrTitle macro to modify the template of a graph so that you can change its title when redisplayed. This and other methods for altering various elements of ODS Graphics are discussed in this note.
The following statement calls the GrTitle macro and passes the dendrogram template name shown above in its PATH= argument.
%grtitle(path=Stat.Cluster.Graphics.Dendrogram)
The macro displays the following in the SAS log, which provides the name of a macro variable that you can use to assign the desired title.
Stat.Cluster.Graphics.Dendrogram Cluster_Dendrogram
Using the displayed macro variable name, the following reruns the cluster analysis and changes the title:
%let Cluster_Dendrogram=Cluster Analysis: National League, East Division;
proc cluster data=Dist method=Ward plots=dendrogram(height=rsq);
   id name;
run;
The dendrogram with the modified title is displayed.
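Because the macro modifies the graph template, later PROC CLUSTER runs in the same environment keep using the modified version. If you want the default dendrogram template back, a sketch such as the following should work. It assumes that the modified template was written to SASUSER.TEMPLAT, which is the first writable item store in the default ODS path; adjust STORE= if your ODS path differs.

```sas
/* Assumption: the modified template lives in SASUSER.TEMPLAT (default ODS path). */
/* Deleting it lets ODS fall back to the template that SAS supplies.              */
proc template;
   delete Stat.Cluster.Graphics.Dendrogram / store=sasuser.templat;
run;
```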
Product Family | Product | System | SAS Release Reported | SAS Release Fixed* |
SAS System | SAS/STAT | All | n/a | |
Type: | Usage Note |
Priority: | low |
Topic: | SAS Reference ==> Procedures ==> CLUSTER; SAS Reference ==> Procedures ==> MODECLUS; Analytics ==> Multivariate Analysis; Analytics ==> Cluster Analysis; Analytics ==> Data Mining; Analytics ==> Categorical Data Analysis; SAS Reference ==> Procedures ==> DISTANCE; SAS Reference ==> Procedures ==> FASTCLUS; SAS Reference ==> Procedures ==> KCLUS; Analytics ==> Statistical Graphics |
Date Modified: | 2008-01-11 14:35:10 |
Date Created: | 2002-12-16 10:56:37 |