The CLUSTER Procedure
This example clusters 10 American cities based on the flying mileages between them. Six clustering methods are shown with corresponding tree diagrams produced by the TREE procedure. The EML method cannot be used because it requires coordinate data. The other omitted methods produce the same clusters, although not the same distances between clusters, as one of the illustrated methods: complete linkage and the flexible-beta method yield the same clusters as Ward’s method, McQuitty’s similarity analysis produces the same clusters as average linkage, and the median method corresponds to the centroid method.
All of the methods suggest a division of the cities into two clusters along the east-west dimension. There is disagreement, however, about which cluster Denver should belong to. Some of the methods indicate a possible third cluster containing Denver and Houston.
    title 'Cluster Analysis of Flying Mileages Between 10 American Cities';

    data mileages(type=distance);
       input (Atlanta Chicago Denver Houston LosAngeles
              Miami NewYork SanFran Seattle WashDC) (5.)
              @55 City $15.;
       datalines;
        0                                                 Atlanta
      587    0                                            Chicago
     1212  920    0                                       Denver
      701  940  879    0                                  Houston
     1936 1745  831 1374    0                             Los Angeles
      604 1188 1726  968 2339    0                        Miami
      748  713 1631 1420 2451 1092    0                   New York
     2139 1858  949 1645  347 2594 2571    0              San Francisco
     2182 1737 1021 1891  959 2734 2408  678    0         Seattle
      543  597 1494 1220 2300  923  205 2442 2329    0    Washington D.C.
    ;

    goptions htext=0.15in htitle=0.15in;
The following statements produce Output 29.1.1 and Output 29.1.2:
    /*---------------------- Average linkage --------------------*/
    proc cluster data=mileages outtree=tree method=average pseudo;
       id City;
    run;

    title2 'Using METHOD=AVERAGE';
    proc tree horizontal;
       id City;
    run;
    title2;
Cluster History

NCL | Clusters Joined | | FREQ | PSF | PST2 | Norm RMS Dist | Tie |
---|---|---|---|---|---|---|---|
9 | New York | Washington D.C. | 2 | 66.7 | . | 0.1297 | |
8 | Los Angeles | San Francisco | 2 | 39.2 | . | 0.2196 | |
7 | Atlanta | Chicago | 2 | 21.7 | . | 0.3715 | |
6 | CL7 | CL9 | 4 | 14.5 | 3.4 | 0.4149 | |
5 | CL8 | Seattle | 3 | 12.4 | 7.3 | 0.5255 | |
4 | Denver | Houston | 2 | 13.9 | . | 0.5562 | |
3 | CL6 | Miami | 5 | 15.5 | 3.8 | 0.6185 | |
2 | CL3 | CL4 | 7 | 16.0 | 5.3 | 0.8005 | |
1 | CL2 | CL5 | 10 | . | 16.0 | 1.2967 | |
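The two-cluster division suggested by the cluster history can be turned into an explicit assignment of cities to clusters. The following is a minimal sketch, not part of the original example: it passes the OUTTREE= data set `tree` created above back to PROC TREE with the NCLUSTERS= and OUT= options. The data set name `clus2` and the extra title text are assumptions made for illustration.

```sas
/* Cut the average-linkage tree at two clusters.          */
/* clus2 is a hypothetical data set name.                 */
proc tree data=tree nclusters=2 out=clus2 noprint;
   id City;
run;

proc sort data=clus2;
   by cluster;
run;

/* List each city with its cluster number and cluster name */
title2 'Two-Cluster Solution from METHOD=AVERAGE';
proc print data=clus2 noobs;
   var City cluster clusname;
run;
title2;
```

According to the cluster history above, the two clusters at this level are CL2 (seven cities, including Denver and Houston) and CL5 (Los Angeles, San Francisco, and Seattle).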
The following statements produce Output 29.1.3 and Output 29.1.4:
    /*---------------------- Centroid method --------------------*/
    proc cluster data=mileages method=centroid pseudo;
       id City;
    run;

    title2 'Using METHOD=CENTROID';
    proc tree horizontal;
       id City;
    run;
    title2;
Cluster History

NCL | Clusters Joined | | FREQ | PSF | PST2 | Norm Cent Dist | Tie |
---|---|---|---|---|---|---|---|
9 | New York | Washington D.C. | 2 | 66.7 | . | 0.1297 | |
8 | Los Angeles | San Francisco | 2 | 39.2 | . | 0.2196 | |
7 | Atlanta | Chicago | 2 | 21.7 | . | 0.3715 | |
6 | CL7 | CL9 | 4 | 14.5 | 3.4 | 0.3652 | |
5 | CL8 | Seattle | 3 | 12.4 | 7.3 | 0.5139 | |
4 | Denver | CL5 | 4 | 12.4 | 2.1 | 0.5337 | |
3 | CL6 | Miami | 5 | 14.2 | 3.8 | 0.5743 | |
2 | CL3 | Houston | 6 | 22.1 | 2.6 | 0.6091 | |
1 | CL2 | CL4 | 10 | . | 22.1 | 1.173 | |
The following statements produce Output 29.1.5 and Output 29.1.6:
    /*-------- Density linkage with 3rd-nearest-neighbor --------*/
    proc cluster data=mileages method=density k=3;
       id City;
    run;

    title2 'Using METHOD=DENSITY K=3';
    proc tree horizontal;
       id City;
    run;
    title2;
Cluster History

NCL | Clusters Joined | | FREQ | Normalized Fusion Density | Maximum Density in Each Cluster (Lesser) | Maximum Density in Each Cluster (Greater) | Tie |
---|---|---|---|---|---|---|---|
9 | Atlanta | Washington D.C. | 2 | 96.106 | 92.5043 | 100.0 | |
8 | CL9 | Chicago | 3 | 95.263 | 90.9548 | 100.0 | |
7 | CL8 | New York | 4 | 86.465 | 76.1571 | 100.0 | |
6 | CL7 | Miami | 5 | 74.079 | 58.8299 | 100.0 | T |
5 | CL6 | Houston | 6 | 74.079 | 61.7747 | 100.0 | |
4 | Los Angeles | San Francisco | 2 | 71.968 | 65.3430 | 80.0885 | |
3 | CL4 | Seattle | 3 | 66.341 | 56.6215 | 80.0885 | |
2 | CL3 | Denver | 4 | 63.509 | 61.7747 | 80.0885 | |
1 | CL5 | CL2 | 10 | 61.775 * | 80.0885 | 100.0 | |
The following statements produce Output 29.1.7 and Output 29.1.8:
    /*--------------------- Single linkage ----------------------*/
    proc cluster data=mileages method=single;
       id City;
    run;

    title2 'Using METHOD=SINGLE';
    proc tree horizontal;
       id City;
    run;
    title2;
Cluster History

NCL | Clusters Joined | | FREQ | Norm Min Dist | Tie |
---|---|---|---|---|---|
9 | New York | Washington D.C. | 2 | 0.1447 | |
8 | Los Angeles | San Francisco | 2 | 0.2449 | |
7 | Atlanta | CL9 | 3 | 0.3832 | |
6 | CL7 | Chicago | 4 | 0.4142 | |
5 | CL6 | Miami | 5 | 0.4262 | |
4 | CL8 | Seattle | 3 | 0.4784 | |
3 | CL5 | Houston | 6 | 0.4947 | |
2 | Denver | CL4 | 4 | 0.5864 | |
1 | CL3 | CL2 | 10 | 0.6203 | |
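The Norm Min Dist values are consistent with fusion distances rescaled to unit mean. The 45 pairwise mileages in the input data sum to 63,771, so their mean is about 1,417.1 miles, and the first fusion (New York and Washington D.C., 205 miles apart) appears as 205 / 1417.1 ≈ 0.1447. As a hedged sketch that is not part of the original example, the NONORM option of PROC CLUSTER could be added to report the fusion distances on the original mileage scale instead:

```sas
/* Single linkage without normalization:            */
/* fusion distances stay in the units of the input  */
/* data (miles) instead of being rescaled.          */
proc cluster data=mileages method=single nonorm;
   id City;
run;
```

The order in which clusters are joined should not change, because normalization only divides every distance by the same constant.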
The following statements produce Output 29.1.9 and Output 29.1.10:
    /*--- Two-stage density linkage with 3rd-nearest-neighbor ---*/
    proc cluster data=mileages method=twostage k=3;
       id City;
    run;

    title2 'Using METHOD=TWOSTAGE K=3';
    proc tree horizontal;
       id City;
    run;
    title2;
Cluster History

NCL | Clusters Joined | | FREQ | Normalized Fusion Density | Maximum Density in Each Cluster (Lesser) | Maximum Density in Each Cluster (Greater) | Tie |
---|---|---|---|---|---|---|---|
9 | Atlanta | Washington D.C. | 2 | 96.106 | 92.5043 | 100.0 | |
8 | CL9 | Chicago | 3 | 95.263 | 90.9548 | 100.0 | |
7 | CL8 | New York | 4 | 86.465 | 76.1571 | 100.0 | |
6 | CL7 | Miami | 5 | 74.079 | 58.8299 | 100.0 | T |
5 | CL6 | Houston | 6 | 74.079 | 61.7747 | 100.0 | |
4 | Los Angeles | San Francisco | 2 | 71.968 | 65.3430 | 80.0885 | |
3 | CL4 | Seattle | 3 | 66.341 | 56.6215 | 80.0885 | |
2 | CL3 | Denver | 4 | 63.509 | 61.7747 | 80.0885 | |
1 | CL5 | CL2 | 10 | 61.775 | 80.0885 | 100.0 | |
The following statements produce Output 29.1.11 and Output 29.1.12:
    /*------------- Ward's minimum variance method --------------*/
    proc cluster data=mileages method=ward pseudo;
       id City;
    run;

    title2 'Using METHOD=WARD';
    proc tree horizontal;
       id City;
    run;
    title2;
Cluster History

NCL | Clusters Joined | | FREQ | SPRSQ | RSQ | PSF | PST2 | Tie |
---|---|---|---|---|---|---|---|---|
9 | New York | Washington D.C. | 2 | 0.0019 | .998 | 66.7 | . | |
8 | Los Angeles | San Francisco | 2 | 0.0054 | .993 | 39.2 | . | |
7 | Atlanta | Chicago | 2 | 0.0153 | .977 | 21.7 | . | |
6 | CL7 | CL9 | 4 | 0.0296 | .948 | 14.5 | 3.4 | |
5 | Denver | Houston | 2 | 0.0344 | .913 | 13.2 | . | |
4 | CL8 | Seattle | 3 | 0.0391 | .874 | 13.9 | 7.3 | |
3 | CL6 | Miami | 5 | 0.0586 | .816 | 15.5 | 3.8 | |
2 | CL3 | CL5 | 7 | 0.1488 | .667 | 16.0 | 5.3 | |
1 | CL2 | CL4 | 10 | 0.6669 | .000 | . | 16.0 | |
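The introduction states that complete linkage yields the same clusters as Ward's method for these data. One way this could be checked is sketched below, under stated assumptions: both analyses are rerun with OUTTREE= data sets, each tree is cut at the same number of clusters, and the two assignments are cross-tabulated. All data set and variable names (`tree_ward`, `tree_complete`, `c_ward`, `c_complete`, `compare`, `ward_cluster`, `complete_cluster`) and the choice of three clusters are hypothetical and do not come from the original example.

```sas
/* Rerun both methods quietly, saving the trees */
proc cluster data=mileages method=ward outtree=tree_ward noprint;
   id City;
run;

proc cluster data=mileages method=complete outtree=tree_complete noprint;
   id City;
run;

/* Cut each tree at three clusters (an illustrative choice) */
proc tree data=tree_ward nclusters=3 noprint
          out=c_ward(drop=clusname rename=(cluster=ward_cluster));
   id City;
run;

proc tree data=tree_complete nclusters=3 noprint
          out=c_complete(drop=clusname rename=(cluster=complete_cluster));
   id City;
run;

/* Merge the two assignments by city and cross-tabulate them */
proc sort data=c_ward;     by City; run;
proc sort data=c_complete; by City; run;

data compare;
   merge c_ward c_complete;
   by City;
run;

proc freq data=compare;
   tables ward_cluster*complete_cluster / norow nocol nopercent;
run;
```

If the two methods agree at this level, every row and every column of the cross-tabulation should contain exactly one nonzero cell. Cluster numbers themselves need not match between methods, so the nonzero cells may fall off the diagonal.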
Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.