The CLUSTER Procedure

## Example 29.1 Cluster Analysis of Flying Mileages between 10 American Cities

This example clusters 10 American cities based on the flying mileages between them. Six clustering methods are shown, each with a tree diagram produced by the TREE procedure. The EML method cannot be used because it requires coordinate data. Each of the other omitted methods produces the same clusters as one of the illustrated methods, although not the same distances between clusters: complete linkage and the flexible-beta method yield the same clusters as Ward’s method, McQuitty’s similarity analysis yields the same clusters as average linkage, and the median method yields the same clusters as the centroid method.

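All of the methods suggest a division of the cities into two clusters along the east-west dimension. They disagree, however, about which cluster Denver belongs to: average linkage and Ward’s method place Denver in the eastern cluster, whereas the centroid, density, single-linkage, and two-stage methods place it with the western cities. Average linkage and Ward’s method also join Denver and Houston into a cluster of their own before it merges with the eastern cities, which suggests a possible third cluster containing Denver and Houston.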

```sas
title 'Cluster Analysis of Flying Mileages Between 10 American Cities';

/* Lower triangle of the distance matrix: each mileage is read from   */
/* a 5-column field, and the city name starts in column 55.            */
data mileages(type=distance);
   input (Atlanta Chicago Denver Houston LosAngeles
          Miami NewYork SanFran Seattle WashDC) (5.)
         @55 City $15.;
   datalines;
   0                                                  Atlanta
 587    0                                             Chicago
1212  920    0                                        Denver
 701  940  879    0                                   Houston
1936 1745  831 1374    0                              Los Angeles
 604 1188 1726  968 2339    0                         Miami
 748  713 1631 1420 2451 1092    0                    New York
2139 1858  949 1645  347 2594 2571    0               San Francisco
2182 1737 1021 1891  959 2734 2408  678    0          Seattle
 543  597 1494 1220 2300  923  205 2442 2329    0     Washington D.C.
;

goptions htext=0.15in htitle=0.15in;
```

The following statements produce Output 29.1.1 and Output 29.1.2:

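```sas
/*---------------------- Average linkage --------------------*/
proc cluster data=mileages outtree=tree method=average pseudo;
   id City;
run;

/* With no DATA= option, PROC TREE reads the most recently created */
/* data set, which is the OUTTREE= data set TREE.                  */
title2 'Using METHOD=AVERAGE';
proc tree horizontal; id City; run;
title2;   /* clear TITLE2 so it does not carry over */
```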

Output 29.1.1 Cluster History Using METHOD=AVERAGE
Cluster Analysis of Flying Mileages Between 10 American Cities

The CLUSTER Procedure

Cluster History

| NCL | Clusters Joined | | FREQ | PSF | PST2 | Norm RMS Dist | Tie |
|---:|---|---|---:|---:|---:|---:|:---:|
| 9 | New York | Washington D.C. | 2 | 66.7 | . | 0.1297 | |
| 8 | Los Angeles | San Francisco | 2 | 39.2 | . | 0.2196 | |
| 7 | Atlanta | Chicago | 2 | 21.7 | . | 0.3715 | |
| 6 | CL7 | CL9 | 4 | 14.5 | 3.4 | 0.4149 | |
| 5 | CL8 | Seattle | 3 | 12.4 | 7.3 | 0.5255 | |
| 4 | Denver | Houston | 2 | 13.9 | . | 0.5562 | |
| 3 | CL6 | Miami | 5 | 15.5 | 3.8 | 0.6185 | |
| 2 | CL3 | CL4 | 7 | 16.0 | 5.3 | 0.8005 | |
| 1 | CL2 | CL5 | 10 | . | 16.0 | 1.2967 | |

Output 29.1.2 Tree Diagram Using METHOD=AVERAGE
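As a worked check on the Norm RMS Dist column, the values in Output 29.1.1 can be reproduced by hand, assuming that average linkage works with squared mileages and that each distance is divided by the root-mean-square of all 45 intercity mileages. The sketch covers the first join and the join of CL7 (Atlanta, Chicago) with CL9 (New York, Washington D.C.), whose four cross-cluster mileages are 748, 543, 713, and 597:

$$
\bar d_{\mathrm{RMS}} = \sqrt{\frac{1}{45}\sum_{i<j} d_{ij}^2} = \sqrt{\frac{112{,}372{,}443}{45}} \approx 1580.24,
\qquad \frac{205}{1580.24} \approx 0.1297
$$

$$
\frac{D(\mathrm{CL7},\mathrm{CL9})}{\bar d_{\mathrm{RMS}}} = \frac{1}{1580.24}\sqrt{\frac{748^2 + 543^2 + 713^2 + 597^2}{4}} = \frac{\sqrt{429{,}782.75}}{1580.24} \approx \frac{655.58}{1580.24} \approx 0.4149
$$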

The following statements produce Output 29.1.3 and Output 29.1.4:

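```sas
/*---------------------- Centroid method --------------------*/
/* No OUTTREE= option: the tree data set is still created under */
/* the DATAn naming convention, and PROC TREE reads it as the   */
/* most recently created data set.                              */
proc cluster data=mileages method=centroid pseudo;
   id City;
run;

title2 'Using METHOD=CENTROID';
proc tree horizontal; id City; run;
title2;
```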

Output 29.1.3 Cluster History Using METHOD=CENTROID
Cluster Analysis of Flying Mileages Between 10 American Cities

The CLUSTER Procedure

Centroid Hierarchical Cluster Analysis

Cluster History

| NCL | Clusters Joined | | FREQ | PSF | PST2 | Norm Cent Dist | Tie |
|---:|---|---|---:|---:|---:|---:|:---:|
| 9 | New York | Washington D.C. | 2 | 66.7 | . | 0.1297 | |
| 8 | Los Angeles | San Francisco | 2 | 39.2 | . | 0.2196 | |
| 7 | Atlanta | Chicago | 2 | 21.7 | . | 0.3715 | |
| 6 | CL7 | CL9 | 4 | 14.5 | 3.4 | 0.3652 | |
| 5 | CL8 | Seattle | 3 | 12.4 | 7.3 | 0.5139 | |
| 4 | Denver | CL5 | 4 | 12.4 | 2.1 | 0.5337 | |
| 3 | CL6 | Miami | 5 | 14.2 | 3.8 | 0.5743 | |
| 2 | CL3 | Houston | 6 | 22.1 | 2.6 | 0.6091 | |
| 1 | CL2 | CL4 | 10 | . | 22.1 | 1.173 | |

Output 29.1.4 Tree Diagram Using METHOD=CENTROID
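The centroid distances can be checked the same way if the mileages are treated as Euclidean distances. A standard identity expresses the squared distance between two cluster centroids using pairwise distances alone. Applied to CL7 = {Atlanta, Chicago} and CL9 = {New York, Washington D.C.} and divided by the root-mean-square of all 45 mileages (about 1580.24), it reproduces the sixth join in Output 29.1.3. This is a hand check under that assumption, not a description of the procedure's internal algorithm:

$$
\lVert \bar x_A - \bar x_B \rVert^2 = \frac{1}{n_A n_B}\sum_{i\in A}\sum_{j\in B} d_{ij}^2 \;-\; \frac{1}{2n_A^2}\sum_{i\in A}\sum_{i'\in A} d_{ii'}^2 \;-\; \frac{1}{2n_B^2}\sum_{j\in B}\sum_{j'\in B} d_{jj'}^2
$$

$$
\lVert \bar x_{\mathrm{CL7}} - \bar x_{\mathrm{CL9}} \rVert^2 = \frac{748^2 + 543^2 + 713^2 + 597^2}{4} - \frac{587^2}{4} - \frac{205^2}{4} = 429{,}782.75 - 86{,}142.25 - 10{,}506.25 = 333{,}134.25
$$

$$
\frac{\sqrt{333{,}134.25}}{1580.24} \approx \frac{577.18}{1580.24} \approx 0.3652
$$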

The following statements produce Output 29.1.5 and Output 29.1.6:

```sas
/*-------- Density linkage with 3rd-nearest-neighbor --------*/
proc cluster data=mileages method=density k=3;
   id City;
run;

title2 'Using METHOD=DENSITY K=3';
proc tree horizontal; id City; run;
title2;
```

Output 29.1.5 Cluster History Using METHOD=DENSITY K=3
Cluster Analysis of Flying Mileages Between 10 American Cities

The CLUSTER Procedure

Cluster History

| NCL | Clusters Joined | | FREQ | Normalized Fusion Density | Maximum Density in Each Cluster: Lesser | Maximum Density in Each Cluster: Greater | Tie |
|---:|---|---|---:|---:|---:|---:|:---:|
| 9 | Atlanta | Washington D.C. | 2 | 96.106 | 92.5043 | 100.0 | |
| 8 | CL9 | Chicago | 3 | 95.263 | 90.9548 | 100.0 | |
| 7 | CL8 | New York | 4 | 86.465 | 76.1571 | 100.0 | |
| 6 | CL7 | Miami | 5 | 74.079 | 58.8299 | 100.0 | T |
| 5 | CL6 | Houston | 6 | 74.079 | 61.7747 | 100.0 | |
| 4 | Los Angeles | San Francisco | 2 | 71.968 | 65.3430 | 80.0885 | |
| 3 | CL4 | Seattle | 3 | 66.341 | 56.6215 | 80.0885 | |
| 2 | CL3 | Denver | 4 | 63.509 | 61.7747 | 80.0885 | |
| 1 | CL5 | CL2 | 10 | 61.775 * | 80.0885 | 100.0 | |

Output 29.1.6 Tree Diagram Using METHOD=DENSITY K=3
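The density columns in Output 29.1.5 can also be reproduced by hand. The following reading is inferred from the numbers rather than quoted from the procedure documentation. Take r(x) as the distance from city x to its second-nearest other city (the third-nearest point when x itself is counted, matching K=3), treat the density at x as proportional to 1/r(x), and scale so that the largest density is 100. The r(x) values are 543 (Washington D.C.), 587 (Atlanta), 597 (Chicago), 678 (San Francisco), 713 (New York), 831 (Los Angeles), 879 (Denver and Houston), 923 (Miami), and 959 (Seattle). Two cities can be linked directly only if their mileage is at most the larger of their two r values:

$$
\tilde f(x) = 100\,\frac{r_{\min}}{r(x)}, \qquad r_{\min} = r(\text{Washington D.C.}) = 543
$$

$$
d^*(x_i, x_j) =
\begin{cases}
\tfrac12\bigl(r(x_i) + r(x_j)\bigr) & \text{if } d_{ij} \le \max\bigl(r(x_i), r(x_j)\bigr) \\
\infty & \text{otherwise}
\end{cases}
\qquad
\text{fusion density} = 100\,\frac{r_{\min}}{d^*(x_i, x_j)}
$$

$$
\text{Atlanta–Washington D.C.: } 100\cdot\frac{543}{(587+543)/2} = 100\cdot\frac{543}{565} \approx 96.106
$$

$$
\text{Denver–Los Angeles: } 100\cdot\frac{543}{(879+831)/2} \approx 63.509,
\qquad
\text{Denver–Houston: } 100\cdot\frac{543}{879} \approx 61.775
$$

For example, the density of Atlanta is 100 × 543/587 ≈ 92.504, the Lesser value in the first row. Because Denver's link to Los Angeles has the higher fusion density, Denver joins the western cities, and the Denver–Houston link is used only in the final join.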

The following statements produce Output 29.1.7 and Output 29.1.8:

```sas
/*--------------------- Single linkage ----------------------*/
proc cluster data=mileages method=single;
   id City;
run;

title2 'Using METHOD=SINGLE';
proc tree horizontal; id City; run;
title2;
```

Output 29.1.7 Cluster History Using METHOD=SINGLE
Cluster Analysis of Flying Mileages Between 10 American Cities

The CLUSTER Procedure

Cluster History

| NCL | Clusters Joined | | FREQ | Norm Min Dist | Tie |
|---:|---|---|---:|---:|:---:|
| 9 | New York | Washington D.C. | 2 | 0.1447 | |
| 8 | Los Angeles | San Francisco | 2 | 0.2449 | |
| 7 | Atlanta | CL9 | 3 | 0.3832 | |
| 6 | CL7 | Chicago | 4 | 0.4142 | |
| 5 | CL6 | Miami | 5 | 0.4262 | |
| 4 | CL8 | Seattle | 3 | 0.4784 | |
| 3 | CL5 | Houston | 6 | 0.4947 | |
| 2 | Denver | CL4 | 4 | 0.5864 | |
| 1 | CL3 | CL2 | 10 | 0.6203 | |

Output 29.1.8 Tree Diagram Using METHOD=SINGLE
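For single linkage, the Norm Min Dist column is normalized by the mean of the 45 intercity mileages rather than by their root-mean-square. This is inferred from the table, since 205/0.1447 ≈ 1417 rather than 1580. A hand check of the first and last joins:

$$
\bar d = \frac{1}{45}\sum_{i<j} d_{ij} = \frac{63{,}771}{45} \approx 1417.13,
\qquad \frac{205}{1417.13} \approx 0.1447,
\qquad \frac{879}{1417.13} \approx 0.6203
$$

The last value is the Denver–Houston mileage, which is the shortest link between the six eastern cities and the four western ones.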

The following statements produce Output 29.1.9 and Output 29.1.10:

```sas
/*--- Two-stage density linkage with 3rd-nearest-neighbor ---*/
proc cluster data=mileages method=twostage k=3;
   id City;
run;

title2 'Using METHOD=TWOSTAGE K=3';
proc tree horizontal; id City; run;
title2;
```

Output 29.1.9 Cluster History Using METHOD=TWOSTAGE K=3
Cluster Analysis of Flying Mileages Between 10 American Cities

The CLUSTER Procedure

Cluster History

| NCL | Clusters Joined | | FREQ | Normalized Fusion Density | Maximum Density in Each Cluster: Lesser | Maximum Density in Each Cluster: Greater | Tie |
|---:|---|---|---:|---:|---:|---:|:---:|
| 9 | Atlanta | Washington D.C. | 2 | 96.106 | 92.5043 | 100.0 | |
| 8 | CL9 | Chicago | 3 | 95.263 | 90.9548 | 100.0 | |
| 7 | CL8 | New York | 4 | 86.465 | 76.1571 | 100.0 | |
| 6 | CL7 | Miami | 5 | 74.079 | 58.8299 | 100.0 | T |
| 5 | CL6 | Houston | 6 | 74.079 | 61.7747 | 100.0 | |
| 4 | Los Angeles | San Francisco | 2 | 71.968 | 65.3430 | 80.0885 | |
| 3 | CL4 | Seattle | 3 | 66.341 | 56.6215 | 80.0885 | |
| 2 | CL3 | Denver | 4 | 63.509 | 61.7747 | 80.0885 | |
| 1 | CL5 | CL2 | 10 | 61.775 | 80.0885 | 100.0 | |

Output 29.1.10 Tree Diagram Using METHOD=TWOSTAGE K=3

The following statements produce Output 29.1.11 and Output 29.1.12:

```sas
/*------------- Ward's minimum variance method --------------*/
proc cluster data=mileages method=ward pseudo;
   id City;
run;

title2 'Using METHOD=WARD';
proc tree horizontal; id City; run;
title2;
```

Output 29.1.11 Cluster History Using METHOD=WARD
Cluster Analysis of Flying Mileages Between 10 American Cities

The CLUSTER Procedure

Ward's Minimum Variance Cluster Analysis

Cluster History

| NCL | Clusters Joined | | FREQ | SPRSQ | RSQ | PSF | PST2 | Tie |
|---:|---|---|---:|---:|---:|---:|---:|:---:|
| 9 | New York | Washington D.C. | 2 | 0.0019 | .998 | 66.7 | . | |
| 8 | Los Angeles | San Francisco | 2 | 0.0054 | .993 | 39.2 | . | |
| 7 | Atlanta | Chicago | 2 | 0.0153 | .977 | 21.7 | . | |
| 6 | CL7 | CL9 | 4 | 0.0296 | .948 | 14.5 | 3.4 | |
| 5 | Denver | Houston | 2 | 0.0344 | .913 | 13.2 | . | |
| 4 | CL8 | Seattle | 3 | 0.0391 | .874 | 13.9 | 7.3 | |
| 3 | CL6 | Miami | 5 | 0.0586 | .816 | 15.5 | 3.8 | |
| 2 | CL3 | CL5 | 7 | 0.1488 | .667 | 16.0 | 5.3 | |
| 1 | CL2 | CL4 | 10 | 0.6669 | .000 | . | 16.0 | |

Output 29.1.12 Tree Diagram Using METHOD=WARD
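The Ward statistics can be checked from the mileages as well, again treating them as Euclidean distances. In this sketch, the total sum of squares T is computed from the pairwise distances, merging two single cities adds d²/2 to the between-cluster sum of squares, and SPRSQ is that increase divided by T. The pseudo F statistic then follows from R² for c clusters and n = 10 cities:

$$
T = \frac{1}{n}\sum_{i<j} d_{ij}^2 = \frac{112{,}372{,}443}{10} \approx 11{,}237{,}244
$$

$$
\mathrm{SPRSQ}_{\text{New York, Washington D.C.}} = \frac{205^2/2}{T} = \frac{21{,}012.5}{11{,}237{,}244} \approx 0.0019,
\qquad
\mathrm{SPRSQ}_{\text{Denver, Houston}} = \frac{879^2/2}{T} = \frac{386{,}320.5}{11{,}237{,}244} \approx 0.0344
$$

$$
\mathrm{PSF}_{c=9} = \frac{R^2/(c-1)}{(1-R^2)/(n-c)} = \frac{0.99813/8}{0.00187/1} \approx 66.7
$$

Here R² = 1 − 0.00187 after the first merge. Because PSF depends only on the partition, the same value 66.7 appears for nine clusters in Output 29.1.1 and Output 29.1.3.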
