The MODECLUS Procedure

Example 78.2 Cluster Analysis of Flying Mileages between Ten American Cities

This example clusters distance data and shows how to use the TRANSPOSE procedure and a DATA step to fill in the upper triangle of a distance matrix. The Sashelp.Mileages data set contains a table of flying mileages between 10 U.S. cities. The results are displayed in Output 78.2.1 through Output 78.2.3.

The following statements produce Output 78.2.1:

title 'Modeclus Analysis of 10 American Cities';
title2 'Based on Flying Mileages';

*-----Fill in Upper Triangle of Distance Matrix---------------;
proc transpose data=sashelp.mileages out=tran;
   copy city;
run;
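data mileages(type=distance drop=col: _: i);
   /* one-to-one merge: row i of the table meets row i of its transpose */
   merge sashelp.mileages tran;
   array var[10] atlanta--washingtondc;
   array col[10];   /* COL1-COL10 created by PROC TRANSPOSE */
   do i = 1 to 10;
      /* SUM ignores missing values, so a missing cell takes the transposed value */
      var[i] = sum(var[i], col[i]);
   end;
run;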
*-----Clustering with K-Nearest-Neighbor Density Estimates-----;
proc modeclus data=mileages all m=1 k=3;
   id CITY;
run;
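
As an optional check that is not part of the original example, the following sketch prints the Sashelp table and the completed matrix so that the filled-in upper triangle can be inspected. The TITLE3 text is illustrative only, and the final TITLE3 statement clears it so that later output is unaffected.

```sas
*-----Optional check: compare original and completed matrices-----;
proc print data=sashelp.mileages noobs;
   title3 'Original Mileage Table';
run;
proc print data=mileages noobs;
   title3 'Completed Distance Matrix';
run;
title3;   /* clear the illustrative third title line */
```

In the second listing, each row should mirror the corresponding column.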

Output 78.2.1: Clustering with K-Nearest-Neighbor Density Estimates

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Nearest Neighbor List
City             Neighbor         Distance
Atlanta          Washington D.C.  543.0000000
                 Chicago          587.0000000
Chicago          Atlanta          587.0000000
                 Washington D.C.  597.0000000
Denver           Los Angeles      831.0000000
                 Houston          879.0000000
Houston          Atlanta          701.0000000
                 Denver           879.0000000
Los Angeles      San Francisco    347.0000000
                 Denver           831.0000000
Miami            Atlanta          604.0000000
                 Washington D.C.  923.0000000
New York         Washington D.C.  205.0000000
                 Chicago          713.0000000
San Francisco    Los Angeles      347.0000000
                 Seattle          678.0000000
Seattle          San Francisco    678.0000000
                 Los Angeles      959.0000000
Washington D.C.  New York         205.0000000
                 Atlanta          543.0000000

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
K=3 METHOD=1

Sums of Density Estimates Within Neighborhood
Cluster  City             Estimated Density  Same Cluster  Other Clusters  Total       Cluster Proportion Same/Total
1        Atlanta          0.00025554         0.0005275     0               0.0005275   1.000
         Chicago          0.00025126         0.00053178    0               0.00053178  1.000
         Houston          0.00017065         0.00025554    0.00017065      0.00042619  0.600
         Miami            0.00016251         0.00053178    0               0.00053178  1.000
         New York         0.00021038         0.0005275     0               0.0005275   1.000
         Washington D.C.  0.00027624         0.00046592    0               0.00046592  1.000
2        Denver           0.00017065         0.00018051    0.00017065      0.00035115  0.514
         Los Angeles      0.00018051         0.00039189    0               0.00039189  1.000
         San Francisco    0.00022124         0.00033692    0               0.00033692  1.000
         Seattle          0.00015641         0.00040174    0               0.00040174  1.000

Boundary Objects

City     Density       Cluster  Cluster 1 Proportion  Cluster 2 Proportion
Denver   0.0001706485  2        0.486                 0.514
Houston  0.0001706485  1        0.600                 0.400

Cluster Statistics
Cluster  Frequency  Maximum Estimated Density  Boundary Frequency  Estimated Saddle Density
1        6          0.00027624                 1                   0.00017065
2        4          0.00022124                 1                   0.00017065

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary
K    Number of Clusters  Frequency of Unclassified Objects
3    2                   0
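
The Estimated Density values in Output 78.2.1 can be reproduced by hand. The sketch below assumes that, for distance data with a dimensionality of 1, the estimate is the number of cities in the neighborhood (counting the city itself) divided by n times the neighborhood width 2r, where r is the distance to the farthest listed neighbor. That reading is inferred from the displayed numbers rather than stated in this example. The sketch also recomputes the Same/Total proportion for Houston from the densities of its two neighbors.

```sas
data _null_;
   n = 10;     /* number of cities */
   k = 3;      /* K= value; the count includes the city itself */
   r = 587;    /* Atlanta: distance to Chicago, its farthest listed neighbor */
   density = k / (n * 2 * r);   /* assumes a one-dimensional volume of 2*r */
   put 'Atlanta density: ' density 10.8;

   /* Houston: neighbor densities copied from the table */
   same  = 0.00025554;   /* Atlanta, in cluster 1 with Houston */
   other = 0.00017065;   /* Denver, in cluster 2 */
   proportion = same / (same + other);
   put 'Houston same/total: ' proportion 5.3;
run;
```

Under these assumptions, the two values written to the log should agree with the table entries for Atlanta (0.00025554) and Houston (0.600).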



The following statements produce Output 78.2.2:

*------Clustering with Uniform-Kernel Density Estimates--------;
proc modeclus data=mileages all m=1 r=600 800;
   id CITY;
run;

Output 78.2.2: Clustering with Uniform-Kernel Density Estimates

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Nearest Neighbor List
City             Neighbor         Distance
Atlanta          Washington D.C.  543.0000000
                 Chicago          587.0000000
                 Miami            604.0000000
                 Houston          701.0000000
                 New York         748.0000000
Chicago          Atlanta          587.0000000
                 Washington D.C.  597.0000000
                 New York         713.0000000
Houston          Atlanta          701.0000000
Los Angeles      San Francisco    347.0000000
Miami            Atlanta          604.0000000
New York         Washington D.C.  205.0000000
                 Chicago          713.0000000
                 Atlanta          748.0000000
San Francisco    Los Angeles      347.0000000
                 Seattle          678.0000000
Seattle          San Francisco    678.0000000
Washington D.C.  New York         205.0000000
                 Atlanta          543.0000000
                 Chicago          597.0000000

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
R=600 METHOD=1

Sums of Density Estimates Within Neighborhood
Cluster  City             Estimated Density  Same Cluster  Other Clusters  Total       Cluster Proportion Same/Total
1        Atlanta          0.00025            0.00058333    0               0.00058333  1.000
         Chicago          0.00025            0.00058333    0               0.00058333  1.000
         New York         0.00016667         0.00033333    0               0.00033333  1.000
         Washington D.C.  0.00033333         0.00066667    0               0.00066667  1.000
2        Los Angeles      0.00016667         0.00016667    0               0.00016667  1.000
         San Francisco    0.00016667         0.00016667    0               0.00016667  1.000
3        Denver           0.00008333         0             0               0           .
4        Houston          0.00008333         0             0               0           .
5        Miami            0.00008333         0             0               0           .
6        Seattle          0.00008333         0             0               0           .


No Boundary Objects

Cluster Statistics
Cluster  Frequency  Maximum Estimated Density  Boundary Frequency  Estimated Saddle Density
1        4          0.00033333                 0                   .
2        2          0.00016667                 0                   .
3        1          0.00008333                 0                   .
4        1          0.00008333                 0                   .
5        1          0.00008333                 0                   .
6        1          0.00008333                 0                   .

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
R=800 METHOD=1

Sums of Density Estimates Within Neighborhood
Cluster  City             Estimated Density  Same Cluster  Other Clusters  Total       Cluster Proportion Same/Total
1        Atlanta          0.000375           0.001         0               0.001       1.000
         Chicago          0.00025            0.000875      0               0.000875    1.000
         Houston          0.000125           0.000375      0               0.000375    1.000
         Miami            0.000125           0.000375      0               0.000375    1.000
         New York         0.00025            0.000875      0               0.000875    1.000
         Washington D.C.  0.00025            0.000875      0               0.000875    1.000
2        Los Angeles      0.000125           0.0001875     0               0.0001875   1.000
         San Francisco    0.0001875          0.00025       0               0.00025     1.000
         Seattle          0.000125           0.0001875     0               0.0001875   1.000
3        Denver           0.0000625          0             0               0           .


No Boundary Objects

Cluster Statistics
Cluster  Frequency  Maximum Estimated Density  Boundary Frequency  Estimated Saddle Density
1        6          0.000375                   0                   .
2        3          0.0001875                  0                   .
3        1          0.0000625                  0                   .

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary
R    Number of Clusters  Frequency of Unclassified Objects
600  6                   0
800  3                   0
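
The uniform-kernel estimates in Output 78.2.2 follow the same arithmetic with a fixed radius. As a sketch under the assumption that the estimate is the number of cities within r (counting the city itself) divided by n times 2r, the following step recomputes the densities for Atlanta and Denver. The data set name Kernel_Check and the variable names are hypothetical, and the neighbor counts are read from the Nearest Neighbor List above.

```sas
data kernel_check;
   n = 10;                      /* number of cities */
   input city $ r inside;       /* inside = listed neighbors within r */
   density = (inside + 1) / (n * 2 * r);   /* +1 counts the city itself */
   datalines;
Atlanta 600 2
Atlanta 800 5
Denver  600 0
Denver  800 0
;
proc print data=kernel_check noobs;
run;
```

Denver has no neighbor within either radius, so its neighborhood contains only itself. That is consistent with Denver forming a single-city cluster at both R=600 and R=800 in Output 78.2.2.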



The following statements produce Output 78.2.3:

*------Clustering Neighborhoods Extended to Nearest Neighbor--------;
proc modeclus data=mileages list m=1 ck=2 r=600 800;
   id CITY;
run;

Output 78.2.3: Uniform-Kernel Density Estimates, Clustering Neighborhoods Extended to Nearest Neighbor

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
CK=2 R=600 METHOD=1

Sums of Density Estimates Within Neighborhood
Cluster  City             Estimated Density  Same Cluster  Other Clusters  Total       Cluster Proportion Same/Total
1        Atlanta          0.00025            0.00058333    0               0.00058333  1.000
         Chicago          0.00025            0.00058333    0               0.00058333  1.000
         Houston          0.00008333         0.00025       0               0.00025     1.000
         Miami            0.00008333         0.00025       0               0.00025     1.000
         New York         0.00016667         0.00033333    0               0.00033333  1.000
         Washington D.C.  0.00033333         0.00066667    0               0.00066667  1.000
2        Denver           0.00008333         0.00016667    0               0.00016667  1.000
         Los Angeles      0.00016667         0.00016667    0               0.00016667  1.000
         San Francisco    0.00016667         0.00016667    0               0.00016667  1.000
         Seattle          0.00008333         0.00016667    0               0.00016667  1.000

Cluster Statistics
Cluster  Frequency  Maximum Estimated Density  Boundary Frequency  Estimated Saddle Density
1        6          0.00033333                 0                   .
2        4          0.00016667                 0                   .

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
CK=2 R=800 METHOD=1

Sums of Density Estimates Within Neighborhood
Cluster  City             Estimated Density  Same Cluster  Other Clusters  Total       Cluster Proportion Same/Total
1        Atlanta          0.000375           0.001         0               0.001       1.000
         Chicago          0.00025            0.000875      0               0.000875    1.000
         Houston          0.000125           0.000375      0               0.000375    1.000
         Miami            0.000125           0.000375      0               0.000375    1.000
         New York         0.00025            0.000875      0               0.000875    1.000
         Washington D.C.  0.00025            0.000875      0               0.000875    1.000
2        Denver           0.0000625          0.000125      0               0.000125    1.000
         Los Angeles      0.000125           0.0001875     0               0.0001875   1.000
         San Francisco    0.0001875          0.00025       0               0.00025     1.000
         Seattle          0.000125           0.0001875     0               0.0001875   1.000

Cluster Statistics
Cluster  Frequency  Maximum Estimated Density  Boundary Frequency  Estimated Saddle Density
1        6          0.000375                   0                   .
2        4          0.0001875                  0                   .

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary
R    CK  Number of Clusters  Frequency of Unclassified Objects
600  2   2                   0
800  2   2                   0
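
To work with the cluster assignments rather than read them from the listing, one possibility is to save them with the OUT= option. The following is a sketch under stated assumptions, not part of the original example: the data set name ClusOut is hypothetical, and _R_, DENSITY, and CLUSTER are assumed to be the variable names that PROC MODECLUS writes to the OUT= data set. Confirm them with PROC CONTENTS before relying on them.

```sas
*-----Save cluster assignments for both radii (sketch)-----;
proc modeclus data=mileages m=1 ck=2 r=600 800 out=clusout;
   id CITY;
run;
proc print data=clusout noobs;
   by _r_;                      /* assumed: one copy of the data per R= value */
   var city density cluster;    /* assumed OUT= variable names */
run;
```

If these assumptions hold, each BY group should list the same two clusters that appear in Output 78.2.3.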