
The MODECLUS Procedure

Example 57.2 Cluster Analysis of Flying Mileages between Ten American Cities

This example analyzes distance data. The flying mileages are entered as the lower triangle of a distance matrix, and the TRANSPOSE procedure and a DATA step are then used to fill in the upper triangle. The results are displayed in Output 57.2.1 through Output 57.2.3.


The following statements produce Output 57.2.1:

title 'Modeclus Analysis of 10 American Cities';
title2 'Based on Flying Mileages';

data mileages(type=distance);
   input (Atlanta Chicago Denver Houston LosAngeles
   Miami NewYork SanFrancisco Seattle DC) (5.)
   @53 City $15.;
   datalines;
   0                                                Atlanta
 587    0                                           Chicago
1212  920    0                                      Denver
 701  940  879    0                                 Houston
1936 1745  831 1374    0                            Los Angeles
 604 1188 1726  968 2339    0                       Miami
 748  713 1631 1420 2451 1092    0                  New York
2139 1858  949 1645  347 2594 2571    0             San Francisco
2182 1737 1021 1891  959 2734 2408  678    0        Seattle
 543  597 1494 1220 2300  923  205 2442 2329    0   Washington D.C.
;
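*-----Fill in Upper Triangle of Distance Matrix---------------;
/* Transpose the lower triangle; COPY carries City into TRAN. */
proc transpose data=mileages out=tran;
   copy city;
run;

/* Add each row to its transposed counterpart. SUM ignores missing
   values, so each cell takes the value from whichever triangle has it. */
data mileages(type=distance);
   merge mileages tran;
   array var[*] atlanta--dc;
   array col[*] col1-col10;
   do i = 1 to 10;
      var[i] = sum(var[i], col[i]);
   end;
   drop col1-col10 _name_ i;
run;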
*-----Clustering with K-Nearest-Neighbor Density Estimates-----;
proc modeclus data=mileages all m=1 k=3;
   id CITY;
run;
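
Before relying on the clustering results, it can be worth confirming that the fill-in step produced a symmetric matrix. The following is a minimal sketch that is not part of the original example: it assumes the MILEAGES data set created above, and the names CHECK, T1-T10, and MISMATCHES are illustrative. It compares each row of the matrix with the corresponding row of its transpose and writes the number of differing cells to the log; a count of zero would indicate a symmetric matrix.

```sas
/* Hypothetical check, not from the original example:            */
/* transpose the filled-in matrix and compare it cell by cell.   */
proc transpose data=mileages out=check(drop=_name_) prefix=t;
   var atlanta--dc;
run;

data _null_;
   merge mileages check end=last;
   array d[*] atlanta--dc;   /* row i of the distance matrix     */
   array t[*] t1-t10;        /* row i of the transposed matrix   */
   do i = 1 to 10;
      if d[i] ne t[i] then mismatches + 1;   /* retained counter */
   end;
   if last then put 'Asymmetric entries found: ' mismatches;
run;
```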

Output 57.2.1 Clustering with K-Nearest-Neighbor Density Estimates
Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Nearest Neighbor List
City             Neighbor            Distance
Atlanta          Washington D.C.  543.0000000
                 Chicago          587.0000000
Chicago          Atlanta          587.0000000
                 Washington D.C.  597.0000000
Denver           Los Angeles      831.0000000
                 Houston          879.0000000
Houston          Atlanta          701.0000000
                 Denver           879.0000000
Los Angeles      San Francisco    347.0000000
                 Denver           831.0000000
Miami            Atlanta          604.0000000
                 Washington D.C.  923.0000000
New York         Washington D.C.  205.0000000
                 Chicago          713.0000000
San Francisco    Los Angeles      347.0000000
                 Seattle          678.0000000
Seattle          San Francisco    678.0000000
                 Los Angeles      959.0000000
Washington D.C.  New York         205.0000000
                 Atlanta          543.0000000

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
K=3 METHOD=1

Sums of Density Estimates Within Neighborhood
                                                                             Cluster
                           Estimated        Same       Other              Proportion
Cluster  City                Density     Cluster    Clusters       Total  Same/Total
1        Atlanta          0.00025554   0.0005275           0   0.0005275       1.000
         Chicago          0.00025126  0.00053178           0  0.00053178       1.000
         Houston          0.00017065  0.00025554  0.00017065  0.00042619       0.600
         Miami            0.00016251  0.00053178           0  0.00053178       1.000
         New York         0.00021038   0.0005275           0   0.0005275       1.000
         Washington D.C.  0.00027624  0.00046592           0  0.00046592       1.000
2        Denver           0.00017065  0.00018051  0.00017065  0.00035115       0.514
         Los Angeles      0.00018051  0.00039189           0  0.00039189       1.000
         San Francisco    0.00022124  0.00033692           0  0.00033692       1.000
         Seattle          0.00015641  0.00040174           0  0.00040174       1.000

Boundary Objects

                                         -Cluster Proportions-
City                  Density  Cluster           1           2
Denver           0.0001706485        2       0.486       0.514
Houston          0.0001706485        1       0.600       0.400

Cluster Statistics
                       Maximum              Estimated
                     Estimated   Boundary      Saddle
Cluster  Frequency     Density  Frequency     Density
1                6  0.00027624          1  0.00017065
2                4  0.00022124          1  0.00017065

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary
                Frequency of
     Number of  Unclassified
  K   Clusters       Objects
  3          2             0
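
The estimated densities in Output 57.2.1 can be reproduced by hand from the distance matrix. The printed values are consistent with k / (n * 2r), where n = 10 is the number of cities, r is the distance to the kth nearest neighbor counting the city itself, and 2r is the length of a one-dimensional neighborhood of radius r. This reading is inferred from the printed numbers rather than quoted from the procedure's documentation, and the names KNN_DENSITY, RADIUS, and DENSITY are illustrative. For Atlanta, r = 587 (the distance to Chicago), so the estimate is 3 / (10 * 2 * 587) = 0.00025554, which matches the table. A minimal sketch of the same calculation for all ten cities:

```sas
/* Hypothetical re-computation of the K=3 density estimates.      */
/* Assumes the filled-in MILEAGES data set from this example.     */
data knn_density;
   set mileages;
   array d[*] atlanta--dc;
   k = 3;
   n = dim(d);                      /* 10 cities                   */
   /* The zero self-distance ranks first, so the k-th smallest    */
   /* value is the distance to the (k-1)-th nearest other city.   */
   radius  = smallest(k, of d[*]);
   density = k / (n * 2 * radius);  /* assumes 1-D neighborhoods   */
   keep city radius density;
run;

proc print data=knn_density noobs;
   format density 10.8;
run;
```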

The following statements produce Output 57.2.2:

*------Clustering with Uniform-Kernel Density Estimates--------;
proc modeclus data=mileages all m=1 r=600 800;
   id CITY;
run;

Output 57.2.2 Clustering with Uniform-Kernel Density Estimates
Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Nearest Neighbor List
City             Neighbor            Distance
Atlanta          Washington D.C.  543.0000000
                 Chicago          587.0000000
                 Miami            604.0000000
                 Houston          701.0000000
                 New York         748.0000000
Chicago          Atlanta          587.0000000
                 Washington D.C.  597.0000000
                 New York         713.0000000
Houston          Atlanta          701.0000000
Los Angeles      San Francisco    347.0000000
Miami            Atlanta          604.0000000
New York         Washington D.C.  205.0000000
                 Chicago          713.0000000
                 Atlanta          748.0000000
San Francisco    Los Angeles      347.0000000
                 Seattle          678.0000000
Seattle          San Francisco    678.0000000
Washington D.C.  New York         205.0000000
                 Atlanta          543.0000000
                 Chicago          597.0000000

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
R=600 METHOD=1

Sums of Density Estimates Within Neighborhood
                                                                             Cluster
                           Estimated        Same       Other              Proportion
Cluster  City                Density     Cluster    Clusters       Total  Same/Total
1        Atlanta             0.00025  0.00058333           0  0.00058333       1.000
         Chicago             0.00025  0.00058333           0  0.00058333       1.000
         New York         0.00016667  0.00033333           0  0.00033333       1.000
         Washington D.C.  0.00033333  0.00066667           0  0.00066667       1.000
2        Los Angeles      0.00016667  0.00016667           0  0.00016667       1.000
         San Francisco    0.00016667  0.00016667           0  0.00016667       1.000
3        Denver           0.00008333           0           0           0           .
4        Houston          0.00008333           0           0           0           .
5        Miami            0.00008333           0           0           0           .
6        Seattle          0.00008333           0           0           0           .


No Boundary Objects

Cluster Statistics
                       Maximum              Estimated
                     Estimated   Boundary      Saddle
Cluster  Frequency     Density  Frequency     Density
1                4  0.00033333          0           .
2                2  0.00016667          0           .
3                1  0.00008333          0           .
4                1  0.00008333          0           .
5                1  0.00008333          0           .
6                1  0.00008333          0           .

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
R=800 METHOD=1

Sums of Density Estimates Within Neighborhood
                                                                             Cluster
                           Estimated        Same       Other              Proportion
Cluster  City                Density     Cluster    Clusters       Total  Same/Total
1        Atlanta            0.000375       0.001           0       0.001       1.000
         Chicago             0.00025    0.000875           0    0.000875       1.000
         Houston            0.000125    0.000375           0    0.000375       1.000
         Miami              0.000125    0.000375           0    0.000375       1.000
         New York            0.00025    0.000875           0    0.000875       1.000
         Washington D.C.     0.00025    0.000875           0    0.000875       1.000
2        Los Angeles        0.000125   0.0001875           0   0.0001875       1.000
         San Francisco     0.0001875     0.00025           0     0.00025       1.000
         Seattle            0.000125   0.0001875           0   0.0001875       1.000
3        Denver            0.0000625           0           0           0           .


No Boundary Objects

Cluster Statistics
                       Maximum              Estimated
                     Estimated   Boundary      Saddle
Cluster  Frequency     Density  Frequency     Density
1                6    0.000375          0           .
2                3   0.0001875          0           .
3                1   0.0000625          0           .

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary
                Frequency of
     Number of  Unclassified
  R   Clusters       Objects
600          6             0
800          3             0
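
The uniform-kernel densities in Output 57.2.2 appear to follow the same pattern with a fixed radius: the number of cities within R of a given city, counting the city itself, divided by n * 2R. As before, this formula is inferred from the printed values, and the names KERNEL_DENSITY, COUNT, and DENSITY are illustrative. For Washington D.C. at R=600, four cities fall within the radius (itself, New York, Atlanta, and Chicago), giving 4 / (10 * 1200) = 0.00033333. Denver has no neighbor within 800 miles, so its estimate at R=800 is 1 / (10 * 1600) = 0.0000625. A minimal sketch under those assumptions:

```sas
/* Hypothetical re-computation of the R=600 and R=800 estimates.  */
/* Assumes the filled-in MILEAGES data set from this example.     */
data kernel_density;
   set mileages;
   array d[*] atlanta--dc;
   n = dim(d);                        /* 10 cities                 */
   do r = 600, 800;
      count = 0;
      do j = 1 to n;
         /* The zero self-distance always falls within the radius */
         if d[j] <= r then count = count + 1;
      end;
      density = count / (n * 2 * r);  /* assumes 1-D neighborhoods */
      output;                         /* one row per city and R    */
   end;
   keep city r count density;
run;

proc print data=kernel_density noobs;
   format density 10.8;
run;
```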

The following statements produce Output 57.2.3:

*------Clustering Neighborhoods Extended to Nearest Neighbor--------;
proc modeclus data=mileages list m=1 ck=2 r=600 800;
   id CITY;
run;

Output 57.2.3 Uniform-Kernel Density Estimates, Clustering Neighborhoods Extended to Nearest Neighbor
Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
CK=2 R=600 METHOD=1

Sums of Density Estimates Within Neighborhood
                                                                             Cluster
                           Estimated        Same       Other              Proportion
Cluster  City                Density     Cluster    Clusters       Total  Same/Total
1        Atlanta             0.00025  0.00058333           0  0.00058333       1.000
         Chicago             0.00025  0.00058333           0  0.00058333       1.000
         Houston          0.00008333     0.00025           0     0.00025       1.000
         Miami            0.00008333     0.00025           0     0.00025       1.000
         New York         0.00016667  0.00033333           0  0.00033333       1.000
         Washington D.C.  0.00033333  0.00066667           0  0.00066667       1.000
2        Denver           0.00008333  0.00016667           0  0.00016667       1.000
         Los Angeles      0.00016667  0.00016667           0  0.00016667       1.000
         San Francisco    0.00016667  0.00016667           0  0.00016667       1.000
         Seattle          0.00008333  0.00016667           0  0.00016667       1.000

Cluster Statistics
                       Maximum              Estimated
                     Estimated   Boundary      Saddle
Cluster  Frequency     Density  Frequency     Density
1                6  0.00033333          0           .
2                4  0.00016667          0           .

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure
CK=2 R=800 METHOD=1

Sums of Density Estimates Within Neighborhood
                                                                             Cluster
                           Estimated        Same       Other              Proportion
Cluster  City                Density     Cluster    Clusters       Total  Same/Total
1        Atlanta            0.000375       0.001           0       0.001       1.000
         Chicago             0.00025    0.000875           0    0.000875       1.000
         Houston            0.000125    0.000375           0    0.000375       1.000
         Miami              0.000125    0.000375           0    0.000375       1.000
         New York            0.00025    0.000875           0    0.000875       1.000
         Washington D.C.     0.00025    0.000875           0    0.000875       1.000
2        Denver            0.0000625    0.000125           0    0.000125       1.000
         Los Angeles        0.000125   0.0001875           0   0.0001875       1.000
         San Francisco     0.0001875     0.00025           0     0.00025       1.000
         Seattle            0.000125   0.0001875           0   0.0001875       1.000

Cluster Statistics
                       Maximum              Estimated
                     Estimated   Boundary      Saddle
Cluster  Frequency     Density  Frequency     Density
1                6    0.000375          0           .
2                4   0.0001875          0           .

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary
                    Frequency of
         Number of  Unclassified
  R  CK   Clusters       Objects
600   2          2             0
800   2          2             0
