
 The MODECLUS Procedure

## Example 57.2 Cluster Analysis of Flying Mileages between Ten American Cities

This example uses distance data. The input contains only the lower triangle of the distance matrix, so the example uses the TRANSPOSE procedure and a DATA step to fill in the upper triangle. The results are displayed in Output 57.2.1 through Output 57.2.3.

The following statements produce Output 57.2.1:

```
title 'Modeclus Analysis of 10 American Cities';
title2 'Based on Flying Mileages';

* Formatted input: ten 5-column mileage fields, city name at column 53;
data mileages(type=distance);
   input (Atlanta Chicago Denver Houston LosAngeles
          Miami NewYork SanFrancisco Seattle DC) (5.)
          @53 City $15.;
   datalines;
    0                                               Atlanta
  587    0                                          Chicago
 1212  920    0                                     Denver
  701  940  879    0                                Houston
 1936 1745  831 1374    0                           Los Angeles
  604 1188 1726  968 2339    0                      Miami
  748  713 1631 1420 2451 1092    0                 New York
 2139 1858  949 1645  347 2594 2571    0            San Francisco
 2182 1737 1021 1891  959 2734 2408  678    0       Seattle
  543  597 1494 1220 2300  923  205 2442 2329    0  Washington D.C.
;
```
```
*-----Fill in Upper Triangle of Distance Matrix---------------;
* Transpose the numeric mileage columns and carry City along unchanged;
proc transpose data=mileages out=tran;
   copy city;
run;
```
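```
data mileages(type=distance);
   merge mileages tran;
   array var[*] atlanta--dc;
   array col[*] col1-col10;
   * SUM ignores missing values, so the empty upper-triangle cells;
   * pick up the transposed mileages;
   do i = 1 to 10;
      var[i] = sum(var[i], col[i]);
   end;
   drop col1-col10 _name_ i;
run;
```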
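
One quick way to confirm that the upper triangle was filled in is to print the completed matrix. This is only an illustrative check, not part of the original example; it assumes the `mileages` data set created by the preceding steps.

```
*-----Check the Completed Distance Matrix (illustrative)-----;
proc print data=mileages;
   id city;
   var atlanta--dc;
run;
```

If the fill-in step worked, the matrix is symmetric: for instance, the Atlanta row should show 587 under Chicago, matching the Chicago row's value under Atlanta.
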
```
*-----Clustering with K-Nearest-Neighbor Density Estimates-----;
proc modeclus data=mileages all m=1 k=3;
   id CITY;
run;
```

Output 57.2.1 Clustering with K-Nearest-Neighbor Density Estimates
 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure

Nearest Neighbor List

| City | Neighbor | Distance |
|---|---|---|
| Atlanta | Washington D.C. | 543.0000000 |
| | Chicago | 587.0000000 |
| Chicago | Atlanta | 587.0000000 |
| | Washington D.C. | 597.0000000 |
| Denver | Los Angeles | 831.0000000 |
| | Houston | 879.0000000 |
| Houston | Atlanta | 701.0000000 |
| | Denver | 879.0000000 |
| Los Angeles | San Francisco | 347.0000000 |
| | Denver | 831.0000000 |
| Miami | Atlanta | 604.0000000 |
| | Washington D.C. | 923.0000000 |
| New York | Washington D.C. | 205.0000000 |
| | Chicago | 713.0000000 |
| San Francisco | Los Angeles | 347.0000000 |
| | Seattle | 678.0000000 |
| Seattle | San Francisco | 678.0000000 |
| | Los Angeles | 959.0000000 |
| Washington D.C. | New York | 205.0000000 |
| | Atlanta | 543.0000000 |

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure
K=3 METHOD=1

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.00025554 | 0.0005275 | 0 | 0.0005275 | 1.000 |
| | Chicago | 0.00025126 | 0.00053178 | 0 | 0.00053178 | 1.000 |
| | Houston | 0.00017065 | 0.00025554 | 0.00017065 | 0.00042619 | 0.600 |
| | Miami | 0.00016251 | 0.00053178 | 0 | 0.00053178 | 1.000 |
| | New York | 0.00021038 | 0.0005275 | 0 | 0.0005275 | 1.000 |
| | Washington D.C. | 0.00027624 | 0.00046592 | 0 | 0.00046592 | 1.000 |
| 2 | Denver | 0.00017065 | 0.00018051 | 0.00017065 | 0.00035115 | 0.514 |
| | Los Angeles | 0.00018051 | 0.00039189 | 0 | 0.00039189 | 1.000 |
| | San Francisco | 0.00022124 | 0.00033692 | 0 | 0.00033692 | 1.000 |
| | Seattle | 0.00015641 | 0.00040174 | 0 | 0.00040174 | 1.000 |

Boundary Objects

| City | Density | Cluster | Cluster Proportion 1 | Cluster Proportion 2 |
|---|---|---|---|---|
| Denver | 0.0001706485 | 2 | 0.486 | 0.514 |
| Houston | 0.0001706485 | 1 | 0.600 | 0.400 |

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Density |
|---|---|---|---|---|
| 1 | 6 | 0.00027624 | 1 | 0.00017065 |
| 2 | 4 | 0.00022124 | 1 | 0.00017065 |

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary

| K | Number of Clusters | Frequency of Unclassified Objects |
|---|---|---|
| 3 | 2 | 0 |
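
The Estimated Density values in Output 57.2.1 can be reproduced by hand. The following is a sketch inferred from the listed numbers, not a full statement of the procedure's definition. It assumes the estimate is the neighbor count divided by the sample size times the neighborhood volume, and that distance data are treated as one-dimensional, so the "volume" is the interval length $2r_i$. The symbols $n$, $k$, and $r_i$ are notation introduced here: $n = 10$ cities, $k = 3$ neighbors counting the city itself, and $r_i$ the distance to the second neighbor listed for city $i$.

$$
\hat f_i = \frac{k}{n \cdot 2 r_i}
$$

$$
\hat f_{\text{Atlanta}} = \frac{3}{10 \cdot 2 \cdot 587} = \frac{3}{11740} \approx 0.00025554,
\qquad
\hat f_{\text{Washington D.C.}} = \frac{3}{10 \cdot 2 \cdot 543} = \frac{3}{10860} \approx 0.00027624
$$

$$
\hat f_{\text{Denver}} = \frac{3}{10 \cdot 2 \cdot 879} = \frac{3}{17580} \approx 0.00017065
$$

Houston's second neighbor is also 879 miles away (Denver), which is why the two boundary objects share the same density.
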

The following statements produce Output 57.2.2:

```
*------Clustering with Uniform-Kernel Density Estimates--------;
proc modeclus data=mileages all m=1 r=600 800;
   id CITY;
run;
```

Output 57.2.2 Clustering with Uniform-Kernel Density Estimates
 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure

Nearest Neighbor List

| City | Neighbor | Distance |
|---|---|---|
| Atlanta | Washington D.C. | 543.0000000 |
| | Chicago | 587.0000000 |
| | Miami | 604.0000000 |
| | Houston | 701.0000000 |
| | New York | 748.0000000 |
| Chicago | Atlanta | 587.0000000 |
| | Washington D.C. | 597.0000000 |
| | New York | 713.0000000 |
| Houston | Atlanta | 701.0000000 |
| Los Angeles | San Francisco | 347.0000000 |
| Miami | Atlanta | 604.0000000 |
| New York | Washington D.C. | 205.0000000 |
| | Chicago | 713.0000000 |
| | Atlanta | 748.0000000 |
| San Francisco | Los Angeles | 347.0000000 |
| | Seattle | 678.0000000 |
| Seattle | San Francisco | 678.0000000 |
| Washington D.C. | New York | 205.0000000 |
| | Atlanta | 543.0000000 |
| | Chicago | 597.0000000 |

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure
R=600 METHOD=1

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | Chicago | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | New York | 0.00016667 | 0.00033333 | 0 | 0.00033333 | 1.000 |
| | Washington D.C. | 0.00033333 | 0.00066667 | 0 | 0.00066667 | 1.000 |
| 2 | Los Angeles | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | San Francisco | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| 3 | Denver | 0.00008333 | 0 | 0 | 0 | . |
| 4 | Houston | 0.00008333 | 0 | 0 | 0 | . |
| 5 | Miami | 0.00008333 | 0 | 0 | 0 | . |
| 6 | Seattle | 0.00008333 | 0 | 0 | 0 | . |

No Boundary Objects

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Density |
|---|---|---|---|---|
| 1 | 4 | 0.00033333 | 0 | . |
| 2 | 2 | 0.00016667 | 0 | . |
| 3 | 1 | 0.00008333 | 0 | . |
| 4 | 1 | 0.00008333 | 0 | . |
| 5 | 1 | 0.00008333 | 0 | . |
| 6 | 1 | 0.00008333 | 0 | . |

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure
R=800 METHOD=1

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.000375 | 0.001 | 0 | 0.001 | 1.000 |
| | Chicago | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Houston | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | Miami | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | New York | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Washington D.C. | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| 2 | Los Angeles | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| | San Francisco | 0.0001875 | 0.00025 | 0 | 0.00025 | 1.000 |
| | Seattle | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| 3 | Denver | 0.0000625 | 0 | 0 | 0 | . |

No Boundary Objects

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Density |
|---|---|---|---|---|
| 1 | 6 | 0.000375 | 0 | . |
| 2 | 3 | 0.0001875 | 0 | . |
| 3 | 1 | 0.0000625 | 0 | . |

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary

| R | Number of Clusters | Frequency of Unclassified Objects |
|---|---|---|
| 600 | 6 | 0 |
| 800 | 3 | 0 |
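
The uniform-kernel estimates in Output 57.2.2 follow the same pattern with a fixed radius. As a sketch inferred from the listed values (again assuming one-dimensional volume $2R$ for distance data; $n_i$ is notation introduced here for the number of cities within distance $R$ of city $i$, counting the city itself, and $n = 10$):

$$
\hat f_i = \frac{n_i}{n \cdot 2R}
$$

$$
R = 600:\quad \hat f_{\text{Atlanta}} = \frac{3}{10 \cdot 1200} = 0.00025,
\qquad \hat f_{\text{Denver}} = \frac{1}{10 \cdot 1200} \approx 0.00008333
$$

$$
R = 800:\quad \hat f_{\text{Atlanta}} = \frac{6}{10 \cdot 1600} = 0.000375,
\qquad \hat f_{\text{Denver}} = \frac{1}{10 \cdot 1600} = 0.0000625
$$

The Same Cluster column appears to sum the estimates of a city's neighbors, excluding the city itself. For Atlanta at R=600, the neighbors are Washington D.C. and Chicago, giving $0.00033333 + 0.00025 = 0.00058333$. Denver has no neighbor within 800 miles, so its sums are 0 and it forms a singleton cluster at both radii.
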

The following statements produce Output 57.2.3:

```
*------Clustering Neighborhoods Extended to Nearest Neighbor--------;
proc modeclus data=mileages list m=1 ck=2 r=600 800;
   id CITY;
run;
```

Output 57.2.3 Uniform-Kernel Density Estimates, Clustering Neighborhoods Extended to Nearest Neighbor
 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure
CK=2 R=600 METHOD=1

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | Chicago | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | Houston | 0.00008333 | 0.00025 | 0 | 0.00025 | 1.000 |
| | Miami | 0.00008333 | 0.00025 | 0 | 0.00025 | 1.000 |
| | New York | 0.00016667 | 0.00033333 | 0 | 0.00033333 | 1.000 |
| | Washington D.C. | 0.00033333 | 0.00066667 | 0 | 0.00066667 | 1.000 |
| 2 | Denver | 0.00008333 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | Los Angeles | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | San Francisco | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | Seattle | 0.00008333 | 0.00016667 | 0 | 0.00016667 | 1.000 |

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Density |
|---|---|---|---|---|
| 1 | 6 | 0.00033333 | 0 | . |
| 2 | 4 | 0.00016667 | 0 | . |

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure
CK=2 R=800 METHOD=1

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.000375 | 0.001 | 0 | 0.001 | 1.000 |
| | Chicago | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Houston | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | Miami | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | New York | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Washington D.C. | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| 2 | Denver | 0.0000625 | 0.000125 | 0 | 0.000125 | 1.000 |
| | Los Angeles | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| | San Francisco | 0.0001875 | 0.00025 | 0 | 0.00025 | 1.000 |
| | Seattle | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Density |
|---|---|---|---|---|
| 1 | 6 | 0.000375 | 0 | . |
| 2 | 4 | 0.0001875 | 0 | . |

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary

| R | CK | Number of Clusters | Frequency of Unclassified Objects |
|---|---|---|---|
| 600 | 2 | 2 | 0 |
| 800 | 2 | 2 | 0 |
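
To work with the cluster assignments rather than only read them from the listing, the results can be written to a data set. The following is a minimal sketch, not part of the original example: it assumes the OUT= option of the PROC MODECLUS statement, and `clusout` is a hypothetical data set name chosen for illustration.

```
*-----Save Cluster Assignments (illustrative sketch)-----;
proc modeclus data=mileages m=1 ck=2 r=600 800 out=clusout;
   id CITY;
run;

proc print data=clusout;
run;
```

The printed data set should contain one block of observations per radius, so the assignments at R=600 and R=800 can be compared side by side with the Cluster Summary above.
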
