# The MODECLUS Procedure

### Example 60.2 Cluster Analysis of Flying Mileages between Ten American Cities

This example uses distance data and illustrates the use of the TRANSPOSE procedure and the DATA step to fill in the upper triangle of the distance matrix. A data set containing a table of flying mileages between 10 U.S. cities is available in the `Sashelp` library. The results are displayed in Output 60.2.1 through Output 60.2.3.
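The fill-in technique can be sketched on a toy matrix before it is applied to the real data. The following is a minimal sketch, assuming a 3-by-3 lower-triangular distance matrix with made-up values; the data set names `lower`, `upper`, and `full` and the variables `a`, `b`, `c`, and `name` are hypothetical and exist only for illustration.

```sas
/* Toy lower-triangular distance matrix (hypothetical values) */
data lower;
   input a b c name $;
   datalines;
0 . . a
3 0 . b
4 5 0 c
;

/* Transposing moves the lower triangle into the upper triangle (COL1-COL3) */
proc transpose data=lower out=upper;
   copy name;
run;

/* SUM ignores missing values, so each cell keeps whichever triangle is nonmissing */
data full(type=distance drop=col: _: i);
   merge lower upper;
   array var[3] a--c;
   array col[3];
   do i = 1 to 3;
      var[i] = sum(var[i], col[i]);
   end;
run;

proc print data=full noobs;
run;
```

Printing `full` is a quick way to confirm that the matrix is symmetric before it is passed to a procedure that expects a complete distance matrix.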

The following statements produce Output 60.2.1:

```
title 'Modeclus Analysis of 10 American Cities';
title2 'Based on Flying Mileages';

*-----Fill in Upper Triangle of Distance Matrix---------------;
proc transpose data=sashelp.mileages out=tran;
   copy city;
run;
```
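```
/* Merge each original row with the matching transposed row (COL1-COL10) */
data mileages(type=distance drop=col: _: i);
   merge sashelp.mileages tran;
   array var[10] atlanta--washingtondc;
   array col[10];
   do i = 1 to 10;
      /* SUM ignores missing values, so the nonmissing triangle fills each cell */
      var[i] = sum(var[i], col[i]);
   end;
run;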
```
```
*-----Clustering with K-Nearest-Neighbor Density Estimates-----;
proc modeclus data=mileages all m=1 k=3;
   id CITY;
run;
```

Output 60.2.1: Clustering with K-Nearest-Neighbor Density Estimates

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure

Nearest Neighbor List

| City | Neighbor | Distance |
|---|---|---|
| Atlanta | Washington D.C. | 543.0000000 |
| | Chicago | 587.0000000 |
| Chicago | Atlanta | 587.0000000 |
| | Washington D.C. | 597.0000000 |
| Denver | Los Angeles | 831.0000000 |
| | Houston | 879.0000000 |
| Houston | Atlanta | 701.0000000 |
| | Denver | 879.0000000 |
| Los Angeles | San Francisco | 347.0000000 |
| | Denver | 831.0000000 |
| Miami | Atlanta | 604.0000000 |
| | Washington D.C. | 923.0000000 |
| New York | Washington D.C. | 205.0000000 |
| | Chicago | 713.0000000 |
| San Francisco | Los Angeles | 347.0000000 |
| | Seattle | 678.0000000 |
| Seattle | San Francisco | 678.0000000 |
| | Los Angeles | 959.0000000 |
| Washington D.C. | New York | 205.0000000 |
| | Atlanta | 543.0000000 |

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure
K=3 METHOD=1

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.00025554 | 0.0005275 | 0 | 0.0005275 | 1.000 |
| | Chicago | 0.00025126 | 0.00053178 | 0 | 0.00053178 | 1.000 |
| | Houston | 0.00017065 | 0.00025554 | 0.00017065 | 0.00042619 | 0.600 |
| | Miami | 0.00016251 | 0.00053178 | 0 | 0.00053178 | 1.000 |
| | New York | 0.00021038 | 0.0005275 | 0 | 0.0005275 | 1.000 |
| | Washington D.C. | 0.00027624 | 0.00046592 | 0 | 0.00046592 | 1.000 |
| 2 | Denver | 0.00017065 | 0.00018051 | 0.00017065 | 0.00035115 | 0.514 |
| | Los Angeles | 0.00018051 | 0.00039189 | 0 | 0.00039189 | 1.000 |
| | San Francisco | 0.00022124 | 0.00033692 | 0 | 0.00033692 | 1.000 |
| | Seattle | 0.00015641 | 0.00040174 | 0 | 0.00040174 | 1.000 |

Boundary Objects

| City | Density | Cluster | Cluster Proportion 1 | Cluster Proportion 2 |
|---|---|---|---|---|
| Denver | 0.0001706485 | 2 | 0.486 | 0.514 |
| Houston | 0.0001706485 | 1 | 0.600 | 0.400 |

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
|---|---|---|---|---|
| 1 | 6 | 0.00027624 | 1 | 0.00017065 |
| 2 | 4 | 0.00022124 | 1 | 0.00017065 |

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary

| K | Number of Clusters | Frequency of Unclassified Objects |
|---|---|---|
| 3 | 2 | 0 |
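The Estimated Density values in Output 60.2.1 can be checked by hand. The following is a minimal sketch, not the procedure's own code: it assumes that the neighborhood volume for distance data is one-dimensional (an interval of length 2r) and that the city itself counts as one of the K=3 neighbors, so r is the distance to the second listed neighbor. Both assumptions are inferred from the printed numbers. The data set name `knn_check` and its variables are hypothetical.

```sas
/* Radius = distance to the second listed neighbor in the Nearest Neighbor List */
data knn_check;
   input city $ radius;
   n = 10;                           /* number of cities                   */
   k = 3;                            /* K= value, assumed to include self  */
   density = k / (n * 2 * radius);   /* count / (n * assumed 1-D volume)   */
   format density 10.8;
   datalines;
Atlanta 587
Chicago 597
Denver 879
;

proc print data=knn_check noobs;
run;
```

Under these assumptions the computed densities agree with the Estimated Density column: 0.00025554 for Atlanta, 0.00025126 for Chicago, and 0.00017065 for Denver.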

The following statements produce Output 60.2.2:

```
*------Clustering with Uniform-Kernel Density Estimates--------;
proc modeclus data=mileages all m=1 r=600 800;
   id CITY;
run;
```

Output 60.2.2: Clustering with Uniform-Kernel Density Estimates

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure

Nearest Neighbor List

| City | Neighbor | Distance |
|---|---|---|
| Atlanta | Washington D.C. | 543.0000000 |
| | Chicago | 587.0000000 |
| | Miami | 604.0000000 |
| | Houston | 701.0000000 |
| | New York | 748.0000000 |
| Chicago | Atlanta | 587.0000000 |
| | Washington D.C. | 597.0000000 |
| | New York | 713.0000000 |
| Houston | Atlanta | 701.0000000 |
| Los Angeles | San Francisco | 347.0000000 |
| Miami | Atlanta | 604.0000000 |
| New York | Washington D.C. | 205.0000000 |
| | Chicago | 713.0000000 |
| | Atlanta | 748.0000000 |
| San Francisco | Los Angeles | 347.0000000 |
| | Seattle | 678.0000000 |
| Seattle | San Francisco | 678.0000000 |
| Washington D.C. | New York | 205.0000000 |
| | Atlanta | 543.0000000 |
| | Chicago | 597.0000000 |

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure
R=600 METHOD=1

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | Chicago | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | New York | 0.00016667 | 0.00033333 | 0 | 0.00033333 | 1.000 |
| | Washington D.C. | 0.00033333 | 0.00066667 | 0 | 0.00066667 | 1.000 |
| 2 | Los Angeles | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | San Francisco | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| 3 | Denver | 0.00008333 | 0 | 0 | 0 | . |
| 4 | Houston | 0.00008333 | 0 | 0 | 0 | . |
| 5 | Miami | 0.00008333 | 0 | 0 | 0 | . |
| 6 | Seattle | 0.00008333 | 0 | 0 | 0 | . |

No Boundary Objects

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
|---|---|---|---|---|
| 1 | 4 | 0.00033333 | 0 | . |
| 2 | 2 | 0.00016667 | 0 | . |
| 3 | 1 | 0.00008333 | 0 | . |
| 4 | 1 | 0.00008333 | 0 | . |
| 5 | 1 | 0.00008333 | 0 | . |
| 6 | 1 | 0.00008333 | 0 | . |

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure
R=800 METHOD=1

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.000375 | 0.001 | 0 | 0.001 | 1.000 |
| | Chicago | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Houston | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | Miami | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | New York | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Washington D.C. | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| 2 | Los Angeles | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| | San Francisco | 0.0001875 | 0.00025 | 0 | 0.00025 | 1.000 |
| | Seattle | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| 3 | Denver | 0.0000625 | 0 | 0 | 0 | . |

No Boundary Objects

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
|---|---|---|---|---|
| 1 | 6 | 0.000375 | 0 | . |
| 2 | 3 | 0.0001875 | 0 | . |
| 3 | 1 | 0.0000625 | 0 | . |

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary

| R | Number of Clusters | Frequency of Unclassified Objects |
|---|---|---|
| 600 | 6 | 0 |
| 800 | 3 | 0 |
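The uniform-kernel densities in Output 60.2.2 follow the same kind of arithmetic, with a fixed radius in place of a fixed neighbor count. The following is a minimal sketch under the same inferred assumptions as before (one-dimensional volume 2R, with the city itself included in the count); the data set name `kernel_check` and its variables are hypothetical, and the neighbor counts are read off the Nearest Neighbor List in Output 60.2.2.

```sas
/* neighbors = number of other cities within distance r of the city */
data kernel_check;
   input city $ r neighbors;
   n = 10;                                   /* number of cities             */
   density = (neighbors + 1) / (n * 2 * r);  /* +1 counts the city itself    */
   format density 10.8;
   datalines;
Atlanta 600 2
Atlanta 800 5
Denver 600 0
Denver 800 0
;

proc print data=kernel_check noobs;
run;
```

Under these assumptions the results agree with the Estimated Density columns: 0.00025 and 0.000375 for Atlanta, and 0.00008333 and 0.0000625 for Denver. Denver has no neighbor within either radius, which is consistent with its appearing as a singleton cluster in Output 60.2.2.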

The following statements produce Output 60.2.3:

```
*------Clustering Neighborhoods Extended to Nearest Neighbor--------;
proc modeclus data=mileages list m=1 ck=2 r=600 800;
   id CITY;
run;
```

Output 60.2.3: Uniform-Kernel Density Estimates, Clustering Neighborhoods Extended to Nearest Neighbor

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure
CK=2 R=600 METHOD=1

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | Chicago | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | Houston | 0.00008333 | 0.00025 | 0 | 0.00025 | 1.000 |
| | Miami | 0.00008333 | 0.00025 | 0 | 0.00025 | 1.000 |
| | New York | 0.00016667 | 0.00033333 | 0 | 0.00033333 | 1.000 |
| | Washington D.C. | 0.00033333 | 0.00066667 | 0 | 0.00066667 | 1.000 |
| 2 | Denver | 0.00008333 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | Los Angeles | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | San Francisco | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | Seattle | 0.00008333 | 0.00016667 | 0 | 0.00016667 | 1.000 |

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
|---|---|---|---|---|
| 1 | 6 | 0.00033333 | 0 | . |
| 2 | 4 | 0.00016667 | 0 | . |

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure
CK=2 R=800 METHOD=1

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.000375 | 0.001 | 0 | 0.001 | 1.000 |
| | Chicago | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Houston | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | Miami | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | New York | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Washington D.C. | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| 2 | Denver | 0.0000625 | 0.000125 | 0 | 0.000125 | 1.000 |
| | Los Angeles | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| | San Francisco | 0.0001875 | 0.00025 | 0 | 0.00025 | 1.000 |
| | Seattle | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
|---|---|---|---|---|
| 1 | 6 | 0.000375 | 0 | . |
| 2 | 4 | 0.0001875 | 0 | . |

 Modeclus Analysis of 10 American Cities Based on Flying Mileages

The MODECLUS Procedure

Cluster Summary

| R | CK | Number of Clusters | Frequency of Unclassified Objects |
|---|---|---|---|
| 600 | 2 | 2 | 0 |
| 800 | 2 | 2 | 0 |