This example uses distance data and shows how to use the TRANSPOSE procedure and a DATA step to fill in the upper triangle of a distance matrix. The Sashelp library contains a data set, Sashelp.Mileages, that holds a table of flying mileages between 10 U.S. cities. The results are displayed in Output 66.2.1 through Output 66.2.3.
The following statements produce Output 66.2.1:
title 'Modeclus Analysis of 10 American Cities';
title2 'Based on Flying Mileages';

*-----Fill in Upper Triangle of Distance Matrix---------------;
proc transpose data=sashelp.mileages out=tran;
   copy city;
run;

data mileages(type=distance drop=col: _: i);
   /* One-to-one merge: each row of the original matrix is paired */
   /* with the matching row of its transpose (COL1-COL10).        */
   merge sashelp.mileages tran;
   array var[10] atlanta--washingtondc;
   array col[10];
   do i = 1 to 10;
      /* SUM ignores missing values, so a missing cell is filled */
      /* from its transposed counterpart.                        */
      var[i] = sum(var[i], col[i]);
   end;
run;

*-----Clustering with K-Nearest-Neighbor Density Estimates-----;
proc modeclus data=mileages all m=1 k=3;
   id CITY;
run;
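The transpose-and-merge step above can be tried on a small scale before running it on the full mileage table. The following is a minimal sketch on a made-up three-city lower-triangular matrix; the data set names `toy`, `toy_t`, and `toy_full`, the variables `a`, `b`, `c`, and the distances themselves are hypothetical and are not part of the original example.

```sas
/* Hypothetical 3x3 lower-triangular distance matrix (illustration only) */
data toy;
   input city $ a b c;
   datalines;
A 0 . .
B 5 0 .
C 7 3 0
;

/* Transpose the numeric columns; COPY carries CITY along unchanged */
proc transpose data=toy out=toy_t;
   copy city;
run;

/* Pair each row with the matching row of the transpose and let SUM, */
/* which ignores missing values, fill in the missing upper triangle. */
data toy_full(drop=col: _: i);
   merge toy toy_t;
   array var[3] a--c;
   array col[3];
   do i = 1 to 3;
      var[i] = sum(var[i], col[i]);
   end;
run;

proc print data=toy_full noobs;
run;
```

If the sketch behaves as intended, the printed matrix is symmetric: the 5, 7, and 3 entered below the diagonal also appear above it, and the diagonal stays 0.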
Output 66.2.1: Clustering with K-Nearest-Neighbor Density Estimates
Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Nearest Neighbor List

| City | Neighbor | Distance |
|---|---|---|
| Atlanta | Washington D.C. | 543.0000000 |
| | Chicago | 587.0000000 |
| Chicago | Atlanta | 587.0000000 |
| | Washington D.C. | 597.0000000 |
| Denver | Los Angeles | 831.0000000 |
| | Houston | 879.0000000 |
| Houston | Atlanta | 701.0000000 |
| | Denver | 879.0000000 |
| Los Angeles | San Francisco | 347.0000000 |
| | Denver | 831.0000000 |
| Miami | Atlanta | 604.0000000 |
| | Washington D.C. | 923.0000000 |
| New York | Washington D.C. | 205.0000000 |
| | Chicago | 713.0000000 |
| San Francisco | Los Angeles | 347.0000000 |
| | Seattle | 678.0000000 |
| Seattle | San Francisco | 678.0000000 |
| | Los Angeles | 959.0000000 |
| Washington D.C. | New York | 205.0000000 |
| | Atlanta | 543.0000000 |

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.00025554 | 0.0005275 | 0 | 0.0005275 | 1.000 |
| | Chicago | 0.00025126 | 0.00053178 | 0 | 0.00053178 | 1.000 |
| | Houston | 0.00017065 | 0.00025554 | 0.00017065 | 0.00042619 | 0.600 |
| | Miami | 0.00016251 | 0.00053178 | 0 | 0.00053178 | 1.000 |
| | New York | 0.00021038 | 0.0005275 | 0 | 0.0005275 | 1.000 |
| | Washington D.C. | 0.00027624 | 0.00046592 | 0 | 0.00046592 | 1.000 |
| 2 | Denver | 0.00017065 | 0.00018051 | 0.00017065 | 0.00035115 | 0.514 |
| | Los Angeles | 0.00018051 | 0.00039189 | 0 | 0.00039189 | 1.000 |
| | San Francisco | 0.00022124 | 0.00033692 | 0 | 0.00033692 | 1.000 |
| | Seattle | 0.00015641 | 0.00040174 | 0 | 0.00040174 | 1.000 |
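
The numbers in the two tables above can be checked by hand. The following worked arithmetic is a sketch under two assumptions that are inferred from the displayed values rather than stated in this section: the neighborhood volume for distance data is taken as $2r_i$ (a one-dimensional sphere of radius $r_i$), and the K=3 count $n_i$ includes the city itself, so $r_i$ is the distance to the second neighbor listed for that city.

```latex
\begin{aligned}
\hat{f}_i &= \frac{n_i}{n \, v_i}, \qquad v_i = 2 r_i
  \quad \text{(assumed one-dimensional volume)} \\[4pt]
\text{Atlanta:}\quad r_i &= 587,\; n_i = 3,\; n = 10
  \;\Rightarrow\;
  \hat{f}_i = \frac{3}{10 \cdot 2 \cdot 587} = \frac{3}{11740} \approx 0.00025554 \\[4pt]
\text{Houston:}\quad
  \text{Same} &= \hat{f}_{\text{Atlanta}} = 0.00025554, \qquad
  \text{Other} = \hat{f}_{\text{Denver}} = 0.00017065 \\
  \frac{\text{Same}}{\text{Total}} &= \frac{0.00025554}{0.00042619} \approx 0.600
\end{aligned}
```

Under these assumptions the results agree with the Estimated Density entry for Atlanta and with the Same Cluster, Other Clusters, Total, and proportion entries for Houston, whose two listed neighbors (Atlanta and Denver) fall in different clusters.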
The following statements produce Output 66.2.2:
*------Clustering with Uniform-Kernel Density Estimates--------;
proc modeclus data=mileages all m=1 r=600 800;
   id CITY;
run;
Output 66.2.2: Clustering with Uniform-Kernel Density Estimates

Nearest Neighbor List

| City | Neighbor | Distance |
|---|---|---|
| Atlanta | Washington D.C. | 543.0000000 |
| | Chicago | 587.0000000 |
| | Miami | 604.0000000 |
| | Houston | 701.0000000 |
| | New York | 748.0000000 |
| Chicago | Atlanta | 587.0000000 |
| | Washington D.C. | 597.0000000 |
| | New York | 713.0000000 |
| Houston | Atlanta | 701.0000000 |
| Los Angeles | San Francisco | 347.0000000 |
| Miami | Atlanta | 604.0000000 |
| New York | Washington D.C. | 205.0000000 |
| | Chicago | 713.0000000 |
| | Atlanta | 748.0000000 |
| San Francisco | Los Angeles | 347.0000000 |
| | Seattle | 678.0000000 |
| Seattle | San Francisco | 678.0000000 |
| Washington D.C. | New York | 205.0000000 |
| | Atlanta | 543.0000000 |
| | Chicago | 597.0000000 |

Sums of Density Estimates Within Neighborhood (R=600)

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | Chicago | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | New York | 0.00016667 | 0.00033333 | 0 | 0.00033333 | 1.000 |
| | Washington D.C. | 0.00033333 | 0.00066667 | 0 | 0.00066667 | 1.000 |
| 2 | Los Angeles | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | San Francisco | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| 3 | Denver | 0.00008333 | 0 | 0 | 0 | . |
| 4 | Houston | 0.00008333 | 0 | 0 | 0 | . |
| 5 | Miami | 0.00008333 | 0 | 0 | 0 | . |
| 6 | Seattle | 0.00008333 | 0 | 0 | 0 | . |

Sums of Density Estimates Within Neighborhood (R=800)

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.000375 | 0.001 | 0 | 0.001 | 1.000 |
| | Chicago | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Houston | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | Miami | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | New York | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Washington D.C. | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| 2 | Los Angeles | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| | San Francisco | 0.0001875 | 0.00025 | 0 | 0.00025 | 1.000 |
| | Seattle | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| 3 | Denver | 0.0000625 | 0 | 0 | 0 | . |
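
The uniform-kernel Estimated Density values shown above can be cross-checked with a short DATA step. This is a sketch, not the procedure's own algorithm: it depends on the `mileages` data set built earlier in this example, and it assumes a one-dimensional neighborhood volume of `2*r`, closed neighborhoods (distance less than or equal to the radius), and a count that includes the city itself. The names `kernel_check`, `n600`, `n800`, `density600`, `density800`, and `j` are hypothetical.

```sas
/* Sketch only: requires the MILEAGES data set created earlier.        */
/* Assumes volume = 2*r and that a neighborhood includes its center.   */
data kernel_check(keep=city density600 density800);
   set mileages;
   array d[10] atlanta--washingtondc;
   n600 = 0;
   n800 = 0;
   do j = 1 to 10;
      /* Guard against missing cells, which would otherwise compare */
      /* as smaller than any radius.                                */
      if not missing(d[j]) then do;
         /* The zero diagonal entry counts the city itself */
         n600 = n600 + (d[j] <= 600);
         n800 = n800 + (d[j] <= 800);
      end;
   end;
   /* count / (number of cities * assumed volume 2*r) */
   density600 = n600 / (10 * 2 * 600);
   density800 = n800 / (10 * 2 * 800);
run;

proc print data=kernel_check noobs;
   format density600 density800 10.8;
run;
```

For Atlanta, the neighbor list gives two cities within 600 miles and five within 800 miles, so under these assumptions the step yields 3/12000 = 0.00025 and 6/16000 = 0.000375, which agree with the Estimated Density column for the two radii.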
The following statements produce Output 66.2.3:
*------Clustering Neighborhoods Extended to Nearest Neighbor--------;
proc modeclus data=mileages list m=1 ck=2 r=600 800;
   id CITY;
run;
Output 66.2.3: Uniform-Kernel Density Estimates, Clustering Neighborhoods Extended to Nearest Neighbor

Sums of Density Estimates Within Neighborhood (R=600)

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | Chicago | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | Houston | 0.00008333 | 0.00025 | 0 | 0.00025 | 1.000 |
| | Miami | 0.00008333 | 0.00025 | 0 | 0.00025 | 1.000 |
| | New York | 0.00016667 | 0.00033333 | 0 | 0.00033333 | 1.000 |
| | Washington D.C. | 0.00033333 | 0.00066667 | 0 | 0.00066667 | 1.000 |
| 2 | Denver | 0.00008333 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | Los Angeles | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | San Francisco | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | Seattle | 0.00008333 | 0.00016667 | 0 | 0.00016667 | 1.000 |

Sums of Density Estimates Within Neighborhood (R=800)

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.000375 | 0.001 | 0 | 0.001 | 1.000 |
| | Chicago | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Houston | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | Miami | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | New York | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Washington D.C. | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| 2 | Denver | 0.0000625 | 0.000125 | 0 | 0.000125 | 1.000 |
| | Los Angeles | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| | San Francisco | 0.0001875 | 0.00025 | 0 | 0.00025 | 1.000 |
| | Seattle | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |