This example uses distance data and illustrates the use of the TRANSPOSE procedure and the DATA step to fill in the upper triangle of the distance matrix. A data set containing a table of flying mileages between 10 U.S. cities is available in the Sashelp library. The results are displayed in Output 59.2.1 through Output 59.2.3.
The following statements produce Output 59.2.1:

    title 'Modeclus Analysis of 10 American Cities';
    title2 'Based on Flying Mileages';

    *-----Fill in Upper Triangle of Distance Matrix---------------;
    proc transpose data=sashelp.mileages out=tran;
       copy city;
    run;

    data mileages(type=distance drop=col: _: i);
       merge sashelp.mileages tran;
       array var[10] atlanta--washingtondc;
       array col[10];
       do i = 1 to 10;
          var[i] = sum(var[i], col[i]);
       end;
    run;

    *-----Clustering with K-Nearest-Neighbor Density Estimates-----;
    proc modeclus data=mileages all m=1 k=3;
       id CITY;
    run;

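
As a minimal sketch of the fill-in technique used in the preceding statements, the following code applies the same TRANSPOSE and DATA step pattern to a three-city subset. The data set names `lower`, `upper`, and `full` are hypothetical and are not part of the original example; the three distances are the ones that appear in the nearest neighbor lists of this example.

```sas
/* Hypothetical lower-triangular subset; LOWER, UPPER, and FULL are illustrative names. */
data lower;
   input city $ 1-15 atlanta chicago washingtondc;
   datalines;
Atlanta            0    .    .
Chicago          587    0    .
Washington D.C.  543  597    0
;

/* Transposing turns the lower triangle into an upper triangle stored in COL1-COL3. */
proc transpose data=lower out=upper;
   copy city;
run;

/* SUM ignores missing values, so each cell takes the value from whichever
   triangle holds it; the diagonal stays 0. */
data full(type=distance drop=col: _: i);
   merge lower upper;
   array var[3] atlanta--washingtondc;
   array col[3];
   do i = 1 to 3;
      var[i] = sum(var[i], col[i]);
   end;
run;

proc print data=full;
   id city;
run;
```

The printed matrix should be symmetric: for instance, 587 should appear in both the Atlanta row under `chicago` and the Chicago row under `atlanta`.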

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Nearest Neighbor List

| City | Neighbor | Distance |
|---|---|---|
| Atlanta | Washington D.C. | 543.0000000 |
| | Chicago | 587.0000000 |
| Chicago | Atlanta | 587.0000000 |
| | Washington D.C. | 597.0000000 |
| Denver | Los Angeles | 831.0000000 |
| | Houston | 879.0000000 |
| Houston | Atlanta | 701.0000000 |
| | Denver | 879.0000000 |
| Los Angeles | San Francisco | 347.0000000 |
| | Denver | 831.0000000 |
| Miami | Atlanta | 604.0000000 |
| | Washington D.C. | 923.0000000 |
| New York | Washington D.C. | 205.0000000 |
| | Chicago | 713.0000000 |
| San Francisco | Los Angeles | 347.0000000 |
| | Seattle | 678.0000000 |
| Seattle | San Francisco | 678.0000000 |
| | Los Angeles | 959.0000000 |
| Washington D.C. | New York | 205.0000000 |
| | Atlanta | 543.0000000 |

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.00025554 | 0.0005275 | 0 | 0.0005275 | 1.000 |
| | Chicago | 0.00025126 | 0.00053178 | 0 | 0.00053178 | 1.000 |
| | Houston | 0.00017065 | 0.00025554 | 0.00017065 | 0.00042619 | 0.600 |
| | Miami | 0.00016251 | 0.00053178 | 0 | 0.00053178 | 1.000 |
| | New York | 0.00021038 | 0.0005275 | 0 | 0.0005275 | 1.000 |
| | Washington D.C. | 0.00027624 | 0.00046592 | 0 | 0.00046592 | 1.000 |
| 2 | Denver | 0.00017065 | 0.00018051 | 0.00017065 | 0.00035115 | 0.514 |
| | Los Angeles | 0.00018051 | 0.00039189 | 0 | 0.00039189 | 1.000 |
| | San Francisco | 0.00022124 | 0.00033692 | 0 | 0.00033692 | 1.000 |
| | Seattle | 0.00015641 | 0.00040174 | 0 | 0.00040174 | 1.000 |

Boundary Objects

| City | Density | Cluster | Cluster 1 Proportion | Cluster 2 Proportion |
|---|---|---|---|---|
| Denver | 0.0001706485 | 2 | 0.486 | 0.514 |
| Houston | 0.0001706485 | 1 | 0.600 | 0.400 |

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
|---|---|---|---|---|
| 1 | 6 | 0.00027624 | 1 | 0.00017065 |
| 2 | 4 | 0.00022124 | 1 | 0.00017065 |

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Cluster Summary

| K | Number of Clusters | Frequency of Unclassified Objects |
|---|---|---|
| 3 | 2 | 0 |

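
The Estimated Density column for K=3 is consistent with a k-nearest-neighbor estimate computed in one dimension. This is an assumption made here for illustration (this page does not state the formula), but it reproduces the printed values: with $n = 10$ cities, $k = 3$ neighbors counting the city itself, and $r_k(i)$ the distance from city $i$ to the farthest of those neighbors, the neighborhood "volume" is the interval length $2 r_k(i)$. The worked values below check this against the Atlanta, Chicago, Washington D.C., and Houston rows, and then against the neighborhood sums.

```latex
\[
\hat{f}_i = \frac{k}{n \cdot 2\, r_k(i)}
\]
\[
\begin{aligned}
\hat{f}_{\mathrm{Atlanta}}    &= \frac{3}{10 \cdot 2 \cdot 587} = \frac{3}{11740} \approx 0.00025554 \\
\hat{f}_{\mathrm{Chicago}}    &= \frac{3}{10 \cdot 2 \cdot 597} = \frac{3}{11940} \approx 0.00025126 \\
\hat{f}_{\mathrm{Washington}} &= \frac{3}{10 \cdot 2 \cdot 543} = \frac{3}{10860} \approx 0.00027624 \\
\hat{f}_{\mathrm{Houston}}    &= \frac{3}{10 \cdot 2 \cdot 879} = \frac{3}{17580} \approx 0.00017065
\end{aligned}
\]
% Same Cluster sum for Atlanta: densities of its two neighbors (both in cluster 1)
\[
\hat{f}_{\mathrm{Washington}} + \hat{f}_{\mathrm{Chicago}}
  \approx 0.00027624 + 0.00025126 = 0.00052750
\]
% Houston's neighbors are Atlanta (cluster 1) and Denver (cluster 2)
\[
\frac{\text{Same}}{\text{Total}}
  = \frac{0.00025554}{0.00025554 + 0.00017065}
  = \frac{0.00025554}{0.00042619} \approx 0.600
\]
```

Houston's proportion falls below 1 because one of its two neighbors (Denver) belongs to the other cluster, which is why Houston is listed as a boundary object.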
The following statements produce Output 59.2.2:

    *------Clustering with Uniform-Kernel Density Estimates--------;
    proc modeclus data=mileages all m=1 r=600 800;
       id CITY;
    run;


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Nearest Neighbor List

| City | Neighbor | Distance |
|---|---|---|
| Atlanta | Washington D.C. | 543.0000000 |
| | Chicago | 587.0000000 |
| | Miami | 604.0000000 |
| | Houston | 701.0000000 |
| | New York | 748.0000000 |
| Chicago | Atlanta | 587.0000000 |
| | Washington D.C. | 597.0000000 |
| | New York | 713.0000000 |
| Houston | Atlanta | 701.0000000 |
| Los Angeles | San Francisco | 347.0000000 |
| Miami | Atlanta | 604.0000000 |
| New York | Washington D.C. | 205.0000000 |
| | Chicago | 713.0000000 |
| | Atlanta | 748.0000000 |
| San Francisco | Los Angeles | 347.0000000 |
| | Seattle | 678.0000000 |
| Seattle | San Francisco | 678.0000000 |
| Washington D.C. | New York | 205.0000000 |
| | Atlanta | 543.0000000 |
| | Chicago | 597.0000000 |

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | Chicago | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | New York | 0.00016667 | 0.00033333 | 0 | 0.00033333 | 1.000 |
| | Washington D.C. | 0.00033333 | 0.00066667 | 0 | 0.00066667 | 1.000 |
| 2 | Los Angeles | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | San Francisco | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| 3 | Denver | 0.00008333 | 0 | 0 | 0 | . |
| 4 | Houston | 0.00008333 | 0 | 0 | 0 | . |
| 5 | Miami | 0.00008333 | 0 | 0 | 0 | . |
| 6 | Seattle | 0.00008333 | 0 | 0 | 0 | . |

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
|---|---|---|---|---|
| 1 | 4 | 0.00033333 | 0 | . |
| 2 | 2 | 0.00016667 | 0 | . |
| 3 | 1 | 0.00008333 | 0 | . |
| 4 | 1 | 0.00008333 | 0 | . |
| 5 | 1 | 0.00008333 | 0 | . |
| 6 | 1 | 0.00008333 | 0 | . |

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.000375 | 0.001 | 0 | 0.001 | 1.000 |
| | Chicago | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Houston | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | Miami | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | New York | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Washington D.C. | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| 2 | Los Angeles | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| | San Francisco | 0.0001875 | 0.00025 | 0 | 0.00025 | 1.000 |
| | Seattle | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| 3 | Denver | 0.0000625 | 0 | 0 | 0 | . |

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
|---|---|---|---|---|
| 1 | 6 | 0.000375 | 0 | . |
| 2 | 3 | 0.0001875 | 0 | . |
| 3 | 1 | 0.0000625 | 0 | . |

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Cluster Summary

| R | Number of Clusters | Frequency of Unclassified Objects |
|---|---|---|
| 600 | 6 | 0 |
| 800 | 3 | 0 |

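
The uniform-kernel estimates for R=600 and R=800 can be checked by counting, for each city, how many cities (itself included) lie within the radius R and dividing by n times the interval length 2R. The following DATA step is a sketch under the assumption that the neighborhood volume is one-dimensional, which matches the printed values; it is not taken from the original example, and the data set name `kernel_check` and the variables `r`, `count`, and `est` are hypothetical.

```sas
/* Assumes the MILEAGES data set created earlier in this example. */
data kernel_check;
   set mileages;
   array d[10] atlanta--washingtondc;
   n = dim(d);                      /* 10 cities; the matrix is square */
   do r = 600, 800;
      count = 0;
      do j = 1 to n;
         /* the zero distance on the diagonal counts the city itself */
         if d[j] <= r then count = count + 1;
      end;
      est = count / (n * 2 * r);    /* one-dimensional volume 2R (assumption) */
      output;
   end;
   keep city r count est;
run;

proc print data=kernel_check;
   format est 10.8;
run;
```

For Atlanta this should give 3 cities within 600 miles and 6 within 800 miles, that is, 3/12000 = 0.00025 and 6/16000 = 0.000375, in agreement with the Estimated Density values shown for the two radii. Denver has no neighbor within either radius, so its count is 1 and it forms a cluster by itself.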
The following statements produce Output 59.2.3:

    *------Clustering Neighborhoods Extended to Nearest Neighbor--------;
    proc modeclus data=mileages list m=1 ck=2 r=600 800;
       id CITY;
    run;


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | Chicago | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | Houston | 0.00008333 | 0.00025 | 0 | 0.00025 | 1.000 |
| | Miami | 0.00008333 | 0.00025 | 0 | 0.00025 | 1.000 |
| | New York | 0.00016667 | 0.00033333 | 0 | 0.00033333 | 1.000 |
| | Washington D.C. | 0.00033333 | 0.00066667 | 0 | 0.00066667 | 1.000 |
| 2 | Denver | 0.00008333 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | Los Angeles | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | San Francisco | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | Seattle | 0.00008333 | 0.00016667 | 0 | 0.00016667 | 1.000 |

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
|---|---|---|---|---|
| 1 | 6 | 0.00033333 | 0 | . |
| 2 | 4 | 0.00016667 | 0 | . |

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Sums of Density Estimates Within Neighborhood

| Cluster | City | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | Atlanta | 0.000375 | 0.001 | 0 | 0.001 | 1.000 |
| | Chicago | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Houston | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | Miami | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | New York | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | Washington D.C. | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| 2 | Denver | 0.0000625 | 0.000125 | 0 | 0.000125 | 1.000 |
| | Los Angeles | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| | San Francisco | 0.0001875 | 0.00025 | 0 | 0.00025 | 1.000 |
| | Seattle | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
|---|---|---|---|---|
| 1 | 6 | 0.000375 | 0 | . |
| 2 | 4 | 0.0001875 | 0 | . |

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Cluster Summary

| R | CK | Number of Clusters | Frequency of Unclassified Objects |
|---|---|---|---|
| 600 | 2 | 2 | 0 |
| 800 | 2 | 2 | 0 |
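
The effect of CK=2 can be read off the R=600 results. The interpretation sketched here is an assumption based on the printed values rather than a statement from this page: a city with no neighbor within R keeps its density estimate (one city in the neighborhood) but, for clustering, its neighborhood is extended to its single nearest neighbor, so its Same Cluster sum equals that neighbor's density estimate. The nearest neighbors below are taken from Output 59.2.1.

```latex
% Density estimates are unchanged by CK=2 (only the city itself lies within R = 600)
\[
\hat{f}_{\mathrm{Denver}} = \hat{f}_{\mathrm{Houston}} = \hat{f}_{\mathrm{Miami}}
  = \hat{f}_{\mathrm{Seattle}} = \frac{1}{10 \cdot 2 \cdot 600} \approx 0.00008333
\]
% Same Cluster sum S_i = density estimate of the nearest neighbor
\[
\begin{aligned}
S_{\mathrm{Denver}}  &= \hat{f}_{\mathrm{Los\ Angeles}}   = \tfrac{2}{12000} \approx 0.00016667 \\
S_{\mathrm{Houston}} &= \hat{f}_{\mathrm{Atlanta}}        = \tfrac{3}{12000} = 0.00025 \\
S_{\mathrm{Miami}}   &= \hat{f}_{\mathrm{Atlanta}}        = \tfrac{3}{12000} = 0.00025 \\
S_{\mathrm{Seattle}} &= \hat{f}_{\mathrm{San\ Francisco}} = \tfrac{2}{12000} \approx 0.00016667
\end{aligned}
\]
```

These four cities were single-city clusters in Output 59.2.2 at R=600; with the extended neighborhoods, Houston and Miami join the cluster containing Atlanta, and Denver and Seattle join the cluster containing Los Angeles and San Francisco, which gives the two clusters reported in the Cluster Summary.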