The MODECLUS Procedure
This example uses distance data and illustrates the use of the TRANSPOSE procedure and the DATA step to fill in the upper triangle of the distance matrix. The results are displayed in Output 57.2.1 through Output 57.2.3.
The following statements produce Output 57.2.1:
data mileages(type=distance);
   title 'Modeclus Analysis of 10 American Cities';
   title2 'Based on Flying Mileages';
   input (ATLANTA CHICAGO DENVER HOUSTON LOSANGELES
          MIAMI NEWYORK SANFRAN SEATTLE WASHDC) (5.)
          @53 CITY $15.;
   datalines;
    0                                               ATLANTA
  587    0                                          CHICAGO
 1212  920    0                                     DENVER
  701  940  879    0                                HOUSTON
 1936 1745  831 1374    0                           LOS ANGELES
  604 1188 1726  968 2339    0                      MIAMI
  748  713 1631 1420 2451 1092    0                 NEW YORK
 2139 1858  949 1645  347 2594 2571    0            SAN FRANCISCO
 2182 1737 1021 1891  959 2734 2408  678    0       SEATTLE
  543  597 1494 1220 2300  923  205 2442 2329    0  WASHINGTON D.C.
;
*-----Fill in Upper Triangle of Distance Matrix---------------;
proc transpose out=tran;
   copy CITY;
run;
data mileages(type=distance);
   merge mileages tran;
   array var ATLANTA--WASHDC;
   array col col1-col10;
   drop col1-col10 _name_;
   do over var;
      var=sum(var,col);
   end;
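The fill-in step above relies on implicitly subscripted arrays and DO OVER, which is legacy syntax. As a minimal sketch, assuming the same `mileages` and `tran` data sets, the merge could instead be written with explicitly subscripted arrays. The index variable `i` is introduced here for illustration only. This step is an alternative to the DATA step above, not an addition to it: running both would add the transposed values a second time.

```sas
/* Alternative to the DO OVER step: explicit array subscripts.       */
/* Run this INSTEAD of the step above, not after it.                 */
data mileages(type=distance);
   merge mileages tran;               /* one-to-one merge, row by row */
   array var[10] ATLANTA--WASHDC;     /* lower triangle from the input */
   array col[10] col1-col10;          /* upper triangle from PROC TRANSPOSE */
   drop col1-col10 _name_ i;          /* i is a hypothetical loop index */
   do i = 1 to 10;
      /* SUM ignores missing values, so each cell takes whichever   */
      /* triangle supplies it; the diagonal is 0 + 0 = 0.           */
      var[i] = sum(var[i], col[i]);
   end;
run;
```

Because the SUM function skips missing arguments, a cell that is missing in the lower triangle simply takes its value from the transposed copy.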
*-----Clustering with K-Nearest-Neighbor Density Estimates-----;
proc modeclus data=mileages all m=1 k=3;
   id CITY;
run;

Nearest Neighbor List

| CITY | Neighbor | Distance |
|---|---|---|
| ATLANTA | WASHINGTON D.C. | 543.0000000 |
| | CHICAGO | 587.0000000 |
| CHICAGO | ATLANTA | 587.0000000 |
| | WASHINGTON D.C. | 597.0000000 |
| DENVER | LOS ANGELES | 831.0000000 |
| | HOUSTON | 879.0000000 |
| HOUSTON | ATLANTA | 701.0000000 |
| | DENVER | 879.0000000 |
| LOS ANGELES | SAN FRANCISCO | 347.0000000 |
| | DENVER | 831.0000000 |
| MIAMI | ATLANTA | 604.0000000 |
| | WASHINGTON D.C. | 923.0000000 |
| NEW YORK | WASHINGTON D.C. | 205.0000000 |
| | CHICAGO | 713.0000000 |
| SAN FRANCISCO | LOS ANGELES | 347.0000000 |
| | SEATTLE | 678.0000000 |
| SEATTLE | SAN FRANCISCO | 678.0000000 |
| | LOS ANGELES | 959.0000000 |
| WASHINGTON D.C. | NEW YORK | 205.0000000 |
| | ATLANTA | 543.0000000 |


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Sums of Density Estimates Within Neighborhood

| Cluster | CITY | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | ATLANTA | 0.00025554 | 0.0005275 | 0 | 0.0005275 | 1.000 |
| | CHICAGO | 0.00025126 | 0.00053178 | 0 | 0.00053178 | 1.000 |
| | HOUSTON | 0.00017065 | 0.00025554 | 0.00017065 | 0.00042619 | 0.600 |
| | MIAMI | 0.00016251 | 0.00053178 | 0 | 0.00053178 | 1.000 |
| | NEW YORK | 0.00021038 | 0.0005275 | 0 | 0.0005275 | 1.000 |
| | WASHINGTON D.C. | 0.00027624 | 0.00046592 | 0 | 0.00046592 | 1.000 |
| 2 | DENVER | 0.00017065 | 0.00018051 | 0.00017065 | 0.00035115 | 0.514 |
| | LOS ANGELES | 0.00018051 | 0.00039189 | 0 | 0.00039189 | 1.000 |
| | SAN FRANCISCO | 0.00022124 | 0.00033692 | 0 | 0.00033692 | 1.000 |
| | SEATTLE | 0.00015641 | 0.00040174 | 0 | 0.00040174 | 1.000 |


Boundary Objects

| CITY | Density | Cluster | Cluster Proportions: 1 | Cluster Proportions: 2 |
|---|---|---|---|---|
| DENVER | 0.0001706485 | 2 | 0.486 | 0.514 |
| HOUSTON | 0.0001706485 | 1 | 0.600 | 0.400 |

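The Estimated Density values in this output appear consistent with k / (n × 2r), where r is the distance to the farthest of the k − 1 listed neighbors and the city itself counts toward k. That formula is inferred from the displayed numbers under the assumption that distance data are treated as one-dimensional; it is not stated in this example. A minimal sketch of the arithmetic for ATLANTA, with variable names chosen here for illustration:

```sas
/* Hand check of one K=3 density estimate (assumed formula: k/(n*2r)) */
data _null_;
   n = 10;      /* number of cities */
   k = 3;       /* K= value; the city itself is assumed to count as one */
   r = 587;     /* ATLANTA to CHICAGO, its second listed neighbor */
   density = k / (n * 2 * r);   /* neighborhood width 2r assumes dimension 1 */
   put density= 10.8;
run;
```

The value written to the log can be compared with the 0.00025554 reported for ATLANTA. Substituting r = 923 (MIAMI to WASHINGTON D.C.) gives the corresponding check for MIAMI.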
The following statements produce Output 57.2.2:
*------Clustering with Uniform-Kernel Density Estimates--------;
proc modeclus data=mileages all m=1 r=600 800;
   id CITY;
run;

Nearest Neighbor List

| CITY | Neighbor | Distance |
|---|---|---|
| ATLANTA | WASHINGTON D.C. | 543.0000000 |
| | CHICAGO | 587.0000000 |
| | MIAMI | 604.0000000 |
| | HOUSTON | 701.0000000 |
| | NEW YORK | 748.0000000 |
| CHICAGO | ATLANTA | 587.0000000 |
| | WASHINGTON D.C. | 597.0000000 |
| | NEW YORK | 713.0000000 |
| HOUSTON | ATLANTA | 701.0000000 |
| LOS ANGELES | SAN FRANCISCO | 347.0000000 |
| MIAMI | ATLANTA | 604.0000000 |
| NEW YORK | WASHINGTON D.C. | 205.0000000 |
| | CHICAGO | 713.0000000 |
| | ATLANTA | 748.0000000 |
| SAN FRANCISCO | LOS ANGELES | 347.0000000 |
| | SEATTLE | 678.0000000 |
| SEATTLE | SAN FRANCISCO | 678.0000000 |
| WASHINGTON D.C. | NEW YORK | 205.0000000 |
| | ATLANTA | 543.0000000 |
| | CHICAGO | 597.0000000 |


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Sums of Density Estimates Within Neighborhood

| Cluster | CITY | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | ATLANTA | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | CHICAGO | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | NEW YORK | 0.00016667 | 0.00033333 | 0 | 0.00033333 | 1.000 |
| | WASHINGTON D.C. | 0.00033333 | 0.00066667 | 0 | 0.00066667 | 1.000 |
| 2 | LOS ANGELES | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | SAN FRANCISCO | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| 3 | DENVER | 0.00008333 | 0 | 0 | 0 | . |
| 4 | HOUSTON | 0.00008333 | 0 | 0 | 0 | . |
| 5 | MIAMI | 0.00008333 | 0 | 0 | 0 | . |
| 6 | SEATTLE | 0.00008333 | 0 | 0 | 0 | . |


Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
|---|---|---|---|---|
| 1 | 4 | 0.00033333 | 0 | . |
| 2 | 2 | 0.00016667 | 0 | . |
| 3 | 1 | 0.00008333 | 0 | . |
| 4 | 1 | 0.00008333 | 0 | . |
| 5 | 1 | 0.00008333 | 0 | . |
| 6 | 1 | 0.00008333 | 0 | . |


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Sums of Density Estimates Within Neighborhood

| Cluster | CITY | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | ATLANTA | 0.000375 | 0.001 | 0 | 0.001 | 1.000 |
| | CHICAGO | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | HOUSTON | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | MIAMI | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | NEW YORK | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | WASHINGTON D.C. | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| 2 | LOS ANGELES | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| | SAN FRANCISCO | 0.0001875 | 0.00025 | 0 | 0.00025 | 1.000 |
| | SEATTLE | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| 3 | DENVER | 0.0000625 | 0 | 0 | 0 | . |


Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
|---|---|---|---|---|
| 1 | 6 | 0.000375 | 0 | . |
| 2 | 3 | 0.0001875 | 0 | . |
| 3 | 1 | 0.0000625 | 0 | . |


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Cluster Summary

| R | Number of Clusters | Frequency of Unclassified Objects |
|---|---|---|
| 600 | 6 | 0 |
| 800 | 3 | 0 |

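With a fixed radius, the Estimated Density values in this output appear consistent with count / (n × 2R), where count is the number of cities within distance R, including the city itself. As before, this is inferred from the displayed numbers under the assumption that distance data are treated as one-dimensional. A minimal sketch for ATLANTA, whose neighbor list shows two cities within 600 miles and five within 800; the variable names are chosen here for illustration:

```sas
/* Hand check of the R=600 and R=800 density estimates for ATLANTA */
/* (assumed formula: count/(n*2R), dimension 1)                    */
data _null_;
   n = 10;                        /* number of cities */
   do r = 600, 800;
      /* count = ATLANTA itself plus its listed neighbors within r */
      if r = 600 then count = 3;  /* WASHINGTON D.C., CHICAGO */
      else count = 6;             /* plus MIAMI, HOUSTON, NEW YORK */
      density = count / (n * 2 * r);
      put r= count= density= 10.8;
   end;
run;
```

The two values written to the log can be compared with the 0.00025 and 0.000375 reported for ATLANTA at R=600 and R=800.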
The following statements produce Output 57.2.3:
*------Clustering Neighborhoods Extended to Nearest Neighbor--------;
proc modeclus data=mileages list m=1 ck=2 r=600 800;
   id CITY;
run;

Sums of Density Estimates Within Neighborhood

| Cluster | CITY | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | ATLANTA | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | CHICAGO | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | HOUSTON | 0.00008333 | 0.00025 | 0 | 0.00025 | 1.000 |
| | MIAMI | 0.00008333 | 0.00025 | 0 | 0.00025 | 1.000 |
| | NEW YORK | 0.00016667 | 0.00033333 | 0 | 0.00033333 | 1.000 |
| | WASHINGTON D.C. | 0.00033333 | 0.00066667 | 0 | 0.00066667 | 1.000 |
| 2 | DENVER | 0.00008333 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | LOS ANGELES | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | SAN FRANCISCO | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | SEATTLE | 0.00008333 | 0.00016667 | 0 | 0.00016667 | 1.000 |


Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
|---|---|---|---|---|
| 1 | 6 | 0.00033333 | 0 | . |
| 2 | 4 | 0.00016667 | 0 | . |


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Sums of Density Estimates Within Neighborhood

| Cluster | CITY | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
|---|---|---|---|---|---|---|
| 1 | ATLANTA | 0.000375 | 0.001 | 0 | 0.001 | 1.000 |
| | CHICAGO | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | HOUSTON | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | MIAMI | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | NEW YORK | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | WASHINGTON D.C. | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| 2 | DENVER | 0.0000625 | 0.000125 | 0 | 0.000125 | 1.000 |
| | LOS ANGELES | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| | SAN FRANCISCO | 0.0001875 | 0.00025 | 0 | 0.00025 | 1.000 |
| | SEATTLE | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |


Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
|---|---|---|---|---|
| 1 | 6 | 0.000375 | 0 | . |
| 2 | 4 | 0.0001875 | 0 | . |


Modeclus Analysis of 10 American Cities
Based on Flying Mileages

Cluster Summary

| R | CK | Number of Clusters | Frequency of Unclassified Objects |
|---|---|---|---|
| 600 | 2 | 2 | 0 |
| 800 | 2 | 2 | 0 |

Copyright © 2009 by SAS Institute Inc., Cary, NC, USA. All rights reserved.