# The DISTANCE Procedure

### Example 36.1 Divorce Grounds – the Jaccard Coefficient

A wide variety of distance and similarity measures are used in cluster analysis (Anderberg 1973; Sneath and Sokal 1973). If your data are in coordinate form and you want to use a non-Euclidean distance for clustering, you can compute a distance matrix by using the DISTANCE procedure.

Similarity measures must be converted to dissimilarities before being used in PROC CLUSTER. Such conversion can be done in a variety of ways, such as taking reciprocals or subtracting from a large value. The choice of conversion method depends on the application and the similarity measure. If applicable, PROC DISTANCE provides a corresponding dissimilarity measure for each similarity measure.
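The two conversions named above can be written out as follows. This is only a sketch: the symbols $s_{ij}$ (similarity between observations $i$ and $j$), $d_{ij}$ (the resulting dissimilarity), and the constant $M$ are notation introduced here for illustration and are not names used by PROC DISTANCE.

$$
d_{ij} = \frac{1}{s_{ij}} \quad (s_{ij} > 0),
\qquad
d_{ij} = M - s_{ij} \quad \bigl(M \ge \max_{i,j} s_{ij}\bigr)
$$

For a similarity that is bounded above by 1, such as the Jaccard coefficient, choosing $M = 1$ gives a dissimilarity in $[0, 1]$ that equals 0 when the similarity is at its maximum.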

In the following example, the observations are states. Binary-valued variables correspond to various grounds for divorce and indicate whether each ground applies in each of the U.S. states. A value of "1" indicates that the ground for divorce applies, and a value of "0" indicates that it does not. A 0-0 match (a ground that applies in neither of two states) is treated as totally irrelevant to the comparison; therefore, each variable has an asymmetric nominal level of measurement. The absence value is 0.

The DISTANCE procedure is used to compute the Jaccard coefficient (Anderberg 1973, pp. 89, 115, and 117) between each pair of states. The Jaccard coefficient is defined as the number of variables that are coded as 1 for both states divided by the number of variables that are coded as 1 for either or both states. Since dissimilarity measures are required by PROC CLUSTER, the DJACCARD coefficient is selected. Output 36.1.1 displays the distance matrix between the first 10 states.
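As a worked check of this definition, consider two pairs of states from the `divorce` data. The symbols $a$ (number of variables coded as 1 for both states), $b$ (number coded as 1 for either or both states), $J$, and $d$ are notation introduced here for illustration. The sketch assumes that DJACCARD is one minus the Jaccard coefficient, which agrees with the values shown in Output 36.1.1.

$$
J = \frac{a}{b}, \qquad d = 1 - J
$$

$$
\begin{aligned}
\text{Alabama, Alaska:} \quad & a = 7,\; b = 9, & J &= \tfrac{7}{9} \approx 0.77778, & d &= \tfrac{2}{9} \approx 0.22222 \\
\text{Arizona, California:} \quad & a = 1,\; b = 2, & J &= \tfrac{1}{2} = 0.50000, & d &= \tfrac{1}{2} = 0.50000
\end{aligned}
$$

Alaska differs from Alabama only in Non_Support and Separation, so 7 of the 9 grounds are shared. Arizona and California share only Incompatibility, and California additionally allows Insanity; the seven grounds that apply in neither state are 0-0 matches and do not enter the count.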

The CENTROID method is used to perform the cluster analysis, and the resulting tree diagram from PROC CLUSTER is saved into the `tree` output data set. Output 36.1.2 displays the cluster history.

The TREE procedure generates nine clusters in the output data set `out`. After being sorted by state, the `out` data set is merged with the input data set `divorce` to create the data set `clus`. After being sorted by cluster, the merged data set is printed to display the cluster membership, as shown in Output 36.1.3.

The following statements produce Output 36.1.1 through Output 36.1.3:

```
data divorce;
   length State $ 15;
   /* & lets State contain single embedded blanks; @@ reads several states per line */
   input State &$
         Incompatibility Cruelty Desertion Non_Support Alcohol
         Felony Impotence Insanity Separation @@;
   datalines;
Alabama          1 1 1 1 1 1 1 1 1    Alaska           1 1 1 0 1 1 1 1 0
Arizona          1 0 0 0 0 0 0 0 0    Arkansas         0 1 1 1 1 1 1 1 1
California       1 0 0 0 0 0 0 1 0    Colorado         1 0 0 0 0 0 0 0 0
Connecticut      1 1 1 1 1 1 0 1 1    Delaware         1 0 0 0 0 0 0 0 1

... more lines ...

Wisconsin        1 0 0 0 0 0 0 0 1    Wyoming          1 0 0 0 0 0 0 1 1
;
```
```
title 'Grounds for Divorce';
/* Jaccard dissimilarity on asymmetric nominal variables; 0 is the absence value */
proc distance data=divorce method=djaccard absent=0 out=distjacc;
   var anominal(Incompatibility--Separation);
   id state;
run;
```
```
proc print data=distjacc(obs=10);
   id state;
   var alabama--georgia;
   title2 'First 10 States';
run;
title2;
```
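```
/* Centroid clustering on the distance matrix; the tree is saved in TREE */
proc cluster data=distjacc method=centroid
             pseudo outtree=tree;
   id state;
   var alabama--wyoming;
run;

/* Cut the tree at nine clusters */
proc tree data=tree noprint n=9 out=out;
   id state;
run;

proc sort data=out;
   by state;
run;

/* Attach cluster membership to the original binary variables */
data clus;
   merge divorce out;
   by state;
run;

proc sort data=clus;
   by cluster;
run;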
```
```
proc print data=clus;
   id state;
   var Incompatibility--Separation;
   by cluster;
run;
```

Output 36.1.1: Distance Matrix Based on the Jaccard Coefficient

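Grounds for Divorce

First 10 States

| State | Alabama | Alaska | Arizona | Arkansas | California | Colorado | Connecticut | Delaware | Florida | Georgia |
|---|---|---|---|---|---|---|---|---|---|---|
| Alabama | 0.00000 | . | . | . | . | . | . | . | . | . |
| Alaska | 0.22222 | 0.00000 | . | . | . | . | . | . | . | . |
| Arizona | 0.88889 | 0.85714 | 0.00000 | . | . | . | . | . | . | . |
| Arkansas | 0.11111 | 0.33333 | 1.00000 | 0.00000 | . | . | . | . | . | . |
| California | 0.77778 | 0.71429 | 0.50000 | 0.88889 | 0.00000 | . | . | . | . | . |
| Colorado | 0.88889 | 0.85714 | 0.00000 | 1.00000 | 0.50000 | 0.00000 | . | . | . | . |
| Connecticut | 0.11111 | 0.33333 | 0.87500 | 0.22222 | 0.75000 | 0.87500 | 0.00000 | . | . | . |
| Delaware | 0.77778 | 0.87500 | 0.50000 | 0.88889 | 0.66667 | 0.50000 | 0.75000 | 0.00000 | . | . |
| Florida | 0.77778 | 0.71429 | 0.50000 | 0.88889 | 0.00000 | 0.50000 | 0.75000 | 0.66667 | 0.00000 | . |
| Georgia | 0.22222 | 0.00000 | 0.85714 | 0.33333 | 0.71429 | 0.85714 | 0.33333 | 0.87500 | 0.71429 | 0 |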

Output 36.1.2: Clustering History

 Grounds for Divorce

The CLUSTER Procedure
Centroid Hierarchical Cluster Analysis

 Root-Mean-Square Distance Between Observations 0.694873

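Cluster History

| Number of Clusters | Clusters Joined | | Freq | Pseudo F Statistic | Pseudo t-Squared | Norm Centroid Distance | Tie |
|---|---|---|---|---|---|---|---|
| 49 | Arizona | Colorado | 2 | . | . | 0 | T |
| 48 | California | Florida | 2 | . | . | 0 | T |
| 47 | Alaska | Georgia | 2 | . | . | 0 | T |
| 46 | Delaware | Hawaii | 2 | . | . | 0 | T |
| 45 | Connecticut | Idaho | 2 | . | . | 0 | T |
| 44 | CL49 | Iowa | 3 | . | . | 0 | T |
| 43 | CL47 | Kansas | 3 | . | . | 0 | T |
| 42 | CL44 | Kentucky | 4 | . | . | 0 | T |
| 41 | CL42 | Michigan | 5 | . | . | 0 | T |
| 40 | CL41 | Minnesota | 6 | . | . | 0 | T |
| 39 | CL43 | Mississippi | 4 | . | . | 0 | T |
| 38 | CL40 | Missouri | 7 | . | . | 0 | T |
| 37 | CL38 | Montana | 8 | . | . | 0 | T |
| 36 | CL37 | Nebraska | 9 | . | . | 0 | T |
| 35 | North Dakota | Oklahoma | 2 | . | . | 0 | T |
| 34 | CL36 | Oregon | 10 | . | . | 0 | T |
| 33 | Massachusetts | Rhode Island | 2 | . | . | 0 | T |
| 32 | New Hampshire | Tennessee | 2 | . | . | 0 | T |
| 31 | CL46 | Washington | 3 | . | . | 0 | T |
| 30 | CL31 | Wisconsin | 4 | . | . | 0 | T |
| 29 | Nevada | Wyoming | 2 | . | . | 0 | |
| 28 | Alabama | Arkansas | 2 | 1561 | . | 0.1599 | T |
| 27 | CL33 | CL32 | 4 | 479 | . | 0.1799 | T |
| 26 | CL39 | CL35 | 6 | 265 | . | 0.1799 | T |
| 25 | CL45 | West Virginia | 3 | 231 | . | 0.1799 | |
| 24 | Maryland | Pennsylvania | 2 | 199 | . | 0.2399 | |
| 23 | CL28 | Utah | 3 | 167 | 3.2 | 0.2468 | |
| 22 | CL27 | Ohio | 5 | 136 | 5.4 | 0.2698 | |
| 21 | CL26 | Maine | 7 | 111 | 8.9 | 0.2998 | |
| 20 | CL23 | CL21 | 10 | 75.2 | 8.7 | 0.3004 | |
| 19 | CL25 | New Jersey | 4 | 71.8 | 6.5 | 0.3053 | T |
| 18 | CL19 | Texas | 5 | 69.1 | 2.5 | 0.3077 | |
| 17 | CL20 | CL22 | 15 | 48.7 | 9.9 | 0.3219 | |
| 16 | New York | Virginia | 2 | 50.1 | . | 0.3598 | |
| 15 | CL18 | Vermont | 6 | 49.4 | 2.9 | 0.3797 | |
| 14 | CL17 | Illinois | 16 | 47.0 | 3.2 | 0.4425 | |
| 13 | CL14 | CL15 | 22 | 29.2 | 15.3 | 0.4722 | |
| 12 | CL48 | CL29 | 4 | 29.5 | . | 0.4797 | T |
| 11 | CL13 | CL24 | 24 | 27.6 | 4.5 | 0.5042 | |
| 10 | CL11 | South Dakota | 25 | 28.4 | 2.4 | 0.5449 | |
| 9 | Louisiana | CL16 | 3 | 30.3 | 3.5 | 0.5844 | |
| 8 | CL34 | CL30 | 14 | 23.3 | . | 0.7196 | |
| 7 | CL8 | CL12 | 18 | 19.3 | 15.0 | 0.7175 | |
| 6 | CL10 | South Carolina | 26 | 21.4 | 4.2 | 0.7384 | |
| 5 | CL6 | New Mexico | 27 | 24.0 | 4.7 | 0.8303 | |
| 4 | CL5 | Indiana | 28 | 28.9 | 4.1 | 0.8343 | |
| 3 | CL4 | CL9 | 31 | 31.7 | 10.9 | 0.8472 | |
| 2 | CL3 | North Carolina | 32 | 55.1 | 4.1 | 1.0017 | |
| 1 | CL2 | CL7 | 50 | . | 55.1 | 1.0663 | |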

Output 36.1.3: Cluster Membership

 Grounds for Divorce

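| State | Incompatibility | Cruelty | Desertion | Non_Support | Alcohol | Felony | Impotence | Insanity | Separation |
|---|---|---|---|---|---|---|---|---|---|
| Arizona | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Colorado | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Iowa | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Kentucky | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Michigan | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Minnesota | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Missouri | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Montana | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Nebraska | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Oregon | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

| State | Incompatibility | Cruelty | Desertion | Non_Support | Alcohol | Felony | Impotence | Insanity | Separation |
|---|---|---|---|---|---|---|---|---|---|
| California | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| Florida | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| Nevada | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
| Wyoming | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |

| State | Incompatibility | Cruelty | Desertion | Non_Support | Alcohol | Felony | Impotence | Insanity | Separation |
|---|---|---|---|---|---|---|---|---|---|
| Alabama | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Alaska | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
| Arkansas | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Connecticut | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| Georgia | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
| Idaho | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| Illinois | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
| Kansas | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
| Maine | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
| Maryland | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 |
| Massachusetts | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| Mississippi | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
| New Hampshire | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| New Jersey | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 |
| North Dakota | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| Ohio | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 |
| Oklahoma | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| Pennsylvania | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 |
| Rhode Island | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| South Dakota | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| Tennessee | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| Texas | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 |
| Utah | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
| Vermont | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| West Virginia | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 |

| State | Incompatibility | Cruelty | Desertion | Non_Support | Alcohol | Felony | Impotence | Insanity | Separation |
|---|---|---|---|---|---|---|---|---|---|
| Delaware | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| Hawaii | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| Washington | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| Wisconsin | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |

| State | Incompatibility | Cruelty | Desertion | Non_Support | Alcohol | Felony | Impotence | Insanity | Separation |
|---|---|---|---|---|---|---|---|---|---|
| Louisiana | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| New York | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| Virginia | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |

| State | Incompatibility | Cruelty | Desertion | Non_Support | Alcohol | Felony | Impotence | Insanity | Separation |
|---|---|---|---|---|---|---|---|---|---|
| South Carolina | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |

| State | Incompatibility | Cruelty | Desertion | Non_Support | Alcohol | Felony | Impotence | Insanity | Separation |
|---|---|---|---|---|---|---|---|---|---|
| New Mexico | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |

| State | Incompatibility | Cruelty | Desertion | Non_Support | Alcohol | Felony | Impotence | Insanity | Separation |
|---|---|---|---|---|---|---|---|---|---|
| Indiana | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |

| State | Incompatibility | Cruelty | Desertion | Non_Support | Alcohol | Felony | Impotence | Insanity | Separation |
|---|---|---|---|---|---|---|---|---|---|
| North Carolina | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |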