DISTANCE Function :: SAS/IML(R) 13.1 User's Guide

DISTANCE Function

DISTANCE (x <, method> ) ;

The DISTANCE function computes the pairwise distances between rows of x. The distances depend on the metric specified by the method argument. The arguments are as follows:

x

specifies an $n\times p$ numerical matrix that contains $n$ points in $p$ -dimensional space.

method

is an optional argument that specifies the method used to compute the distance between pairs of points. The method argument is either a numeric value, method $\geq 1$, or a case-insensitive character value. Only the first four characters of a character value are used. The following are valid options:

“L2”: specifies that the function compute the Euclidean ( $L_2$ ) distance between two points. This is the default value. An equivalent alias is “Euclidean”.
“L1”: specifies that the function compute the Manhattan ( $L_1$ ) distance between two points. Equivalent aliases are “CityBlock” and “Manhattan”.
“LInf”: specifies that the function compute the Chebyshev ( $L_\infty$ ) distance between two points. An equivalent alias is “Chebyshev”.
$p$: a numeric value, $p \geq 1$, that specifies that the function compute the $L_p$ distance between two points.
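
As a brief sketch of how these options might be used, the following statements call the function with a keyword, with an alias, and with a numeric method. The matrix names dA, dB, and d3 are illustrative only, and the four points are the ones used in the example later in this section.

```sas
proc iml;
x = {1 0,
     0 1,
    -1 0,
     0 -1};
dA = distance(x, "L2");          /* Euclidean distance by keyword */
dB = distance(x, "Euclidean");   /* alias; only the first four characters are used */
d3 = distance(x, 3);             /* numeric method: L_3 distance */
maxDiff = max(abs(dA - dB));     /* largest elementwise difference between dA and dB */
print maxDiff, d3;
quit;
```

Because “Euclidean” is an alias for “L2”, maxDiff should be zero. By the $L_p$ formula with $p = 3$, adjacent points such as (1, 0) and (0, 1) are $2^{1/3} \approx 1.26$ apart, and opposite points such as (1, 0) and (-1, 0) are 2 apart.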

The DISTANCE function returns an $n \times n$ symmetric matrix. The $(i,j)$ th element is the distance between the $i$ th and $j$ th rows of x.

If $u$ and $v$ are two $p$ -dimensional points, then the following formulas are used to compute the distance between $u$ and $v$ :

The Euclidean distance: $\| u - v \|_2 = \left( \sum_k |u_k - v_k|^2 \right)^{1/2}$.
The $L_1$ distance: $\| u - v \|_1 = \sum_k |u_k - v_k|$.
The $L_\infty$ distance: $\| u - v \|_\infty = \max (|u_1 - v_1|, |u_2 - v_2|, \ldots, |u_p - v_p|)$.
The $L_p$ distance: $\| u - v \|_p = \left( \sum_k |u_k - v_k|^p \right)^{1/p}$.
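
The formulas can be checked by hand for a single pair of points. The following sketch evaluates them directly for $u = (1, 0)$ and $v = (0, 1)$. The module LpDist is a hypothetical helper written for this illustration; it is not part of SAS/IML and is not necessarily how the DISTANCE function is implemented.

```sas
proc iml;
/* Hypothetical helper: L_p distance between two row vectors u and v */
start LpDist(u, v, p);
   return( sum(abs(u - v)##p)##(1/p) );   /* (sum_k |u_k - v_k|^p)^(1/p) */
finish;

u = {1 0};
v = {0 1};
dEuclid = LpDist(u, v, 2);    /* (1 + 1)^(1/2) = sqrt(2) */
dCity   = sum(abs(u - v));    /* L_1: 1 + 1 = 2 */
dCheb   = max(abs(u - v));    /* L_infinity: max(1, 1) = 1 */
print dEuclid dCity dCheb;
quit;
```

The three values are $\sqrt{2} \approx 1.414$, 2, and 1.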

The following statements illustrate the DISTANCE function:

x = {1 0,
     0 1,
    -1 0,
     0 -1};
d2 = distance(x, "L2");
print d2[format=best5.];

Figure 24.106: Euclidean Distance Between Pairs of Points

d2
0	1.414	2	1.414
1.414	0	1.414	2
2	1.414	0	1.414
1.414	2	1.414	0

The $i$ th column of d2 contains the distances between the $i$ th row of x and the other rows. Notice that the d2 matrix has zeros along the diagonal because the distance from each point to itself is zero.

You can also compute non-Euclidean distances, as follows:

d1 = distance(x, "L1");
dInf = distance(x, "LInfinity");
print d1, dInf;

Figure 24.107: Distance Between Pairs of Points

d1
0	2	2	2
2	0	2	2
2	2	0	2
2	2	2	0

dInf
0	1	2	1
1	0	1	2
2	1	0	1
1	2	1	0

If a row contains a missing value, all distances that involve that row are assigned a missing value.
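
A minimal sketch of this behavior follows. The matrix xMiss is an illustrative name, and its second row contains a missing value.

```sas
proc iml;
xMiss = {1 0,
         . 1,
        -1 0};              /* the second row contains a missing value */
dMiss = distance(xMiss);    /* default method is the Euclidean (L2) distance */
print dMiss;
quit;
```

According to the rule above, the distances that involve the second row (the second row and second column of dMiss) are expected to be missing, whereas the distance between the first and third rows is still computed and equals 2.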