DISTANCE Function

DISTANCE (x <, method> ) ;

The DISTANCE function computes the pairwise distances between rows of x. The distances depend on the metric specified by the method argument. The arguments are as follows:

x

specifies an $n\times p$ numerical matrix that contains $n$ points in $p$-dimensional space.

method

is an optional argument that specifies the method used to compute the distance between pairs of points. The method argument is either a numeric value, method $\geq 1$, or a case-insensitive character value. Only the first four characters of a character value are used. The following are valid options:

L2

specifies that the function compute the Euclidean ($L_2$) distance between two points. This is the default value. An equivalent alias is Euclidean.

L1

specifies that the function compute the Manhattan ($L_1$) distance between two points. Equivalent aliases are CityBlock and Manhattan.

LInf

specifies that the function compute the Chebyshev ($L_\infty $) distance between two points. An equivalent alias is Chebyshev.

$p$

is a numeric value, $p \geq 1$, that specifies the $L_ p$-norm.
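The following Python sketch (not SAS/IML code) restates the documented rules for the method argument, so that you can see how a value such as "LInfinity" or "manhattan" maps to a norm exponent. The helper resolve_method and its alias table are hypothetical illustrations, not the internal logic of the DISTANCE function, and the sketch assumes that an unrecognized string is an error.

```python
import math

# Hypothetical helper, NOT part of SAS/IML. It mirrors the documented rules:
# character values are case-insensitive and only their first four characters
# matter; numeric values must satisfy p >= 1. Treating an unknown string as an
# error is an assumption of this sketch.
_ALIASES = {
    "L2": 2.0, "EUCL": 2.0,                # Euclidean
    "L1": 1.0, "CITY": 1.0, "MANH": 1.0,   # Manhattan / city block
    "LINF": math.inf, "CHEB": math.inf,    # Chebyshev
}

def resolve_method(method="L2"):
    """Map a DISTANCE-style method argument to the exponent p of an L_p norm."""
    if isinstance(method, (int, float)):
        if method < 1:
            raise ValueError("a numeric method must satisfy p >= 1")
        return float(method)
    key = method.upper()[:4]  # case-insensitive; keep the first four characters
    if key not in _ALIASES:
        raise ValueError(f"unrecognized method: {method!r}")
    return _ALIASES[key]

print(resolve_method("LInfinity"))  # inf
print(resolve_method("manhattan"))  # 1.0
print(resolve_method(3))            # 3.0
```

Because only four characters are compared, "LInfinity", "LInf", and "linf" all resolve to the Chebyshev distance in this sketch.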

The DISTANCE function returns an $n \times n$ symmetric matrix. The $(i,j)$th element is the distance between the $i$th and $j$th rows of x.

If $u$ and $v$ are two $p$-dimensional points, then the following formulas are used to compute the distance between $u$ and $v$:

  • The Euclidean distance: $\| u - v \|_2 = \left( \sum_k |u_k - v_k|^2 \right)^{1/2}$.

  • The $L_1$ distance: $\| u - v \|_1 = \sum_k |u_k - v_k|$.

  • The $L_\infty$ distance: $\| u - v \|_\infty = \max(|u_1 - v_1|, |u_2 - v_2|, \ldots, |u_p - v_p|)$.

  • The $L_p$ distance: $\| u - v \|_p = \left( \sum_k |u_k - v_k|^p \right)^{1/p}$.
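As a cross-check of these formulas, the following Python sketch computes the distance between two points for a given exponent p, with math.inf standing for the $L_\infty$ case. The function name lp_distance is hypothetical, and the code illustrates the formulas rather than the SAS/IML implementation.

```python
import math

def lp_distance(u, v, p=2.0):
    """Distance between two points of equal dimension under the L_p norm.

    p = 1 gives the Manhattan distance, p = 2 the Euclidean distance, and
    p = math.inf the Chebyshev distance; any other p >= 1 gives an L_p distance.
    """
    if len(u) != len(v):
        raise ValueError("points must have the same dimension")
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    diffs = [abs(a - b) for a, b in zip(u, v)]   # |u_k - v_k| for each k
    if math.isinf(p):
        return max(diffs, default=0.0)          # L_inf: largest coordinate difference
    return sum(d ** p for d in diffs) ** (1.0 / p)  # (sum_k |u_k - v_k|^p)^(1/p)

u, v = (1, 0), (0, 1)
print(lp_distance(u, v, 2))         # 1.4142135623730951 (square root of 2)
print(lp_distance(u, v, 1))         # 2.0
print(lp_distance(u, v, math.inf))  # 1
print(lp_distance(u, v, 3))         # 2 ** (1/3), about 1.26
```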

The following statements illustrate the DISTANCE function:

x = {1 0,
     0 1,
    -1 0,
     0 -1};
d2 = distance(x, "L2");
print d2[format=best5.];

Figure 24.106: Euclidean Distance Between Pairs of Points

d2
0 1.414 2 1.414
1.414 0 1.414 2
2 1.414 0 1.414
1.414 2 1.414 0


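The $i$th column of d2 contains the distances between the $i$th row of x and every row of x; because d2 is symmetric, the $i$th row contains the same values. Notice that the d2 matrix has zeros along the diagonal, because each point is at distance zero from itself.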
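To confirm the values in Figure 24.106 outside SAS/IML, the following Python sketch builds the Euclidean distance matrix for the same four points. It computes only the upper triangle and mirrors it, which relies on the symmetry of the distance matrix. The name euclidean_distance_matrix is hypothetical, and math.dist requires Python 3.8 or later.

```python
import math

def euclidean_distance_matrix(rows):
    """n x n matrix of Euclidean distances between the rows of an n x p matrix."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]   # diagonal stays 0: each point to itself
    for i in range(n):
        for j in range(i + 1, n):       # compute the upper triangle only ...
            dist = math.dist(rows[i], rows[j])
            d[i][j] = d[j][i] = dist    # ... and mirror it, since d is symmetric
    return d

x = [(1, 0), (0, 1), (-1, 0), (0, -1)]
d2 = euclidean_distance_matrix(x)
for row in d2:
    print(" ".join(f"{value:.3f}" for value in row))
```

Printed with three decimals, the first row is 0.000 1.414 2.000 1.414, which agrees with the first row of d2 in Figure 24.106.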

You can also compute non-Euclidean distances, as follows:

d1 = distance(x, "L1");
dInf = distance(x, "LInfinity");
print d1, dInf;

Figure 24.107: Distance Between Pairs of Points

d1
0 2 2 2
2 0 2 2
2 2 0 2
2 2 2 0

dInf
0 1 2 1
1 0 1 2
2 1 0 1
1 2 1 0
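The following Python sketch reproduces the d1 and dInf matrices in Figure 24.107 by passing a metric function to a generic pairwise routine. The names distance_matrix, manhattan, chebyshev, and d_inf are hypothetical; the sketch illustrates the $L_1$ and $L_\infty$ formulas and is not SAS/IML code.

```python
def distance_matrix(rows, metric):
    """Symmetric n x n matrix whose (i, j) element is metric(rows[i], rows[j])."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]   # zero diagonal
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = metric(rows[i], rows[j])
    return d

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))   # L1: sum of |u_k - v_k|

def chebyshev(u, v):
    return max(abs(a - b) for a, b in zip(u, v))   # L_inf: max of |u_k - v_k|

x = [(1, 0), (0, 1), (-1, 0), (0, -1)]
d1 = distance_matrix(x, manhattan)
d_inf = distance_matrix(x, chebyshev)   # corresponds to dInf in the SAS/IML example
print(d1)
print(d_inf)
```

Every off-diagonal $L_1$ distance equals 2 because any two of these points differ by 1 in each of two coordinates or by 2 in one coordinate.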


If a row contains a missing value, all distances that involve that row are assigned a missing value.
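The following Python sketch illustrates one reading of this missing-value rule, using NaN to represent a missing value. It assumes that the diagonal element of a row with a missing value is also missing, because that distance involves the row; this is an assumption, not documented SAS/IML behavior. The function name euclidean_with_missing is hypothetical.

```python
import math

def euclidean_with_missing(rows):
    """Euclidean distance matrix in which missing values (NaN) propagate.

    Any distance that involves a row containing NaN is set to NaN. ASSUMPTION:
    this includes that row's own diagonal element.
    """
    n = len(rows)
    has_missing = [any(math.isnan(value) for value in row) for row in rows]
    d = [[math.nan] * n for _ in range(n)]   # start with every distance missing
    for i in range(n):
        for j in range(n):
            if not (has_missing[i] or has_missing[j]):
                d[i][j] = math.dist(rows[i], rows[j])  # both rows complete
    return d

x = [(1, 0), (math.nan, 1), (-1, 0)]
d = euclidean_with_missing(x)
for row in d:
    print(row)
```

Only the distances between the first and third rows are computed; every element in the second row and second column is NaN.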