Language Reference


DISTANCE Function

DISTANCE (x <, method> );

The DISTANCE function computes the pairwise distances between rows of x. The distances depend on the metric specified by the method argument. The arguments are as follows:

x

specifies an $n\times p$ numerical matrix that contains n points in p-dimensional space.

method

is an optional argument that specifies the metric used to compute the distance between pairs of points. The method argument is either a numeric value, method$\geq 1$, or a case-insensitive character value. Only the first four characters of a character value are used. The following are valid options:

"L2"

specifies that the function compute the Euclidean ($L_2$) distance between two points. This is the default value. An equivalent alias is "Euclidean".

"L1"

specifies that the function compute the Manhattan ($L_1$) distance between two points. Equivalent aliases are "CityBlock" and "Manhattan".

"LInf"

specifies that the function compute the Chebyshev ($L_\infty $) distance between two points. An equivalent alias is "Chebyshev".

p

is a numeric value, $p \geq 1$, that specifies the $L_ p$-norm.
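As a brief illustration of the option names, the following sketch compares the default method with the "Euclidean" alias and with a numeric method value of 2. It assumes that the statements run inside a PROC IML session; the data and the variable names (dDefault, dAlias, dNumeric, and so on) are chosen only for illustration. Because all three calls request the $L_2$ distance, both printed differences are expected to be zero, up to floating-point rounding.

```sas
/* illustrative data; the names below are not part of the DISTANCE syntax */
x = {1 0,
     0 1,
    -1 0,
     0 -1};
dDefault = distance(x);               /* method omitted: Euclidean default */
dAlias   = distance(x, "Euclidean");  /* documented alias for "L2"         */
dNumeric = distance(x, 2);            /* numeric method: L_p norm with p=2 */

/* largest absolute difference between the results */
diffAlias   = max(abs(dDefault - dAlias));
diffNumeric = max(abs(dDefault - dNumeric));
print diffAlias diffNumeric;
```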

The DISTANCE function returns an $n \times n$ symmetric matrix. The (i, j) element is the distance between the ith and jth rows of x.

If u and v are two p-dimensional points, then the following formulas are used to compute the distance between u and v:

  • The Euclidean distance: $\|  u - v \| _2 = (\Sigma _ k |u_ k - v_ k |^2 )^{1/2}$.

  • The $L_1$ distance: $\|  u - v \| _1 = \Sigma _ k |u_ k - v_ k |$.

  • The $L_\infty $ distance: $\|  u - v \| _\infty = \max (|u_1 - v_1 |,|u_2 - v_2 |,\ldots ,|u_ p - v_ p |)$.

  • The $L_ p$ distance: $\|  u - v \| _ p = (\Sigma _ k |u_ k - v_ k |^ p )^{1/p}$.
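The $L_ p$ formula can be checked directly with elementwise operations. The following statements are a minimal sketch, assuming a PROC IML session; the value p = 3, the loop indices r and c, and the names dManual, dp, and maxDiff are hypothetical choices made for this illustration. The double loop applies the formula to each pair of rows, and the result is compared with the matrix that the DISTANCE function returns for the same numeric method.

```sas
x = {1 0,
     0 1,
    -1 0,
     0 -1};
p = 3;                          /* any numeric value >= 1 */
n = nrow(x);
dManual = j(n, n, 0);           /* n x n matrix of zeros */
do r = 1 to n;
   do c = 1 to n;
      diff = abs(x[r,] - x[c,]);              /* |u_k - v_k| for each k */
      dManual[r,c] = (sum(diff##p))##(1/p);   /* (sum |u_k - v_k|^p)^(1/p) */
   end;
end;
dp = distance(x, p);            /* built-in L_p distance */
maxDiff = max(abs(dManual - dp));
print dManual, maxDiff;
```

The explicit loop is written for readability rather than speed. If the two computations agree, maxDiff is zero or on the order of floating-point rounding error.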

The following statements illustrate the DISTANCE function:

x = {1 0,
     0 1,
    -1 0,
     0 -1};
d2 = distance(x, "L2");
print d2[format=best5.];

Figure 25.106: Euclidean Distance Between Pairs of Points

d2
0 1.414 2 1.414
1.414 0 1.414 2
2 1.414 0 1.414
1.414 2 1.414 0



The ith column of d2 contains the distances between the ith row of x and the other rows. Notice that the d2 matrix has zeros along the diagonal.
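The symmetry and the zero diagonal can be confirmed numerically. The following sketch assumes a PROC IML session and recomputes d2 so that it is self-contained; the names asym and maxDiag are illustrative only.

```sas
x = {1 0,
     0 1,
    -1 0,
     0 -1};
d2 = distance(x, "L2");
asym    = max(abs(d2 - t(d2)));     /* 0 when d2 equals its transpose   */
maxDiag = max(abs(vecdiag(d2)));    /* 0 when every diagonal entry is 0 */
print asym maxDiag;
```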

You can also compute non-Euclidean distances, as follows:

d1 = distance(x, "L1");
dInf = distance(x, "LInfinity");
print d1, dInf;

Figure 25.107: Distance Between Pairs of Points

d1
0 2 2 2
2 0 2 2
2 2 0 2
2 2 2 0

dInf
0 1 2 1
1 0 1 2
2 1 0 1
1 2 1 0



If a row contains a missing value, all distances that involve that row are assigned a missing value.
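The following statements sketch this behavior, assuming a PROC IML session; the matrix xMiss is a made-up example whose second row contains a missing value. According to the rule above, the second row and second column of dMiss should be missing, and the remaining elements should be the usual Euclidean distances between the first and third rows.

```sas
/* second row has a missing first coordinate */
xMiss = {1 0,
         . 1,
        -1 0};
dMiss = distance(xMiss);    /* default Euclidean distance */
print dMiss;
```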