The SOUNDEX function encodes a character string according to an
algorithm that was originally developed by Margaret K. Odell and
Robert C. Russell (US Patents 1261167 (1918) and 1435663 (1922)).
The algorithm is described in Knuth, The Art of Computer Programming,
Volume 3. (See References.) Note that the SOUNDEX algorithm is
English-biased and is less useful for languages other than English.
The SOUNDEX function returns a copy of the argument that is encoded
by using the following steps:

1. Retain the first letter in the argument and discard the following letters:
2. Assign the following numbers to these classes of letters:
3. If two or more adjacent letters have the same classification from
   Step 2, then discard all but the first. (Adjacent refers to the
   position in the word before discarding letters.)
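
The three steps can be sketched in Python as follows. This is an illustration of the steps as they are worded here, not the SAS implementation. The letter lists for Steps 1 and 2 are assumed to be the standard Soundex ones from Knuth. The function name `soundex_sketch`, the skipping of non-alphabetic characters, and the inclusion of the first letter in the adjacency check are also assumptions.

```python
# Step 2 classes, assumed from the standard Soundex table in Knuth.
CLASSES = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
# Map each classified letter to its digit. Letters without an entry
# (A E H I O U W Y) are the ones discarded in Step 1.
CODE_OF = {letter: digit for digit, letters in CLASSES.items() for letter in letters}


def soundex_sketch(text: str) -> str:
    """Encode text by following the three steps, without padding or truncation."""
    # Assumption: characters other than A-Z are skipped before encoding.
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    result = letters[0]                # Step 1: retain the first letter
    previous = CODE_OF.get(letters[0])
    for ch in letters[1:]:
        code = CODE_OF.get(ch)         # None for a discarded letter
        # Step 3: compare with the letter immediately before this one in the
        # original word, so a discarded letter in between breaks adjacency.
        if code is not None and code != previous:
            result += code
        previous = code
    return result
```

Under these assumptions, `soundex_sketch("amnesty")` gives `A523`: M and N share class 5 and are adjacent, so only one 5 is kept, while E and Y are discarded.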
The algorithm that is described in Knuth adds trailing zeros and
truncates the result to a length of 4. You can perform these
operations with other SAS functions.
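
To make the padding and truncation concrete, the following minimal Python sketch applies both operations to an already encoded value. The helper name `pad_and_truncate` is hypothetical, and the code shows only what the Knuth variant does to the result, not how it would be written with SAS functions.

```python
def pad_and_truncate(code: str, length: int = 4) -> str:
    """Pad an encoded value with trailing zeros, then cut it to `length`."""
    if not code:
        # Assumption: an empty encoding is returned unchanged.
        return code
    return code.ljust(length, "0")[:length]
```

For example, `pad_and_truncate("P4")` gives `P400`, and `pad_and_truncate("A52312")` gives `A523`.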