SOUNDEX Function

Encodes a string to facilitate searching.

Category: Character
Restrictions: SOUNDEX algorithm is English-biased.
I18N Level 0 functions are designed for use with Single Byte Character Sets (SBCS) only.



Syntax

SOUNDEX(argument)

Required Argument

argument
specifies a character constant, variable, or expression.


Length of Returned Variable

In a DATA step, if the SOUNDEX function returns a value to a variable that has not previously been assigned a length, then that variable is given a length of 200 bytes.
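The default length can be avoided by declaring the receiving variable before the assignment. The following is a minimal sketch, not part of the original documentation; the variable names are hypothetical, and VLENGTH is used only to report the length that each variable was given at compile time.

```sas
data _null_;
   length code $ 8;                   /* length assigned first, so CODE is 8 bytes */
   code         = soundex('Paul');
   default_code = soundex('Paul');    /* no prior length: gets the default length */
   len_code     = vlength(code);
   len_default  = vlength(default_code);
   put len_code= len_default=;
run;
```

Under the rule described above, len_default is expected to be 200, while len_code reflects the LENGTH statement.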

The Basics

The SOUNDEX function encodes a character string according to an algorithm that was originally developed by Margaret K. Odell and Robert C. Russell (US Patents 1261167 (1918) and 1435663 (1922)). The algorithm is described in Knuth, The Art of Computer Programming, Volume 3. (See References.) Because the algorithm is English-biased, it is less useful for languages other than English.
The SOUNDEX function returns a copy of the argument that is encoded by using the following steps:
  1. Retain the first letter in the argument. From the remaining letters, discard every occurrence of these letters:
    A E H I O U W Y
  2. Assign the following numbers to these classes of letters:
    • 1: B F P V
    • 2: C G J K Q S X Z
    • 3: D T
    • 4: L
    • 5: M N
    • 6: R
  3. If two or more adjacent letters have the same classification from Step 2, then discard all but the first. (Adjacent refers to the position in the word before discarding letters.)
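The three steps can be traced in a short DATA step. This is an illustrative sketch of the steps as written above, not the implementation of the SOUNDEX function itself. It assumes that the input contains letters only, and it treats the first letter's class as taking part in the adjacency check in Step 3; the variable names are hypothetical.

```sas
data _null_;
   length word code $ 40 ch cls prev $ 1;
   word = upcase('amnesty');          /* sample input, assumed to be letters only */
   code = substr(word, 1, 1);         /* Step 1: retain the first letter */
   prev = ' ';
   do i = 1 to length(word);
      ch = substr(word, i, 1);
      /* Step 2: map the letter to its class; blank marks a discarded letter */
      if      indexc(ch, 'BFPV')     then cls = '1';
      else if indexc(ch, 'CGJKQSXZ') then cls = '2';
      else if indexc(ch, 'DT')       then cls = '3';
      else if ch = 'L'               then cls = '4';
      else if indexc(ch, 'MN')       then cls = '5';
      else if ch = 'R'               then cls = '6';
      else                                cls = ' ';
      /* Step 3: skip a letter whose class equals that of the letter just before it */
      if i > 1 and cls ne ' ' and cls ne prev then code = cats(code, cls);
      prev = cls;                     /* adjacency is judged in the original word */
   end;
   put word= code=;
run;
```

For AMNESTY, M contributes 5, N is skipped because it is adjacent to M and in the same class, and S and T contribute 2 and 3, giving A523.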
The algorithm that is described in Knuth adds trailing zeros and truncates the result to a length of 4. The SOUNDEX function does neither, but you can perform these operations with other SAS functions.
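One possible way to produce the four-character form is to append zeros with CATS and keep the first four characters with SUBSTR. This is a sketch under that assumption, not a prescribed method; the variable names are hypothetical.

```sas
data _null_;
   length raw $ 200 code4 $ 4;
   raw   = soundex('Paul');
   /* Append three zeros, then keep only the first four characters */
   code4 = substr(cats(raw, '000'), 1, 4);
   put raw= code4=;
run;
```

Following the steps above, Paul encodes to P4, which this padding turns into P400. Codes that are already longer than four characters are simply cut off at the fourth character.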


The following SAS statements produce these results.
SAS Statement
put x;
put x;