
Functions and CALL Routines

SOUNDEX Function



Encodes a string to facilitate searching.
Category: Character
Restriction: SOUNDEX algorithm is English-biased.
Restriction: I18N Level 0

Syntax

SOUNDEX(argument)


Arguments

argument

specifies a character constant, variable, or expression.


Details


Length of Returned Variable

In a DATA step, if the SOUNDEX function returns a value to a variable that has not previously been assigned a length, then that variable is given a length of 200 bytes.
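As a minimal sketch of the rule above, a LENGTH statement placed before the assignment keeps the variable from taking the 200-byte default. The variable names and the length of 8 are illustrative assumptions, and VLENGTH is used only to report each variable's declared length.

```sas
data _null_;
   /* Hypothetical variable; declared before SOUNDEX assigns to it */
   length short_code $ 8;
   short_code   = soundex('amnesty');

   /* No prior length: per the rule above, this variable gets 200 bytes */
   default_code = soundex('amnesty');

   len_short   = vlength(short_code);
   len_default = vlength(default_code);
   put len_short= len_default=;
run;
```

Under the rule stated above, the log should report 8 for the declared variable and 200 for the undeclared one.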


The Basics

The SOUNDEX function encodes a character string according to an algorithm that was originally developed by Margaret K. Odell and Robert C. Russell (US Patents 1,261,167 (1918) and 1,435,663 (1922)). The algorithm is described in Knuth, The Art of Computer Programming, Volume 3. (See References.) Note that the SOUNDEX algorithm is English-biased and is less useful for languages other than English.

The SOUNDEX function returns a copy of the argument that is encoded by using the following steps:

  1. Retain the first letter in the argument and discard every later occurrence of the following letters:

    A E H I O U W Y

  2. Assign the following numbers to these classes of letters:

    1: B F P V

    2: C G J K Q S X Z

    3: D T

    4: L

    5: M N

    6: R

  3. If two or more adjacent letters have the same classification from Step 2, then discard all but the first. (Adjacent refers to the position in the word before discarding letters.)
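The three steps above can be sketched as a DATA step. This is a literal reading of the steps, not the internal implementation of the SOUNDEX function: it assumes the input contains only letters, and the variable names (word, up, code, ch, cls, prev) are hypothetical. Discarded letters are mapped to '0' so that they break adjacency, which matches the note that adjacency refers to positions before letters are discarded.

```sas
data _null_;
   length word up code $ 40 ch cls prev $ 1;
   word = 'amnesty';
   up   = upcase(word);

   /* Step 1: retain the first letter */
   code = substr(up, 1, 1);

   /* Step 2 classes as digits; discarded letters map to '0' */
   prev = translate(substr(up, 1, 1),
                    '11112222222233455600000000',
                    'BFPVCGJKQSXZDTLMNRAEHIOUWY');

   do i = 2 to length(up);
      ch  = substr(up, i, 1);
      cls = translate(ch,
                      '11112222222233455600000000',
                      'BFPVCGJKQSXZDTLMNRAEHIOUWY');
      /* Step 3: skip a letter whose class matches its original neighbor */
      if cls ne '0' and cls ne prev then code = cats(code, cls);
      prev = cls;
   end;

   put code=;
run;
```

For 'amnesty', M and N share class 5, so only the first is kept, and the sketch writes code=A523, the same value shown for this word in the Examples.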

The algorithm that is described in Knuth adds trailing zeros and truncates the result to a length of 4. The SOUNDEX function does neither, but you can perform these operations with other SAS functions.
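One possible way to get the four-character form, sketched here with CATS and SUBSTR, is to append zeros and then keep the first four characters. The variable name code4 is an illustrative assumption, and other combinations of functions would work as well.

```sas
data _null_;
   length code4 $ 4;
   /* Append zeros to the SOUNDEX value, then keep the first 4 characters */
   code4 = substr(cats(soundex('Paul'), '000'), 1, 4);
   put code4=;
run;
```

Because SOUNDEX returns P4 for 'Paul', this writes code4=P400 to the log.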


Examples

SAS Statements              Results
--------------------------  -------
x=soundex('Paul');
put x;                      P4

word='amnesty';
x=soundex(word);
put x;                      A523
