SPEDIS Function

Determines the likelihood of two words matching, expressed as the asymmetric spelling distance between the two words.

Category: Character
Restriction: I18N Level 0 functions are designed for use with Single Byte Character Sets (SBCS) only.

Syntax

SPEDIS(query,keyword)

Required Arguments

query

identifies the word to query for the likelihood of a match. SPEDIS removes trailing blanks before comparing the value.

keyword

specifies a target word for the query. SPEDIS removes trailing blanks before comparing the value.
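
As a small illustration of the trailing-blank handling described for both arguments, the following sketch compares a padded character variable with an unpadded literal. The variable names and the length of 12 are arbitrary choices made for this sketch. Given the documented behavior, the padding should not change the result.

```sas
/* Sketch: a query stored in a longer character variable is padded */
/* with trailing blanks, which SPEDIS is documented to remove.     */
data _null_;
   length Query $ 12;                     /* arbitrary length for illustration */
   Query    = 'fuzzy';                    /* stored as 'fuzzy' plus 7 blanks   */
   Padded   = spedis(Query, 'fuzzy');     /* padded variable as the query      */
   Unpadded = spedis('fuzzy', 'fuzzy');   /* literal with no padding           */
   put Padded= Unpadded=;                 /* write both values to the log      */
run;
```

Only trailing blanks are mentioned here. Leading blanks and differences in case are presumably compared as ordinary characters, so it may be worth normalizing values first, for example with the STRIP and UPCASE functions.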

Details

Length of Returned Variable

SPEDIS returns a numeric value. In a DATA step, if the SPEDIS function returns a value to a variable that has not previously been assigned a length, then that variable is given the default numeric length of 8 bytes.

The Basics

SPEDIS returns the distance between the query and a keyword, a nonnegative value that is usually less than 100 but never greater than 200 with the default costs.
SPEDIS computes an asymmetric spelling distance between two words as the normalized cost of converting the keyword to the query word by using a sequence of operations. Because the conversion runs in one direction only, SPEDIS(QUERY, KEYWORD) is generally not the same as SPEDIS(KEYWORD, QUERY).
Costs for each operation that is required to convert the keyword to the query are listed in the following table:

Operation   Cost   Explanation
match          0   no change
singlet       25   delete one of a double letter
doublet       50   double a letter
swap          50   reverse the order of two consecutive letters
truncate      50   delete a letter from the end
append        35   add a letter to the end
delete        50   delete a letter from the middle
insert       100   insert a letter in the middle
replace      100   replace a letter in the middle
firstdel     100   delete the first letter
firstins     200   insert a letter at the beginning
firstrep     200   replace the first letter

The distance is the sum of the costs divided by the length of the query. If this ratio is greater than one, the result is rounded down to the nearest whole number.
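
The normalization and the asymmetry can be checked with one pair of words in both argument orders. Going by the cost table, converting the keyword fuzzy to the query fuzy is a singlet (cost 25, query length 4, so 25/4 rounds down to 6), while converting the keyword fuzy to the query fuzzy is a doublet (cost 50, query length 5, so 50/5 is 10). This assumes that SPEDIS selects exactly those operations. The variable names Forward and Reverse are illustrative only.

```sas
/* Sketch: the same two words in both argument orders.             */
/* Forward and Reverse are hypothetical names used for this sketch. */
data _null_;
   Forward = spedis('fuzy', 'fuzzy');   /* keyword fuzzy converted to query fuzy  */
   Reverse = spedis('fuzzy', 'fuzy');   /* keyword fuzy converted to query fuzzy  */
   put Forward= Reverse=;               /* write both distances to the log        */
run;
```

Because the divisor is the length of the query, the same edit counts for less against a long query than against a short one.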

Comparisons

The SPEDIS function is similar to the COMPLEV and COMPGED functions, but COMPLEV and COMPGED are much faster, especially for long strings.
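
A minimal sketch for comparing the three functions on the same pair of words follows. The variable names are hypothetical, and only the two required arguments of COMPLEV and COMPGED are used. COMPLEV returns a Levenshtein edit distance and COMPGED a generalized edit distance with its own cost scheme, so the three values are on different scales and a cutoff chosen for one function does not carry over to the others.

```sas
/* Sketch: SPEDIS, COMPLEV, and COMPGED on the same pair of words. */
/* All variable names here are hypothetical.                       */
data _null_;
   Query      = 'fuzy';
   Keyword    = 'fuzzy';
   SpedisDist = spedis(Query, Keyword);    /* asymmetric spelling distance */
   LevDist    = complev(Query, Keyword);   /* Levenshtein edit distance    */
   GedDist    = compged(Query, Keyword);   /* generalized edit distance    */
   put SpedisDist= LevDist= GedDist=;      /* write all three to the log   */
run;
```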

Example

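data words;
   input Operation $ Query $ Keyword $;
   /* Normalized distance for converting Keyword to Query */
   Distance = spedis(query,keyword);
   /* Approximate total cost; Distance may already have been rounded down */
   Cost = distance * length(query);
   datalines;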
match       fuzzy        fuzzy
singlet     fuzy         fuzzy
doublet     fuuzzy       fuzzy
swap        fzuzy        fuzzy
truncate    fuzz         fuzzy
append      fuzzys       fuzzy
delete      fzzy         fuzzy
insert      fluzzy       fuzzy
replace     fizzy        fuzzy
firstdel    uzzy         fuzzy
firstins    pfuzzy       fuzzy
firstrep    wuzzy        fuzzy
several     floozy       fuzzy
;
proc print data = words;
run;
Costs for SPEDIS Operations
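
A common follow-up step is to keep only the rows whose distance falls under some cutoff. The sketch below reuses the WORDS data set and its Distance variable from the example. The data set name close_matches, the macro variable MaxDistance, and the cutoff of 15 are assumptions made for illustration, not recommended values.

```sas
/* Sketch: keep only rows whose query is close to its keyword.      */
/* MaxDistance = 15 is a hypothetical cutoff, not a recommendation. */
%let MaxDistance = 15;

data close_matches;                   /* hypothetical output data set name     */
   set words;                         /* data set created in the example       */
   if Distance <= &MaxDistance;       /* subsetting IF keeps near matches only */
run;

proc print data = close_matches;
run;
```

A suitable cutoff depends on word length and on how tolerant the match should be, so it is usually worth inspecting the distances for a sample of the data before fixing a value.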

See Also