
Functions and CALL Routines

SPEDIS Function



Determines the likelihood of two words matching, expressed as the asymmetric spelling distance between the two words.
Category: Character
Restriction: I18N Level 0

Syntax

SPEDIS(query,keyword)


Arguments

query

identifies the word to query for the likelihood of a match. SPEDIS removes trailing blanks before comparing the value.

keyword

specifies a target word for the query. SPEDIS removes trailing blanks before comparing the value.


Details


Length of Returned Variable

In a DATA step, if the SPEDIS function returns a value to a variable that has not previously been assigned a length, then that variable is given a length of 200 bytes.


The Basics

SPEDIS returns the distance between the query and a keyword, a nonnegative value that is usually less than 100 but never greater than 200 with the default costs.

SPEDIS computes an asymmetric spelling distance between two words as the normalized cost of converting the keyword to the query word through a sequence of operations. Because the distance is asymmetric, SPEDIS(QUERY, KEYWORD) is generally not the same as SPEDIS(KEYWORD, QUERY).
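The asymmetry can be checked by swapping the arguments. The following is a minimal sketch rather than part of the documented example; the variable names A, B, Forward, and Backward are illustrative. Converting the keyword fuzzy to the query fuzz removes a letter from the end, whereas converting fuzz to fuzzy adds one, and each total is divided by a different query length, so the two values are expected to differ.

```sas
/* Sketch: compare both argument orders for the same pair of words. */
data _null_;
   length a b $ 8;
   a = 'fuzz';
   b = 'fuzzy';
   forward  = spedis(a, b);   /* query = fuzz,  keyword = fuzzy */
   backward = spedis(b, a);   /* query = fuzzy, keyword = fuzz  */
   put forward= backward=;    /* writes both distances to the log */
run;
```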

Costs for each operation that is required to convert the keyword to the query are listed in the following table:

Operation    Cost    Explanation
match           0    no change
singlet        25    delete one of a double letter
doublet        50    double a letter
swap           50    reverse the order of two consecutive letters
truncate       50    delete a letter from the end
append         35    add a letter to the end
delete         50    delete a letter from the middle
insert        100    insert a letter in the middle
replace       100    replace a letter in the middle
firstdel      100    delete the first letter
firstins      200    insert a letter at the beginning
firstrep      200    replace the first letter

The distance is the sum of the costs divided by the length of the query. If this ratio is greater than one, the result is rounded down to the nearest whole number.
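The normalization step can be reproduced by hand. The following sketch assumes a single singlet operation (cost 25) against the four-letter query fuzy; the variable names Ratio and Distance are illustrative, and the FLOOR function stands in for the rounding that SPEDIS applies.

```sas
/* Sketch: reproduce the normalization step by hand for one case. */
data _null_;
   query    = 'fuzy';
   cost     = 25;                     /* singlet cost from the table   */
   ratio    = cost / length(query);   /* 25 / 4 = 6.25                 */
   distance = floor(ratio);           /* rounded down to 6             */
   put ratio= distance=;              /* log: ratio=6.25 distance=6    */
run;
```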


Comparisons

The SPEDIS function is similar to the COMPLEV and COMPGED functions, but COMPLEV and COMPGED are much faster, especially for long strings.
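As a rough illustration, the three functions can be called side by side on the same pair of words. This is a sketch with illustrative variable names; it uses only the two required arguments of COMPLEV and COMPGED. The functions use different scales (COMPLEV returns a Levenshtein edit distance, COMPGED a generalized edit distance), so the values are not directly comparable with the SPEDIS result.

```sas
/* Sketch: call SPEDIS, COMPLEV, and COMPGED on the same word pair. */
data _null_;
   query   = 'fizzy';
   keyword = 'fuzzy';
   d_spedis  = spedis(query, keyword);
   d_complev = complev(query, keyword);
   d_compged = compged(query, keyword);
   put d_spedis= d_complev= d_compged=;   /* writes all three to the log */
run;
```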


Examples

options nodate pageno=1 linesize=64;

data words;
   input Operation $ Query $ Keyword $;
   Distance = spedis(query,keyword);
   Cost = distance * length(query);
   datalines;
match       fuzzy        fuzzy
singlet     fuzy         fuzzy
doublet     fuuzzy       fuzzy
swap        fzuzy        fuzzy
truncate    fuzz         fuzzy
append      fuzzys       fuzzy
delete      fzzy         fuzzy
insert      fluzzy       fuzzy
replace     fizzy        fuzzy
firstdel    uzzy         fuzzy
firstins    pfuzzy       fuzzy
firstrep    wuzzy        fuzzy
several     floozy       fuzzy
;

proc print data = words;
run;

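PROC PRINT displays the data set that the DATA step creates. Because Distance is rounded down, the Cost column (Distance multiplied by the length of the query) can be slightly lower than the cost in the table: for example, 24 rather than 25 for singlet.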

Costs for SPEDIS Operations

                         The SAS System                        1

   Obs    Operation    Query     Keyword    Distance    Cost

     1    match        fuzzy      fuzzy         0          0
     2    singlet      fuzy       fuzzy         6         24
     3    doublet      fuuzzy     fuzzy         8         48
     4    swap         fzuzy      fuzzy        10         50
     5    truncate     fuzz       fuzzy        12         48
     6    append       fuzzys     fuzzy         5         30
     7    delete       fzzy       fuzzy        12         48
     8    insert       fluzzy     fuzzy        16         96
     9    replace      fizzy      fuzzy        20        100
    10    firstdel     uzzy       fuzzy        25        100
    11    firstins     pfuzzy     fuzzy        33        198
    12    firstrep     wuzzy      fuzzy        40        200
    13    several      floozy     fuzzy        50        300
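A common use of the distance is to keep only close matches. The following sketch filters the WORDS data set created above; the data set name CLOSE_MATCHES and the cutoff of 15 are arbitrary choices for illustration, not recommended values.

```sas
/* Sketch: keep only rows whose distance is at or below a chosen cutoff. */
data close_matches;
   set words;
   where distance <= 15;   /* cutoff of 15 is an assumption */
run;

proc print data = close_matches;
run;
```

With the distances shown in the output above, this cutoff keeps observations 1 through 7 and drops the rest.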

See Also

Functions:

COMPLEV Function

COMPGED Function
