SAS Quality Knowledge Base for Contact Information 26

Match Definitions

Match definitions contain data and logic that can be used to generate a match code for a data string. A match code is a normalized, encrypted string that represents portions of a data string that are considered to be significant with regard to the semantic identity of the data. Two data strings are said to "match" if the same match code is generated for each data string.

As an example, consider these two names:

Bill Smith
William L. Smith

When the Name match definition is applied to these strings, the same match code is generated for each string. The match codes can then be used to cluster the data. Strings with the same match code are assigned the same Cluster ID. For example:

Input              Match Code        Cluster ID
Bill Smith         4B~2$$$LWB$$$$$   1
William L. Smith   4B~2$$$LWB$$$$$   1

Because the same match code is generated for both strings, they are assigned the same Cluster ID, and the strings are considered a match.
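The clustering step can be illustrated with a minimal Python sketch. It is not the SAS match algorithm and does not reproduce the match code format shown above. The function toy_match_code, the NICKNAMES table, and the sample data are hypothetical stand-ins, used only to show how strings that share a match code end up with the same Cluster ID.

```python
import re

# Hypothetical nickname table; a real match definition carries far richer data.
NICKNAMES = {"bill": "william", "will": "william", "bob": "robert"}

def toy_match_code(name):
    """Toy stand-in for a match code. NOT the SAS algorithm or its output format."""
    tokens = re.findall(r"[a-z]+", name.lower())
    if not tokens:
        return ""
    given = NICKNAMES.get(tokens[0], tokens[0])  # normalize common nicknames
    family = tokens[-1]
    # Middle names and initials are treated as insignificant and dropped.
    return f"{family}|{given}".upper()

def assign_cluster_ids(strings):
    """Give strings with equal match codes the same 1-based Cluster ID."""
    ids = {}
    rows = []
    for s in strings:
        code = toy_match_code(s)
        cluster = ids.setdefault(code, len(ids) + 1)
        rows.append((s, code, cluster))
    return rows

rows = assign_cluster_ids(["Bill Smith", "William L. Smith", "Mary Jones"])
for row in rows:
    print(row)
```

In this sketch the two Smith strings receive the same toy code and therefore the same Cluster ID, while the third string starts a new cluster.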

Match definitions have many uses. You could use a match definition to identify and eliminate duplicate records in a table, or to do a fuzzy search for an item in a table. You could also use a match definition to generate match codes that serve as keys when joining two tables. Joining on match codes is effectively a fuzzy join, which can increase the effectiveness of your data integration efforts. To learn more about the uses of match definitions, refer to your SAS data management product documentation.
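As a rough illustration of two of these uses, the following Python sketch removes duplicates and joins two tables on a match code key. It assumes a hypothetical toy_match_code function and made-up sample records. It does not call any SAS product, and a real match definition would generate the codes instead.

```python
import re

NICKNAMES = {"bill": "william", "bob": "robert"}  # hypothetical sample data

def toy_match_code(name):
    """Toy stand-in for a match code. NOT the SAS algorithm."""
    tokens = re.findall(r"[a-z]+", name.lower())
    if not tokens:
        return ""
    given = NICKNAMES.get(tokens[0], tokens[0])
    return f"{tokens[-1]}|{given}".upper()

def dedupe(records, key):
    """Keep the first record seen for each match code."""
    seen = set()
    kept = []
    for rec in records:
        code = toy_match_code(rec[key])
        if code not in seen:
            seen.add(code)
            kept.append(rec)
    return kept

def fuzzy_join(left, right, key):
    """Pair rows from two tables whose match codes are equal."""
    index = {}
    for rec in right:
        index.setdefault(toy_match_code(rec[key]), []).append(rec)
    return [(l, r) for l in left for r in index.get(toy_match_code(l[key]), [])]

customers = [{"name": "Bill Smith", "city": "Cary"},
             {"name": "Ann Lee", "city": "Austin"}]
orders = [{"name": "William L. Smith", "order": 1001},
          {"name": "Bob Ray", "order": 1002}]

pairs = fuzzy_join(customers, orders, "name")
unique = dedupe([{"name": "Bill Smith"}, {"name": "William L. Smith"}], "name")
print(pairs)
print(unique)
```

Indexing one table by match code first keeps the join to a single pass over each table, which is the same reason match codes work well as join keys in a database.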