SAS Quality Knowledge Base for Contact Information 26
Match definitions contain data and logic that can be used to generate a match code for a data string. A match code is a normalized, encrypted string that represents portions of a data string that are considered to be significant with regard to the semantic identity of the data. Two data strings are said to "match" if the same match code is generated for each data string.
As an example, consider these two names:
Bill Smith
William L. Smith
When the Name match definition is applied to these strings, the same match code is generated for each string. The match codes can then be used to cluster the data. Strings with the same match code are assigned the same Cluster ID. For example:
Input | Match Code | Cluster ID |
---|---|---|
Bill Smith | 4B~2$$$LWB$$$$$ | 1 |
William L. Smith | 4B~2$$$LWB$$$$$ | 1 |
Because the same match code is generated for both strings, they are assigned the same Cluster ID, and the strings are considered a match.
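The relationship between match codes and Cluster IDs can be illustrated with a short Python sketch. This is not the logic of the Name match definition, and it does not reproduce real match codes such as the ones in the table above. The `toy_match_code` function, the `NICKNAMES` table, and the `assign_cluster_ids` helper are all hypothetical names invented for this illustration. The sketch only assumes that a match code keeps the significant parts of a name (here, a canonical given name and the family name) and drops the rest.

```python
from itertools import count

# Hypothetical nickname table (assumption); a real match definition
# relies on far richer vocabularies and rules.
NICKNAMES = {"bill": "william", "will": "william", "bob": "robert"}


def toy_match_code(name: str) -> str:
    """Return a crude stand-in for a match code: family name plus canonical given name."""
    tokens = [t.strip(".,").lower() for t in name.split()]
    tokens = [t for t in tokens if len(t) > 1]  # drop initials such as "L."
    if not tokens:
        return ""
    given = NICKNAMES.get(tokens[0], tokens[0])  # map nicknames to one spelling
    family = tokens[-1]
    return f"{family}|{given}".upper()


def assign_cluster_ids(values):
    """Give every distinct match code its own Cluster ID, numbered from 1."""
    ids = {}
    next_id = count(1)
    rows = []
    for value in values:
        code = toy_match_code(value)
        if code not in ids:
            ids[code] = next(next_id)
        rows.append((value, code, ids[code]))
    return rows


rows = assign_cluster_ids(["Bill Smith", "William L. Smith", "Robert Jones"])
for row in rows:
    print(row)
```

In this sketch, "Bill Smith" and "William L. Smith" receive the same toy code and therefore the same Cluster ID, while "Robert Jones" starts a new cluster.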
Match definitions have many uses. You could use a match definition to identify and eliminate duplicate records in a table, or to do a fuzzy search for an item in a table. You could also use a match definition to generate match codes that you can use as keys when joining two tables. In this way you are really doing a fuzzy join, which increases the effectiveness of your data integration efforts. To learn more about the uses of match definitions, refer to your SAS data management product documentation.
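As a rough illustration of the fuzzy join idea, the following Python sketch joins two small tables on a generated key instead of on the raw name. It is a minimal sketch under stated assumptions, not SAS functionality: `toy_match_code`, `NICKNAMES`, and `fuzzy_join` are hypothetical names, and the key they produce is a simplified stand-in for a real match code. The sample rows are made up for the example.

```python
from collections import defaultdict

# Hypothetical nickname lookup (assumption), standing in for the
# vocabularies that a real match definition would supply.
NICKNAMES = {"bill": "william", "liz": "elizabeth"}


def toy_match_code(name: str) -> str:
    """Crude stand-in for a match code: family name plus canonical given name."""
    tokens = [t.strip(".,").lower() for t in name.split()]
    tokens = [t for t in tokens if len(t) > 1]  # ignore initials
    if not tokens:
        return ""
    given = NICKNAMES.get(tokens[0], tokens[0])
    return f"{tokens[-1]}|{given}".upper()


def fuzzy_join(left, right):
    """Pair rows from two tables whose name fields share the same toy match code."""
    index = defaultdict(list)
    for row in right:
        index[toy_match_code(row["name"])].append(row)
    # Look up each left row by its generated key rather than its raw name.
    return [(l, r) for l in left for r in index.get(toy_match_code(l["name"]), [])]


customers = [{"id": 1, "name": "Bill Smith"}, {"id": 2, "name": "Liz Brown"}]
orders = [
    {"order": "A-17", "name": "William L. Smith"},
    {"order": "B-03", "name": "Elizabeth Brown"},
    {"order": "C-99", "name": "Ann Lee"},
]

pairs = fuzzy_join(customers, orders)
matched = [(l["id"], r["order"]) for l, r in pairs]
print(matched)
```

An exact join on the `name` field would return no rows for this data, because none of the spellings agree. Joining on the generated key pairs the records that refer to the same person. The same key could be used to find duplicate records within a single table by grouping rows that share it.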
Doc ID: QKBCI_match_defs.html