Match codes are encoded
representations of character values that are used for analysis, transformation,
and standardization of data. Use the following procedures and functions
to create match codes:
- The DQMATCH procedure creates match codes for one or more variables,
or for parsed tokens that have been extracted from a variable. The
procedure can also assign cluster numbers to values with identical
match codes. See DQMATCH Procedure for additional information.
- The DQMATCH function generates a match code for a character value.
See DQMATCH Function for additional information.
- The DQMATCHPARSED function generates a match code for tokens that
have been parsed from a variable.
Match codes are created
by the DQMATCH procedure and by the DQMATCH and DQMATCHPARSED functions.
The functions DQMATCH and DQMATCHPARSED return one match code for
one input character variable. With these tools, you can create match
codes for an entire character value or a parsed token extracted from
a character value.
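The two functions can be compared side by side in a short DATA step. This is a minimal sketch rather than a definitive program: it assumes that SAS Data Quality Server is licensed, that the ENUSA locale has already been loaded into memory, and that the locale provides a match definition and a parse definition that are both named Name. The input value and all variable names are hypothetical.

```sas
/* Sketch only: assumes the ENUSA locale is loaded (for example, with    */
/* the %DQLOAD macro) and that it contains definitions named 'Name'.     */
data _null_;
   length name $ 40 parsed_name $ 200 mc_whole mc_parsed $ 40;
   name = 'Susan B. Anthony';            /* hypothetical sample value */

   /* Match code for the entire character value */
   mc_whole = dqMatch(name, 'Name', 85, 'ENUSA');

   /* Parse the value into tokens, then create a match code from them */
   parsed_name = dqParse(name, 'Name', 'ENUSA');
   mc_parsed   = dqMatchParsed(parsed_name, 'Name', 85, 'ENUSA');

   /* Write both match codes to the SAS log */
   put mc_whole= mc_parsed=;
run;
```

Each call returns one match code for one input value, so a DATA step that reads a data set would produce one match code per observation.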
During processing, match codes
are generated according to the specified locale, match definition,
and sensitivity level.
The locale identifies the language
and geographical region of the source data. For example, the locale
ENUSA specifies that the source data uses the English language as
it is used in the United States of America.
The match definition in the Quality
Knowledge Base identifies the category of the data and determines
the content of the match codes. Examples of match definitions
include ADDRESS, ORGANIZATION, and DATE(YMD).
To determine the match
definitions that are available in a Quality Knowledge Base, consult
the QKB documentation from DataFlux (a SAS company). Alternatively,
use the DQLOCALEINFOLIST function to return the names of the locale's
match definitions. Prefer the DQLOCALEINFOLIST function if your site
has modified the default Quality Knowledge Base with DataFlux dfPower
Customize software, because the documentation describes only the
default definitions.
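A lookup of this kind might be written as follows. The sketch assumes that the ENUSA locale is already loaded into memory. As far as the function is understood here, DQLOCALEINFOLIST writes the definition names to the SAS log and returns a count of the definitions. The variable name n_match_defs is hypothetical.

```sas
/* Sketch only: assumes the ENUSA locale has been loaded into memory. */
data _null_;
   /* Lists the names of the ENUSA match definitions in the SAS log */
   /* and returns the number of definitions that were found.        */
   n_match_defs = dqLocaleInfoList('match', 'ENUSA');
   put n_match_defs=;
run;
```

Because the function reads the Quality Knowledge Base that is in use at your site, the list in the log reflects any local customizations.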
The sensitivity level
is a value between 50 and 95 that determines the amount of information
that is captured in the match code, as described in
Sensitivity.
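One way to see the effect of the sensitivity level is to create match codes for the same value at both ends of the range. The sketch below assumes that the ENUSA locale is loaded and uses the ORGANIZATION match definition that is named earlier in this section. The organization name and the variable names are hypothetical.

```sas
/* Sketch only: compares match codes at the lowest and highest */
/* sensitivity levels for one hypothetical input value.        */
data _null_;
   length org $ 60 mc_low mc_high $ 40;
   org = 'Acme Widget Company, Inc.';

   mc_low  = dqMatch(org, 'Organization', 50, 'ENUSA');
   mc_high = dqMatch(org, 'Organization', 95, 'ENUSA');

   /* Write each match code on its own line in the SAS log */
   put mc_low= / mc_high=;
run;
```

Running the same comparison on a sample of your own data can help you choose a sensitivity level before you commit to one in production code.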
If two or more match
codes are identical, a cluster number can be assigned to a specified
variable, as described in
Clusters.
The content of the output
data set is determined by option values. You can include values that
generate unique match codes, and you can include blank or missing
values and assign them a cluster number. You can also concatenate
multiple match codes.
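The clustering and concatenation behavior can be sketched with a small PROC DQMATCH step. The table work.customers, its values, and the output variable names match_cd and cluster_id are hypothetical. The sketch also assumes that the ENUSA locale is loaded and that it provides match definitions named Name and Address.

```sas
/* Hypothetical input data: four customer records */
data work.customers;
   length name $ 40 address $ 60;
   infile datalines dsd truncover;   /* DSD: values are comma-delimited */
   input name $ address $;
   datalines;
Bob Beckett,392 S. Main St.
Robert E. Beckett,392 S. Main St.
Robert E. Beckett,392 S. Main St.
Rob Beckett,392 N. Main St.
;
run;

/* Sketch only: each CRITERIA statement creates one match code. The  */
/* match codes are concatenated into MATCH_CD, and observations with */
/* identical concatenated match codes share a value of CLUSTER_ID.   */
proc dqmatch data=work.customers out=work.customers_mc
             matchcode=match_cd cluster=cluster_id
             locale='ENUSA';
   criteria matchdef='Name'    var=name    sensitivity=85;
   criteria matchdef='Address' var=address sensitivity=80;
run;
```

The sketch leaves the remaining procedure options at their defaults. To the best of our understanding, options such as CLUSTERS_ONLY and NO_CLUSTER_BLANKS control whether values with unique match codes and blank values appear in the output. Check the DQMATCH procedure syntax before relying on them.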
Match codes are also
generated internally when you create a scheme with the DQSCHEME procedure,
as described in
Schemes. Match codes
are also created internally by the DQSCHEMEAPPLY function and the
DQSCHEMEAPPLY CALL routine. The match codes are used in the process
of creating or applying a scheme.