Match codes are encoded representations of character values that are used for analysis,
transformation, and standardization of data. Use the following procedures and functions to create
match codes:
The DQMATCH procedure
creates match codes for one or more variables or for parsed tokens that have been extracted from a variable. The procedure can also assign cluster numbers to values with identical match codes. See DQMATCH Procedure for additional information.
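The DQMATCH function
generates a match code for a character value. See DQMATCH Function for additional information.
The DQMATCHPARSED function
generates a match code for tokens that have been parsed from a character value. See DQMATCHPARSED Function for additional information.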
Match codes are created by the DQMATCH procedure and by the DQMATCH and DQMATCHPARSED
functions. The DQMATCH and DQMATCHPARSED functions each return one match code for one
input character variable. With these tools, you can create match codes for an entire
character value or for a parsed token extracted from a character value.
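As a minimal sketch of the two functions, the DATA step below creates one match code from an entire value and one from the parsed tokens of the same value. The QKB path, the sample name, and the variable names are placeholders chosen for illustration, and the sketch assumes that the ENUSA locale with its Name parse and match definitions is available at your site.

```sas
/* Assumption: the path below is a placeholder for your site's QKB location. */
%dqload(dqlocale=(ENUSA), dqsetuploc='/path/to/your/qkb');

data _null_;
   length name $ 50 parsedName $ 200 mcValue mcParsed $ 255;
   name = 'Dr. Jim Goodnight';   /* illustrative input value */

   /* Match code for the entire character value */
   mcValue = dqMatch(name, 'Name', 85, 'ENUSA');

   /* Parse the value first, then create a match code from the parsed tokens */
   parsedName = dqParse(name, 'Name', 'ENUSA');
   mcParsed = dqMatchParsed(parsedName, 'Name', 85, 'ENUSA');

   /* Write both match codes to the SAS log */
   put mcValue= mcParsed=;
run;
```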
- The locale identifies the language and geographical region of the source data. For example, the locale ENUSA specifies that the source data uses the English language as it is used in the United States of America.
- The match definition in the Quality Knowledge Base identifies the category of the data and determines the content of the match codes. Examples of match definitions are named ADDRESS, ORGANIZATION, and DATE (YMD).
To determine the match definitions that are available in a Quality Knowledge Base (QKB),
consult the Help for that QKB. Alternatively, use the DQLOCALEINFOLIST function
to return the names of the locale's match definitions. Use the function
if one or both of the following statements are true:
- Your site has added definitions to your Quality Knowledge Base.
- Your site has modified the default Quality Knowledge Base using the Customize software in DataFlux Data Management Studio.
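For illustration, a call along the following lines lists the match definitions of one locale. This is a sketch that assumes the ENUSA locale is already loaded into memory; the variable name is arbitrary.

```sas
data _null_;
   /* Writes the names of the ENUSA match definitions to the SAS log
      and returns the number of definitions that were found */
   numMatchDefs = dqLocaleInfoList('MATCH', 'ENUSA');
   put numMatchDefs=;
run;
```

Because the list comes from the QKB that is actually loaded, it should also reflect definitions that your site has added or customized.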
The sensitivity level is a value between 50 and 95 that determines the amount of information
that is captured in the match code, as described in Sensitivity.
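One way to see the effect of sensitivity is to create match codes for the same value at both ends of the range and compare them in the log. The sketch below assumes that the ENUSA locale is loaded and that its Name match definition is available; the input value and variable names are made up for the example.

```sas
data _null_;
   length mc50 mc95 $ 255;

   /* Lowest sensitivity: less information is captured in the match code */
   mc50 = dqMatch('Raul Beckett', 'Name', 50, 'ENUSA');

   /* Highest sensitivity: more information is captured in the match code */
   mc95 = dqMatch('Raul Beckett', 'Name', 95, 'ENUSA');

   put mc50= mc95=;
run;
```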
If two or more match codes are identical, a cluster number can be assigned to a specified
variable, as described in Clusters.
The content of the output data set is determined by option values. You can include
values that generate unique match codes, and you can include blank or missing values
and assign them a cluster number. You can also concatenate multiple match codes.
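The following sketch shows how the procedure might be used to write match codes and cluster numbers to an output data set. The data set names, variable names, and sample values are hypothetical, and the sketch assumes that the ENUSA locale is loaded and that its Name match definition is available.

```sas
/* Hypothetical input data for illustration */
data work.customers;
   length name $ 50;
   input name $char50.;
   datalines;
Bob Beckett
Robert Beckett
Rob Beckett
Paul Becker
;
run;

/* MATCHCODE= names the output variable that receives the match code;
   CLUSTER= names the output variable that receives the cluster number */
proc dqmatch data=work.customers out=work.customers_mc
             matchcode=match_cd cluster=cluster_id locale='ENUSA';
   criteria matchdef='Name' var=name sensitivity=85;
run;

proc print data=work.customers_mc;
run;
```

Procedure options such as CLUSTERS_ONLY and NO_CLUSTER_BLANKS change which rows appear in the output and how blank values are clustered. Check the DQMATCH Procedure reference for their exact behavior before relying on them.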
Match codes are also generated internally when you create a
scheme with the DQSCHEME procedure, as described in
Schemes, and when you apply a scheme with the DQSCHEMEAPPLY function or the DQSCHEMEAPPLY
CALL routine. In both cases, the match codes are used in the process of creating or applying the scheme.