PROC DQMATCH creates match codes as a basis for standardization or
transformation.
The match codes reflect the relative similarity of data values. Match
codes are created based on a specified match definition in a specified
locale. The match codes are written to an output SAS data set. Values
that generate the same match codes are candidates for transformation
or standardization.
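As a minimal sketch of this step, the following program generates match codes for a small set of names. It assumes that SAS Data Quality Server is licensed and that the ENUSA locale has already been loaded into memory (for example, with the %DQLOAD autocall macro). The data set names cust_db and out_db, the variable names customer and match_cd, and the sample values are illustrative only.

```sas
/* Illustrative input data; the names are made up. */
data cust_db;
   length customer $ 22;
   input customer $char22.;
datalines;
Bob Beckett
Robert E. Beckett
Rob Beckett
Paul Becker
;
run;

/* Write a match code for each name to the variable match_cd. */
/* Assumes the ENUSA locale and its 'Name' match definition   */
/* are available in the loaded Quality Knowledge Base.        */
proc dqmatch data=cust_db out=out_db matchcode=match_cd
   locale='ENUSA';
   criteria matchdef='Name' var=customer;
run;

/* Rows that share a match_cd value are candidates for */
/* transformation or standardization.                  */
proc print data=out_db;
run;
```

Which of the sample names receive the same match code depends on the match definition in the installed Quality Knowledge Base, so no output is shown here.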
The DQMATCH procedure can generate cluster numbers for input values
that generate identical match codes. Cluster numbers are not assigned
to input values
that generate unique match codes. Input values that generate a unique
match code (no cluster number) can be excluded from the output data
set. Blank values can be retained in the output data set, and they
can receive a cluster number.
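The clustering behavior described above can be sketched as follows. The example uses the CLUSTER=, CLUSTERS_ONLY, and NO_CLUSTER_BLANKS options of the PROC DQMATCH statement. The data set and variable names are illustrative, and the ENUSA locale is assumed to be loaded already.

```sas
/* Illustrative input data, including one blank value. */
data cust_db;
   length customer $ 22;
   customer = 'Bob Beckett';       output;
   customer = 'Robert E. Beckett'; output;
   customer = 'Paul Becker';       output;
   customer = ' ';                 output;  /* blank value */
run;

/* CLUSTER= names the variable that receives the cluster number. */
/* CLUSTERS_ONLY excludes values whose match code is unique.     */
/* NO_CLUSTER_BLANKS keeps blank values out of the output;       */
/* use CLUSTER_BLANKS instead to retain and cluster them.        */
proc dqmatch data=cust_db out=out_clusters matchcode=match_cd
   cluster=clustergrp clusters_only no_cluster_blanks
   locale='ENUSA';
   criteria matchdef='Name' var=customer;
run;

proc print data=out_clusters;
run;
```

Whether any of these sample names actually fall into the same cluster depends on the match definition and the sensitivity level in effect.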
A specified sensitivity level determines the amount of information in
the match codes. The amount of information in the match code determines
the number of clusters and the number of entries in each cluster.
Higher sensitivity levels produce fewer clusters, with fewer entries
per cluster. Use higher sensitivity levels when you need matches that
are more exact. Use lower sensitivity levels to sort data into general
categories or to capture all values that use different spellings to
convey the same information.
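One way to see the effect of the sensitivity level is to run the procedure twice on the same input and compare the clusters. The sketch below sets SENSITIVITY= in the CRITERIA statement to 50 and then to 95, which are assumed here to be the low and high ends of the valid range. The data set names, variable names, and sample values are illustrative, and the ENUSA locale is assumed to be loaded already.

```sas
/* Illustrative input data; the names are made up. */
data cust_db;
   length customer $ 22;
   input customer $char22.;
datalines;
Bob Beckett
Robert E. Beckett
Rob Beckett
Bobby Becket
Paul Becker
;
run;

/* Low sensitivity: match codes carry less information, */
/* so more values are expected to share a match code.   */
proc dqmatch data=cust_db out=out_low matchcode=match_cd
   cluster=clustergrp locale='ENUSA';
   criteria matchdef='Name' var=customer sensitivity=50;
run;

/* High sensitivity: match codes carry more information, */
/* so only closer matches are expected to share a code.  */
proc dqmatch data=cust_db out=out_high matchcode=match_cd
   cluster=clustergrp locale='ENUSA';
   criteria matchdef='Name' var=customer sensitivity=95;
run;

proc print data=out_low;
run;

proc print data=out_high;
run;
```

Comparing the clustergrp values in out_low and out_high shows how the same input groups differently at each level; the exact grouping depends on the installed Quality Knowledge Base.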