PROC DQMATCH creates match codes as a basis for standardization or transformation. The match codes reflect the relative similarity of data values. Match codes are created based on a specified match definition in a specified locale. The match codes are written to an output SAS data set. Values that generate the same match codes are candidates for transformation or standardization.
The DQMATCH procedure can generate cluster numbers for input values that generate identical match codes. Cluster numbers are not assigned to input values that generate unique match codes. Input values that generate a unique match code (no cluster number) can be excluded from the output data set. Blank values can be retained in the output data set, and they can receive a cluster number.
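The clustering rules above can be modeled outside SAS. The following Python sketch is illustrative only and is not the DQMATCH implementation: it takes match codes that are already computed and assigns a cluster number to each code shared by two or more values. The function name `assign_clusters` and the parameters `clusters_only` and `cluster_blanks` are names chosen for this sketch, and treating an empty string as the match code of a blank value is an assumption.

```python
from collections import Counter

def assign_clusters(codes, clusters_only=False, cluster_blanks=True):
    """Return (row_index, match_code, cluster) tuples.

    cluster is None when the match code is unique, or when the code is
    blank and cluster_blanks is False (assumed behavior for this sketch).
    """
    counts = Counter(codes)
    cluster_ids = {}  # match code -> cluster number, in order of first appearance
    rows = []
    for i, code in enumerate(codes):
        is_blank = code == ""
        shared = counts[code] > 1 and (cluster_blanks or not is_blank)
        if shared:
            # Reuse the number already given to this code, or take the next one.
            cluster = cluster_ids.setdefault(code, len(cluster_ids) + 1)
        else:
            cluster = None
        if cluster is None and clusters_only:
            continue  # exclude rows that received no cluster number
        rows.append((i, code, cluster))
    return rows

# Hypothetical match codes; "" stands for a blank input value.
codes = ["A1", "B2", "A1", "", "C3", "", "B2"]
for row in assign_clusters(codes):
    print(row)
```

In this model, calling `assign_clusters(codes, clusters_only=True)` drops the row with the unique code `C3`, and `cluster_blanks=False` leaves the two blank rows without a cluster number.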
A specified sensitivity level determines the amount of information in the match codes. The amount of information in the match code determines the number of clusters and the number of entries in each cluster. Higher sensitivity levels produce fewer clusters, with fewer entries per cluster. Use higher sensitivity levels when you need matches that are more exact. Use lower sensitivity levels to sort data into general categories or to capture all values that use different spellings to convey the same information.
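The effect of sensitivity on clustering can be seen in a small model. The Python sketch below does not reproduce the match codes that SAS generates. It assumes a toy encoding (the first letter plus the following consonants, truncated to `keep` characters), where `keep` is a hypothetical stand-in for the sensitivity level: a larger `keep` puts more information in each code. The names `toy_match_code` and `cluster_sizes` and the sample surnames are invented for illustration.

```python
from collections import Counter

VOWELS = set("AEIOUY")  # Y is treated as a vowel in this toy encoding

def toy_match_code(value, keep):
    """Toy code: first letter, then later letters that are not vowels and
    do not repeat the last kept letter, truncated to `keep` characters."""
    letters = [c for c in value.upper() if c.isalpha()]
    if not letters:
        return ""
    code = [letters[0]]
    for c in letters[1:]:
        if c in VOWELS or c == code[-1]:
            continue
        code.append(c)
    return "".join(code)[:keep]

def cluster_sizes(values, keep):
    """Map each match code shared by two or more values to its entry count."""
    counts = Counter(toy_match_code(v, keep) for v in values)
    return {code: n for code, n in counts.items() if n > 1}

names = ["Smith", "Smyth", "Smithson", "Jones", "Jonas",
         "Johnson", "Brown", "Brandt"]

low = cluster_sizes(names, keep=2)   # less information per code
high = cluster_sizes(names, keep=6)  # more information per code
print(low)
print(high)
```

With this sample, the low setting groups Smith, Smyth, and Smithson together and also pairs Brown with Brandt, while the high setting keeps only the closer matches (Smith with Smyth, Jones with Jonas). That is the pattern described above: more information in the code yields fewer clusters with fewer entries each.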