The amount of information contained in match codes is determined by a specified sensitivity level. Changing the sensitivity level enables you to change what is considered a match. Match codes created at lower levels of sensitivity capture less information about the input values. The result is more matches, fewer clusters, and more values in each cluster. See Clusters for additional information.
Higher sensitivity levels require input values to be more similar before they receive the same match code. Clusters are more numerous, and each cluster contains fewer entries. For example, when collecting customer data that is based on account numbers, cluster on account numbers with a high sensitivity value.
In some data cleansing jobs, a lower sensitivity value is needed. To transform the following names to one consistent value using a scheme, specify a lower sensitivity level:
Patricia J. Fielding
Patty Fielding
Patricia Feelding
Patty Fielding
All four values are assigned to the same cluster. The values in the cluster are transformed to the most common value, Patty Fielding.
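
As a rough sketch of how this might look in practice, the following steps build a scheme from the four names at a lowered sensitivity and then apply it. The data set names (work.names, work.name_scheme, work.names_clean), the 'Name' match definition, the ENUSA locale, and the sensitivity value of 60 are illustrative assumptions, not recommendations. The code also assumes that the locale is already loaded into memory. Whether these particular values fall into a single cluster depends on the match definition and the Quality Knowledge Base in use.

```sas
/* Illustrative input: the four name values from the example above. */
data work.names;
   length name $ 30;
   input name $char30.;
   datalines;
Patricia J. Fielding
Patty Fielding
Patricia Feelding
Patty Fielding
;
run;

/* Build a scheme in SAS data set format (NOBFD).               */
/* SENSITIVITY=60 is an assumed value below the default of 85.  */
proc dqscheme data=work.names nobfd;
   create matchdef='Name'
          var=name
          scheme=work.name_scheme
          locale='ENUSA'
          sensitivity=60;
run;

/* Apply the scheme and write the transformed values to a new data set. */
proc dqscheme data=work.names out=work.names_clean nobfd;
   apply var=name scheme=work.name_scheme;
run;
```

If the names do not converge on one value, lowering the assumed sensitivity further and rebuilding the scheme is one way to experiment.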
Sensitivity values range from 50 to 95. The default value is 85.
To arrive at the sensitivity level that fits your data and your application, test with the DQMATCH procedure. Alternatively, create analysis data sets with the DQSCHEME procedure.
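
One possible way to run such a test is sketched below: a small macro runs the DQMATCH procedure at several sensitivity levels and prints the resulting match codes and cluster numbers for comparison. The macro name try_sensitivity, the data set and variable names, the 'Name' match definition, the ENUSA locale, and the three levels tried are all assumptions made for illustration. The locale is assumed to be loaded already.

```sas
/* Illustrative input data; replace with a sample of your own values. */
data work.names;
   length name $ 30;
   input name $char30.;
   datalines;
Patricia J. Fielding
Patty Fielding
Patricia Feelding
Patty Fielding
;
run;

/* Hypothetical helper macro: create match codes and cluster numbers */
/* at one sensitivity level, and then print the result.              */
%macro try_sensitivity(level);
   proc dqmatch data=work.names out=work.match_&level
                matchcode=match_cd cluster=cluster_id locale='ENUSA';
      criteria matchdef='Name' var=name sensitivity=&level;
   run;

   title "Name match codes at sensitivity &level";
   proc print data=work.match_&level;
   run;
   title;
%mend try_sensitivity;

/* Levels chosen within the documented range of 50 to 95. */
%try_sensitivity(55)
%try_sensitivity(85)
%try_sensitivity(95)
```

Comparing the printed match codes and cluster numbers across the runs indicates which values come together at each level.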