The amount of information
contained in match codes is determined by a specified sensitivity
level. Changing the sensitivity level enables you to change what is
considered a match. Match codes created at lower levels of sensitivity
capture less information about the input values, so more values share
the same match code. The result is more matches, fewer clusters, and
more values in each cluster. See Clusters for additional information.
Higher sensitivity levels require input values to be more similar
before they receive the same match code. Clusters are more numerous,
and each cluster contains fewer entries. For example, when you collect
customer data based on account numbers, cluster on the account numbers
with a high sensitivity value so that only nearly identical numbers
share a cluster.
In some data cleansing jobs, a lower sensitivity value is needed. For
example, to transform the following names to one consistent value
using a scheme, specify a lower sensitivity level:
Patricia J. Fielding
Patty Fielding
Patricia Feelding
Patty Fielding
All four values are assigned to the same cluster. The values in the
cluster are transformed to the most common value, Patty Fielding.
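As a rough sketch of the scheme approach described above, the following steps build a scheme from the four names at a low sensitivity level and then apply it. The data set names (names, name_scheme, names_clean), the 'Name' match definition, the ENUSA locale, and the sensitivity value of 50 are assumptions chosen for illustration. The match definitions that are available depend on your Quality Knowledge Base, and the locale must already be loaded into memory (for example, with the %DQLOAD macro) before the steps run.

```sas
/* Sample input: the four names from the text. */
data names;
   infile datalines truncover;
   input name $char40.;
   datalines;
Patricia J. Fielding
Patty Fielding
Patricia Feelding
Patty Fielding
;
run;

/* Build a scheme in SAS data set format (NOBFD) at a low    */
/* sensitivity. The value 50 is an assumed starting point.   */
proc dqscheme data=names nobfd;
   create matchdef='Name' var=name scheme=name_scheme
          locale='ENUSA' sensitivity=50;
run;

/* Apply the scheme. Values that are found in the scheme are */
/* replaced by the scheme's standard value in NAMES_CLEAN.   */
proc dqscheme data=names out=names_clean nobfd;
   apply var=name scheme=name_scheme;
run;
```

Whether all four names land in one cluster at a given level depends on the match definition in use, so inspect the scheme data set before you apply it to production data.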
Sensitivity values range
from 50 to 95. The default value is 85.
To arrive at the sensitivity
level that fits your data and your application, test with the DQMATCH
procedure. Alternatively, create analysis data sets with the DQSCHEME
procedure.
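One possible way to run such a test is sketched below: a small macro runs PROC DQMATCH at several sensitivity levels so that the resulting match codes and cluster numbers can be compared side by side, and a final step writes an analysis data set with PROC DQSCHEME. The macro name try_sensitivity, the data set and variable names, the 'Name' match definition, and the ENUSA locale are hypothetical choices for illustration, and the locale is assumed to be loaded already (for example, with %DQLOAD).

```sas
/* Sample input; replace with a representative sample of your data. */
data names;
   infile datalines truncover;
   input name $char40.;
   datalines;
Patricia J. Fielding
Patty Fielding
Patricia Feelding
Patty Fielding
;
run;

/* Hypothetical helper: generate match codes and clusters at */
/* one sensitivity level and print the result for review.    */
%macro try_sensitivity(level);
   proc dqmatch data=names out=names_s&level
                matchcode=match_cd cluster=cluster_id
                locale='ENUSA';
      criteria matchdef='Name' var=name sensitivity=&level;
   run;

   title "Match codes and clusters at sensitivity &level";
   proc print data=names_s&level;
   run;
%mend try_sensitivity;

%try_sensitivity(50)
%try_sensitivity(85)
%try_sensitivity(95)
title;

/* Alternative: write an analysis data set instead of a scheme. */
proc dqscheme data=names;
   create analysis=name_analysis matchdef='Name' var=name
          locale='ENUSA' sensitivity=50;
run;
```

Comparing the printed outputs shows how the clusters split as the level rises, which helps you choose the lowest level that still keeps unrelated values apart.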