Concepts

Sensitivity

The amount of information contained in match codes is determined by a specified sensitivity level. Changing the sensitivity level enables you to change what is considered a match. Match codes created at lower levels of sensitivity capture little information about the input values. The result is more matches, fewer clusters, and more values in each cluster. See Clusters.

Higher sensitivity levels require input values to be more similar before they receive the same match code. Clusters are more numerous, and each cluster contains fewer entries. For example, when collecting customer data based on account numbers, cluster on account numbers with a high sensitivity value.
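
The relationship between sensitivity and cluster count can be illustrated with a toy model in Python. This is only a sketch and not the algorithm that the software uses to build match codes: the function names toy_match_code and cluster_by_code are hypothetical, and the rule "keep more leading letters as sensitivity rises" is an assumption chosen to make the effect visible.

```python
from collections import defaultdict


def toy_match_code(value: str, sensitivity: int = 85) -> str:
    """Return a toy match code; NOT the real match-code algorithm."""
    if not 50 <= sensitivity <= 95:
        raise ValueError("sensitivity must be between 50 and 95")
    # Normalize: keep letters only, in uppercase.
    letters = "".join(ch for ch in value.upper() if ch.isalpha())
    # Assumption: higher sensitivity keeps more characters,
    # so the code carries more information about the input.
    keep = 2 + (sensitivity - 50) // 5  # 50 keeps 2 letters, 95 keeps 11
    return letters[:keep]


def cluster_by_code(values, sensitivity):
    """Group values that share the same toy match code."""
    groups = defaultdict(list)
    for value in values:
        groups[toy_match_code(value, sensitivity)].append(value)
    return dict(groups)


surnames = ["Fielding", "Fieldman", "Fields", "Fisher"]
for level in (50, 60, 95):
    clusters = cluster_by_code(surnames, level)
    print(level, len(clusters), sorted(clusters))
```

In this toy model, sensitivity 50 places all four surnames in one cluster, 60 yields two clusters, and 95 yields four. Sweeping a few levels in this way mirrors the kind of comparison you would make when testing sensitivity settings against your own data.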

In some data cleansing jobs, a lower sensitivity value is needed. To transform the following names to one consistent value using a scheme, specify a lower sensitivity level.

Patricia J. Fielding
Patty Fielding
Patricia Feelding
Patty Fielding

All four values are assigned to the same cluster. Every value in the cluster is then transformed to the most common value in that cluster, Patty Fielding.
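
The transformation step can be sketched in Python as follows. The sketch assumes that clustering has already placed the four names together, as described above. The helper build_scheme is a hypothetical name used for illustration and is not part of the software.

```python
from collections import Counter

# The four input values, assumed to be in one cluster already.
names = [
    "Patricia J. Fielding",
    "Patty Fielding",
    "Patricia Feelding",
    "Patty Fielding",
]


def build_scheme(cluster_values):
    """Map every distinct value in a cluster to its most common value."""
    standard, _count = Counter(cluster_values).most_common(1)[0]
    return {value: standard for value in set(cluster_values)}


scheme = build_scheme(names)
cleaned = [scheme[name] for name in names]
print(cleaned)
```

Patty Fielding occurs twice and every other value occurs once, so each of the four entries in cleaned is Patty Fielding.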

Sensitivity values range from 50 to 95. The default value is 85.

To arrive at the sensitivity level that fits your data and your application, test with the DQMATCH procedure. Alternatively, create analysis data sets with the DQSCHEME procedure.
