SAS(R) Data Quality Server 9.2: Reference

Using the SAS Data Quality Server Software

About Sensitivity

The amount of information contained in match codes is determined by a specified sensitivity level. Changing the sensitivity level allows you to change what is considered a match. Match codes that are created at lower levels of sensitivity capture little information about the input values. The result is more matches, fewer clusters (see About Clusters), and more values in each cluster. At higher sensitivity levels, input values must be more similar to receive the same match code. Clusters are more numerous, and the number of entries in each cluster is smaller.

In some data cleansing jobs, a lower sensitivity value is needed. For example, if you wanted to transform the following names to a single consistent value using a scheme, you would need to specify a lower sensitivity level:

Patricia J. Fielding
Patty Fielding
Patricia Feelding
Patty Fielding

In this case, all four values would be assigned to the same cluster and transformed to the most common value, Patty Fielding.
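The clustering and transformation described here can be illustrated with a small Python sketch. It is not the SAS match-code algorithm: `skeleton`, `coarse_key`, and `build_scheme` are hypothetical helpers, and the key (a consonant skeleton of the surname plus the first initial) is only an assumed stand-in for a low-sensitivity match code.

```python
import re
from collections import Counter, defaultdict

VOWELS = set("aeiouy")

def skeleton(word):
    """Keep the first letter, drop later vowels, collapse repeated consonants."""
    out = word[0]
    for ch in word[1:]:
        if ch not in VOWELS and ch != out[-1]:
            out += ch
    return out

def coarse_key(name):
    """Hypothetical stand-in for a low-sensitivity match code (not the SAS algorithm)."""
    tokens = re.findall(r"[a-z]+", name.lower())
    # Surname skeleton plus first initial: deliberately captures little information.
    return skeleton(tokens[-1]) + "|" + tokens[0][0]

def build_scheme(values):
    """Map every input value to the most common value in its cluster."""
    clusters = defaultdict(list)
    for value in values:
        clusters[coarse_key(value)].append(value)
    scheme = {}
    for members in clusters.values():
        standard = Counter(members).most_common(1)[0][0]
        for value in members:
            scheme[value] = standard
    return scheme

names = [
    "Patricia J. Fielding",
    "Patty Fielding",
    "Patricia Feelding",
    "Patty Fielding",
]
scheme = build_scheme(names)
cleaned = [scheme[n] for n in names]
print(cleaned)
```

Because the toy key ignores vowels, middle initials, and most of the given name, all four values share one key and are replaced by the value that occurs most often in the cluster.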

In other cases, a higher sensitivity level is needed. For example, if you were collecting customer data based on account numbers, you would want each cluster to correspond to a single account number. A high sensitivity value would be needed so that similar but distinct account numbers do not receive the same match code.

In the SAS Data Quality Server software, sensitivity values range from 50 to 95, and the default value is 85.

To arrive at the sensitivity level that fits your data and your application, run tests with DQMATCH or create analysis data sets with PROC DQSCHEME.
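The testing approach can be sketched outside SAS as a sweep over candidate sensitivity levels, counting the clusters produced at each one. The Python below is only an illustration: `toy_match_code` is a hypothetical stand-in for a match-code generator, and its thresholds (70 and 85) are arbitrary assumptions chosen to show the trend. It does not reproduce the match codes that SAS Data Quality Server generates.

```python
import re

VOWELS = set("aeiouy")

def skeleton(word):
    """Keep the first letter, drop later vowels, collapse repeated consonants."""
    out = word[0]
    for ch in word[1:]:
        if ch not in VOWELS and ch != out[-1]:
            out += ch
    return out

def toy_match_code(name, sensitivity):
    """Hypothetical match code: higher sensitivity keeps more of the input."""
    if not 50 <= sensitivity <= 95:
        raise ValueError("sensitivity must be between 50 and 95")
    tokens = re.findall(r"[a-z]+", name.lower())
    given, surname = tokens[0], tokens[-1]
    if sensitivity < 70:   # assumed threshold: surname skeleton + initial
        return skeleton(surname) + "|" + given[0]
    if sensitivity < 85:   # assumed threshold: skeletons of both names
        return skeleton(surname) + "|" + skeleton(given)
    return " ".join(tokens)  # all tokens, spelled out in full

def count_clusters(values, sensitivity):
    """Number of distinct match codes, i.e. clusters, at this sensitivity."""
    return len({toy_match_code(v, sensitivity) for v in values})

names = [
    "Patricia J. Fielding",
    "Patty Fielding",
    "Patricia Feelding",
    "Patty Fielding",
]
cluster_counts = {s: count_clusters(names, s) for s in (50, 75, 95)}
for level, n in cluster_counts.items():
    print(f"sensitivity {level}: {n} cluster(s)")
```

With this toy function the cluster count rises as the sensitivity rises. Inspecting which values share a cluster at each level is one way to judge whether a level merges too much or too little for a given application.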
