The output data set, OUT_DB2, includes the new variables MATCH_CD and CLUSTERGRP.
The MATCH_CD variable contains the match code that represents both the customer name
and address. Because the default argument
DELIMITER was used, the resulting match code contains two match code components (one
from each CRITERIA statement) that are
separated by an exclamation point.
The CLUSTERGRP variable contains values that indicate that six of the character values
are grouped in a single cluster and that the other two are not part of any cluster.
The clustering is based on the
values of the MATCH_CD variable.
This result differs
from the result in
Generate Composite Match Codes, where only five values were clustered based on NAME and ADDRESS. This difference
is caused by the lower sensitivity setting for the ADDRESS criterion in the current
example. The lower setting makes the matching
less sensitive to variations in the address field. Therefore, the value Bobby Becket
is now included in Cluster 1. The address 392 Main St. is considered a match with 392 S.
Main St. PO Box 2270 and the other variations, which was not true at a
sensitivity of 85.
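
The way a composite match code drives clustering can be illustrated with a small conceptual sketch. It is written in Python, not SAS, and it does not reproduce how the procedure derives match codes. The component strings (NAME_A, ADDR_A, and so on) and the function names are hypothetical placeholders, and the sketch assumes that rows are clustered only when their composite match codes are identical.

```python
from collections import defaultdict

DELIMITER = "!"  # separator between match code components, as described above


def composite_match_code(components):
    """Join one component per criterion into a single composite code."""
    return DELIMITER.join(components)


def assign_clusters(match_codes):
    """Give rows that share a composite code the same cluster number.

    Rows whose code is unique receive None (no cluster).
    """
    rows_by_code = defaultdict(list)
    for index, code in enumerate(match_codes):
        rows_by_code[code].append(index)

    clusters = [None] * len(match_codes)
    next_cluster = 1
    # dict iteration follows insertion order, so clusters are numbered
    # in the order in which their codes first appear
    for rows in rows_by_code.values():
        if len(rows) > 1:
            for index in rows:
                clusters[index] = next_cluster
            next_cluster += 1
    return clusters


# Hypothetical placeholder components, not real match codes:
# six rows share one composite code, two rows have unique codes.
shared = composite_match_code(("NAME_A", "ADDR_A"))
codes = [shared] * 6 + [
    composite_match_code(("NAME_B", "ADDR_B")),
    composite_match_code(("NAME_C", "ADDR_C")),
]
clusters = assign_clusters(codes)
print(clusters)
```

In this sketch, lowering the sensitivity of one criterion corresponds to more input values producing the same component string, which is why an additional row can join an existing cluster.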
Note: This example is available
in the SAS Sample Library under the name DQMCMIXD.