The output data set,
OUT_DB2, includes the new variables MATCH_CD and CLUSTERGRP. The MATCH_CD
variable contains the match code that represents both the customer
name and address. Because the default argument DELIMITER was used,
the resulting match code contains two match code components (one from
each CRITERIA statement) that are separated by an exclamation point.
The CLUSTERGRP variable
contains values that indicate that six of the character values are
grouped in a single cluster and that the other two are not part of
any cluster. The clustering is based on the values of the MATCH_CD
variable.
This result differs from the result in Generate Composite Match
Codes, where only five values were clustered based on NAME and
ADDRESS. The difference is caused by the lower sensitivity setting
for the ADDRESS criteria in the current example, which makes the
matching less sensitive to variations in the address field.
Therefore, the value Bobby Becket is now included in Cluster 1. The
address 392 Main St. is now considered a match with 392 S. Main St.
PO Box 2270 and the other variations; this was not true at a
sensitivity of 85.
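As a rough illustration of how a mixed-sensitivity call of this kind
might be written, the following sketch uses one CRITERIA statement
per field and lowers the sensitivity only for the address. The output
names OUT_DB2, MATCH_CD, and CLUSTERGRP come from the text above. The
input data set name CUST_DB, the variable names CUSTOMER and ADDRESS,
the ENUSA locale, and the value SENSITIVITY=50 are assumptions made
for illustration; the text states only that the address sensitivity
is lower than 85. The sketch also assumes that the locale has already
been loaded into memory.

```sas
/* Sketch only: CUST_DB, CUSTOMER, ADDRESS, ENUSA, and            */
/* SENSITIVITY=50 are assumed values, not taken from the text.    */
proc dqmatch data=cust_db          /* assumed input data set            */
             out=out_db2           /* output data set named in the text */
             matchcode=match_cd    /* composite match code variable     */
             cluster=clustergrp    /* cluster number variable           */
             locale='ENUSA';       /* assumed locale, loaded beforehand */
   /* No SENSITIVITY= option here, so the default of 85 applies. */
   criteria matchdef='Name' var=customer;
   /* Assumed lower value; a setting below 85 loosens address matching. */
   criteria matchdef='Address' var=address sensitivity=50;
run;

/* Inspect the match codes and cluster numbers. */
proc print data=out_db2;
run;
```

Because the DELIMITER option is the default, it does not need to be
specified for the two match code components to be separated by an
exclamation point.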
Note: This example is available
in the SAS Sample Library under the name DQMCMIXD.