The DQMATCH Procedure
The following example uses the DQMATCH procedure to create composite match codes and cluster numbers. Both CRITERIA statements use the default sensitivity level of 85. The locale ENUSA is assumed to have already been loaded into memory with the %DQLOAD autocall macro.
/* Create the input data set. */
data cust_db;
   length customer $ 22;
   length address $ 31;
   input customer $char22. address $char31.;
datalines;
Bob Beckett           392 S. Main St. PO Box 2270
Robert E. Beckett     392 S. Main St. PO Box 2270
Rob Beckett           392 S. Main St. PO Box 2270
Paul Becker           392 N. Main St. PO Box 7720
Bobby Becket          392 Main St.
Mr. Robert J. Beckeit P. O. Box 2270 392 S. Main St.
Mr. Robert E Beckett  392 South Main Street #2270
Mr. Raul Becker       392 North Main St.
;
run;

/* Run the DQMATCH procedure. */
proc dqmatch data=cust_db out=out_db1 matchcode=match_cd
             cluster=clustergrp locale='ENUSA';
   criteria matchdef='Name' var=customer;
   criteria matchdef='Address' var=address;
run;

/* Print the results. */
proc print data=out_db1;
run;
PROC PRINT Output
The output data set, OUT_DB1, includes the new variables MATCH_CD and CLUSTERGRP. The MATCH_CD variable contains the composite match code that represents both the customer name and the address. Because the default option DELIMITER is in effect, each composite match code contains two components (one from each CRITERIA statement), separated by an exclamation point.
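Because the two components are separated by an exclamation point, they can be pulled apart again for inspection. The following DATA step is a minimal sketch of that idea and is not part of the original example. The data set name OUT_PARTS, the variable names NAME_MC and ADDRESS_MC, and the length of 255 are hypothetical choices made for illustration.

```sas
/* Sketch: split the composite match code into its two components. */
/* OUT_PARTS, NAME_MC, and ADDRESS_MC are illustrative names only.  */
data out_parts;
   set out_db1;
   /* Assumed to be wide enough for either component. */
   length name_mc address_mc $ 255;
   /* The M modifier keeps an empty component in position instead */
   /* of letting the next component shift into its place.         */
   name_mc    = scan(match_cd, 1, '!', 'M');
   address_mc = scan(match_cd, 2, '!', 'M');
run;

proc print data=out_parts;
   var customer name_mc address address_mc;
run;
```

Viewing the components side by side can make it easier to tell whether the name or the address is responsible for two rows receiving different composite match codes.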
The CLUSTERGRP variable contains values that indicate that five of the character values are grouped in a single cluster and that the other three are not part of any cluster. The clustering is based on the values of the MATCH_CD variable. By looking at the values for MATCH_CD, you can see that five character values have identical match codes. Although the match code for customer Bobby Becket is similar to the Cluster 1 match codes, the difference in the address component caused that row to be excluded from Cluster 1.
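The relationship between identical match codes and cluster membership can be checked directly against the output data set. The following PROC SQL step is a rough sketch of that check, not the procedure's internal algorithm. The column alias N_ROWS is a hypothetical name.

```sas
/* Sketch: list every composite match code shared by more than one row. */
/* N_ROWS is an illustrative alias, not a variable created by DQMATCH.  */
proc sql;
   select match_cd,
          count(*) as n_rows
      from out_db1
      group by match_cd
      having count(*) > 1;
quit;
```

If the output matches the description above, the query should return a single match code that is shared by the five rows in Cluster 1.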
Matching Values Using Mixed Sensitivity Levels shows how the use of non-default sensitivity levels increases the accuracy of the analysis.
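As a rough sketch of what a non-default sensitivity looks like in code, the SENSITIVITY= option can be added to an individual CRITERIA statement. The values 85 and 50 and the output data set name OUT_DB2 are illustrative assumptions and are not necessarily the ones used in the referenced example.

```sas
/* Sketch: set the sensitivity separately for each CRITERIA statement. */
/* The sensitivity values and OUT_DB2 are illustrative assumptions.    */
proc dqmatch data=cust_db out=out_db2 matchcode=match_cd
             cluster=clustergrp locale='ENUSA';
   criteria matchdef='Name' var=customer sensitivity=85;
   criteria matchdef='Address' var=address sensitivity=50;
run;

proc print data=out_db2;
run;
```

A lower sensitivity produces match codes that tolerate more variation, so lowering it for one variable generally allows more rows to fall into the same cluster.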
Note: This example is available in the SAS Sample Library under the name DQMCDFLT.
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.