The DQMATCH Procedure
The following example shows how minimal sensitivity levels can generate inaccurate clusters. A sensitivity of 50 is used in both CRITERIA statements, which is the minimum value for this argument.
/* Create the input data set. */
data cust_db;
   length customer $ 22;
   length address $ 31;
   input customer $char22. address $char31.;
datalines;
Bob Beckett           392 S. Main St. PO Box 2270
Robert E. Beckett     392 S. Main St. PO Box 2270
Rob Beckett           392 S. Main St. PO Box 2270
Paul Becker           392 N. Main St. PO Box 7720
Bobby Becket          392 Main St.
Mr. Robert J. Beckeit P. O. Box 2270 392 S. Main St.
Mr. Robert E Beckett  392 South Main Street #2270
Mr. Raul Becker       392 North Main St.
;
run;

/* Run the DQMATCH procedure. */
proc dqmatch data=cust_db out=out_db3 matchcode=match_cd
             cluster=clustergrp locale='ENUSA';
   criteria matchdef='Name' var=customer sensitivity=50;
   criteria matchdef='Address' var=address sensitivity=50;
run;

/* Print the results. */
proc print data=out_db3;
run;
PROC PRINT Output
The output data set OUT_DB3 includes the variables MATCH_CD and CLUSTERGRP. The MATCH_CD variable contains the match code that represents both the customer name and address. Because the default argument DELIMITER was used, the resulting match code contains two match code components (one from each CRITERIA statement) that are separated by an exclamation point.
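As a rough illustration of this delimiter-separated structure, the following DATA step sketch splits MATCH_CD into its two components with the Base SAS SCAN function. The data set name SPLIT_CD and the variables NAME_MC and ADDR_MC are hypothetical names chosen for this sketch. The sketch also assumes that the exclamation point appears in MATCH_CD only as the delimiter and that 40 characters is long enough for each component.

```sas
/* Sketch: split the combined match code into its two components. */
/* SPLIT_CD, NAME_MC, and ADDR_MC are illustrative names only.    */
data split_cd;
   set out_db3;
   length name_mc addr_mc $ 40;            /* assumed length */
   /* The M modifier keeps an empty component in its own position. */
   name_mc = scan(match_cd, 1, '!', 'm');  /* from the Name criterion    */
   addr_mc = scan(match_cd, 2, '!', 'm');  /* from the Address criterion */
run;

/* Print the components next to the source values. */
proc print data=split_cd;
   var customer name_mc address addr_mc clustergrp;
run;
```

Viewing the components side by side can help show which criterion is responsible when two observations share a cluster.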
The CLUSTERGRP variable contains values that indicate that six of the values are grouped in one cluster and that the other two are grouped in another. The clustering is based on the values of the MATCH_CD variable. This example shows that, with a minimal sensitivity level of 50, the following values match and form a cluster.
Mr. Raul Becker
Paul Becker
A higher sensitivity level would not cluster these observations.
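One way to check this is to rerun the procedure with a higher sensitivity and compare the cluster counts from the two runs. The sketch below assumes a sensitivity of 85, a value chosen only for illustration, and writes to a hypothetical output data set named OUT_DB85. No particular clustering result is claimed here.

```sas
/* Sketch: rerun the match with a higher sensitivity. */
/* The value 85 and the name OUT_DB85 are assumptions. */
proc dqmatch data=cust_db out=out_db85 matchcode=match_cd
             cluster=clustergrp locale='ENUSA';
   criteria matchdef='Name' var=customer sensitivity=85;
   criteria matchdef='Address' var=address sensitivity=85;
run;

/* Count the observations in each cluster for the original run. */
/* MISSING also counts observations with no cluster number.     */
proc freq data=out_db3;
   tables clustergrp / missing;
run;

/* Count the observations in each cluster for the rerun. */
proc freq data=out_db85;
   tables clustergrp / missing;
run;
```

Comparing the two frequency tables shows how the cluster membership changes as the sensitivity increases.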
Note: This example is available in the SAS Sample Library under the name DQMCMIN.
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.