The DQMATCH Procedure
The following example assigns cluster numbers based on a logical OR of two pairs of CRITERIA statements. Each pair of CRITERIA statements is evaluated as a logical AND. The cluster numbers are assigned based on a match between the customer name and address, or the organization name and address.
/* Load the ENUSA locale. The system option DQSETUPLOC= is already set. */
%dqload(dqlocale=(enusa))

/* Each character field occupies a fixed 24-column slot so that the      */
/* longest value, "Orion Star Corporation" (22 characters), is not       */
/* truncated. NAME starts in column 3, ORG in column 27, ADDR in         */
/* column 51.                                                            */
data customer;
   length custid 8 name org addr $ 24;
   input custid name $char24. org $char24. addr $char24.;
datalines;
1 Mr. Robert Smith        Orion Star Corporation  8001 Weston Blvd.
2                         The Orion Star Corp.    8001 Westin Ave
3 Bob Smith                                       8001 Weston Parkway
4 Sandi Booth             Belleview Software      123 N Main Street
5 Mrs. Sandra Booth       Belleview Inc.          801 Oak Ave.
6 sandie smith Booth      Orion Star Corp.        123 Maine Street
7 Bobby J. Smythe         ABC Plumbing            8001 Weston Pkwy
;
run;

/* Generate the cluster data. Because more than one condition is defined, */
/* a variable named CLUSTER is created automatically.                     */
proc dqmatch data=customer out=customer_out;
   criteria condition=1 var=name sensitivity=85 matchdef='Name';
   criteria condition=1 var=addr sensitivity=70 matchdef='Address';
   criteria condition=2 var=org  sensitivity=85 matchdef='Organization';
   criteria condition=2 var=addr sensitivity=70 matchdef='Address';
run;

/* Print the result. */
proc print data=customer_out noobs;
run;
The output is as follows:
custid  name                org                     addr                 CLUSTER
     4  Sandi Booth         Belleview Software      123 N Main Street          1
     6  sandie smith Booth  Orion Star Corp.        123 Maine Street           1
     1  Mr. Robert Smith    Orion Star Corporation  8001 Weston Blvd.          2
     7  Bobby J. Smythe     ABC Plumbing            8001 Weston Pkwy           2
     3  Bob Smith                                   8001 Weston Parkway        2
     2                      The Orion Star Corp.    8001 Westin Ave            2
     5  Mrs. Sandra Booth   Belleview Inc.          801 Oak Ave.
In the preceding output, the two rows in cluster 1 matched on name and address. The rows in cluster 2 matched on name and address as well as organization and address. The inclusion of Bobby J. Smythe in cluster 2 indicates either a data error or a need for further refinement of the criteria and conditions. The last row in the output did not receive a cluster number because that row did not match any other rows.
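To review cluster membership and experiment with refinements, one option is to count the rows in each cluster and then rerun the procedure with stricter criteria. The following is a minimal sketch and is not part of the DQMLTCND sample: the data set names cluster_sizes and customer_strict and the name sensitivity of 95 are illustrative assumptions. Whether the stricter run actually separates Bobby J. Smythe from cluster 2 depends on the match definitions in the loaded locale.

```sas
/* Count the rows in each cluster. Rows that did not receive a */
/* cluster number (missing CLUSTER) are excluded.              */
proc sql;
   create table cluster_sizes as
   select cluster, count(*) as members
   from customer_out
   where cluster is not missing
   group by cluster
   order by cluster;
quit;

proc print data=cluster_sizes noobs;
run;

/* Hypothetical refinement: repeat the match with a higher name  */
/* sensitivity. The value 95 is an assumption chosen only for    */
/* illustration; all other criteria are unchanged.               */
proc dqmatch data=customer out=customer_strict;
   criteria condition=1 var=name sensitivity=95 matchdef='Name';
   criteria condition=1 var=addr sensitivity=70 matchdef='Address';
   criteria condition=2 var=org  sensitivity=85 matchdef='Organization';
   criteria condition=2 var=addr sensitivity=70 matchdef='Address';
run;

proc print data=customer_strict noobs;
run;
```

For the output shown above, the count step would report two members for cluster 1 and four members for cluster 2. Comparing customer_strict with customer_out shows which rows change clusters under the stricter setting.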
Note: This example is available in the SAS Sample Library under the name DQMLTCND.
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.