DQMATCH Procedure

Example 5: Clustering with Multiple CRITERIA Statements

The following example assigns cluster numbers based on a logical OR of two pairs of CRITERIA statements. The CRITERIA statements in each pair share a CONDITION= value and are evaluated together as a logical AND. Rows are therefore assigned to the same cluster when they match on customer name and address, or on organization name and address.

   /* Load the ENUSA locale. The system option DQSETUPLOC= is already set. */

   %dqload(dqlocale=(enusa))

   data customer;
      length custid 8 name org addr $ 20;
      input custid name $char20. org $char20. addr $char20.;
   
   datalines;
   1  Mr. Robert Smith    Orion Star Corporation   8001 Weston Blvd.
   2                      The Orion Star Corp.     8001 Westin Ave
   3  Bob Smith                                    8001 Weston Parkway
   4  Sandi Booth         Belleview Software       123 N Main Street
   5  Mrs. Sandra Booth   Belleview Inc.           801 Oak Ave.
   6  sandie smith Booth  Orion Star Corp.         123 Maine Street
   7  Bobby J. Smythe     ABC Plumbing             8001 Weston Pkwy
   ;
   run;

   /* Generate the cluster data. Because more than one condition
      is defined, a variable named CLUSTER is created automatically. */

   proc dqmatch data=customer
                out=customer_out;
      criteria condition=1 var=name sensitivity=85 matchdef='Name';
      criteria condition=1 var=addr sensitivity=70 matchdef='Address';

      criteria condition=2 var=org  sensitivity=85 matchdef='Organization';
      criteria condition=2 var=addr sensitivity=70 matchdef='Address';
   run;

   /* Print the result. */
   
   proc print data=customer_out noobs;
   run;
PROC PRINT Output (figure not reproduced)

Details

In the preceding output, the two rows in cluster 1 matched on name and address. The rows in cluster 2 matched on name and address as well as organization and address. The inclusion of Bobby J. Smythe in cluster 2 indicates either a data error or a need for further refinement of the criteria and conditions. The last row in the output did not receive a cluster number because that row did not match any other rows.
Note: This example is available in the SAS Sample Library under the name DQMLTCND.
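
One way to explore the refinement mentioned in the Details section is to rerun the procedure with a stricter name sensitivity and then review the rows grouped by cluster number. The following is a sketch only, not part of the sample library program. The sensitivity value of 95 and the data set names CUSTOMER_STRICT and CUSTOMER_SORTED are assumptions chosen for illustration. Whether the stricter setting removes Bobby J. Smythe from the cluster depends on the match definitions in the loaded locale.

   /* Sketch: raise the name sensitivity for condition 1.
      The value 95 and the output data set names are illustrative
      assumptions, not values taken from the original example. */

   proc dqmatch data=customer
                out=customer_strict;
      criteria condition=1 var=name sensitivity=95 matchdef='Name';
      criteria condition=1 var=addr sensitivity=70 matchdef='Address';

      criteria condition=2 var=org  sensitivity=85 matchdef='Organization';
      criteria condition=2 var=addr sensitivity=70 matchdef='Address';
   run;

   /* Sort by the automatically created CLUSTER variable so that
      each cluster can be reviewed as a group. Rows with a missing
      cluster number sort first. */

   proc sort data=customer_strict out=customer_sorted;
      by cluster;
   run;

   proc print data=customer_sorted noobs;
      by cluster;
   run;

Comparing this listing with the original CUSTOMER_OUT output shows which rows changed clusters, which helps in deciding whether the sensitivity or the input data needs to be corrected.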