SAS Institute. The Power to Know

SAS(R) Data Quality Server 9.2: Reference


The DQMATCH Procedure

Example 5: Clustering with Multiple CRITERIA Statements


The following example assigns cluster numbers by using two pairs of CRITERIA statements. The two statements in each pair share a CONDITION= value and are evaluated as a logical AND; the two conditions are then combined with a logical OR. Rows are therefore clustered when they match on customer name and address, or on organization name and address.

   /* Load the ENUSA locale.  The system option DQSETUPLOC= is already set. */
   %dqload(dqlocale=(enusa))

   data customer;
      /* Fields are pipe-delimited so that empty values are read as missing
         and long values such as 'Orion Star Corporation' are not truncated. */
      infile cards dlm='|' dsd truncover;
      length custid 8 name org addr $ 25;
      input custid name org addr;
   cards;
   1|Mr. Robert Smith|Orion Star Corporation|8001 Weston Blvd.
   2||The Orion Star Corp.|8001 Westin Ave
   3|Bob Smith||8001 Weston Parkway
   4|Sandi Booth|Belleview Software|123 N Main Street
   5|Mrs. Sandra Booth|Belleview Inc.|801 Oak Ave.
   6|sandie smith Booth|Orion Star Corp.|123 Maine Street
   7|Bobby J. Smythe|ABC Plumbing|8001 Weston Pkwy
   ;
   run;

   /* Generate the cluster data.  Since more than one condition
      is defined, a variable named CLUSTER is created automatically */
   proc dqmatch data=customer
                out=customer_out;
      criteria condition=1 var=name sensitivity=85 matchdef='Name';
      criteria condition=1 var=addr sensitivity=70 matchdef='Address';

      criteria condition=2 var=org  sensitivity=85 matchdef='Organization';
      criteria condition=2 var=addr sensitivity=70 matchdef='Address';
   run;

   /* Print the result. */
   proc print data=customer_out noobs;
   run;

The output is as follows:

 custid   name                 org                      addr                  CLUSTER
      4   Sandi Booth          Belleview Software       123 N Main Street           1
      6   sandie smith Booth   Orion Star Corp.         123 Maine Street            1
      1   Mr. Robert Smith     Orion Star Corporation   8001 Weston Blvd.           2
      7   Bobby J. Smythe      ABC Plumbing             8001 Weston Pkwy            2
      3   Bob Smith                                     8001 Weston Parkway         2
      2                        The Orion Star Corp.     8001 Westin Ave             2
      5   Mrs. Sandra Booth    Belleview Inc.           801 Oak Ave.                .

In the preceding output, the two rows in cluster 1 matched on name and address.

The rows in cluster 2 matched on name and address as well as organization and address. The inclusion of Bobby J. Smythe in cluster 2 indicates either a data error or a need for further refinement of the criteria and conditions.
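
One way to explore the refinement mentioned above is to rerun the procedure with a stricter name comparison. The sketch below raises SENSITIVITY= for the Name match definition from 85 to 95 and writes the result to CUSTOMER_STRICT, a data set name chosen here only for illustration. Whether this setting separates Bobby J. Smythe from the other rows depends on the match definitions in the loaded locale and has not been verified, so treat the new value as a starting point for experimentation, not as a recommended setting.

```sas
/* Hypothetical refinement: only the name sensitivity differs from the
   example above (85 -> 95).  CUSTOMER_STRICT is an illustrative name. */
proc dqmatch data=customer
             out=customer_strict;
   criteria condition=1 var=name sensitivity=95 matchdef='Name';
   criteria condition=1 var=addr sensitivity=70 matchdef='Address';

   criteria condition=2 var=org  sensitivity=85 matchdef='Organization';
   criteria condition=2 var=addr sensitivity=70 matchdef='Address';
run;

/* Compare this listing with the original output to see which rows moved. */
proc print data=customer_strict noobs;
run;
```

Changing one criterion at a time makes it easier to attribute any change in cluster membership to that criterion.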

The last row in the output did not receive a cluster number because that row did not match any other rows.
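
For larger inputs, reading the printed listing row by row is impractical. As a minimal sketch that uses only Base SAS, the step below counts the rows in each cluster of CUSTOMER_OUT. The MISSING option makes PROC FREQ report unclustered rows as their own level.

```sas
/* Count rows per cluster; MISSING keeps rows with no cluster number. */
proc freq data=customer_out;
   tables cluster / missing;
run;
```

For the output shown above, the table should list two rows in cluster 1, four rows in cluster 2, and one row with a missing CLUSTER value.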

This example is available in the SAS Sample Library under the name DQMLTCND.
