SAS(R) Data Quality Server 9.2: Reference

The DQSCHEME Procedure

Example 1: Creating an Analysis Data Set


This example generates an analysis of the STATE variable in the VENDORS data set.

Note:   You do not have to create a scheme to generate the analysis data set.

Note:   The locale ENUSA is assumed to have been loaded into memory as part of the locale list.
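
If the ENUSA locale has not already been loaded in your session, one way to load it is the %DQLOAD autocall macro. The following is a minimal sketch, not part of the sample program: the setup file path is a hypothetical placeholder and must be replaced with the location of the setup file at your site.

```sas
 /* Load the ENUSA locale into memory. */
 /* The DQSETUPLOC= path is a placeholder (assumption); */
 /* substitute the setup file location for your installation. */
%dqload(dqlocale=(ENUSA),
        dqsetuploc='/path/to/dqsetup.txt');
```

Sites that set the DQLOCALE= and DQSETUPLOC= system options at startup might not need this step.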

 /* Create the input data set. */
data vendors;
   input city $char16. state $char22. company $char34.;
datalines;
Detroit         MI                    Ford Motor
Dallas          Texas                 Wal-mart Inc.
Washington      District of Columbia  Federal Reserve Bank
SanJose         CA                    Wal mart
New York        New York              Ernst & Young
Virginia Bch    VA                    TRW INC - Space Defense
Dallas          TX                    Walmart Corp.
San Francisco   California            The Jackson Data Corp.
New York        NY                    Ernst & Young
Washington      DC                    Federal Reserve Bank 12th District
New York        N.Y.                  Ernst & Young
San Francisco   CA                    Jackson Data Corporation
Atlanta         GA                    Farmers Insurance Group
RTP             NC                    Kaiser Permanente
New York        NY                    Ernest and Young
Virginia Beach  VIRGINIA              TRW Space & Defense
Detroit         Michigan              Ford Motor Company
San Jose        CA                    Jackson Data Corp
Washington      District of Columbia  Federal Reserve Bank
Atlanta         GEORGIA               Target
;
run;

 /* Create the analysis data set. */
proc dqscheme data=vendors;
   create analysis=a_state
          matchdef='State (Scheme Build)'
          var=state
          locale='ENUSA';
run;

 /* Print the analysis data set. */
title 'Analysis of state name variations';
proc print data=a_state;
run;

For each unique value of the STATE variable, the analysis data set WORK.A_STATE shows the number of occurrences and the associated cluster number. Values that are not clustered with any other values have a blank cluster number.

Note:   This example is available in the SAS Sample Library under the name DQANALYZ.