Sample 25446: Demonstrates relationship between Data Step and SQL
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
This sample is from the SAS Sample Library. For additional information refer to SAS Help and Online Documentation.
/****************************************************************/
/* S A S S A M P L E L I B R A R Y */
/* */
/* NAME: SQLFUN10 */
/* TITLE: Demonstrates Relationship Between Data Step and SQL */
/* PRODUCT: BASE */
/* SYSTEM: ALL */
/* KEYS: SQL DATMAN WHERE TRANSLATE LIKE CREATE TABLE */
/* PROCS: SQL */
/* DATA: */
/* SUPPORT: UPDATE: */
/* REF: */
/* MISC: This example was contributed by Howard Schreier */
/* of the US Dept. of Commerce, via BITNET */
/* */
/****************************************************************/
/* A User asks: */
/* I have two sas data sets, which need to be merged by social */
/* security number (SSN). The variable SSN in DATA 1 is clean */
/* and non-missing. But some SSN in DATA 2 have some digits */
/* missing, but the remaining ones ARE still in the RIGHT columns. */
/* For instance, the correct one in DATA 1 is 123456789. But the */
/* missing one in DATA 2 is 1 345 789. By the way, both variables */
/* are character variables. How can I still merge the two data sets */
/* based on the remaining figures, which are in the RIGHT columns? */
/* I think PROC SQL is the tool of choice here, for two */
/* reasons. First of all, it is necessary to compare each of */
/* the corrupted SSN values with all of the clean ones in the */
/* other data set; this is essentially a "join" operation, */
/* which is pretty much characteristic of SQL. Secondly, PROC */
/* SQL offers the LIKE operator (pattern matching *with* wild */
/* cards), which is exactly what is needed here. */
/* Here is a sample test. After the two data sets are created, */
/* a special data set containing *only* the corrupted values is */
/* created (in actual applications, this should enormously */
/* reduce the volume of comparisons to be made); the */
/* single-character wildcard, an underscore, is substituted for */
/* each blank. Note that this preparation could readily be */
/* done by a statement within PROC SQL. The LIKE condition */
/* serves as a screen, so that the resulting data set lists all */
/* possible matches found for each of the corrupted SSN values. */
data clean;
input ssn $ 1-9;
cards;
123456789
234567890
345678901
345678902
;
data dirty;
input ssn $ 1-9;
cards;
12 4567 9
456 89012 no match
34567890 ambiguous
234567890 clean
;
data withwild;
set dirty;
length withwild $ 9;
/* Substitute underscores for blanks */
withwild = translate(ssn,'_',' ');
/* Get rid of observations having all nine digits */
if index(withwild,'_');
run;
proc sql;
create table possible as
select withwild.ssn, clean.ssn as possible
from withwild, clean
where clean.ssn like withwild.withwild;
title2 'Possible Matches';
select * from possible;
quit;
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
Possible Matches
ssn possible
--------------------
12 4567 9 123456789
34567890 345678901
34567890 345678902
This example demonstrates relationship between Data Step and SQL.
Type: | Sample |
Topic: | SAS Reference ==> Procedures ==> SQL
|
Date Modified: | 2005-08-24 16:06:33 |
Date Created: | 2005-05-23 13:54:20 |
Operating System and Release Information
SAS System | Base SAS | All | n/a | n/a |