Sample 25448: Demonstrates SQL JOIN performance with macros
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
This sample is from the SAS Sample Library. For additional information refer to SAS Help and Online Documentation.
/****************************************************************/
/* S A S S A M P L E L I B R A R Y */
/* */
/* NAME: SQLJMAC */
/* TITLE: Demonstrates SQL Join Performance with Macros */
/* PRODUCT: BASE */
/* SYSTEM: ALL */
/* KEYS: SQL DATMAN MACRO %UPCASE %SCAN %EVAL SELECT WHERE */
/* ALIAS */
/* PROCS: SQL */
/* DATA: */
/* SUPPORT: UPDATE: */
/* REF: */
/* MISC: */
/* */
/****************************************************************/
/* Users have noticed that joining a small SAS dataset with a Large */
/* DBMS table using PROC SQL can take a very long time, even if the */
/* "joining variables" are indexed in the DBMS. */
/* The reson for this is that DBMS Software does not make the */
/* indexes visible to applications software thru the SQL interface, */
/* so the PROC SQL query optimiser has no choice but to make a */
/* complete pass over all the DBMS rows to see if they match with */
/* the rows in the smaller SAS dataset. (This problem does not */
/* arise when the larger dataset is a SAS dataset with indexes */
/* as SAS is able to perform direct access operations on only */
/* those rows that it needs from the larger table) */
/* This performace gap can be remedied by constructing a where */
/* clause that requests only the needed rows from the larger file. */
/* PROC SQL will send any parts of a where clause that relate */
/* to a single table only off to the engine that is responsible */
/* for that table -- this gives the DBMS a chance to put the index */
/* to work. */
/* If the number of unique keys is small relative to the size */
/* of the DBMS table, this approach may give you very good */
/* performance gains. */
/* This technique of calculating the keys required for one */
/* side of the join, and then restricting the rows retrieved */
/* from the other side is a relatively new optimisation made */
/* by DBMS Software. (DB2 2.3 is the first db2 release that */
/* implements this scheme). A future Version of PROC SQL may */
/* perform this kind of optimisation transparently. */
/* If the number of unique keys in the small dataset gets large */
/* it may prove inefficient to create macro variables for all the */
/* values. The datastep in the prejoin macro could be modified */
/* to write the needed where clause to a temporary file. */
/* The drawback to this approach is that the other parts of the */
/* query must be written out to temporary files too, and then */
/* all the parts included together like this: */
/* %inc(pre where post); */
/* If you know something about the range of keys in the smaller */
/* file, it may be more efficient to construct a BETWEEN x AND y */
/* to restrict the rows accessed from the larger table. This */
/* generates a smaller where clause and requires fewer macro variables, */
/* but is not as general in its application... it would be useful if */
/* (for example) you knew that the smaller dataset contained dates */
/* from just one month in a history file. */
/* The prejoin macro constructs the list of unique key values */
/* that are needed to build a where clause to apply to the */
/* larger dataset. The pfx argument allows you to choose the */
/* "root" of the macro names created by the macro... */
/* Assuming the default prefix "TJ" */
/* TJKN is the number of key variables. */
/* TJNN is the number of unique ocurrences of the key. */
/* TJK1 .. TJKn are the key variable names in the smaller table. */
/* TJT1 .. TJTn are the datatypes of those key variables. */
/* TJ1X1 .. TJnXm are the values of the keys. */
%macro prejoin( ds= /* the smaller dataset name */
, key= /* the names of the join variables */
, keytype= /* the types of the keys (C or N) */
, pfx=TJ /* work datasets and macros prefix */
);
/* Load up the individual keynames into their own macro */
/* variables. */
%local i j var;
%let i = 1;
%let var = %scan(&key, &i);
%do %while ( &var ne );
%global &pfx.K&i;
%let &pfx.K&i = &var;
%global &pfx.T&i;
%let &pfx.T&i = %upcase(%scan(&keytype, &i));
%let i = %eval(&i+1);
%let var = %scan(&key, &i);
%end;
%global &pfx.KN &pfx.NN;
%let &pfx.KN = %eval(&i - 1);
/* Create a table with the unique values of the key from */
/* the smaller dataset. These values will be used to build */
/* a where clause to apply to the larger dataset. */
proc sql;
create table work.&pfx as
select distinct
%do i = 1 %to &&&pfx.KN;
%if (&i > 1) %then %str(,);
&&&pfx.K&i
%end;
from &ds;
%global &pfx.NN;
%let &pfx.NN = &sqlobs;
%put NOTE: There are &sqlobs unique keys in the smaller table.;
quit;
%do i = 1 %to &&&pfx.NN;
%do j = 1 %to &&&pfx.KN;
%global &pfx.&j.X&i;
%end;
%end;
/* Place these unique values into macro variables, so that */
/* we can use them later to build a where clause to apply to */
/* the larger table. */
/* The quote() function is new with sas6.07, and takes care */
/* of the messy details of character strings with embedded */
/* quotes.. If you know your keys dont have quotes, then */
/* you could omit the quote() call. */
data _null_;
set work.&pfx;
length cn $6 mvar $8;
cn = 'X' || left(put(_n_, 5.));
%do i = 1 %to &&&pfx.KN;
mvar = "&pfx.&i" || cn;
* put mvar= &&&pfx.K&i=;
%if ( N = &&&pfx.T&i ) %then %do;
call symput( mvar, put(&&&pfx.K&i, best12.) );
%end;
%else %do;
call symput( mvar, quote(&&&pfx.K&i) );
* call symput( mvar, &&&pfx.K&i ); /* if no quotes */
%end;
%end;
run;
%mend;
/* The keyjoin macro constructs the where clause from the */
/* values saved away by the prejoin macro. */
%macro keyjoin( alias= /* the alias of the big dataset */
, key= /* the names of the join variables */
, pfx=TJ /* work datasets and macros prefix */
);
%local i j;
/* If they supply a key value, then the "names" of the keys */
/* are not the same in the BIG dataset. So remember these */
/* new key names. (They better be in the same order with */
/* repect to the order of the names given to prejoin.) */
%if ( %str(X&key) ne X ) %then %do;
%let i = 1;
%let var = %scan(&key, &i);
%do %while ( &var ne );
%local &pfx.K&i;
%let &pfx.K&i = &var;
%let i = %eval(&i+1);
%let var = %scan(&key, &i);
%end;
%end;
/* Go emit the key variables and their values that were */
/* found in the "smaller" dataset. */
(
%do i = 1 %to &&&pfx.NN;
%if (&i > 1) %then %str(OR);
(
%do j = 1 %to &&&pfx.KN;
%if (&j > 1) %then %str(AND);
&alias..&&&pfx.K&j = &&&pfx.&j.X&i
%end;
)
%end;
)
%mend;
/* Lets make some data to play with. */
/* Doing this exercise when joining SAS datasets helps, */
/* as it sometimes makes the size of the data we must */
/* sort smaller. But in 607 we often do joins without */
/* any sorting, so you should test it out on YOUR data. */
/* Doing this exercise when joining SAS datasets with */
/* datasets that come form an external source via */
/* SAS/ACCESS software helps, as we do not have to */
/* transfer as many records for the DBMS into SAS to */
/* begin with, let alone sort them all. */
data big;
length k $20;
do i = 1 to 100;
do j = 1 to 5;
k = trim(put(i, words13.)) || '*' || put(j, words6.);
output;
end;
end;
run;
data small;
length kk $20;
do i = 7 to 270 by 4;
j = 3;
kk = trim(put(i, words13.)) || '*' || put(j, words6.);
output;
end;
run;
/* This example joins the variables on one numeric key -- I */
/* The %prejoin step computes the unique key values of I */
/* in the "small" dataset. */
/* Then the %keyjoin macro emits a where expression that */
/* selects only those values computed by %prejoin */
/* from the big table. */
%prejoin(ds=small, key=i, keytype=n);
proc sql feedback;
select *
from big,small
where big.i=small.i
and big.j=2
and %keyjoin(alias=big);
/* This example joins the variables on two keys -- I and J */
%prejoin(ds=small, key=i j, keytype=n n);
proc sql feedback;
select *
from big,small
where big.i=small.i and big.j=small.j
and %keyjoin(alias=big)
;
/* Join on one character key. */
/* Just to make it interesting, its called K in one dataset */
/* (the bigger one) and "KK" in the smaller one. */
%prejoin(ds=small, key=kk, keytype=c);
proc sql feedback;
select *
from big,small
where big.k=small.kk
and %keyjoin(alias=big,key=k)
;
quit;
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
k i j kk i j
----------------------------------------------------------------------------------
seven*two 7 2 seven*three 7 3
eleven*two 11 2 eleven*three 11 3
fifteen*two 15 2 fifteen*three 15 3
nineteen*two 19 2 nineteen*three 19 3
twenty-three*two 23 2 twenty-three*three 23 3
twenty-seven*two 27 2 twenty-seven*three 27 3
thirty-one*two 31 2 thirty-one*three 31 3
thirty-five*two 35 2 thirty-five*three 35 3
thirty-nine*two 39 2 thirty-nine*three 39 3
forty-three*two 43 2 forty-three*three 43 3
forty-seven*two 47 2 forty-seven*three 47 3
fifty-one*two 51 2 fifty-one*three 51 3
fifty-five*two 55 2 fifty-five*three 55 3
fifty-nine*two 59 2 fifty-nine*three 59 3
sixty-three*two 63 2 sixty-three*three 63 3
sixty-seven*two 67 2 sixty-seven*three 67 3
seventy-one*two 71 2 seventy-one*three 71 3
seventy-five*two 75 2 seventy-five*three 75 3
seventy-nine*two 79 2 seventy-nine*three 79 3
eighty-three*two 83 2 eighty-three*three 83 3
eighty-seven*two 87 2 eighty-seven*three 87 3
ninety-one*two 91 2 ninety-one*three 91 3
ninety-five*two 95 2 ninety-five*three 95 3
ninety-nine*two 99 2 ninety-nine*three 99 3
k i j kk i j
----------------------------------------------------------------------------------
seven*three 7 3 seven*three 7 3
eleven*three 11 3 eleven*three 11 3
fifteen*three 15 3 fifteen*three 15 3
nineteen*three 19 3 nineteen*three 19 3
twenty-three*three 23 3 twenty-three*three 23 3
twenty-seven*three 27 3 twenty-seven*three 27 3
thirty-one*three 31 3 thirty-one*three 31 3
thirty-five*three 35 3 thirty-five*three 35 3
thirty-nine*three 39 3 thirty-nine*three 39 3
forty-three*three 43 3 forty-three*three 43 3
forty-seven*three 47 3 forty-seven*three 47 3
fifty-one*three 51 3 fifty-one*three 51 3
fifty-five*three 55 3 fifty-five*three 55 3
fifty-nine*three 59 3 fifty-nine*three 59 3
sixty-three*three 63 3 sixty-three*three 63 3
sixty-seven*three 67 3 sixty-seven*three 67 3
seventy-one*three 71 3 seventy-one*three 71 3
seventy-five*three 75 3 seventy-five*three 75 3
seventy-nine*three 79 3 seventy-nine*three 79 3
eighty-three*three 83 3 eighty-three*three 83 3
eighty-seven*three 87 3 eighty-seven*three 87 3
ninety-one*three 91 3 ninety-one*three 91 3
ninety-five*three 95 3 ninety-five*three 95 3
ninety-nine*three 99 3 ninety-nine*three 99 3
k i j kk i j
----------------------------------------------------------------------------------
seven*three 7 3 seven*three 7 3
eleven*three 11 3 eleven*three 11 3
fifteen*three 15 3 fifteen*three 15 3
nineteen*three 19 3 nineteen*three 19 3
twenty-three*three 23 3 twenty-three*three 23 3
twenty-seven*three 27 3 twenty-seven*three 27 3
thirty-one*three 31 3 thirty-one*three 31 3
thirty-five*three 35 3 thirty-five*three 35 3
thirty-nine*three 39 3 thirty-nine*three 39 3
forty-three*three 43 3 forty-three*three 43 3
forty-seven*three 47 3 forty-seven*three 47 3
fifty-one*three 51 3 fifty-one*three 51 3
fifty-five*three 55 3 fifty-five*three 55 3
fifty-nine*three 59 3 fifty-nine*three 59 3
sixty-three*three 63 3 sixty-three*three 63 3
sixty-seven*three 67 3 sixty-seven*three 67 3
seventy-one*three 71 3 seventy-one*three 71 3
seventy-five*three 75 3 seventy-five*three 75 3
seventy-nine*three 79 3 seventy-nine*three 79 3
eighty-three*three 83 3 eighty-three*three 83 3
eighty-seven*three 87 3 eighty-seven*three 87 3
ninety-one*three 91 3 ninety-one*three 91 3
ninety-five*three 95 3 ninety-five*three 95 3
ninety-nine*three 99 3 ninety-nine*three 99 3
This example demonstrates SQL JOIN performance with macros.
| Type: | Sample |
| Topic: | SAS Reference ==> Macro SAS Reference ==> Procedures ==> SQL
|
| Date Modified: | 2005-08-24 16:06:34 |
| Date Created: | 2005-05-23 13:54:28 |
Operating System and Release Information
| SAS System | Base SAS | All | n/a | n/a |