Previous Page | Next Page

Practical Problem-Solving with PROC SQL

Creating and Using Macro Variables


Problem

You want to create a separate data set for each unique value of a column.


Background Information

The SQL.FEATURES data set contains information on various geographical features around the world.

FEATURES (Partial Output)

                                      FEATURES

Name             Type        Location             Area    Height     Depth    Length
------------------------------------------------------------------------------------
Aconcagua        Mountain    Argentina               .     22834         .         .
Amazon           River       South America           .         .         .      4000
Amur             River       Asia                    .         .         .      2700
Andaman          Sea                            218100         .      3667         .
Angel Falls      Waterfall   Venezuela               .      3212         .         .
Annapurna        Mountain    Nepal                   .     26504         .         .
Aral Sea         Lake        Asia                25300         .       222         .
Ararat           Mountain    Turkey                  .     16804         .         .
Arctic           Ocean                         5105700         .     17880         .
Atlantic         Ocean                        33420000         .     28374         .

Solution

To create a separate data set for each type of feature, you could go through the data set manually to determine all the unique values of Type, and then write a separate DATA step for each type (or a single DATA step with multiple OUTPUT statements). This approach is labor-intensive, error-prone, and impractical for large data sets. The following PROC SQL code counts the unique values of Type and puts each value in a separate macro variable. The SAS macro that follows the PROC SQL code uses these macro variables to create a SAS data set for each value. You do not need to know beforehand how many unique values there are or what the values are.

proc sql noprint;
   select count(distinct type)
      into :n
      from sql.features;
   select distinct type
      into :type1 - :type%left(&n)
      from sql.features;
quit;

%macro makeds;
   %do i=1 %to &n;
      data &&type&i (drop=type);
         set sql.features;
         if type="&&type&i";
      run;
   %end;
%mend makeds;
%makeds;

Log

240  proc sql noprint;
241     select count(distinct type)
242        into :n
243        from sql.features;
244     select distinct type
245        into :type1 - :type%left(&n)
246        from sql.features;
247  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.04 seconds
      cpu time            0.03 seconds
      
248  
249  %macro makeds;
250     %do i=1 %to &n;
251        data &&type&i (drop=type);
252           set sql.features;
253           if type="&&type&i";
254        run;
255     %end;
256  %mend makeds;
257  %makeds;
NOTE: There were 74 observations read from the data set SQL.FEATURES.
NOTE: The data set WORK.DESERT has 7 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           1.14 seconds
      cpu time            0.41 seconds
      
NOTE: There were 74 observations read from the data set SQL.FEATURES.
NOTE: The data set WORK.ISLAND has 6 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.00 seconds
      
NOTE: There were 74 observations read from the data set SQL.FEATURES.
NOTE: The data set WORK.LAKE has 10 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
      
NOTE: There were 74 observations read from the data set SQL.FEATURES.
NOTE: The data set WORK.MOUNTAIN has 18 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds
      
NOTE: There were 74 observations read from the data set SQL.FEATURES.
NOTE: The data set WORK.OCEAN has 4 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
      
NOTE: There were 74 observations read from the data set SQL.FEATURES.
NOTE: The data set WORK.RIVER has 12 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.02 seconds
      
NOTE: There were 74 observations read from the data set SQL.FEATURES.
NOTE: The data set WORK.SEA has 13 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.02 seconds
      
NOTE: There were 74 observations read from the data set SQL.FEATURES.
NOTE: The data set WORK.WATERFALL has 4 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.02 seconds

How It Works

This solution uses the INTO clause to store values in macro variables. The first SELECT statement counts the unique variables and stores the result in macro variable N. The second SELECT statement creates a range of macro variables, one for each unique value, and stores each unique value in one of the macro variables. Note the use of the %LEFT function, which trims leading blanks from the value of the N macro variable.

The MAKEDS macro uses all the macro variables that were created in the PROC SQL step. The macro uses a %DO loop to execute a DATA step for each unique value, writing rows that contain a given value of Type to a SAS data set of the same name. The Type variable is dropped from the output data sets.

For more information about SAS macros, see SAS Macro Language: Reference.

Previous Page | Next Page | Top of Page