Practical Problem-Solving with PROC SQL |
Problem |
You want to create a separate data set for each unique value of a column.
Background Information |
The SQL.FEATURES data set contains information on various geographical features around the world.
FEATURES Name Type Location Area Height Depth Length ------------------------------------------------------------------------------------ Aconcagua Mountain Argentina . 22834 . . Amazon River South America . . . 4000 Amur River Asia . . . 2700 Andaman Sea 218100 . 3667 . Angel Falls Waterfall Venezuela . 3212 . . Annapurna Mountain Nepal . 26504 . . Aral Sea Lake Asia 25300 . 222 . Ararat Mountain Turkey . 16804 . . Arctic Ocean 5105700 . 17880 . Atlantic Ocean 33420000 . 28374 .
Solution |
To create a separate data set for each type of feature, you could go through the data set manually to determine all the unique values of Type, and then write a separate DATA step for each type (or a single DATA step with multiple OUTPUT statements). This approach is labor-intensive, error-prone, and impractical for large data sets. The following PROC SQL code counts the unique values of Type and puts each value in a separate macro variable. The SAS macro that follows the PROC SQL code uses these macro variables to create a SAS data set for each value. You do not need to know beforehand how many unique values there are or what the values are.
proc sql noprint; select count(distinct type) into :n from sql.features; select distinct type into :type1 - :type%left(&n) from sql.features; quit; %macro makeds; %do i=1 %to &n; data &&type&i (drop=type); set sql.features; if type="&&type&i"; run; %end; %mend makeds; %makeds;
240 proc sql noprint; 241 select count(distinct type) 242 into :n 243 from sql.features; 244 select distinct type 245 into :type1 - :type%left(&n) 246 from sql.features; 247 quit; NOTE: PROCEDURE SQL used (Total process time): real time 0.04 seconds cpu time 0.03 seconds 248 249 %macro makeds; 250 %do i=1 %to &n; 251 data &&type&i (drop=type); 252 set sql.features; 253 if type="&&type&i"; 254 run; 255 %end; 256 %mend makeds; 257 %makeds; NOTE: There were 74 observations read from the data set SQL.FEATURES. NOTE: The data set WORK.DESERT has 7 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 1.14 seconds cpu time 0.41 seconds NOTE: There were 74 observations read from the data set SQL.FEATURES. NOTE: The data set WORK.ISLAND has 6 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 0.02 seconds cpu time 0.00 seconds NOTE: There were 74 observations read from the data set SQL.FEATURES. NOTE: The data set WORK.LAKE has 10 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds NOTE: There were 74 observations read from the data set SQL.FEATURES. NOTE: The data set WORK.MOUNTAIN has 18 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 0.02 seconds cpu time 0.01 seconds NOTE: There were 74 observations read from the data set SQL.FEATURES. NOTE: The data set WORK.OCEAN has 4 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds NOTE: There were 74 observations read from the data set SQL.FEATURES. NOTE: The data set WORK.RIVER has 12 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 0.02 seconds cpu time 0.02 seconds NOTE: There were 74 observations read from the data set SQL.FEATURES. NOTE: The data set WORK.SEA has 13 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 0.03 seconds cpu time 0.02 seconds NOTE: There were 74 observations read from the data set SQL.FEATURES. NOTE: The data set WORK.WATERFALL has 4 observations and 6 variables. NOTE: DATA statement used (Total process time): real time 0.02 seconds cpu time 0.02 seconds
How It Works |
This solution uses the INTO clause to store values in macro variables. The first SELECT statement counts the unique variables and stores the result in macro variable N. The second SELECT statement creates a range of macro variables, one for each unique value, and stores each unique value in one of the macro variables. Note the use of the %LEFT function, which trims leading blanks from the value of the N macro variable.
The MAKEDS macro uses all the macro variables that were created in the PROC SQL step. The macro uses a %DO loop to execute a DATA step for each unique value, writing rows that contain a given value of Type to a SAS data set of the same name. The Type variable is dropped from the output data sets.
For more information about SAS macros, see SAS Macro Language: Reference.
Copyright © 2008 by SAS Institute Inc., Cary, NC, USA. All rights reserved.