Creating Subsets of Observations
Understanding the OUTPUT Statement
SAS enables you to create multiple SAS data sets in a single DATA step using an OUTPUT statement:
OUTPUT <SAS-data-set(s)>;
When you use an OUTPUT statement without specifying a data set name, SAS writes the current observation to all data sets named in the DATA statement. If you want to write observations to a selected data set, then you specify that data set name directly in the OUTPUT statement. Any data set name appearing in the OUTPUT statement must also appear in the DATA statement.
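The two rules above can be seen side by side in a small sketch. It assumes a hypothetical WORK.ARTS data set rebuilt from the rows shown in this section's output listings; the variable name LandCost and the column positions are assumptions. ALLTOURS and LUCASONLY are also hypothetical names. The unnamed OUTPUT statement sends Lucas's tours to both data sets, and the named OUTPUT statement sends every other tour to ALLTOURS only.

```sas
/* Hypothetical stand-in for MYLIB.ARTS, rebuilt from the rows in the  */
/* output listings; LandCost and the column positions are assumptions. */
data arts;
   infile datalines truncover;
   input City $ 1-9 Nights 11-12 LandCost 14-17 Budget $ 19-24 TourGuide $ 26-32;
   datalines;
Rome       3  750 Medium D'Amico
Paris      8 1680 High   Lucas
London     6 1230 High   Wilson
New York   6             Lucas
Madrid     3  370 Low    Torres
Amsterdam  4  580 Low
;

/* OUTPUT with no name writes to every data set in the DATA statement; */
/* OUTPUT with a name writes only to that data set.                    */
data alltours lucasonly;
   set arts;
   if TourGuide = 'Lucas' then output;   /* Lucas's tours: ALLTOURS and LUCASONLY */
   else output alltours;                 /* all other tours: ALLTOURS only        */
run;

proc print data=lucasonly;
   title "Tours Written by an Unnamed OUTPUT Statement";
run;
```

After this step, ALLTOURS holds all six tours and LUCASONLY holds only Lucas's two tours.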
Example for Conditionally Writing Observations to Multiple Data Sets
The following example creates two SAS data sets from MYLIB.ARTS. One contains tours that are guided by the tour guide Lucas, and the other contains tours led by other guides. Writing to both data sets is accomplished by doing the following:
naming both data sets in the DATA statement
using an OUTPUT statement in the THEN and ELSE clauses to output the observations to the appropriate data sets
```sas
options pagesize=60 linesize=80 pageno=1 nodate;

data lucastour othertours;
   set mylib.arts;
   if TourGuide = 'Lucas' then output lucastour;
   else output othertours;
run;

proc print data=lucastour;
   title "Data Set with TourGuide = 'Lucas'";
run;

proc print data=othertours;
   title "Data Set with Other Guides";
run;
```
The following output displays the results:
Creating Two Data Sets with One DATA Step
```
Data Set with TourGuide = 'Lucas'                        1

                           Land            Tour
Obs    City       Nights   Cost   Budget   Guide

  1    Paris           8   1680   High     Lucas
  2    New York        6      .            Lucas

Data Set with Other Guides                               2

                           Land            Tour
Obs    City       Nights   Cost   Budget   Guide

  1    Rome            3    750   Medium   D'Amico
  2    London          6   1230   High     Wilson
  3    Madrid          3    370   Low      Torres
  4    Amsterdam       4    580   Low
```
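IF-THEN/ELSE is not the only way to route observations. As an alternative, which is a different technique from the one this example uses, a SELECT group can send each observation to one of several data sets. The sketch below assumes a hypothetical WORK.ARTS data set rebuilt from the rows in the listings above; LandCost and the column positions are assumptions. It also uses the hypothetical data set names HIGHTOURS, MEDIUMTOURS, LOWTOURS, and NOBUDGET.

```sas
/* Hypothetical stand-in for MYLIB.ARTS, rebuilt from the listed rows. */
data arts;
   infile datalines truncover;
   input City $ 1-9 Nights 11-12 LandCost 14-17 Budget $ 19-24 TourGuide $ 26-32;
   datalines;
Rome       3  750 Medium D'Amico
Paris      8 1680 High   Lucas
London     6 1230 High   Wilson
New York   6             Lucas
Madrid     3  370 Low    Torres
Amsterdam  4  580 Low
;

/* Each tour goes to exactly one data set, chosen by the value of Budget. */
data hightours mediumtours lowtours nobudget;
   set arts;
   select (Budget);
      when ('High')   output hightours;
      when ('Medium') output mediumtours;
      when ('Low')    output lowtours;
      otherwise       output nobudget;   /* blank Budget, such as New York */
   end;
run;
```

The OTHERWISE statement is worth keeping. If no WHEN condition is true and there is no OTHERWISE statement, SAS stops the DATA step with an error.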
A Common Mistake When Writing to Multiple Data Sets
If a DATA step contains an OUTPUT statement, then SAS no longer writes observations automatically at the end of each iteration. Therefore, if you plan to use any OUTPUT statements in a DATA step, then you must program all output for that step with OUTPUT statements. For example, the previous DATA step sent output explicitly to both LUCASTOUR and OTHERTOURS. For comparison, the following program shows what happens if you omit the ELSE statement from the DATA step:
```sas
options pagesize=60 linesize=80 pageno=1 nodate;

data lucastour2 othertour2;
   set mylib.arts;
   if TourGuide = 'Lucas' then output lucastour2;
run;

proc print data=lucastour2;
   title "Data Set with Guide = 'Lucas'";
run;

proc print data=othertour2;
   title "Data Set with Other Guides";
run;
```
The following output displays the results:
Failing to Direct Output to a Second Data Set
```
Data Set with Guide = 'Lucas'                            1

                           Land            Tour
Obs    City       Nights   Cost   Budget   Guide

  1    Paris           8   1680   High     Lucas
  2    New York        6      .            Lucas
```
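No observations are written to OTHERTOUR2 because no OUTPUT statement directs output to it. SAS still creates OTHERTOUR2 because it is named in the DATA statement, but the data set contains no observations, so the second PROC PRINT step produces no report.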
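The rule that an OUTPUT statement turns off automatic output applies even when the OUTPUT statement never executes. The following sketch compares two DATA steps to illustrate this. WORK.ARTS, which keeps only City and Nights from the rows in the listings, is a hypothetical stand-in for MYLIB.ARTS. IMPLICIT and SUPPRESSED are also hypothetical names.

```sas
/* Hypothetical stand-in for MYLIB.ARTS (City and Nights only). */
data arts;
   infile datalines truncover;
   input City $ 1-9 Nights 11-12;
   datalines;
Rome       3
Paris      8
London     6
New York   6
Madrid     3
Amsterdam  4
;

/* No OUTPUT statement: SAS writes every observation automatically. */
data implicit;
   set arts;
   Days = Nights + 1;
run;

/* An OUTPUT statement is present, but its condition is never true, */
/* so SAS writes no observations at all.                            */
data suppressed;
   set arts;
   Days = Nights + 1;
   if Nights > 100 then output;
run;
```

IMPLICIT receives all six observations, while SUPPRESSED is created with zero observations.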
Understanding Why the Placement of the OUTPUT Statement Is Important
By default, SAS writes an observation to the output data set at the end of each iteration of the DATA step. When you use an OUTPUT statement, you override this automatic output, and SAS writes an observation only when an OUTPUT statement executes. Therefore, where you place the OUTPUT statement is very important. For example, if a variable value is calculated after the OUTPUT statement executes, then that value is not available when the observation is written to the output data set.
For example, in the following DATA step, an assignment statement is placed after the IF-THEN/ELSE group:
```sas
/* first attempt to combine assignment and OUTPUT statements */
options pagesize=60 linesize=80 pageno=1 nodate;

data lucasdays otherdays;
   set mylib.arts;
   if TourGuide = 'Lucas' then output lucasdays;
   else output otherdays;
   Days = Nights + 1;
run;

proc print data=lucasdays;
   title "Number of Days in Lucas's Tours";
run;

proc print data=otherdays;
   title "Number of Days in Other Guides' Tours";
run;
```
Unintended Results: Outputting Observations before Assigning Values
```
Number of Days in Lucas's Tours                          1

                           Land            Tour
Obs    City       Nights   Cost   Budget   Guide    Days

  1    Paris           8   1680   High     Lucas       .
  2    New York        6      .            Lucas       .

Number of Days in Other Guides' Tours                    2

                           Land            Tour
Obs    City       Nights   Cost   Budget   Guide    Days

  1    Rome            3    750   Medium   D'Amico     .
  2    London          6   1230   High     Wilson      .
  3    Madrid          3    370   Low      Torres      .
  4    Amsterdam       4    580   Low                  .
```
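The value of DAYS is missing in all observations because the OUTPUT statement writes each observation to the SAS data sets before the assignment statement is processed. DAYS still appears as a variable because SAS adds it to the program data vector when the DATA step is compiled, but its value is still missing at the moment each observation is written. If you want the value of DAYS to appear in the data sets, then place the assignment statement before the OUTPUT statement. The following program shows the correct position: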
```sas
/* correct position of assignment statement */
options pagesize=60 linesize=80 pageno=1 nodate;

data lucasdays2 otherdays2;
   set mylib.arts;
   Days = Nights + 1;
   if TourGuide = 'Lucas' then output lucasdays2;
   else output otherdays2;
run;

proc print data=lucasdays2;
   title "Number of Days in Lucas's Tours";
run;

proc print data=otherdays2;
   title "Number of Days in Other Guides' Tours";
run;
```
Intended Results: Assigning Values before Outputting Observations
```
Number of Days in Lucas's Tours                          1

                           Land            Tour
Obs    City       Nights   Cost   Budget   Guide    Days

  1    Paris           8   1680   High     Lucas       9
  2    New York        6      .            Lucas       7

Number of Days in Other Guides' Tours                    2

                           Land            Tour
Obs    City       Nights   Cost   Budget   Guide    Days

  1    Rome            3    750   Medium   D'Amico     4
  2    London          6   1230   High     Wilson      7
  3    Madrid          3    370   Low      Torres      4
  4    Amsterdam       4    580   Low                  5
```
Writing an Observation Multiple Times to One or More Data Sets
After SAS processes an OUTPUT statement, the observation remains in the program data vector and you can continue programming with it. You can even output it again to the same SAS data set or to a different one. The following example creates two pairs of data sets, one pair based on the name of the tour guide and one pair based on the number of nights.
```sas
options pagesize=60 linesize=80 pageno=1 nodate;

data lucastour othertour weektour daytour;
   set mylib.arts;
   if TourGuide = 'Lucas' then output lucastour;
   else output othertour;
   if Nights >= 6 then output weektour;
   else output daytour;
run;

proc print data=lucastour;
   title "Lucas's Tours";
run;

proc print data=othertour;
   title "Other Guides' Tours";
run;

proc print data=weektour;
   title 'Tours Lasting a Week or More';
run;

proc print data=daytour;
   title 'Tours Lasting Less Than a Week';
run;
```
The following output displays the results:
Assigning Observations to More Than One Data Set
```
Lucas's Tours                                            1

                           Land            Tour
Obs    City       Nights   Cost   Budget   Guide

  1    Paris           8   1680   High     Lucas
  2    New York        6      .            Lucas

Other Guides' Tours                                      2

                           Land            Tour
Obs    City       Nights   Cost   Budget   Guide

  1    Rome            3    750   Medium   D'Amico
  2    London          6   1230   High     Wilson
  3    Madrid          3    370   Low      Torres
  4    Amsterdam       4    580   Low

Tours Lasting a Week or More                             3

                           Land            Tour
Obs    City       Nights   Cost   Budget   Guide

  1    Paris           8   1680   High     Lucas
  2    London          6   1230   High     Wilson
  3    New York        6      .            Lucas

Tours Lasting Less Than a Week                           4

                           Land            Tour
Obs    City       Nights   Cost   Budget   Guide

  1    Rome            3    750   Medium   D'Amico
  2    Madrid          3    370   Low      Torres
  3    Amsterdam       4    580   Low
```
The first IF-THEN/ELSE group outputs all observations to either data set LUCASTOUR or OTHERTOUR. The second IF-THEN/ELSE group outputs the same observations to a different pair of data sets, WEEKTOUR and DAYTOUR. This repetition is possible because each observation remains in the program data vector after the first OUTPUT statement is processed and can be output again.
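Because the observation stays in the program data vector, you can also write it to the same data set more than once. The following minimal sketch places an OUTPUT statement inside a DO loop to write one observation per night of each tour. It assumes a hypothetical WORK.ARTS stand-in for MYLIB.ARTS that keeps only City and Nights from the listed rows. NIGHTLY and NightNum are also hypothetical names.

```sas
/* Hypothetical stand-in for MYLIB.ARTS (City and Nights only). */
data arts;
   infile datalines truncover;
   input City $ 1-9 Nights 11-12;
   datalines;
Rome       3
Paris      8
London     6
New York   6
Madrid     3
Amsterdam  4
;

/* The same tour is written once per night; only NightNum changes. */
data nightly;
   set arts;
   do NightNum = 1 to Nights;
      output;
   end;
run;
```

Because the step contains an OUTPUT statement, SAS does not write an extra observation when the loop finishes. NIGHTLY therefore holds one observation per night, which is 30 observations for these six tours (3 + 8 + 6 + 6 + 3 + 4).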
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.