Creating Subsets of Observations

Conditionally Writing Observations to One or More SAS Data Sets


Understanding the OUTPUT Statement

SAS enables you to create multiple SAS data sets in a single DATA step using an OUTPUT statement:

OUTPUT <SAS-data-set(s)>;

When you use an OUTPUT statement without specifying a data set name, SAS writes the current observation to all data sets named in the DATA statement. If you want to write observations to a selected data set, then you specify that data set name directly in the OUTPUT statement. Any data set name appearing in the OUTPUT statement must also appear in the DATA statement.
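As a minimal sketch of the two forms, the following steps use a small inline data set named SAMPLETOURS, which is hypothetical and is not part of the example data used elsewhere in this section. The first DATA step uses OUTPUT with no data set name, so each observation goes to both data sets. The second DATA step names one data set in each OUTPUT statement.

```sas
/* SAMPLETOURS is a hypothetical data set created only for this sketch */
data sampletours;
   input City $ Nights;
   datalines;
Rome 3
Paris 8
London 6
;
run;

/* OUTPUT with no name writes each observation to every data set
   named in the DATA statement, so COPY1 and COPY2 each get all three */
data copy1 copy2;
   set sampletours;
   output;
run;

/* OUTPUT with a name writes only to that data set; both names
   also appear in the DATA statement, as the rule above requires */
data shorttours longtours;
   set sampletours;
   if Nights < 6 then output shorttours;
   else output longtours;
run;
```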


Example for Conditionally Writing Observations to Multiple Data Sets

This example creates two SAS data sets: one contains tours that are guided by the tour guide Lucas, and the other contains tours led by other guides. Writing to multiple data sets is accomplished by doing all of the following:

  1. naming both data sets in the DATA statement

  2. selecting the observations using an IF condition

  3. using an OUTPUT statement in the THEN and ELSE clauses to output the observations to the appropriate data sets

The following DATA step shows these steps:

options pagesize=60 linesize=80 pageno=1 nodate;
data lucastour othertours;
   set mylib.arts;
   if TourGuide = 'Lucas' then output lucastour;
   else output othertours;
run;

proc print data=lucastour;
   title "Data Set with TourGuide = 'Lucas'";
run;

proc print data=othertours;
   title "Data Set with Other Guides";
run;

The following output displays the results:

Creating Two Data Sets with One DATA Step

                       Data Set with TourGuide = 'Lucas'                       1

                                           Land              Tour
              Obs    City        Nights    Cost    Budget    Guide

               1     Paris          8      1680     High     Lucas
               2     New York       6         .              Lucas
                           Data Set with Other Guides                          2

                                          Land               Tour
            Obs    City         Nights    Cost    Budget     Guide

             1     Rome            3       750    Medium    D'Amico
             2     London          6      1230    High      Wilson 
             3     Madrid          3       370    Low       Torres 
             4     Amsterdam       4       580    Low              

A Common Mistake When Writing to Multiple Data Sets

If you use an OUTPUT statement, then you suppress the automatic output of observations at the end of the DATA step. Therefore, if you plan to use any OUTPUT statements in a DATA step, then you must program all output for that step with OUTPUT statements. For example, in the previous DATA step you sent output to both LUCASTOUR and OTHERTOURS. For comparison, the following program shows what would happen if you omit the ELSE statement in the DATA step:

options pagesize=60 linesize=80 pageno=1 nodate;
data lucastour2 othertour2;
   set mylib.arts;
   if TourGuide = 'Lucas' then output lucastour2;
run;

proc print data=lucastour2;
   title "Data Set with Guide = 'Lucas'";
run;

proc print data=othertour2;
   title "Data Set with Other Guides";
run;

The following output displays the results:

Failing to Direct Output to a Second Data Set

                         Data Set with Guide = 'Lucas'                         1

                                           Land              Tour
              Obs    City        Nights    Cost    Budget    Guide

               1     Paris          8      1680     High     Lucas
               2     New York       6         .              Lucas

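SAS still creates OTHERTOUR2 because it is named in the DATA statement, but no observations are written to it because no OUTPUT statement directs output to it. As a result, the second PROC PRINT step produces no listing.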
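The same suppression applies when the DATA statement names only one data set. The sketch below assumes a hypothetical inline data set named SAMPLETOURS rather than MYLIB.ARTS. Only the observations that reach the OUTPUT statement are written, because nothing is written automatically at the end of each iteration.

```sas
/* SAMPLETOURS is a hypothetical data set created only for this sketch */
data sampletours;
   input City $ Nights;
   datalines;
Rome 3
Paris 8
London 6
;
run;

/* Because the step contains an OUTPUT statement, automatic output is
   suppressed; LONGONLY receives only the Paris and London observations */
data longonly;
   set sampletours;
   if Nights >= 6 then output;
run;

proc print data=longonly;
   title 'Observations Written by the Conditional OUTPUT Statement';
run;
```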


Understanding Why the Placement of the OUTPUT Statement Is Important

By default, SAS writes an observation to the output data set at the end of each iteration of the DATA step. When you use an OUTPUT statement, you override the automatic output feature. Where you place the OUTPUT statement, therefore, is very important. For example, if a variable value is calculated after the OUTPUT statement executes, then that value is not available when the observation is written to the output data set.

In the following DATA step, an assignment statement is placed after the IF-THEN/ELSE group:

   /* first attempt to combine assignment and OUTPUT statements */
options pagesize=60 linesize=80 pageno=1 nodate;
data lucasdays otherdays;
   set mylib.arts;
   if TourGuide = 'Lucas' then output lucasdays;
   else output otherdays;
   Days = Nights+1;
run;

proc print data=lucasdays;
   title "Number of Days in Lucas's Tours";
run;

proc print data=otherdays;
   title "Number of Days in Other Guides' Tours";
run;

Unintended Results: Outputting Observations before Assigning Values

                        Number of Days in Lucas's Tours                        1

                                       Land              Tour
          Obs    City        Nights    Cost    Budget    Guide    Days

           1     Paris          8      1680     High     Lucas      . 
           2     New York       6         .              Lucas      . 
                     Number of Days in Other Guides' Tours                     2

                                      Land               Tour
        Obs    City         Nights    Cost    Budget     Guide     Days

         1     Rome            3       750    Medium    D'Amico      . 
         2     London          6      1230    High      Wilson       . 
         3     Madrid          3       370    Low       Torres       . 
         4     Amsterdam       4       580    Low                    . 

The value of DAYS is missing in all observations because the OUTPUT statement writes the observation to the SAS data sets before the assignment statement is processed. If you want the value of DAYS to appear in the data sets, then place the assignment statement before the OUTPUT statement. The following program shows the correct position:

   /* correct position of assignment statement */
options pagesize=60 linesize=80 pageno=1 nodate;
data lucasdays2 otherdays2;
   set mylib.arts;
   Days = Nights + 1;
   if TourGuide = 'Lucas' then output lucasdays2;
   else output otherdays2;
run;

proc print data=lucasdays2;
   title "Number of Days in Lucas's Tours";
run;

proc print data=otherdays2;
   title "Number of Days in Other Guides' Tours";
run;

Intended Results: Assigning Values before Outputting Observations

                        Number of Days in Lucas's Tours                        1

                                       Land              Tour
          Obs    City        Nights    Cost    Budget    Guide    Days

           1     Paris          8      1680     High     Lucas      9 
           2     New York       6         .              Lucas      7 
                     Number of Days in Other Guides' Tours                     2

                                      Land               Tour
        Obs    City         Nights    Cost    Budget     Guide     Days

         1     Rome            3       750    Medium    D'Amico      4 
         2     London          6      1230    High      Wilson       7 
         3     Madrid          3       370    Low       Torres       4 
         4     Amsterdam       4       580    Low                    5 

Writing an Observation Multiple Times to One or More Data Sets

After SAS processes an OUTPUT statement, the observation remains in the program data vector and you can continue programming with it. You can even output it again to the same SAS data set or to a different one. The following example creates two pairs of data sets, one pair based on the name of the tour guide and one pair based on the number of nights.

options pagesize=60 linesize=80 pageno=1 nodate;
data lucastour othertour weektour daytour;
   set mylib.arts;
   if TourGuide = 'Lucas' then output lucastour;
   else output othertour;
   if nights >= 6 then output weektour;
   else output daytour;
run;

proc print data=lucastour;
   title "Lucas's Tours";
run;

proc print data=othertour;
   title "Other Guides' Tours";
run;

proc print data=weektour;
   title 'Tours Lasting a Week or More';
run;

proc print data=daytour;
   title 'Tours Lasting Less Than a Week';
run;

The following output displays the results:

Assigning Observations to More Than One Data Set

                                 Lucas's Tours                                 1

                                           Land              Tour
              Obs    City        Nights    Cost    Budget    Guide

               1     Paris          8      1680     High     Lucas
               2     New York       6         .              Lucas
                              Other Guides' Tours                              2

                                          Land               Tour
            Obs    City         Nights    Cost    Budget     Guide

             1     Rome            3       750    Medium    D'Amico
             2     London          6      1230    High      Wilson 
             3     Madrid          3       370    Low       Torres 
             4     Amsterdam       4       580    Low              
                          Tours Lasting a Week or More                         3

                                          Land               Tour
             Obs    City        Nights    Cost    Budget    Guide

              1     Paris          8      1680     High     Lucas 
              2     London         6      1230     High     Wilson
              3     New York       6         .              Lucas 
                         Tours Lasting Less Than a Week                        4

                                          Land               Tour
            Obs    City         Nights    Cost    Budget     Guide

             1     Rome            3       750    Medium    D'Amico
             2     Madrid          3       370    Low       Torres 
             3     Amsterdam       4       580    Low              

The first IF-THEN/ELSE group outputs all observations to either data set LUCASTOUR or OTHERTOUR. The second IF-THEN/ELSE group outputs the same observations to a different pair of data sets, WEEKTOUR and DAYTOUR. This repetition is possible because each observation remains in the program data vector after the first OUTPUT statement is processed and can be output again.
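An observation can also be written more than once to the same data set, with values changed between the OUTPUT statements. The following sketch illustrates this under stated assumptions: the data set SAMPLETOURS and the variables MEASURE and VALUE are hypothetical names introduced only for illustration. Each input observation produces two output observations, so NIGHTSANDDAYS contains four observations.

```sas
/* SAMPLETOURS is a hypothetical data set created only for this sketch */
data sampletours;
   input City $ Nights;
   datalines;
Rome 3
Paris 8
;
run;

data nightsanddays;
   set sampletours;
   length Measure $ 6;      /* hypothetical label variable */
   Measure = 'Nights';
   Value = Nights;
   output;                  /* first copy of the observation */
   Measure = 'Days';
   Value = Nights + 1;      /* same calculation as DAYS earlier in this section */
   output;                  /* second copy, carrying the updated values */
run;

proc print data=nightsanddays;
   title 'Each Tour Written Twice to One Data Set';
run;
```

Because the observation stays in the program data vector after the first OUTPUT statement, the second OUTPUT statement writes the same City and Nights values together with the new values of Measure and Value.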
