Previous Page | Next Page

Using More Than One Observation in a Calculation

Using a Value in a Later Observation

A further requirement of Tradewinds Travel is a separate SAS data set that contains the tour that generated the most revenue. (The revenue total equals the price of the tour multiplied by the number of schedulings.) One method of creating the new data set might be to follow these three steps:

  1. Calculate the revenue in a DATA step.

  2. Sort the data set in descending order by the revenue.

  3. Use another DATA step with the OBS= data set option to write that observation.

A more efficient method compares the revenue from all observations in a single DATA step. SAS can retain a value from the current observation to use in future observations. When the processing of the DATA step reaches the next observation, the held value represents information from the previous observation.

The RETAIN statement causes a variable that is created in the DATA step to retain its value from the current observation into the next observation rather than being set to missing at the beginning of each iteration of the DATA step. It is a declarative statement, not an executable statement. This statement has the following form:

RETAIN variable-1 < . . . variable-n>;

To compare the Revenue value in one observation to the Revenue value in the next observation, create a retained variable named HoldRevenue and assign the value of the current Revenue variable to it. In the next observation, the HoldRevenue variable contains the Revenue value from the previous observation, and its value can be compared to that of Revenue in the current observation.

To see how the RETAIN statement works, look at the next example. The following DATA step outputs observations to data set TEMP before SAS assigns the current revenue to HoldRevenue:

options pagesize=60 linesize=80 pageno=1 nodate;
data temp;
   set mylib.tourrevenue;
   retain HoldRevenue;
   Revenue = LandCost * NumberOfBookings;
   output;
   HoldRevenue = Revenue;
run;

proc print data=temp;
   var Country LandCost NumberOfBookings Revenue HoldRevenue;
   title 'Tour Revenue';
run;

The following output displays the results:

Retaining a Value By Using the Retain Statement

                                  Tour Revenue                                 1

                                         Number
                                Land       Of                    Hold
          Obs    Country        Cost    Bookings    Revenue    Revenue

            1    France          575       10         5750          . 
            2    Spain           510       12         6120       5750 
            3    Brazil          540        6         3240       6120 
            4    India           489        .            .       3240 
            5    Japan           720       10         7200          . 
            6    Greece          698       20        13960       7200 
            7    New Zealand    1489        6         8934      13960 
            8    Venezuela       425        8         3400       8934 
            9    Italy           468        9         4212       3400 
           10    USSR            924        6         5544       4212 
           11    Switzerland     734       20        14680       5544 
           12    Australia      1079       10        10790      14680 
           13    Ireland         558        9         5022      10790 

The value of HoldRevenue is missing at the beginning of the first observation; it is still missing when the OUTPUT statement writes the first observation to TEMP. After the OUTPUT statement, an assignment statement assigns the value of Revenue to HoldRevenue. Because HoldRevenue is retained, that value is present at the beginning of the next iteration of the DATA step. When the OUTPUT statement executes again, the value of HoldRevenue still contains that value.

To find the largest value of Revenue, assign the value of Revenue to HoldRevenue only when Revenue is larger than HoldRevenue, as shown in the following program:

options pagesize=60 linesize=80 pageno=1 nodate;
data mostrevenue;
   set mylib.tourrevenue;
   retain HoldRevenue;
   Revenue = LandCost * NumberOfBookings;
   if Revenue > HoldRevenue then HoldRevenue = Revenue;
run;

proc print data=mostrevenue;
   var Country LandCost NumberOfBookings Revenue HoldRevenue;
   title 'Tour Revenue';
run;

The following output displays the results:

Holding the Largest Value in a Retained Variable

                                  Tour Revenue                                 1

                                         Number
                                Land       Of                    Hold
          Obs    Country        Cost    Bookings    Revenue    Revenue

            1    France          575       10         5750       5750 
            2    Spain           510       12         6120       6120 
            3    Brazil          540        6         3240       6120 
            4    India           489        .            .       6120 
            5    Japan           720       10         7200       7200 
            6    Greece          698       20        13960      13960 
            7    New Zealand    1489        6         8934      13960 
            8    Venezuela       425        8         3400      13960 
            9    Italy           468        9         4212      13960 
           10    USSR            924        6         5544      13960 
           11    Switzerland     734       20        14680      14680 
           12    Australia      1079       10        10790      14680 
           13    Ireland         558        9         5022      14680 

The value of HoldRevenue in the last observation represents the largest revenue that is generated by any tour. To determine which observation the value came from, create a variable named HoldCountry to hold the name of the country from the observations with the largest revenue. Include HoldCountry in the RETAIN statement to retain its value until explicitly changed. Then use the END= data set option to select the last observation, and use the KEEP= data set option to keep only HoldRevenue and HoldCountry in MOSTREVENUE.

options pagesize=60 linesize=80 pageno=1 nodate;
data mostrevenue (keep=HoldCountry HoldRevenue);
   set mylib.tourrevenue  end=LastOne;
   retain HoldRevenue HoldCountry;
   Revenue = LandCost * NumberOfBookings;
   if Revenue > HoldRevenue then
      do;
         HoldRevenue = Revenue;
         HoldCountry = Country;
      end;
   if LastOne;
run;
proc print data=mostrevenue;
   title 'Country with the Largest Value of Revenue';
run;

Note:   The program uses a DO group. Using DO groups enables the program to evaluate a condition once and take more than one action as a result. For more information on DO groups, see Performing More Than One Action in an IF-THEN Statement.  [cautionend]

The following output displays the results:

Selecting a New Data Set Using RETAIN and Subsetting IF Statements

                   Country with the Largest Value of Revenue                   1

                                  Hold
                         Obs    Revenue    HoldCountry

                          1      14680     Switzerland

Previous Page | Next Page | Top of Page