Using More Than One Observation in a Calculation |
Creating a Running Total |
The first task in performing calculations on the data set MYLIB.TOURREVENUE is to find out the total number of people who scheduled tours with Tradewinds Travel. Therefore, a variable is needed whose value starts at 0 and increases by the number of schedulings in each observation. The sum statement gives you that capability:
variable + expression |
In a sum statement, the value of the variable on the left side of the plus sign is 0 before the statement is processed for the first time. Processing the statement adds the value of the expression on the right side of the plus sign to the initial value; the sum variable retains the new value until the next processing of the statement. The sum statement ignores a missing value for the expression; the previous total remains unchanged.
The following statement creates the total number of schedulings :
TotalBookings + NumberOfBookings;
The following DATA step includes the sum statement above:
options pagesize=60 linesize=80 pageno=1 nodate; data total; set mylib.tourrevenue; TotalBookings + NumberOfBookings; run; proc print data=total; var Country NumberOfBookings TotalBookings; title 'Total Tours Booked'; run;
The following output displays the results:
Accumulating a Total for a Data Set
Total Tours Booked 1 Number Of Total Obs Country Bookings Bookings 1 France 10 10 2 Spain 12 22 3 Brazil 6 28 4 India . 28 5 Japan 10 38 6 Greece 20 58 7 New Zealand 6 64 8 Venezuela 8 72 9 Italy 9 81 10 USSR 6 87 11 Switzerland 20 107 12 Australia 10 117 13 Ireland 9 126The TotalBookings variable in the last observation of the TOTAL data set contains the total number of schedulings for the year.
Printing Only the Total |
If the total is the only information that is needed from the data set, a data set that contains only one observation and one variable (the TotalBookings variable) can be created by writing a DATA step that does all of the following:
specifies the END= option in the SET statement to determine if the current observation is the last observation
uses a subsetting IF to write only the last observation to the SAS data set
specifies the KEEP= option in the DATA step to keep only the variable that totals the schedulings.
When the END= option in the SET statement is specified, the variable that is named in the END= option is set to 1 when the DATA step is processing the last observation; the variable that is named in the END= option is set to 0 for other observations:
SET SAS-data-set <END=variable>; |
SAS does not add the END= variable to the data set that is being created. By testing the value of the END= variable, you can determine which observation is the last observation.
The following program selects the last observation with a subsetting IF statement and uses a KEEP= data set option to keep only the variable TotalBookings in the data set:
options pagesize=60 linesize=80 pageno=1 nodate; data total2(keep=TotalBookings); set mylib.tourrevenue end=Lastobs; TotalBookings + NumberOfBookings; if Lastobs; run; proc print data=total2; title 'Total Number of Tours Booked'; run;
The following output displays the results:
Selecting the Last Observation in a Data Set
Total Number of Tours Booked 1 Total Obs Bookings 1 126
The condition in the subsetting IF statement is true when Lastobs has a value of 1. When SAS is processing the last observation from MYLIB.TOURREVENUE, it assigns to Lastobs the value 1. Therefore, the subsetting IF statement accepts only the last observation from MYLIB.TOURREVENUE, and SAS writes the last observation to the data set TOTAL2.
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.