Using More Than One Observation in a Calculation |
An additional requirement of Tradewinds Travel is to determine the number of tours that are scheduled with each vendor. In order to accomplish this task, a program must group the data by a variable; that is, the program must organize the data set into groups of observations, with one group for each vendor. In this case, the program must group the data by the Vendor variable. Each group is known generically as a BY group; the variable that is used to determine the groupings is called a BY variable.
In order to group the data by the Vendor variable, the program must
include a PROC SORT step to group the observations by the Vendor variable
reset the sum variable to 0 at the beginning of each group of observations.
options pagesize=60 linesize=80 pageno=1 nodate; proc sort data=mylib.tourrevenue out=mylib.sorttour; by Vendor; run; data totalby; set mylib.sorttour; by Vendor; if First.Vendor then VendorBookings = 0; VendorBookings + NumberOfBookings; run; proc print data=totalby; title 'Summary of Bookings by Vendor'; run;
In the preceding program, the FIRST.Vendor variable is used in an IF-THEN statement to set the sum variable (VendorBookings) to 0 in the first observation of each BY group. (For more information on the FIRST.variable and LAST.variable temporary variables, see Finding the First or Last Observation in a Group.)
The following output displays the results.
Summary of Bookings by Vendor 1 Number Land Of Vendor Obs Country Cost Vendor Bookings Bookings 1 France 575 Express 10 10 2 India 489 Express . 10 3 Japan 720 Express 10 20 4 Greece 698 Express 20 40 5 Italy 468 Express 9 49 6 Ireland 558 Express 9 58 7 New Zealand 1489 Southsea 6 6 8 Australia 1079 Southsea 10 16 9 Spain 510 World 12 12 10 Brazil 540 World 6 18 11 Venezuela 425 World 8 26 12 USSR 924 World 6 32 13 Switzerland 734 World 20 52
Notice that while this output does in fact include the total number of schedulings for each vendor, it also includes a great deal of extraneous information. Reporting the total schedulings for each vendor requires only the variables Vendor and VendorBookings from the last observation for each vendor. Therefore, the program can
use the DROP= or KEEP= data set options to eliminate the variables Country, LandCost, and NumberOfBookings from the output data set
use the LAST.Vendor variable in a subsetting IF statement to write only the last observation in each group to the data set TOTALBY.
The following program creates data set TOTALBY:
options pagesize=60 linesize=80 pageno=1 nodate; proc sort data=mylib.tourrevenue out=mylib.sorttour; by Vendor; run; data totalby(drop=country landcost); set mylib.sorttour; by Vendor; if First.Vendor then VendorBookings = 0; VendorBookings + NumberOfBookings; if Last.Vendor; run; proc print data=totalby; title 'Total Bookings by Vendor'; run;
The following output displays the results:
Putting Totals for Each BY Group in a New Data Set
Total Bookings by Vendor 1 Vendor Obs Vendor Bookings 1 Express 58 2 Southsea 16 3 World 52
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.