Conditionally Processing Observations from Multiple SAS Data Sets |
Understanding When the Last Observation Is Processed |
Many applications require that you determine when the DATA step processes the last observation in the input data set. For example, you might want to perform calculations only on the last observation in a data set, or you might want to write an observation only after the last observation has been processed. For this purpose, you can use the END= option for the SET, MERGE, MODIFY, or UPDATE statement. The syntax for this option is:
SET SAS-data-set-list END=variable; |
The END= option defines a temporary variable whose value is 1 when the DATA step is processing the last observation. At all other times, the value of variable is 0. Although the DATA step can use the END= variable, SAS does not add it to the resulting data set.
Note: Using More Than One Observation in a Calculation explains how to use the END= option in the SET statement with a single data set. The END= option works the same way with multiple data sets, but it is important to note that END= is set to 1 only when the last observation from all input data sets is being processed.
The Program |
This example uses the data in SOUTHAMERICAN and EUROPEAN to calculate how many years a team from each continent won the World Cup from 1954 to 1998.
To perform this calculation, this program must perform the following tasks:
keep a running total of how many times a team from each continent won the World Cup.
after processing all observations, multiply the final total for each continent by 4 (the length of time between World Cups) to determine the length of time each continent has been a World Cup champion.
write only the final observation to the output data set. The variables that contain the totals do not contain the final total until the last observation is processed.
The following DATA step calculates the running totals and produces the output data set that contains only those totals.
data timespan (keep=YearsSouthAmerican keep=YearsEuropean);4 set southamerican (in=S) european end=LastYear;1 3 by Year; if result='won' then do; if S then SouthAmericanWins+1;2 else EuropeanWins+1;2 end; if lastyear then3 do; YearsSouthAmerican=SouthAmericanWins*4; YearsEuropean=EuropeanWins*4; output;4 end; proc print data=timespan; title 'Total Years as Reigning World Cup Champions'; title2 'from 1954 to 1998'; run;
The following list corresponds to the numbered items in the preceding program:
Using the END= Option to Perform a Calculation Based on the Last Observation in the Data Sets
Total Years as Reigning World Cup Champions 3 from 1954 to 1998 Years South Years Obs American European 1 24 24
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.