Conditionally Processing Observations from Multiple SAS Data Sets |
Understanding the IN= Data Set Option |
When you create a new data set by combining observations from two or more data sets, knowing which data set an observation came from can be useful. For example, you might want to perform a calculation based on which data set contributed an observation. Otherwise, you might lose important contextual information that you need for later processing. You can determine which data set contributed a particular observation by using the IN= data set option.
The IN= data set option enables you to determine which data sets have contributed to the observation that is currently in the program data vector. The syntax for this option on the SET statement is
SET SAS-data-set-1 (IN=variable) SAS-data-set-2; |
BY a-common-variable; |
When you use the IN= option with a data set in a SET, MERGE, MODIFY, or UPDATE statement, SAS creates a temporary variable associated with that data set. The value of variable is 1 if the data set has contributed to the observation currently in the program data vector. The value is 0 if it has not contributed. You can use the IN= option with any or all the data sets you name in a SET, MERGE, MODIFY, or UPDATE statement, but use a different variable name in each case.
Note: The IN= variable exists during the execution of the DATA step only; it is not written to the output data set that is created.
The Program |
The original data sets, SOUTHAMERICAN and EUROPEAN, do not need a variable that identifies the countries' continent because all observations in SOUTHAMERICAN pertain to the South American continent, and all observations in EUROPEAN pertain to the European continent. However, when you combine the data sets, you lose the context, which in this case is the relevant continent for each observation. The following example uses the SET statement with a BY statement to combine the two data sets into one data set that contains all the observations in chronological order:
options pagesize=60 linesize=80 pageno=1 nodate; data finalists; set southamerican european; by year; run; proc print data=finalists; title 'World Cup Finalists'; title2 'from 1958 to 1998'; run;
World Cup Finalists Grouped by Year
World Cup Finalists 1 from 1958 to 1998 Obs Year Country Score Result 1 1954 West Germany 3-2 won 2 1954 Hungary 2-3 lost 3 1958 Brazil 5-2 won 4 1958 Sweden 2-5 lost 5 1962 Brazil 3-1 won 6 1962 Czechoslovakia 1-3 lost 7 1966 England 4-2 won 8 1966 West Germany 2-4 lost 9 1970 Brazil 4-1 won 10 1970 Italy 1-4 lost 11 1974 West Germany 2-1 won 12 1974 Holland 1-2 lost 13 1978 Argentina 3-1 won 14 1978 Holland 1-2 lost 15 1982 Italy 3-1 won 16 1982 West Germany 1-3 lost 17 1986 Argentina 3-2 won 18 1986 West Germany 2-3 lost 19 1990 Argentina 0-1 lost 20 1990 West Germany 1-0 won 21 1994 Brazil 3-2 won 22 1994 Italy 2-3 lost 23 1998 Brazil 0-3 lost 24 1998 France 3-0 won
Notice that this output would be more useful if it showed from which data set each observation originated. To solve this problem, the following program uses the IN= data set option in conjunction with IF-THEN/ELSE statements. By determining which data set contributed an observation, the conditional statement executes and assigns the appropriate value to the variable Continent in each observation in the new data set FINALISTS.
options pagesize=60 linesize=80 pageno=1 nodate; data finalists; set southamerican (in=S) european;1 by Year; if S then Continent='South America';2 else Continent='Europe'; run; proc print data=finalists; title 'World Cup Finalists'; title2 'from 1954 to 1998'; run;
The following list corresponds to the numbered items in the preceding program:
The following output shows the results:
World Cup Finalists with Continent
World Cup Finalists 1 from 1954 to 1998 Obs Year Country Score Result Continent 1 1954 West Germany 3-2 won Europe 2 1954 Hungary 2-3 lost Europe 3 1958 Brazil 5-2 won South America 4 1958 Sweden 2-5 lost Europe 5 1962 Brazil 3-1 won South America 6 1962 Czechoslovakia 1-3 lost Europe 7 1966 England 4-2 won Europe 8 1966 West Germany 2-4 lost Europe 9 1970 Brazil 4-1 won South America 10 1970 Italy 1-4 lost Europe 11 1974 West Germany 2-1 won Europe 12 1974 Holland 1-2 lost Europe 13 1978 Argentina 3-1 won South America 14 1978 Holland 1-2 lost Europe 15 1982 Italy 3-1 won Europe 16 1982 West Germany 1-3 lost Europe 17 1986 Argentina 3-2 won South America 18 1986 West Germany 2-3 lost Europe 19 1990 Argentina 0-1 lost South America 20 1990 West Germany 1-0 won Europe 21 1994 Brazil 3-2 won South America 22 1994 Italy 2-3 lost Europe 23 1998 Brazil 0-3 lost South America 24 1998 France 3-0 won Europe
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.