| Conditionally Processing Observations from Multiple SAS Data Sets |
| Understanding the IN= Data Set Option |
When you create a new data set by combining observations from two or more data sets, knowing which data set an observation came from can be useful. For example, you might want to perform a calculation based on which data set contributed an observation. Otherwise, you might lose important contextual information that you need for later processing. You can determine which data set contributed a particular observation by using the IN= data set option.
The IN= data set option enables you to determine which data sets have contributed to the observation that is currently in the program data vector. The syntax for this option on the SET statement is
| SET SAS-data-set-1 (IN=variable) SAS-data-set-2; |
| BY a-common-variable; |
When you use the IN= option with a data set in a SET, MERGE, MODIFY, or UPDATE statement, SAS creates a temporary variable associated with that data set. The value of variable is 1 if the data set has contributed to the observation currently in the program data vector. The value is 0 if it has not contributed. You can use the IN= option with any or all the data sets you name in a SET, MERGE, MODIFY, or UPDATE statement, but use a different variable name in each case.
Note: The IN= variable exists during the execution of
the DATA step only; it is not written to the output data set that is created. ![[cautionend]](../../../../common/64368/HTML/default/images/cautend.gif)
| The Program |
The original data sets, SOUTHAMERICAN and EUROPEAN, do not need a variable that identifies the countries' continent because all observations in SOUTHAMERICAN pertain to the South American continent, and all observations in EUROPEAN pertain to the European continent. However, when you combine the data sets, you lose the context, which in this case is the relevant continent for each observation. The following example uses the SET statement with a BY statement to combine the two data sets into one data set that contains all the observations in chronological order:
options pagesize=60 linesize=80 pageno=1 nodate; data finalists; set southamerican european; by year; run; proc print data=finalists; title 'World Cup Finalists'; title2 'from 1958 to 1998'; run;
World Cup Finalists Grouped by Year
World Cup Finalists 1
from 1958 to 1998
Obs Year Country Score Result
1 1954 West Germany 3-2 won
2 1954 Hungary 2-3 lost
3 1958 Brazil 5-2 won
4 1958 Sweden 2-5 lost
5 1962 Brazil 3-1 won
6 1962 Czechoslovakia 1-3 lost
7 1966 England 4-2 won
8 1966 West Germany 2-4 lost
9 1970 Brazil 4-1 won
10 1970 Italy 1-4 lost
11 1974 West Germany 2-1 won
12 1974 Holland 1-2 lost
13 1978 Argentina 3-1 won
14 1978 Holland 1-2 lost
15 1982 Italy 3-1 won
16 1982 West Germany 1-3 lost
17 1986 Argentina 3-2 won
18 1986 West Germany 2-3 lost
19 1990 Argentina 0-1 lost
20 1990 West Germany 1-0 won
21 1994 Brazil 3-2 won
22 1994 Italy 2-3 lost
23 1998 Brazil 0-3 lost
24 1998 France 3-0 won
Notice that this output would be more useful if it showed from which data set each observation originated. To solve this problem, the following program uses the IN= data set option in conjunction with IF-THEN/ELSE statements. By determining which data set contributed an observation, the conditional statement executes and assigns the appropriate value to the variable Continent in each observation in the new data set FINALISTS.
options pagesize=60 linesize=80 pageno=1 nodate; data finalists; set southamerican (in=S) european;1 by Year; if S then Continent='South America';2 else Continent='Europe'; run; proc print data=finalists; title 'World Cup Finalists'; title2 'from 1954 to 1998'; run;
The following list corresponds to the numbered items in the preceding program:
The following output shows the results:
World Cup Finalists with Continent
World Cup Finalists 1
from 1954 to 1998
Obs Year Country Score Result Continent
1 1954 West Germany 3-2 won Europe
2 1954 Hungary 2-3 lost Europe
3 1958 Brazil 5-2 won South America
4 1958 Sweden 2-5 lost Europe
5 1962 Brazil 3-1 won South America
6 1962 Czechoslovakia 1-3 lost Europe
7 1966 England 4-2 won Europe
8 1966 West Germany 2-4 lost Europe
9 1970 Brazil 4-1 won South America
10 1970 Italy 1-4 lost Europe
11 1974 West Germany 2-1 won Europe
12 1974 Holland 1-2 lost Europe
13 1978 Argentina 3-1 won South America
14 1978 Holland 1-2 lost Europe
15 1982 Italy 3-1 won Europe
16 1982 West Germany 1-3 lost Europe
17 1986 Argentina 3-2 won South America
18 1986 West Germany 2-3 lost Europe
19 1990 Argentina 0-1 lost South America
20 1990 West Germany 1-0 won Europe
21 1994 Brazil 3-2 won South America
22 1994 Italy 2-3 lost Europe
23 1998 Brazil 0-3 lost South America
24 1998 France 3-0 won Europe
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.