Previous Page | Next Page

Conditionally Processing Observations from Multiple SAS Data Sets

Determining Which Data Set Contributed the Observation


Understanding the IN= Data Set Option

When you create a new data set by combining observations from two or more data sets, knowing which data set an observation came from can be useful. For example, you might want to perform a calculation based on which data set contributed an observation. Otherwise, you might lose important contextual information that you need for later processing. You can determine which data set contributed a particular observation by using the IN= data set option.

The IN= data set option enables you to determine which data sets have contributed to the observation that is currently in the program data vector. The syntax for this option on the SET statement is

SET SAS-data-set-1 (IN=variable) SAS-data-set-2;
BY a-common-variable;

When you use the IN= option with a data set in a SET, MERGE, MODIFY, or UPDATE statement, SAS creates a temporary variable associated with that data set. The value of variable is 1 if the data set has contributed to the observation currently in the program data vector. The value is 0 if it has not contributed. You can use the IN= option with any or all the data sets you name in a SET, MERGE, MODIFY, or UPDATE statement, but use a different variable name in each case.

Note:   The IN= variable exists during the execution of the DATA step only; it is not written to the output data set that is created.  [cautionend]


The Program

The original data sets, SOUTHAMERICAN and EUROPEAN, do not need a variable that identifies the countries' continent because all observations in SOUTHAMERICAN pertain to the South American continent, and all observations in EUROPEAN pertain to the European continent. However, when you combine the data sets, you lose the context, which in this case is the relevant continent for each observation. The following example uses the SET statement with a BY statement to combine the two data sets into one data set that contains all the observations in chronological order:

options pagesize=60 linesize=80 pageno=1 nodate;

data finalists;
   set southamerican european;
   by year;
run;

proc print data=finalists;
   title 'World Cup Finalists';
   title2 'from 1958 to 1998';
run;

World Cup Finalists Grouped by Year

                              World Cup Finalists                              1
                               from 1958 to 1998

                Obs    Year    Country           Score    Result

                  1    1954    West Germany       3-2      won  
                  2    1954    Hungary            2-3      lost 
                  3    1958    Brazil             5-2      won  
                  4    1958    Sweden             2-5      lost 
                  5    1962    Brazil             3-1      won  
                  6    1962    Czechoslovakia     1-3      lost 
                  7    1966    England            4-2      won  
                  8    1966    West Germany       2-4      lost 
                  9    1970    Brazil             4-1      won  
                 10    1970    Italy              1-4      lost 
                 11    1974    West Germany       2-1      won  
                 12    1974    Holland            1-2      lost 
                 13    1978    Argentina          3-1      won  
                 14    1978    Holland            1-2      lost 
                 15    1982    Italy              3-1      won  
                 16    1982    West Germany       1-3      lost 
                 17    1986    Argentina          3-2      won  
                 18    1986    West Germany       2-3      lost 
                 19    1990    Argentina          0-1      lost 
                 20    1990    West Germany       1-0      won  
                 21    1994    Brazil             3-2      won  
                 22    1994    Italy              2-3      lost 
                 23    1998    Brazil             0-3      lost 
                 24    1998    France             3-0      won  

Notice that this output would be more useful if it showed from which data set each observation originated. To solve this problem, the following program uses the IN= data set option in conjunction with IF-THEN/ELSE statements. By determining which data set contributed an observation, the conditional statement executes and assigns the appropriate value to the variable Continent in each observation in the new data set FINALISTS.

options pagesize=60 linesize=80 pageno=1 nodate;

data finalists;
   set southamerican (in=S) european;1 
   by Year;
   if S then Continent='South America';2 
   else Continent='Europe';
run;

proc print data=finalists;
   title 'World Cup Finalists';
   title2 'from 1954 to 1998';
   run;

The following list corresponds to the numbered items in the preceding program:

[1] The IN= option in the SET statement tells SAS to create a variable named S.

[2] When the current observation comes from the data set SOUTHAMERICAN, the value of S is 1. Otherwise, the value is 0. The IF-THEN/ELSE statements execute one of two assignment statements, depending on the value of S. If the observation comes from the data set SOUTHAMERICAN, then the value that is assigned to Continent is South America. If the observation comes from the data set EUROPEAN, then the value that is assigned to Continent is Europe.

The following output shows the results:

World Cup Finalists with Continent

                              World Cup Finalists                              1
                               from 1954 to 1998
                         
       Obs    Year    Country           Score    Result    Continent

         1    1954    West Germany       3-2      won      Europe       
         2    1954    Hungary            2-3      lost     Europe       
         3    1958    Brazil             5-2      won      South America
         4    1958    Sweden             2-5      lost     Europe       
         5    1962    Brazil             3-1      won      South America
         6    1962    Czechoslovakia     1-3      lost     Europe       
         7    1966    England            4-2      won      Europe       
         8    1966    West Germany       2-4      lost     Europe       
         9    1970    Brazil             4-1      won      South America
        10    1970    Italy              1-4      lost     Europe       
        11    1974    West Germany       2-1      won      Europe       
        12    1974    Holland            1-2      lost     Europe       
        13    1978    Argentina          3-1      won      South America
        14    1978    Holland            1-2      lost     Europe       
        15    1982    Italy              3-1      won      Europe       
        16    1982    West Germany       1-3      lost     Europe       
        17    1986    Argentina          3-2      won      South America
        18    1986    West Germany       2-3      lost     Europe       
        19    1990    Argentina          0-1      lost     South America
        20    1990    West Germany       1-0      won      Europe       
        21    1994    Brazil             3-2      won      South America
        22    1994    Italy              2-3      lost     Europe       
        23    1998    Brazil             0-3      lost     South America
        24    1998    France             3-0      won      Europe       

Previous Page | Next Page | Top of Page