Interleaving SAS Data Sets

Interleaving Data Sets

Before you can interleave data sets, the data must be sorted by the same variable or variables you will use with the BY statement that accompanies your SET statement.

For example, the Research and Development division and the Publications division of a company both maintain data sets containing information about each project currently under way. Each data set includes these variables:

Project	is a unique code that identifies the project.
Department	is the name of a department involved in the project.
Manager	is the last name of the manager from Department.
StaffCount	is the number of people working for Manager on this project.

Senior management for the company wants to combine the data sets by Project so that the new data set shows the resources that both divisions are devoting to each project. Both data sets must be sorted by Project before they can be interleaved.

The program that follows creates and displays the data set RESEARCH_DEVELOPMENT. See The RESEARCH_DEVELOPMENT Data Set. Note that the input data is already sorted by Project.

data research_development;
   length Department Manager $ 10;
   input Project $ Department $ Manager $ StaffCount;
   datalines;
MP971 Designing Daugherty 10
MP971 Coding Newton 8
MP971 Testing Miller 7
SL827 Designing Ramirez 8
SL827 Coding Cho 10
SL827 Testing Baker 7
WP057 Designing Hascal 11
WP057 Coding Constant 13
WP057 Testing Slivko 10
;
run;

proc print data=research_development;
   title 'Research and Development Project Staffing';
run;

The RESEARCH_DEVELOPMENT Data Set

                   Research and Development Project Staffing                   1

                                                            Staff
               Obs    Department    Manager      Project    Count

                1     Designing     Daugherty     MP971       10 
                2     Coding        Newton        MP971        8 
                3     Testing       Miller        MP971        7 
                4     Designing     Ramirez       SL827        8 
                5     Coding        Cho           SL827       10 
                6     Testing       Baker         SL827        7 
                7     Designing     Hascal        WP057       11 
                8     Coding        Constant      WP057       13 
                9     Testing       Slivko        WP057       10

The following program creates, sorts, and displays the second data set, PUBLICATIONS. The PUBLICATIONS Data Set shows the data set sorted by Project.

data publications;
   length Department Manager $ 10;
   input Manager $ Department $ Project $ StaffCount;
   datalines;
Cook Writing WP057 5
Deakins Writing SL827 7
Franscombe Editing MP971 4
Henry Editing WP057 3
King Production SL827 5
Krysonski Production WP057 3
Lassiter Graphics SL827 3
Miedema Editing SL827 5
Morard Writing MP971 6
Posey Production MP971 4
Spackle Graphics WP057 2
;
run;
proc sort data=publications;
   by Project;
run;

proc print data=publications;
   title 'Publications Project Staffing';
run;

The PUBLICATIONS Data Set

                         Publications Project Staffing                         1
                                                            Staff
              Obs    Department    Manager       Project    Count

                1    Editing       Franscombe     MP971       4  
                2    Writing       Morard         MP971       6  
                3    Production    Posey          MP971       4  
                4    Writing       Deakins        SL827       7  
                5    Production    King           SL827       5  
                6    Graphics      Lassiter       SL827       3  
                7    Editing       Miedema        SL827       5  
                8    Writing       Cook           WP057       5  
                9    Editing       Henry          WP057       3  
               10    Production    Krysonski      WP057       3  
               11    Graphics      Spackle        WP057       2

Understanding the Interleaving Process

When interleaving, SAS creates a new data set as follows:

Before executing the SET statement, SAS reads the descriptor portion of each data set that you name in the SET statement. Then SAS creates a program data vector that, by default, contains all the variables from all data sets as well as any variables created by the DATA step. SAS sets the value of each variable to missing.
SAS looks at the first BY group in each data set in the SET statement in order to determine which BY group should appear first in the new data set.
SAS copies to the new data set all observations in that BY group from each data set that contains observations in the BY group. SAS copies from the data sets in the same order as they appear in the SET statement.
SAS looks at the next BY group in each data set to determine which BY group should appear next in the new data set.
SAS sets the value of each variable in the program data vector to missing.
SAS repeats steps 3 through 5 until it has copied all observations to the new data set.

Using the Interleaving Process

The following program uses the SET and BY statements to interleave the data sets RESEARCH_DEVELOPMENT and PUBLICATIONS. Interleaving Data Sets shows the new data set.

data rnd_pubs;
   set research_development publications;
   by Project;
run;

proc print data=rnd_pubs;
   title 'Project Participation by Research and Development';
   title2 'and Publications Departments';
   title3 'Sorted by Project'
run;

Interleaving the Data Sets

               Project Participation by Research and Development               1
                          and Publications Departments
                            Sorted by Project

                                                            Staff
              Obs    Department    Manager       Project    Count

                1    Designing     Daugherty      MP971       10 
                2    Coding        Newton         MP971        8 
                3    Testing       Miller         MP971        7 
                4    Editing       Franscombe     MP971        4 
                5    Writing       Morard         MP971        6 
                6    Production    Posey          MP971        4 
                7    Designing     Ramirez        SL827        8 
                8    Coding        Cho            SL827       10 
                9    Testing       Baker          SL827        7 
               10    Writing       Deakins        SL827        7 
               11    Production    King           SL827        5 
               12    Graphics      Lassiter       SL827        3 
               13    Editing       Miedema        SL827        5 
               14    Designing     Hascal         WP057       11 
               15    Coding        Constant       WP057       13 
               16    Testing       Slivko         WP057       10 
               17    Writing       Cook           WP057        5 
               18    Editing       Henry          WP057        3 
               19    Production    Krysonski      WP057        3 
               20    Graphics      Spackle        WP057        2

The new data set RND_PUBS includes all observations from both data sets. Each BY group in the new data set contains observations from RESEARCH_DEVELOPMENT followed by observations from PUBLICATIONS.

Top of Page