Interleaving SAS Data Sets |
Preparing to Interleave Data Sets |
Before you can interleave data sets, the data must be sorted by the same variable or variables you will use with the BY statement that accompanies your SET statement.
For example, the Research and Development division and the Publications division of a company both maintain data sets containing information about each project currently under way. Each data set includes these variables:
Project | |
Department | |
Manager | |
StaffCount |
is the number of people working for Manager on this project. |
Senior management for the company wants to combine the data sets by Project so that the new data set shows the resources that both divisions are devoting to each project. Both data sets must be sorted by Project before they can be interleaved.
The program that follows creates and displays the data set RESEARCH_DEVELOPMENT. See The RESEARCH_DEVELOPMENT Data Set. Note that the input data is already sorted by Project.
data research_development; length Department Manager $ 10; input Project $ Department $ Manager $ StaffCount; datalines; MP971 Designing Daugherty 10 MP971 Coding Newton 8 MP971 Testing Miller 7 SL827 Designing Ramirez 8 SL827 Coding Cho 10 SL827 Testing Baker 7 WP057 Designing Hascal 11 WP057 Coding Constant 13 WP057 Testing Slivko 10 ; run;
proc print data=research_development; title 'Research and Development Project Staffing'; run;
The RESEARCH_DEVELOPMENT Data Set
Research and Development Project Staffing 1 Staff Obs Department Manager Project Count 1 Designing Daugherty MP971 10 2 Coding Newton MP971 8 3 Testing Miller MP971 7 4 Designing Ramirez SL827 8 5 Coding Cho SL827 10 6 Testing Baker SL827 7 7 Designing Hascal WP057 11 8 Coding Constant WP057 13 9 Testing Slivko WP057 10
The following program creates, sorts, and displays the second data set, PUBLICATIONS. The PUBLICATIONS Data Set shows the data set sorted by Project.
data publications; length Department Manager $ 10; input Manager $ Department $ Project $ StaffCount; datalines; Cook Writing WP057 5 Deakins Writing SL827 7 Franscombe Editing MP971 4 Henry Editing WP057 3 King Production SL827 5 Krysonski Production WP057 3 Lassiter Graphics SL827 3 Miedema Editing SL827 5 Morard Writing MP971 6 Posey Production MP971 4 Spackle Graphics WP057 2 ; run; proc sort data=publications; by Project; run;
proc print data=publications; title 'Publications Project Staffing'; run;
Publications Project Staffing 1 Staff Obs Department Manager Project Count 1 Editing Franscombe MP971 4 2 Writing Morard MP971 6 3 Production Posey MP971 4 4 Writing Deakins SL827 7 5 Production King SL827 5 6 Graphics Lassiter SL827 3 7 Editing Miedema SL827 5 8 Writing Cook WP057 5 9 Editing Henry WP057 3 10 Production Krysonski WP057 3 11 Graphics Spackle WP057 2
Understanding the Interleaving Process |
When interleaving, SAS creates a new data set as follows:
Before executing the SET statement, SAS reads the descriptor portion of each data set that you name in the SET statement. Then SAS creates a program data vector that, by default, contains all the variables from all data sets as well as any variables created by the DATA step. SAS sets the value of each variable to missing.
SAS looks at the first BY group in each data set in the SET statement in order to determine which BY group should appear first in the new data set.
SAS copies to the new data set all observations in that BY group from each data set that contains observations in the BY group. SAS copies from the data sets in the same order as they appear in the SET statement.
SAS looks at the next BY group in each data set to determine which BY group should appear next in the new data set.
SAS sets the value of each variable in the program data vector to missing.
SAS repeats steps 3 through 5 until it has copied all observations to the new data set.
Using the Interleaving Process |
The following program uses the SET and BY statements to interleave the data sets RESEARCH_DEVELOPMENT and PUBLICATIONS. Interleaving Data Sets shows the new data set.
data rnd_pubs; set research_development publications; by Project; run; proc print data=rnd_pubs; title 'Project Participation by Research and Development'; title2 'and Publications Departments'; title3 'Sorted by Project' run;
Project Participation by Research and Development 1 and Publications Departments Sorted by Project Staff Obs Department Manager Project Count 1 Designing Daugherty MP971 10 2 Coding Newton MP971 8 3 Testing Miller MP971 7 4 Editing Franscombe MP971 4 5 Writing Morard MP971 6 6 Production Posey MP971 4 7 Designing Ramirez SL827 8 8 Coding Cho SL827 10 9 Testing Baker SL827 7 10 Writing Deakins SL827 7 11 Production King SL827 5 12 Graphics Lassiter SL827 3 13 Editing Miedema SL827 5 14 Designing Hascal WP057 11 15 Coding Constant WP057 13 16 Testing Slivko WP057 10 17 Writing Cook WP057 5 18 Editing Henry WP057 3 19 Production Krysonski WP057 3 20 Graphics Spackle WP057 2
The new data set RND_PUBS includes all observations from both data sets. Each BY group in the new data set contains observations from RESEARCH_DEVELOPMENT followed by observations from PUBLICATIONS.
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.