24579 - Collapsing observations within a BY group into a single observation when the data set has three or more variables

This technique is often called 'converting a data set from long to wide'. Regardless of the number of observations per BY group, the code below writes all values from the same BY group to a single observation. A sample data set with per-semester student information is provided for testing purposes.

The arrays must be large enough to hold every value in the largest BY group, so you need to know the maximum number of observations in any BY group. PROC FREQ writes the counts to an output data set in which the variable COUNT contains the number of observations per BY group. Use the largest value of COUNT as the dimension in the ARRAY statements of the DATA step that follows. The variables created by the arrays must be retained so that their values carry over from one observation to the next until the OUTPUT statement writes a single observation after the last observation in the BY group has been read. Because the DATA step uses a BY statement, the input data set must be sorted or indexed by the BY variable.

/* Data set for testing purposes */
/* Varying number of observations for each BY group */
data in;
   infile datalines truncover;
   input stud_id semester $ year GPA;
   datalines;
165 Fall 2015 3.25
165 Spring 2016 3.4
165 Fall 2016 3.12
165 Spring 2017 3.5
165 Fall 2017 3.62
198 Fall 2018 3.75
200 Fall 2016 2.9
200 Spring 2017 3.7
200 Fall 2017 3.9
;
run;

/* View the data set */
proc print data=in;
run;

/* Determine the number of observations in each BY group */
proc freq data=in;
   tables stud_id/out=myfreq(drop=percent);
run;

/* View the output data set */
/* Use the largest value of Count in the next step */
proc print data=myfreq;
run;

/* Create new variables needed to hold all values in the BY group with an ARRAY */
/* statement and then retain them so they can be output on the last observation */
/* in the BY group. At the beginning of the BY group, set all variables to      */
/* missing so that when there are uneven numbers of observations, there aren't  */
/* 'bleed over' values from the previous BY group.  Assign the current          */
/* observation's values to the appropriate array elements.                      */

data out(drop=i semester year gpa);
   set in;
   by stud_id;
   array sem(5) $ sem1-sem5;
   array yr(5) yr1-yr5;
   array grades(5) gpa1-gpa5;
   retain sem1-sem5 yr1-yr5 gpa1-gpa5;
   if first.stud_id then do;
      i=1;
      call missing(of sem(*), of yr(*), of grades(*));
   end;
   sem(i)=semester;
   yr(i)=year;
   grades(i)=gpa;
   if last.stud_id then output;
   i+1;
run;

proc print data=out;
run;
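
The largest BY-group size does not have to be copied by hand from the PROC PRINT listing of MYFREQ. The following is a minimal sketch, assuming the IN and MYFREQ data sets created above; the macro variable name MAXOBS and the output data set name OUT2 are illustrative and are not part of the original sample. PROC SQL stores the largest value of COUNT in a macro variable, which then sizes the arrays.

```sas
/* Store the largest BY-group size in a macro variable */
/* (MAXOBS is an assumed, illustrative name)           */
proc sql noprint;
   select max(count) into :maxobs from myfreq;
quit;

/* Reassign the value to strip leading and trailing blanks */
%let maxobs=&maxobs;

/* Same collapsing logic as the DATA step above, with the  */
/* array dimensions and variable lists built from MAXOBS   */
data out2(drop=i semester year gpa);
   set in;
   by stud_id;
   array sem(&maxobs) $ sem1-sem&maxobs;
   array yr(&maxobs) yr1-yr&maxobs;
   array grades(&maxobs) gpa1-gpa&maxobs;
   retain sem1-sem&maxobs yr1-yr&maxobs gpa1-gpa&maxobs;
   if first.stud_id then do;
      i=1;
      call missing(of sem(*), of yr(*), of grades(*));
   end;
   sem(i)=semester;
   yr(i)=year;
   grades(i)=gpa;
   if last.stud_id then output;
   i+1;
run;
```

With the sample data, the largest COUNT is 5, so MAXOBS resolves to 5 and OUT2 contains the same variables as OUT. If the input data change, the arrays resize on the next run without editing the ARRAY statements.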

These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.

/* PROC PRINT output of data set IN */

                            Obs    stud_id    semester    year     GPA

                             1       165       Fall       2015    3.25
                             2       165       Spring     2016    3.40
                             3       165       Fall       2016    3.12
                             4       165       Spring     2017    3.50
                             5       165       Fall       2017    3.62
                             6       198       Fall       2018    3.75
                             7       200       Fall       2016    2.90
                             8       200       Spring     2017    3.70
                             9       200       Fall       2017    3.90


/* PROC PRINT output of data set MYFREQ   */
                                     Obs    stud_id    COUNT

                                      1       165        5
                                      2       198        1
                                      3       200        3


/* PROC PRINT output of final data set OUT */

    Obs  stud_id  sem1    sem2    sem3    sem4    sem5     yr1   yr2   yr3   yr4   yr5  gpa1  gpa2  gpa3  gpa4  gpa5

      1      165  Fall    Spring  Fall    Spring  Fall    2015  2016  2016  2017  2017  3.25   3.4  3.12   3.5  3.62
      2      198  Fall                                    2018     .     .     .     .  3.75     .     .     .     .
      3      200  Fall    Spring  Fall                    2016  2017  2017     .     .  2.90   3.7  3.90     .     .

Type:	Sample
Topic:	SAS Reference ==> DATA Step Data Management ==> Manipulation and Transformation ==> BY-group processing

Date Modified:	2005-12-08 11:34:06
Date Created:	2004-09-30 14:08:55

Product Family	Product	Host	SAS Release
			Starting	Ending
SAS System	Base SAS	All	n/a	n/a
