48705 - A one-to-many merge with common variables that are not the BY variables will have values from the many data set after the first observation

SUPPORT / SAMPLES & SAS NOTES

Support

Usage Note 48705: A one-to-many merge with common variables that are not the BY variables will have values from the many data set after the first observation

In a one-to-many merge with common variables that are not the BY variables, customers sometimes expect that the value for the common variable from the "one" data set will be retained throughout the BY group if the "one" data set is listed second in the MERGE statement. It is correct that in a one-to-one merge, and for the first matching observation in a one-to-many merge, the value of a common variable in the latter data set(s) overwrites the value from the previous data set. However, on subsequent iterations of the MERGE statement for the same BY group, the "one" data set is not read again. Therefore, the resulting output contains the value of the variable from the "many" data set.

Consider the following two data sets in which we want to use the NAMES data set (the "one" data set since it has one observation per ID) to correct the spelling of the names in the SCORES data set (which is the "many" data set since it has multiple observations per ID.)

SCORES                                NAMES

id name score                         id name     
1 Jo    100                           1  Joanna
1 Jo     99                           2  Jon
1 Jo     90
2 John   89
2 John   92

Performing a merge by ID as illustrated in the following statements generates the results shown below:

merge scores names;
   by id;


                                 Obs    id    name      score

                                  1      1    Joanna     100
                                  2      1    Jo          99
                                  3      1    Jo          90
                                  4      2    Jon         89
                                  5      2    John        92

As you can see, NAME is not updated for all observations with the same ID. When SAS compiles the program, it creates the Program Data Vector (PDV), which lists all of the variables and contains their values throughout the DATA step. During the merge, SAS first writes "Jo" into the variable NAME since it reads the data set SCORES first. Then the value is overwritten by the value "Joanna" from NAMES since that data set comes later in the MERGE statement. At the end of the first iteration of the merge, SAS outputs the current values of the PDV, which includes "Joanna" for NAME.

On the next iteration of the DATA step, SAS reads the second observation from the SCORES data set. It writes "Jo" into the PDV for NAME. The data set NAMES is not read again since there are no more observations for that BY group. Thus, at the end of the second iteration of the DATA step, the value "Jo" is output for NAME.

To replace NAME for all observations within the BY group, the variable from the "many" data set (SCORES) should be dropped or renamed. In that case, SAS reads the value from the "one" data set (NAMES) on the first iteration of the merge. Since variables that are read from data sets are automatically retained throughout the BY group, all observations for that BY group have the value from the "one" data set.

Click the Full Code tab to see complete examples of merging with and without common variables.

For more information about the PDV, DATA step processing, and combining data sets, see SAS Language Reference: Concepts.

Missing values in a one-to-many merge behave the same way as nonmissing values. For more information, see Usage Note 41064: "Missing values in common variables in a one-to-many MERGE can overwrite nonmissing values."

Operating System and Release Information

Product Family	Product	System	SAS Release
Product Family	Product	System	Reported	Fixed*
SAS System	Base SAS	z/OS
		Z64
		OpenVMS VAX
		Microsoft® Windows® for 64-Bit Itanium-based Systems
		Microsoft Windows Server 2003 Datacenter 64-bit Edition
		Microsoft Windows Server 2003 Enterprise 64-bit Edition
		Microsoft Windows XP 64-bit Edition
		Microsoft® Windows® for x64
		OS/2
		Microsoft Windows 8 Pro
		Microsoft Windows 95/98
		Microsoft Windows 2000 Advanced Server
		Microsoft Windows 2000 Datacenter Server
		Microsoft Windows 2000 Server
		Microsoft Windows 2000 Professional
		Microsoft Windows NT Workstation
		Microsoft Windows Server 2003 Datacenter Edition
		Microsoft Windows Server 2003 Enterprise Edition
		Microsoft Windows Server 2003 Standard Edition
		Microsoft Windows Server 2003 for x64
		Microsoft Windows Server 2008
		Microsoft Windows Server 2008 for x64
		Microsoft Windows Server 2012
		Microsoft Windows XP Professional
		Windows 7 Enterprise 32 bit
		Windows 7 Enterprise x64
		Windows 7 Home Premium 32 bit
		Windows 7 Home Premium x64
		Windows 7 Professional 32 bit
		Windows 7 Professional x64
		Windows 7 Ultimate 32 bit
		Windows 7 Ultimate x64
		Windows Millennium Edition (Me)
		Windows Vista
		Windows Vista for x64
		64-bit Enabled AIX
		64-bit Enabled HP-UX
		64-bit Enabled Solaris
		AIX
		HP-UX
		HP-UX IPF
		IRIX
		Linux
		Linux for x64
		Linux on Itanium
		OpenVMS Alpha
		OpenVMS on HP Integrity
		Solaris
		Solaris for x64
		Tru64 UNIX

* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.

/* Create the "many" data set, SCORES, that has multiple observations per ID.        */
/* It contains some incorrect names.                                                 */

data scores;
   input id name :$8. score;
   datalines;
1 Jo 100
1 Jo 90
1 Jo 95
2 John 89
2 John 92
;
run;

/* Create the "one" data set, NAMES, that has one observation per ID.  It contains   */
/* corrected values for names.                                                       */

data names;
   input id name :$8.;
   datalines;
1 Joanna
2 Jon
;
run;

/* First merge with the common variable and note how the values from the second and */
/* subsequent observations of the "many" data set are not overwritten.              */

data results_common_variable;
   merge scores names;
   by id;
run;
title "Results from Common Variable";
proc print; 
run;

/* Rename the variable from the "many" data set so that the value from the "one" data */
/* set is carried down the BY group.                                                  */

data results_renamed_variable;
   merge scores(rename=(name=old_name)) names;
   by id;
run;
title "Results from Renamed Variable";
proc print; 
run;

                                 Results from Common Variable

                                 Obs    id    name      score

                                   1      1    Joanna     100
                                   2      1    Jo          90
                                   3      1    Jo          95
                                   4      2    Jon         89
                                   5      2    John        92


                                Results from Renamed Variable

                            Obs    id    old_name    score    name

                             1      1      Jo         100     Joanna
                             2      1      Jo          90     Joanna
                             3      1      Jo          95     Joanna
                             4      2      John        89     Jon
                             5      2      John        92     Jon

Type:	Usage Note
Priority:
Topic:	Common Programming Tasks ==> Combining Data Data Management ==> Manipulation and Transformation ==> BY-group processing Data Management ==> Manipulation and Transformation ==> Combining and Modifying Data Sets SAS Reference ==> Statements ==> File-handling ==> MERGE ==> with BY

Date Modified:	2020-11-02 13:42:49
Date Created:	2012-12-15 16:42:28

Support

Usage Note 48705: A one-to-many merge with common variables that are not the BY variables will have values from the many data set after the first observation

Operating System and Release Information

Follow Us

What is...