SUPPORT / SAMPLES & SAS NOTES
 

Support

Usage Note 48705: A one-to-many merge with common variables that are not the BY variables will have values from the many data set after the first observation

DetailsCodeOutputAboutRate It

In a one-to-many merge with common variables that are not the BY variables, customers sometimes expect that the value for the common variable from the "one" data set will be retained throughout the BY group if the "one" data set is listed second in the MERGE statement. It is correct that in a one-to-one merge, and for the first matching observation in a one-to-many merge, the value of a common variable in the latter data set(s) overwrites the value from the previous data set. However, on subsequent iterations of the MERGE statement for the same BY group, the "one" data set is not read again. Therefore, the resulting output contains the value of the variable from the "many" data set.

Consider the following two data sets in which we want to use the NAMES data set (the "one" data set since it has one observation per ID) to correct the spelling of the names in the SCORES data set (which is the "many" data set since it has multiple observations per ID.)

SCORES                                NAMES

id name score                         id name     
1 Jo    100                           1  Joanna
1 Jo     99                           2  Jon
1 Jo     90
2 John   89
2 John   92

Performing a merge by ID as illustrated in the following statements generates the results shown below:

merge scores names;
   by id;


                                 Obs    id    name      score

                                  1      1    Joanna     100
                                  2      1    Jo          99
                                  3      1    Jo          90
                                  4      2    Jon         89
                                  5      2    John        92

As you can see, NAME is not updated for all observations with the same ID. When SAS compiles the program, it creates the Program Data Vector (PDV), which lists all of the variables and contains their values throughout the DATA step. During the merge, SAS first writes "Jo" into the variable NAME since it reads the data set SCORES first. Then the value is overwritten by the value "Joanna" from NAMES since that data set comes later in the MERGE statement. At the end of the first iteration of the merge, SAS outputs the current values of the PDV, which includes "Joanna" for NAME.

On the next iteration of the DATA step, SAS reads the second observation from the SCORES data set. It writes "Jo" into the PDV for NAME. The data set NAMES is not read again since there are no more observations for that BY group. Thus, at the end of the second iteration of the DATA step, the value "Jo" is output for NAME.

To replace NAME for all observations within the BY group, the variable from the "many" data set (SCORES) should be dropped or renamed. In that case, SAS reads the value from the "one" data set (NAMES) on the first iteration of the merge. Since variables that are read from data sets are automatically retained throughout the BY group, all observations for that BY group have the value from the "one" data set.

Click the Full Code tab to see complete examples of merging with and without common variables.

For more information about the PDV, DATA step processing, and combining data sets, see SAS Language Reference: Concepts.

Missing values in a one-to-many merge behave the same way as nonmissing values. For more information, see Usage Note 41064: "Missing values in common variables in a one-to-many MERGE can overwrite nonmissing values."



Operating System and Release Information

Product FamilyProductSystemSAS Release
ReportedFixed*
SAS SystemBase SASz/OS
Z64
OpenVMS VAX
Microsoft® Windows® for 64-Bit Itanium-based Systems
Microsoft Windows Server 2003 Datacenter 64-bit Edition
Microsoft Windows Server 2003 Enterprise 64-bit Edition
Microsoft Windows XP 64-bit Edition
Microsoft® Windows® for x64
OS/2
Microsoft Windows 8 Pro
Microsoft Windows 95/98
Microsoft Windows 2000 Advanced Server
Microsoft Windows 2000 Datacenter Server
Microsoft Windows 2000 Server
Microsoft Windows 2000 Professional
Microsoft Windows NT Workstation
Microsoft Windows Server 2003 Datacenter Edition
Microsoft Windows Server 2003 Enterprise Edition
Microsoft Windows Server 2003 Standard Edition
Microsoft Windows Server 2003 for x64
Microsoft Windows Server 2008
Microsoft Windows Server 2008 for x64
Microsoft Windows Server 2012
Microsoft Windows XP Professional
Windows 7 Enterprise 32 bit
Windows 7 Enterprise x64
Windows 7 Home Premium 32 bit
Windows 7 Home Premium x64
Windows 7 Professional 32 bit
Windows 7 Professional x64
Windows 7 Ultimate 32 bit
Windows 7 Ultimate x64
Windows Millennium Edition (Me)
Windows Vista
Windows Vista for x64
64-bit Enabled AIX
64-bit Enabled HP-UX
64-bit Enabled Solaris
AIX
HP-UX
HP-UX IPF
IRIX
Linux
Linux for x64
Linux on Itanium
OpenVMS Alpha
OpenVMS on HP Integrity
Solaris
Solaris for x64
Tru64 UNIX
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.