![]() | ![]() | ![]() | ![]() | ![]() | ![]() |
Missing values are treated the same as nonmissing values in a merge.
In a one-to-many merge with common variables that are not the BY variable, customers sometimes expect that only the first nonmissing value of the common variable will be written to the Program Data Vector (PDV) and output and that subsequent missing values will be ignored. However, a missing value is a valid value and it will be handled the same way as a nonmissing value.
Consider the following two data sets:
ONE TWO id age score id age name 1 11 90 1 8 Sarah 1 . 100 2 10 John 1 . 95 2 9 80 2 . 100
Doing a simple merge of these two data sets by ID will result in the following:
Obs id age score name 1 1 8 90 Sarah 2 1 . 100 Sarah 3 1 . 95 Sarah 4 2 10 80 John 5 2 . 100 JohnAfter the first observation from each data set is combined, SAS reads data set ONE for the next observation for the BY group. It overwrites the values in the PDV with those values, including the missing value for age. It does not read from data set TWO again since that BY group is complete. Therefore, the next observation will have the missing value for age. See Combining Data Sets for a detailed explanation of the processing during the MERGE statement.
Obs id age score name 1 1 8 90 Sarah 2 1 8 100 Sarah 3 1 8 95 Sarah 4 2 10 80 John 5 2 10 100 John
Below is the additional processing where we create a variable which has the last nonmissing value for the BY group and use that variable to fill in the cases where the value is missing.
data merge2 (drop=tempage); merge one two; by id; retain tempage; if first.id then tempage = .; if age = . then age = tempage; else tempage = age; run;
Product Family | Product | System | SAS Release | |
Reported | Fixed* | |||
SAS System | Base SAS | z/OS | ||
Z64 | ||||
OpenVMS VAX | ||||
Microsoft® Windows® for 64-Bit Itanium-based Systems | ||||
Microsoft Windows Server 2003 Datacenter 64-bit Edition | ||||
Microsoft Windows Server 2003 Enterprise 64-bit Edition | ||||
Microsoft Windows XP 64-bit Edition | ||||
Microsoft® Windows® for x64 | ||||
OS/2 | ||||
Microsoft Windows 95/98 | ||||
Microsoft Windows 2000 Advanced Server | ||||
Microsoft Windows 2000 Datacenter Server | ||||
Microsoft Windows 2000 Server | ||||
Microsoft Windows 2000 Professional | ||||
Microsoft Windows NT Workstation | ||||
Microsoft Windows Server 2003 Datacenter Edition | ||||
Microsoft Windows Server 2003 Enterprise Edition | ||||
Microsoft Windows Server 2003 Standard Edition | ||||
Microsoft Windows Server 2003 for x64 | ||||
Microsoft Windows Server 2008 | ||||
Microsoft Windows Server 2008 for x64 | ||||
Microsoft Windows XP Professional | ||||
Windows 7 Enterprise 32 bit | ||||
Windows 7 Enterprise x64 | ||||
Windows 7 Home Premium 32 bit | ||||
Windows 7 Home Premium x64 | ||||
Windows 7 Professional 32 bit | ||||
Windows 7 Professional x64 | ||||
Windows 7 Ultimate 32 bit | ||||
Windows 7 Ultimate x64 | ||||
Windows Millennium Edition (Me) | ||||
Windows Vista | ||||
Windows Vista for x64 | ||||
64-bit Enabled AIX | ||||
64-bit Enabled HP-UX | ||||
64-bit Enabled Solaris | ||||
ABI+ for Intel Architecture | ||||
AIX | ||||
HP-UX | ||||
HP-UX IPF | ||||
IRIX | ||||
Linux | ||||
Linux for x64 | ||||
Linux on Itanium | ||||
OpenVMS Alpha | ||||
OpenVMS on HP Integrity | ||||
Solaris | ||||
Solaris for x64 | ||||
Tru64 UNIX |
/*Data set one has missing values for age after the first value for each ID*/
data one;
input id age score;
datalines;
1 11 90
1 . 100
1 . 95
2 9 80
2 . 100
;
data two;
input id name $ age;
datalines;
1 Sarah 8
2 John 10
;
run;
/*A regular merge will result in the missing values be written to the output.*/
data merge1;
merge one two;
by id;
run;
proc print;
run;
/*In this merge, we check for missing values and use the previous age for the ID if the*/
/*age is missing.*/
data merge2 (drop=tempage);
merge one two;
by id;
retain tempage;
if first.id then tempage = .;
if age = . then age = tempage;
else tempage = age;
run;
proc print;
run;
Simple merge:
Obs id age score name 1 1 8 90 Sarah 2 1 . 100 Sarah 3 1 . 95 Sarah 4 2 10 80 John 5 2 . 100 JohnMerge with additional logic:
Obs id age score name 1 1 8 90 Sarah 2 1 8 100 Sarah 3 1 8 95 Sarah 4 2 10 80 John 5 2 10 100 John
Type: | Usage Note |
Priority: | |
Topic: | Common Programming Tasks ==> Combining Data Data Management ==> Manipulation and Transformation ==> BY-group processing Data Management ==> Manipulation and Transformation ==> Combining and Modifying Data Sets |
Date Modified: | 2010-09-30 13:05:58 |
Date Created: | 2010-09-29 09:15:59 |