This SAS tip has two purposes: First, it solves a fairly common problem that is encountered when you want to combine two data sets and, second, it uses an interesting (and powerful) technique where you have a SAS program that writes a SAS program that then gets submitted.
If you want to append values in one data set to the end of another data set, you can either use PROC APPEND, or use a SET statement that lists the names of the data sets you want to combine.
However, what if the two data sets have the same variable names, but some of the character variables have different lengths in the two data sets? Furthermore, the character variables stored with longer lengths are not consistent in one or the other data set.
You cannot use PROC APPEND because that procedure will use all the attributes from the base data set, which will truncate any character values that are longer in the second data set.
You can use a SET statement, but you will have to manually enter the length of each character variable (in a LENGTH statement) from either data set one or data set two, whichever is longer.
Because this is a fairly common problem, there is a macro that combines two data sets and automatically uses the maximum length for each character variable.
Here is the macro (followed by an explanation):
proc contents data=&dsn1 noprint
out=out1(keep=name type length where=(type=2));
proc contents data=&dsn2 noprint
out=out2(keep=name type length where=(type=2));
run;
proc sort data=out1;
by name;
run;
proc sort data=out2;
by name;
run;
data _null_;
file "combined.sas";
merge out1 out2(rename=(length=length2)) end=last;
by name;
if _n_ = 1 then put "Data &out;";
l = max(length,length2);
put " length " name " $ " l 2. ";";
if last then do;
put " set &dsn1 &dsn2;";
put "run;";
end;
run;
%include "combined.sas";
%mend union;
%union(dsn1=one,dsn2=two,out=union)
Although there are many methods for determining information on the variables in your data set (variable information functions, library tables, etc.), this program uses PROC CONTENTS to output a data set that contains information on the character variables in each of the two data sets. The WHERE= data set option selects the character variables from each data set.
To see more clearly how this program works, we created two small test data sets (called ONE and TWO). Here are the listings of the two data sets produced by PROC CONTENTS:
Notice that the length of C1 is longer in the first data set and the length of C2 is longer in the second data set.
The DATA _NULL_ step starts by writing out a DATA statement to an external file. This is followed by a LENGTH statement that uses the maximum length for each variable. Finally, a SET statement lists the two data sets, followed by a RUN statement. A %INCLUDE statement then runs the SAS code that was produced.
Below is a listing of the COMBINED.SAS program created in this data step (using the two test data sets ONE and TWO):
To see that this worked as advertised, here is part of the output from PROC CONTENTS on the resulting data set (UNION):
This sample was authored by Ron Cody. His books are available from the SAS Publishing online bookstore:
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.
Type: | Sample |
Date Modified: | 2020-11-19 13:03:44 |
Date Created: | 2008-09-24 11:12:45 |
Product Family | Product | Host | SAS Release | |
Starting | Ending | |||
SAS System | Base SAS | z/OS | ||
OpenVMS VAX | ||||
Microsoft® Windows® for 64-Bit Itanium-based Systems | ||||
Microsoft Windows Server 2003 Datacenter 64-bit Edition | ||||
Microsoft Windows Server 2003 Enterprise 64-bit Edition | ||||
Microsoft Windows XP 64-bit Edition | ||||
Microsoft® Windows® for x64 | ||||
OS/2 | ||||
Microsoft Windows 95/98 | ||||
Microsoft Windows 2000 Advanced Server | ||||
Microsoft Windows 2000 Datacenter Server | ||||
Microsoft Windows 2000 Server | ||||
Microsoft Windows 2000 Professional | ||||
Microsoft Windows NT Workstation | ||||
Microsoft Windows Server 2003 Datacenter Edition | ||||
Microsoft Windows Server 2003 Enterprise Edition | ||||
Microsoft Windows Server 2003 Standard Edition | ||||
Microsoft Windows XP Professional | ||||
Windows Millennium Edition (Me) | ||||
Windows Vista | ||||
64-bit Enabled AIX | ||||
64-bit Enabled HP-UX | ||||
64-bit Enabled Solaris | ||||
ABI+ for Intel Architecture | ||||
AIX | ||||
HP-UX | ||||
HP-UX IPF | ||||
IRIX | ||||
Linux | ||||
Linux for x64 | ||||
Linux on Itanium | ||||
OpenVMS Alpha | ||||
OpenVMS on HP Integrity | ||||
Solaris | ||||
Solaris for x64 | ||||
Tru64 UNIX |