Sample 49583: Fixing multiple length problems when merging many data sets
Many times multiple data sets in a library need to be merged, but because they were created from different text files or different spread sheets, some variables will have different lengths in different data sets. This can cause truncation of data or a failure to match when merging.
This program will inspect each data set in a library, find all of the character variables and find the maximum length of that variable in any data set. A map is created that has each data set, the character variables in it, and what the length needs to be to avoid problems.
The program then writes a DATA step for each data set that adjusts the lengths of the character variables that the data set contains.
Because the program needs to create a data set, there are three pieces of information that the user needs to supply:
- The library to examine and update
- The library to create the temporary data set
- The name of the temporary data set
When the program is complete, the temporary data set is deleted.
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
The first part of the program creates sample data sets to work on. The SQL code creates a data set that has the name of each data set, the character variables in it, and the required length for that variable. The DATA _NULL_ code writes a DATA Step to update each data set. The generated DATA steps are written directly to macro and execute immediately when the DATA _NULL_ ends. Finally the DATASETS code deletes the data set created by the SQL code.
/* build data sets to work with */
data aa;
a='123456789';
b='12345';
c='1234567890123456';
d='123456789012345678901234567890';
e='12345678901234567890';
f='1';
run;
data ab;
a='123456789';
b='12345';
c='1234567890123456';
d='12345678901234567890';
e='123456789012345678901234567890';
g='1';
run;
data ac;
a='123456789';
b='12345678901234567890';
c='1234567890123456';
d='123456789012345678901234567890';
e='12345';
h='1';
run;
/* Library containing data sets to be worked on */
%let targetlib=work;
/* Library to store temporary data set */
%let workinglib=sasuser;
/* Name of temporary data set. This data set will
be deleted at the end of the program. */
%let workingtbl=xyzzy;
/* This creates the temporary data set that contains
the name of each data set in the target library that
contains character variables, the names of the character
variables in that data set, and the maximum length
of that character variable in any of the data sets
in the target library. */
proc sql;
create table &workinglib..&workingtbl. as
select memname as dsname, name as varnam,
max(length)as varlen from dictionary.columns
where libname=upcase("&targetlib") and type='char'
group by name
order by memname, name;
quit;
/* This step writes DATA Step for each data set named in
the temporary data set and issues a LENGTH statement
for each character variable in that data set setting
the length to the maximum found in any data set.
The code is written directly to the macro processor
and all DATA Steps will run after the DATA _NULL_
step completes. */
data _null_;
set &workinglib..&workingtbl. end=done;
by dsname;
if first.dsname then do;
call execute(cat('data ',"&targetlib.."||trim(dsname),';'));
end;
call execute(cat('length ',trim(varnam),'$',varlen,';'));
if last.dsname then do;
call execute(cat('set ',"&targetlib.."||trim(dsname),'; run;'));
end;
run;
/* This deletes the temporary data set created by the SQL
code above. */
proc datasets lib=&workinglib. nolist;
delete &workingtbl. ;
quit;
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
This program will examine each data set in a library to find the maximum length of each character variable in any data set in the library, and make each character variable the same length in every data set that contains that variable.
Type: | Sample |
Topic: | SAS Reference ==> DATA Step SAS Reference ==> Statements ==> File-handling ==> MERGE
|
Date Modified: | 2013-04-05 09:22:32 |
Date Created: | 2013-04-03 19:32:29 |
Operating System and Release Information
SAS System | Base SAS | Aster Data nCluster on Linux x64 | | |
DB2 Universal Database on AIX | | |
DB2 Universal Database on Linux x64 | | |
Greenplum on Linux x64 | | |
Netezza TwinFin 32bit blade | | |
Netezza TwinFin 32-bit SMP Hosts | | |
Netezza TwinFin 64-bit S-Blades | | |
Netezza TwinFin 64-bit SMP Hosts | | |
Teradata on Linux | | |
z/OS | | |
Z64 | | |
OpenVMS VAX | | |
Microsoft® Windows® for 64-Bit Itanium-based Systems | | |
Microsoft Windows Server 2003 Datacenter 64-bit Edition | | |
Microsoft Windows Server 2003 Enterprise 64-bit Edition | | |
Microsoft Windows XP 64-bit Edition | | |
Microsoft® Windows® for x64 | | |
OS/2 | | |
Microsoft Windows 8 Pro | | |
Microsoft Windows 95/98 | | |
Microsoft Windows 2000 Advanced Server | | |
Microsoft Windows 2000 Datacenter Server | | |
Microsoft Windows 2000 Server | | |
Microsoft Windows 2000 Professional | | |
Microsoft Windows NT Workstation | | |
Microsoft Windows Server 2003 Datacenter Edition | | |
Microsoft Windows Server 2003 Enterprise Edition | | |
Microsoft Windows Server 2003 Standard Edition | | |
Microsoft Windows Server 2003 for x64 | | |
Microsoft Windows Server 2008 | | |
Microsoft Windows Server 2008 for x64 | | |
Microsoft Windows Server 2012 | | |
Microsoft Windows XP Professional | | |
Windows 7 Enterprise 32 bit | | |
Windows 7 Enterprise x64 | | |
Windows 7 Home Premium 32 bit | | |
Windows 7 Home Premium x64 | | |
Windows 7 Professional 32 bit | | |
Windows 7 Professional x64 | | |
Windows 7 Ultimate 32 bit | | |
Windows 7 Ultimate x64 | | |
Windows Millennium Edition (Me) | | |
Windows Vista | | |
Windows Vista for x64 | | |
64-bit Enabled AIX | | |
64-bit Enabled HP-UX | | |
64-bit Enabled Solaris | | |
ABI+ for Intel Architecture | | |
AIX | | |
HP-UX | | |
HP-UX IPF | | |
IRIX | | |
Linux | | |
Linux for x64 | | |
Linux on Itanium | | |
OpenVMS Alpha | | |
OpenVMS on HP Integrity | | |
Solaris | | |
Solaris for x64 | | |
Tru64 UNIX | | |