Sample 41880: Read all files from a directory and create separate SAS® data sets with unique names
The sample code on the Full Code tab shows how to read all files in a directory and create separate SAS data sets with unique names. Use the PIPE engine in the FILENAME statement to access the directory information. Then, use macro code with a %DO loop to execute a DATA step separately to read each of all files in the directory. To view the SAS log showing the execution of the sample program, click the Results tab.
This sample assumes that there are four comma-delimited files in the folder C:\_today\. The names are:
file1.csv, file2.csv, file3.csv, and file4.csv.
The first DATA step is used to get the directory information to create a data set that contains each full pathname of all the files with a .csv extension.
The next DATA step uses _NULL_ in the DATA statement to avoid creating a SAS data set since the purpose of this step is to create macro variables. In the SET statement, the END= option is used to create a variable that indicates when you have reached the end of a data set. A variable called COUNT is created to increment each time an observation is read to keep track of how many files there are. The first CALL SYMPUTX creates macro variables that each contains the full path and name of each file to read. The COUNT variable is used to create separate macro variables for each path and name combination. The second CALL SYMPUTX creates macro variables that each contains the name of the file that is used in the DATA statement to create a unique SAS data set for each file read. The last macro variable is created after all observations of the path and names have been read in order to place the total count of files that will be read into a macro variable called MAX.
The macro is created so that a separate DATA step is run for each file to be read from the directory. This is controlled by the macro %DO loop. The DATA statement uses the macro variable with the %DO loops index variable I to use each of the separate macro variables created that contain the unique data set names. The INFILE statement uses the %DO loops index variable also except the separate macro variables created contain each path and name. Because this sample is reading comma delimited files, the DSD option and the TRUNCOVER option are used in the INFILE statement. If the input files have records longer than 256 bytes, you might need to use the LRECL= option in the INFILE statement to increase the size of the input buffer, depending on your release of SAS.
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
In order to run the sample code below, first create the folder
C:\_today\. In that folder, create two or more sample CSV files named File1.csv, File2.csv, and so on. You need to create only one record in each file with three comma-separated values.
filename DIRLIST pipe 'dir "C:\_today\file*.csv" /b ';
data dirlist ;
infile dirlist lrecl=200 truncover;
input file_name $100.;
run;
data _null_;
set dirlist end=end;
count+1;
call symputx('read'||put(count,4.-l),cats('c:\_today\',file_name));
call symputx('dset'||put(count,4.-l),scan(file_name,1,'.'));
if end then call symputx('max',count);
run;
options mprint symbolgen;
%macro readin;
%do i=1 %to &max;
data &&dset&i;
infile "&&read&i" lrecl=1000 truncover dsd;
input var1 $ var2 $ var3 $;
run;
%end;
%mend readin;
%readin;
These sample files and code examples are provided by SAS Institute
Inc. "as is" without warranty of any kind, either express or implied, including
but not limited to the implied warranties of merchantability and fitness for a
particular purpose. Recipients acknowledge and agree that SAS Institute shall
not be liable for any damages whatsoever arising out of their use of this material.
In addition, SAS Institute will provide no support for the materials contained herein.
785 filename DIRLIST pipe 'dir "C:\_today\file*.csv" /b ';
786
787 data dirlist ;
788 infile dirlist lrecl=200 truncover;
789 input file_name $100.;
790 run;
NOTE: The infile DIRLIST is:
Unnamed Pipe Access Device,
PROCESS=dir "C:\_today\file*.csv" /b,RECFM=V,
LRECL=200
NOTE: 4 records were read from the infile DIRLIST.
The minimum record length was 9.
The maximum record length was 9.
NOTE: The data set WORK.DIRLIST has 4 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.01 seconds
791
792 data _null_;
793 set dirlist end=end;
794 count+1;
795 call symputx('read'||put(count,4.-l),cats('c:\_today\',file_name));
796 call symputx('dset'||put(count,4.-l),scan(file_name,1,'.'));
797 if end then call symputx('max',count);
798 run;
NOTE: There were 4 observations read from the data set WORK.DIRLIST.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
799
800 options mprint symbolgen;
801 %macro readin;
802 %do i=1 %to &max;
803
804 data &&dset&i;
805 infile "&&read&i" lrecl=1000 truncover dsd;
806 input var1 $ var2 $ var3 $;
807 run;
808
809 %end;
810 %mend readin;
811
812 %readin;
SYMBOLGEN: Macro variable MAX resolves to 4
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 1
SYMBOLGEN: Macro variable DSET1 resolves to file1
MPRINT(READIN): data file1;
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 1
SYMBOLGEN: Macro variable READ1 resolves to c:\_today\file1.csv
MPRINT(READIN): infile "c:\_today\file1.csv" lrecl=1000 truncover dsd;
MPRINT(READIN): input var1 $ var2 $ var3 $;
MPRINT(READIN): run;
NOTE: The infile "c:\_today\file1.csv" is:
Filename=c:\_today\file1.csv,
RECFM=V,LRECL=1000,File Size (bytes)=2021,
Last Modified=02Dec2015:16:01:30,
Create Time=02Dec2015:16:01:30
NOTE: 101 records were read from the infile "c:\_today\file1.csv".
The minimum record length was 16.
The maximum record length was 23.
NOTE: The data set WORK.FILE1 has 101 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 2
SYMBOLGEN: Macro variable DSET2 resolves to file2
MPRINT(READIN): data file2;
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 2
SYMBOLGEN: Macro variable READ2 resolves to c:\_today\file2.csv
MPRINT(READIN): infile "c:\_today\file2.csv" lrecl=1000 truncover dsd;
MPRINT(READIN): input var1 $ var2 $ var3 $;
MPRINT(READIN): run;
NOTE: The infile "c:\_today\file2.csv" is:
Filename=c:\_today\file2.csv,
RECFM=V,LRECL=1000,File Size (bytes)=2021,
Last Modified=02Dec2015:16:01:30,
Create Time=02Dec2015:16:01:30
NOTE: 101 records were read from the infile "c:\_today\file2.csv".
The minimum record length was 16.
The maximum record length was 23.
NOTE: The data set WORK.FILE2 has 101 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 3
SYMBOLGEN: Macro variable DSET3 resolves to file3
MPRINT(READIN): data file3;
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 3
SYMBOLGEN: Macro variable READ3 resolves to c:\_today\file3.csv
MPRINT(READIN): infile "c:\_today\file3.csv" lrecl=1000 truncover dsd;
MPRINT(READIN): input var1 $ var2 $ var3 $;
MPRINT(READIN): run;
NOTE: The infile "c:\_today\file3.csv" is:
Filename=c:\_today\file3.csv,
RECFM=V,LRECL=1000,File Size (bytes)=2221,
Last Modified=02Dec2015:16:01:30,
Create Time=02Dec2015:16:01:30
NOTE: 101 records were read from the infile "c:\_today\file3.csv".
The minimum record length was 19.
The maximum record length was 23.
NOTE: The data set WORK.FILE3 has 101 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 4
SYMBOLGEN: Macro variable DSET4 resolves to file4
MPRINT(READIN): data file4;
SYMBOLGEN: && resolves to &.
SYMBOLGEN: Macro variable I resolves to 4
SYMBOLGEN: Macro variable READ4 resolves to c:\_today\file4.csv
MPRINT(READIN): infile "c:\_today\file4.csv" lrecl=1000 truncover dsd;
MPRINT(READIN): input var1 $ var2 $ var3 $;
MPRINT(READIN): run;
NOTE: The infile "c:\_today\file4.csv" is:
Filename=c:\_today\file4.csv,
RECFM=V,LRECL=1000,File Size (bytes)=1925,
Last Modified=02Dec2015:16:01:30,
Create Time=02Dec2015:16:01:30
NOTE: 101 records were read from the infile "c:\_today\file4.csv".
The minimum record length was 17.
The maximum record length was 23.
NOTE: The data set WORK.FILE4 has 101 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
Date Modified: | 2019-06-11 14:56:29 |
Date Created: | 2010-12-16 13:14:05 |
Operating System and Release Information
SAS System | Base SAS | Windows 7 Ultimate 32 bit | | |
Windows 7 Professional x64 | | |
Windows 7 Professional 32 bit | | |
Windows 7 Home Premium x64 | | |
Windows 7 Home Premium 32 bit | | |
Windows 7 Enterprise x64 | | |
Windows 7 Enterprise 32 bit | | |
Microsoft Windows XP Professional | | |
Microsoft Windows Server 2008 for x64 | | |
Microsoft Windows Server 2008 | | |
Microsoft Windows Server 2003 for x64 | | |
Microsoft Windows Server 2003 Standard Edition | | |
Microsoft Windows Server 2003 Enterprise Edition | | |
Microsoft Windows Server 2003 Datacenter Edition | | |
Microsoft Windows NT Workstation | | |
Microsoft Windows 2000 Professional | | |
Microsoft Windows 2000 Server | | |
Microsoft Windows 2000 Datacenter Server | | |
Microsoft Windows 2000 Advanced Server | | |
Microsoft Windows 95/98 | | |
OS/2 | | |
Microsoft® Windows® for x64 | | |
Microsoft Windows XP 64-bit Edition | | |
Microsoft Windows Server 2003 Enterprise 64-bit Edition | | |
Microsoft Windows Server 2003 Datacenter 64-bit Edition | | |
Microsoft® Windows® for 64-Bit Itanium-based Systems | | |
Windows 7 Ultimate x64 | | |
Windows Millennium Edition (Me) | | |
Windows Vista | | |
Windows Vista for x64 | | |
64-bit Enabled AIX | | |
64-bit Enabled HP-UX | | |
64-bit Enabled Solaris | | |
ABI+ for Intel Architecture | | |
AIX | | |
HP-UX | | |
HP-UX IPF | | |
IRIX | | |
Linux | | |
Linux for x64 | | |
Linux on Itanium | | |
OpenVMS Alpha | | |
OpenVMS on HP Integrity | | |
Solaris | | |
Solaris for x64 | | |
Tru64 UNIX | | |