FCMP Procedure

Directory Traversal

Overview of Directory Traversal

Implementing functionality that traverses a directory hierarchy is difficult if you use the DATA step or macros, because recursion or pseudo-recursion is not easy to code in either. This section describes how to develop a routine named DIR_ENTRIES that fills an array with the full pathname of every file in a directory hierarchy. The example shows the similarity between PROC FCMP and DATA step syntax and underscores how PROC FCMP routines simplify a program and produce independent, reusable code. DIR_ENTRIES takes the following parameters:
  • a starting directory
  • a result array to fill with pathnames
  • an output parameter that is the number of pathnames placed in the result array
  • an output parameter that indicates whether the complete result set was truncated because the result array was not large enough
The flow of control for DIR_ENTRIES is as follows:
  1. Open the starting directory.
  2. For each entry in the directory, do one of the following tasks:
    • If the entry is a directory, call DIR_ENTRIES to fill the result array with the subdirectory's pathnames.
    • Otherwise, the entry is a file, and you must add the file's path to the result array.
  3. Close the starting directory.

Directory Traversal Example

Opening and Closing a Directory

Opening and closing a directory are handled by the DIROPEN function and the DIRCLOSE CALL routine. DIROPEN accepts a directory path and has the following flow of control:
  1. Create a fileref for the path by using the FILENAME function.
  2. If the FILENAME function fails, write an error message to the log and then return.
  3. Otherwise, use the DOPEN function to open the directory and retrieve a directory ID.
  4. Clear the directory fileref.
  5. Return the directory ID.
The DIRCLOSE CALL routine takes a directory ID and passes it to DCLOSE. DIRCLOSE then sets the directory ID to missing so that an error occurs if a program tries to use the ID after the directory has been closed. The following code implements the DIROPEN function and the DIRCLOSE CALL routine:
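proc fcmp outlib=sasuser.funcs.dir;
   /* Open a directory and return its directory ID (missing on failure). */
   function diropen(dir $);
      length dir $ 256 fref $ 8;
      /* FREF is blank, so FILENAME generates a fileref and stores it there. */
      rc = filename(fref, dir);
      if rc = 0 then do;
         did = dopen(fref);
         /* The fileref is no longer needed after the directory is open. */
         rc = filename(fref);
      end;
      else do;
         msg = sysmsg();
         put msg '(DIROPEN(' dir= '))';
         did = .;
      end;
      return(did);
   endsub;

   /* Close a directory and invalidate the caller's directory ID. */
   subroutine dirclose(did);
      outargs did;
      rc = dclose(did);
      did = .;
   endsub;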

Gathering Filenames

File paths are collected by the DIR_ENTRIES CALL routine. DIR_ENTRIES uses the following arguments:
  • a starting directory
  • a result array to fill
  • an output parameter to fill with the number of entries in the result array
  • an output parameter that is set to 0 if all pathnames fit in the result array, or to 1 if some pathnames do not fit
The body of DIR_ENTRIES is almost identical to the code that would implement this functionality in a DATA step, yet as a CALL routine it can easily be reused in several programs.
DIR_ENTRIES calls DIROPEN to open a directory and retrieve a directory ID. The routine then calls DNUM to retrieve the number of entries in the directory. For each entry in the directory, DREAD is called to retrieve the name of the entry. Now that the entry name is available, the routine calls MOPEN to determine whether the entry is a file or a directory.
If the entry is a file, then MOPEN returns a positive value. In this case, the full path to the file is added to the result array. If the result array is already full, the truncation output argument is set to 1, the directory is closed, and the routine returns.
If the entry is a directory, then MOPEN returns a value that is less than or equal to 0. In this case, DIR_ENTRIES gathers the pathnames for the entries in this subdirectory. It gathers the pathnames by recursively calling DIR_ENTRIES and passing the subdirectory's path as the starting path. When DIR_ENTRIES returns, the result array contains the paths of the subdirectory's entries.
subroutine dir_entries(dir $, files[*] $, n, trunc);
   outargs files, n, trunc;
   length dir entry $ 256;

   if trunc then return;

   did = diropen(dir);
   if did <= 0 then return;

   dnum = dnum(did);
   do i = 1 to dnum;
      entry = dread(did, i);
      /* If this entry is a file, then add to array */
      /* Else entry is a directory, recurse. */
      fid = mopen(did, entry);
      entry = trim(dir) || '\' || entry;
      if fid > 0 then do;
         rc = fclose(fid);
         if n < dim(files) then do;
            trunc = 0;
            n = n + 1;
            files[n] = entry;
         end;
         else do;
            trunc = 1;
            call dirclose(did);
            return;
         end;
      end;
      else
         call dir_entries(entry, files, n, trunc);
   end;

   call dirclose(did);
   return;
endsub;
run;
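
The file-versus-directory test that DIR_ENTRIES relies on can be tried on its own in a plain DATA step. The following is a minimal, non-recursive sketch, assuming that the c:\logs directory used in this example exists. The fileref name LOGDIR is a hypothetical name chosen for illustration. The step lists one level of the directory and labels each entry according to the value that MOPEN returns.

```sas
data _null_;
   length entry $ 256;
   /* LOGDIR is a hypothetical fileref name used only in this sketch. */
   rc = filename('logdir', 'c:\logs');
   did = dopen('logdir');
   if did > 0 then do;
      do i = 1 to dnum(did);
         entry = dread(did, i);
         /* MOPEN returns a positive file ID only when the entry opens as a file. */
         fid = mopen(did, entry);
         if fid > 0 then do;
            put 'File:      ' entry;
            rc = fclose(fid);
         end;
         else put 'Directory: ' entry;
      end;
      rc = dclose(did);
   end;
   else put 'Could not open directory.';
   /* Clear the fileref. */
   rc = filename('logdir');
run;
```

An entry that MOPEN cannot open for some other reason is also reported as a directory in this sketch, which mirrors how DIR_ENTRIES treats any entry for which MOPEN does not return a positive value.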

Calling DIR_ENTRIES from a DATA Step

You invoke DIR_ENTRIES like any other DATA step CALL routine. Declare an array with enough entries to hold all the files that might be found, and then call the DIR_ENTRIES routine. When the routine returns, the DATA step loops over the result array and writes each entry to the SAS log.
options cmplib=sasuser.funcs;
data _null_;
   array files[1000] $ 256 _temporary_;
   dnum = 0;
   trunc = 0;
   call dir_entries("c:\logs", files, dnum, trunc);
   if trunc then put 'ERROR: Not enough result array entries. Increase array size.';
   do i = 1 to dnum;
      put files[i];
   end;
run;
Output from Calling DIR_ENTRIES from a DATA Step

c:\logs\2004\qtr1.log
c:\logs\2004\qtr2.log
c:\logs\2004\qtr3.log
c:\logs\2004\qtr4.log
c:\logs\2005\qtr1.log
c:\logs\2005\qtr2.log
c:\logs\2005\qtr3.log
c:\logs\2005\qtr4.log
c:\logs\2006\qtr1.log
c:\logs\2006\qtr2.log
This example shows the similarity between PROC FCMP syntax and the DATA step. For example, numeric expressions and flow of control statements are identical. The abstraction of DIROPEN into a PROC FCMP function simplifies DIR_ENTRIES. All of the PROC FCMP routines that are created can be reused by other DATA steps without any need to modify the routines to work in a new context.
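
As one illustration of that reuse, the sketch below stores the pathnames in a data set instead of writing them to the log. It assumes that the routines in this section have been compiled into sasuser.funcs. The data set name WORK.LOG_FILES and the variable name PATH are hypothetical names chosen for this sketch.

```sas
options cmplib=sasuser.funcs;
data work.log_files(keep=path);
   array files[1000] $ 256 _temporary_;
   length path $ 256;
   n = 0;
   trunc = 0;
   call dir_entries("c:\logs", files, n, trunc);
   if trunc then put 'WARNING: Result array is full. The file list is incomplete.';
   /* Write one observation per pathname that was collected. */
   do i = 1 to n;
      path = files[i];
      output;
   end;
run;
```

The routines themselves are unchanged. Only the calling DATA step differs from the earlier example.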