Accessing UNIX System Services Files

Overview of UNIX System Services

IBM's UNIX System Services (USS) implements a directory-based file system that is very similar to the file systems that are used in UNIX. SAS software under z/OS enables you to read and write UNIX System Services files and to pipe data between SAS and UNIX System Services commands. For information about USS terminology, see HFS, UFS, and zFS Terminology.

Allocating UNIX System Services Files

You can allocate a UNIX System Services file either externally (using a JCL DD statement or the TSO ALLOCATE command) or internally (using the SAS FILENAME statement or FILENAME function). For information about allocating USS files externally, see your IBM documentation.
There are four ways to specify that a file is in USS when you use the FILENAME statement or FILENAME function:
  • Include a slash or tilde in the pathname:
    filename input1 '/u/sasusr/data/testset.dat';
    filename input2 '~/data/testset2.dat';
  • Specify HFS (for hierarchical file system) as the file type:
    filename input hfs 'testset.dat';
  • Specify HFS as the file prefix:
    filename input 'HFS:testset.dat';
  • Rely on the setting of the FILESYSTEM= system option:
    options filesystem=HFS;
    filename input 'testset.dat';
You can also use these specifications in combination. For example, you can specify the USS file type and use a slash in the pathname.
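As a minimal sketch of combining specifications, the first statement below gives both the HFS file type and a pathname that contains a slash. The DATA step then shows the FILENAME function form of the same kind of allocation. The fileref name INPUT3 and the error handling are illustrative assumptions, not part of the original examples.

```sas
/* File type HFS plus a slash in the pathname: both identify a USS file */
filename input1 hfs '/u/sasusr/data/testset.dat';

/* Same kind of allocation with the FILENAME function (fileref INPUT3 is hypothetical) */
data _null_;
   rc = filename('input3', '/u/sasusr/data/testset.dat');
   if rc ne 0 then do;
      msg = sysmsg();   /* text of the message for the failed call */
      put msg=;
   end;
run;
```

The FILENAME function returns 0 when the allocation succeeds, so the message is written to the log only on failure.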
If you do not specify the entire pathname of a USS file, then the directory component of the pathname is the working directory that was current when the file was allocated, not when the fileref is used. For example, if your working directory was
/usr/local/sasusr
when you allocated the file, then the following FILENAME statement associates the INPUT fileref with the path /usr/local/sasusr/testset.dat:
filename input hfs 'testset.dat';
If you later change your current working directory to
/usr/local/sasusr/testdata
then the following INFILE statement still refers to
/usr/local/sasusr/testset.dat
not to
/usr/local/sasusr/testdata/testset.dat:
infile input;

Allocating a UNIX System Services Directory

To allocate a USS directory, create the directory if necessary, and then allocate the directory using any standard method, such as a JCL DD statement, a TSO ALLOCATE command, or a FILENAME statement (as shown in Allocating UNIX System Services Files).
To open a particular file in a directory for input or output, you must specify the filename in the SAS INFILE or FILE statement, as described in Accessing a Particular File in a UNIX System Services Directory.

Specifying File-Access Permissions and Attributes

Overview of Specifying File-Access Permissions and Attributes

How you specify file-access permissions and attributes depends on whether you use SAS statements or operating system facilities to allocate a UNIX System Services file.

Using SAS

If you use the FILENAME statement or FILENAME function to allocate a USS file, or if you use a JCL DD statement or a TSO ALLOCATE command but do not specify values for PATHMODE and PATHOPTS, then SAS uses the following values for those options:
  • For PATHMODE, SAS uses the file-access mode -rw-rw-rw-. However, this mode can be modified by the current file-mode creation mask. (For detailed information about the file-mode creation mask, see your IBM documentation.)
  • For PATHOPTS, the file-access mode that SAS supplies depends on how the fileref or ddname is being used:
    • If the fileref or ddname appears only in a FILE statement, SAS opens the file for writing only. If the file does not exist, SAS creates it.
    • If the fileref or ddname appears only in an INFILE statement, SAS opens the file for reading only.
    • If the fileref or ddname appears in both FILE and INFILE statements within the same DATA step, SAS opens the file for reading and writing. For the FILE statement, SAS also creates the file if it does not already exist.
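The first two PATHOPTS cases can be sketched as follows. The path and the fileref name OUT are hypothetical, and the comments restate the behavior described in the list above rather than anything verified on a particular system.

```sas
filename out '/u/sasusr/data/report.dat';   /* hypothetical path */

/* Fileref appears only in a FILE statement: opened for writing, created if absent */
data _null_;
   file out;
   put 'first line';
   put 'second line';
run;

/* Fileref appears only in an INFILE statement: opened for reading only */
data _null_;
   infile out;
   input;
   put _infile_;   /* echo each record to the SAS log */
run;
```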

Using Operating System Facilities

When you use a JCL DD statement or a TSO ALLOCATE command to allocate a USS file, you can use the PATHMODE and PATHOPTS options to specify file-access permissions and attributes for the file. If you later use the file's ddname in a SAS session, SAS uses the values of those options when it opens the file.
For example, if you use the following TSO ALLOCATE command to allocate the ddname INDATA and SAS attempts to open it for output, then SAS issues an "insufficient authorization" error message and does not permit the file to be opened for output. (The ORDONLY value of PATHOPTS specifies "open for reading only.")
alloc file(indata)
   path('/u/sasusr/data/testset.dat')
   pathopts(ordonly)
In other words, you could use the ddname INDATA in a SAS INFILE statement, but not in a FILE statement. Similarly, if you specify OWRONLY, then you can use the ddname in a FILE statement but not in an INFILE statement.
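On the SAS side, a read-only ddname might be used as in the following sketch. It assumes that INDATA has already been allocated with PATHOPTS(ORDONLY) by the ALLOCATE command shown above.

```sas
/* Assumes ddname INDATA was allocated externally with PATHOPTS(ORDONLY) */
data _null_;
   infile indata;   /* reading is permitted */
   input;
   put _infile_;    /* echo each record to the SAS log */
run;

/* By contrast, a step that contains "file indata;" would not be */
/* permitted to open the file for output.                        */
```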
CAUTION:
PATHOPTS values OAPPEND and OTRUNC take precedence over FILE statement options OLD and MOD.
If you specify OAPPEND ("add new data to the end of the file"), the FILE statement option OLD does not override this behavior. Similarly, if you specify OTRUNC ("if the file exists, erase it and re-create it"), the FILE statement options OLD and MOD do not override this behavior. For details about these FILE statement options, see Standard Host Options for the FILE Statement under z/OS.

Using UNIX System Services Filenames in SAS Statements and Commands

Overview of Using UNIX System Services Filenames in SAS Statements and Commands

To use an actual USS filename (rather than a fileref or ddname) in a SAS statement or command, include a slash or tilde in the pathname, or use the HFS prefix with the filename. You can use a USS filename anywhere that an external filename can be used, such as in a FILE or INFILE statement, in an INCLUDE or FILE command in the windowing environment, or in the SAS Explorer window. If the file is in the current directory, specify the directory component as ./. For example:
include './testprg.sas'
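The same rule applies in DATA step statements. The following sketch uses pathnames directly in INFILE and FILE statements; the output filename copy.dat is a hypothetical name chosen for illustration.

```sas
data _null_;
   infile '~/data/testset2.dat';   /* the tilde marks this as a USS pathname */
   file './copy.dat';              /* hypothetical output file in the current directory */
   input;
   put _infile_;                   /* copy each input record to the output file */
run;
```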

Concatenating UNIX System Services Files

You can concatenate USS files or directories with the following methods:
  • associating a fileref with multiple explicit pathnames enclosed in parentheses
  • specifying a combination of explicit pathnames and pathname patterns enclosed in parentheses
  • using a single pathname pattern
A pathname pattern is formed by including one or more UNIX wildcards in a partial pathname.
The parenthesis method is specified in the FILENAME statement. You can use the wildcard method in the FILENAME, INFILE, and %INCLUDE statements and in the INCLUDE command. The wildcard method is for input only; you cannot use wildcards in the FILE statement. The parenthesis method supports input and output. However, for output, data is written to the first file in the concatenation. That first file cannot be the result of resolving a wildcard. By requiring the user to explicitly specify the entire pathname of the first file, the possibility of accidentally writing to the wrong file is greatly reduced.
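A minimal sketch of the parenthesis method follows. The filenames are hypothetical; the first member is given as an explicit pathname, as the output restriction described above requires, and the second member is a pathname pattern.

```sas
/* Hypothetical paths: explicit first member, then a pathname pattern */
filename test ('/u/userid/data/first.dat' '/u/userid/data/test*');

data _null_;
   infile test;    /* reads the members of the concatenation as one input stream */
   input;
   put _infile_;
run;
```

If the same fileref were used in a FILE statement, the data would be written to first.dat, the first file in the concatenation.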
The supported wildcard characters are the asterisk (*), the question mark (?), square brackets ([]), and the backslash (\).
The asterisk wildcard provides an automatic match to zero or more contiguous characters in the corresponding position of the pathname except for a period (.) at the beginning of the filename of a hidden file.
Here are some examples that use the asterisk as a wildcard:
  • In the following FILENAME statement, the stand-alone asterisk concatenates all of the files (in the specified directory) except for the hidden UNIX files.
    filename test '/u/userid/data/*';
  • In the following %INCLUDE statement, the leading asterisk includes all of the files (in the specified directory) that end with test.dat.
    %include '/u/userid/data/*test.dat';
  • In the following %INCLUDE statement, the trailing asterisk includes all of the files (in the specified directory) that begin with test.
    %include '/u/userid/data/test*';
  • In the following %INCLUDE statement, the period with a trailing asterisk selects all of the hidden UNIX files in the specified directory.
    %include '/u/userid/data/.*';
  • In the following INFILE statement, the embedded asterisk inputs all of the files (in the specified directory) that begin with test and end with file.
    infile '/u/userid/data/test*file'; 
The question mark wildcard matches any single character that is found in the same relative position in the pathname. Use one or more question marks instead of an asterisk to control the length of the matching strings.
Here are some examples that use the question mark as a wildcard:
  • In the following FILENAME statement, the stand-alone question mark concatenates all of the files (in the specified directory) that have a one-character filename.
    filename test '/u/userid/data/?';
  • In the following %INCLUDE statement, the leading question mark includes all of the files (in the specified directory) that have filenames that are nine characters long and end with test.dat.
    %include '/u/userid/data/?test.dat';
  • In the following %INCLUDE statement, the trailing question mark includes all of the files (in the specified directory) that have filenames that are five characters long and begin with test.
    %include '/u/userid/data/test?';
  • In the following INFILE statement, the two embedded question marks input all of the files (in the specified directory) with filenames that are ten characters long, begin with test, and end with file.
    infile '/u/userid/data/test??file';
Square brackets match any single character from the enclosed list that appears in the corresponding relative position in the pathname. The list can be specified as a string of characters or as a range. A range is defined by a starting character and an ending character separated by a hyphen (-).
The interpretation of what is included between the starting and ending characters is controlled by the value of the LC_COLLATE variable of the locale that is being used by UNIX System Services. Attempting to include both uppercase and lowercase characters, or both alphabetic characters and digits in a range, increases the risk of unexpected results. The risk can be minimized by creating a list with multiple ranges and limiting each range to one of the following sets:
  • a set of lowercase characters
  • a set of uppercase characters
  • a set of digits
Here are some examples of using square brackets as wildcard characters:
  • In the following FILENAME statement, the bracketed list sets up a fileref that concatenates any files (in the specified directory) that are named a, b, or c and that exist.
    filename test '/u/userid/data/[abc]';
  • In the following %INCLUDE statement, the leading bracketed list includes all of the files (in the specified directory) that have filenames that are nine characters long, start with m, n, o, p, or z, and end with test.dat.
    %include '/u/userid/data/[m-pz]test.dat';
  • In the following %INCLUDE statement, the trailing bracketed list includes all files (in the specified directory) with filenames that are five characters long, begin with test, and end with a decimal digit.
    %include '/u/userid/data/test[0-9]';
  • In the following INFILE statement, the embedded bracketed list inputs all files (in the specified directory) with filenames that are nine characters long, begin with test, continue with an uppercase or lowercase a, b, or c, and end with file.
    infile '/u/userid/data/test[a-cA-C]file';
The backslash is used as an escape character. It indicates that the character that it precedes should not be used as a wildcard.
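As an illustration of the escape character, the two statements below differ only in the backslash. The filenames and the fileref names STAR and MANY are hypothetical.

```sas
/* Backslash escapes the asterisk: refers to one file whose name ends in a literal asterisk */
filename star '/u/userid/data/report\*';

/* No backslash: the asterisk is a wildcard, so every file that begins with report matches */
filename many '/u/userid/data/report*';
```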
All of the pathnames in a concatenation must be for USS files or directories. If your program reads data from different types of files in the same DATA step, then you can use the EOF= option in each INFILE statement to direct program control to a new INFILE statement after each file has been read. For more information about the EOF= option of the INFILE statement, see SAS Statements: Reference. A wildcard character that generates a list of mixed file types results in an error.
Wildcards that you use when you pipe data from SAS to USS commands are not expanded within the SAS session, but they are passed directly to the USS commands for interpretation.
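For instance, in the following sketch the wildcard appears inside the command string of a PIPE fileref, so it is handed to the shell rather than resolved by SAS. The directory is hypothetical.

```sas
/* The shell, not SAS, expands the wildcard in this command string */
filename oecmd pipe 'ls -l /u/userid/data/test*';

data _null_;
   infile oecmd;
   input;
   put _infile_;   /* echo the command output to the SAS log */
run;
```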

Accessing a Particular File in a UNIX System Services Directory

If you have associated a fileref with a USS directory or with a concatenation of USS directories, then you can open a particular file in the directory for reading or writing by using an INFILE or FILE statement in the following form:
infile fileref(file);
file fileref(file);
This form is referred to as aggregate syntax. If you do not enclose file in quotation marks, and the filename does not already contain an extension, then SAS appends a file extension to the filename. In the windowing environment commands INCLUDE and FILE, and with the %INCLUDE statement, the file extension is .sas. In the INFILE and FILE statements, the file extension is .dat.
If a filename is in quotation marks, or if it has a file extension, SAS uses the filename as it is specified. If the filename is not in quotation marks, and if it does not have a file extension, SAS converts the filename to lowercase before it accesses the file.
If the file is opened for input, then SAS searches all of the directories that are associated with the fileref in the order in which they appear in the FILENAME statement or FILENAME function. If the file is opened for output, SAS creates the file in the first directory that was specified. If the file is opened for updating but does not exist, SAS creates the file in the first directory.
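The following sketch shows aggregate syntax with a concatenation of two directories. The directory names, the fileref INDIR, and the member names are all hypothetical.

```sas
/* Hypothetical directories, searched in this order for input */
filename indir ('/u/sasusr/data' '/u/sasusr/archive');

/* Unquoted name without an extension: SAS looks for testset.dat */
data _null_;
   infile indir(testset);
   input;
   put _infile_;
run;

/* Quoted name: used exactly as specified and created in the first directory */
data _null_;
   file indir('Summary.txt');
   put 'done';
run;
```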

Piping Data between SAS and UNIX System Services Commands

Overview of Piping Data between SAS and UNIX System Services Commands

To pipe data between SAS and USS commands, you first specify the PIPE file type and the command in a FILENAME statement or FILENAME function. Enclose the command in single quotation marks. For example, this FILENAME statement assigns the command ls -lr to the fileref OECMD:
filename oecmd pipe 'ls -lr';
To send the output from the command as input to SAS software, you then specify the fileref in an INFILE statement. To use output from SAS as input to the command, you specify the fileref in a FILE statement.
You can use shell command delimiters such as semicolons to associate more than one command with a single fileref. The syntax within the quoted string is identical to the syntax that you use to enter multiple commands on a single line in an interactive UNIX shell. The commands are executed in the order in which they appear in the FILENAME statement or FILENAME function during a single invocation of a non-login UNIX shell. Commands can modify the environment for subsequent commands only within this quoted string. This capability enables you to customize the environment for each command group without affecting the settings that you have established in your SAS session. The following example demonstrates this behavior:
filename oecmd pipe 'umask; umask 022; umask; umask';
data _null_;
infile oecmd;
input;
put _infile_;
run;
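The same behavior can be sketched with a change of directory. This example assumes that the directory /tmp exists; the cd command affects only the pwd command that follows it in the same quoted string, not the SAS session.

```sas
/* cd changes the directory only for later commands in this quoted string */
filename oecmd pipe 'cd /tmp; pwd';

data _null_;
   infile oecmd;
   input;
   put _infile_;   /* the log shows the directory that pwd reports */
run;
```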
You should avoid using the concatenation form of the FILENAME statement or FILENAME function when you pipe data between SAS and USS. Members of concatenations are handled by separate invocations of a non-login UNIX shell. Any changes made to the environment by earlier members of the concatenation do not persist. Running the following example demonstrates the drawback of this technique:
filename oecmd pipe ('umask' 'umask 022; umask' 'umask');
data _null_;
infile oecmd;
input;
put _infile_;
run;
Note the difference in the FILENAME statements in the preceding two examples. The first example places all of the umask commands in one set of single quotation marks and does not use parentheses. The second example encloses the commands in parentheses and places each command group in its own set of single quotation marks.
The umask command was selected for these examples to emphasize that the command or command group runs in a non-login shell. The use of a non-login shell suppresses the running of /etc/profile (the site-wide profile) and $HOME/.profile (the personal profile). Suppressing these profiles eliminates the possibility that the pipes are contaminated by output that these files might generate, and it ensures a consistent starting environment for the command or group of commands. Setting the file-mode creation mask to a site-wide default is often done in /etc/profile, which is one of the profiles that runs when a login shell is invoked.

Piping Data from a UNIX System Services Command to SAS

When a pipe is opened for input by the INFILE statement, any output that the command writes to standard output or to standard error is available for input. For example, here is a DATA step that reads the output of the ls -l command and saves it in a SAS data set:
filename oecmd pipe 'ls -l';
data dirlist;
   infile oecmd truncover;
   input mode $ 1-10 nlinks 12-14 user $ 16-23
         group $ 25-32 size 34-40 lastmod $ 42-53
         name $ 54-253;
run;

Piping Data from SAS to a UNIX System Services Command

When a pipe is opened for output by the FILE statement, any lines that are written to the pipe by the PUT statement are sent to the command's standard input. For example, here is a DATA step that pipes its input lines to the USS od command, which writes them in hexadecimal format to the USS file dat/dump.dat:
filename oecmd pipe 'od -x -tc - >dat/dump.dat';
data _null_;
   file oecmd;
   input line $ 1-60;
   put line;
datalines;
SAS software is an integrated system of software
products, enabling you to perform data management,
data analysis, and data presentation tasks.
;
run;