I/O Functions

Technical Background

This section provides an in-depth summary of the fundamentals of C I/O. It begins with traditional C I/O concepts, then covers UNIX low-level, ISO/ANSI, and IBM 370 I/O concepts. These concepts are combined in SAS/C I/O Concepts and 370 Perspectives on SAS/C Library I/O. The final section provides guidelines for choosing an I/O method, based on the needs of your application.


Traditional C (UNIX) I/O Concepts

When C was initially designed, no library, and therefore no I/O, was included. It was assumed that libraries suitable for use with particular systems would be developed. Because most early use of the C language was associated with UNIX operating systems, the UNIX I/O functions were considered the standard I/O method for C. As the C language has evolved, the I/O definition has changed to some extent, but understanding the underlying UNIX concepts is still important.

In addition, many useful C programs were first developed under UNIX operating systems, and such programs frequently take no account of other systems or I/O techniques. These programs cannot be made to run on systems as different from UNIX as CMS or OS/390 without careful attention to the assumptions of their original environment.

The UNIX I/O model

The main features of the UNIX I/O model are as follows:


UNIX Low-Level I/O

One complication in programs developed under UNIX operating systems is that UNIX defines two different I/O interfaces: standard I/O and low-level I/O (sometimes called unbuffered I/O). Standard I/O is a more portable form of I/O than low-level I/O, and UNIX documentation recommends that portable programs be written using this form. However, UNIX low-level I/O is widely recognized as more efficient than standard I/O, and it provides some additional capabilities, such as the ability to test whether a file exists before it is opened. For these and other reasons, many programs use low-level I/O, despite its documented lack of portability.

UNIX operating systems also support a mixed-level form of I/O, in which a file is accessed simultaneously with standard I/O and low-level I/O. C implementations that support the UNIX low-level functions may be unable to support mixed-level I/O if the two forms of I/O are not closely related in the UNIX manner.

UNIX low-level I/O is not included in the ISO/ANSI C standard, so it may be unavailable with recently developed C compilers. Also, do not assume that this form of I/O is truly low-level on any system other than UNIX.


ISO/ANSI C I/O Concepts

The definition of the C I/O library contained in the ISO/ANSI C standard is based on the traditional UNIX standard I/O definition, but differs from it in many ways. These differences exist to support efficient I/O implementations on systems other than UNIX, and to provide some functionality not offered by UNIX. In general, where definitions of I/O routines differ between ISO/ANSI C and UNIX C, programs should assume the ISO/ANSI definitions for maximum portability. The ISO/ANSI definitions are designed for use on many systems including UNIX, while the applicability of the UNIX definitions is more limited.

Text access and binary access

In the UNIX I/O model, files are divided into lines by the new-line character ('\n'). For this reason, C programs that process input files one line at a time traditionally read characters until a new-line character is encountered. Similarly, programs that write output one line at a time write a new-line character after each line of data.

Many systems other than UNIX use other conventions for separating lines of text. For instance, the IBM PC operating system, PC DOS, separates lines of text with two characters, a carriage return followed by a line feed. The IBM 370 uses yet another method. To enable a line-oriented C program written for UNIX to execute under PC DOS, a C implementation must translate a carriage return and line feed to a new-line character on input, and must translate a new-line character to a carriage return and line feed on output. Although this translation is appropriate for a line-oriented program, it is not appropriate for other programs. For instance, a program that writes object code to a file cannot tolerate replacement of a new-line character in its output by a carriage return and a line feed. For this reason, most systems other than UNIX require two distinct forms of file access: text access and binary access.

The ISO/ANSI I/O definition requires that when a program opens a file, it must specify whether the file is to be accessed as a text stream or a binary stream. When a file is accessed as a binary stream, the implementation must read or write the characters without modification. When a file is accessed as a text stream, the implementation must present the file to the program as a series of lines separated by new-line characters, even if a new-line character is not used by the system as a physical line separator. Thus, under PC DOS, when a program writes a file using a binary stream, any new-line characters in the output data are written to the output file without modification. But when a program writes a file using a text stream, a new-line character in the output data is replaced by a carriage return and a line feed to serve as a standard PC DOS line separator.

If a file contains a real new-line character (one that is not a line separator) and the file is read as a text stream, the program will probably misinterpret the new-line character as a line separator. Similarly, a program that writes a carriage return to a text stream may generate a line separator unintentionally. For this reason, the ISO/ANSI library definition leaves the results undefined when any nonprintable characters (other than horizontal tab, vertical tab, form feed, and the new-line character) are read from or written to a text stream. Therefore, text access should be used only for files that truly contain text, that is, lines of printable data.

Programs that open a file without explicitly specifying binary access are assumed to require text access, because the formats of binary data, such as object code, vary widely from system to system. Thus, portable programs are more likely to require text access than binary access.

Padding

Many non-UNIX file systems require files to consist of one or more data blocks of a fixed size. In these systems, the number of characters stored in a file must be a multiple of this block size. This requirement can present problems for programs that need to read or write arbitrary amounts of data unrelated to the block size; however, it is not a problem for text streams. When a text stream is used, the implementation can use a control character to indicate the logical end of file. This approach cannot be used with a binary stream, because the implementation must pass all data in the file to the program, whether or not that data includes control characters.

The ISO/ANSI C library definition deals with fixed data blocks by permitting output files accessed as binary streams to be padded with null ('\0') characters. This padding permits systems that use fixed-size data blocks to always write blocks of the correct size. Because of the possibility of padding, files created with binary streams on such systems may contain one or more null characters after the last character written by the program. Programs that use binary streams and require an exact end-of-file indication must write their own end-of-file marker (which may be a control character or sequence of control characters) to be portable.

A similar padding concern can occur with text access. Some systems support files where all lines must be the same length. (Files defined under OS/390 or CMS with record format F are of this sort.) ISO/ANSI permits the implementation to pad output lines with blanks when these files are written and to remove the blanks at the end of lines when the files are read. (A blank is used in place of a null character, because text access requires a printable padding character.) Therefore, portable programs cannot write lines containing trailing blanks and expect to read the blanks back if the file will be processed later as input.

Similarly, some systems (such as CMS) support only nonempty lines. Again, ISO/ANSI permits padding to circumvent such system limitations. When a text stream is written, the Standard permits the implementation to write a line containing a single blank, rather than one containing no characters, provided that this line is always read back as one containing no characters. Therefore, portable programs cannot expect to distinguish an empty line from a line that contains only a single blank.

Finally, some systems (such as CMS) do not permit files containing no characters. A program is nonportable if it assumes a file can be created merely by opening it and closing it, without writing any characters.

File positioning with fseek and ftell

As stated earlier, the UNIX I/O definition features seeking by character number. For instance, it is possible to position directly to the 10,000th character of a file. On a system where text access and binary access are different, the meaning of a request to seek to the 10,000th character of a text stream is not well defined. ftell and fseek enable you to obtain the current file position and return to that position, no matter how the system implements text and binary access.

Consider a system such as PC DOS, where the combination of carriage return and line feed is used as a line separator. Because of the translation, a program that counts the characters it reads is likely to determine a different character position from the position maintained by the operating system. (A line that the program interprets as n characters, including a final new-line character, is known by the operating system to contain n+1 characters.)

Some systems, such as the 370 operating systems, do not record physical characters to indicate line breaks. Consider a file on such a system composed of two lines of data, the first containing the single character 1 and the second containing the single character 2. A program accessing this file as a text stream receives the characters 1\n2\n. The program must process four characters, although only two are physically present in the file. A request to position to the second character is ambiguous: the library cannot determine whether the next character read should be \n or 2.

Even if you resolve the ambiguity of file positioning in favor of portability (by counting the characters seen by the program rather than physical characters), implementation difficulties may preclude seeking to characters by number using a text stream. Under PC DOS, the only way to seek accurately to the 10,000th character of a file is to read 10,000 characters because the number of carriage return and line feed pairs in the file is not known in advance. If the file is opened for both reading and writing, replacing a printable character with a new-line character requires replacing one physical character with two. This replacement requires rewriting the entire file after the point of change. Such difficulties make it impractical on many systems to seek for text streams based on a character number.

Situations such as those discussed in this section show that on most systems where text and binary access are not identical, positioning in a text stream by character number cannot be implemented easily. Therefore, the ISO/ANSI standard permits a library to implement random access to a text stream using some indicator of file position other than character number. For instance, a file position may be defined as a value derived from the line number and the offset of the character in the line.

File positions in text streams cannot be used arithmetically. For instance, you cannot assume that adding 1 to the position of a particular character results in the position of the next character. Such file positions can be used only as tokens. This means that you can obtain the current file position (using the ftell function) and later return to that position (using the fseek function), but no other portable use of the file position is possible.

This change from UNIX behavior applies only to text streams. When you use fseek and ftell with a binary stream, the ISO/ANSI standard still requires that the file position be the physical character number.

File positioning with fgetpos and fsetpos

Even with the liberal definition of random access to a text stream given in the previous section, implementation of random access can present major problems for a file system that is very different from a traditional UNIX file system. The traditional OS/390 file system is an example of such a system. To assist users of these file systems, the Standard includes two non-UNIX functions, fsetpos and fgetpos.

File systems like the OS/390 file system have two difficulties implementing random access in the UNIX (ISO/ANSI binary) fashion:

The fsetpos and fgetpos functions did not exist prior to the definition of the ISO/ANSI C standard. Because many C libraries have not yet implemented them, they are currently less portable than fseek and ftell, which are compatible with UNIX operating systems.

However, it is a relatively straightforward task to implement them as macros that call fseek and ftell in such systems. After these macros have been written, fsetpos and fgetpos are essentially as portable as their UNIX counterparts and will offer substantial additional functionality where provided by the library on systems such as OS/390.

The ISO/ANSI I/O model

The following list describes the I/O model for ISO/ANSI C. The points are listed in the same order as the corresponding points for the UNIX I/O model, as presented in the previous section.


IBM 370 I/O Concepts

Programmers accustomed to other systems frequently find the unique nature of 370 I/O confusing. This section organizes the most significant information about 370 I/O for SAS/C users. Note that this description is general rather than specific. Details and complex special cases are generally omitted to avoid obscuring the basic principles. See the introduction to the SAS/C Compiler and Library User's Guide for a small bibliography of relevant IBM publications that should be consulted for additional information.

Fundamental principles


File organizations under OS/390

Under OS/390, files are classified first by file organization. A number of different organizations are defined, each tailored for an expected type of usage. For instance, files with sequential organization are oriented towards processing records in sequential order, while most files with VSAM (Virtual Storage Access Method) organization are oriented toward processing based on key fields in the data.

For each file organization, there is a corresponding OS/390 access method for processing such files. (An OS/390 access method is a collection of routines that can be called by a program to perform I/O.) For instance, files with sequential organization are normally processed with the Basic Sequential Access Method (BSAM). Sometimes, a file can be processed in more than one way. For example, files with direct organization can be processed either with BSAM or with the Basic Direct Access Method (BDAM).

The file organizations of most interest to C programmers are sequential and partitioned. The remainder of this section relates primarily to these file organizations, but many of the considerations apply equally to the others. A number of additional considerations apply specifically to files with partitioned organization. These considerations are summarized in OS/390 partitioned data sets.

Note:    An important type of OS/390 file, the Virtual Storage Access Method (VSAM) file, was omitted from the previous list. VSAM files are organized as records identified by a character string or a binary key. Because these files differ so greatly from the expected C file organization, they are difficult to access using standard C functions. Because of the importance of VSAM files in the OS/390 environment, full access to them is provided by nonportable extensions to the standard C library.

Note:    Also, if your system supports UNIX System Services (USS) under OS/390, it provides a hierarchical file system similar to the one offered on UNIX. The behavior of files in the hierarchical file system is described in UNIX Low-Level I/O. Only traditional OS/390 file behavior is described here.

The characteristics of a sequential or partitioned file are defined by a set of attributes called data control block (DCB) parameters. The three DCB parameters of most interest are record format (RECFM), logical record length (LRECL), and block size (BLKSIZE).

As stated earlier, OS/390 files are stored as a sequence of records. To improve I/O performance, records are usually combined into blocks before they are written to a device. The record format of a file describes how record lengths are allowed to vary and how records are combined into blocks. The logical record length of a file is the maximum length of any record in a file, possibly including control information. The block size of a file is the maximum size of a block of data.

The three primary record formats for files are F (fixed), V (variable), and U (undefined). Files with record format F contain records that are all of equal length. Files with format V or U may contain records of different lengths. (The differences between V and U are mostly technical.) Files of both F and V format are frequently used; the preferred format for specific kinds of data (for instance, program source) varies from site to site.

Ideally, the DCB parameters for a file are not relevant to the C program that processes it, but sometimes a C program has to vary its processing based on the format of a file, or to require a file to have a particular format. Some of the reasons for this are as follows:


File organizations under CMS

Like most operating systems, CMS has its own native file system. (In fact, it has two: the traditional minidisk file system and the hierarchical shared file system.) Unlike most operating systems, CMS can also simulate the file systems of other IBM operating systems, notably OS and VSE. Also, CMS can transfer data between users in spool files with the VM control program (CP).

Therefore, CMS files are classified first by the type of I/O simulation (or lack thereof) used to read or write to them. The three types are

CMS I/O simulation can be used to read files created by OS or VSE, but these operating systems cannot read files created by CMS, even when the files are created using CMS's simulation of their I/O system. In general, CMS adequately simulates OS and VSE file organizations, and the rules that apply in the real operating system also apply under CMS. However, the simulation is not exact: it differs in some details, and some facilities are not supported at all.

CMS-format files, particularly disk files, are of most interest to C programmers. CMS disk files have a logical record length (LRECL) and a record format (RECFM). The LRECL is the length of the largest record; it may vary between 1 and 65,535. The RECFM may be F (for files with fixed-length records) or V (for files with variable-length records). Other file attributes are handled transparently under CMS. Files are grouped by minidisk, a logical representation of a physical direct-access device. The attributes of the minidisk, such as writability and block size, apply to the files it contains. Files in the shared file system are organized into directories, conceptually similar to UNIX directories.

Records in a RECFM F file must all be the same length, equal to the file's LRECL. The LRECL is assigned when the file is created and may not be changed. Some CMS commands require that input data be in a RECFM F file. To support RECFM F files, a C implementation must either pad or split lines of output data to conform to the LRECL, and remove the padding from input records.

RECFM V files have records of varying length. The LRECL is the length of the longest record in the file, so it may be changed at any time by appending a new record that is longer than any other record. However, rewriting an existing record with a different length causes all following records to be erased. The length of any given record can be determined only by reading the record. (Note that the CMS LRECL concept is different from the OS/390 concept for V format files, because the LRECL under OS/390 includes extra bytes used for control information.)

Some rules apply for both RECFM F and RECFM V files. Records in CMS files contain only data. No control information is embedded in the records. Records may be updated without causing loss of data. Files may be read sequentially or accessed randomly by record number.

As under OS/390, files that are intended to be printed reserve the first character of each record for an ANSI carriage control character. Under CMS, these files can be given a filetype of LISTING, which is recognized and treated specially by commands such as PRINT. If a C program writes layout characters, such as form feeds or carriage returns, to a file to effect page formatting, the file should have the filetype LISTING to ensure proper interpretation by CMS.

Be aware that the standard C language does not provide any way for you to interrogate or define file attributes. In cases in which a program depends on file attribute information, you have two choices. You can use the FILEDEF command to define file attributes (if your program uses DDnames), or you can use nonportable mechanisms to access or specify this information during execution.

OS/390 partitioned data sets

As stated earlier, one of the important OS/390 file organizations is the partitioned organization. A file with partitioned organization is more commonly called a partitioned data set (PDS) or a library. A PDS is a collection of sequential files, called members, all of which share the same area of disk space. Each member has an eight-character member name. Under OS/390, source and object modules are usually stored as PDS members. Also, almost any other sort of data may be stored as a PDS member rather than as an ordinary sequential file.

Partitioned data sets have several properties that make them particularly difficult for programs that were written for other file systems to handle:

These limitations may cause ISO/ANSI-conforming programs to fail when they use PDS members as input or output files. For instance, it is reasonable for a program to assume that it can append data to the end of a file. But due to the nature of PDS members, it is not feasible for a C implementation to support this, except by saving a copy of the member and then replacing the member with the copy. Although this technique is viable, it is very inefficient in both time and disk space. (This tradeoff between poor performance and reduced functionality is one that must be faced frequently when using C I/O on the 370. PDS members, which are perhaps the most commonly used kind of OS/390 file, are the most prominent examples of such a tradeoff.)

Note:    Recent versions of OS/390 support an extended form of PDS, called a PDSE. Some of the previously described restrictions on a PDS do not apply to a PDSE. For example, unused space is reclaimed automatically in a PDSE.

CMS MACLIBs and TXTLIBs

Two important OS-simulated file types on CMS are the files known as MACLIBs and TXTLIBs. Both are simulations of OS partitioned data sets. MACLIBs are typically used to collect textual data or source code; TXTLIBs may contain only object code. Unlike OS PDSs, these files always have fixed-length, 80-character records.

In general, MACLIBs and TXTLIBs cannot be written with OS-simulated I/O. Instead, data are added or removed a member at a time by CMS commands. Input from MACLIBs and TXTLIBs can be performed using either OS simulation or native CMS I/O.

Identifying files

In UNIX operating systems and similar systems, files are identified in programs by name, and program flexibility with files is achieved by organizing files into directories. Files with the same name may appear in several directories, and the use of a command language to establish working directories enables the user of a program to define program input and output flexibly at run time.

In the traditional OS/390 file system, all files occupy a single name space. (This is an oversimplification, but a necessary one.) Programs that open files by a physical filename are limited to the use of exactly one file at a site. You can use several techniques to increase program flexibility in this area, none of which is completely satisfactory. These techniques include the following:

Under CMS, you can use other techniques to increase program flexibility:


File existence

Under OS/390, the concept of file existence is not nearly so clear-cut as on other systems, due primarily to the use of DDnames and control language. Because DDnames are indirect filenames, the actual filename must be provided through control language before the start of program execution. If the file does not already exist at the time the DD statement or ALLOCATE command is processed, it is created at that time. Therefore, a file accessed with a DDname always exists before program execution begins.

An alternate interpretation of file existence under OS/390 that avoids this problem is to declare that a file exists after a program has opened it for output. By this interpretation, a file created by control language immediately before execution does not yet exist. Unfortunately, this definition of existence cannot be implemented because of the following technicalities:

A third interpretation of existence is to say that an OS/390 file exists if it contains any data (as recorded in the VTOC). This has the disadvantage of making it impossible to read an empty file but the much stronger advantage that a file created by control language immediately before program execution is perceived as not existing. (footnote 1)

This ambiguity about the meaning of existence applies only to files with sequential organization. For files with partitioned organization, only the file as a whole is created by control language; the individual members are created by program action. This means that existence has its natural meaning for PDS members, and that it is possible to create a PDS member containing no characters.

CMS does not allow the existence of files containing no characters, and it is not possible to create such a file.

Miscellaneous differences from UNIX operating systems

The following section lists some additional features of UNIX operating systems and UNIX I/O that some programmers expect to be available on the 370 systems. These features are generally foreign to the 370 environment and impractical to implement. Code that expects the availability of these features is not portable to the 370 no matter how successfully it runs on other architectures.


Summary of 370 I/O characteristics

The following list describes the characteristics of 370 files (without any special reference to the C language). The points are listed in the same order as the corresponding points for the UNIX and ISO/ANSI I/O models as presented earlier:


SAS/C I/O Concepts

In an ideal C implementation, C I/O would possess all three of the following properties:

For the reasons detailed in IBM 370 I/O Concepts, C I/O on the 370 cannot support all three of these properties simultaneously. The library provides several different kinds of I/O to allow the programmer to select the properties that are most important.

The library offers two separate I/O packages:

Details on both of these I/O packages are presented in the following sections. (footnote 2)

Standard I/O

Standard I/O is implemented by the library in accordance with its definition in the C Standard. A file may be accessed as a binary stream, in which case all characters of the file are presented to the program unchanged. When file access is via a binary stream, all information about the record structure of the file is lost. On the other hand, a file may be accessed as a text stream, in which case record breaks are presented to the program as new-line characters ('\n'). When data are written to a text file and then read, the data may not be identical to what was written because of the need to translate control characters and possibly to pad or split text lines to conform to the attributes of the file.

Besides the I/O functions defined by the Standard, several augmented functions are provided to exploit 370-specific features. For instance, the afopen function is provided to allow the program to specify 370-dependent file attributes, and the afread routine is provided to allow the program to process records that may include control characters. Both standard I/O functions and augmented functions may be used with the same file.

Library access methods

The low-level C library routines that interface with the OS/390 or CMS physical I/O routines are called C library access methods, or access methods for short. (The term OS/390 access method always refers to access methods such as BSAM, BPAM, and VSAM to avoid confusion.) Standard I/O supports five library access methods: "term", "seq", "rel", "kvs", and "fd". The file can span multiple volumes.

When a file is opened, the library ordinarily selects the access method to be used. However, when you use the afopen function to open a file, you can specify one of these particular access methods.

The "rel" access method

Under OS/390, the "rel" access method can be used for files with sequential organization and RECFM F, FS, or FBS. (The limitation to sequential organization means that the "rel" access method cannot be used to process a PDS member.) Under CMS, the "rel" access method can be used for disk files with RECFM F. The "rel" access method is designed to behave like UNIX disk I/O:

Because of the nature of the 370 file system, complete UNIX compatibility is not possible. In particular, the following differences still apply:

The "kvs" access method

The "kvs" access method processes any file opened with the extension open mode "k" (indicating keyed I/O). This access method is discussed in more detail in Using VSAM Files.

The "fd" access method

The "fd" access method processes any file residing in the USS OS/390 hierarchical file system. These files are fully compatible with UNIX. In files processed with the "fd" access method, there is no difference between text and binary access.

The "seq" access method

The "seq" access method processes a nonterminal, non-USS file if any one of the following apply:

In general, the "seq" access method is implemented to use efficient native interfaces, forsaking compatibility with UNIX operating systems where necessary. Some specific incompatibilities are listed here:


UNIX style I/O

The library provides UNIX style I/O to meet two separate needs:

As a result of the second property, UNIX style I/O is less efficient than standard I/O for the same file, unless the file is suitable for "rel" access, or it is in the USS hierarchical file system. In these cases, there is little additional overhead.

For files suitable for "rel" access, UNIX style I/O simply translates I/O requests into corresponding calls to standard I/O routines. Thus, for these files there is no decrease in performance.

For files in the USS hierarchical file system, UNIX style I/O calls the operating system low-level I/O routines directly. For these files, use of standard I/O by UNIX style I/O is completely avoided.

For other files, UNIX style I/O copies the file to a temporary file using the "rel" access method and then performs all requested I/O to this file. When the file is closed, the temporary file is copied back to the user's file, and the temporary file is then removed. This means that UNIX style I/O for files not suitable for "rel" access has the following characteristics:

All of the discussion within this section assumes that the user's file is accessed as a binary file: that is, without reference to any line structure. Occasionally, programs want to use this interface to access a file as a text file. (Most frequently, such programs come from non-UNIX environments such as the IBM PC.)

As an extension, the library supports using UNIX style I/O to process a file as text. However, file positioning by character number is not supported in this case, and no copying of data takes place. Instead, UNIX style I/O translates I/O requests to calls equivalent to standard I/O routines.

Note that UNIX style I/O represents open files as small integers called file descriptors. Unlike under UNIX, file descriptors under OS/390 and CMS have no inherent significance. Some UNIX programs assume that file descriptors 0, 1, and 2 are always associated with the standard input, output, and error files. This assumption is nonportable, but the library attempts to support it where possible. Programs that use descriptor 0 only for input and descriptors 1 and 2 only for output, and that do not issue seeks against these files, are likely to execute successfully. Programs that use these descriptors in other ways, or that mix UNIX style and standard I/O access to these files, are likely to fail.
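The safe pattern described above (descriptor 0 only for input, descriptors 1 and 2 only for output, no seeking) is the classic UNIX filter shape. A minimal sketch using only standard low-level calls:

```c
#include <fcntl.h>
#include <unistd.h>

/* Portable filter skeleton: copy everything from infd to outfd,
   reporting problems on errfd.  No lseek() is ever issued, so the
   three standard descriptors may be passed safely. */
int copy_fd(int infd, int outfd, int errfd)
{
    char buf[512];
    ssize_t n;
    while ((n = read(infd, buf, sizeof(buf))) > 0)
        if (write(outfd, buf, (size_t)n) != n) {
            write(errfd, "write error\n", 12);
            return -1;
        }
    return n < 0 ? -1 : 0;
}

/* The conventional usage: input on 0, output on 1, errors on 2. */
int run_filter(void)
{
    return copy_fd(0, 1, 2);
}
```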

UNIX operating systems follow specific rules when assigning file descriptors to open files. The library follows these rules for USS files and for sockets. However, OS/390 or CMS files accessed using UNIX I/O are assigned file descriptors outside of the normal UNIX range to avoid affecting the number of USS files or sockets the program can open. UNIX programs that use UNIX style I/O to access OS/390 or CMS files may therefore need to be changed if they require the UNIX algorithm for allocation of file descriptors.


370 Perspectives on SAS/C Library I/O

This section describes SAS/C I/O from a 370 systems programmer's perspective. In contrast to the other parts of this chapter, this section assumes some knowledge of 370 I/O techniques and terminology.

OS/390 I/O implementation

Under OS/390, the five C library access methods are implemented as follows:

Although BDAM is not used by the "rel" access method, direct organization files that are normally processed by BDAM are supported, provided they have fixed-length records and no physical keys.

CMS I/O implementation

The C library access methods are implemented under CMS as follows:


File attributes for "rel" under OS/390

Under OS/390, a file can be processed by the "rel" access method if it is not a PDS or PDS member, and if it has RECFM F, FS, or FBS. These record formats ensure that there are no short blocks or unfilled tracks in the file, except the last, and make it possible to reliably convert a character number into a block address (in CCHHR form) for the use of XDAP. Use of "rel" may also be specified for regular files in the USS file system (in which case the "fd" access method is used).

If the LRECL of an FBS file is 1, then an accurate end-of-file pointer can be maintained without adding any padding characters. Because of the use of BSAM and XDAP to process the file, use of this tiny record size does not affect program efficiency (data are still transferred a block at a time). However, it may lead to inefficient processing of the file by other programs or languages, notably ones that use QSAM.

File attributes for "rel" access under CMS

Under CMS, a file can be processed by the "rel" access method if it is a CMS disk file (not filemode 4) with RECFM F. Use of RECFM F ensures that a character number can be converted reliably to a record number and an offset within the record.
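The conversion that RECFM F makes reliable is simple integer arithmetic. The following sketch shows how a character number maps to a 1-based record number and an offset within the record; the names are illustrative, not library APIs.

```c
/* For a RECFM F file with fixed logical record length lrecl, a byte
   (character) offset maps to exactly one record and one position
   within it.  CMS record numbers are 1-based. */
struct recpos {
    long recnum;   /* 1-based record number */
    long offset;   /* byte offset within that record */
};

struct recpos char_to_recpos(long charno, long lrecl)
{
    struct recpos p;
    p.recnum = charno / lrecl + 1;
    p.offset = charno % lrecl;
    return p;
}
```

With LRECL 1, the record number is simply the character number plus one, which is why that record length permits an exact end-of-file pointer.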

If the LRECL of a RECFM F file is 1, then an accurate end-of-file pointer can be maintained without ever adding any padding characters. Because the file is processed in large blocks (using the multiple record feature of the FSREAD and FSWRITE macros), use of this tiny record size does not affect program efficiency. Nor does it lead to inefficient use of disk space, because the files are physically blocked according to the minidisk block size. However, it may lead to inefficient processing of the file by other programs or languages that process one record at a time.

Temporary files under OS/390

Temporary files are created by the library under two circumstances.

A program can create more than one temporary file during its execution. Each temporary file is assigned a temporary file number, starting sequentially at 1. When a temporary file is closed, its file number becomes available again. When a new temporary file is opened, the lowest available file number is assigned.
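The numbering rule (lowest available number, with numbers reused after close) can be sketched as follows. This is an illustration of the scheme only; MAXTMP and the in-use table are assumptions, not the library's internals.

```c
#define MAXTMP 64

static int in_use[MAXTMP + 1];   /* index 1..MAXTMP; slot 0 unused */

/* Assign the lowest available temporary file number, starting at 1.
   Returns 0 if every number is in use. */
int tmpnum_open(void)
{
    int n;
    for (n = 1; n <= MAXTMP; n++)
        if (!in_use[n]) {
            in_use[n] = 1;
            return n;
        }
    return 0;
}

/* Release a number when its temporary file is closed, making it
   available for the next open. */
void tmpnum_close(int n)
{
    if (n >= 1 && n <= MAXTMP)
        in_use[n] = 0;
}
```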

One of two methods is used to create a temporary file with number nn. First, a check is made for a SYSTMPnn DD statement. If the file number is larger than 99, the last two characters of the DDname are in an encoded form. If this DDname is allocated and references a temporary data set, this data set is associated with the temporary file. If no SYSTMPnn DDname is allocated, the library uses dynamic allocation to create a new temporary file whose data set name depends on the file number. (The system is allowed to select the DDname, so there is no dependency on the SYSTMPnn style of name.) The data set name also depends on information associated with the running C program, so that several C programs can run in the same address space without conflicts occurring between temporary filenames.

Note: If an attempt is made to open temporary file nn and a SYSTMPnn DD statement of the appropriate kind is defined but the file is in use by another SAS/C program running in the same address space, the file number is considered to be unavailable, and the lowest available file number not in use by another such program is used instead.

If a program is compiled with the posix compiler option, then temporary files are created in the USS hierarchical file system, rather than as OS/390 temporary files. The USS temporary files are created in the /tmp HFS directory.

Temporary files are normally allocated using a unit name of VIO and a space allocation of 50 tracks. The unit name and default space allocation can be changed by a site, as described in the SAS/C installation instructions. If a particular application requires a larger space allocation than the default, use of a SYSTMPnn DD statement specifying the required amount of space is recommended.

Temporary files under CMS

Temporary files are created by the library under two circumstances.

A program can create more than one temporary file during its execution. Each temporary file is assigned a temporary file number, starting sequentially at 1.

One of two methods is used to create the temporary file whose number is nn. First, a check is made for a FILEDEF of the DDname SYSTMPnn. If this DDname is defined, then it is associated with the temporary file. If no SYSTMPnn DDname is defined, the library creates a file whose name has the form $$$$$$nn $$$$xxxx, where nn is the temporary file number, and the xxxx part of the filetype is associated with the calling C program. This naming convention allows several C programs to execute simultaneously without conflicts occurring between temporary filenames.

Temporary files are normally created by the library on the write-accessed minidisk with the most available space. Using FILEDEF to define a SYSTMPnn DDname with another filemode allows you to use some other technique if necessary.

Be aware that these temporary files are not known to CMS as temporary files. Therefore, they are not erased if a program terminates abnormally or if the system fails during its execution.

VSAM usage and restrictions

The SAS/C library supports two different kinds of access to VSAM files: standard access and keyed access. Standard access is used when a VSAM file is opened in text or binary mode, and it is limited to standard C functionality. Keyed access is used when a VSAM file is opened in keyed mode. Keyed mode is discussed in detail in Using VSAM Files.

Any kind of VSAM file may be used via standard access, although restrictions apply to particular file types; for example, a KSDS may not be opened for output using standard I/O.

VSAM ESDS, KSDS, and RRDS files are processed using a single RPL. Move mode is used to support spanned records. A VSAM file cannot be opened for write only (open mode "w") unless it was defined by Access Method Services to be a reusable file.


FOOTNOTE 1:   This is the interpretation used in the SAS/C implementation.

FOOTNOTE 2:   Two other I/O packages are provided: CMS low-level I/O, defined for low-level access to CMS disk files, and OS low-level I/O, which performs OS-style sequential I/O. These forms of I/O are nonportable and are discussed in Chapter 2, "CMS Low-Level I/O Functions," and Chapter 3, "OS/390 Low-Level I/O Functions," in SAS/C Library Reference, Volume 2.

FOOTNOTE 3:   The library connects the use of the UNIX low-level I/O interface and the ability to do seeking by character number because UNIX documentation has traditionally stressed that seeking by character number is not guaranteed when standard I/O is used. The UNIX Version 7 Programmer's Manual states that the file position used by standard I/O "is measured in bytes only on UNIX; on some other systems it is a magic cookie."



Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.