This section provides a fairly in-depth summary of the
fundamentals of C I/O. It begins with a discussion of traditional C I/O concepts,
then discusses UNIX low-level, ISO/ANSI, and IBM 370 I/O concepts. These concepts
are combined in SAS/C I/O Concepts and 370 Perspectives on SAS/C Library I/O.
The final section provides guidelines for choosing an I/O method, based on
the needs of your application.
When C was
initially designed, no library, and therefore no I/O,
was included. It was assumed that libraries suitable for use with particular
systems would be developed. Because most early use of the C language was
associated with UNIX operating systems, the UNIX I/O functions were considered
the standard I/O method for C. As the C language has evolved, the I/O definition
has changed to some extent, but understanding the underlying UNIX concepts
is still important.
In addition, many useful C programs were first developed
under UNIX operating systems, and such programs frequently are unaware of
the existence of other systems or I/O techniques. Such programs cannot be run
on systems as different from UNIX as CMS or OS/390 without careful consideration
of their original environment.
The
main features of the UNIX I/O model are as follows:
One complication in programs developed under UNIX operating systems
is that UNIX defines two different I/O interfaces: standard I/O and low-level
I/O (sometimes called unbuffered I/O). Standard I/O
is a more portable form of I/O than low-level I/O, and UNIX documentation
recommends that portable programs be written using this form. However, UNIX low-level I/O is widely recognized as more efficient than standard
I/O, and it provides some additional capabilities, such as the ability to
test whether a file exists before it is opened. For these and other reasons,
many programs use low-level I/O, despite its documented lack of portability.
UNIX operating systems also support a mixed-level form of I/O, wherein a
file is accessed simultaneously with standard
I/O and low-level I/O. C implementations that support the UNIX low-level
functions may be unable to support mixed-level I/O, if the two forms of I/O
are not closely related in the UNIX manner.
UNIX low-level I/O is not included in the ISO/ANSI C
standard, so it may be unavailable with recently developed C compilers. Also,
do not assume that this form of I/O is truly low-level on any system other
than UNIX.
The definition
of the C I/O library contained in the ISO/ANSI C standard is based on the
traditional UNIX standard I/O definition, but differs from it in many ways.
These differences exist to support efficient I/O implementations on systems
other than UNIX, and to provide some functionality not offered by UNIX. In
general, where definitions of I/O routines differ between ISO/ANSI C and UNIX
C, programs should assume the ISO/ANSI definitions for maximum portability.
The ISO/ANSI definitions are designed for use on many systems including UNIX,
while the applicability of the UNIX definitions is more limited.
In the
UNIX I/O model, files are divided into lines by the new-line character
('\n'). For this reason, C programs that process input files
one line at a time traditionally read characters until a new-line character
is encountered. Similarly, programs that write output one line at a time
write a new-line character after each line of data.
Many systems
other than UNIX use other conventions for separating lines of text. For instance,
the IBM PC operating system, PC DOS, separates lines of text with two characters,
a carriage return followed by a line feed. The IBM 370 uses yet another method.
To enable a line-oriented C program written for UNIX to execute under PC
DOS, a C implementation must translate a carriage return and line feed to
a new-line character on input, and must translate a new-line character to
a carriage return and line feed on output. Although this translation is appropriate
for a line-oriented program, it is not appropriate for other programs. For
instance, a program that writes object code to a file cannot tolerate replacement
of a new-line character in its output by a carriage return and a line feed.
For this reason, most systems other than UNIX require two distinct forms
of file access: text access and binary access.
The
ISO/ANSI
I/O definition requires that when a program opens a file, it must specify
whether the file is to be accessed as a text stream or a binary stream. When
a file is accessed as a binary stream, the implementation
must read or write the characters without modification. When a file is accessed
as a text stream, the implementation must present the file to
the program as a series of lines separated by new-line characters, even if
a new-line character is not used by the system as a physical line separator.
Thus, under PC DOS, when a program writes a file using a binary stream, any
new-line characters in the output data are written to the output file without
modification. But when a program writes a file using a text stream, a new-line
character in the output data is replaced by a carriage return and a line feed
to serve as a standard PC DOS line separator.
If a file contains a real new-line character (one that
is not a line separator) and the file is read as a text stream, the program
will probably misinterpret the new-line character as a line separator. Similarly,
a program that writes a carriage return to a text stream may generate a line
separator unintentionally. For this reason, the ISO/ANSI library definition
leaves the results undefined when any nonprintable characters (other than
horizontal tab, vertical tab, form feed, and the new-line character) are read
from or written to a text stream. Therefore, text access should be used only
for files that truly contain text, that is, lines of printable data.
Programs that open a file without explicitly specifying
binary access are assumed to require text access, because the formats of binary
data, such as object code, vary widely from system to system. Thus, portable
programs are more likely to require text access than binary access.
Many
non-UNIX file systems require files to consist of one or more data blocks
of a fixed size. In these systems, the number of characters stored in a file
must be a multiple of this block size. This requirement can present problems
for programs that need to read or write arbitrary amounts of data unrelated
to the block size; however, it is not a problem for text streams. When a
text stream is used, the implementation can use a control character to indicate
the logical end of file. This approach cannot be used with a binary stream,
because the implementation must pass all data in the file to the program,
whether it has control characters or not.
The ISO/ANSI C library definition deals with fixed data blocks
by permitting output files accessed as binary streams to be padded with null
('\0') characters. This padding permits systems
that use fixed-size data blocks to always write blocks of the correct size.
Because of the possibility of padding, files created with binary streams on
such systems may contain one or more null characters after the last character
written by the program. Programs that use binary streams and require an exact
end-of-file indication must write their own end-of-file marker (which may
be a control character or sequence of control characters) to be portable.
A similar padding concern can occur with text access.
Some systems support files where all lines must be the same length. (Files
defined under OS/390 or CMS with record format F are of this sort.) ISO/ANSI
permits the implementation to pad output lines with blanks when these files
are written and to remove the blanks at the end of lines when the files are
read. (A blank is used in place of a null character, because text access
requires a printable padding character.) Therefore, portable programs cannot
write lines containing trailing blanks and expect to read the blanks back
if the file will be processed later as input.
Similarly, some systems (such as CMS) support only nonempty
lines. Again, ISO/ANSI permits padding to circumvent such system limitations.
When a text stream is written, the Standard permits the implementation to
write a line containing a single blank, rather than one containing no characters,
provided that this line is always read back as one containing no characters.
Therefore, portable programs cannot depend on distinguishing empty lines from
lines that contain a single blank.
Finally, some systems (such as CMS) do not permit files
containing no characters. A program is nonportable if it assumes a file can
be created merely by opening it and closing it, without writing any characters.
As stated earlier, the UNIX I/O definition features seeking by
character number. For instance, it is possible to position directly to the
10,000th character of a file. On a system where text access and binary access
are different, the meaning of a request to seek to the 10,000th character
of a text stream is not well defined.
ftell
and
fseek
enable you to obtain the current
file position and return to that position, no matter how the system implements
text and binary access.
Consider a system such as PC DOS, where the combination
of carriage return and line feed is used as a line separator. Because of
the translation, a program that counts the characters it reads is likely to
determine a different character position from the position maintained by the
operating system. (A line that the program interprets as n characters,
including a final new-line character, is known by the operating system to
contain n+1 characters.)
Some systems, such as the 370 operating systems, do
not record physical characters to indicate line breaks. Consider a file on
such a system composed of two lines of data, the first containing the single
character
1
and the second containing the
single character
2
. A program accessing
this file as a text stream receives the characters 1\n2\n.
The program must process four characters, although only
two are physically present in the file. A request to position to the second
character is ambiguous. The library cannot determine whether the next character
read should be \n or 2.
Even if you resolve the ambiguity of file positioning
in favor of portability (by counting the characters seen by the program rather
than physical characters), implementation difficulties may preclude seeking
to characters by number using a text stream. Under PC DOS, the only way to
seek accurately to the 10,000th character of a file is to read 10,000 characters
because the number of carriage return and line feed pairs in the file is not
known in advance. If the file is opened for both reading and writing, replacing
a printable character with a new-line character requires replacing one physical
character with two. This replacement requires rewriting the entire file after
the point of change. Such difficulties make it impractical on many systems
to seek for text streams based on a character number.
Situations such as those discussed in this section show
that on most systems where text and binary access are not identical, positioning
in a text stream by character number cannot be implemented easily. Therefore,
the ISO/ANSI standard permits a library to implement random access to a text
stream using some indicator of file position other than character number.
For instance, a file position may be defined as a value derived from the
line number and the offset of the character in the line.
File positions in text streams cannot be used arithmetically.
For instance, you cannot assume that adding 1 to the position of a particular
character results in the position of the next character. Such file positions
can be used only as tokens. This means that you can obtain the current file
position (using the
ftell
function) and
later return to that position (using the
fseek
function), but no other portable use of the file position is possible.
This change from UNIX behavior applies only to text
streams. When you use
fseek
and
ftell
with a binary stream, the ISO/ANSI standard still requires that
the file position be the physical character number.
Even with the liberal definition of random access to a text stream
given in the previous section, implementation of random access can present
major problems for a file system that is very different from that of a traditional
UNIX file system. The traditional OS/390 file system is an example of such
a system. To assist users of these file systems, the Standard includes two
non-UNIX functions,
fsetpos
and
fgetpos
.
File systems like the OS/390 file system have two difficulties
implementing random access in the UNIX (ISO/ANSI binary) fashion:
The
fsetpos
and
fgetpos
functions did not exist prior to the
definition of the ISO/ANSI C standard. Because many C libraries have not
yet implemented them, they are at this time less portable than
fseek
and
ftell
, which are compatible
with UNIX operating systems.
However, it is a relatively straightforward task to
implement them as macros that call
fseek
and
ftell
in such systems. After these
macros have been written,
fsetpos
and
fgetpos
are essentially as portable as their
UNIX counterparts and will offer substantial additional functionality where
provided by the library on systems such as OS/390.
The following list describes the
I/O model for ISO/ANSI C. The
points are listed in the same order as the corresponding points for the UNIX
I/O model, as presented in the previous section.
Programmers accustomed
to other systems frequently find the unique nature of 370 I/O confusing.
This section organizes the most significant information about 370 I/O for
SAS/C users. Note that this description is general rather than specific.
Details and complex special cases are generally omitted to avoid obscuring
the basic principles. See the introduction to the
SAS/C Compiler and Library User's Guide for a small bibliography
of relevant IBM publications that should be consulted for additional information.
-
There are two
370 operating systems of interest, OS/390 and CMS. They implement different
file systems. (CMS also implements OS simulation, which emulates
OS/390 I/O under CMS. The emulation is not perfect and is actually a third
I/O implementation.)
-
Many
file systems feature exactly one kind of file. For instance, in UNIX all
files are simply sequences of characters. The 370 operating systems, especially
OS/390, go to the opposite extreme and handle many different types of files,
each with its own peculiarities and uses. In general, the programmer must
decide during program design which sorts of files a program will use.
-
370 I/O is
record oriented. That is, files are treated as sequences
of records, not sequences of characters. The idea that a physical character
or character sequence may be used as a record or line separator is completely
alien to the 370 systems. (An analogy that may be helpful is that UNIX operating
systems and PC DOS treat line-oriented files as virtual terminals, with lines
separated by layout characters such as the new-line and form feed characters.
The 370 systems handle files as if they were virtual card decks consisting
of physical records separated by gaps.)
-
Most file systems allow the same program to
replace old data
in a file and to add new data at the end. In general, 370 I/O does not permit
you to mix these two kinds of updates within the same program. When a file
is opened using a technique that permits the addition of new data, the replacement
of old data generally causes any following data to be discarded.
-
370 I/O is
hardware oriented. It uses physical disk addresses
to encode file positions. Under OS/390, you cannot address records efficiently,
even with a record number. For common file types, you must use an actual
disk address to position to a record without reading from the start of a file.
-
Another
aspect of the hardware orientation of 370 I/O is the large number of file
attributes that must be assigned, either by the program or by the user. Many
of these attributes have no effect other than to alter the physical layout
of the data. Such attributes are defined for the sole purpose of enabling
the programmer to trade off various aspects of program performance. For example,
you can permit a program to execute faster by using more memory for buffer
space. In some cases, the ability to tailor these attributes is vital, but
frequently the programmer is forced to make such choices when performance
is not an important consideration.
-
The 370 file systems are lacking in disk space management. This
means that programs must deal with the inability to enlarge files. It also
means that users must provide size estimates to the system when files are
created. It is necessary with some commonly used file types to run utilities
to reclaim wasted file space. These problems are most notable under OS/390,
but they can also be a factor under CMS.
-
For programmers
accustomed to
the UNIX file system, the conventions for 370 file naming may seem strange.
Under OS/390, filenames are often given only as indirect names (DDnames in
OS/390 jargon) that can be connected to actual filenames only by the use of
a control language. (It is possible to refer to a file by its actual name
rather than a DDname, but the absence of directories and reliable user identification
under OS/390 make this an inconvenient and often difficult technique.) Under
CMS, either DDnames or more natural filenames can be used, but some programs
choose to use DDnames to achieve closer compatibility with OS/390.
Under
OS/390, files are classified first by file organization. A number of different
organizations are defined, each tailored for an expected type of usage. For
instance, files with sequential organization are oriented towards processing
records in sequential order, while most files with VSAM (Virtual Storage Access
Method) organization are oriented toward processing based on key fields in
the data.
For each file organization, there is a corresponding
OS/390 access method for processing such files. (An OS/390 access
method is a collection of routines that can be called by a program
to perform I/O.) For instance, files with sequential organization are normally
processed with the Basic Sequential Access Method (BSAM). Sometimes, a file
can be processed in more than one way. For example, files with direct organization
can be processed either with BSAM or with the Basic Direct Access Method (BDAM).
The
file organizations of most interest to C programmers are sequential and partitioned. The remainder of this section relates
primarily to these file organizations, but many of the considerations apply
equally to the others. A number of additional considerations apply specifically
to files with partitioned organization. These considerations are summarized
in OS/390 partitioned data sets.
Note:
An important type of OS/390 file, the Virtual Storage Access
Method (VSAM) file, was omitted from the previous list. VSAM files are organized
as records identified by a character string or a binary key. Because these
files differ so greatly from the expected C file organization, they are difficult
to access using standard C functions. Because of the importance of VSAM files
in the OS/390 environment, full access to them is provided by nonportable
extensions to the standard C library.
Note:
Also,
if your system supports OS/390 UNIX System Services (USS), it provides a hierarchical
file system similar to the system offered on UNIX. The behavior of files in
the hierarchical file system is described in UNIX Low-Level I/O. Only traditional OS/390 file behavior
is described here.
The characteristics
of a sequential or partitioned file are defined by a set of attributes called
data control block (DCB) parameters. The three DCB parameters of most interest
are record format (RECFM), logical record length (LRECL), and block size (BLKSIZE).
As stated earlier, OS/390 files are stored as a sequence
of records. To improve I/O performance, records are usually combined into
blocks before they are written to a device. The record format of a file describes
how record lengths are allowed to vary and how records are combined into blocks.
The logical record length of a file is the maximum length of any record in
a file, possibly including control information. The block size of a file
is the maximum size of a block of data.
The
three primary record formats for files are F (fixed), V (variable), and U
(undefined). Files with record format F contain records that are all of equal
length. Files with format V or U may contain records of different lengths.
(The differences between V and U are mostly technical.) Files of both F and
V format are frequently used; the preferred format for specific kinds of data
(for instance, program source) varies from site to site.
Ideally, the DCB parameters for a file are not relevant
to the C program that processes it, but sometimes a C program has to vary
its processing based on the format of a file, or to require a file to have
a particular format. Some of the reasons for this are as follows:
-
Because most C programs do not write lines of
equal length, a C library implementation must add trailing blanks to the end
of output lines in a record format F file and remove them on input. If this
is inappropriate for an application, you may need to require the use of a
record format V or U file, or to use a nonportable function to inhibit library
padding.
-
When writing to a file with a small logical record
length as a text stream, the library may be forced to divide a long line into
several records. In this case, when the file is read, the data are not identical
to what was written.
-
Some programs and system utilities require specific
DCB attributes. For instance, the OS/390 linkage editor cannot handle object
files whose block size is greater than 3200 bytes. C programs producing input
for such programs must be aware of these requirements.
-
One of the secondary DCB attributes a file can have is the ANSI
control characters (RECFM=A) option, which means that the first character
position of each record will be used as a FORTRAN carriage control character.
The UNIX convention of using characters such as form feed and carriage return
to create page formatting can be used only when the output file is defined
to use ANSI control characters. Since some editors do not allow such files
to be edited, it is generally not appropriate to assign this attribute to
all files.
-
The standard C language does not provide any way
for you to interrogate or define file attributes. In cases in which a program
depends on file attribute information, you have two choices. You can use
control language when files are created or used to define the file attributes,
or you can use nonportable mechanisms to access or specify this information
during execution.
Like most operating systems,
CMS has its own native file
system. (In fact, it has two: the traditional minidisk file system and the
more hierarchical shared file system.) Unlike most operating systems, CMS
has the ability to simulate the file systems of other IBM operating systems,
notably OS and VSE. Also, CMS can transfer data between users in spool files
with the VM control program (CP).
Therefore,
CMS files are classified first by the type of I/O simulation (or lack thereof)
used to read or write to them. The three types are
CMS
I/O simulation can be used to read files created by OS or VSE, but these operating
systems cannot read files created by CMS, even when the files are created
using CMS's simulation of their I/O system. In general, CMS adequately simulates
OS and VSE file organizations, and the rules that apply in the real operating system also apply under CMS. However, the simulation is
not exact. CMS's simulation differs in some details, and some facilities are
not supported at all.
CMS-format files, particularly disk files, are of most
interest to C programmers. CMS disk files have a logical record length (LRECL)
and a record format (RECFM). The LRECL is the length of the largest record;
it may vary between 1 and 65,535. The RECFM may be F (for files with fixed-length
records) or V (for files with variable-length records). Other file attributes
are handled transparently under CMS. Files are grouped by minidisk, a logical representation of a physical direct-access device.
The attributes of the minidisk, such as writability and block size, apply
to the files it contains. Files in the shared file system are organized into
directories, conceptually similar to UNIX directories.
Records in RECFM F files must all have the same LRECL.
The LRECL is assigned when the file is created and may not be changed. Some
CMS commands require that input data be in a RECFM F file. To support RECFM
F files, a C implementation must either pad or split lines of output data
to conform to the LRECL, and remove the padding from input records.
RECFM V files have records of varying length. The LRECL
is the length of the longest record in the file, so it may be changed at any
time by appending a new record that is longer than any other record. However,
rewriting an existing record with a different length causes any following
records to be erased. The length of any given record can be determined only by reading
the record. (Note that the CMS LRECL concept is different from the OS/390
concept for V format files, as the LRECL under OS/390 includes extra bytes
used for control information.)
Some rules apply for both RECFM F and RECFM V files.
Records in CMS files contain only data. No control information is embedded
in the records. Records may be updated without causing loss of data. Files
may be read sequentially or accessed randomly by record number.
As under OS/390, files that are intended to be printed reserve
the first character of each record for an ANSI carriage control character.
Under CMS, these files can be given a filetype of LISTING, which is recognized
and treated specially by commands such as PRINT. If a C program writes layout
characters, such as form feeds or carriage returns, to a file to effect page
formatting, the file should have the filetype LISTING to ensure proper interpretation
by CMS.
Be aware that the standard C language does not provide
any way for you to interrogate or define file attributes. In cases in which
a program depends on file attribute information, you have two choices. You
can use the FILEDEF command to define file attributes (if your program uses
DDnames), or you can use nonportable mechanisms to access or specify this
information during execution.
As stated earlier, one of the important
OS/390 file organizations is the partitioned organization. A file with partitioned
organization is more commonly called a partitioned data set
(PDS) or a library. A PDS is a collection of sequential
files, called members, all of which share the same area of disk space. Each
member has an eight-character member name. Under OS/390, source and object
modules are usually stored as PDS members. Also, almost any other sort of
data may be stored as a PDS member rather than as an ordinary sequential file.
Partitioned data sets have several properties that make
them particularly difficult for programs that were written for other file
systems to handle:
These limitations may cause ISO/ANSI-conforming programs
to fail when they use PDS members as input or output files. For instance,
it is reasonable for a program to assume that it can append data to the end
of a file. But due to the nature of PDS members, it is not feasible for a
C implementation to support this, except by saving a copy of the member and
then replacing the member with the copy. Although this technique is viable,
it is very inefficient in both time and disk space. (This tradeoff between
poor performance and reduced functionality is one that must be faced frequently
when using C I/O on the 370. PDS members, which are perhaps the most commonly
used kind of OS/390 file, are the most prominent examples of such a
tradeoff.)
Note:
Recent
versions of OS/390 support an extended form of PDS, called a PDSE. Some of
the previously described restrictions on a PDS do not apply to a PDSE. For
example, unused space is reclaimed automatically in a PDSE.
Two important OS-simulated
file types on CMS are the files known as MACLIBs and TXTLIBs. Both of these
are simulations of OS-partitioned data sets. MACLIBs are typically used to
collect textual data or source code; TXTLIBs may contain only object code.
Unlike OS PDS's, these files always have fixed-length, 80-character records.
In general, MACLIBs and TXTLIBs may not be written by
OS-simulated I/O. Instead, data are added or removed a member at a time by
CMS commands. Input from MACLIBs and TXTLIBs can be performed using either
OS-simulation or native CMS I/O.
In UNIX operating systems and similar systems, files
are identified in programs by name, and program flexibility with files is
achieved by organizing files into directories. Files with the same name may
appear in several directories, and the use of a command language to establish
working directories enables the user of a program to define program input
and output flexibly at run time.
In the traditional OS/390 file system, all files occupy
a single name space. (This is an oversimplification, but a necessary one.)
Programs that open files by a physical filename are limited to the use of
exactly one file at a site. You can use several techniques to increase program
flexibility in this area, none of which is completely satisfactory. These
techniques include the following:
-
Specify filenames in TSO format. When the time-sharing option
of OS/390 (TSO) is used, each user's files usually begin with a userid, thereby
ensuring that the filenames chosen by different users do not overlap. By convention,
a user running under TSO can omit the userid from a filename specification.
This helps considerably for those programs that always run interactively
and never in batch mode. However, userid is a TSO concept and, unless a site
uses optional software (such as an IBM or other vendor security system), programs
cannot be associated with a userid when running in batch.
-
Specify filenames as DDnames. Under OS/390-batch, using DDnames
to identify files is traditional. A DDname is an indirect name associated
with an actual filename or device addressed by a DD statement in batch or
an ALLOCATE command under TSO. Programs that use DDnames to identify files
are completely flexible. They can produce printed output, terminal output,
or disk output, depending only on their control language. Unfortunately,
control language must always be used, because there are no default file definitions.
Because most traditional filenames include periods,
which are not permitted in DDnames, programs from other environments may need
to be modified before they can use DDnames, provided the logic of the program
will withstand such a change.
-
Determine filenames dynamically at run time rather than putting
them in the program. For instance, you may get filenames from the user or
from a profile or configuration file. This is the most flexible technique,
but it may require extensive program changes.
Under CMS,
you can use other techniques to increase program flexibility:
-
The concept of the CMS minidisk replaces the UNIX
directory concept. However, CMS minidisks are not arranged hierarchically,
as UNIX directories are arranged. CMS minidisks are not identified by name
or device address but by filemode letter, which is assigned by
using the CMS ACCESS command and can be changed at any time. (Because the
same filename may exist on several minidisks, it may be necessary to include
a filemode letter in a filename to make it unambiguous.) In many ways, the
minidisk with filemode letter A corresponds to the UNIX working directory,
but this analogy is only approximate.
-
CMS filenames use spaces rather than periods to separate their parts. This
is not a problem, because it is natural for a C library to treat the filename
xyz.c
as XYZ C under CMS.
-
The CMS shared file system is hierarchically
arranged, so there is often a natural correspondence between a UNIX pathname
and a shared filename. Unfortunately, the differing character conventions
of CMS and UNIX will generally inhibit a UNIX oriented program from running
unchanged with the shared file system. For example, the UNIX pathname
/tools/asm/main.c
is the same as the shared filename
MAIN C TOOLS.ASM.
-
CMS supports using DDnames for filenames instead
of physical filenames. This feature allows programs to be easily ported between
OS/390 and CMS. The file referred to by a DDname must be defined by using
the CMS FILEDEF command before a program that uses the DDname is executed.
Under OS/390, the concept of file existence is not nearly
so clear-cut as on other systems, due primarily to the use of DDnames and
control language. Since DDnames are indirect filenames, the actual filename
must be provided through control language before the start of program execution.
If the file does not already exist at the time the DD statement or ALLOCATE
command is processed, it is created at that time. Therefore, a file accessed
with a DDname always exists by the time program execution begins.
An alternate interpretation of file existence under
OS/390 that avoids this problem is to declare that a file exists after a program
has opened it for output. By this interpretation, a file created by control
language immediately before execution does not yet exist. Unfortunately, this
definition of existence cannot be implemented because of the following technicalities:
A third interpretation of existence is to say that an
OS/390 file exists if it contains any data (as recorded in the VTOC). This
has the disadvantage of making it impossible to read an empty file but the
much stronger advantage that a file created by control language immediately
before program execution is perceived as not existing. (footnote 1)
This ambiguity about the meaning of existence applies
only to files with sequential organization. For files with partitioned organization,
only the file as a whole is created by control language; the individual members
are created by program action. This means that existence has its natural
meaning for PDS members, and that it is possible to create a PDS member containing
no characters.
Under CMS, a file containing no characters cannot exist; it is not
possible to create such a file.
The following
section lists some additional features of UNIX
operating systems and UNIX I/O that some programmers expect to be available
on the 370 systems. These features are generally foreign to the 370 environment
and impractical to implement. Code that expects the availability of these
features is not portable to the 370 no matter how successfully it runs on
other architectures.
-
UNIX
operating systems and many other systems support single-character unbuffered
terminal I/O in which characters can be read from a terminal one at a time
and may not appear on the screen until echoed by the program. This sort of
full-duplex protocol is not supported by 370 terminal controllers or operating
systems.
-
Many
programs assume that screen formatting is controlled by standard control sequences,
such as those used by the DEC VT100 and similar terminals. The common 370
terminal architecture (the 3270 family) bears no similarities whatsoever to
that of terminals commonly used with UNIX operating systems. Although OS/390
and CMS support the use of terminals similar to the VT100, they are not commonly
used and are not supported well enough to make running UNIX full-screen applications
on them a viable proposition.
-
The 370 operating systems offer little or
no support for the use of files by more than one program simultaneously.
Programs that want to do file sharing must issue system calls to synchronize
with each other and obey a number of restrictions in the way the shared files
are used. Because common system programs such as compilers, linkers, and
copy utilities do not attempt to synchronize in this way, attempting to share
files with these programs is unsafe.
-
There
is no OS/390 or CMS concept corresponding to the pipe. Data are usually passed
from program to program by means of temporary files.
-
In general, the size of a file cannot be determined in any way
other than by reading the entire file. The OS/390 and CMS equivalents of
directories and inodes record the file size in terms of either the number
of records or the hardware address of the end of file.
-
In UNIX operating systems and many other systems, the time at
which a file was last written or accessed can be determined easily. Under
OS/390, this information is not recorded. For PDS members, popular editors
frequently store such information in a control area of the file, but this
information is both difficult to access and not reliable, because updates
by programs that do not support this feature (such as linkers and copy utilities)
do not maintain the data appropriately.
The following list
describes the characteristics of 370 files
(without any special reference to the C language). The points are listed
in the same order as the corresponding points for the UNIX and ISO/ANSI I/O
models as presented earlier:
In an ideal C implementation, C I/O would
possess all three of
the following properties:
For the reasons detailed in
IBM 370 I/O Concepts, C I/O on the 370 cannot support all three
of these properties simultaneously. The library provides several different
kinds of I/O to allow the programmer to select the properties that are most
important.
The library offers two separate I/O packages:
Details on both of these I/O packages are presented
in the following sections. (footnote 2)
Standard I/O is implemented by the library in
accordance with
its definition in the C Standard. A file may be accessed as a binary stream,
in which case all characters of the file are presented to the program unchanged.
When file access is via a binary stream, all information about the record
structure of the file is lost. On the other hand, a file may be accessed
as a text stream, in which case record breaks are presented to the program
as new-line characters ('\n'). When data are written to a text file and then
read, the data may not be identical to what was written because of the need
to translate control characters and possibly to pad or split text lines to
conform to the attributes of the file.
Besides the I/O functions defined by the Standard, several
augmented functions are provided to exploit 370-specific features. For instance,
the
afopen
function is provided to allow
the program to specify 370-dependent file attributes, and the
afread
routine is provided to allow the program to process records
that may include control characters. Both standard I/O functions and augmented
functions may be used with the same file.
The
low-level C library routines that interface with
the OS/390 or CMS physical I/O routines are called C library access
methods or access methods for short. (The
term OS/390 access method always refers to access methods such
as BSAM, BPAM, and VSAM to avoid confusion.) Standard I/O supports five library
access methods:
"term"
,
"seq"
,
"rel"
,
"kvs"
, and
"fd"
.
When a file is opened, the library ordinarily selects
the access method to be used. However, when you use the
afopen
function to open a file, you can specify one of these particular
access methods.
-
The library uses the
"term"
access method to perform terminal I/O; this access method applies only to
terminal files. (See Terminal I/O for more information on this access
method.)
-
The
"rel"
access method is
used for nonterminal files whose attributes permit them to support UNIX file
behavior when accessed as binary streams.
-
The
"kvs"
access method is
used for VSAM files when access is via the SAS/C nonstandard keyed I/O functions.
(See Using VSAM Files.)
-
The
"fd"
access method is
used for files in the USS
hierarchical file system.
-
The
"seq"
access method is used
with all text streams
and for binary streams that cannot support the
"rel"
access method, except when
"fd"
is used.
The
"rel"
access method
Under OS/390, the
"rel"
access method can be
used for files with sequential organization and RECFM F, FS, or FBS. (The
limitation to sequential organization means that the
"rel"
access method cannot be used to process a PDS member.) Under
CMS, the
"rel"
access method can be used
for disk files with RECFM F. The
"rel"
access method is designed to behave like UNIX disk I/O:
Because
of the nature of the 370 file system, complete UNIX compatibility
is not possible. In particular, the following differences still apply:
The
"kvs"
access method
The
"kvs"
access
method processes any file opened with the extension open mode
"k"
(indicating keyed I/O). This access method is discussed in more
detail in Using VSAM Files.
The
"fd"
access method
The
"fd"
access method processes any file residing
in the USS OS/390 hierarchical file system. These files are fully compatible
with UNIX. In files processed with the
"fd"
access method, there is no difference between text and binary access.
The
"seq"
access method
The
"seq"
access
method processes a nonterminal, non-USS file if any one of the following applies:
In general, the
"seq"
access method is implemented to use efficient native interfaces, forsaking
compatibility with UNIX operating systems where necessary. Some specific
incompatibilities are listed here:
The library provides UNIX style I/O to
meet two separate needs:
As a result of the second property, UNIX style I/O is
less efficient than standard I/O for the same file, unless the file is suitable
for
"rel"
access, or it is in the USS hierarchical
file system. In these cases, there is little additional overhead.
For files suitable for
"rel"
access, UNIX style I/O simply translates I/O requests into corresponding
calls to standard I/O routines. Thus, for these files there is no decrease
in performance.
For files in the USS hierarchical file system, UNIX
style I/O calls the operating system low-level I/O routines directly. For
these files, use of standard I/O by UNIX style I/O is completely avoided.
For other files, UNIX style I/O copies the file to a
temporary file using the
"rel"
access method
and then performs all requested I/O to this file. When the file is closed,
the temporary file is copied back to the user's file, and the temporary file
is then removed. This means that UNIX style I/O for files not suitable for
"rel"
access has the following characteristics:
-
The necessity of copying the data
makes UNIX style
I/O somewhat inefficient. However, after the copying is done, file operations
are efficient, except for
close
of an output
file, when all the data must be copied back. As an optimization, input data
are copied from the user's file only as necessary, rather than copying all
the data when the file is opened.
-
If there is a system failure while a file is being
processed with UNIX style I/O, the file is unchanged, because no data are
written to an output file until the file is closed.
-
It is possible for the processing of a file with
UNIX style I/O to fail if there is not enough disk space available to make
a temporary copy.
-
Because UNIX style I/O completely rewrites an
output file when the file is closed, file truncation does not occur. That
is, characters are not dropped as a result of updates before the end of file.
All of the discussion within this section assumes that
the user's file is accessed as a binary file: that is, without reference to
any line structure. Occasionally, there are programs that want to use this
interface to access a file as a text file. (Most frequently, such programs
come from non-UNIX environments such as the IBM PC.)
As an extension, the library supports using UNIX style
I/O to process a file as text. However, file positioning by character number
is not supported in this case, and no copying of data takes place. Instead,
UNIX style I/O translates I/O requests to calls equivalent to standard I/O
routines.
Note that UNIX style I/O represents open files as small
integers called file descriptors. Unlike under UNIX, file descriptors have
no inherent significance under OS/390 and CMS. Some UNIX programs assume that
certain file descriptors (0, 1, and 2) are always associated with standard
input, output, and error files. This assumption is nonportable, but the library
attempts to support it where possible. Programs that use file 0 only for input,
and files 1 and 2 only for output, and that do not issue seeks to these files,
are likely to execute successfully. Programs that use these file numbers
in other ways or that mix UNIX style and standard I/O access to these files
are likely to fail.
UNIX operating systems follow specific rules when assigning
file descriptors to open files. The library follows these rules for USS files
and for sockets. However, OS/390 or CMS files accessed using UNIX I/O are
assigned file descriptors outside of the normal UNIX range to avoid affecting
the number of USS files or sockets the program can open. UNIX programs that
use UNIX style I/O to access OS/390 or CMS files may therefore need to be
changed if they require the UNIX algorithm for allocation of file descriptors.
This section
describes SAS/C I/O from a 370 systems programmer's
perspective. In contrast to the other parts of this chapter, this section
assumes some knowledge of 370 I/O techniques and terminology.
Under OS/390, the five C library access methods are implemented
as follows:
-
The
"term"
access method uses TPUT ASIS to write to the terminal and TGET EDIT to read
from the terminal. Access to SYSTERM in batch is performed using QSAM.
-
The
"seq"
access
method uses BSAM and BPAM for both input and output. VSAM is used for access
to VSAM ESDS and KSDS data sets.
-
The
"rel"
access
method uses XDAP and BSAM. XDAP is used for input and to update all blocks
of the file except the last block. BSAM is used to update the last block
of the file or to add new blocks. VSAM is used to access VSAM relative record
data sets, and DIV is used to access VSAM linear data sets.
-
The
"kvs"
access
method uses VSAM for all operations.
-
The
"fd"
access
method uses USS service routines for all operations.
Although BDAM is not used by the
"rel"
access method, direct organization files that are
normally processed
by BDAM are supported, provided they have fixed-length records and no physical
keys.
The C library access methods
are implemented under CMS as follows:
-
The
"term"
access method uses TYPLIN or LINEWRT to write to the terminal and WAITRD or
LINERD to read from the terminal.
-
The
"seq"
access
method uses device-dependent techniques. For CMS disk files, it uses FSCB
macros (FSREAD, FSWRITE, and so on). For access to shared files, it uses
the CSL DMSOPEN, DMSREAD, and DMSWRITE services. For access to shared file
system directories, it uses the DMSOPDIR and DMSGETDI services.
For spool files, it uses CMS native macros such as RDCARD.
For tape files, filemode 4 disk files, and files on OS disks, it uses simulated
OS/390 BSAM. VSAM KSDS and ESDS data sets are processed using simulated VSE/VSAM.
-
The
"rel"
access
method uses FSCB macros. Where appropriate, it creates sparse CMS files.
VSAM RRDS data sets are processed using simulated VSE/VSAM.
-
The
"kvs"
access
method uses VSE/VSAM for all operations.
Under OS/390, a file can be processed by the
"rel"
access method if it is not a PDS or PDS
member, and if it has RECFM F, FS, or FBS. These record formats ensure that
there are no short blocks or unfilled tracks in the file, except the last,
and make it possible to reliably convert a character number into a block address
(in CCHHR form) for the use of XDAP. Use of
"rel"
may also be specified for regular files in the USS file system (in
which case the
"fd"
access method is used).
If the LRECL of an FBS file is 1, then an accurate end-of-file
pointer can be maintained without adding any padding characters. Because
of the use of BSAM and XDAP to process the file, use of this tiny record size
does not affect program efficiency (data are still transferred a block at
a time). However, it may lead to inefficient processing of the file by other
programs or languages, notably ones that use QSAM.
Under CMS, a file can be processed by the
"rel"
access method if it is a CMS disk file
(not filemode 4) with RECFM F. Use of RECFM F ensures that a character number
can be converted reliably to a record number and an offset within the record.
If the LRECL of a RECFM F file is 1, then an accurate
end-of-file pointer can be maintained without ever adding any padding characters.
Because the file is processed in large blocks (using the multiple record feature
of the FSREAD and FSWRITE macros), use of this tiny record size does not affect
program efficiency. Nor does it lead to inefficient use of disk space, because
the files are physically blocked according to the minidisk block size. However,
it may lead to inefficient processing of the file by other programs or languages
that process one record at a time.
Temporary files are created by the library under two circumstances.
A program can create more than one temporary file during
its execution. Each temporary file is assigned a temporary file number, starting
sequentially at 1. When a temporary file is closed, its file number becomes
available again. When a new temporary file is opened, the lowest available
file number is assigned.
One of two methods is used to create a temporary file
with number nn. First, a check is made for a
SYSTMPnn DD statement. If the file number is
larger than 99, the last two characters of the DDname are in an encoded form.
If this DDname is allocated and references a temporary data set, this data
set is associated with the temporary file. If no SYSTMPnn
DDname is allocated, the library uses dynamic allocation to create a new temporary
file whose data set name depends on the file number. (The system is allowed
to select the DDname, so there is no dependency on the SYSTMPnn style of name.) The data set name depends also on information
associated with the running C programs, so that several C programs can run
in the same address space without conflicts occurring between temporary filenames.
Note:
If an attempt is made to open temporary file nn and a
SYSTMPnn DD statement
of the appropriate kind is defined but the file is in use by another SAS/C program
running in the same address space, the file number is considered to be unavailable,
and the lowest available file number not in use by another such program is
used instead.
If a program is compiled with the
posix
compiler option, then
temporary files are created in the USS
hierarchical file system, rather than as OS/390 temporary files. The USS
temporary files are created in the
/tmp
HFS directory.
Temporary files are normally allocated using a unit
name of VIO and a space allocation of 50 tracks. The unit name and default
space allocation can be changed by a site, as described in the SAS/C installation
instructions. If a particular application requires a larger space allocation
than the default, use of a SYSTMPnn DD statement
specifying the required amount of space is recommended.
Under CMS, temporary files are likewise created by the library under two circumstances.
A program can create more than one temporary file during
its execution. Each temporary file is assigned a temporary file number, starting
sequentially at 1.
One of two methods is used to create the temporary file
whose number is nn. First, a check is made for
a FILEDEF of the DDname SYSTMPnn. If this DDname
is defined, then it is associated with the temporary file. If no SYSTMPnn DDname is defined, the library creates a file whose name
has the form $$$$$$nn $$$$xxxx,
where nn is the temporary file number, and the xxxx part of the filetype is associated with the calling C program.
This naming convention allows several C programs to execute simultaneously
without conflicts occurring between temporary filenames.
Temporary files are normally created by the library
on the write-accessed minidisk with the most available space. Using FILEDEF
to define a SYSTMPnn DDname with another filemode allows you
to use some other technique if necessary.
Be aware that these temporary files are not known to
CMS as temporary files. Therefore, they are not erased if a program terminates
abnormally or if the system fails during its execution.
The SAS/C library supports two different kinds of access to VSAM
files: standard access, and keyed access. Standard access is used when a
VSAM file is opened in text or binary mode, and it is limited to standard
C functionality. Keyed access is used when a VSAM file is used in keyed mode.
Keyed mode is discussed in detail in Using VSAM Files.
Any kind of VSAM file may be used via standard access.
Restrictions apply to particular file types; for example, a KSDS may not
be opened for output using standard I/O.
-
The library supports VSAM ESDS data sets as it
supports other sequentially organized file types. A VSAM ESDS can be accessed
as a text stream or a binary stream using standard I/O or UNIX style I/O.
A VSAM ESDS is not suitable for processing by the
"rel"
access method because it is not possible, given a character position,
to determine the RBA (relative byte address) of the record containing the
character.
-
The library supports VSAM KSDS data sets for input
only. Output is not supported for standard access because the C I/O routines
are unaware of the existence of keys and cannot guarantee that new records
are added in key order. Use keyed access instead. Also, file positioning with
fseek
or
fsetpos
is not supported, because records are ordered by key, and it is not
possible to transform a key value into the file position formats used for
other library file types. When a KSDS is used for input, records are always
presented to the program in key order, not in the order of their physical
appearance in the data set. Note that KSDS output is available when keyed
access is used.
-
The library supports VSAM RRDS data sets for access
via the
"rel"
access method only. Only
RRDS data sets with a fixed record length are supported. As with all other
files accessed via
"rel"
, file positioning
using
fseek
and
ftell
is fully supported.
-
The library supports VSAM linear data sets that
are also known as Data-in-Virtual (DIV) objects. You must access a DIV object
as a binary stream, and you must use the
"rel"
access method. As with all other files accessed via
"rel"
, file positioning using
fseek
and
ftell
is fully supported.
VSAM ESDS, KSDS, and RRDS files are processed using a
single RPL. Move mode is used to support spanned records. A VSAM file cannot
be opened for write only (open mode
"w"
)
unless it was defined by Access Method Services to be a reusable file.
FOOTNOTE 1:
This is the interpretation used in the SAS/C
implementation.
FOOTNOTE 2:
Two
other I/O packages are provided: CMS low-level I/O, defined for low-level
access to CMS disk files, and OS low-level I/O, which performs OS-style sequential
I/O. These forms of I/O are nonportable and are discussed in Chapter 2, "CMS
Low-Level I/O Functions," and Chapter 3, "OS/390 Low-Level I/O Functions,"
in
SAS/C Library Reference, Volume 2.
FOOTNOTE 3:
The library connects the use
of the UNIX low-level I/O interface and the ability to do seeking by character
number because UNIX documentation has traditionally stressed that seeking
by character number is not guaranteed when standard I/O is used. The UNIX Version 7 Programmer's Manual states that the file position used
by standard I/O "is measured in bytes only on UNIX; on some other systems
it is a magic cookie."
Copyright © 2001
by SAS Institute Inc., Cary, NC, USA. All rights reserved.