Porting UNIX Socket Applications to the SAS/C® Environment

Introduction

The SAS/C Socket Library provides the highest practical level of compatibility with the BSD UNIX socket library for MVS and CMS environments. Programs whose only UNIX dependencies are in the area of sockets or other UNIX features already supported by the SAS/C Library can be compiled and run with little or no modification.

Because the socket functions are integrated with the existing SAS/C Library and are not add-on features, many of the incompatibilities of other MVS and CMS socket implementations have been avoided. For example, no additional header files specific to MVS or CMS are required; errno, and not some other variable specific to MVS or CMS, holds socket function error codes; and the close, read, and write calls operate on both files and sockets, just as they do in a UNIX operating system.

There are still some areas where compatibility between MVS and CMS and the UNIX operating system is not possible. This chapter describes the areas of incompatibility that may cause problems when porting socket code from UNIX socket implementations to the SAS/C environment.

Integrated and Non-Integrated Sockets

Under the MVS/ESA 5.1 operating system, OpenEdition supports integrated sockets. This feature provides a TCP/IP socket interface that is integrated with OpenEdition support instead of being an interface to TCP/IP software implemented only in the run-time library. When you use integrated sockets, an open socket has an OpenEdition file descriptor, which can be used like any other OpenEdition file descriptor. For instance, unlike a non-integrated socket, an integrated socket remains open in a child process created by fork, or in a program invoked by an exec function. Thus, when integrated sockets are used, a higher degree of UNIX compatibility is available than when non-integrated sockets are used.

You must decide whether your application is going to use integrated or non-integrated sockets. For example, an application that may run on a system that does not support OpenEdition should use non-integrated sockets. The setsockimp function specifies whether integrated or non-integrated sockets are being used. This function must be called before any other socket-related functions are called. By default, integrated sockets are used with exec-linkage applications, and non-integrated sockets are used otherwise.

Socket Library Restrictions

While almost every socket-related BSD function is available in the SAS/C Library, not all of the traditional UNIX features of these functions are available. The function descriptions in Socket Function Reference list the features supported by each function. This section summarizes the most significant restrictions.

Socket Descriptors

With Release 6.00 of the SAS/C Compiler, socket descriptors are assigned from the same range and according to the same rules as UNIX file descriptors. This aids in the porting of socket applications, since many such applications depend on this particular assignment of socket numbers. If OpenEdition is installed and running, the maximum number of open sockets and hierarchical file system (HFS) files is set by the site; the default is 64. If OpenEdition is not installed or not active, the maximum number of open sockets is 256. Note that programs written for previous releases of SAS/C software, which assume that socket numbers range from 256 to 511, may need to be modified to accommodate UNIX compatible socket number assignment.
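Code that does not assume any particular range of socket numbers ports most easily. The following minimal sketch uses descriptors exactly as they were returned and derives the first argument of select from them. The helper name build_read_set is hypothetical, and the sketch assumes that the fd_set macros are available through <sys/select.h>; on some systems they are declared in <sys/types.h> instead.

```c
#include <sys/types.h>
#include <sys/select.h>

/*
 * Build a read set from descriptors exactly as socket() returned them.
 * No fixed range (such as 256 to 511) is assumed.
 * Returns the nfds argument for select(): the highest descriptor plus one,
 * or 0 if no usable descriptor was supplied.
 */
int build_read_set(const int *fds, int count, fd_set *set)
{
    int i;
    int max_fd = -1;

    FD_ZERO(set);
    for (i = 0; i < count; i++) {
        /* Skip invalid descriptors and those that do not fit in an fd_set. */
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            continue;
        FD_SET(fds[i], set);
        if (fds[i] > max_fd)
            max_fd = fds[i];
    }
    return max_fd + 1;
}
```

The value returned can be passed directly as the first argument of select, whatever numbers the library assigned to the sockets.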

Addressing Families

The BSD socket library design supports the use of more than one type of transport mechanism, each known as an addressing family. UNIX implementations usually support at least two addressing families: AF_INET and AF_UNIX. AF_INET uses TCP/IP to transport data. AF_UNIX transports the data using the UNIX file system.

With integrated sockets, either AF_INET or AF_UNIX can be used. With non-integrated sockets, only AF_INET can be used. Programs that use AF_UNIX can usually be modified to use AF_INET.
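One common way to modify an AF_UNIX program is to replace the file system pathname with a port on the loopback address. The following is a minimal sketch of that idea, not a prescribed method. The helper name make_loopback_addr and the choice of port are hypothetical, and the application must still agree on a port number between client and server.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Fill in an AF_INET address for the local host (127.0.0.1) so that it can
 * be passed to bind() or connect() in place of a struct sockaddr_un.
 * The port is supplied in host byte order.
 */
void make_loopback_addr(struct sockaddr_in *addr, unsigned short port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
}
```

The socket itself is then created with socket(AF_INET, SOCK_STREAM, 0) rather than socket(AF_UNIX, SOCK_STREAM, 0).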

Sockets

Many of the restrictions in the use of UNIX features are caused by the underlying TCP/IP implementation. These restrictions may vary, depending on the TCP/IP vendor and release. Vendor-specific restrictions affect the following:

socket types. Types other than SOCK_STREAM and SOCK_DGRAM may not be supported.
socket options used by the setsockopt and getsockopt functions.
fcntl commands.
ioctl commands.
errno values.

In addition, there are the following general restrictions:

Asynchronous I/O to sockets is not supported. (Non-blocking I/O is supported.)
The socketpair function is supported only with integrated sockets.
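Non-blocking mode is usually requested with fcntl. The following minimal sketch assumes that the underlying TCP/IP product accepts the F_GETFL and F_SETFL commands with the O_NONBLOCK flag; because fcntl commands are among the vendor-specific restrictions, this should be verified for each installation. The helper name set_nonblocking is hypothetical.

```c
#include <fcntl.h>

/*
 * Put a descriptor into non-blocking mode while preserving its other
 * status flags. Returns 0 on success or -1 on failure (errno is set by fcntl).
 */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

After a successful call, a read or recv that cannot complete immediately returns -1 with errno set to EWOULDBLOCK instead of waiting.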

Function Names

UNIX operating systems support very long external names. The MVS and CMS linking and library utilities restrict external names to eight characters. The SAS/C Compiler and COOL utility can map long external names into eight-character names, making the external MVS and CMS restrictions invisible in most cases.

The SAS/C Library also supports the #pragma map statement, which directs the compiler to change an external name in the source to a different name in the object file. Thus, long names are shortened, and names that are not lexically valid in C language can be generated. The socket library header files change long socket function names by including #pragma map. For example, the <netdb.h> header file contains the following statement:

 #pragma map (gethostbyname, "#GHBNM")

This statement changes the gethostbyname function to #GHBNM. The #pragma map statement enables you to use TCP/IP functions with long names in your source and does not require that you use the extended names option or the COOL utility.

#pragma map statements are already in the header files required for each function. You normally do not have to modify your source to accommodate long names.

If your program produces an unresolved external reference for a socket function containing a long name, first make sure that you have included the appropriate header files as listed in the description for each socket function in Socket Function Reference. The following header files are not always required by UNIX C compilers but are required by the SAS/C Compiler to resolve long names:

Include the <arpa/inet.h> header file with the inet_* functions, such as inet_addr. This is correct coding practice but is not required by UNIX C compilers. As a compatibility feature, the SAS/C Library file <netinet/in.h> includes the <arpa/inet.h> file.
Include the <netdb.h> header file with the gethostid and gethostname functions. This is not required by UNIX C compilers. To reduce incompatibilities caused by failure to include the <netdb.h> header file in existing source code, #pragma map statements for these functions are also available in the <sys/types.h> header file.
The <socket.h>, <netdb.h>, <resolv.h>, and <nameser.h> header files all contain #pragma map statements for the functions that require them. Most UNIX programs include these headers with the appropriate functions.

The functions in most programs ported from a UNIX operating system have long names. If a function in your program has a long name, use the extname compiler option and the COOL utility to compile and link your program. The effects of both the #pragma map statement and the extname option are not usually visible to the user. For information on the significance of these features during machine-level debugging, reading link-edit maps, and writing zaps, refer to Appendix 7, "Extended Names," in the SAS/C Compiler and Library User's Guide.

Header Filenames

UNIX header filenames are really pathnames that are relative to the /usr/include directory. In most cases, the headers reside directly in the /usr/include directory with no further subdirectories in the pathname. For example, the <netdb.h> header file resides in the /usr/include directory. MVS and CMS file structures do not include subdirectories. All angle-bracketed include files are in the SYSLIB concatenation under MVS or in the GLOBAL MACLIB concatenation under CMS. The SAS/C Compiler ignores subdirectories included in the filename. Specifying <sys/socket.h> appears the same to MVS and CMS systems as specifying <socket.h>.

Header files such as <arpa/nameser.h> and <sys/socket.h> are placed in an MVS partitioned data set or CMS macro library based on the last part of the filename, for example, socket.h. Because of this, a UNIX program that specifies a subdirectory in the header file pathname can work without modification. It is best to code the pathname even for programs intended for use with SAS/C software because they can be ported back to a UNIX operating system more easily and because future releases of the SAS/C Compiler may attach significance to these pathnames. The header files listed with the socket function descriptions in Socket Function Reference include subdirectories.
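The following minimal sketch shows the recommended style: every header is named with its UNIX subdirectory, and <arpa/inet.h> is included explicitly for inet_addr. The helper name parse_ipv4 is hypothetical and is not part of any library.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>   /* declares inet_addr; include it explicitly */

/*
 * Convert a dotted-decimal string to a network-order address.
 * Returns 0 on success or -1 if the text is not a valid address.
 * Note: inet_addr cannot distinguish "255.255.255.255" from an error.
 */
int parse_ipv4(const char *text, struct in_addr *out)
{
    out->s_addr = inet_addr(text);
    if (out->s_addr == INADDR_NONE)
        return -1;
    return 0;
}
```

Source written this way compiles unchanged on a UNIX system, where the subdirectories are significant.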

errno

When there is an error condition, most socket functions return a value of -1 and set errno to a symbolic value describing the nature of the error. The perror function prints a message that explains the error. The SAS/C Library adheres as closely as possible to symbolic UNIX errno values. However, except for specifically defined errno values, such as EWOULDBLOCK, programs may not receive exactly the same errno values as programs would in a particular implementation of the UNIX operating system. The message printed by the perror function may also differ.

Because errno is a macro and not a simple external variable, you should always declare it by including the <errno.h> header file.
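A minimal sketch of portable error handling follows. It includes <errno.h> rather than declaring errno, tests for the -1 return value, and copies errno before calling any other function. The helper name open_socket_checked is hypothetical.

```c
#include <errno.h>       /* never write "extern int errno;" */
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Create a socket and report any failure.
 * Returns the descriptor, or -1 with the error code stored in *saved_errno.
 * On success, *saved_errno is set to 0.
 */
int open_socket_checked(int family, int type, int *saved_errno)
{
    int s = socket(family, type, 0);

    if (s == -1) {
        *saved_errno = errno;   /* copy first: later calls may change errno */
        perror("socket");       /* message text may differ between systems */
    } else {
        *saved_errno = 0;
    }
    return s;
}
```

Portable code should compare the saved value against symbolic names such as EWOULDBLOCK and should not depend on the numeric value or on the exact perror text.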

Two external symbols, h_errno and _res, are defined parts of the network database and resolver interfaces. h_errno values are the same as those in common versions of the UNIX operating system, but the herror text may be different. As with errno, you cannot declare these symbols directly. Always declare them by including the appropriate header file.

BSD Library Dependencies

Many socket programs implicitly assume the presence of the BSD library. For example, the BSD function bcopy is widely used in socket programs even though it is not portable to UNIX System V. Because of the lack of acceptance of such routines outside of the BSD environment and the fact that the same functionality is often available using ANSI Standard routines, BSD string and utility functions have not been added to the SAS/C Library.

Many UNIX programs ported to the SAS/C Library already contain support for System V environments in which Berkeley string and utility routines are not available. These programs usually call ANSI Standard functions instead. Using ANSI Standard functions is the best means of porting UNIX programs to the SAS/C Library because common ANSI string functions are often built in, and because the code will be more portable to other environments.
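As an illustration, the common Berkeley routines map onto ANSI Standard functions as in the following minimal sketch. The PORT_ macro names are hypothetical and are chosen so that they do not collide with any existing declarations. Note that bcopy takes its source argument first, the reverse of memmove.

```c
#include <string.h>

/* bcopy(src, dst, n): argument order is reversed; memmove handles overlap. */
#define PORT_BCOPY(src, dst, n)  ((void) memmove((dst), (src), (n)))

/* bzero(dst, n): clear n bytes. */
#define PORT_BZERO(dst, n)       ((void) memset((dst), 0, (n)))

/* bcmp(a, b, n): 0 if the regions are identical, nonzero otherwise. */
#define PORT_BCMP(a, b, n)       (memcmp((a), (b), (n)) != 0)

/* index(s, c) and rindex(s, c): first and last occurrence of c in s. */
#define PORT_INDEX(s, c)         strchr((s), (c))
#define PORT_RINDEX(s, c)        strrchr((s), (c))
```

Replacing the Berkeley calls in the source with the ANSI functions directly gives the same result without the macros.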

To ease porting of code that relies on Berkeley string and utility routines, the SAS/C Usage Notes sample library contains the member BSDSTR, which includes sample source code for the following functions:

bcopy
bzero
bcmp
index
rindex
ffs

The BSD <strings.h> header file is also available to facilitate the compilation of programs that rely on Berkeley string and utility routines. By default, the <strings.h> header file does not define macros for the functions in the previous list because problems arise in compiling programs that contain direct declarations of the functions. If your program does not contain direct declarations of these functions, you can use the define compiler option to define _BSD_MACROS before the <strings.h> header file is included. Refer to Chapter 7, "Compiler Options," in the SAS/C Compiler and Library User's Guide, Fourth Edition for information on the define compiler option.

INETD and BSD Kernel Routine Dependencies

One of the greatest hurdles to overcome in porting some BSD socket programs is their dependence on BSD kernel routines, such as fork, that are not supported by the SAS/C Compiler (except under OpenEdition MVS). This level of dependency is greatest in BSD daemon programs called from INETD, the UNIX TCP/IP daemon.

Except under OpenEdition, SAS/C software does not support the following UNIX kernel routines that are commonly used in TCP/IP daemon processes and other UNIX programs:

fork: In the UNIX operating system, the fork system call creates another process with an identical address space. This enables sockets to be shared between parent and child processes. The fork function is available under OpenEdition, and UNIX socket behavior occurs if integrated sockets are specified. However, creating an identical address space this way is not possible under traditional MVS or CMS, although the ATTACH macro may be used under MVS to achieve similar results.
exec: Under UNIX, the exec system call loads a program from an ordinary, executable file onto the current process, replacing the current program. With OpenEdition, the exec family of functions may be used to create a process, and the UNIX socket behavior occurs if integrated sockets are specified. Under traditional MVS or CMS, the ISO/ANSI C system function sometimes can be used as an alternative to the exec routine, but the semantics and effects are different.
dup, dup2: Unlike UNIX style I/O, standard I/O is the most efficient and lowest level form of I/O under MVS and CMS because of the implementation-defined semantics of ISO/ANSI C. The looser semantics place standard I/O closer to native MVS and CMS I/O than to UNIX style I/O. The inverted relationship between UNIX style I/O and standard I/O inhibits dup implementation under traditional MVS and CMS. However, with OpenEdition, dup and dup2 are available, and UNIX socket behavior occurs if integrated sockets are specified.
socketpair, pipe: The socketpair and pipe calls are not useful without the fork system call.

Daemons created by INETD depend heavily on the UNIX environment. For example, they assume that dup calls have already been issued so that the socket corresponds to the stdin, stdout, and stderr file pointers. This correspondence relies on the way the fork system call handles file descriptors.

System programs that involve the INETD daemon must be redesigned for MVS or CMS. The SAS/C Library provides the givesocket, takesocket, and getclientid functions to allow a socket to be passed between cooperating processes in the absence of the fork system call. Refer to Socket Function Reference for more information on these functions.

Character Sets

EBCDIC-to-ASCII translation is one of the greatest sources of socket program incompatibility. When communicating with a program in almost any environment other than MVS and CMS, text must be translated from EBCDIC into ASCII. If all transmitted data were text, the SAS/C Library could translate text automatically to ASCII before sending the data, and it could translate the text to EBCDIC automatically when receiving data. Unfortunately, only the program knows which data are text and which data are binary. Therefore, the program must be responsible for the translation.

The SAS/C Library provides the htoncs and ntohcs routines to facilitate EBCDIC-to-ASCII translation. ASCII is the character set used in network text transmission. The htoncs and ntohcs routines are not portable to UNIX, but you can define them as null function-like macros for environments other than CMS and MVS. You can recompile these routines if you want to use a different EBCDIC-to-ASCII translation method.
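The null-macro approach can be sketched as follows. This is a minimal sketch under stated assumptions: htoncs and ntohcs are assumed to take and return a single character value, PORT_TARGET_SASC is a hypothetical build flag that you would define only when compiling for MVS or CMS, and text_to_network is a hypothetical helper. When the flag is defined, include the SAS/C header that declares the two routines instead of defining the macros.

```c
#include <stddef.h>

#ifndef PORT_TARGET_SASC          /* hypothetical flag: not MVS or CMS */
/* Host text is already ASCII here, so translation is a no-op. */
#define htoncs(c) (c)
#define ntohcs(c) (c)
#endif

/*
 * Translate the text portion of an outgoing message in place.
 * Call this only on bytes known to be text; binary fields such as
 * lengths or addresses must not be passed through it.
 */
void text_to_network(char *text, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        text[i] = (char) htoncs((unsigned char) text[i]);
}
```

A matching routine built on ntohcs would be applied to the text portion of each message after it is received.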

Note that, except when using the resolver (see the following section, "The Resolver," for more information), the SAS/C Library does not perform any translations from ASCII to EBCDIC.

The Resolver

In addition to the standard communication and network database routines in the UNIX environment, the SAS/C Library contains a complete implementation of the BSD resolver and provides the standard UNIX interface for resolver programming. This facilitates the writing of applications that communicate with Internet name servers. The resolver is compatible with the UNIX operating system because the routines are derived from the BSD network source. There are, however, three compatibility issues that should be considered:

ASCII-to-EBCDIC translation is performed automatically by the dn_expand function, and EBCDIC-to-ASCII translation is performed by the dn_comp function. These translations should resolve any ASCII-to-EBCDIC translation problems for domain names without requiring special code in the application. The SAS/C Library can translate automatically in this instance because it recognizes that the data are intended to be ASCII text.
Routines that are declared to be external in the BSD name server but that are not documented in the UNIX man pages (for example, the routines that print resolver debugging information) cannot be called directly in the SAS/C implementation.
The _res variable cannot be declared directly in a program because it is implemented as a macro in SAS/C software. Include the <resolv.h> header file for a definition for the _res variable.