SAS/C C++ Development System User's Guide, Release 6.50


C++ Language Definition

This section describes the main features of the SAS/C C++ Development System implementation. The discussion does not attempt to teach you C++ and assumes you have access to the SAS/C Compiler and Library User's Guide, Fourth Edition.

The C++ language accepted by the SAS/C C++ Development System is generally compatible with that specified by Bjarne Stroustrup in The C++ Programming Language, Second Edition, except that exception handling is not supported and several ANSI C++ features have been added. C++ code written for previous releases of the SAS/C C++ Development System generally will be accepted by Release 6.50. However, this release has increased compatibility with the ANSI/ISO C++ draft standard by implementing new features and by tightening language rules for certain obscure or unsafe constructs. Release 6.50 of the SAS/C C++ Development System includes these new ANSI C++ features:

Because C++ is in general a superset of C, the C++ language includes many SAS/C features. You can find a detailed definition of the SAS/C implementation in "Language Definition" in the SAS/C Compiler and Library User's Guide. This section describes only those features that are different from SAS/C behavior or that are specific to the C++ environment.

If you have a C++ program that is compliant with AT&T C++ version 3.0, your code generally should work with the SAS/C C++ Development System.

If you have a C++ program that is compliant with AT&T C++ version 2.1, with the exception of some features that are now considered anachronistic, your code generally should work with the SAS/C C++ Development System. The main differences between AT&T C++ 2.1 and the SAS/C C++ Development System are the following:

For a complete list of which anachronisms are supported by the SAS/C C++ Development System, see Anachronisms.


Incompatibility with Previous Releases

Object code generated by Release 6.50 of the C++ translator is not compatible with object code generated by previous releases of the C++ translator and is not compatible with the C++ library for previous releases.


Environmental Elements

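This section describes four important environmental elements: special characters, storage class limits, numerical limits, and source file sequence number handling.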


Special characters

C++ uses a number of special characters. Many IBM mainframe terminals and printers do not supply all of these characters. The SAS/C C++ Development System provides two solutions to this problem:

The special character translation table enables each site to customize the representation of special characters. That is, sites can decide which hexadecimal bit pattern or patterns represent that character and so can choose a representation that is available on their terminals and printers.

The special characters that can be customized are braces, square brackets, circumflex, tilde, backslash, vertical bar, pound sign, and exclamation point. You should determine if your site has customized values for these characters and find out what the values are. Otherwise, the default representations listed in Default Representations for Special Characters are in effect. Consult your SAS Software Representative for details about customized values. Default Representations for Special Characters shows the two possible default representations for each character. These primary and alternate representations in columns two and three are EBCDIC equivalents of the characters in hexadecimal notation.

Remember that the alternate representations for characters apply only to C++ program source code and not to general file contents read by C++ programs.

Default Representations for Special Characters
                                        Source File Representation
Character                               Primary      Alternate
left brace {                            0xc0 {       0x8b {
right brace }                           0xd0 }       0x9b }
left bracket [                          0xad [       0xad [
right bracket ]                         0xbd ]       0xbd ]
circumflex ^ (exclusive or)             0x5f ¬       0x71 ^
tilde ~                                 0xa1 ~       0xa1 ~
backslash \                             0xe0 \       0xbe ≠
vertical bar | or ¦ (inclusive or)      0x4f |       0x6a ¦
pound sign #                            0x7b #       0x7b #
exclamation point !                     0x5a !       0x5a !


Storage class limits

The SAS/C Compiler imposes several limits on the sizes of various objects, and these limits may affect your C++ program after it is translated into C and compiled.

The total size of all objects declared in one translation with the same storage class is limited according to the particular storage class, as follows:

extern 16,777,215 (16M-1) bytes
static 8,388,607 (8M-1) bytes
auto 8,388,607 (8M-1) bytes
formal 65,535 (64K-1) bytes.

Individual objects can be up to 8 megabytes in size. The translator imposes no limit on array sizes.

The following types of programs generate very large CSECTS:

 

You should consider alternatives to using large amounts of static data. One alternative is to use the new operator for dynamic storage allocation. Storage allocated with the new operator is limited only by available memory.
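The following is a minimal sketch of that alternative, not code from the SAS/C library. The table size and the function name sum_large_table are hypothetical and were chosen only for illustration; the point is that the table lives in storage obtained with new rather than in a static object that counts against the limits above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical table size, chosen only for illustration.
const std::size_t kTableEntries = 1000000;

// Builds a large table with the new operator instead of declaring it
// static, fills it, and returns the sum of its entries so the caller
// can check the result. The storage is released before returning.
long sum_large_table()
{
    long *table = new long[kTableEntries];   // dynamic, not static, storage
    for (std::size_t i = 0; i < kTableEntries; ++i)
        table[i] = static_cast<long>(i % 10);

    long total = 0;
    for (std::size_t i = 0; i < kTableEntries; ++i)
        total += table[i];

    delete [] table;                         // array form matches new[]
    return total;
}
```

A caller would simply invoke sum_large_table(); because the array is released with delete [] before the function returns, no storage remains allocated afterward.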

Numerical limits

The numerical limits are what one would expect for a 32-bit, two's complement machine such as the IBM 370. Integral Type Sizes shows the size ranges for the integral types.

Integral Type Sizes
Type             Length in Bytes   Range
char             1                 0 to 255 (EBCDIC character set)
signed char      1                 -128 to 127
short            2                 -32,768 to 32,767
unsigned short   2                 0 to 65,535
int              4                 -2,147,483,648 to 2,147,483,647
unsigned int     4                 0 to 4,294,967,295
long             4                 -2,147,483,648 to 2,147,483,647
unsigned long    4                 0 to 4,294,967,295
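As an illustration of how the ranges in Integral Type Sizes might be used, the sketch below checks whether a value fits in some of the smaller types before it is narrowed. The helper names are hypothetical, and the bounds are copied from the table rather than taken from a header, so the checks describe this implementation's ranges even when the code is built elsewhere.

```cpp
#include <cassert>

// Bounds copied from the Integral Type Sizes table.
const long kShortMin  = -32768L;
const long kShortMax  = 32767L;
const long kUShortMax = 65535L;
const long kCharMax   = 255L;    // plain char is 0 to 255 in this implementation

// Each helper reports whether v lies in the documented range.
bool fits_in_short(long v)          { return v >= kShortMin && v <= kShortMax; }
bool fits_in_unsigned_short(long v) { return v >= 0 && v <= kUShortMax; }
bool fits_in_plain_char(long v)     { return v >= 0 && v <= kCharMax; }
```

Note that a negative value never fits in plain char under these ranges, which matters for code ported from implementations where char is signed.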

Float and Double Type Sizes shows the size ranges for float and double types.

Float and Double Type Sizes
Type          Length in Bytes   Range
float         4                 +/- 5.4E-79 to +/- 7.2E75
double        8                 +/- 5.4E-79 to +/- 7.2E75
long double   8                 +/- 5.4E-79 to +/- 7.2E75

 

Additional details on the implementation of the various data types can be found in "Compiler Processing and Code Generation Conventions," in the SAS/C Compiler and Library User's Guide.

Source file sequence number handling

The translator examines the first record in the source file and each #include file to determine if the file contains sequence numbers. Therefore, you can safely mix files with and without sequence numbers and use the translator on sequenced or nonsequenced files without worrying about specifying a sequence number parameter.

For a file with varying-length records, if the first four characters in the first record are alphanumeric and the following four characters are numeric, then the file is assumed to have sequence numbers.

For a file with fixed-length records, if the last four characters in the first record are all numeric and the preceding four characters are alphanumeric, then the file is assumed to have sequence numbers.

If a file is assumed to have sequence numbers, then the characters in each record at the sequence number position are ignored. This algorithm detects sequence numbers or their absence correctly for almost all files, regardless of record type or record length. Occasionally the algorithm may cause problems, as in the following examples:



Language Elements

Certain language elements, such as constants and predefined constants, deserve special explanation, as the translator treats them in accordance with the language described in The C++ Programming Language.

Constants

This section describes how the translator treats character constants and string literals.

Character constants
The translator produces a unique char value for certain alphabetic escape sequences that represent nongraphic characters. This char value corresponds to the hex values shown in column 2 of Escape Sequence Values.

Escape Sequence Values
Sequence Hex Value Meaning
\a 0x2f alert
\b 0x16 backspace
\f 0x0c form feed
\n 0x15 newline
\r 0x0d carriage return
\t 0x05 horizontal tab
\v 0x0b vertical tab
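For code that must produce or recognize these values on a system that does not use EBCDIC, the table can be captured in a small lookup. The function name ebcdic_escape_value is hypothetical, and the values are copied from Escape Sequence Values; this is a sketch, not part of the SAS/C library.

```cpp
#include <cassert>

// Returns the value the translator generates for the escape sequence
// formed by a backslash and 'letter', as listed in Escape Sequence
// Values, or -1 if the letter is not in the table.
int ebcdic_escape_value(char letter)
{
    switch (letter) {
    case 'a': return 0x2f;   // alert
    case 'b': return 0x16;   // backspace
    case 'f': return 0x0c;   // form feed
    case 'n': return 0x15;   // newline
    case 'r': return 0x0d;   // carriage return
    case 't': return 0x05;   // horizontal tab
    case 'v': return 0x0b;   // vertical tab
    default:  return -1;     // not an escape covered by the table
    }
}
```

For example, ebcdic_escape_value('n') yields 0x15, whereas '\n' on an ASCII system is 0x0a, so data exchanged between the two environments needs explicit conversion.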

String literals
By default, identically written string constants refer to the same storage location: only one copy of the string is generated by the translator. The NOSTringdup compiler option can be used to force a separate copy to be generated for each use of a string literal. However, modifying string constants is not recommended and renders a program nonreentrant. 

Note: Strings used to initialize char arrays (not char*) are not actually generated because they are shorthand for a comma-separated list of single-character constants.
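The shorthand described in the note can be seen by comparing the two forms directly. In this sketch the array names and the helper same_contents are hypothetical; both arrays end up with the same four elements, including the terminating null character.

```cpp
#include <cassert>
#include <cstring>

// A char array initialized from a string ...
char from_string[] = "abc";

// ... is shorthand for this list of single-character constants.
char from_list[] = { 'a', 'b', 'c', '\0' };

// Reports whether the two arrays have the same size and contents.
bool same_contents()
{
    return sizeof(from_string) == sizeof(from_list)
        && std::memcmp(from_string, from_list, sizeof(from_string)) == 0;
}
```

Because each array owns its own storage, modifying from_string does not touch any shared string constant.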

 

Predefined constants

The translator supports several predefined constants:
__cplusplus    __DATE__    __LINE__
c_plusplus     __FILE__    __TIME__

 

These macros are useful for generating diagnostic messages and inline program documentation. The following list explains the meaning of each macro:

__cplusplus
 expands to the decimal constant 1.

c_plusplus
 expands to the decimal constant 1.

__DATE__
 expands to the current date, in the form Mmm dd yyyy (for example, Jan 01 1990). Double quotes are a part of the expansion; no double quotes should surround __DATE__ in your source code.

__FILE__
 expands to a string literal that specifies the current filename. Double quotes are a part of the expansion; no double quotes should surround __FILE__ in your source code.

For the primary source file under MVS batch, __FILE__ expands to the data set name of the source file, if it is a disk data set, or the DDname allocated to the source file. For the primary source file under CMS, __FILE__ expands to "filename filetype", where filename is the CMS filename and filetype is the CMS filetype.

For a #include or header file, under both MVS and CMS, __FILE__ expands to the name that appears in the #include statement, including the angle brackets or double quotes as part of the string. Thus, for the following, __FILE__ expands to "\"myfile.h\"":

 #include "myfile.h"

For the following, __FILE__ expands to "<myfile h e>":

 #include <myfile h e>

__LINE__
 expands to an integer constant that is the relative number of the current source line within the file (primary source file or #include file) that contains it.

__TIME__
 expands to the current time, in the form hh:mm:ss (for example, 10:15:30). Double quotes are a part of the expansion; no double quotes should surround __TIME__ in your source code.

 

None of the above predefined macros can be undefined with the #undef directive. 
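As a sketch of the diagnostic use mentioned above, the following shows one way these macros might be combined. The TRACE macro and the function names are hypothetical and are not part of the SAS/C library; the example relies only on the expansions described in this section.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical diagnostic macro: prints the file and line on which it
// is used, followed by a message.
#define TRACE(msg) std::printf("%s(%d): %s\n", __FILE__, __LINE__, (msg))

// __DATE__ and __TIME__ expand to string literals, so adjacent
// literals are concatenated into one "Mmm dd yyyy hh:mm:ss" string.
const char *build_stamp()
{
    return __DATE__ " " __TIME__;
}

// Emits one trace line that identifies this source location.
void trace_example()
{
    TRACE("starting");
}
```

Because the quotes are part of each expansion, the macros are written without surrounding quotes, as in build_stamp() above.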

The translator also provides the following predefined macro names. Automatic predefinition of these names can be collectively suppressed by using the undef translator option. (Refer to Option Descriptions for more information on undef.) These macro names also can be undefined by the #undef preprocessor directive.

The following code shows their usage:
#define OSVS 1   // if translating under TSO
                 // or MVS batch
#define CMS 1    // if translating under CMS
#define I370 1   // indicates the SAS/C
                 // Compiler or the translator
#define DEBUG 1  // if the DEBug option is
                 // used
#define NDEBUG 1 // if the DEBug option is not used
 

A few of the predefined macros can only be undefined by the #undef preprocessor directive. They are not affected by the undef translator option. These macros are:
#define __COMPILER__ "SAS/C C++ 6.50B" // indicates
      // the current release
      // as a string
#define __I370__ 1   // indicates the SAS/C
      // Compiler or the translator
#define __SASC__ 650 // indicates the current
      // version as a number,
      // for example, 650
                 
 

Note: Because the translator is not a C compiler, the __STDC__ macro is not defined.
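Portable code can test these macros to select SAS/C-specific paths. The sketch below is illustrative only: the function names are hypothetical, and comparing __SASC__ against 650 assumes that the version number increases from release to release, which this section does not state.

```cpp
#include <cassert>
#include <cstring>

// Returns the release string when built with the SAS/C translator,
// and a fixed placeholder string otherwise.
const char *compiler_description()
{
#if defined(__SASC__)
    return __COMPILER__;
#else
    return "not SAS/C";
#endif
}

// Reports whether the code was built by Release 6.50 or later
// (assumes __SASC__ grows with each release; see the lead-in).
bool is_sasc_650_or_later()
{
#if defined(__SASC__)
    return __SASC__ >= 650;
#else
    return false;
#endif
}
```

Testing with defined(...) keeps the code valid on other implementations, where none of these names are predefined.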


Language Extensions

This section describes SAS/C extensions to the language described in The C++ Programming Language.

Note: Use of these extensions is likely to render a program nonportable.

For information on SAS/C extensions to the C language, such as the __asm keyword, the __alignmem and __noalignmem keywords, and keywords used in declarations of functions that are neither C++ nor C, see the SAS/C Compiler and Library User's Guide, Fourth Edition. Also refer to the SAS/C Compiler and Library User's Guide for a discussion of the implementation-defined behavior of the SAS/C Compiler.

Preprocessor extensions

Two #pragma directives are handled by the SAS/C C++ Development System directly:
#pragma linkage
#pragma map
 

These #pragma directives are described in the SAS/C Compiler and Library User's Guide. In C++ programs, these directives can be applied only to functions and variables that have extern "C" linkage (that is, they are declared in an extern "C" block or have extern "C" in their declaration).

The __ibmos SAS/C extension keyword is a simpler and more direct replacement for #pragma linkage. The __ibmos keyword is described in the SAS/C Compiler and Library User's Guide. AR370 is a simpler and more powerful replacement for #pragma map. The AR370 utility is described in the SAS/C Compiler and Library User's Guide.

All other #pragma directives are passed on directly to the output C file and are otherwise ignored by C++. 

SAS/C extension keywords

You can use the following SAS/C extension keywords in your C++ programs:
__asm         __local      __weak
__cobol       __pascal
__foreign     __pli
__fortran     __ref
__ibmos       __remote
 

Overloading on these SAS/C extension keywords is supported. The following example shows overloading error_trap to take both local and remote function pointers:
int error_trap(__local void(*f)());
int error_trap(__remote void(*f)());

Functions defined using one or more of the keywords __ibmos, __asm, or __ref must be written in assembler. Therefore, the translator assumes "C" linkage for these functions, even if extern "C" is not explicitly used. Similarly, __pli, __cobol, __fortran, __pascal, and __foreign functions have linkage appropriate for the language and therefore do not have C++ linkage. The main effect of this behavior is that overloading the following functions is not allowed:
__asm int myfunc(int);
__pli int myfunc(int*);

These functions cannot be overloaded because only one linkage version of a function that is not C++ is permitted.

For more information on SAS/C extension keywords, see the SAS/C Compiler and Library User's Guide.

Alternate forms for operators and tokens

C++ is traditionally implemented using the ASCII character set. The translator uses EBCDIC as its native character set because EBCDIC is the preferred character set under TSO, CMS, and MVS batch. Because some characters used by the C++ language are not normal EBCDIC characters (that is, they do not appear on many terminal keyboards), alternate representations are available. Also, for some characters, there is more than one similar EBCDIC character. The translator accepts either character.

Digraph Sequences for Special Characters lists alternate representations that the translator accepts (this set of digraphs is identical to the digraph set accepted by the SAS/C Compiler). The digraph option(s) chosen determines which alternate forms are used:

digraph option 1
turns on the new ISO digraph support.

digraph option 2
turns on SAS/C bracket digraph support, '(|' or '|)'.

digraph option 3
turns on all SAS/C digraphs but does not activate the new ISO digraphs unless option 1 is also activated.

See Option Descriptions for more information on digraph options.

Digraph Sequences for Special Characters
C++ Character            EBCDIC Value(s) (hex)   Alternate Forms (for use with digraph options 2, 3)   Alternate Forms (for use with digraph option 1)
[ (left bracket)         0xad                    (|                                                    <:
] (right bracket)        0xbd                    |)                                                    :>
{ (left brace)           0x8b, 0xc0              \( or (<                                              <%
} (right brace)          0x9b, 0xd0              \) or >)                                              %>
| (inclusive or)         0x4f, 0x6a              \!
~ (tilde)                0xa1
# (pound sign)           0x7b                                                                          %:
## (double pound sign)   0x7b 0x7b                                                                     %:%:
\ (backslash)            0xe0, 0xbe              (see below)

For all symbols except the backslash, substitute sequences are not replaced in string constants or character constants. For example, the string constant "<:" contains two characters, not a single left bracket. Contrast this behavior with the ANSI trigraphs, which are replaced in string and character constants.
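The sketch below illustrates both points using the ISO digraphs listed for digraph option 1. The function names are hypothetical. With the translator, these forms are recognized only when the corresponding digraph option is in effect; the SAS/C-specific forms for options 2 and 3 are not shown.

```cpp
#include <cassert>
#include <cstddef>

// Written with the digraphs <% %> and <: :> in place of { } and [ ].
int sum_first_three()
<%
    int values<:3:> = <% 1, 2, 3 %>;
    return values<:0:> + values<:1:> + values<:2:>;
%>

// Inside a string constant the two characters are kept as written,
// so the literal holds '<', ':', and the terminating null character.
std::size_t digraph_literal_size()
{
    return sizeof("<:");
}
```

Here sum_first_three() behaves exactly as if it had been written with braces and brackets, while digraph_literal_size() shows that the string constant is three bytes long rather than two.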

The backslash is a special case because it has meaning within string and character constants as well as within C++ statements. You can also customize the translator to accept an alternate single character for the backslash, as well as for other characters in Digraph Sequences for Special Characters. The default alternate representations are listed in Default Representations for Special Characters. See your SAS Software Representative for more information.

Embedded $ in identifiers

The dollar sign ($) can be used as an embedded character in identifiers. If the dollar sign is used in identifiers, the dollars translator option must be specified. Use of the dollar sign is not portable because the dollar sign is not part of the portable C++ character set. The dollar sign cannot be used as the first character in an identifier; such usage is reserved for the library. 

Floating-point constants in hexadecimal

An extended format for floating-point constants enables them to be specified in hexadecimal to indicate the exact bit pattern to be placed in memory. A hexadecimal double constant consists of the sequence 0.x, followed by 1 to 14 hexadecimal digits. If there are fewer than 14 digits, the number is extended to 14 digits on the right with 0s. A hexadecimal double constant defines the exact bit pattern to be used for the constant. For example, 0.x411 has the same value as 1.0. Use of this feature is nonportable.
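To see why 0.x411 equals 1.0, it helps to decode the digits by hand. The sketch below does so under stated assumptions: it uses the IBM System/370 hexadecimal floating-point layout (one sign bit, a 7-bit exponent biased by 64, then a hexadecimal fraction scaled by a power of 16), and it assumes the digits are left-aligned in the double so that the first two digits hold the sign and exponent. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Converts one hexadecimal digit to its value, or -1 if it is not one.
static int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes the digits that follow 0.x and stores the result in 'value'.
// Returns false if there are no digits, more than 14, or a bad digit.
bool decode_hex_double(const std::string &digits, double &value)
{
    if (digits.empty() || digits.size() > 14) return false;

    std::string padded = digits;
    padded.append(14 - digits.size(), '0');   // extend on the right with 0s

    for (std::size_t i = 0; i < padded.size(); ++i)
        if (hex_digit_value(padded[i]) < 0) return false;

    // First two digits: sign bit plus 7-bit exponent, biased by 64.
    int first_byte = hex_digit_value(padded[0]) * 16
                   + hex_digit_value(padded[1]);
    bool negative = (first_byte & 0x80) != 0;
    int exponent  = (first_byte & 0x7f) - 64;

    // Remaining digits: fraction, most significant hex digit first.
    double fraction = 0.0;
    double scale = 1.0 / 16.0;
    for (std::size_t i = 2; i < padded.size(); ++i) {
        fraction += hex_digit_value(padded[i]) * scale;
        scale /= 16.0;
    }

    value = std::ldexp(fraction, 4 * exponent);   // fraction * 16^exponent
    if (negative) value = -value;
    return true;
}
```

For the digits 411, the first byte 0x41 gives an exponent of 1 and the fraction is 1/16, so the value is 1/16 times 16, which is 1.0.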

Call-by-reference operator (@)

The @ operator is a language extension provided primarily to aid communication between C++ and other programs. 

In C++ (as in C), the normal argument-passing convention is to use call-by-value; that is, the value of an argument is passed. The normal IBM 370 (neither C nor C++) argument-passing conventions differ from this in two ways. First, arguments are passed by reference; each item in the parameter list is an argument address, not an argument value. Second, the last argument address in the list is usually flagged by setting the high-order bit.

One approach to the call-by-reference problem is to precede each function argument by the & operator, thereby passing the argument address rather than its value. For example, you can write asmcode(&x) rather than asmcode(x). This approach is not generally applicable because it is frequently necessary to pass constants or computed expressions, which are not valid operands of the address-of operator. The translator provides an option to solve this problem.

When the translator option at is specified, the at sign (@) is treated as an operator. The @ operator can be used only on an argument to a function call. The result of using it in any other context is undefined. The @ operator has the same syntax as the C ampersand (&) operator. In situations where the C & can be used, @ has the same meaning as &. In addition, @ can be used on values that are not lvalues, such as constants and expressions. In these cases, the value of @expr is the address of a temporary storage area to which the value of expr is copied. One special case for the @ operator is when its argument is an array name or a string literal. In this case, @array is different from &array. The latter still addresses the array, while @array addresses a pointer addressing the array. Use of @ is, of course, nonportable. Its use should be restricted to programs that call routines that are not C++ using call-by-reference.
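The effect of @ can be approximated in portable C++ by writing out the temporary explicitly. The following sketch is only an analogy: asmcode and first_char_via_pointer are hypothetical stand-ins for routines that expect addresses, and the code does not use the @ operator itself, which requires the at option.

```cpp
#include <cassert>

// Hypothetical stand-in for a routine that receives the address of
// its argument and returns the value stored there.
int asmcode(int *arg)
{
    return *arg;
}

// For an lvalue, @x means the same as &x.
int call_with_lvalue(int x)
{
    return asmcode(&x);
}

// For an expression, @(x + 1) behaves as if the value were first
// copied to temporary storage whose address is then passed.
int call_with_expression(int x)
{
    int temp = x + 1;
    return asmcode(&temp);
}

// Hypothetical routine that expects the address of a pointer.
char first_char_via_pointer(char **pp)
{
    return **pp;
}

// For an array, @name addresses a pointer that addresses the array,
// which is written out here with an explicit pointer variable.
char call_with_array()
{
    char name[] = "ABC";
    char *p = name;
    return first_char_via_pointer(&p);
}
```

The explicit temporaries make the cost visible: each use of @ on a non-lvalue implies one copy of the value before the call.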

When declaring a call by reference instead of using the @ notation, you may want to use the __asm or __ref keyword described in the SAS/C Compiler and Library User's Guide.

Nesting of #define

If the redef translator option is specified, multiple #define statements for the same symbol can appear in a source file. When a new #define statement is encountered for a symbol, the old definition is stacked but is restored if an #undef statement for the symbol occurs. For example, if the line
#define XYZ 12
 

is followed later by
#define XYZ 43
 

the new definition takes effect, but the old one is not forgotten. Then, when the translator encounters the following, the former definition (12) is restored:
#undef XYZ
 

To completely undefine XYZ, an additional #undef is required. Each #define must be matched by a corresponding #undef before the symbol is truly forgotten. Identical #define statements for a symbol (those permitted when redef is not specified) do not stack.
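The stacking rule can be modeled with a small table that keeps a stack of definitions per symbol. This is a simulation written for illustration, not the translator's implementation; the class MacroTable and its member names are hypothetical, and treating a definition identical to the current one as a no-op is this sketch's reading of the last sentence above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Models #define and #undef under the redef option: each symbol maps
// to a stack of definitions, and the newest definition is in effect.
class MacroTable {
public:
    void define(const std::string &name, const std::string &body)
    {
        std::vector<std::string> &stack = defs_[name];
        if (!stack.empty() && stack.back() == body)
            return;                    // identical definition does not stack
        stack.push_back(body);
    }

    void undef(const std::string &name)
    {
        Map::iterator it = defs_.find(name);
        if (it == defs_.end()) return; // nothing to undefine
        it->second.pop_back();         // restore the previous definition
        if (it->second.empty())
            defs_.erase(it);           // the symbol is now truly forgotten
    }

    // Stores the definition in effect in 'body'; returns false if the
    // symbol is currently undefined.
    bool lookup(const std::string &name, std::string &body) const
    {
        Map::const_iterator it = defs_.find(name);
        if (it == defs_.end()) return false;
        body = it->second.back();
        return true;
    }

private:
    typedef std::map<std::string, std::vector<std::string> > Map;
    Map defs_;
};
```

Defining XYZ as 12 and then as 43 leaves 43 in effect; one undef restores 12, and a second undef removes the symbol entirely, matching the sequence described above.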

Zero-length arrays

An array of length 0 can be declared as a member of a structure or class. No space is allocated for the array, but the following member is aligned on the boundary required for the array type. Zero-length arrays are useful for aligning members to particular boundaries (to match the format of external data, for example) and for allocating varying-length arrays following a structure. In the following structure definition, no space is allocated for member d, but the member b is aligned on a doubleword boundary:
struct ABC
{
   int a;
   double d[0];
   int b;
};
 

Zero-length arrays are not permitted in any other context.

__inline and __actual storage class modifiers

__inline is a storage class modifier. It can be used in the same places as a storage class specifier and can be declared in addition to a storage class specifier. If a function is declared as __inline and the module contains at least one definition of the function, the translator sees this as a recommendation that the function be inlined. If a function is declared as __inline and has external linkage, a real copy of the function is created so that other external functions can call it.

With the 6.50 release, if you use inline functions and have DEBUG turned off, the translator performs inlining of inline functions whether the optimize option is on or off. If DEBUG is turned on, the translator disables inlining. The optimize option is on by default.

__actual is also a storage class modifier. It can be specified with or without the __inline qualifier, but it implies __inline. __actual specifies that the translator should produce an actual (callable) copy of the function if the function has external linkage. If the function has internal linkage, the translator creates an actual function unless it does not need one.

For additional information, see the discussion of __inline and __actual in the SAS/C Compiler and Library User's Guide.

Note: The difference between the __inline modifier and the inline C++ keyword is that the inline keyword causes inline functions to behave as if they were declared static while __inline does not. In some cases, current ANSI C++ rules may treat the inline function as if it has external linkage.


Implementation-Defined Behavior

Implementation-defined behaviors are translator actions that are not explicitly defined by the language standard. For example, in The C++ Programming Language, Stroustrup leaves the range of values that can be accommodated by a double to the discretion of individual implementations. Each implementation is simply required to document the chosen behavior. Allowing implementation-defined behavior enables each vendor to implement the C++ language as efficiently as possible within the particular operating system and environment. This section describes the implementation-defined behaviors of the translator.

Much of the implementation-defined behavior of the translator corresponds to the implementation-defined behavior of the SAS/C Compiler, while some behaviors are specific to C++. The next two sections describe the implementation-defined behavior of the translator in detail.

Behaviors related to the SAS/C Compiler

The following list enumerates behaviors that are common to both the translator and the compiler, or that are similar but have small differences. Each entry gives a brief description of the behavior and, where necessary, a reference to the SAS/C Compiler and Library User's Guide.


C++-specific behaviors

Some implementation-defined behavior is specific to the C++ language. The following list enumerates these behaviors.

 

Initialization and termination

This section describes the order of initialization and termination of file-scope objects defined in C++. Initialization of an object consists of executing its initializer. This includes executing the object's constructor if it has one. Termination of an object consists of executing the object's destructor, if it has one. In general, objects are terminated in the reverse of the order that they are initialized, but this is not necessarily the case for objects in dynamically loaded modules.

When a program containing C++ code is started, file-scope objects defined in C++ translation units in the main load module are initialized in the reverse order of the translation unit's inclusion into the load module by COOL. (For more information on this topic, see INCLUDE Statement.) Within a translation unit, objects are initialized in the order that they are defined in the translation unit.

When the main program ends, either by calling exit or by returning from main, file-scope objects defined in C++ translation units in the main load module are terminated in the reverse order of how they were initialized.

Objects defined in C++ translation units in dynamically loaded modules are initialized when the module is loaded and terminated when the module is unloaded. Within a dynamically-loaded module, the order of initialization and termination is the same as for the main load module.
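The rule for a single translation unit can be observed with a pair of file-scope objects that record their own construction. This is a minimal sketch: the Tracer class and the other names are hypothetical, and the example shows only the within-unit ordering, not the ordering across translation units, which depends on how COOL includes them.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// The log is created on first use, so it exists before either Tracer
// is constructed and is still alive when they are terminated.
static std::string &construction_log()
{
    static std::string log;
    return log;
}

struct Tracer {
    explicit Tracer(const char *name) : name_(name)
    {
        construction_log() += name_;      // record initialization order
        construction_log() += ';';
    }
    ~Tracer()
    {
        std::printf("terminating %s\n", name_);
    }
    const char *name_;
};

// File-scope objects: initialized in the order of their definitions
// and terminated in the reverse order when the program ends.
static Tracer tracer_first("first");
static Tracer tracer_second("second");

// Returns the names recorded so far, in initialization order.
std::string construction_order()
{
    return construction_log();
}
```

When the program ends, the destructor messages appear with second before first, which is the reverse of the order recorded by construction_order().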


Anachronisms

The following list enumerates several features that a C++ implementation may provide to support old coding styles and features, and indicates which of them are supported by the SAS/C C++ translator.



Copyright © Tue Feb 10 12:11:23 EST 1998 by SAS Institute Inc., Cary, NC, USA. All rights reserved.