Optimization |
Overview
To define an inline function, add the __inline keyword to the function definition. The following is an example of a function definition that uses the __inline keyword:
    __inline double square(double x) { return x * x; }
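Standard C (C99 and later) spells this keyword `inline` rather than `__inline`. The sketch below assumes a C99 compiler, not SAS/C. As with `__inline` under the optimize option, the keyword is only a request: the C standard leaves it to the implementation whether a call is actually expanded inline.

```c
/* C99 spelling of the square example (an assumption: not SAS/C syntax).
   static gives the function internal linkage, so no separate external
   definition is needed anywhere else in the program. */
static inline double square(double x)
{
    return x * x;
}
```

A call such as `square(3.0)` may be replaced by `3.0 * 3.0` at the compiler's discretion; the result is the same either way.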
The __inline keyword causes a function to be inlined only if you specify the optimize option. If optimize is specified, whether or not __inline is honored depends on the setting of the inline optimizer option.
By default, the inline option is in effect whenever the optimizer is run. If you specify optimize, you must also specify the noinline option if you want the __inline keyword to be ignored.
Advantages of Using Inline Functions
Below is an example of a program that calls a function that can be inlined. The program produces a table of equivalent temperatures on the Fahrenheit and Celsius scales. The conversion from the Fahrenheit scale to the Celsius scale is done by the ftoc function.
    #include <stdio.h>

    static double ftoc(double);

    void main()
    {
        double fahr, celsius;

        puts("Fahrenheit Celsius");
        for (fahr = 0.0; fahr <= 300.0; fahr += 20.0) {
            celsius = ftoc(fahr);
            printf(" %4.0f %6.1f\n", fahr, celsius);
        }
    }

    static double ftoc(double fahr)
    {
        return (5.0 / 9.0) * (fahr - 32.0);
    }
As written, the program performs the following operations for each of the 16 iterations of the for loop:
fahr
ftoc
function
ftoc
main
celsius
.
Suppose ftoc is defined as an inline function by adding the __inline keyword, as follows:
    __inline static double ftoc(double fahr)
    {
        return (5.0 / 9.0) * (fahr - 32.0);
    }
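For readers without SAS/C, here is a minimal sketch of the same idea in standard C99, assuming `static inline` as a stand-in for `__inline static`. The hypothetical helper `matching_rows` walks the same 16 table rows and counts those for which calling ftoc and writing out its body by hand give identical results. That equivalence is what allows the compiler to substitute the function body for the call.

```c
/* C99 stand-in for the __inline static version of ftoc. */
static inline double ftoc(double fahr)
{
    return (5.0 / 9.0) * (fahr - 32.0);
}

/* Hypothetical helper: returns how many of the table's rows produce the
   same value through the call and through the hand-substituted body. */
int matching_rows(void)
{
    int rows = 0;
    double fahr;

    for (fahr = 0.0; fahr <= 300.0; fahr += 20.0) {
        double via_call = ftoc(fahr);
        double substituted = (5.0 / 9.0) * (fahr - 32.0);
        if (via_call == substituted)
            rows++;
    }
    return rows;
}
```

Every row should match, so `matching_rows()` is expected to return 16.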
When the program is compiled using the inline option, the compiler replaces the call to ftoc with the code for the ftoc function, as shown here:
    #include <stdio.h>

    void main()
    {
        double fahr, celsius;

        puts("Fahrenheit Celsius");
        for (fahr = 0.0; fahr <= 300.0; fahr += 20.0) {
            celsius = (5.0 / 9.0) * (fahr - 32.0);
            printf(" %4.0f %6.1f\n", fahr, celsius);
        }
    }
Note that the body of ftoc has been substituted into the main function, and the separate static definition of ftoc has been eliminated. Of the eight steps listed above, only two steps remain in the loop:
Disadvantages of Using Inline Functions
Compiler Options for Inlining
The inlocal option can be used to gain some of the benefits of inlining without using the __inline keyword. This option enables the inlining of all static functions that are called exactly once in the source program. By limiting inlining to single-call static functions, the inlocal option guarantees that the generated code for the program is no larger than it would be without inlining. In the preceding example, the same results can be obtained without the __inline keyword by compiling the program with the inlocal option.
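C source alone cannot show what the compiler does with inlocal, but it can show which functions fit the rule. In the hypothetical fragment below, `to_celsius` is static and called exactly once, so it is the kind of function inlocal inlines; `twice` is static but called twice, so inlocal alone leaves those calls in place. The function names are illustrative assumptions, not part of the example program above.

```c
/* Static and called exactly once below: fits the inlocal rule. */
static double to_celsius(double fahr)
{
    return (5.0 / 9.0) * (fahr - 32.0);
}

/* Static but called twice below: does not fit the rule, because
   inlining it at both call sites could enlarge the generated code. */
static double twice(double x)
{
    return 2.0 * x;
}

double report(double fahr)
{
    double celsius = to_celsius(fahr);     /* the only call to to_celsius */
    return twice(celsius) + twice(fahr);   /* two calls to twice */
}
```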
The complexity option provides another way to use inlining without the __inline keyword. If the inline option is in effect, the compiler automatically inlines small static and extern functions even if they are not defined with the __inline keyword. The complexity option defines what counts as small; it takes a value between 0 and 20, inclusive. For example, you may specify complexity(4) (-Kcomplexity=4 under UNIX System Services [USS]). This specifies that the compiler should automatically inline all functions whose complexity is no higher than 4.
Complexity is a measure of the number of discrete operations defined by the function. In general, the larger the value specified for complexity, the larger the functions that are automatically inlined. The ftoc function, described earlier, has a complexity value of 1. The following function, which multiplies the two square matrices a and b and returns the result in matrix c, has a complexity value of 8:
    void mmult(double c[][10], double a[][10], double b[][10])
    {
        int i, j, k;

        for (i = 0; i < 10; i++)
            for (j = 0; j < 10; j++) {
                c[i][j] = 0.0;
                for (k = 0; k < 10; k++)
                    c[i][j] = c[i][j] + a[i][k] * b[k][j];
            }
    }
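To see how mmult is called, here is a small sketch that multiplies a matrix by the identity matrix and checks that the input comes back unchanged. The matrix order is kept at 10 through a hypothetical `N` macro, and `identity_check` is a hypothetical helper, not part of the original example.

```c
#define N 10  /* matrix order used by the example */

void mmult(double c[][N], double a[][N], double b[][N])
{
    int i, j, k;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            c[i][j] = 0.0;
            for (k = 0; k < N; k++)
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
        }
}

/* Hypothetical check: a times the identity matrix must equal a.
   Returns 1 on success, 0 otherwise. */
int identity_check(void)
{
    double a[N][N], id[N][N], c[N][N];
    int i, j;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            a[i][j] = i * N + j;               /* arbitrary distinct values */
            id[i][j] = (i == j) ? 1.0 : 0.0;
        }
    mmult(c, a, id);
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            if (c[i][j] != a[i][j])
                return 0;
    return 1;
}
```

Each element of the product is one exact value plus zeros, so the comparison can safely use `!=`.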
The following function, a simple binary search function, has a complexity value of 11. It returns the index of the element in list that has the same value as target. num_els is the number of elements in the list array, and list is sorted alphabetically. If target is not found, the function returns -1.
    #include <string.h>

    int binsrch(char *target, char *list[], int num_els)
    {
        int where, hit;
        int low, high, current;

        low = 0;
        high = num_els - 1;      /* Index of the last element. */
        current = num_els / 2;   /* Find middle element of array. */
        hit = -1;                /* Target not found yet. */
        do {
            where = strcmp(target, list[current]);
            if (where < 0)       /* Target is in top half of list. */
                high = current - 1;
            else if (where > 0)  /* Target is in bottom half of list. */
                low = current + 1;
            else
                hit = current;   /* success */
            current = (high + low) / 2;
        } while (high >= low && hit < 0);
        return hit;
    }
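Starting high at num_els rather than num_els - 1 would let current reach list[num_els], one past the end of the array, when target sorts after every element. The sketch below repeats the search with that bound and adds an assumed guard for an empty list, which the do-while loop would otherwise index before testing. The sample word list used in the checks is illustrative only.

```c
#include <string.h>

/* binsrch with high starting at the last valid index, plus an assumed
   guard so that an empty list never gets indexed. */
int binsrch(char *target, char *list[], int num_els)
{
    int where, hit;
    int low, high, current;

    if (num_els <= 0)
        return -1;                 /* nothing to search */
    low = 0;
    high = num_els - 1;            /* last valid index */
    current = num_els / 2;         /* middle element */
    hit = -1;                      /* not found yet */
    do {
        where = strcmp(target, list[current]);
        if (where < 0)
            high = current - 1;    /* search the earlier part */
        else if (where > 0)
            low = current + 1;     /* search the later part */
        else
            hit = current;         /* found */
        current = (high + low) / 2;
    } while (high >= low && hit < 0);
    return hit;
}
```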
The optimizer default is complexity(0), which means that no functions are considered small enough to inline unless they are defined with the __inline keyword. Note that using a high value for complexity can lead to a substantial increase in the size of the generated code for the compilation.
As mentioned earlier, inline functions can call other inline functions or call themselves recursively. You can control how the compiler generates code for sequences of calls to inline functions and for recursive inline functions by using the depth and rdepth options.
The depth option specifies a limit on the number of nested inline function calls. If inline function f0 calls inline function f1, which calls inline function f2, and so on through inline function fn, then a single call to f0 can result in a significant increase in the size of the function calling f0.
The following program shows how the compiler inlines functions that call other inline functions. This program computes the length of the hypotenuse of a right triangle whose sides are of lengths a and b. The main function calls hypot, which in turn calls the square function.
    #include <stdio.h>
    #include <math.h>

    static double hypot(double, double);
    static double square(double);

    void main()
    {
        double a, b, c;

        for (a = 1.0; a < 10.0; a += 1.5) {
            b = a + 0.75;
            c = hypot(a, b);
            printf("a = %f, b = %f, c = %f\n", a, b, c);
        }
    }

    static double hypot(double a, double b)
    {
        return sqrt(square(a) + square(b));
    }

    static double square(double x)
    {
        return x * x;
    }
If both hypot and square are inline functions, then the compiler generates code for main as if the following program had been used:
    #include <stdio.h>
    #include <math.h>

    void main()
    {
        double a, b, c;

        for (a = 1.0; a < 10.0; a += 1.5) {
            b = a + 0.75;
            c = sqrt(a * a + b * b);
            printf("a = %f, b = %f, c = %f\n", a, b, c);
        }
    }
Note that the square function is inlined in hypot, which is then inlined in main. In this program, the maximum calling depth is 2.
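The same two-level nesting can be written in portable C99, assuming a compiler that honors `inline`. To keep the sketch free of the math library, the hypothetical `sum_of_squares` stops just before the sqrt call that hypot makes; main in the example would apply sqrt to its result. The different name also avoids clashing with the C99 library function hypot.

```c
/* Innermost function: reached at calling depth 2 from main. */
static inline double square(double x)
{
    return x * x;
}

/* Hypothetical stand-in for hypot without the sqrt: calling depth 1.
   Fully inlined, a call reduces to a * a + b * b. */
static inline double sum_of_squares(double a, double b)
{
    return square(a) + square(b);
}
```

With a depth limit of 1, a compiler following the rules described above would inline sum_of_squares into main but leave the inner calls to square as real calls.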
If a long sequence of inline function calls is defined, the size of the generated code for a compilation can increase greatly because of the number of functions being inlined. The depth option can be used to control the calling depth of inline functions. If the calling depth exceeds the number specified by the depth option, the compiler stops inlining and generates calls to the functions instead.
By default, the compiler uses a maximum calling depth of 3. The compiler accepts depth option values between 0 and 6, inclusive.
If the rdepth option is used, the compiler inlines recursive inline functions. The rdepth option specifies the maximum depth of recursive function calls to be inlined. The following program shows an example of this kind of inlining. The fib function computes the Fibonacci number for its argument.
    #include <stdio.h>

    __inline static int fib(int);

    void main()
    {
        int i;

        for (i = 0; i < 10; i++)
            printf("fib(%d) = %d\n", i, fib(i));
    }

    __inline static int fib(int i)
    {
        if (i < 2)
            return i;
        else
            return fib(i-1) + fib(i-2);
    }
If the program is compiled using rdepth(2), then the compiler generates code as if the following program had been used:
    #include <stdio.h>

    static int fib(int);

    void main()
    {
        int i;
        int result1, result2, result;   /* compiler temporary variables */

        for (i = 0; i < 10; i++) {
            if (i < 2)
                result = i;
            else {
                if ((i - 1) < 2)
                    result1 = i - 1;
                else
                    result1 = fib((i - 1) - 1) + fib((i - 1) - 2);
                if ((i - 2) < 2)
                    result2 = i - 2;
                else
                    result2 = fib((i - 2) - 1) + fib((i - 2) - 2);
                result = result1 + result2;
            }
            printf("fib(%d) = %d\n", i, result);
        }
    }

    static int fib(int i)
    {
        if (i < 2)
            return i;
        else
            return fib(i-1) + fib(i-2);
    }
The compiler has inlined code equivalent to the first two recursive calls to the fib function. This type of inlining can be very useful with recursive functions that have limited depth.
The maximum depth that can be specified using the rdepth option is 6. A large depth value can cause a large increase in the size of the generated code for the compilation. rdepth(1) is the default; that is, the compiler does not inline recursive functions.
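The rdepth(2) expansion is only valid because it computes the same values as the plain recursion. The sketch below checks that in portable C: it keeps the recursive fib and adds a hypothetical `fib_rdepth2` that hand-expands one level of recursion in the same shape as the compiler-generated main shown above.

```c
/* Plain recursive version, as in the example. */
static int fib(int i)
{
    return (i < 2) ? i : fib(i - 1) + fib(i - 2);
}

/* Hypothetical hand expansion mirroring the rdepth(2) output: the outer
   call and both first-level recursive calls are written out inline;
   deeper levels still call fib. */
static int fib_rdepth2(int i)
{
    int result1, result2;

    if (i < 2)
        return i;
    result1 = ((i - 1) < 2) ? i - 1 : fib((i - 1) - 1) + fib((i - 1) - 2);
    result2 = ((i - 2) < 2) ? i - 2 : fib((i - 2) - 1) + fib((i - 2) - 2);
    return result1 + result2;
}
```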
The __actual Keyword for Inline Functions
The compiler handles static and extern functions defined using the __inline keyword in the same way. However, keep in mind that the compiler generally does not create a callable function for an inline function. This is not a problem if the function is declared static, because all calls to the function are replaced with the inlined code for the function. However, extern inline functions are not callable from other compilations, because no callable copy of the function exists.
The __actual keyword can be used in the definition of an inline function. __actual implies __inline, but it also directs the compiler to create a callable copy of the function. An __actual extern function is, of course, callable from other compilations, just like any other extern function.
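Standard C has a loose analogue of __actual, sketched here under the assumption of a C99 (or later) compiler; it is not SAS/C syntax. In C99, a function defined with `inline` and never declared `extern` in a translation unit gets only an inline definition there, which provides no external, callable copy. Adding a file-scope `extern` declaration makes that translation unit supply the external definition, so other compilations can call the function. `cube` and `cube_ptr` are hypothetical names.

```c
/* Hypothetical inline function. On its own, this C99 inline
   definition would provide no external (callable) definition. */
inline double cube(double x)
{
    return x * x * x;
}

/* A file-scope declaration without `inline` makes the definition above
   an external definition in this translation unit, roughly the role
   __actual plays in SAS/C. */
extern double cube(double x);

/* Taking the function's address requires a real, callable copy. */
double (*cube_ptr)(double) = cube;
```

Only one translation unit in a program should contain the `extern` declaration; the others can include the inline definition alone.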
Functions that Cannot Be Inlined
The compiler cannot inline a function
Further Benefits of Inline Functions
There are additional benefits that occur when functions are inlined. Here is a simple example. Suppose the following program invokes the inline ftoc function given earlier:
    #include <stdio.h>

    void main()
    {
        double fahr, celsius;

        fahr = 212.0;
        celsius = ftoc(fahr);
        printf("%fF is %fC\n", fahr, celsius);
    }
After inlining, the program looks like this:
    #include <stdio.h>

    void main()
    {
        double fahr, celsius;

        fahr = 212.0;
        celsius = (5.0 / 9.0) * (fahr - 32.0);
        printf("%fF is %fC\n", fahr, celsius);
    }
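Because fahr is known to be 212.0 at this point, the optimizer can evaluate (5.0 / 9.0) * (fahr - 32.0) at compile time and eliminate both variables, so the final code is equivalent to:
    #include <stdio.h>

    void main()
    {
        printf("%fF is %fC\n", 212.0, 100.0);
    }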
Consider the problem of writing a function (called strlength) that behaves almost exactly like the Standard strlen function. The one difference is that, if the argument to strlength is NULL, then strlength returns 0. (The behavior of strlen is undefined if it is called with a NULL argument.)
A STRLENGTH macro is easily defined as follows:
    #define strlength(p) ((p == NULL) ? 0 : strlen(p))
This macro works as described, but with one drawback: its argument is evaluated twice, once in the test and once in the call to strlen. This is what is known as an unsafe macro. If it is used with an argument that has side effects, the result is usually incorrect. Suppose that the STRLENGTH macro is called as follows:
    p = "A TYPICAL STRING";
    n = strlength(p++);
The value assigned to n is 15, which is incorrect. (The intended result is 16.) This is because the macro expands to the statement shown here. Note that p has already been incremented once by the test, so strlen measures the string starting at its second character.
    n = ((p++ == NULL) ? 0 : strlen(p++));
However, it is easy to define an inline version of strlength that works correctly, as shown here:
    #include <string.h>

    __inline size_t strlength(char *p)
    {
        return (p == NULL) ? 0 : strlen(p);
    }
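The difference can be observed with any C99 compiler. In this sketch the macro is renamed to the hypothetical STRLENGTH_MACRO so that it can coexist with the function, and C99 `static inline` stands in for `__inline`. The conditional operator has a sequence point after its first operand, so the macro's double increment is well defined: the result is predictably wrong rather than undefined.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name for the unsafe macro from the text:
   its argument p is evaluated twice. */
#define STRLENGTH_MACRO(p) ((p == NULL) ? 0 : strlen(p))

/* C99 stand-in for the __inline version: p is evaluated exactly once. */
static inline size_t strlength(const char *p)
{
    return (p == NULL) ? 0 : strlen(p);
}
```

Called as `STRLENGTH_MACRO(p++)` on "A TYPICAL STRING", the macro yields 15 and advances p by two characters; `strlength(p++)` yields 16 and advances p by one.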
The pow function, part of the C library, computes the value of a raised to a power p, as expressed by the relation $r = a^{p}$. pow can be called with any real values for a and p. The following inline function, called power, replaces calls to pow by generating inline code for constant, nonnegative whole-number values of p. For p <= 16.0, the compiler generates code to compute the value directly. For p > 16.0, the compiler generates a loop to compute the result. If p is a variable, is negative, or has a nonzero fractional part, the compiler generates a call to the library pow function. In no case does the compiler generate code for more than one of these cases.
    #include <lcdef.h>                              /* [1] */
    #include <math.h>                               /* [2] */
    #undef pow                                      /* [3] */
    #define pow(x, p) power(x, p, isnumconst(p))    /* [4] */

    static __inline double power(double a, double p, int p_is_constant)
    {
        /* Test the exponent to see if it's                    */
        /*   - a compile-time integer constant                 */
        /*   - a whole number                                  */
        /*   - nonnegative                                     */
        if (p_is_constant && (int) p == p && (int) p >= 0) {   /* [5] */
            int n = p;

            /* Handle the cases for 0 <= n <= 4 directly.      */  /* [6] */
            if (n == 0)
                return 1.0;
            else if (n == 1)
                return a;
            else if (n == 2)
                return a * a;
            else if (n == 3)
                return a * a * a;
            else if (n == 4)
                return (a * a) * (a * a);

            /* Handle 5 <= n <= 16 by calling power recursively. */
            /* Note that power is invoked directly, specifying    */
            /* 1 as the value of the p_is_constant argument.      */
            /* This is because the isnumconst macro returns       */
            /* "false" for the expressions (n/2) and ((n+1)/2),   */
            /* which would defeat the optimization.               */
            else if (n <= 16)                                       /* [7] */
                return power(a, (double)(n/2), 1) *
                       power(a, (double)((n+1)/2), 1);

            /* Handle n > 16 via a loop. On each iteration, a     */
            /* holds the original a raised to the next power of 2 */
            /* (a, a**2, a**4, ...); the loop multiplies together */
            /* those powers whose corresponding bit is set in n.  */
            else {                                                  /* [8] */
                double prod = 1.0;
                for (; n != 0; a *= a, n >>= 1)
                    if (n & 1)
                        prod *= a;
                return prod;
            }
        }
        /* Finally, if p is negative or not a whole number,       */
        /* call the library pow function. The pow macro is         */
        /* defeated by surrounding the name "pow" with             */
        /* parentheses.                                            */
        else
            return (pow)(a, p);                                     /* [9] */
    }
The bracketed numbers in the code above key the explanation that follows:
[1] <lcdef.h> contains the definition of the isnumconst macro.
[2] <math.h> contains the declaration of the pow function.
[3] The #undef pow preprocessor directive undefines any macros that may be defined for the name pow.
[4] This #define directive defines a pow macro that causes the power inline function to be used instead of the library pow function. Note that the isnumconst macro is used to determine whether the second argument to power is a numeric constant. The result of isnumconst is passed as the third argument to power. (See Chapter 1, "Introduction to the SAS/C Library," in SAS/C Library Reference, Volume 1 for more information about isnumconst.)
[5] This if-test checks whether p is a numeric constant (as determined by isnumconst), a whole number, and greater than or equal to 0. If an if-test is a constant expression (as this one is), the compiler evaluates the expression and then generates code only for the then branch or the else branch, depending on the result. In this example, if the result is false (that is, p is not a constant whole number greater than or equal to 0), the compiler ignores the statements that compose the then branch and does not generate code to perform them. However, if the result of the expression is true, the compiler ignores the statements in the else branch and does not generate a call to the library's pow function.
[6] This if-test, as well as the next four, is also a constant expression. As above, the compiler generates code for the return statement only if the result of the expression is true. Therefore, if $0 \le n \le 4$, the compiler generates the appropriate return statement to compute the value of $a^{n}$ for n = 0, 1, 2, 3, or 4.
[7] If $5 \le n \le 16$, power is called recursively to evaluate $a^{n}$. Note that a program using this function should be compiled using the rdepth option with a recursion depth of 6.
[8] If $n > 16$, power uses a loop to compute $a^{n}$.
[9] If p is nonconstant, negative, or not a whole number, power calls the library's pow function to compute the result. Note that parentheses surround the name of the function. This defeats the macro definition for pow and ensures that a true function call is generated.
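The whole-number branches of power can be tried out in portable C. The sketch below assumes a C99 compiler and drops the isnumconst test, which is specific to SAS/C's <lcdef.h>, so the hypothetical `power_whole` simply takes an int exponent n >= 0 and applies the same case split: direct products for 0 through 4, recursive halving up to 16, and the square-and-multiply loop above 16.

```c
/* Hypothetical portable version of power's whole-number branches.
   Precondition: n >= 0. */
double power_whole(double a, int n)
{
    if (n == 0)
        return 1.0;
    else if (n == 1)
        return a;
    else if (n == 2)
        return a * a;
    else if (n == 3)
        return a * a * a;
    else if (n == 4)
        return (a * a) * (a * a);
    else if (n <= 16)
        /* Split n into n/2 and (n+1)/2; each half is at most 8. */
        return power_whole(a, n / 2) * power_whole(a, (n + 1) / 2);
    else {
        double prod = 1.0;
        /* a takes the values a, a**2, a**4, ...; multiply in the
           powers whose bit is set in n. */
        for (; n != 0; a *= a, n >>= 1)
            if (n & 1)
                prod *= a;
        return prod;
    }
}
```

For example, `power_whole(3.0, 15)` splits into exponents 7 and 8, which split again into 3, 4 and 4, 4, giving 14348907.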
Here are some examples of the use of the power function:
r = pow(a, 0);
    Since the if-test (n == 0) is true, the compiler generates code to perform the statement r = 1.0;
r = pow(a, 2);
    Since the if-test (n == 2) is true, the compiler generates code to perform the statement r = a * a;
r = pow(x, y);
    Since y is not a constant, the compiler generates code to call the library's pow function: r = (pow)(x, y);
r = pow(x, 0.75);
    Since 0.75 is not a whole number, the compiler again generates code to call the library's pow function: r = (pow)(x, 0.75);
r = pow(x, 15.0);
    Since 15.0 is greater than 4 but no greater than 16, power calls itself recursively. The compiler generates code equivalent to r = x * x * x * (x * x) * (x * x) * (x * x) * (x * x) * (x * x) * (x * x);
The computation above can be performed using only six floating-point multiplications. The following assembler language code illustrates the machine code instructions generated to compute $x^{15}$:
    LD   0,X     Floating-point register 0 (FPR0) = x.
    LD   2,X     FPR2 = x, as well.
    MDR  0,2     FPR0 = x * x = x**2
    MDR  2,0     FPR2 = x * x**2 = x**3
    MDR  0,0     FPR0 = x**2 * x**2 = x**4
    MDR  2,0     FPR2 = x**3 * x**4 = x**7
    MDR  0,0     FPR0 = x**4 * x**4 = x**8
    MDR  2,0     FPR2 = x**8 * x**7 = x**15
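The register sequence above follows the exponent chain 1, 2, 3, 4, 7, 8, 15. Here is a sketch of the same six multiplications in C, with the hypothetical local variables r0 and r2 standing in for FPR0 and FPR2:

```c
/* Hypothetical C rendering of the assembler listing for x**15. */
double pow15(double x)
{
    double r0 = x, r2 = x;   /* LD 0,X and LD 2,X */

    r0 = r0 * r2;            /* MDR 0,2: x**2  */
    r2 = r2 * r0;            /* MDR 2,0: x**3  */
    r0 = r0 * r0;            /* MDR 0,0: x**4  */
    r2 = r2 * r0;            /* MDR 2,0: x**7  */
    r0 = r0 * r0;            /* MDR 0,0: x**8  */
    r2 = r2 * r0;            /* MDR 2,0: x**15 */
    return r2;
}
```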
Note that programs that use the power function as shown should be compiled using these options: optimize, rdepth(6), and depth(3). Since the function is defined using the __inline keyword, the complexity option is not required. If the __inline keyword is not used, you need to specify the complexity option with a value of at least 16.
Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.