Function Categories

String Utility Functions

The C library provides functions for a wide range of string manipulations. There are three general categories of string utility functions:

functions that begin with the letter a
convert character strings to numbers.

functions that begin with the letters str
treat their arguments as strings that are terminated with a null character.

functions that begin with the letters mem
treat their arguments as byte strings in which a null character is not considered a terminator. The mem routines are always passed an explicit length because the data may contain no null characters, or more than one null character.
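The difference between the three categories can be sketched with a small buffer that contains an embedded null character. This is only an illustration in standard C; the names sample, str_view_length, str_view_find, mem_view_find, and a_view_convert are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Six bytes: 'a', 'b', '\0', 'c', 'd', and the terminating '\0'
   that the compiler appends to the string literal. */
static const char sample[] = "ab\0cd";

/* str view: strlen stops counting at the first null character. */
size_t str_view_length(void)
{
    return strlen(sample);
}

/* str view: strchr cannot see past the first null character.
   Returns the zero-based position of ch, or -1 if it is not found. */
long str_view_find(int ch)
{
    const char *hit = strchr(sample, ch);
    return hit ? (long)(hit - sample) : -1L;
}

/* mem view: memchr is given an explicit length, so the embedded
   null character does not end the search. */
long mem_view_find(int ch)
{
    const char *hit = memchr(sample, ch, sizeof sample);
    return hit ? (long)(hit - sample) : -1L;
}

/* a view: atoi converts a character string to a number. */
int a_view_convert(const char *digits)
{
    assert(digits != NULL);
    return atoi(digits);
}
```

With this buffer, the str functions treat the data as the two-character string "ab", while the mem function searches all six bytes.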

Two standard string functions that begin with the letters str, strcoll and strxfrm, pertain to localization and are discussed in Chapter 10, "Localization," in SAS/C Library Reference, Volume 2.

The following are string functions:
atof      convert a string to floating point
atoi      convert a string to integer
atol      convert a string to long
atoll     convert a string to long long
memchr    locate first occurrence of a character
memcmp    compare two blocks of memory
memcmpp   compare two blocks of memory with padding
memcpy    copy characters
memcpyp   copy characters (with padding)
memfil    fill a block of memory with a multicharacter string
memlwr    translate a memory block to lowercase
memmove   copy characters
memscan   scan a block of memory using a translate table
memscntb  build a translate table for use by memscan
memset    fill a block of memory with a single character
memupr    translate a memory block to uppercase
memxlt    translate a block of memory
stcpm     unanchored pattern match
stcpma    anchored pattern match
strcat    concatenate two null-terminated strings
strchr    locate first occurrence of a character in a string
strcmp    compare two null-terminated strings
strcpy    copy a null-terminated string
strcspn   locate the first occurrence of the first character in a set
strlen    compute length of null-terminated string
strlwr    convert a string from uppercase to lowercase
strncat   concatenate two null-terminated strings (limited)
strncmp   compare portions of two strings
strncpy   copy a limited portion of a null-terminated string
strpbrk   find first occurrence of character of set in string
strrchr   locate the last occurrence of a character in a string
strrcspn  locate the last character in a set
strrspn   locate the last character of a search set not in a given set
strsave   allocate a copy of a character string
strscan   scan a string using a translate table
strscntb  build a translate table for use by strscan
strspn    locate the first occurrence of the first character not in a set
strstr    locate first occurrence of a string within a string
strtod    convert a string to double
strtok    get a token from a string
strtol    convert a string to long integer
strtoll   convert a string to long long integer
strtoul   convert a string to an unsigned long integer
strtoull  convert a string to an unsigned long long integer
strupr    convert a string from lowercase to uppercase
strxlt    translate a character string
xltable   build character translation table


Terms Used in String Function Descriptions

These terms are used in the descriptions of string utility functions:

string
is zero or more contiguous characters terminated by a null byte. The first character of a string is at position 0. Functions that return the int or unsigned position of a character in a string compute the position beginning at 0.

character sequence
is a set of contiguous characters, not necessarily null-terminated.
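As a minimal sketch of the zero-based position convention, the following standard C helpers compute the position of a character in a string and in a character sequence. The names first_position and first_position_seq are hypothetical and serve only as illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zero-based position of the first occurrence of ch in the
   null-terminated string s, or -1 if ch does not occur. */
long first_position(const char *s, char ch)
{
    const char *hit;

    assert(s != NULL);
    hit = strchr(s, ch);
    return hit ? (long)(hit - s) : -1L;
}

/* The same computation for a character sequence, which need not be
   null-terminated, so its length n is passed explicitly. */
long first_position_seq(const char *seq, size_t n, char ch)
{
    const char *hit;

    assert(seq != NULL);
    hit = memchr(seq, ch, n);
    return hit ? (long)(hit - seq) : -1L;
}
```

For the string "hello", first_position reports position 0 for 'h' and position 2 for the first 'l'.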


Optimizing Your Use of memcmp, memcpy, and memset

You can optimize your use of the built-in functions memcmp, memcpy, and memset by controlling the type of the length argument. The compiler inspects the type of the length argument before the argument is converted to the type specified in the function prototype. If that type is one of the types listed in the table Types Acceptable as Length Arguments in Built-in Functions, the compiler generates only the code required for the maximum value of the type. The table shows the maximum values of these types. Note that these values can also be obtained from the <limits.h> header file.

In addition to size_t, you can use only the types shown in the table Types Acceptable as Length Arguments in Built-in Functions. If the length argument has any other type, the compiler issues a warning message.

Types Acceptable as Length Arguments in Built-in Functions
Type              Maximum Value
char              255
unsigned char     255
short             32767
signed short      32767
unsigned short    65535

If the type of the length argument is listed in the table Types Acceptable as Length Arguments in Built-in Functions, the function is never required to operate on more than 16 megabytes of data. Therefore, the compiler does not generate a call to the true (that is, separately linked) function to handle that case.

If the length argument is one of the char types, the compiler generates a MOVE instruction (which can handle up to 256 characters) rather than a MOVE LONG (which can handle up to 16 megabytes of characters). Because the MOVE LONG instruction is one of the slowest instructions in the IBM 370 instruction set, generating a MOVE saves execution time.


Getting the Most Efficient Code

To get the compiler to generate the most efficient code sequence for string functions, follow these guidelines:

  1. Use the built-in version of the function. Built-in functions are defined as macros in the appropriate header file. Always include <string.h> or <lcstring.h>, and do not use the function name in an #undef preprocessing directive.

  2. Declare or cast the length argument as one of the types in the table Types Acceptable as Length Arguments in Built-in Functions.

  3. Do not cast the length argument to a wider type. This defeats the compiler's inspection of the type.

You may want to define one or more macros that cast the length argument to a shorter type. For example, here is a program fragment that defines two such macros:

#include <string.h>

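   /* Copy up to 32767 characters. The arguments are parenthesized */
   /* so that an expression such as n + 1 is cast as a whole.      */
#define memcpys(to, from, length) memcpy((to), (from), (short)(length))

   /* Copy up to 255 characters. */
#define memcpyc(to, from, length) memcpy((to), (from), (char)(length))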
.
.
.
int strsz;          /* strsz is known to be less than 32K. */
char *dest, *src;

memcpys(dest, src, strsz);         /* casts strsz to short */
.
.
.
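Because a cast to a shorter type silently discards high-order bits, a length that exceeds the maximum value of the type is copied incorrectly. The following is a minimal sketch, in standard C, of helpers that check the range before narrowing the length. The names copy_upto_short and copy_upto_uchar are hypothetical. The sketch uses unsigned char rather than char because plain char may be signed on other compilers, and whether the compiler still applies the optimization when the cast appears inside a helper function is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: copy at most SHRT_MAX (32767) characters.
   The assertion catches lengths that the cast would truncate. */
void *copy_upto_short(void *to, const void *from, long length)
{
    assert(length >= 0 && length <= SHRT_MAX);
    return memcpy(to, from, (short)length);
}

/* Hypothetical helper: copy at most UCHAR_MAX (255) characters. */
void *copy_upto_uchar(void *to, const void *from, long length)
{
    assert(length >= 0 && length <= UCHAR_MAX);
    return memcpy(to, from, (unsigned char)length);
}
```

The assertions can be disabled with NDEBUG once the lengths passed by the program are known to be in range.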

Some recent IBM processors include a hardware feature called the Logical String Assist, which implements the C functions strlen, strcpy, strcmp, memchr, and strchr in hardware. To make use of this hardware feature, #define the symbol _USELSA before including <string.h> or <lcstring.h>. The resulting code will not execute on hardware that does not have the Logical String Assist feature installed.
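The ordering requirement can be sketched as follows. HAVE_LSA is a hypothetical build-time switch, not a symbol defined by the library; it is assumed here to be set only when every target machine has the Logical String Assist installed. The function name_length is likewise only an illustration.

```c
/* HAVE_LSA is a hypothetical switch supplied by the build.        */
/* _USELSA must be defined before <string.h> is included so that   */
/* the header can select the hardware-assisted built-in functions. */
#ifdef HAVE_LSA
#define _USELSA
#endif

#include <assert.h>
#include <stddef.h>
#include <string.h>

/* With _USELSA defined, this strlen call is a candidate for the   */
/* hardware-assisted version; otherwise the usual code is used.    */
size_t name_length(const char *name)
{
    assert(name != NULL);
    return strlen(name);
}
```

Keeping the definition behind a single switch makes it easier to build a separate version of the program for machines without the feature.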



Copyright © 2001 by SAS Institute Inc., Cary, NC, USA. All rights reserved.