COUNTW Function

Counts the number of words in a character string.

Category: Character

Syntax

COUNTW(<string> <, chars> <, modifier>)

Optional Arguments

string

specifies a character constant, variable, or expression in which words are counted.

chars

specifies an optional character constant, variable, or expression that initializes a list of characters. The characters in this list are the delimiters that separate words, provided that you do not use the K modifier in the modifier argument. If you specify the K modifier, then all characters that are not in this list are delimiters. You can add more characters to the list by using other modifiers.

modifier

specifies a character constant, variable, or expression in which each non-blank character modifies the action of the COUNTW function. The following characters, in uppercase or lowercase, can be used as modifiers:

blank is ignored.
a or A adds alphabetic characters to the list of characters.
b or B counts from right to left instead of from left to right. Right-to-left counting makes a difference only when you use the Q modifier and the string contains unbalanced quotation marks.
c or C adds control characters to the list of characters.
d or D adds digits to the list of characters.
f or F adds an underscore and English letters (that is, the characters that can begin a SAS variable name using VALIDVARNAME=V7) to the list of characters.
g or G adds graphic characters to the list of characters.
h or H adds a horizontal tab to the list of characters.
i or I ignores the case of the characters.
k or K causes all characters that are not in the list of characters to be treated as delimiters. If K is not specified, then all characters that are in the list of characters are treated as delimiters.
l or L adds lowercase letters to the list of characters.
m or M specifies that multiple consecutive delimiters, and delimiters at the beginning or end of the string argument, refer to words that have a length of zero. If the M modifier is not specified, then multiple consecutive delimiters are treated as one delimiter, and delimiters at the beginning or end of the string argument are ignored.
n or N adds digits, an underscore, and English letters (that is, the characters that can appear after the first character in a SAS variable name using VALIDVARNAME=V7) to the list of characters.
o or O processes the chars and modifier arguments only once, rather than every time the COUNTW function is called. Using the O modifier in the DATA step (excluding WHERE clauses), or in the SQL procedure, can make COUNTW run faster when you call it in a loop where chars and modifier arguments do not change.
p or P adds punctuation marks to the list of characters.
q or Q ignores delimiters that are inside of substrings that are enclosed in quotation marks. If the value of string contains unmatched quotation marks, then scanning from left to right will produce different words than scanning from right to left.
s or S adds space characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed) to the list of characters.
t or T trims trailing blanks from the string and chars arguments.
u or U adds uppercase letters to the list of characters.
w or W adds printable characters to the list of characters.
x or X adds hexadecimal characters to the list of characters.
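
As a rough illustration of how the chars list and the K modifier interact, the following sketch counts words in a short literal, first with digits as delimiters and then with the list inverted by K. The variable names and the sample string are made up for this illustration. The chars argument is left null so that the list contains only what the modifiers add.

```sas
data _null_;
   s = 'a1b2c3d';   /* hypothetical sample value */

   /* D adds digits to the list, so digits act as delimiters.
      Words: a, b, c, d */
   letters = countw(s, , 'd');

   /* K inverts the list: every character that is NOT a digit
      becomes a delimiter. Words: 1, 2, 3 */
   digits = countw(s, , 'kd');

   put letters= digits=;   /* letters=4 digits=3 */
run;
```

The same pattern should apply to the other list-building modifiers (A, L, U, P, S, and so on), alone or combined.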

Details

Definition of “Word”

In the COUNTW function, “word” refers to a substring that has all of the following characteristics:
  • is bounded on the left by a delimiter or the beginning of the string
  • is bounded on the right by a delimiter or the end of the string
  • contains no delimiters, except if you use the Q modifier and the delimiters are within substrings that have quotation marks
Note: The definition of “word” is the same in both the SCAN function and the COUNTW function.
Delimiter refers to any of several characters that you can specify to separate words.
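
The Q exception in the definition above can be seen in a small sketch. The sample string and variable names are hypothetical. A comma is the only delimiter, and one of the commas sits inside a quoted substring.

```sas
data _null_;
   /* hypothetical comma-separated value with a quoted field */
   s = 'alpha,"beta,gamma",delta';

   /* Without Q, every comma separates words:
      alpha | "beta | gamma" | delta */
   plain = countw(s, ',');

   /* With Q, the comma inside the quotation marks is not a delimiter:
      alpha | "beta,gamma" | delta */
   quoted = countw(s, ',', 'q');

   put plain= quoted=;   /* plain=4 quoted=3 */
run;
```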

Using the COUNTW Function in ASCII and EBCDIC Environments

If you call the COUNTW function with only the string argument, the default delimiters depend on whether your computer uses ASCII or EBCDIC characters.
  • If your computer uses ASCII characters, then the default delimiters are as follows:
    blank ! $ % & ( ) * + , - . / ; < ^ |
    In ASCII environments that do not contain the ^ character, the COUNTW function uses the ~ character instead.
  • If your computer uses EBCDIC characters, then the default delimiters are as follows:
    blank ! $ % & ( ) * + , - . / ; < ¬ | ¢
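
One way to avoid depending on the environment's default list is to pass the delimiters explicitly. The sketch below compares a call that uses only the string argument with a call that names its delimiters. The sample value and variable names are assumptions made for this illustration.

```sas
data _null_;
   s = 'size=10,20;30';   /* hypothetical sample value */

   /* Only the string argument: the default delimiter list applies.
      The comma and semicolon are in the default list, and the
      equal sign is not.
      Words: size=10 | 20 | 30 */
   dflt = countw(s);

   /* Explicit list: only the comma and semicolon are delimiters,
      regardless of the environment. */
   explicit = countw(s, ',;');

   put dflt= explicit=;   /* dflt=3 explicit=3 */
run;
```

Naming the delimiters makes the intent visible in the code and keeps the result independent of the default list.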

Using Null Arguments

The COUNTW function allows character arguments to be null. Null arguments are treated as character strings with a length of zero. Numeric arguments cannot be null.

Using the M Modifier

If you do not use the M modifier, then a word must contain at least one character. If you use the M modifier, then a word can have a length of zero. In this case, the number of words is one plus the number of delimiters in the string, not counting delimiters inside of strings that are enclosed in quotation marks when you use the Q modifier.
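
The "one plus the number of delimiters" rule can be checked with a short sketch. The sample string, which has consecutive and trailing commas, and the variable names are hypothetical.

```sas
data _null_;
   s = 'a,,b,';   /* three commas, no trailing blanks */

   /* Without M, consecutive delimiters count as one and the
      trailing delimiter is ignored. Words: a, b */
   no_m = countw(s, ',');

   /* With M, zero-length words are counted:
      a | (empty) | b | (empty), that is, 1 + 3 delimiters */
   with_m = countw(s, ',', 'm');

   put no_m= with_m=;   /* no_m=2 with_m=4 */
run;
```

This is the behavior you would typically want when counting fields in delimited data where empty fields are meaningful.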

Example

The following example shows how to use the COUNTW function with the M and P modifiers.
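data test;
   length default blanks mp 8;
   /* $CHAR60. preserves leading blanks in each input line */
   input string $char60.;
   /* string argument only: the default delimiters are used */
   default = countw(string);
   /* a blank is the only delimiter */
   blanks = countw(string, ' ');
   /* null chars argument, so that M and P are read as modifiers */
   mp = countw(string, , 'mp');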
   datalines;
The quick brown fox jumps over the lazy dog.
        Leading blanks
2+2=4
/unix/path/names/use/slashes
\Windows\Path\Names\Use\Backslashes
;
run;

proc print noobs data=test;
run;
Output from the COUNTW Function