SCAN Function

Returns the nth word from a character string.

Category: Character
Restriction: I18N Level 0 functions are designed for use with Single Byte Character Sets (SBCS) only.
Tip: The DBCS equivalent function is KSCAN.

Syntax

SCAN(string, count <, charlist <, modifier>>)

Required Arguments

string

specifies a character constant, variable, or expression.

count

is a nonzero numeric constant, variable, or expression that has an integer value that specifies the number of the word in the character string that you want SCAN to select. For example, a value of 1 indicates the first word, a value of 2 indicates the second word, and so on. The following rules apply:

  • If count is positive, SCAN counts words from left to right in the character string.
  • If count is negative, SCAN counts words from right to left in the character string.
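The two counting directions can be sketched as follows. This is a minimal illustration; the variable names and the sample string are assumptions made for the example.

```sas
data _null_;
   string = 'alpha beta gamma';
   second      = scan(string, 2);    /* second word, counting from the left  */
   second_last = scan(string, -2);   /* second word, counting from the right */
   last_word   = scan(string, -1);   /* first word from the right = last word */
   put second= second_last= last_word=;
run;
/* Log: second=beta second_last=beta last_word=gamma */
```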

Optional Arguments

charlist

specifies an optional character expression that initializes a list of characters. This list determines which characters are used as the delimiters that separate words. The following rules apply:

  • By default, all characters in charlist are used as delimiters.
  • If you specify the K modifier in the modifier argument, then all characters that are not in charlist are used as delimiters.
Tip: You can add more characters to charlist by using other modifiers.
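As a rough sketch of the two rules above, the same charlist can act either as the set of delimiters or, with the K modifier, as the set of characters to keep. The sample string and variable names are assumptions made for the example.

```sas
data _null_;
   string = 'red:green;blue';
   /* ':' and ';' are the only delimiters: words are red, green, blue */
   second = scan(string, 2, ':;');
   /* With K, every character NOT in ':;' is a delimiter,
      so the words are the kept characters ':' and ';' */
   kept = scan(string, 2, ':;', 'k');
   put second= kept=;
run;
/* Log: second=green kept=; */
```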

modifier

specifies a character constant, a variable, or an expression in which each non-blank character modifies the action of the SCAN function. Blanks are ignored. You can use the following characters as modifiers:

a or A adds alphabetic characters to the list of characters.
b or B scans backward from right to left instead of from left to right, regardless of the sign of the count argument.
c or C adds control characters to the list of characters.
d or D adds digits to the list of characters.
f or F adds an underscore and English letters (that is, valid first characters in a SAS variable name using VALIDVARNAME=V7) to the list of characters.
g or G adds graphic characters to the list of characters. Graphic characters are characters that, when printed, produce an image on paper.
h or H adds a horizontal tab to the list of characters.
i or I ignores the case of the characters.
k or K causes all characters that are not in the list of characters to be treated as delimiters. That is, if K is specified, then characters that are in the list of characters are kept in the returned value rather than being omitted because they are delimiters. If K is not specified, then all characters that are in the list of characters are treated as delimiters.
l or L adds lowercase letters to the list of characters.
m or M specifies that multiple consecutive delimiters, and delimiters at the beginning or end of the string argument, refer to words that have a length of zero. If the M modifier is not specified, then multiple consecutive delimiters are treated as one delimiter, and delimiters at the beginning or end of the string argument are ignored.
n or N adds digits, an underscore, and English letters (that is, the characters that can appear in a SAS variable name using VALIDVARNAME=V7) to the list of characters.
o or O processes the charlist and modifier arguments only once, rather than every time the SCAN function is called. Using the O modifier in the DATA step (excluding WHERE clauses) or in the SQL procedure can make SCAN run faster when you call it in a loop where the charlist and modifier arguments do not change. The O modifier applies separately to each instance of the SCAN function in your SAS code, and does not cause all instances of the SCAN function to use the same delimiters and modifiers.
p or P adds punctuation marks to the list of characters.
q or Q ignores delimiters that are inside of substrings that are enclosed in quotation marks. If the value of the string argument contains unmatched quotation marks, then scanning from left to right will produce different words than scanning from right to left.
r or R removes leading and trailing blanks from the word that SCAN returns. If you specify both the Q and R modifiers, then the SCAN function first removes leading and trailing blanks from the word. Then, if the word begins with a quotation mark, SCAN also removes one layer of quotation marks from the word.
s or S adds space characters to the list of characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed).
t or T trims trailing blanks from the string and charlist arguments. If you want to remove trailing blanks from only one character argument instead of both character arguments, then use the TRIM function instead of the SCAN function with the T modifier.
u or U adds uppercase letters to the list of characters.
w or W adds printable (writable) characters to the list of characters.
x or X adds hexadecimal characters to the list of characters.
Tip: If the modifier argument is a character constant, then enclose it in quotation marks. Specify multiple modifiers in a single set of quotation marks. A modifier argument can also be expressed as a character variable or expression.
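The following sketch shows a few ways of supplying modifiers: as a variable, as a constant, and as several modifiers in one quoted string. The sample strings and variable names are assumptions made for the example.

```sas
data _null_;
   string = 'oneXtwoxthree';
   modif  = 'i';                             /* modifier held in a variable  */
   exact   = scan(string, 2, 'x');           /* only lowercase x delimits    */
   anycase = scan(string, 2, 'x', modif);    /* I: x and X both delimit      */
   multi   = scan('ab12 cd', 2, , 'sd');     /* S and D: space characters
                                                and digits delimit           */
   put exact= anycase= multi=;
run;
/* Log: exact=three anycase=two multi=cd */
```

In the last call the charlist argument is omitted, so the S and D modifiers alone define the delimiters.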

Details

Definition of “Delimiter” and “Word”

A delimiter is any of several characters that are used to separate words. You can specify the delimiters in the charlist and modifier arguments.
If you specify the Q modifier, then delimiters inside of substrings that are enclosed in quotation marks are ignored.
In the SCAN function, “word” refers to a substring that has all of the following characteristics:
  • is bounded on the left by a delimiter or the beginning of the string
  • is bounded on the right by a delimiter or the end of the string
  • contains no delimiters
A word can have a length of zero if there are delimiters at the beginning or end of the string, or if the string contains two or more consecutive delimiters. However, the SCAN function ignores words that have a length of zero unless you specify the M modifier.
Note: The definition of “word” is the same in both the SCAN and COUNTW functions.
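Because SCAN and COUNTW share this definition, COUNTW offers a quick way to see how zero-length words are handled. The sketch below uses an assumed sample string with two consecutive commas.

```sas
data _null_;
   string = 'a,,b';
   n_default = countw(string, ',');        /* zero-length word is ignored */
   n_m       = countw(string, ',', 'm');   /* zero-length word is counted */
   put n_default= n_m=;
run;
/* Log: n_default=2 n_m=3 */
```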

Using Default Delimiters in ASCII and EBCDIC Environments

If you use the SCAN function with only two arguments, then the default delimiters depend on whether your computer uses ASCII or EBCDIC characters.
  • If your computer uses ASCII characters, then the default delimiters are as follows:
    blank ! $ % & ( ) * + , - . / ; < ^ |
    In ASCII environments that do not contain the ^ character, the SCAN function uses the ~ character instead.
  • If your computer uses EBCDIC characters, then the default delimiters are as follows:
    blank ! $ % & ( ) * + , - . / ; < ¬ | ¢
If you use the modifier argument without specifying any characters as delimiters, then the only delimiters that will be used are delimiters that are defined by the modifier argument. In this case, the lists of default delimiters for ASCII and EBCDIC environments are not used. In other words, modifiers add to the list of delimiters that are explicitly specified by the charlist argument. Modifiers do not add to the list of default delimiters.
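A small sketch of this difference, using an assumed sample string: with two arguments the comma is a default delimiter, but once a modifier is supplied without a charlist, only the characters that the modifier adds are delimiters.

```sas
data _null_;
   string = 'a,b c';
   w_default = scan(string, 1);          /* default list applies: ',' delimits */
   w_s_only  = scan(string, 1, , 's');   /* only space characters delimit      */
   put w_default= w_s_only=;
run;
/* Log: w_default=a w_s_only=a,b */
```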

The Length of the Result

In a DATA step, most variables have a fixed length. If the word returned by the SCAN function is assigned to a variable that has a fixed length greater than the length of the returned word, then the value of that variable will be padded with blanks. Macro variables have varying lengths and are not padded with blanks.
The maximum length of the word that is returned by the SCAN function depends on the environment from which it is called:
  • In a DATA step, if the SCAN function returns a value to a variable that has not yet been given a length, then that variable is given a length of 200 characters. If you need the SCAN function to assign to a variable a word that is longer than 200 characters, then you should explicitly specify the length of that variable.
    If you use the SCAN function in an expression that contains operators or other functions, a word that is returned by the SCAN function can have a length of up to 32,767 characters, except in a WHERE clause. In that case, the maximum length is 200 characters.
  • In the SQL procedure, or in a WHERE clause in any procedure, the maximum length of a word that is returned by the SCAN function is 200 characters.
  • In the macro processor, the maximum length of a word that is returned by the SCAN function is 65,534 characters.
The minimum length of the word that is returned by the SCAN function depends on whether the M modifier is specified. See Using the SCAN Function with the M Modifier. See also Using the SCAN Function without the M Modifier.
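The effect of the receiving variable's length can be checked with a sketch like the following. The variable names and sample string are assumptions made for the example. VLENGTH reports the length that SAS assigned to WORD at compile time, which you can compare with the rule stated above for your release.

```sas
data _null_;
   length short $5;
   string = 'international business';
   short = scan(string, 1);      /* stored value is truncated to 5 characters */
   word  = scan(string, 1);      /* WORD has no declared length               */
   word_len = vlength(word);     /* length that SAS assigned to WORD          */
   put short= word_len=;
run;
/* SHORT holds 'inter'; WORD_LEN shows the assigned default length */
```

Declaring the length explicitly, as is done for SHORT, avoids depending on the default.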

Using the SCAN Function with the M Modifier

If you specify the M modifier, then the number of words in a string is defined as one plus the number of delimiters in the string. However, if you specify the Q modifier, delimiters that are inside quotation marks are ignored.
If you specify the M modifier, then the SCAN function returns a word with a length of zero if one of the following conditions is true:
  • The string begins with a delimiter and you request the first word.
  • The string ends with a delimiter and you request the last word.
  • The string contains two consecutive delimiters and you request the word that is between the two delimiters.
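All three conditions appear in the following sketch, which uses an assumed sample string that begins and ends with a comma and contains two consecutive commas. LENGTHN returns 0 for a blank value, which makes the zero-length words visible.

```sas
data _null_;
   string = ',a,,b,';
   nwords = countw(string, ',', 'm');     /* 4 delimiters + 1 = 5 words */
   do count = 1 to nwords;
      word    = scan(string, count, ',', 'm');
      wordlen = lengthn(word);            /* 0 for a zero-length word   */
      put count= wordlen= word=;
   end;
run;
/* Words 1, 3, and 5 have wordlen=0; words 2 and 4 are a and b */
```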

Using the SCAN Function without the M Modifier

If you do not specify the M modifier, then the number of words in a string is defined as the number of maximal substrings of consecutive non-delimiters. However, if you specify the Q modifier, delimiters that are inside quotation marks are ignored.
If you do not specify the M modifier, then the SCAN function does the following:
  • ignores delimiters at the beginning or end of the string
  • treats two or more consecutive delimiters as if they were a single delimiter
If the string contains no characters other than delimiters, or if you specify a count that is greater in absolute value than the number of words in the string, then the SCAN function returns one of the following:
  • a single blank when you call the SCAN function from a DATA step
  • a string with a length of zero when you call the SCAN function from the macro processor
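The two cases can be sketched as follows. The sample strings are assumptions made for the example, and the macro call goes through %SYSFUNC as one possible way to reach SCAN from the macro processor.

```sas
data _null_;
   too_far     = scan('one two', 3);      /* only two words exist        */
   only_delims = scan(',,,', 1, ',');     /* string holds only delimiters */
   if too_far = ' ' and only_delims = ' ' then
      put 'Both calls returned a blank.';
run;

/* The brackets make a zero-length result visible in the log */
%put [%sysfunc(scan(one two, 3))];
```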

Using Null Arguments

The SCAN function allows character arguments to be null. Null arguments are treated as character strings with a length of zero. Numeric arguments cannot be null.

Examples

Example 1: Finding the First and Last Words in a String

The following example scans a string for the first and last words. Note the following:
  • A negative count instructs the SCAN function to scan from right to left.
  • Leading and trailing delimiters are ignored because the M modifier is not used.
  • In the last observation, all characters in the string are delimiters.
data firstlast;
   input String $60.;
   First_Word = scan(string, 1);
   Last_Word = scan(string, -1);
   datalines4;
Jack and Jill
& Bob & Carol & Ted & Alice &
Leonardo
! $ % & ( ) * + , - . / ;
;;;;
proc print data=firstlast;
run;
Results of Finding the First and Last Words in a String

Example 2: Finding All Words in a String without Using the M Modifier

The following example scans a string from left to right until the word that is returned is blank. Because the M modifier is not used, the SCAN function does not return any words that have a length of zero. Because blanks are included among the default delimiters, the SCAN function returns a blank word only when the count exceeds the number of words in the string. Therefore, the loop can be stopped when SCAN returns a blank word.
data all;
   length word $20;
   drop string;
   string = ' The quick brown fox jumps over the lazy dog.   ';
   do until(word=' ');
      count+1;
      word = scan(string, count);
      output;
   end;
run;
proc print data=all noobs;
run;
Results of Finding All Words without Using the M Modifier

Example 3: Finding All Words in a String by Using the M and O Modifiers

The following example shows the results of using the M modifier with a comma as a delimiter. With the M modifier, leading, trailing, and multiple consecutive delimiters cause the SCAN function to return words that have a length of zero. Therefore, you should not end the loop by testing for a blank word. Instead, you can use the COUNTW function with the same modifiers and delimiters to count the words in the string.
The O modifier is used for efficiency because the delimiters and modifiers are the same in every call to the SCAN and COUNTW functions.
data comma;
   keep count word;
   length word $30;
   string = ',leading, trailing,and multiple,,delimiters,,';
   delim = ',';
   modif = 'mo';
   nwords = countw(string, delim, modif);
   do count = 1 to nwords;
      word = scan(string, count, delim, modif);
      output;
   end;
run;
proc print data=comma noobs;
run;
Results of Finding All Words by Using the M and O Modifiers

Example 4: Using Comma-Separated Values, Substrings in Quotation Marks, and the O and R Modifiers

The following example uses the SCAN function with the O modifier and a comma as a delimiter, both with and without the R modifier.
The O modifier is used for efficiency because in each call of the SCAN or COUNTW function, the delimiters and modifiers do not change. The O modifier applies separately to each of the two instances of the SCAN function:
  • The first instance of the SCAN function uses the same delimiters and modifiers every time SCAN is called. Consequently, you can use the O modifier for this instance.
  • The second instance of the SCAN function uses the same delimiters and modifiers every time SCAN is called. Consequently, you can use the O modifier for this instance.
  • The first instance of the SCAN function does not use the same modifiers as the second instance, but this fact has no bearing on the use of the O modifier.
data test;
   keep count word word_r;
   length word word_r $30;
   string = 'He said, "She said, ""No!""", not "Yes!"';
   delim = ',';
   modif = 'oq';
   nwords = countw(string, delim, modif);
   do count = 1 to nwords;
      word   = scan(string, count, delim, modif);
      word_r = scan(string, count, delim, modif||'r');
      output;
   end;
run;
proc print data=test noobs;
run;
Results of Comma-Separated Values and Substrings in Quotation Marks

Example 5: Finding Substrings of Digits by Using the D and K Modifiers

The following example finds substrings of digits. The charlist argument is null. Consequently, the list of characters is initially empty. The D modifier adds digits to the list of characters. The K modifier treats all characters that are not in the list as delimiters. Therefore, all characters except digits are delimiters.
data digits;
   keep count digits;
   length digits $20;
   string = 'Call (800) 555-1234 now!';
   do until(digits = ' ');
      count+1;
      digits = scan(string, count, , 'dko');
      output;
   end;
run;
proc print data=digits noobs;
run;
Results of Finding Substrings of Digits by Using the D and K Modifiers

See Also

CALL Routines: