%KSCAN and %QKSCAN Functions

Search for a word that is specified by its position in a string.
Category: DBCS
Type: NLS macro function

Syntax

%KSCAN (argument, n<,charlist<,modifiers>> )
%QKSCAN (argument, n<,charlist<,modifiers>> )

Required Arguments

argument
is a character string or a text expression. If argument contains a special character or mnemonic operator (listed under Details), use %QKSCAN.
n
is an integer or a text expression that yields an integer, which specifies the position of the word to return. If n is greater than the number of words in argument, the functions return a null string. If n is negative, the functions scan from the end of the string toward the beginning, so that -1 selects the last word.
Optional Arguments

charlist
specifies an optional character expression that initializes a list of characters. This list determines which characters are used as the delimiters that separate words. The following rules apply:
  • By default, all characters in charlist are used as delimiters.
  • If you specify the K modifier in the modifier argument, then all characters that are not in charlist are used as delimiters.
Tip: You can add more characters to charlist by using other modifiers.
modifier
specifies a character constant, a variable, or an expression in which each non-blank character modifies the action of the %KSCAN function. Blanks are ignored. You can use the following characters as modifiers:
a or A adds alphabetic characters to the list of characters.
b or B scans backward from right to left instead of from left to right, regardless of the sign of n.
c or C adds control characters to the list of characters.
d or D adds digits to the list of characters.
f or F adds an underscore and English letters (that is, valid first characters in a SAS variable name using VALIDVARNAME=V7) to the list of characters.
g or G adds graphic characters to the list of characters. Graphic characters are characters that, when printed, produce an image on paper.
h or H adds a horizontal tab to the list of characters.
i or I ignores the case of the characters.
k or K treats all characters that are not in the list of characters as delimiters. That is, if K is specified, then characters that are in the list of characters are kept in the returned value rather than being omitted because they are delimiters. If K is not specified, then all characters that are in the list of characters are treated as delimiters.
l or L adds lowercase letters to the list of characters.
m or M specifies that multiple consecutive delimiters, and delimiters at the beginning or end of the string argument, refer to words that have a length of zero. If the M modifier is not specified, then multiple consecutive delimiters are treated as one delimiter, and delimiters at the beginning or end of the string argument are ignored.
n or N adds digits, an underscore, and English letters (that is, the characters that can appear in a SAS variable name using VALIDVARNAME=V7) to the list of characters.
o or O processes the charlist and modifier arguments only once, rather than every time the %KSCAN function is called. Using the O modifier in the DATA step (excluding WHERE clauses) or in the SQL procedure can make %KSCAN run faster when you call it in a loop where the charlist and modifier arguments do not change. The O modifier applies separately to each instance of the %KSCAN function in your SAS code. It does not cause all instances of the %KSCAN function to use the same delimiters and modifiers.
p or P adds punctuation marks to the list of characters.
q or Q ignores delimiters that are inside of substrings that are enclosed in quotation marks. If the value of the string argument contains unmatched quotation marks, then scanning from left to right produces different words than scanning from right to left.
r or R removes leading and trailing blanks from the word that %KSCAN returns. If you specify both the Q and R modifiers, then the %KSCAN function first removes leading and trailing blanks from the word. Then, if the word begins with a quotation mark, %KSCAN also removes one layer of quotation marks from the word.
s or S adds space characters to the list of characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed).
t or T trims trailing blanks from the string and charlist arguments. If you want to remove trailing blanks from only one character argument instead of both character arguments, then use the TRIM function instead of the %KSCAN function with the T modifier.
u or U adds uppercase letters to the list of characters.
w or W adds printable (writable) characters to the list of characters.
x or X adds hexadecimal characters to the list of characters.
Tip: If the modifier argument is a character constant, then enclose it in quotation marks. Specify multiple modifiers in a single set of quotation marks. A modifier argument can also be expressed as a character variable or expression.

Details

The %KSCAN and %QKSCAN functions search argument and return the nth word. A word is one or more characters separated by one or more delimiters.
%KSCAN does not mask special characters or mnemonic operators in its results, even when the argument was previously masked by a macro quoting function. %QKSCAN masks the following special characters and mnemonic operators in its results:
& % ' " ( ) + - * / < > = ¬ ^ ~ ; , # blank
AND OR NOT EQ NE LE LT GE GT IN
A delimiter is any of several characters that are used to separate words. You can specify the delimiters in the charlist and modifier arguments.
If you specify the Q modifier, then delimiters inside of substrings that are enclosed in quotation marks are ignored.
In the %KSCAN function, word refers to a substring that has all of the following characteristics:
  • is bounded on the left by a delimiter or the beginning of the string
  • is bounded on the right by a delimiter or the end of the string
  • contains no delimiters
A word can have a length of zero if there are delimiters at the beginning or end of the string or if the string contains two or more consecutive delimiters. However, the %KSCAN function ignores words that have a length of zero unless you specify the M modifier.
If you use the %KSCAN function with only two arguments, then the default delimiters depend on whether your computer uses ASCII or EBCDIC characters:
  • If your computer uses ASCII characters, then the default delimiters are as follows:
    blank ! $ % & ( ) * + , - . / ; < ^ ¦
    In ASCII environments that do not contain the ^ character, the %KSCAN function uses the ~ character instead.
  • If your computer uses EBCDIC characters, then the default delimiters are as follows:
    blank ! $ % & ( ) * + , - . / ; < ¬ | ¢ ¦
If you use the modifier argument without specifying any characters as delimiters, then the only delimiters that are used are those defined by the modifier argument. In this case, the lists of default delimiters for ASCII and EBCDIC environments are not used. In other words, modifiers add to the list of delimiters that are specified by the charlist argument. Modifiers do not add to the list of default delimiters.
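As a minimal sketch of the two-argument form, the following statements rely on the default delimiters (the period appears in both the ASCII and the EBCDIC list). The macro variable name ref and its value are hypothetical and serve only as an illustration.

```sas
/* ref is a hypothetical macro variable used only for illustration */
%let ref=sashelp.class;

/* Positive n counts words from the left; writes "First word: sashelp" */
%put First word: %kscan(&ref,1);

/* Negative n counts words from the right; writes "Last word: class" */
%put Last word: %kscan(&ref,-1);
```

Supplying a charlist argument replaces the default list, so a call such as %kscan(&ref,1,_) would no longer split this value at the period.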
If you specify the M modifier, then the number of words in a string is defined as one plus the number of delimiters in the string. However, if you specify the Q modifier, delimiters that are inside quotation marks are ignored.
If you specify the M modifier, then the %KSCAN function returns a word with a length of zero if one of the following conditions is true:
  • The string begins with a delimiter and you request the first word.
  • The string ends with a delimiter and you request the last word.
  • The string contains two consecutive delimiters and you request the word that is between the two delimiters.
If you do not specify the M modifier, then the number of words in a string is defined as the number of maximal substrings of consecutive nondelimiters. However, if you specify the Q modifier, delimiters that are inside quotation marks are ignored.
If you do not specify the M modifier, then the %KSCAN function does the following:
  • ignores delimiters at the beginning or end of the string
  • treats two or more consecutive delimiters as if they were a single delimiter
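The difference between the two modes can be sketched as follows. The macro variable s is hypothetical, the vertical bar is passed explicitly as the only delimiter, and the M modifier is written as a bare letter, which is the usual style in macro code but is an assumption here. The square brackets in the %PUT text only make a zero-length result visible.

```sas
/* s is a hypothetical value that contains two consecutive delimiters */
%let s=a||b;

/* Without M: consecutive delimiters act as one, so word 2 is b */
%put Without M: [%kscan(&s,2,%str(|))];

/* With M: a zero-length word sits between the two bars, so word 2 is empty */
%put With M: [%kscan(&s,2,%str(|),m)];
```

With the M modifier, the value b becomes the third word rather than the second.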
If the string contains no characters other than delimiters or if you specify a count that is greater in absolute value than the number of words in the string, then the %KSCAN function returns one of the following:
  • a single blank when you call the %KSCAN function from a DATA step
  • a string with a length of zero when you call the %KSCAN function from the macro processor
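Because a request past the last word yields a zero-length string in the macro processor, that result can serve as a loop terminator. The sketch below assumes a hypothetical macro named list_words and a blank-delimited argument; it is one way to walk through the words, not the only one.

```sas
/* list_words is a hypothetical helper macro, not part of SAS */
%macro list_words(text);
   %local i word;
   %let i=1;
   /* Use a blank as the only delimiter */
   %let word=%kscan(&text,&i,%str( ));
   /* Stop when %KSCAN returns a zero-length string */
   %do %while(%length(&word) > 0);
      %put Word &i: &word;
      %let i=%eval(&i+1);
      %let word=%kscan(&text,&i,%str( ));
   %end;
%mend list_words;

%list_words(red green blue)
```

This loop depends on the default behavior without the M modifier. With M, an empty word in the middle of the string would end the loop early.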
The %KSCAN function allows character arguments to be null. Null arguments are treated as character strings with a length of zero. Numeric arguments cannot be null.

Example: Comparing the Actions of %KSCAN and %QKSCAN

This example illustrates the actions of %KSCAN and %QKSCAN:
%macro a;
   aaaaaa
%mend a;
%macro b;
   bbbbbb
%mend b;
%macro c;
   cccccc
%mend c;
%let x=%nrstr(%a*%b*%c);
%put X: &x;
%put The third word in X, with KSCAN: %kscan(&x,3,*);
%put The third word in X, with QKSCAN: %qkscan(&x,3,*);
The %PUT statements write these lines to the log:
X: %a*%b*%c
The third word in X, with KSCAN: cccccc
The third word in X, with QKSCAN: %c