Functions and CALL Routines
Category: Character
Interaction: When invoked by the %SYSCALL macro statement, CALL SCAN removes the quotation marks from its arguments. For more information, see Using CALL Routines and the %SYSCALL Macro Statement.
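As a rough sketch of the interaction noted above, the following open-code macro statements call SCAN through %SYSCALL. The macro variable names (string, count, position, length) are illustrative assumptions, not names required by the routine. %SYSCALL takes macro variable names without ampersands, so position and length are initialized first to make sure that they exist.

```sas
/* Hypothetical macro variables used only for this illustration */
%let string   = Picasso Toulouse-Lautrec Turner;
%let count    = 2;
%let position = 0;
%let length   = 0;

/* Pass macro variable names (no ampersands) to the CALL routine */
%syscall scan(string, count, position, length);

/* The updated values are written back to the macro variables */
%put position=&position length=&length;
```

If the call behaves as it does in the DATA step examples later in this section, the second word (Toulouse) should be reported at position 9 with a length of 8.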
Syntax
CALL SCAN(<string>, count, position, length <, <charlist> <, <modifier(s)>>>);
count
is a nonzero numeric constant, variable, or expression that has an integer value that specifies the number of the word in the character string that you want the CALL SCAN routine to select. For example, a value of 1 indicates the first word, a value of 2 indicates the second word, and so on. The following rules apply:
If count is positive, then CALL SCAN counts words from left to right in the character string.
If count is negative, then CALL SCAN counts words from right to left in the character string.
position
specifies a numeric variable in which the position of the word is returned. If count exceeds the number of words in the string, then the value that is returned in position is zero. If count is zero or missing, then the value that is returned in position is missing.
length
specifies a numeric variable in which the length of the word is returned. If count exceeds the number of words in the string, then the value that is returned in length is zero. If count is zero or missing, then the value that is returned in length is missing.
charlist
specifies an optional character constant, variable, or expression that initializes a list of characters. This list determines which characters are used as the delimiters that separate words. The following rules apply:
By default, all characters in charlist are used as delimiters.
If you specify the K modifier in the modifier argument, then all characters that are not in charlist are used as delimiters.
Tip: You can add more characters to charlist by using other modifiers.
modifier
specifies a character constant, variable, or expression in which each non-blank character modifies the action of the CALL SCAN routine. Blanks are ignored. You can use the following characters as modifiers:
a or A
adds alphabetic characters to the list of characters.
b or B
scans backwards, from right to left instead of from left to right, regardless of the sign of the count argument.
c or C
adds control characters to the list of characters.
d or D
adds digits to the list of characters.
f or F
adds an underscore and English letters (that is, valid first characters in a SAS variable name using VALIDVARNAME=V7) to the list of characters.
g or G
adds graphic characters to the list of characters. Graphic characters are those that, when printed, produce an image on paper.
h or H
adds a horizontal tab to the list of characters.
i or I
ignores the case of the characters.
k or K
causes all characters that are not in the list of characters to be treated as delimiters. That is, if K is specified, then characters that are in the list of characters are kept in the returned value rather than being omitted because they are delimiters. If K is not specified, then all characters that are in the list of characters are treated as delimiters.
l or L
adds lowercase letters to the list of characters.
m or M
specifies that multiple consecutive delimiters, and delimiters at the beginning or end of the string argument, refer to words that have a length of zero. If the M modifier is not specified, then multiple consecutive delimiters are treated as one delimiter, and delimiters at the beginning or end of the string argument are ignored.
n or N
adds digits, an underscore, and English letters (that is, the characters that can appear in a SAS variable name using VALIDVARNAME=V7) to the list of characters.
o or O
processes the charlist and modifier arguments only once, rather than every time the CALL SCAN routine is called.
p or P
adds punctuation marks to the list of characters.
q or Q
ignores delimiters that are inside of substrings that are enclosed in quotation marks. If the value of the string argument contains unmatched quotation marks, then scanning from left to right will produce different words than scanning from right to left.
s or S
adds space characters to the list of characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed).
t or T
trims trailing blanks from the string and charlist arguments.
u or U
adds uppercase letters to the list of characters.
w or W
adds printable (writable) characters to the list of characters.
x or X
adds hexadecimal characters to the list of characters.
Tip: If the modifier argument is a character constant, then enclose it in quotation marks. Specify multiple modifiers in a single set of quotation marks. A modifier argument can also be expressed as a character variable or expression.
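The tip above can be illustrated with a small sketch. It uses the I modifier, once as a quoted constant and once as a character variable; the variable names (mods, pos_a, and so on) and the sample string are assumptions made for this illustration.

```sas
data _null_;
   length mods $ 8;
   string = '3x4X5';

   /* No modifier: only lowercase x is a delimiter, so word 2 is 4X5 */
   call scan(string, 2, pos_a, len_a, 'x');
   word_a = substrn(string, pos_a, len_a);

   /* Modifier as a quoted constant: I ignores case, so x and X both delimit */
   call scan(string, 2, pos_b, len_b, 'x', 'i');
   word_b = substrn(string, pos_b, len_b);

   /* Modifier as a character variable: trailing blanks in mods are ignored */
   mods = 'i';
   call scan(string, 2, pos_c, len_c, 'x', mods);
   word_c = substrn(string, pos_c, len_c);

   put word_a= / word_b= / word_c=;
run;
```

With the I modifier, the second word should shrink from 4X5 to 4, because the uppercase X is then also treated as a delimiter.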
Details
A delimiter is any of several characters that are used to separate words. You can specify the delimiters in the charlist and modifier arguments.
If you specify the Q modifier, then delimiters inside of substrings that are enclosed in quotation marks are ignored.
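A minimal sketch of the Q modifier, assuming a comma delimiter and a made-up sample string; the variable names are chosen only for this illustration.

```sas
data _null_;
   string = 'a,"b,c",d';

   /* Without Q: the comma inside the quotation marks splits the field */
   call scan(string, 2, pos_plain, len_plain, ',');
   plain = substrn(string, pos_plain, len_plain);

   /* With Q: the comma inside the quotation marks is not a delimiter */
   call scan(string, 2, pos_q, len_q, ',', 'q');
   quoted = substrn(string, pos_q, len_q);

   put plain= / quoted=;
run;
```

Both calls should report the second word at position 3. Only the length differs: 2 without Q and 5 with Q, because the quoted substring is kept whole.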
In the CALL SCAN routine, "word" refers to a substring that has all of the following characteristics:
is bounded on the left by a delimiter or the beginning of the string
is bounded on the right by a delimiter or the end of the string
contains no delimiters
A word can have a length of zero if there are delimiters at the beginning or end of the string, or if the string contains two or more consecutive delimiters. However, the CALL SCAN routine ignores words that have a length of zero unless you specify the M modifier.
If you use the CALL SCAN routine with only four arguments, then the default delimiters depend on whether your computer uses ASCII or EBCDIC characters.
If your computer uses ASCII characters, then the default delimiters are as follows:
blank ! $ % & ( ) * + , - . / ; < ^
In ASCII environments that do not contain the ^ character, the CALL SCAN routine uses the ~ character instead.
If your computer uses EBCDIC characters, then the default delimiters are as follows:
blank ! $ % & ( ) * + , - . / ; < ¬ | ¢
If you use the modifier argument without specifying any characters as delimiters, then the only delimiters that will be used are those that are defined by the modifier argument. In this case, the lists of default delimiters for ASCII and EBCDIC environments are not used. In other words, modifiers add to the list of delimiters that are explicitly specified by the charlist argument. Modifiers do not add to the list of default delimiters.
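The difference between default delimiters, an explicit charlist, and delimiters that come only from a modifier can be sketched as follows. The sample string and variable names are assumptions for this illustration, and the comments assume an ASCII environment.

```sas
data _null_;
   string = 'north-east wind';

   /* Four arguments: default delimiters apply, so the hyphen and the blank both separate words */
   call scan(string, 2, pos_d, len_d);
   w_default = substrn(string, pos_d, len_d);

   /* Explicit charlist: only the blank is a delimiter */
   call scan(string, 2, pos_b, len_b, ' ');
   w_blank = substrn(string, pos_b, len_b);

   /* Null charlist plus the S modifier: only space characters are delimiters */
   call scan(string, 2, pos_s, len_s, , 's');
   w_space = substrn(string, pos_s, len_s);

   put w_default= / w_blank= / w_space=;
run;
```

The first call should return east as the second word. The other two calls should return wind, because the hyphen is no longer a delimiter once the default list is not in use.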
If you specify the M modifier, then the number of words in a string is defined as one plus the number of delimiters in the string. However, if you specify the Q modifier, delimiters that are inside quotation marks are ignored.
If you specify the M modifier, the CALL SCAN routine returns a positive position and a length of zero if one of the following conditions is true:
The string begins with a delimiter and you request the first word.
The string ends with a delimiter and you request the last word.
The string contains two consecutive delimiters and you request the word that is between the two delimiters.
If you specify a count that is greater in absolute value than the number of words in the string, then the CALL SCAN routine returns a position and length of zero.
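The M modifier rules above can be tried with a short sketch. The sample string is an assumption chosen so that it has a leading delimiter, a trailing delimiter, and two consecutive delimiters.

```sas
data _null_;
   string = ',a,,b,';

   /* Four commas with the M modifier define five words; count 6 is past the last word */
   do count = 1 to 6;
      call scan(string, count, position, length, ',', 'm');
      word = substrn(string, position, length);
      put count= position= length= word=;
   end;
run;
```

Following those rules, counts 1, 3, and 5 should report a positive position with a length of zero, and count 6 should report a position and length of zero.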
If you do not specify the M modifier, then the number of words in a string is defined as the number of maximal substrings of consecutive non-delimiters. However, if you specify the Q modifier, delimiters that are inside quotation marks are ignored.
If you do not specify the M modifier, then the CALL SCAN routine does the following:
ignores delimiters at the beginning or end of the string
treats two or more consecutive delimiters as if they were a single delimiter
If the string contains no characters other than delimiters, or if you specify a count that is greater in absolute value than the number of words in the string, then the CALL SCAN routine returns a position and length of zero.
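A small sketch of these cases without the M modifier, using positive, negative, and out-of-range counts. The sample string and loop values are assumptions for this illustration.

```sas
data _null_;
   string = 'Jack and Jill';

   /* -1 selects the last word, 2 the second word; 4 and -4 exceed the three words available */
   do count = -1, 2, 4, -4;
      call scan(string, count, position, length);
      word = substrn(string, position, length);
      put count= position= length= word=;
   end;
run;
```

The first two iterations should return Jill (position 10, length 4) and and (position 6, length 3). The last two should return a position and length of zero.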
To find the designated word as a character string after calling the CALL SCAN routine, use the SUBSTRN function with the string, position, and length arguments:
substrn(string, position, length);
Because CALL SCAN can return a length of zero, using the SUBSTR function can cause an error.
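The point about SUBSTRN and SUBSTR can be sketched as follows. The variable names word and guarded, and the explicit length check before SUBSTR, are assumptions for this illustration rather than part of the routine.

```sas
data _null_;
   length word guarded $ 10;
   string = 'one two';

   /* There is no third word, so position and length come back as zero */
   call scan(string, 3, position, length);

   /* SUBSTRN accepts a zero length and returns an empty result */
   word = substrn(string, position, length);

   /* SUBSTR is called only when a word was actually found */
   if length > 0 then guarded = substr(string, position, length);

   put position= length= word= guarded=;
run;
```

Guarding the SUBSTR call is one way to avoid the invalid-argument condition. Using SUBSTRN, as the text recommends, removes the need for the check.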
The CALL SCAN routine allows character arguments to be null. Null arguments are treated as character strings with a length of zero. Numeric arguments cannot be null.
Examples
The following example shows how you can use the CALL SCAN routine to find the position and length of a word in a string.
options pageno=1 nodate ls=80 ps=64;

data artists;
   input string $60.;
   drop string;
   do i=1 to 99;
      call scan(string, i, position, length);
      if not position then leave;
      Name=substrn(string, position, length);
      output;
   end;
   datalines;
Picasso Toulouse-Lautrec Turner "Van Gogh" Velazquez
;

proc print data=artists;
run;
SAS Output: Scanning for a Word in a String
                       The SAS System                        1

Obs     i    position    length    Name

  1     1        1          7      Picasso
  2     2        9          8      Toulouse
  3     3       18          7      Lautrec
  4     4       26          6      Turner
  5     5       33          4      "Van
  6     6       38          5      Gogh"
  7     7       44          9      Velazquez
The following example scans a string for the first and last words. Note the following:
A negative count instructs the CALL SCAN routine to scan from right to left.
Leading and trailing delimiters are ignored because the M modifier is not used.
In the last observation, all characters in the string are delimiters, so no words are found.
options pageno=1 nodate ls=80 ps=64;

data firstlast;
   input String $60.;
   call scan(string, 1, First_Pos, First_Length);
   First_Word = substrn(string, First_Pos, First_Length);
   call scan(string, -1, Last_Pos, Last_Length);
   Last_Word = substrn(string, Last_Pos, Last_Length);
   datalines4;
Jack and Jill
& Bob & Carol & Ted & Alice &
Leonardo
! $ % & ( ) * + , - . / ;
;;;;

proc print data=firstlast;
   var First: Last:;
run;
Results of Finding the First and Last Words in a String
                              The SAS System                               1

       First_    First_    First_                   Last_     Last_
Obs     Pos      Length    Word        Last_Pos    Length     Word

  1      1         4       Jack           10          4       Jill
  2      3         3       Bob            23          5       Alice
  3      1         8       Leonardo        1          8       Leonardo
  4      0         0                       0          0
The following example scans a string from left to right until no more words are found. Because the M modifier is not used, the CALL SCAN routine does not return any words that have a length of zero. Because blanks are included among the default delimiters, the CALL SCAN routine returns a position or length of zero only when the count exceeds the number of words in the string. The loop can be stopped when the returned position is less than or equal to zero. It is safer to use an inequality comparison to end the loop, rather than to use a strict equality comparison with zero, in case an error causes the position to be missing. (In SAS, a missing value is considered to have a lesser value than any nonmissing value.)
options pageno=1 nodate ls=80 ps=64;

data all;
   length word $20;
   drop string;
   string = ' The quick brown fox jumps over the lazy dog. ';
   do until(position <= 0);
      count+1;
      call scan(string, count, position, length);
      word = substrn(string, position, length);
      output;
   end;
run;

proc print data=all noobs;
   var count position length word;
run;
Results of Finding All Words in a String without Using the M Modifier
                 The SAS System                  1

count    position    length    word

   1         2          3      The
   2         6          5      quick
   3        12          5      brown
   4        18          3      fox
   5        22          5      jumps
   6        28          4      over
   7        33          3      the
   8        37          4      lazy
   9        42          3      dog
  10         0          0
The following example shows the results of using the M modifier with a comma as a delimiter. With the M modifier, leading, trailing, and multiple consecutive delimiters cause the CALL SCAN routine to return words that have a length of zero.
The O modifier is used for efficiency because the delimiters and modifiers are the same in every call to the CALL SCAN routine.
options pageno=1 nodate ls=80 ps=64;

data comma;
   length word $30;
   string = ',leading, trailing,and multiple,,delimiters,,';
   do until(position <= 0);
      count + 1;
      call scan(string, count, position, length, ',', 'mo');
      word = substrn(string, position, length);
      output;
   end;
run;

proc print data=comma noobs;
   var count position length word;
run;
Results of Finding All Words in a String by Using the M and O Modifiers
                 The SAS System                  1

count    position    length    word

  1          1          0
  2          2          7      leading
  3         10         10      trailing
  4         21         12      and multiple
  5         34          0
  6         35         10      delimiters
  7         46          0
  8         47          0
  9          0          0
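The following example uses the CALL SCAN routine with the O and Q modifiers and a comma as a delimiter. Because the Q modifier is specified, commas that are inside quotation marks are not treated as delimiters.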
The O modifier is used for efficiency because in each call of the CALL SCAN routine, the delimiters and modifiers do not change.
options pageno=1 nodate ls=80 ps=64;

data test;
   length word word_r $30;
   string = 'He said, "She said, ""No!""", not "Yes!"';
   do until(position <= 0);
      count + 1;
      call scan(string, count, position, length, ',', 'oq');
      word = substrn(string, position, length);
      output;
   end;
run;

proc print data=test noobs;
   var count position length word;
run;
Results of Comma-Separated Values and Substrings in Quotation Marks
                 The SAS System                  1

count    position    length    word

  1          1          7      He said
  2          9         20      "She said, ""No!"""
  3         30         11      not "Yes!"
  4          0          0
The following example finds substrings of digits. The charlist argument is null, and consequently the list of characters is initially empty. The D modifier adds digits to the list of characters. The K modifier treats all characters that are not in the list as delimiters. Therefore, all characters except digits are delimiters.
options pageno=1 nodate ls=80 ps=64;

data digits;
   length digits $20;
   string = 'Call (800) 555-1234 now!';
   do until(position <= 0);
      count+1;
      call scan(string, count, position, length, , 'dko');
      digits = substrn(string, position, length);
      output;
   end;
run;

proc print data=digits noobs;
   var count position length digits;
run;
Results of Finding Substrings of Digits by Using the D and K Modifiers
                 The SAS System                  1

count    position    length    digits

  1          7          3      800
  2         12          3      555
  3         16          4      1234
  4          0          0
Copyright © 2011 by SAS Institute Inc., Cary, NC, USA. All rights reserved.