Functions for NLS
SAS provides string functions and CALL routines that make it easy to manipulate character data. Many of the original SAS string functions assume that one character always occupies one byte. This assumption holds for data in a single-byte character set (SBCS). However, when some of these functions and CALL routines are used with data in a double-byte character set (DBCS) or multi-byte character set (MBCS), they often handle the data improperly and produce incorrect results.
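To see why the one-byte-per-character assumption fails, consider the following Python sketch. It is an analogy, not SAS code. The two helper functions are hypothetical, and UTF-8 stands in for a multi-byte session encoding. Taking the first two bytes of a three-character Japanese string yields only part of one character, which is not valid text. Taking the first two characters works as intended.

```python
def first_n_bytes(text: str, n: int, encoding: str = "utf-8") -> bytes:
    """Hypothetical byte-oriented helper: treats every byte as one character."""
    return text.encode(encoding)[:n]


def first_n_chars(text: str, n: int) -> str:
    """Hypothetical character-oriented helper: counts whole characters."""
    return text[:n]


word = "日本語"  # 3 characters, but 9 bytes in UTF-8

raw = first_n_bytes(word, 2)
print(raw)  # b'\xe6\x97' (two of the three bytes of the first character)

try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("The byte-based result is not valid UTF-8 text.")

print(first_n_chars(word, 2))  # 日本
```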
DBCS and MBCS encodings use a varying number of bytes to represent each character. MBCS is sometimes used as a synonym for DBCS.
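The following Python sketch illustrates the varying width. It is not SAS code. It counts the bytes per character in two encodings: Shift_JIS, a double-byte encoding, and UTF-8, a multi-byte encoding. The sample strings are arbitrary.

```python
def bytes_per_char(text: str, encoding: str) -> list[tuple[str, int]]:
    """Pair each character of `text` with the number of bytes it takes in `encoding`."""
    return [(ch, len(ch.encode(encoding))) for ch in text]


# Shift_JIS (a DBCS encoding): ASCII letters take 1 byte, kanji take 2.
print(bytes_per_char("A日", "shift_jis"))  # [('A', 1), ('日', 2)]

# UTF-8 (an MBCS encoding): a character takes 1 to 4 bytes.
print(bytes_per_char("Aé日😀", "utf-8"))  # [('A', 1), ('é', 2), ('日', 3), ('😀', 4)]
```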
To solve this problem, SAS introduced a set of string functions and CALL routines, called K functions, for the string manipulations in which DBCS and MBCS data must be handled carefully. This page shows the level of I18N compatibility for each SAS string function. I18N is an abbreviation for internationalization. Compatibility indicates whether a program that uses a particular string function can be adapted to different languages and locales without program changes.
To use the K functions properly, you need to understand the difference between byte-based and character-based offsets and lengths. Most K functions take character-based offsets and lengths. KSUBSTRB and KUPDATEB are exceptions because they work with byte positions. In an SBCS environment, a byte and a character are the same unit. In a DBCS or MBCS environment, the two units differ significantly, and programmers need to distinguish between them. You might need to change your program logic to use the K functions. Most K functions also require strings that are encoded in the current SAS session encoding.
Each string function is assigned an I18N level according to whether it can process SBCS, DBCS, or MBCS data. The levels are defined as follows:
I18N Level | Description |
---|---|
I18N Level 0 | This function is designed for SBCS data. Do not use this function to process DBCS or MBCS data. |
I18N Level 1 | Avoid this function, if possible, when you are using a non-English language. I18N Level 1 functions might not work correctly with DBCS or MBCS encodings under certain circumstances. |
I18N Level 2 | This function can be used for SBCS, DBCS, and MBCS (UTF-8) data. |
Function | Description | I18N Level 0 | I18N Level 1 | I18N Level 2 |
---|---|---|---|---|
ANYALNUM | Searches a character string for an alphanumeric character, and returns the first position at which the character is found. | | X | |
ANYALPHA | Searches a character string for an alphabetic character, and returns the first position at which the character is found. | | X | |
ANYCNTRL | Searches a character string for a control character, and returns the first position at which that character is found. | | X | |
ANYDIGIT | Searches a character string for a digit, and returns the first position at which the digit is found. | | X | |
ANYFIRST | Searches a character string for a character that is valid as the first character in a SAS variable name under VALIDVARNAME=V7, and returns the first position at which that character is found. | | X | |
ANYGRAPH | Searches a character string for a graphical character, and returns the first position at which that character is found. | | X | |
ANYLOWER | Searches a character string for a lowercase letter, and returns the first position at which the letter is found. | | X | |
ANYNAME | Searches a character string for a character that is valid in a SAS variable name under VALIDVARNAME=V7, and returns the first position at which that character is found. | | X | |
ANYPRINT | Searches a character string for a printable character, and returns the first position at which that character is found. | | X | |
ANYPUNCT | Searches a character string for a punctuation character, and returns the first position at which that character is found. | | X | |
ANYSPACE | Searches a character string for a white-space character (blank, horizontal and vertical tab, carriage return, line feed, and form feed), and returns the first position at which that character is found. | | X | |
ANYUPPER | Searches a character string for an uppercase letter, and returns the first position at which the letter is found. | | X | |
ANYXDIGIT | Searches a character string for a hexadecimal character that represents a digit, and returns the first position at which that character is found. | | X | |
BYTE | Returns one character in the ASCII or the EBCDIC collating sequence. | X | ||
CAT | Does not remove leading or trailing blanks, and returns a concatenated character string. | | X | |
CATS | Removes leading and trailing blanks, and returns a concatenated character string. | X | | |
CATT | Removes trailing blanks, and returns a concatenated character string. | X | | |
CATX | Removes leading and trailing blanks, inserts delimiters, and returns a character string. | X | | |
CHOOSEC | Returns a character value that represents the results of choosing from a list of arguments. | X | ||
CHOOSEN | Returns a numeric value that represents the results of choosing from a list of arguments. | X | ||
COALESCEC | Returns the first non-missing value from a list of character arguments. | X | ||
COLLATE | Returns a character string in ASCII or EBCDIC collating sequence. | X | ||
COMPARE | Returns the position of the leftmost character by which two strings differ, or returns 0 if there is no difference. | X | ||
COMPBL | Removes multiple blanks from a character string. | X | | |
COMPGED | Returns the generalized edit distance between two strings. | X | ||
COMPLEV | Returns the Levenshtein edit distance between two strings. | X | ||
COMPRESS | Returns a character string with specified characters removed from the original string. | X | ||
COUNT | Counts the number of times that a specified substring appears within a character string. | X | ||
COUNTC | Counts the number of characters in a string that appear or do not appear in a list of characters. | X | ||
DEQUOTE | Removes matching quotation marks from a character string that begins with a quotation mark, and deletes all characters to the right of the closing quotation mark. | X | ||
FIND | Searches for a specific substring of characters within a character string. | X | ||
FINDC | Searches a string for any character in a list of characters. | X | ||
HTMLDECODE | Decodes a string that contains HTML numeric character references or HTML character entity references, and returns the decoded string. | X | ||
HTMLENCODE | Encodes characters using HTML character entity references, and returns the encoded string. | X | ||
IFC | Returns a character value based on whether an expression is true, false, or missing. | X | ||
IFN | Returns a numeric value based on whether an expression is true, false, or missing. | X | ||
INDEX | Searches a character expression for a string of characters, and returns the position of the string's first character for the first occurrence of the string. | X | ||
INDEXC | Searches a character expression for any of the specified characters, and returns the position of that character. | X | ||
INDEXW | Searches a character expression for a string that is specified as a word, and returns the position of the first character in the word. | X | | |
KCOMPARE | Returns the result of a comparison of character expressions. | X | ||
KCOMPRESS | Removes specified characters from a character expression. | X | ||
KCOUNT | Returns the number of double-byte characters in an expression. | X | ||
KCVT | Converts data from one encoding to another. | X | ||
KINDEX | Searches a character expression for a string of characters. | X | ||
KINDEXC | Searches a character expression for specified characters. | X | ||
KLEFT | Left-aligns a character expression by removing unnecessary leading DBCS blanks and SO-SI. | X | ||
KLENGTH | Returns the length of an argument. | X | ||
KLOWCASE | Converts all letters in an argument to lowercase. | X | ||
KREVERSE | Reverses a character expression. | X | ||
KRIGHT | Right-aligns a character expression by trimming trailing DBCS blanks and SO-SI. | X | ||
KSCAN | Selects a specified word from a character expression. | X | ||
KSTRCAT | Concatenates two or more character expressions. | X | ||
KSUBSTR | Extracts a substring from an argument. | X | ||
KSUBSTRB | Extracts a substring from an argument according to the byte position of the substring in the argument. | X | ||
KTRANSLATE | Replaces specific characters in a character expression. | X | ||
KTRIM | Removes trailing DBCS blanks and SO-SI from character expressions. | X | ||
KTRUNCATE | Truncates a numeric value to a specified length. | X | ||
KUPCASE | Converts all letters in an argument to uppercase. | X | ||
KUPDATE | Inserts, deletes, and replaces character value contents. | X | ||
KUPDATEB | Inserts, deletes, and replaces the contents of the character value according to the byte position of the character value in the argument. | X | ||
KVERIFY | Returns the position of the first character that is unique to an expression. | X | ||
LEFT | Left-aligns a character string. | X | ||
LENGTH | Returns the length of a non-blank character string, excluding trailing blanks, and returns 1 for a blank character string. | X | ||
LENGTHC | Returns the length of a character string, including trailing blanks. | X | ||
LENGTHM | Returns the amount of memory (in bytes) that is allocated for a character string. | X | ||
LENGTHN | Returns the length of a character string, excluding trailing blanks. | X | | |
LOWCASE | Converts all letters in an argument to lowercase. | X | ||
MISSING | Returns a numeric result that indicates whether the argument contains a missing value. | X | ||
NLITERAL | Converts a character string that you specify to a SAS name literal. | X | ||
NOTALNUM | Searches a character string for a non-alphanumeric character, and returns the first position at which the character is found. | | X | |
NOTALPHA | Searches a character string for a nonalphabetic character, and returns the first position at which the character is found. | | X | |
NOTCNTRL | Searches a character string for a character that is not a control character, and returns the first position at which that character is found. | X | ||
NOTDIGIT | Searches a character string for any character that is not a digit, and returns the first position at which that character is found. | X | ||
NOTFIRST | Searches a character string for an invalid first character in a SAS variable name under VALIDVARNAME=V7, and returns the first position at which that character is found. | X | ||
NOTGRAPH | Searches a character string for a non-graphical character, and returns the first position at which that character is found. | | X | |
NOTLOWER | Searches a character string for a character that is not a lowercase letter, and returns the first position at which that character is found. | | X | |
NOTNAME | Searches a character string for an invalid character in a SAS variable name under VALIDVARNAME=V7, and returns the first position at which that character is found. | X | ||
NOTPRINT | Searches a character string for a nonprintable character, and returns the first position at which that character is found. | X | ||
NOTPUNCT | Searches a character string for a character that is not a punctuation character, and returns the first position at which that character is found. | X | ||
NOTSPACE | Searches a character string for a character that is not a white-space character (blank, horizontal and vertical tab, carriage return, line feed, and form feed), and returns the first position at which that character is found. | X | ||
NOTUPPER | Searches a character string for a character that is not an uppercase letter, and returns the first position at which that character is found. | | X | |
NOTXDIGIT | Searches a character string for a character that is not a hexadecimal character, and returns the first position at which that character is found. | | X | |
NVALID | Checks the validity of a character string for use as a SAS variable name. | X | | |
PROPCASE | Converts all words in an argument to proper case. | X | ||
QUOTE | Adds double quotation marks to a character value. | X | ||
RANK | Returns the position of a character in the ASCII or EBCDIC collating sequence. | X | ||
REPEAT | Returns a character value that consists of the first argument repeated n+1 times. | X | ||
REVERSE | Reverses a character string. | X | ||
RIGHT | Right-aligns a character expression. | X | ||
SCAN | Returns the nth word from a character string. | X | ||
SOUNDEX | Encodes a string to facilitate searching. | X | ||
SPEDIS | Determines the likelihood of two words matching, expressed as the asymmetric spelling distance between the two words. | X | ||
STRIP | Returns a character string with all leading and trailing blanks removed. | X | | |
SUBPAD | Returns a substring that has a length you specify, using blank padding if necessary. | X | ||
SUBSTR | Extracts a substring from an argument. | X | | |
SUBSTRN | Returns a substring, allowing a result with a length of zero. | X | ||
TRANSLATE | Replaces specific characters in a character string. | X | ||
TRANTAB | Transcodes data by using the specified translation table. | X | ||
TRANWRD | Replaces or removes all occurrences of a substring in a character string. | X | ||
TRIM | Removes trailing blanks from a character string, and returns one blank if the string is missing. | X | | |
TRIMN | Removes trailing blanks from character expressions, and returns a string with a length of zero if the expression is missing. | X | | |
UPCASE | Converts all letters in an argument to uppercase. | X | ||
URLDECODE | Returns a string that was decoded using the URL escape syntax. | X | ||
URLENCODE | Returns a string that was encoded using the URL escape syntax. | X | ||
VERIFY | Returns the position of the first character in a string that is not in any of several other strings. | X |
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.