creates a linguistic sort key.
sortKey(string, <locale, strength, case, numeric, order>)
- string
specifies a character expression.
- locale
specifies the locale name in the form of a POSIX name (for example, ja_JP).
See Values for the LOCALE= System Option for a list of locale names and POSIX values.
- strength
specifies the collation level. There are five collation levels, described in the following table. The default value for strength depends on the locale.
Value | Type of Collation | Description |
PRIMARY or P | Distinguishes base characters (for example, "a" < "b"). | This is the strongest difference. For example, dictionaries are divided into different sections by base character. |
SECONDARY or S | Treats accents in the characters as secondary differences (for example, "as" < "às" < "at"). | Other differences between letters can also be considered secondary differences, depending on the language. A secondary difference is ignored when there is a primary difference anywhere in the strings. |
TERTIARY or T | Distinguishes uppercase and lowercase differences in characters (for example, "ao" < "Ao" < "aò"). | Another example is the difference between large and small Kana. A tertiary difference is ignored when there is a primary or secondary difference anywhere in the strings. |
QUATERNARY or Q | When punctuation is ignored at levels 1-3, an additional level can be used to distinguish words with and without punctuation (for example, "ab" < "a-b" < "aB"). | This difference is ignored when there is a primary, secondary, or tertiary difference. Use the quaternary level if ignoring punctuation is required or when processing Japanese text. |
IDENTICAL or I | When all other levels are equal, the identical level is used as a tiebreaker. The Unicode code point values of the NFD form of each string are compared at this level, in case there is no difference at levels 1-4. | For example, only Hebrew cantillation marks are distinguished at this level. Use this level sparingly, because strings that differ only in code point values are extremely rare. |
- case order
specifies whether uppercase or lowercase letters sort first. This argument is valid only for the TERTIARY, QUATERNARY, or IDENTICAL strength. The following table lists the values for the case order argument.
Value | Description |
UPPER or U | Sorts uppercase letters first, then lowercase letters. |
LOWER or L | Sorts lowercase letters first, then uppercase letters. |
- numeric collation
orders numbers by their numeric value instead of by their characters.
Value | Description |
NUMERIC or N | Orders numbers (integers) by numeric value. For example, "8 Main St." sorts before "45 Main St.". |
- collation order
specifies the type of collation. There are two values: PHONEBOOK and TRADITIONAL. If you do not specify a collation value, then the default collation for the user's locale is used. The following table provides more information.
Value | Description |
PHONEBOOK or P | Specifies a phonebook-style ordering of characters. Select PHONEBOOK only with the German language. |
TRADITIONAL or T | Specifies a traditional-style ordering of characters. Select TRADITIONAL only with the Spanish language. |
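The effect of the NUMERIC value can be pictured with a short Python sketch. It only illustrates the idea of comparing runs of digits by integer value; it is not SAS code and does not reproduce the key that SORTKEY builds. The helper name `numeric_key` is hypothetical.

```python
import re

def numeric_key(text):
    """Hypothetical helper: split text into digit and non-digit runs so that
    digit runs compare by integer value rather than character by character."""
    parts = re.split(r"(\d+)", text)
    # With a capturing group, re.split places digit runs at odd indexes.
    return [(1, int(p), "") if i % 2 else (0, 0, p) for i, p in enumerate(parts)]

addresses = ["45 Main St.", "8 Main St."]

by_characters = sorted(addresses)              # plain character comparison
by_value = sorted(addresses, key=numeric_key)  # digit runs compared as integers

print(by_characters)
print(by_value)
```

With plain character comparison, "45 Main St." comes first because "4" precedes "8". With the numeric key, "8 Main St." comes first, which matches the example in the table above.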
The SORTKEY function creates a linguistic sort key for
data. You must specify at least one argument. If the variable
that receives the key is not long enough, the key is truncated and a warning
is displayed.
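The following Python sketch suggests why a truncated key deserves attention. It uses a stand-in key (casefolded UTF-8 bytes) and an arbitrary 4-byte length, both of which are assumptions made for illustration; the key that SORTKEY produces has a different layout.

```python
def stand_in_key(text):
    """Hypothetical stand-in for a sort key: casefolded UTF-8 bytes.
    A real linguistic sort key is laid out differently."""
    return text.casefold().encode("utf-8")

def truncated(key, length):
    """Keep only the first `length` bytes, as a too-short receiving
    variable would."""
    return key[:length]

names = ["Martinez", "Martin"]

full_keys = [stand_in_key(n) for n in names]
short_keys = [truncated(k, 4) for k in full_keys]

# Full keys keep the two names distinct; truncated keys collide.
print(full_keys[0] == full_keys[1])
print(short_keys[0] == short_keys[1])
```

Once two keys collide, sorting on the key can no longer separate the underlying values, so it is safer to size the receiving variable generously.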
locale | Locale values use the POSIX name (ll_RR), where ll represents the two-letter language code and RR represents the two-letter region code. For example, en_US is the POSIX name for English, United States: en represents the English language, and US represents the United States. If a locale value is not specified, then the session locale is used. |
strength | The strength argument determines whether accents or case affect collating or matching text. If no value is specified for strength, then the locale determines the value. The following values can be specified for strength. |
PRIMARY | Considers base letters only. For example, the letters A, a, and Å are all processed the same. |
SECONDARY | Processes data the same as PRIMARY, and also processes accents. The letters A and a are processed equally, and Å is processed as an accented character. |
TERTIARY | Processes data the same as SECONDARY, and also processes the character's case. For example, A, a, and Å are all processed differently. |
QUATERNARY | Processes data the same as TERTIARY, and also processes punctuation. |
IDENTICAL | Processes data the same as QUATERNARY, and also processes code point values. |
case order | Specifies whether uppercase or lowercase letters sort first. The following table shows the resulting order when the UPPER value or the LOWER value is specified. |
UPPER | LOWER |
Aztec | aztec |
aztec | Aztec |
Mars | mars |
mars | Mars |
collation order | The collation order value PHONEBOOK is ignored unless the locale is a German-language locale. The collation order value TRADITIONAL is ignored unless the locale is a Spanish-language locale. A warning message is displayed for other locales. |
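The layering of strength levels and the effect of case order can be modeled with a toy key in Python. This is a minimal sketch under stated assumptions: it covers only the PRIMARY, SECONDARY, and TERTIARY levels, it ignores locale rules, and it is not the algorithm that SORTKEY uses. The names `toy_key`, `LEVELS`, and `upper_first` are hypothetical.

```python
import unicodedata

LEVELS = {"PRIMARY": 1, "SECONDARY": 2, "TERTIARY": 3}

def toy_key(text, strength="TERTIARY", upper_first=True):
    """Hypothetical multi-level key: base letters, then accents, then case.
    Lower levels are consulted only when all higher levels are equal."""
    decomposed = unicodedata.normalize("NFD", text)
    base_chars = [c for c in decomposed if not unicodedata.combining(c)]
    # Level 1: base letters with case removed.
    primary = tuple(c.casefold() for c in base_chars)
    # Level 2: combining accents (positions are ignored in this toy model).
    secondary = tuple(c for c in decomposed if unicodedata.combining(c))
    # Level 3: case, with the chosen case sorting first.
    if upper_first:
        tertiary = tuple(0 if c.isupper() else 1 for c in base_chars)
    else:
        tertiary = tuple(1 if c.isupper() else 0 for c in base_chars)
    return (primary, secondary, tertiary)[: LEVELS[strength]]

letters = ["A", "a", "\u00c5"]  # "\u00c5" is A with ring above
for level in LEVELS:
    distinct = len({toy_key(c, level) for c in letters})
    print(level, distinct)

words = ["mars", "Aztec", "Mars", "aztec"]
upper_sorted = sorted(words, key=lambda w: toy_key(w, upper_first=True))
lower_sorted = sorted(words, key=lambda w: toy_key(w, upper_first=False))
print(upper_sorted)
print(lower_sorted)
```

In this model the three letters collapse to one key at PRIMARY, two keys at SECONDARY, and three keys at TERTIARY, and the two sorted lists follow the UPPER and LOWER columns of the case order table.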
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.