Options for Commands, Statements, and Procedures for NLS |
Valid in: | PROC SORT statement |
PROC SORT statement: | Sorts observations in a SAS data set by one or more character or numeric variables |
Syntax |
PROC SORT collating-sequence-option <other option(s)>; |
Options |
Task | Option |
---|---|
Specify the collating sequence | |
Specify ASCII | ASCII |
Specify EBCDIC | EBCDIC |
Specify Danish | DANISH |
Specify Finnish | FINNISH |
Specify Norwegian | NORWEGIAN |
Specify Polish | POLISH |
Specify Swedish | SWEDISH |
Specify a customized sequence | NATIONAL |
Specify any of the collating sequences listed above (ASCII, EBCDIC, DANISH, FINNISH, ITALIAN, NORWEGIAN, POLISH, SPANISH, SWEDISH, or NATIONAL), the name of any other system provided translation table (POLISH, SPANISH), and the name of a user-created translation table. You can specify an encoding. You can also specify either the keyword LINGUISTIC or UCA to achieve a locale-appropriate collating sequence. | SORTSEQ= |
Options can include one collating-sequence-option and multiple other options. The two types of options can appear in either order, and a PROC SORT step does not need to include both types. Only the PROC SORT collating-sequence-options are explained here.
Operating Environment Information: For information about behavior specific to your operating environment for the DANISH, FINNISH, NORWEGIAN, or SWEDISH collating-sequence-option, see the SAS documentation for your operating environment.
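As a minimal sketch of the syntax described above, the following step combines one collating-sequence-option (SWEDISH) with one other PROC SORT option (NODUPKEY). The data set WORK.CUSTOMERS, the variable LAST_NAME, and the output data set name are hypothetical, and the resulting order depends on how SWEDISH behaves in your operating environment.

```sas
/* WORK.CUSTOMERS and LAST_NAME are hypothetical names used for illustration. */
proc sort data=work.customers    /* input data set                  */
          out=work.customers_sv  /* sorted copy, input is untouched */
          swedish                /* collating-sequence-option       */
          nodupkey;              /* another PROC SORT option        */
  by last_name;
run;
```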
ASCII
sorts character variables using the ASCII collating sequence. You need this option only when you want to achieve an ASCII ordering on a system where EBCDIC is the native collating sequence.
DANISH
sorts characters according to the Danish and Norwegian convention.
The Danish and Norwegian collating sequence is shown in National Collating Sequences of Alphanumeric Characters.
EBCDIC
sorts character variables using the EBCDIC collating sequence. You need this option only when you want to achieve an EBCDIC ordering on a system where ASCII is the native collating sequence.
POLISH
sorts characters according to the Polish convention.
FINNISH
sorts characters according to the Finnish and Swedish convention. The Finnish and Swedish collating sequence is shown in National Collating Sequences of Alphanumeric Characters.
NATIONAL
sorts character variables using an alternate collating sequence, as defined by your installation, to reflect a country's National Use Differences. To use this option, your site must have a customized national sort sequence defined. Check with the SAS Installation Representative at your site to determine whether a customized national sort sequence is available.
NORWEGIAN
See DANISH.
SWEDISH
See FINNISH.
SORTSEQ=collating-sequence
specifies the collating sequence. The collating-sequence can be a collating-sequence-option, a translation table, an encoding, or the keyword LINGUISTIC. Only one collating sequence can be specified. For detailed information, refer to Collating Sequence.
Here are descriptions of the collating sequences:
collating-sequence-option or translation-table
specifies either a translation table, which can be one that SAS provides or any user-defined translation table, or one of the PROC SORT statement collating-sequence-options. For an example of using PROC TRANTAB and PROC SORT with SORTSEQ=, see Using Different Translation Tables for Sorting.
The available translation tables are:
ASCII
DANISH
EBCDIC
FINNISH
ITALIAN
NORWEGIAN
POLISH
REVERSE
SPANISH
SWEDISH
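For illustration, the following sketch names one of the translation tables listed above as the SORTSEQ= value. The data set and variable names are hypothetical.

```sas
/* WORK.CUSTOMERS and LAST_NAME are hypothetical names used for illustration. */
proc sort data=work.customers
          out=work.customers_pl
          sortseq=polish;        /* translation table supplied by SAS */
  by last_name;
run;
```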
The following figure shows how the alphanumeric characters in each language will sort.
National Collating Sequences of Alphanumeric Characters
encoding-value
specifies an encoding value. The result is the same as a binary collation of the character data represented in the specified encoding. See the supported encoding values in SBCS, DBCS, and Unicode Encoding Values for Transcoding Data.
Restriction: | PROC SORT is the only procedure or part of the SAS system that recognizes an encoding specified for the SORTSEQ= option. |
Tip: | When the encoding value contains a character other than an alphanumeric character or underscore, the value needs to be enclosed in quotation marks. |
See: | The list of the encodings that can be specified in SBCS, DBCS, and Unicode Encoding Values for Transcoding Data. |
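The quoting rule in the tip above can be sketched as follows. The values wlatin1 and utf-8 are assumed here to be among the supported encoding values, so check them against the referenced list before use. The data set and variable names are hypothetical.

```sas
/* WORK.CUSTOMERS and LAST_NAME are hypothetical names used for illustration. */

/* Only alphanumeric characters in the value: no quotation marks needed. */
proc sort data=work.customers out=work.customers_wl sortseq=wlatin1;
  by last_name;
run;

/* The hyphen is not alphanumeric, so the value is enclosed in quotation marks. */
proc sort data=work.customers out=work.customers_u8 sortseq='utf-8';
  by last_name;
run;
```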
LINGUISTIC <(collating-rules)>
specifies linguistic collation, which sorts characters according to the rules of the specified language. The rules and default collating sequence options are based on the language specified in the current locale setting. The implementation is provided by the International Components for Unicode (ICU) library and produces results that are largely compatible with the Unicode Collation Algorithm (UCA).
Alias: | UCA |
Restriction: | The SORTSEQ=LINGUISTIC option is available only on the PROC SORT SORTSEQ= option and is not available for the SAS System SORTSEQ= option. |
Restriction: | Linguistic collation is not supported on the VMS on Itanium (VMI) or 64-bit Windows on Itanium (W64) platforms. |
Tip: | LINGUISTIC sorting requires more memory on the z/OS mainframe. You might need to set your REGION to 50M or higher. Set this value in JCL if you are running in batch mode, or on the VERIFY screen if you are running interactively. The larger region allows the ICU libraries to load properly and does not affect the memory that is used for sorting. |
Tip: | The collating-rules must be enclosed in parentheses. More than one collating rule can be specified. |
Tip: | When BY processing is performed on data sets that are sorted with linguistic collation, the NOBYSORTED system option might need to be specified in order for the data set to be treated properly. BY processing is performed differently than collating sequence processing. |
See: | The ICU License agreement in the Base SAS Procedures Guide. |
See: | The Collating Sequence for detailed information on linguistic collation. |
See Also: | Refer to http://www.unicode.org Web site for the Unicode Collation Algorithm (UCA) specification. |
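A minimal sketch of linguistic collation with no collating rules, followed by BY processing on the result, might look like this. The data set and variable names are hypothetical, and whether NOBYSORTED is actually required depends on your data and environment, as the tip above indicates.

```sas
/* WORK.CUSTOMERS and LAST_NAME are hypothetical names used for illustration. */
proc sort data=work.customers
          out=work.customers_ling
          sortseq=linguistic;    /* rules come from the current locale */
  by last_name;
run;

/* Per the tip above, NOBYSORTED might be needed before BY processing */
/* a data set that was sorted with linguistic collation.              */
options nobysorted;

proc print data=work.customers_ling;
  by last_name;
run;
```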
The following are the collating-rules that can be specified for the LINGUISTIC option. These rules modify the linguistic collating sequence:
ALTERNATE_HANDLING=SHIFTED
controls the handling of variable characters such as spaces, punctuation, and symbols. When this option is not specified (that is, when the default value NON_IGNORABLE is in effect), differences among these variable characters are of the same importance as differences among letters. If the ALTERNATE_HANDLING option is specified, these variable characters are of minor importance.
Default: | NON_IGNORABLE |
Tip: | The SHIFTED value is often used in combination with STRENGTH= set to QUATERNARY. In that case, white space, punctuation, and symbols are considered when comparing strings, but only if all other aspects of the strings (base letters, accents, and case) are identical. |
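The combination mentioned in the tip above can be sketched as shown here, with two collating rules enclosed in one set of parentheses. The data set and variable names are hypothetical.

```sas
/* WORK.PRODUCTS and PRODUCT_NAME are hypothetical names used for illustration. */
proc sort data=work.products
          out=work.products_sorted
          sortseq=linguistic(alternate_handling=shifted  /* demote spaces, punctuation, symbols */
                             strength=quaternary);       /* consider them only as a tie-breaker */
  by product_name;
run;
```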
CASE_FIRST=
specifies the order of uppercase and lowercase letters. This argument is valid only for the TERTIARY, QUATERNARY, or IDENTICAL levels. The following table provides the values and information for the CASE_FIRST argument:
Value | Description |
---|---|
UPPER | Sorts uppercase letters first, then the lowercase letters. |
LOWER | Sorts lowercase letters first, then the uppercase letters. |
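As an illustrative sketch, the following step requests that uppercase letters sort before lowercase letters. STRENGTH=TERTIARY is stated explicitly here only so that the step does not depend on the locale's default level. The data set and variable names are hypothetical.

```sas
/* WORK.CUSTOMERS and LAST_NAME are hypothetical names used for illustration. */
proc sort data=work.customers
          out=work.customers_upper
          sortseq=linguistic(strength=tertiary   /* a level at which CASE_FIRST is valid */
                             case_first=upper);  /* uppercase before lowercase           */
  by last_name;
run;
```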
COLLATION=
The following table lists the available COLLATION= values. If you do not select a collation value, then the user's locale-default collation is selected.
LOCALE=locale_name
specifies the locale name in the form of a POSIX name (for example, ja_JP). See the Values for the LOCALE= System Option for a list of locale and POSIX values supported by PROC SORT.
Restriction: | The following locales are not supported by PROC SORT: |
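A minimal sketch of the LOCALE= collating rule, using the ja_JP value from the description above, could look like this. The data set and variable names are hypothetical.

```sas
/* WORK.CUSTOMERS and LAST_NAME are hypothetical names used for illustration. */
proc sort data=work.customers
          out=work.customers_ja
          sortseq=linguistic(locale=ja_JP);  /* POSIX locale name */
  by last_name;
run;
```

Naming the locale in the step keeps the sort order independent of the current session's locale setting.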
NUMERIC_COLLATION=
orders integer values within the text by their numeric value instead of by the characters used to represent the numbers.
Default: | OFF |
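To illustrate, the following sketch turns NUMERIC_COLLATION on for a hypothetical variable whose values contain embedded integers, such as "Item 2" and "Item 10". With the default of OFF, these values are compared character by character. With ON, the integers are compared by their numeric value.

```sas
/* WORK.INVENTORY and ITEM_LABEL are hypothetical names used for illustration. */
proc sort data=work.inventory
          out=work.inventory_sorted
          sortseq=linguistic(numeric_collation=on);  /* compare embedded integers as numbers */
  by item_label;
run;
```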
STRENGTH=
The value of STRENGTH= is related to the collation level. There are five collation-level values. The following table provides information about the five levels. The default value for STRENGTH= is related to the locale.
Alias: | LEVEL= |
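As a hedged sketch, the following step sets the collation level explicitly instead of relying on the locale default. PRIMARY is assumed here to be an accepted STRENGTH= value. In the UCA, the primary level compares base letters only. The data set and variable names are hypothetical.

```sas
/* WORK.CUSTOMERS and LAST_NAME are hypothetical names used for illustration. */
proc sort data=work.customers
          out=work.customers_primary
          sortseq=linguistic(strength=primary);  /* override the locale's default level */
  by last_name;
run;
```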
For more information, see the PROC SORT documentation for your operating environment.
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.