SORT Procedure

Concepts: SORT Procedure

Multi-threaded Sorting

The SAS system option THREADS permits multi-threaded sorting, which is new with SAS System 9. Multi-threaded sorting achieves a degree of parallelism in the sorting operations. This parallelism is intended to reduce the real time to completion for a given operation, at the possible cost of additional CPU resources. For more information, see Support for Parallel Processing in SAS Language Reference: Concepts.
Note: The TAGSORT option does not support multi-threaded sorting.
The value of the SAS system option CPUCOUNT= affects the performance of the multi-threaded sort. CPUCOUNT= suggests how many system CPUs are available for use by the multi-threaded procedures.
For more information, see the THREADS System Option in SAS System Options: Reference and the CPUCOUNT= System Option in SAS System Options: Reference.

Sorting Orders for Numeric Variables

For numeric variables, the following is the smallest-to-largest comparison sequence:
  1. SAS missing values (shown as a period or special missing value)
  2. negative numeric values
  3. zero
  4. positive numeric values
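
The ordering above can be modeled outside SAS. The following Python sketch is an illustration only, not SAS code: it assumes that `None` stands in for a SAS missing value, and the helper name `sas_numeric_key` is hypothetical.

```python
# Hypothetical model: None stands in for a SAS missing value.
def sas_numeric_key(value):
    # Missing values sort first; everything else sorts by numeric value,
    # which places negatives before zero and zero before positives.
    if value is None:
        return (0, 0.0)
    return (1, value)

values = [3.5, None, -2, 0, 10, -7.25]
ordered = sorted(values, key=sas_numeric_key)
print(ordered)
```

The sketch does not distinguish between the ordinary missing value and special missing values; it only shows that missing sorts below every number.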

Sorting Orders for Character Variables

Default Collating Sequence

The order in which alphanumeric characters are sorted is known as the collating sequence. This sort order is determined by the session encoding.
By default, PROC SORT uses either the EBCDIC or the ASCII collating sequence when it compares character values, depending on the environment under which the procedure is running.
For more information about the various collating sequences and when they are used, see Collating Sequence in SAS National Language Support (NLS): Reference Guide.
Note: ASCII and EBCDIC represent the family names of the session encodings. The sort order can be determined by referring to the encoding.

EBCDIC Order

The z/OS operating environment uses the EBCDIC collating sequence.
The sorting order of the English-language EBCDIC sequence is consistent with the following sort order example.
EBCDIC Sort Order Example
blank . < ( + | & ! $ * ) ; ¬ - / , % _ > ? : # @ ' = "
a b c d e f g h i j k l m n o p q r ~ s t u v w x y z
{ A B C D E F G H I } J K L M N O P Q R \ S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
The main features of the EBCDIC sequence are that lowercase letters are sorted before uppercase letters, and uppercase letters are sorted before digits. Note also that some special characters interrupt the alphabetic sequences. The blank is the smallest character that you can display.
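
One way to see these features without a z/OS session is to compare characters by their EBCDIC byte values. The Python sketch below is an illustration, not SAS code. It assumes the `cp037` codec (the US/Canada EBCDIC code page shipped with Python) as a stand-in for an English-language EBCDIC session encoding; the encoding of an actual SAS session may differ.

```python
def ebcdic_key(text):
    # Assumption: cp037 approximates the English-language EBCDIC sequence.
    # Sorting by the encoded bytes orders strings by EBCDIC code point.
    return text.encode("cp037")

sample = ["1", "A", "a", " ", "s", "~", "r"]
ebcdic_sorted = sorted(sample, key=ebcdic_key)
print(ebcdic_sorted)
```

In this code page the blank sorts first, lowercase letters precede uppercase letters, uppercase letters precede digits, and the tilde falls between `r` and `s`.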

ASCII Order

The operating environments that use the ASCII collating sequence include the following:
  • UNIX and its derivatives
  • Windows
  • OpenVMS
From the smallest to the largest character that you can display, the English-language ASCII sequence is consistent with the order shown in the following table.
ASCII Sort Order Example
blank ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _
a b c d e f g h i j k l m n o p q r s t u v w x y z { } ~
The main features of the ASCII sequence are that digits are sorted before uppercase letters, and uppercase letters are sorted before lowercase letters. The blank is the smallest character that you can display.
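
The same features can be checked by comparing characters by their ASCII byte values. This Python sketch is an illustration only, not SAS code; the sample values are arbitrary.

```python
def ascii_key(text):
    # Sorting by the encoded bytes orders strings by ASCII code point.
    return text.encode("ascii")

sample = ["a", "A", "1", " ", "~", "["]
ascii_sorted = sorted(sample, key=ascii_key)
print(ascii_sorted)
```

The blank sorts first, digits precede uppercase letters, and uppercase letters precede lowercase letters.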

Specifying Sorting Orders for Character Variables

The options EBCDIC, ASCII, NATIONAL, DANISH, SWEDISH, and REVERSE specify collating sequences that are stored in the HOST catalog.
If you want to provide your own collating sequences or change a collating sequence provided for you, then use the TRANTAB procedure to create or modify translation tables. When you create your own translation tables, they are stored in your PROFILE catalog, and they override any translation tables that have the same name in the HOST catalog. For complete details, see TRANTAB Procedure in SAS National Language Support (NLS): Reference Guide.
Linguistic Collation sorts data according to rules of language. For detailed information about Linguistic Collation, see Collating Sequence in SAS National Language Support (NLS): Reference Guide.
Note: System managers can modify the HOST catalog by copying newly created tables from the PROFILE catalog to the HOST catalog. Then all users can access the new or modified translation table.
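
Conceptually, a translation table used for collation assigns each character a sort weight, and values are compared by weight rather than by raw code point. The Python sketch below illustrates that idea only. It does not reproduce the format of SAS translation tables or the output of PROC TRANTAB, and the names `REVERSE_TABLE` and `collate_key` are hypothetical.

```python
# Hypothetical 256-entry table: byte b gets sort weight 255 - b,
# so characters that normally sort last now sort first.
REVERSE_TABLE = bytes(255 - b for b in range(256))

def collate_key(text, table):
    # Map every byte of the value through the table, then compare the results.
    return text.encode("ascii").translate(table)

names = ["apple", "Banana", "cherry", "123"]
reverse_sorted = sorted(names, key=lambda s: collate_key(s, REVERSE_TABLE))
print(reverse_sorted)
```

Swapping in a different 256-entry table changes the ordering without changing the data, which is the reason a table in one catalog can override a table of the same name in another.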

Stored Sort Information

PROC SORT records the BY variables, collating sequence, and character set that it uses to sort the data set. This information is stored with the data set to help avoid unnecessary sorts.
Before PROC SORT sorts a data set, it checks the stored sort information. If you try to sort a data set in the order in which it is already sorted, then PROC SORT does not perform the sort and writes a message to the log to that effect. To override this behavior, use the FORCE option. If you try to sort a data set in the order in which it is already sorted and you specify an OUT= data set, then PROC SORT simply makes a copy of the DATA= data set.
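
The decision described in this paragraph can be summarized as a small function. The Python sketch below is a simplified model, not SAS code. It compares only the BY variables, although the stored information also includes the collating sequence and character set, and the function name `plan_sort` and its return values are hypothetical.

```python
def plan_sort(stored_by, requested_by, force=False, out=None):
    """Return the action a sort request would lead to under the rules above."""
    # stored_by is None when the data set carries no sort information.
    already_sorted = stored_by is not None and list(stored_by) == list(requested_by)
    if already_sorted and not force:
        # No sort is needed: copy when an output data set is named, else do nothing.
        return "copy" if out is not None else "skip"
    return "sort"

print(plan_sort(["region", "id"], ["region", "id"]))
print(plan_sort(["region", "id"], ["region", "id"], out="work.sorted"))
print(plan_sort(["region", "id"], ["region", "id"], force=True))
```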
To override the sort information that PROC SORT stores, use the _NULL_ value with the SORTEDBY= data set option. Refer to the SORTEDBY= Data Set Option in SAS Data Set Options: Reference.
If you want to change the sort information for an existing data set, then use the SORTEDBY= data set option in the MODIFY statement in the DATASETS procedure. For more information, see MODIFY Statement.
To access the sort information that is stored with a data set, use the CONTENTS statement in PROC DATASETS. For more information, see CONTENTS Statement.
The number of variables by which you can sort a data set with PROC SORT is limited only by available memory. The number of columns by which you can order the rows of a result set using PROC SQL is also limited only by available memory. The sort indicator, whether stored in the metadata of a Base data set or represented in memory, is limited to 127 variables. For this reason, up to 127 variables can be stored in the sort indicator or listed in the SORTEDBY= data set option. If you sort by more than 127 variables, then only the first 127 are recorded in the sort indicator. If you sort the data set again by the entire list of BY variables, then the data set is not recognized as being sorted, because the additional variables (beyond 127) are not found in the sort indicator. For a detailed explanation, refer to What Is a Sort Indicator? in SAS Language Reference: Concepts.
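
The effect of the 127-variable limit can be shown with a short model. The Python sketch below is an illustration, not SAS code; the variable names `v1` through `v130` and both helper functions are hypothetical.

```python
SORT_INDICATOR_LIMIT = 127  # limit stated above

def record_sort_indicator(by_vars):
    # Only the first 127 BY variables are kept in the indicator.
    return list(by_vars[:SORT_INDICATOR_LIMIT])

def recognized_as_sorted(indicator, requested_by):
    # The data set is recognized as sorted only if the full BY list matches.
    return indicator == list(requested_by)

by_vars = [f"v{i}" for i in range(1, 131)]  # 130 hypothetical BY variables
indicator = record_sort_indicator(by_vars)
print(len(indicator))
print(recognized_as_sorted(indicator, by_vars))
print(recognized_as_sorted(indicator, by_vars[:127]))
```

With 130 BY variables the indicator holds 127 names, so a second sort request by all 130 does not match it, while a request by the first 127 does.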

Presorted Input Data Sets

Specifying the PRESORTED option prevents SAS from sorting an already sorted data set. Before sorting, SAS checks the sequence of observations within the input data set to determine whether the observations are in order. Use the PRESORTED option when you know or strongly suspect that a data set is already in order according to the key variables specified in the BY statement. The sequence of observations within the data set is checked by reading the data set and comparing the BY variables of each observation read to the BY variables of the preceding observation. This process continues until either the entire data set has been read or an out-of-sequence observation is detected.
If the entire data set has been read and no out-of-sequence observations have been found, then one of two actions is taken. If no output data set has been specified, the sort order metadata of the input data set is updated to indicate that the sequence has been verified. This verification notes that the data set is validly sorted according to the specified BY variables. Otherwise, if the observation sequence has been verified and an output data set is specified, the observations from the input data set are copied to the output data set, and the metadata for the output data set indicates that the data is validly sorted according to the BY variables.
If observations within the data set are not in sequence, then the data set will be sorted.
If the NODUPKEY option has been specified, then the sequence checking also determines whether observations with duplicate keys are present in the data set. If observations with duplicate keys are detected, then the input data set is deemed not to be sorted.
If the metadata of the input data set indicates that the data is already sorted according to the key variables listed in the BY statement and the input data set has been validated, then neither sequence checking nor sorting will be performed.
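
The sequence check described in this section can be sketched as a single pass that compares each observation's BY values with those of the preceding observation. The Python code below is a simplified model, not SAS code. It assumes that observations are dictionaries with comparable, non-missing BY values, and the function name `is_presorted` is hypothetical.

```python
def is_presorted(rows, by, nodupkey=False):
    """Return True if rows are already in ascending order of the BY variables."""
    previous = None
    for row in rows:
        key = tuple(row[name] for name in by)
        if previous is not None:
            if key < previous:
                return False  # out-of-sequence observation: stop reading
            if nodupkey and key == previous:
                return False  # duplicate key counts as "not sorted" with NODUPKEY
        previous = key
    return True

rows = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 2, "x": "c"}]
print(is_presorted(rows, ["id"]))
print(is_presorted(rows, ["id"], nodupkey=True))
print(is_presorted(list(reversed(rows)), ["id"]))
```

The check stops at the first out-of-sequence observation, so the cost of a failed check depends on how early in the data set the disorder appears.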
For more information, see Sorted Data Sets in SAS Language Reference: Concepts. For interactions with sort validation, see the SORTVALIDATE System Option in SAS System Options: Reference.