The SORT Procedure |
PROC SORT <collating-sequence-option> <other option(s)>; |
Task | Option | |
---|---|---|
Specify the collating sequence |
|
|
|
Specify ASCII |
ASCII |
|
Specify EBCDIC |
EBCDIC |
|
Specify Danish |
DANISH |
|
Specify Finnish |
FINNISH |
|
Specify Norwegian |
NORWEGIAN |
|
Specify Polish |
POLISH |
|
Specify Swedish |
SWEDISH |
|
Specify a customized sequence |
NATIONAL |
|
Specify any of the collating sequences listed above (ASCII, EBCDIC, DANISH, FINNISH, ITALIAN, NORWEGIAN, POLISH, SPANISH, SWEDISH, or NATIONAL), the name of any other system provided translation table (POLISH, SPANISH), and the name of a user-created translation table. You can specify an encoding. You can also specify either the keyword LINGUISTIC or UCA to achieve a locale appropriate collating sequence. |
SORTSEQ= |
Specify the input data set |
DATA= |
|
Sort a SAS data set without changing the created and modified dates |
DATECOPY |
|
Create output data sets |
|
|
|
Specifies the output data set |
OUT= |
|
Specifies the output data set to which duplicate observations are written |
DUPOUT= |
Specify the output order |
|
|
|
Maintain relative order within BY groups |
EQUALS |
|
Do not maintain relative order within BY groups |
NOEQUALS |
Eliminate duplicate observations |
|
|
|
Delete observations with duplicate BY values |
NODUPKEY |
|
Delete duplicate observations |
NODUPRECS |
Delete the input data set before the replacement output data set is populated. |
OVERWRITE |
|
Specify whether or not the data set is likely already sorted. |
PRESORTED |
|
|
Reverse the collation order for character variables |
REVERSE |
Specify the available memory |
SORTSIZE= |
|
Force redundant sorting |
FORCE |
|
Reduce temporary disk usage |
TAGSORT |
|
Override SAS system option THREADS |
|
|
|
Enable multi-threaded sorting |
THREADS |
|
Prevent multi-threaded sorting |
NOTHREADS |
Options |
Options can include one collating-sequence-option and multiple other options. The order of the two types of options does not matter and both types are not necessary in the same PROC SORT step.
Operating Environment Information: For information about behavior specific to your operating environment for the DANISH, FINNISH, NORWEGIAN, or SWEDISH collating-sequence-option, see the SAS documentation for your operating environment.
Restriction: | You can specify only one collating-sequence-option in a PROC SORT step. |
sorts character variables using the ASCII collating sequence. You need this option only when you want to achieve an ASCII ordering on a system where EBCDIC is the native collating sequence.
See also: | Sorting Orders for Character Variables |
sorts characters according to the Danish and Norwegian convention.
The Danish and Norwegian collating sequence is shown in National Collating Sequences of Alphanumeric Characters.
sorts character variables using the EBCDIC collating sequence. You need this option only when you want to achieve an EBCDIC ordering on a system where ASCII is the native collating sequence.
See also: | Sorting Orders for Character Variables |
sorts characters according to the Polish convention.
sorts characters according to the Finnish and Swedish convention. The Finnish and Swedish collating sequence is shown in National Collating Sequences of Alphanumeric Characters.
sorts character variables using an alternate collating sequence, as defined by your installation, to reflect a country's National Use Differences. To use this option, your site must have a customized national sort sequence defined. Check with the SAS Installation Representative at your site to determine whether a customized national sort sequence is available.
specifies the collating sequence. The collating-sequence can be a collating-sequence-option, a translation table, an encoding, or the keyword LINGUISTIC. Only one collating sequence can be specified. For detailed information, refer to the Collating Sequence section in the SAS National Language Support (NLS): Reference Guide.
Here are descriptions of the collating sequences:
specifies either a translation table, which can be one that SAS provides or any user-defined translation table, or one of the PROC SORT statement Collating-Sequence-Options. For an example of using PROC TRANTAB and PROC SORT with SORTSEQ=, see Using Different Translation Tables for Sorting in SAS National Language Support (NLS): Reference Guide.
The available translation tables are
ASCII | |
EBCDIC | |
POLISH | |
REVERSE | |
The following figure shows how the alphanumeric characters in each language will sort.
National Collating Sequences of Alphanumeric Characters
specifies an encoding value. The result is the same as a binary collation of the character data represented in the specified encoding. See the supported encoding values in the SAS National Language Support (NLS): Reference Guide.
Restriction: | PROC SORT is the only procedure or part of the SAS system that recognizes an encoding specified for the SORTSEQ= option. |
Tip: | When the encoding value contains a character other than an alphanumeric character or underscore, the value needs to be enclosed in quotation marks. |
See: | The list of the encodings that can be specified in the SAS National Language Support (NLS): Reference Guide. |
specifies linguistic collation, which sorts characters according to rules of the specified language. The rules and default collating-sequence options are based on the language specified in the current locale setting. The implementation is provided by the International Components for Unicode (ICU) library. It produces results that are largely compatible with the Unicode Collation Algorithms (UCA).
Alias: | UCA |
Restriction: | The SORTSEQ=LINGUISTIC option is available only on the PROC SORT SORTSEQ= option and is not available for the SAS System SORTSEQ= option. |
Restriction | Note that linguistic collation is not supported on platforms VMS on Itanium (VMI) or 64-bit Windows on Itanium (W64). |
Tip: | The collating-rules must be enclosed in parentheses. More than one collating rule can be specified. |
Tip: | When BY processing is performed on data sets that are sorted with linguistic collation, the NOBYSORTED system option might need to be specified in order for the data set to be treated properly. BY processing is performed differently than collating sequence processing. |
See: | The ICU License agreement. |
See: | The section on Linguistic Collation in the SAS National Language Support (NLS): Reference Guide for detailed information. |
See Also: | Refer to http://www.unicode.org Web site for the Unicode Collation Algorithm (UCA) specification. |
The following are the collation-rules that can be specified for the LINGUISTIC option. These rules modify the linguistic collating sequence:
controls the handling of variable characters like spaces, punctuation, and symbols. When this option is not specified (using the default value Non-Ignorable), differences among these variable characters are of the same importance as differences among letters. If the ALTERNATE_HANDLING option is specified, these variable characters are of minor importance.
Default: | NON_IGNORABLE |
Tip: | The SHIFTED value is often used in combination with STRENGTH= set to Quaternary. In such a case, spaces, punctuation, and symbols are considered when comparing strings, but only if all other aspects of the strings (base letters, accents, and case) are identical. |
specifies the order of uppercase and lowercase letters. This argument is valid for only TERTIARY, QUATERNARY, or IDENTICAL levels. The following table provides the values and information for the CASE_FIRST argument:
Value | Description |
---|---|
UPPER | Sorts uppercase letters first, then the lowercase letters. |
LOWER | Sorts lowercase letters first, then the uppercase letters. |
The following table lists the available COLLATION= values: If you do not select a collation value, then the user's locale-default collation is selected.
Value | Description |
---|---|
UPPER | Sorts uppercase letters first, then the lowercase letters. |
LOWER | Sorts lowercase letters first, then the uppercase letters. |
specifies the locale name in the form of a POSIX name (for example, ja_JP). See the Values for the LOCALE= System Option in SAS National Language Support (NLS): Reference Guide for a list of locale and POSIX values supported by PROC SORT.
Restriction: |
The following locales are not supported
by PROC SORT:
|
orders integer values within the text by the numeric value instead of characters used to represent the numbers.
Default: | OFF |
The value of strength is related to the collation level. There are five collation-level values. The following table provides information about the five levels. The default value for strength is related to the locale.
Alias: | LEVEL= |
For more information, see the PROC SORT documentation for your operating environment.
identifies the input SAS data set.
Restriction: | For in-database processing to occur, it is necessary that the data set refer to a table residing on the DBMS. |
Main discussion: | Input Data Sets |
copies the SAS internal date and time when the SAS data set was created and the date and time when it was last modified before the sort to the resulting sorted data set. Note that the operating environment date and time are not preserved.
Restriction: | DATECOPY can be used only when the resulting data set uses the V8 or V9 engine. |
Tip: | You can alter the file creation date and time with the DTC= option in the MODIFY statement in PROC DATASETS. For more information, see MODIFY Statement. |
specifies the output data set to which duplicate observations are written.
Interaction: | In-database processing does not occur when the DUPOUT= option is specified. |
Tip: | If the DUPOUT= data set name that is specified is the same as the INPUT data set name, SAS will not sort or overwrite the INPUT data set. Instead, SAS will generate an error message. The FORCE option must be specified in order to overwrite the INPUT data set with the DUPOUT= data set of the same name. |
specifies the order of the observations in the output data set. For observations with identical BY-variable values, EQUALS maintains the relative order of the observations within the input data set in the output data set. NOEQUALS does not necessarily preserve this order in the output data set.
sorts and replaces an indexed data set when the OUT= option is not specified. Without the FORCE option, PROC SORT does not sort and replace an indexed data set because sorting destroys user-created indexes for the data set. When you specify FORCE, PROC SORT sorts and replaces the data set and destroys all user-created indexes for the data set. Indexes that were created or required by integrity constraints are preserved.
Restriction: | If you use PROC SORT with the FORCE option on data sets that were created with the Version 5 compatibility engine or with a sequential engine such as a tape format engine, you must also specify the OUT= option. |
Tip: | PROC SORT checks for the sort indicator before it sorts a data set so that data is not sorted again unnecessarily. By default, PROC SORT does not sort a data set if the sort information matches the requested sort. You can use FORCE to override this behavior. You might need to use FORCE if SAS cannot verify the sort specification in the data set option SORTEDBY=. For more information about SORTEDBY=, see the chapter on SAS data set options in SAS Language Reference: Dictionary. |
checks for and eliminates observations with duplicate BY values. If you specify this option, then PROC SORT compares all BY values for each observation to the ones for the previous observation that is written to the output data set. If an exact match is found, then the observation is not written to the output data set.
Operating Environment Information: If you use the VMS operating environment and are using the VMS host sort, the observation that is written to the output data set is not always the first observation of the BY group.
Note: See NODUPRECS for information about eliminating duplicate observations.
Interaction: | When you are removing observations with duplicate BY values with NODUPKEY, the choice of EQUALS or NOEQUALS can have an effect on which observations are removed. |
Interaction: | In-database sorting occurs when the NODUPKEY option is specified and the system option SQLGENERATION= is assigned a DBMS and the system option SORTPGM=BEST. |
Tip: | Use the EQUALS option with the NODUPKEY option for consistent results in your output data sets. |
Featured in: | Retaining the First Observation of Each BY Group |
checks for and eliminates duplicate observations. If you specify this option, then PROC SORT compares all variable values for each observation to the ones for the previous observation that was written to the output data set. If an exact match is found, then the observation is not written to the output data set.
Note: See NODUPKEY for information about eliminating observations with duplicate BY values.
Alias : | NODUP |
Interaction: | When you are removing consecutive duplicate observations in the output data set with NODUPRECS, the choice of EQUALS or NOEQUALS can have an effect on which observations are removed. |
Interaction: | The action of NODUPRECS is directly related to the setting of the SORTDUP= system option. When SORTDUP= is set to LOGICAL, NODUPRECS removes duplicate observations based on the examination of the variables that remain after a DROP or KEEP operation on the input data set. Setting SORTDUP=LOGICAL increases the number of duplicate observations that are removed, because it eliminates variables before observation comparisons take place. Also, setting SORTDUP=LOGICAL can improve performance, because dropping variables before sorting reduces the amount of memory required to perform the sort. When SORTDUP= is set to PHYSICAL, NODUPRECS examines all variables in the data set, regardless of whether they have been kept or dropped. For more information about SORTDUP=, see the chapter on SAS system options in SAS Language Reference: Dictionary. |
Interaction: | In-database processing does not occur when the NODUPRECS option is specified. However, if the NODUPRECS and NODUPKEY options are specified, system option SQLGENERATION= set for in-database processing, and system option SORTPGM=BEST, the NODUPRECS option is ignored and in-database processing does occur. |
Tip: | Use the EQUALS option with the NODUPRECS option for consistent results in your output data sets. |
Tip: | Because NODUPRECS checks only consecutive observations, some nonconsecutive duplicate observations might remain in the output data set. You can remove all duplicates with this option by sorting on all variables. |
See THREADS|NOTHREADS.
names the output data set. If SAS-data-set does not exist, then PROC SORT creates it.
Without the OUT= option, PROC SORT replaces the original data set with the sorted observations when the procedure executes without errors.
Default: | Without OUT=, PROC SORT overwrites the original data set. |
Tip: | With in-database sorts, the output data set cannot refer to the input table on the DBMS. |
Tip: | You can use data set options with OUT=. |
Featured in: | Sorting by the Values of Multiple Variables |
enables the input data set to be deleted before the replacement output data set is populated with observations.
Restriction: | The OVERWIRTE option has no effect when an OUT= data set is specified. |
Restriction: | The OVERWRITE option has no effect if you also specify the TAGSORT option. You cannot overwrite the input data set because TAGSORT must reread the input data set while populating the output data set. |
Restriction: | The OVERWRITE option is supported by the SAS sort and SAS multi-threaded sort only. The option has no effect if you are using a host sort. |
Tip: | Using the OVERWRITE option can reduce disk space requirements. |
Because the input data set is deleted, data will be lost if a failure occurs while the output data set is being written.
before sorting, checks within the input data set to determine whether the sequence of observations are in order. Use the PRESORTED option when you know or strongly suspect that a data set is already in order according to the key variables that are specified in the BY statement. By specifying this option, you avoid the cost of sorting the data set.
Note: See the NODUPRECS option for information about eliminating duplicate observations.
Interaction: | Sequence checking is not performed when the FORCE option is specified. |
Interaction: | When the NODUPRECS option has been specified and one or more variables have been dropped from the input data set, then the SORTDUP system option setting will affect the detection of adjacent duplicate observations. |
Tip: | You can use the DATA step to import data, from external text files, in a sequence compatible with SAS processing and according to the sort order specified by the combination of SORT options and key variables listed in the BY statement. You can then specify the PRESORTED option if you know or highly suspect that the data is sorted accordingly. |
Tip: | Using the PRESORTED option with ACCESS engines and DBMS data is not recommended. These external databases are not guaranteed to return observations in sorted order unless an ORDER BY clause is specified in a query. Generally, physical ordering is not a concept that external databases use. Therefore, these databases are not guaranteed to return observations in the same order when executing a query multiple times. Physical order can be important for producing consistent, repeatable results when processing data. Without a repeatable data retrieval order, PROC SORT does not guarantee the return of observations in the same order from one PROC SORT execution to another, even when the EQUALS option is used to request sort stability. Without a repeatable retrieval order, the detection and elimination of adjacent duplicate records (requested by specifying the NODUPRECS option) by PROC SORT can also vary from one PROC SORT execution to another. |
See also: | System option SORTVALIDATE in SAS Language Reference: Dictionary |
sorts character variables using a collating sequence that is reversed from the normal collating sequence.
Operating Environment Information: For information about the normal collating sequence for your operating environment, see EBCDIC Order, ASCII Order, and the SAS documentation for your operating environment.
Restriction: | The REVERSE option cannot be used with a collating-sequence-option. You can specify either a collating-sequence-option or the REVERSE option in a PROC SORT, but you cannot specify both. |
Interaction: | Using REVERSE with the DESCENDING option in the BY statement restores the sequence to the normal order. |
See also: | The DESCENDING option in the BY statement. The difference is that the DESCENDING option can be used with both character and numeric variables. |
specifies the maximum amount of memory that is available to PROC SORT. Valid values for memory-specification are as follows:
specifies the amount of memory in bytes, where n is a real number.
specifies the amount of memory in kilobytes, where n is a real number.
specifies the amount of memory in megabytes, where n is a real number.
specifies the amount of memory in gigabytes, where n is a real number.
Specifying the SORTSIZE= option in the PROC SORT statement temporarily overrides the SAS system option SORTSIZE=. For more information about SORTSIZE=, see the chapter on SAS system options in SAS Language Reference: Dictionary.
Operating Environment Information: Some system sort utilities might treat this option differently. Refer to the SAS documentation for your operating environment.
stores only the BY variables and the observation numbers in temporary files. The BY variables and the observation numbers are called tags. At the completion of the sorting process, PROC SORT uses the tags to retrieve records from the input data set in sorted order.
Note: The utility file created is much smaller than it would be if the TAGSORT option were not specified.
enables or prevents the activation of multi-threaded sorting.
Default: |
the value of the THREADS | NOTHREADS
SAS system option.
Note: The default can be overridden using the procedure THREADS | NOTHREADS option. |
Restriction: | Your site administrator can create a restricted options table. A restricted options table specifies SAS system option values that are established at startup and cannot be overridden. If the THREADS | NOTHREADS system option is listed in the restricted options table, any attempt to set these system options is ignored and a warning message is written to the SAS log. |
Restriction: | If a failure occurs when adding the THREADS | NOTHREADS procedure option using the SPD engine, PROC SORT stops processing and writes a message to the SAS log. |
Interaction: | The PROC SORT THREADS | NOTHREADS options override the SAS system THREADS | NOTHREADS options unless the system option is restricted. (See Restriction.) For more information about THREADS, see the chapter on SAS system options in SAS Language Reference: Dictionary. |
Interaction: |
The THREADS system option is honored
if PROC SORT determines that multi-threaded processing is deemed to be beneficial.
If the value of the SAS system option CPUCOUNT=1, then multi-threaded processing
is not beneficial. However, you can specify the PROC SORT THREADS option to
force multi-threaded processing when the system option is set to NOTHREADS
or when the system option is THREADS and the procedure option is NOTHREADS.
This option combination prevents multi-threaded processing and overrides the
actions taken that are based on the system options.
Note: When multi-threaded sorting is in effect and NOEQUALS is specified, observations within BY groups might be returned in an unpredictable order. |
Interaction: | If multi-threaded SAS sort is being used, the UTILLOC= system option will affect the placement of utility files. Thread-enabled SAS applications are able to create temporary files that can be accessed in parallel by separate threads. For more information about the UTILLOC= system option, see the chapter on SAS system options in SAS Language Reference: Dictionary. |
Interaction: | The TAGSORT option is not supported by the multi-threaded sort. Specifying the TAGSORT option will prevent multi-threaded processing. |
See also: | Multi-threaded Sorting and Support for Parallel Processing in SAS Language Reference: Concepts. |
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.