SORT Procedure: Windows

Sorts observations in a SAS data set by one or more variables, and then stores the resulting sorted observations in a new SAS data set or replaces the original data set.

Windows specifics: Sort utilities available; SORTSIZE= and TAGSORT statement options
See: SORT Procedure in Base SAS Procedures Guide

Syntax

PROC SORT <option(s)> <collating-sequence-option> ;

Optional Arguments

SORTSIZE=memory-specification

specifies the maximum amount of memory available to the SORT procedure. For further explanation of the SORTSIZE= option, see the following Details section.

TAGSORT

stores only the BY variables and the observation number in temporary files. When you specify TAGSORT, the sort is a single-threaded sort. Do not specify TAGSORT if you want SAS to use multiple threads to sort. For details about the TAGSORT option, see the following Details section.

Details

Sort Procedure Syntax

This is a simplified version of the SORT procedure syntax. For the complete syntax and its explanation, see the SORT procedure in Base SAS Procedures Guide.
The SORT procedure sorts observations in a SAS data set by one or more character or numeric variables, either replacing the original data set or creating a new, sorted data set. By default under Windows, the SORT procedure uses the ASCII collating sequence.
The SORT procedure uses the sort utility specified by the SORTPGM system option. Sorting can be done by SAS, your database, or the Windows SyncSort utility. You can use all the options available to the SAS sort utility, such as the SORTSEQ= and NODUPKEY options. For a complete list of all options available, see the list of sort options in the See Also section.
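As a minimal sketch of the behavior described above, the following statements sort a small data set by two variables and write the result to a new data set. The data set names STAFF and STAFF_SORTED and the variables are hypothetical and are used only for illustration.

```sas
/* Hypothetical input data set; names and values are illustrative only. */
data staff;
   input name $ dept $ salary;
   datalines;
Lee Sales 52000
Ortiz Admin 48000
Chen Sales 61000
Lee Sales 52000
;
run;

/* Sort by DEPT, then NAME. OUT= writes the sorted observations to a   */
/* new data set, so the original is not replaced. NODUPKEY drops any   */
/* observation whose BY values repeat those of an earlier observation. */
proc sort data=staff out=staff_sorted nodupkey;
   by dept name;
run;
```

If you omit the OUT= option, the sorted observations replace the original data set.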

SORTSIZE= Option

Under Windows, you can use the SORTSIZE= option in the PROC SORT statement to limit the amount of memory that is available to the SORT procedure. Limiting memory might reduce the amount of swapping that SAS must do to sort the data set, because the SORT procedure's algorithm can swap data more efficiently than Windows can. If PROC SORT needs more memory than you specify, it creates a temporary utility file in which to store the data.
The syntax of the SORTSIZE= option is as follows:

Syntax

SORTSIZE=memory-specification
where memory-specification can be one of the following:
n
specifies the amount of memory in bytes.
nK
specifies the amount of memory in 1-kilobyte multiples.
nM
specifies the amount of memory in 1-megabyte multiples.
The default SAS configuration file sets this option to 1G using the SORTSIZE= system option.
You can override the default value of the SORTSIZE= system option by specifying a different SORTSIZE= value in the PROC SORT statement, or by submitting an OPTIONS statement that sets the SORTSIZE= system option to a new value.
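The two ways of overriding the default might look like the following sketch. The memory values are arbitrary examples rather than recommendations, and the sketch assumes that the Sashelp.Class sample data set is available at your site.

```sas
/* Override the SORTSIZE= system option for this one sort only. */
proc sort data=sashelp.class out=class_sorted sortsize=256M;
   by age name;
run;

/* Change the SORTSIZE= system option for the rest of the SAS session. */
options sortsize=512M;

/* Write the current SORTSIZE= setting to the SAS log. */
proc options option=sortsize;
run;
```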

TAGSORT Option

The TAGSORT option in the PROC SORT statement is useful when there might not be enough disk space to sort a large SAS data set. When you specify TAGSORT, the sort is a single-threaded sort. Do not specify TAGSORT if you want SAS to use multiple threads to sort.
When you specify the TAGSORT option, only sort keys (that is, the variables specified in the BY statement) and the observation number for each observation are stored in the temporary files. The sort keys, together with the observation number, are referred to as tags. At the completion of the sorting process, the tags are used to retrieve the records from the input data set in sorted order. Thus, in cases where the total number of bytes of the sort keys is small compared with the length of the record, temporary disk use is reduced considerably. You should have enough disk space to hold another copy of the data (the output data set) or two copies of the tags, whichever is greater. Note that while using the TAGSORT option can reduce temporary disk use, the processing time can be much higher. However, on PCs with limited available disk space, the TAGSORT option can allow sorts to be performed in situations where they would otherwise not be possible.
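The following sketch shows the kind of case in which TAGSORT is likely to help: a short sort key (ID) and a long record. The data set WIDE and its variables are hypothetical, and the data set is kept tiny so that the example stays self-contained.

```sas
/* Hypothetical data set with a short key and a long character variable. */
data wide;
   length notes $ 2000;
   do id = 5 to 1 by -1;
      notes = 'long record payload';
      output;
   end;
run;

/* With TAGSORT, only ID and the observation number are written to the */
/* temporary files. The full records are read from WIDE in sorted      */
/* order at the end of the sort.                                       */
proc sort data=wide out=wide_sorted tagsort;
   by id;
run;
```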

Creating Your Own Collating Sequences

If you want to provide your own collating sequences or change a collating sequence that has been provided for you, use the TRANTAB procedure to create or modify translate tables. For more information about the TRANTAB procedure, see SAS National Language Support (NLS): Reference Guide. When you create your own translate tables, they are stored in your Sasuser.Profile catalog and they override any translate tables by the same name that are stored in the HOST catalog.
Note: System managers can modify the HOST catalog by copying newly created tables from the Sasuser.Profile catalog to the HOST catalog. Then all users can access the new or modified translate table.
If you want to see the names of the collating sequences stored in the HOST catalog (using the SAS Explorer), submit the following statement:
dm 'catalog sashelp.host' catalog;
Alternatively, you can select the View menu, select the Explorer item, double-click the Sashelp library, and then double-click the HOST catalog. In batch mode, you can use the following statements to generate a list of the contents of the HOST catalog:
proc catalog catalog=sashelp.host;
   contents;
run;
Entries of type TRANTAB are the collating sequences.
If you want to see the contents of a particular translate table, use the following statements:
proc trantab table=table-name;
   list;
run;
The contents of the collating sequence are displayed in the SAS log.

Using SyncSort with SAS

If SyncSort is installed at your site, you can use SyncSort as an alternative sorting algorithm to the database sort or the SAS sort. SAS determines which sort to use by the values that are set for the SORTPGM, SORTCUT, and SORTCUTP system options.
The SyncSort installation process adds the SyncSort directory to the Windows PATH statement. As long as the SyncSort directory is included in the Windows PATH statement, SAS is able to launch SyncSort. SyncSort is developed by Syncsort, Inc.

Setting SyncSort as the Sort Algorithm

To always sort using the SyncSort sort routine, the value of the SORTPGM system option must be HOST. To set this option, submit the following OPTIONS statement:
options sortpgm=host;
Note: The SORTPGM option can also be set from the System Options window, in the SAS configuration file, or during SAS invocation. This example shows how to specify the SORTPGM system option at invocation or in the SAS configuration file:
-sortpgm host

Sorting Based on Size or Observations

The sort routine that SAS uses can be based on either the number of observations in a data set or on the size of the data set. When the SORTPGM option is set to BEST, SAS uses the first available and pertinent sorting algorithm based on this order of precedence:
  • database sort utility
  • host sort utility
  • SAS sort utility
If sorting is not to be done by the database, SAS looks at the values for the SORTCUT and SORTCUTP options to determine which sort to use.
SyncSort is used when the number of observations is greater than the value of the SORTCUT option. The SORTCUTP option specifies the number of bytes in the data set above which SyncSort is used.
If SORTCUT and SORTCUTP are set to zero, SAS uses the SAS sort routine. If you specify both options and either condition is met, SAS uses SyncSort.
When the following OPTIONS statement is in effect, the SyncSort routine is used when the number of observations is 501 or greater:
options sortpgm=best sortcut=500;
Here, the SyncSort routine is used when the size of the data set is greater than 40M:
options sortpgm=best sortcutp=40M;
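Both thresholds can also be set together. In the following sketch, which reuses the values from the two statements above and assumes that SyncSort is installed, SyncSort is used when either condition is met. The data set name BIGDATA and the BY variable ID are hypothetical.

```sas
/* Use SyncSort when the data set has 501 or more observations */
/* or when its size is greater than 40M.                       */
options sortpgm=best sortcut=500 sortcutp=40M;

/* Hypothetical data set with 1,000 observations, which exceeds */
/* the SORTCUT= threshold.                                      */
data bigdata;
   do id = 1000 to 1 by -1;
      output;
   end;
run;

proc sort data=bigdata out=bigdata_sorted;
   by id;
run;
```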
For more information about these sort options, see SORTPGM System Option: Windows, SORTCUT System Option: Windows, and SORTCUTP System Option: Windows.

Changing the Location of SyncSort Temporary Files

By default, SyncSort uses the location that is specified in the WORK option for temporary files. To change the location of SyncSort temporary files, specify a new location by using the SORTDEV option. Here is an example:
options sortdev="c:\temp\sortsync";
For more information about the SORTDEV option, see SORTDEV System Option: Windows.

Passing Options to SyncSort

Use the SORTANOM option to specify the options that you want to use for SyncSort:
SORTANOM Options for SyncSort
Task                                                                  SORTANOM Option
Run in multi-call mode instead of single-call mode                    SORTANOM=b
Print statistics in the SAS log about the sorting process             SORTANOM=t
Print in the SAS log the commands that have been passed to SyncSort   SORTANOM=v
Multiple options can be specified by concatenating the options:
options sortanom=btv;
For more information about the SORTANOM option, see SORTANOM System Option: Windows.

Passing Parameters to SyncSort

Use the SORTPARM option to pass SyncSort options to SyncSort. Enclose the options in quotation marks, as in this OPTIONS statement:
options sortparm="SyncSort-options";
For information about the SORTPARM option, see SORTPARM System Option: Windows. See the SyncSort documentation for a description of the SyncSort options.

Specifying the SORTSEQ= Option with a Host Sort Utility

The SORTSEQ= option enables you to specify the collating sequence for your sort. For a list of valid values, see the SORT procedure in Base SAS Procedures Guide.
CAUTION:
If you are using a host sort utility to sort your data, then specifying the SORTSEQ= option might corrupt the character BY variables if the sort sequence translation table and its inverse are not one-to-one mappings.
The translation table must map each character to a unique weight, and the inverse table must map each weight to a unique character.
If your translation tables are not one-to-one mappings, then you can use one of the following methods to perform your sort:
  • create a translation table that maps one-to-one. When you create a translation table that maps one-to-one, you can easily create a corresponding inverse table by using the TRANTAB procedure. If your table is not mapped one-to-one, then you receive the following note in the SAS log when you try to create an inverse table:
    NOTE:  This table cannot be mapped one to one.
    For more information, see the TRANTAB procedure in SAS National Language Support (NLS): Reference Guide.
  • use the SAS sort. You can specify the SAS sort by using the SORTPGM system option. For more information, see SORTPGM System Option: Windows.
  • specify the collation order options of your host sort utility. See the documentation for your host sort utility for more information.
  • create a view with a dummy BY variable.
Note: After using one of these methods, you might need to perform subsequent BY processing using either the NOTSORTED option or the NOBYSORTED system option. For more information about the NOTSORTED option, see the BY statement in SAS Statements: Reference. For more information about the NOBYSORTED system option, see the BYSORTED system option in SAS System Options: Reference.
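As a rough illustration of the note above, the following sketch performs BY processing on a data set whose BY groups are contiguous but not in ASCII order. The data set GROUPED and its variables are hypothetical.

```sas
/* Hypothetical data set: observations with the same NAME are adjacent, */
/* but the groups are not in ascending ASCII order (anne before ALBERT). */
data grouped;
   input name $ score;
   datalines;
anne 3
anne 5
ALBERT 7
bridget 2
;
run;

/* NOTSORTED tells SAS that the BY groups are contiguous but not */
/* necessarily in sorted order.                                  */
data totals;
   set grouped;
   by name notsorted;
   if first.name then total = 0;
   total + score;
   if last.name then output;
run;

/* Alternatively, NOBYSORTED applies the same assumption to every */
/* BY statement until BYSORTED is restored.                       */
options nobysorted;
proc print data=grouped;
   by name;
run;
options bysorted;
```

Without NOTSORTED or NOBYSORTED, the BY statement in these steps would stop with an error because the BY values are not in sorted order.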

Example: Creating a View with a Dummy BY Variable

The following code is an example of creating a view using a dummy BY variable:
options nodate nostimer ls=78 ps=60;
options sortpgm=host msglevel=i;
data one;
   input name $ age;
   datalines;
   anne 35
   ALBERT 10
   JUAN 90
   janet 5
   bridget 23
   BRIAN 45
   ;
run;
data oneview / view=oneview;
   set one;
   name1=upcase(name);
run;
proc sort data=oneview out=final(drop=name1);
   by name1;
run;
proc print data=final;
run;
The output is the following: