Sorting Data

Overview to Sorting Data

You can sort table rows by the values of one or more character or numeric columns. For Base SAS tables and third-party relational database tables, the process either replaces the original table or creates a new table. You can perform sorting in two ways:
  • using the SAS SORT procedure
  • setting properties for a SAS sort template in SAS Data Integration Studio, as shown in the following figure:
    The Sort by Columns Tab in the Sort Properties Dialog Box
    The Sort by Columns Tab in the Sort Properties Dialog Box
To manage the memory that is used for the sorting process, you can specify the maximum amount of memory that is available to the sort. Generally, the sort size should be less than the physical memory available to the process. If the sorting requires more memory than you specify, then SAS creates a temporary utility file on disk. To specify a sort size in SAS Data Integration Studio, access the Options tab in the properties window for the sort template and enter a value in the Sortsize field, as shown in the following figure:
The Options Tab in the SAS Sort Properties Dialog Box
The Options Tab in the SAS Sort Properties Dialog Box
The SPD Engine has implicit sorting capabilities, which saves time and resources for SAS applications that process large tables. When the SPD Engine encounters a BY clause, if the data is not already sorted or indexed on the BY column, then the SPD Engine automatically sorts the data without affecting the permanent table or producing a new table. You can change the implicit sorting options when you define an SPD Engine library in the metadata. See Setting LIBNAME Options That Affect Performance of SPD Engine Tables.
For more information about the SORT procedure, see the Base SAS Procedures Guide.

Multi-Threaded Sorting

The SAS system option THREADS activates multi-threaded sorting, which achieves a degree of parallelism in the sorting operations. This parallelism is intended to reduce the real time to completion for a given operation. However, the parallelism comes at the possible cost of additional CPU resources. For more information, see "Support for Parallel Processing" in SAS Language Reference: Concepts.
The performance of the multi-threaded sort is affected by the value of the SAS system option CPUCOUNT=. CPUCOUNT= indicates how many system CPUs are available for use by the multi-threaded sort. The multi-threaded sort supports concurrent input from the partitions of a partitioned table.
Note: For information about the support of partitioned tables in your operating environment, see the SAS documentation for your operating environment.
For more information about THREADS and CPUCOUNT=, see the chapter about SAS system options in SAS System Options: Reference.

Sorting a Database Table

When you use a third-party database table, the column ordering that is produced by the SORT procedure depends on whether the DBMS or SAS performs the sorting. If you use the BEST value of the SAS system option SORTPGM=, then either the DBMS or SAS performs the sort. If the DBMS performs the sort, then the configuration and characteristics of the DBMS sorting program affect the resulting data order. Most database management systems do not guarantee sort stability, and the sort might be performed by the database table regardless of the state of the SORTEQUALS or NOSORTEQUALS system options and the EQUALS or NOEQUALS procedure options.
If you set the SAS system option SORTPGM= to SAS, then unordered data is delivered from the DBMS to SAS and SAS performs the sorting. However, consistency in the delivery order of columns from a database table is not guaranteed. Therefore, even though SAS can perform a stable sort on the DBMS data, SAS cannot guarantee that the ordering of columns within output BY groups will be the same, run after run. To achieve consistency in the ordering of columns within BY groups, first populate a SAS table with the database table and then use the EQUALS or SORTEQUALS option to perform a stable sort.