Sort Data Task

About the Sort Data Task

The Sort Data task enables you to sort a table by one or more of its columns. The task produces a sorted table, which is saved to the Work library by default. No results or output data are displayed when you run this task.

Assigning Data to Roles

To run the Sort Data task, you must assign at least one column to the Sort by role.
Role
Description
Sort by
When you assign one or more variables to this role, the table is sorted by the selected variable or variables. The order in which the variables appear within this role determines the sort keys: the first variable that is listed is the primary sort key, the second is the secondary sort key, and so on.
Columns to drop
When you assign one or more variables to this role, the output table does not contain the specified variables. You can assign a maximum of (n – 1) variables to this role, where n is the total number of variables in the table, so that at least one variable remains in the output.
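The two roles can be illustrated outside the task with a minimal Python sketch. This is not the code that the task generates; the table contents and the names `sort_table`, `sort_by`, and `columns_to_drop` are hypothetical and chosen only for illustration.

```python
from operator import itemgetter

# Hypothetical input table: each dict is one row.
rows = [
    {"region": "West", "year": 2021, "sales": 40, "note": "b"},
    {"region": "East", "year": 2022, "sales": 10, "note": "a"},
    {"region": "West", "year": 2020, "sales": 30, "note": "c"},
    {"region": "East", "year": 2021, "sales": 20, "note": "d"},
]

sort_by = ["region", "year"]   # primary sort key first, then secondary
columns_to_drop = ["note"]     # at most n - 1 of the n columns


def sort_table(rows, sort_by, columns_to_drop=()):
    """Return a new sorted table; the input table is left unchanged."""
    if rows and len(set(columns_to_drop)) >= len(rows[0]):
        raise ValueError("at least one column must remain in the output")
    # Rows are compared on the primary key, then on the secondary key.
    ordered = sorted(rows, key=itemgetter(*sort_by))
    # Build the output rows without the dropped columns.
    return [
        {name: value for name, value in row.items()
         if name not in columns_to_drop}
        for row in ordered
    ]


result = sort_table(rows, sort_by, columns_to_drop)
for row in result:
    print(row)
```

In this sketch, the rows come out ordered by region and, within each region, by year, and the `note` column is absent from the output.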

Setting Options

Option Name
Description
Output Order
Collating sequence
specifies which collating sequence to use when sorting character variables. You can use these collation standards:
  • the sequence that is defined on the server (Server default)
  • the ASCII or EBCDIC collating sequences
  • the reverse collation order for character variables
  • a national standard, such as Danish, Finnish, Italian, Norwegian, Spanish, or Swedish
  • a custom-defined collating sequence that is defined by your installation site
Maintain original data order within ‘Sort by’ groupings
keeps observations that have identical values for the Sort by variables in the same relative order that they have in the input table. If this option is not selected, the order of observations within each Sort by group in the output table is undefined.
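The Output Order options can be approximated with a short Python sketch. It is an illustration under stated assumptions, not the task's implementation: Python compares strings by code point, which matches the ASCII sequence only for ASCII characters, and Python's built-in sort always preserves the input order of rows with equal keys, which corresponds to selecting the Maintain original data order option. The sample values are hypothetical.

```python
names = ["banana", "Apple", "cherry", "apple"]

# Code-point order (ASCII for these characters): uppercase letters
# sort before lowercase letters.
ascii_order = sorted(names)

# Reverse collation order for character values.
reverse_order = sorted(names, reverse=True)

# Stable sort: rows that share a key keep their input order.
rows = [("B", 1), ("A", 2), ("B", 3), ("A", 4)]
stable = sorted(rows, key=lambda row: row[0])

print(ascii_order)    # ['Apple', 'apple', 'banana', 'cherry']
print(reverse_order)  # ['cherry', 'banana', 'apple', 'Apple']
print(stable)         # [('A', 2), ('A', 4), ('B', 1), ('B', 3)]
```

National and site-defined collating sequences are not modeled here; they would require a locale-aware comparison.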
Duplicate Records
Keep all records
keeps all of the records that are in the output table, including all duplicates of records.
Keep only the first record for each ‘Sort by’ group
eliminates any duplicate observations that have the same values for the Sort by variables, so that only one observation is kept for each Sort by group. If the Maintain original data order within ‘Sort by’ groupings option is selected, then the observation that is retained for each Sort by group is the first one that is read from the original table. If that option is not selected, then the observation that is kept for each Sort by group cannot be predicted.
Do not keep adjacent duplicate records
compares each record to the previous record in the output table. If an exact match is found, the duplicate record is not written to the output table.
Note: If you do not assign all variables to the Sort by role, some duplicate records might not be removed because the records are not adjacent.
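The difference between the two duplicate-removal options, and the reason for the note about adjacency, can be shown with a small Python sketch. The function names and sample rows are hypothetical, and the sketch assumes a stable sort (input order preserved within each Sort by group).

```python
from itertools import groupby

# Hypothetical table of (region, sales) rows. The two ("East", 10)
# rows are exact duplicates but are not next to each other.
rows = [("East", 10), ("East", 20), ("East", 10), ("West", 30)]


def keep_first_per_group(rows, key):
    """Keep only the first row of each Sort by group."""
    ordered = sorted(rows, key=key)  # stable: input order kept in groups
    return [next(group) for _, group in groupby(ordered, key=key)]


def drop_adjacent_duplicates(ordered_rows):
    """Skip a row when it exactly matches the previous row written."""
    kept = []
    for row in ordered_rows:
        if not kept or row != kept[-1]:
            kept.append(row)
    return kept


def by_region(row):
    return row[0]


first_only = keep_first_per_group(rows, by_region)

# Sorting by region only leaves the exact duplicates non-adjacent,
# so neither copy is removed.
partial = drop_adjacent_duplicates(sorted(rows, key=by_region))

# Sorting by every column makes exact duplicates adjacent.
full = drop_adjacent_duplicates(sorted(rows))

print(first_only)  # [('East', 10), ('West', 30)]
print(partial)     # [('East', 10), ('East', 20), ('East', 10), ('West', 30)]
print(full)        # [('East', 10), ('East', 20), ('West', 30)]
```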
Advanced Sorting
Memory for sorting
specifies the maximum amount of memory that can be used for the Sort Data task. You can specify the amount of memory in bytes (B), kilobytes (KB), megabytes (MB), or gigabytes (GB). Alternatively, you can choose to use all of the available memory or the default amount of memory that has been allocated on the server.
Reduce temporary disk space requirements
indicates that during the Sort Data process, only the Sort by variables and the observation numbers are stored within temporary files, reducing the amount of storage necessary to perform the sort. In the final phase of the sort, the temporary file is used as an index to access the original table and then to send the data to the results table in the correctly sorted sequence.
Force a sort of indexed data
indicates that you want to sort the table even if it is already sorted in the desired sequence or it contains a user-created index with keys that match those specified in the Sort by role. If you specify this option, the table is sorted regardless of its current order or whether it contains an index.
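The idea behind the Reduce temporary disk space requirements option can be sketched in Python as follows. This is a conceptual illustration only, assuming an in-memory list in place of the original table and temporary files; the row layout and the name `tags` are hypothetical.

```python
# Hypothetical table with a small key and a large payload per row.
rows = [
    {"id": 3, "payload": "c" * 1000},
    {"id": 1, "payload": "a" * 1000},
    {"id": 2, "payload": "b" * 1000},
]

# Store only the Sort by value and the observation number for each row,
# not the full row.
tags = [(row["id"], obs) for obs, row in enumerate(rows)]

# Sort the small tags instead of the wide rows.
tags.sort()

# Final phase: use the sorted tags as an index into the original table
# and write the rows out in sorted order.
result = [rows[obs] for _, obs in tags]

print([row["id"] for row in result])  # [1, 2, 3]
```

The trade-off in this sketch is that the final phase reads the original table in a non-sequential order, which can cost more time in exchange for the smaller intermediate data.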
Results
Location to save output data
specifies the location for the output table. By default, this table is saved to the temporary Work library.