Sorted Data Sets

The Sort Indicator

What Is a Sort Indicator?

After a data set is sorted, a sort indicator is added to the data set descriptor information. The sort indicator is updated without a permanent sort of the data set by using the SORTEDBY= data set option. The Sortedby and Validated sort information is updated when the SORTEDBY= data set option is used.
The sort indicator contains some or all of the following sort information of a SAS data set:
  • how the data set is sorted by which variable or variables
  • whether the order for a variable is descending or ascending
  • the character set used for character variables
  • the collating sequence used for ordering character data
  • collation rules if the data set is sorted linguistically
  • whether there is only one observation for any given BY group (use of NODUPKEY option)
  • whether there are no adjacent duplicate observations (use of NODUPREC option)
  • whether the data set is validated
The sort indicator is set when a data set is sorted by a SORT procedure, an SQL procedure with an ORDER BY clause, a DATASETS procedure MODIFY statement, or a SORTEDBY= data set option. If the SORT or SQL procedures were used to sort the data set, which is being sorted by SAS, the CONTENTS procedure output indicates that the Validated sort information is YES. If the SORTEDBY= data set option was used to sort the data set, which is being sorted by the user, the CONTENTS procedure output indicates the Validated sort information is set to NO and the Sortedby sort information is updated with the variable or variables specified by the data set option.
Data sets can be sorted outside of SAS. In that case, you might use the SORTEDBY= data set option or the DATASETS procedure MODIFY statement to add the sort order to the sort indicator. In this case, they are not validated. For more information, see Validating That a Data Set Is Sorted.
To view the sort indicator information, use the CONTENTS procedure or the CONTENTS statement in the DATASETS procedure. The following three examples show the sort indicator information in the CONTENTS procedure output.

Example 1: Using No Sorting

The first example is a data set created without any type of sort:
options yearcutoff=1920;
libname myfiles 'C:\My Documents';

data myfiles.sorttest1;
   input priority 1. +1 indate date7.
         +1 office $ code $;
   format indate date7.;
   datalines;
1 03may11 CH J8U
1 21mar11 LA M91
1 01dec11 FW L6R
1 27feb10 FW Q2A
2 15jan11 FW I9U
2 09jul11 CH P3Q
3 08apr10 CH H5T
3 31jan10 FW D2W
;
proc contents data=myfiles.sorttest1;
run;
Note that the CONTENTS procedure output indicates there was no sort. SAS did not sort the data set, and the user did not specify that the data is sorted.
Contents of SORTTEST1 Data Set – No Sorting
Contents of SORTTEST1 Data Set – No Sorting

Example 2: Using the SORTEDBY= Data Set Option

In the second example, the data set is created using the SORTEDBY= data set option in the DATA statement.
options yearcutoff=1920;
libname myfiles 'C:\My Documents';

data myfiles.sorttest1 (sortedby=priority descending indate);
   input priority 1. +1 indate date7.
         +1 office $ code $;
   format indate date7.;
   datalines;
1 03may01 CH J8U
1 21mar01 LA M91
1 01dec00 FW L6R
1 27feb99 FW Q2A
2 15jan00 FW I9U
2 09jul99 CH P3Q
3 08apr99 CH H5T
3 31jan99 FW D2W
;
proc contents data=myfiles.sorttest1;
run;
Note that the CONTENTS procedure output shows that the data set is sorted. Therefore, a Sort Information section containing sort indicator information is created. In the Sort Information section, the Sortedby information indicates the data set is sorted by the PRIORITY variable and is in descending order by the INDATE variable. The data set is sorted using the SORTEDBY= data set option, so the Validated information is NO. The Character Set information for the data set is ANSI.
Contents of SORTTEST1 Data Set – Sorted
Contents of SORTTEST1 Data Set – Sorted
Contents of SORTTEST1 Data Set – Sorted
Contents of SORTTEST1 data set – sorted

Example 3: Using the SORT Procedure

In the third example, the data set is sorted using the SORT procedure.
options yearcutoff=1920;
libname myfiles 'C:\My Documents';

data myfiles.sorttest1;
   input priority 1. +1 indate date7.
         +1 office $ code $;
   format indate date7.;
   datalines;
1 03may01 CH J8U
1 21mar01 LA M91
1 01dec00 FW L6R
1 27feb99 FW Q2A
2 15jan00 FW I9U
2 09jul99 CH P3Q
3 08apr99 CH H5T
3 31jan99 FW D2W
;
proc sort data=myfiles.sorttest1;
     by priority descending
indate;
run;

proc contents data=myfiles.sorttest1;
run;
Note that the CONTENTS procedure output shows that the data set is sorted. Therefore, a Sort Information section containing sort indicator information is created. In the Sort Information section, the Sortedby information indicates the data set is sorted by the PRIORITY variable and is in descending order by the INDATE variable. The data set is sorted using the SORT procedure, so the Validated information is YES. The Character Set information for the data set is ANSI.
Contents of SORTTEST1 Data Set – Validated Sort
Contents of SORTTEST1 Data Set – Validated Sort
Contents of SORTTEST1 Data Set – Validated Sort
Contents of SORTTEST1 data set – validated sort

How SAS Uses the Sort Indicator to Improve Performance

The sort information provided by the sort indicator is used internally for performance improvements. There are several ways to improve performance using the sort indicator:
  • SAS uses the sort indicator to validate whether there was a previous sort. If there was a previous sort from a SORT procedure or an SQL procedure with an ORDER BY clause, then SAS does not perform another sort.
    • The SORT procedure sets the sort indicator when a sort occurs. The SORT procedure checks for the sort indicator before it sorts a data set so that data is not re-sorted unnecessarily. For more information, see the SORT Procedure in the Base SAS Procedures Guide.
    • The SQL procedure uses the sort indicator to process queries more efficiently and to determine whether an internal sort is necessary before performing a join. For more information, see the SQL procedure in the Base SAS Procedures Guide.
  • When using the sort indicator during index creation, SAS determines whether the data is already sorted by the key variable or variables in ascending order by checking the sort indicator in the data file. If the values in the sort indicator are in ascending order, SAS does not sort the values for the index file. For more information, see Understanding SAS Indexes.
  • When processing a WHERE expression without an index, SAS first checks the sort indicator. If the Validated sort information is YES, SAS stops reading the file once there are no more values that satisfy the WHERE expression.
  • If an index is selected for WHERE expression processing, the sort indicator for that data set is changed to reflect the order that is specified by the index.
  • For BY-group processing, if the data set is already sorted by the BY variable, SAS does not use the index, even if the data set is indexed on the BY variable.
  • If the Validated sort information is set to YES, SAS does not need to perform another sort.

Validating That a Data Set Is Sorted

Any SAS procedure that requires data to be sorted as part of the process checks the sort indicator information. The sort indicator is set when a data set is sorted by a SORT procedure, an SQL procedure with an ORDER BY clause, a DATASETS procedure MODIFY statement, or a SORTEDBY= data set option. If the SORT or SQL procedures were used to sort the data set, the CONTENTS procedure output indicates the Validated sort information is YES. If the SORTEDBY= data set option was used to sort the data set, the CONTENTS procedure output indicates the Validated sort information is NO. For examples of the CONTENTS procedure output, see Example 1: Using No Sorting, Example 2: Using the SORTEDBY= Data Set Option, and Example 3: Using the SORT Procedure.
You can use the SORTVALIDATE system option to specify whether the SORT procedure validates that a data set is sorted correctly when the data set sort indicator shows that a user has specified that the data set is sorted. The user can specify a sort order by using the SORTEDBY= data set option in a DATA statement or by using the SORTEDBY= option in the DATASETS procedure MODIFY statement. When the sort indicator is set by a user, SAS cannot be absolutely certain that a data set is sorted according to the variables in the BY statement.
If the SORTVALIDATE system option is set and the data set sort indicator was set by a user, the SORT procedure performs a sequence check on each observation to ensure that the data set is sorted according to the variables in the BY statement. If the data set is not sorted correctly, SAS sorts the data set.
At the end of a successful sequence check or at the end of a sort, the SORT procedure sets the sort indicator Validated sort information to YES. If a sort is performed, the SORT procedure also updates the sort indicator Sortedby sort information to the variables in the BY statement. If an output data set is specified, the sort indicator Validated sort information in the output data set is set to YES. If no sort is necessary, the data set is copied to the output data set. For more information about validated data sets, see the SORTVALIDATE System Option in SAS System Options: Reference.