Previous Page | Next Page

SAS Data Sets

Sorted Data Sets


The Sort Indicator

After a data set is sorted, a sort indicator is added to the data set descriptor information. The sort indicator is updated without a permanent sort of the data set by using the SORTEDBY= data set option. The Sortedby and Validated sort information is updated when the SORTEDBY= data set option is used.

The sort indicator contains some or all of the following sort information of a SAS data set:

The sort indicator is set when a data set is sorted by a SORT procedure, an SQL procedure with an ORDER BY clause, a DATASETS procedure MODIFY statement, or a SORTEDBY= data set option. If the SORT or SQL procedures were used to sort the data set, which is being sorted by SAS, the CONTENTS procedure output indicates that the Validated sort information is YES. If the SORTEDBY= data set option was used to sort the data set, which is being sorted by the user, the CONTENTS procedure output indicates the Validated sort information is set to NO and the Sortedby sort information is updated with the variable or variables specified by the data set option.

Data sets can be sorted outside of SAS. In that case, you might use the SORTEDBY= data set option or the DATASETS procedure MODIFY statement to add the sort order to the sort indicator. In this case, they are not validated. For more information, see Validating That a Data Set Is Sorted.

To view the sort indicator information, use the CONTENTS procedure or the CONTENTS statement in the DATASETS procedure. The following three examples show the sort indicator information in the CONTENTS procedure output.


Example 1: Using No Sorting

The first example is a data set created without any type of sort:

options yearcutoff=1920;

data sasuser.sorttest1;
   input priority 1. +1 indate date7.
         +1 office $ code $;
   format indate date7.;
   datalines;
1 03may07 CH J8U
1 21mar07 LA M91
1 01dec07 FW L6R
1 27feb06 FW Q2A
2 15jan07 FW I9U
2 09jul07 CH P3Q
3 08apr06 CH H5T
3 31jan06 FW D2W
;
proc contents data=sasuser.sorttest1;
run;

Note that the CONTENTS procedure output indicates there was no sort. SAS did not sort the data set, and the user did not specify that the data is sorted.

Contents of SORTTEST1 Data Set -- No Sorting

                                        The SAS System      12:47 Wednesday, October 3, 2007   1

                                      The CONTENTS Procedure

          Data Set Name        SASUSER.SORTTEST1                Observations          8
          Member Type          DATA                             Variables             4
          Engine               V9                               Indexes               0
          Created              Wed, Oct 03, 2007 12:55:59 PM    Observation Length    32
          Last Modified        Wed, Oct 03, 2007 12:55:59 PM    Deleted Observations  0
          Protection                                            Compressed            NO
          Data Set Type                                         Sorted                NO
          Label
          Data Representation  WINDOWS_32
          Encoding             wlatin1  Western (Windows)


                                Engine/Host Dependent Information

     Data Set Page Size          4096
     Number of Data Set Pages    1
     First Data Page             1
     Max Obs per Page            126
     Obs in First Data Page      8
     Number of Data Set Repairs  0
     Filename                    C:\Documents and Settings\suholm\sasuser\sorttest1.sas7bdat
     Release Created             9.0201B0
     Host Created                XP_PRO


                            Alphabetic List of Variables and Attributes

                              #    Variable    Type    Len    Format

                              4    code        Char      8
                              2    indate      Num       8    DATE7.
                              3    office      Char      8
                              1    priority    Num       8

Example 2: Using the SORTEDBY= Data Set Option

In the second example, the data set is created using the SORTEDBY= data set option in the DATA statement.

options yearcutoff=1920;

data sasuser.sorttest1 (sortedby=priority descending indate);
   input priority 1. +1 indate date7.
         +1 office $ code $;
   format indate date7.;
   datalines;
1 03may01 CH J8U
1 21mar01 LA M91
1 01dec00 FW L6R
1 27feb99 FW Q2A
2 15jan00 FW I9U
2 09jul99 CH P3Q
3 08apr99 CH H5T
3 31jan99 FW D2W
;
proc contents data=sasuser.sorttest1;
run;

Note that the CONTENTS procedure output shows that the data set is sorted. Therefore, a Sort Information section containing sort indicator information is created. In the Sort Information section, the Sortedby information indicates the data set is sorted by the PRIORITY variable and is in descending order by the INDATE variable. The data set is sorted using the SORTEDBY= data set option, so the Validated information is NO. The Character Set information for the data set is ANSI.

Contents of SORTTEST1 Data Set - Sorted

                                          The SAS System      12:47 Wednesday, October 3, 2007   2

                                      The CONTENTS Procedure

         Data Set Name        SASUSER.SORTTEST1                Observations          8
         Member Type          DATA                             Variables             4
         Engine               V9                               Indexes               0
         Created              Wed, Oct 03, 2007 01:02:55 PM    Observation Length    32
         Last Modified        Wed, Oct 03, 2007 01:02:55 PM    Deleted Observations  0
         Protection                                            Compressed            NO
         Data Set Type                                         Sorted                YES
         Label
         Data Representation  WINDOWS_32
         Encoding             wlatin1  Western (Windows)


                                Engine/Host Dependent Information

     Data Set Page Size          4096
     Number of Data Set Pages    1
     First Data Page             1
     Max Obs per Page            126
     Obs in First Data Page      8
     Number of Data Set Repairs  0
     Filename                    C:\Documents and Settings\suholm\sasuser\sorttest1.sas7bdat
     Release Created             9.0201B0
     Host Created                XP_PRO


                            Alphabetic List of Variables and Attributes

                              #    Variable    Type    Len    Format

                              4    code        Char      8
                              2    indate      Num       8    DATE7.
                              3    office      Char      8
                              1    priority    Num       8


                                        Sort Information

                            Sortedby       priority DESCENDING indate
                            Validated      NO
                            Character Set  ANSI

Example 3: Using the SORT Procedure

In the third example, the data set is sorted using the SORT procedure.

options yearcutoff=1920;

data sasuser.sorttest1;
   input priority 1. +1 indate date7.
         +1 office $ code $;
   format indate date7.;
   datalines;
1 03may01 CH J8U
1 21mar01 LA M91
1 01dec00 FW L6R
1 27feb99 FW Q2A
2 15jan00 FW I9U
2 09jul99 CH P3Q
3 08apr99 CH H5T
3 31jan99 FW D2W
;
proc sort data=sasuser.sorttest1;
     by priority descending indate;run;

proc contents data=sasuser.sorttest1;
run;

Note that the CONTENTS procedure output shows that the data set is sorted. Therefore, a Sort Information section containing sort indicator information is created. In the Sort Information section, the Sortedby information indicates the data set is sorted by the PRIORITY variable and is in descending order by the INDATE variable. The data set is sorted using the SORT procedure, so the Validated information is YES. The Character Set information for the data set is ANSI.

Contents of SORTTEST1 Data Set - Validated Sort

                                          The SAS System      12:47 Wednesday, October 3, 2007   4

                                      The CONTENTS Procedure

         Data Set Name        SASUSER.SORTTEST1                Observations          8
         Member Type          DATA                             Variables             4
         Engine               V9                               Indexes               0
         Created              Wed, Oct 03, 2007 01:08:35 PM    Observation Length    32
         Last Modified        Wed, Oct 03, 2007 01:08:35 PM    Deleted Observations  0
         Protection                                            Compressed            NO
         Data Set Type                                         Sorted                YES
         Label
         Data Representation  WINDOWS_32
         Encoding             wlatin1  Western (Windows)


                                Engine/Host Dependent Information

     Data Set Page Size          4096
     Number of Data Set Pages    1
     First Data Page             1
     Max Obs per Page            126
     Obs in First Data Page      8
     Number of Data Set Repairs  0
     Filename                    C:\Documents and Settings\suholm\sasuser\sorttest1.sas7bdat
     Release Created             9.0201B0
     Host Created                XP_PRO


                            Alphabetic List of Variables and Attributes

                              #    Variable    Type    Len    Format

                              4    code        Char      8
                              2    indate      Num       8    DATE7.
                              3    office      Char      8
                              1    priority    Num       8


                                        Sort Information

                            Sortedby  priority DESCENDING indate
                            Validated      YES
                            Character Set  ANSI 

How SAS Uses the Sort Indicator to Improve Performance

The sort information provided by the sort indicator is used internally for performance improvements. There are several ways to improve performance using the sort indicator:


Validating That a Data Set Is Sorted

Any SAS procedure that requires data to be sorted as part of the process checks the sort indicator information. The sort indicator is set when a data set is sorted by a SORT procedure, an SQL procedure with an ORDER BY clause, a DATASETS procedure MODIFY statement, or a SORTEDBY= data set option. If the SORT or SQL procedures were used to sort the data set, the CONTENTS procedure output indicates the Validated sort information is YES. If the SORTEDBY= data set option was used to sort the data set, the CONTENTS procedure output indicates the Validated sort information is NO. For examples of the CONTENTS procedure output, see Example 1: Using No Sorting, Example 2: Using the SORTEDBY= Data Set Option, and Example 3: Using the SORT Procedure.

You can use the SORTVALIDATE system option to specify whether the SORT procedure validates that a data set is sorted correctly when the data set sort indicator shows that a user has specified that the data set is sorted. The user can specify a sort order by using the SORTEDBY= data set option in a DATA statement or by using the SORTEDBY= option in the DATASETS procedure MODIFY statement. When the sort indicator is set by a user, SAS cannot be absolutely certain that a data set is sorted according to the variables in the BY statement.

If the SORTVALIDATE system option is set and the data set sort indicator was set by a user, the SORT procedure performs a sequence check on each observation to ensure that the data set is sorted according to the variables in the BY statement. If the data set is not sorted correctly, SAS sorts the data set.

At the end of a successful sequence check or at the end of a sort, the SORT procedure sets the sort indicator Validated sort information to YES. If a sort is performed, the SORT procedure also updates the sort indicator Sortedby sort information to the variables in the BY statement. If an output data set is specified, the sort indicator Validated sort information in the output data set is set to YES. If no sort is necessary, the data set is copied to the output data set. For more information about validated data sets, see the SORTVALIDATE= System Option in SAS Language Reference: Dictionary.

Previous Page | Next Page | Top of Page