Advanced Performance Tuning Methods

Overview of Advanced Performance Tuning Methods

This section presents some advanced performance topics, such as improving the performance of the SORT procedure and calculating data set size. Use these methods only if you are an experienced SAS user and you are familiar with how SAS is configured on your machine.

Improving Performance of the SORT Procedure

Overview of Improving Performance of the SORT Procedure

Two options for the PROC SORT statement are available under Windows: SORTSIZE= and TAGSORT. These options affect how much memory and temporary disk space the SORT procedure uses during a sort, and they are discussed in the next two sections. This section also explains how to determine where the sorting process occurs for a given data set and how much disk space you need for the sort. For more information about the SORT procedure, see SORT Procedure: Windows.

SORTSIZE Option

The PROC SORT statement supports the SORTSIZE= option, which limits the amount of memory available for PROC SORT to use.
If you do not specify SORTSIZE= in the PROC SORT statement, PROC SORT uses the value of the SORTSIZE system option. If the SORTSIZE system option is not set, PROC SORT uses the amount of memory specified by the REALMEMSIZE system option. If PROC SORT needs more memory than this limit allows, it creates a temporary utility file in your SAS Work directory to complete the sort.
The default value of the SORTSIZE system option is 1G.
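The order in which PROC SORT picks its memory limit can be summarized in a short Python sketch. This is an illustrative model, not SAS code: the names parse_size, sort_memory_limit, and spills_to_utility_file are hypothetical, None stands for an option that is not set, and only the K, M, and G size suffixes are handled.

```python
from typing import Optional

# Simplified unit table (assumption): real SAS size values accept more forms.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text: str) -> int:
    """Convert a size such as '1G' or '256M' to bytes (hypothetical helper)."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def sort_memory_limit(proc_sortsize: Optional[str],
                      system_sortsize: Optional[str],
                      realmemsize: str) -> int:
    """Return the memory cap in bytes; None means 'option not set'."""
    # Precedence described above: PROC SORT SORTSIZE=, then the SORTSIZE
    # system option, then the REALMEMSIZE system option.
    for value in (proc_sortsize, system_sortsize):
        if value is not None:
            return parse_size(value)
    return parse_size(realmemsize)


def spills_to_utility_file(bytes_needed: int, limit: int) -> bool:
    """True when the sort needs more memory than the cap allows."""
    return bytes_needed > limit
```

For example, sort_memory_limit(None, "1G", "2G") returns 1073741824 bytes, because the SORTSIZE system option is used ahead of REALMEMSIZE when SORTSIZE= is omitted from the PROC SORT statement.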

TAGSORT Option

The TAGSORT option is useful in single-threaded situations where there might not be enough disk space to sort a large SAS data set. The TAGSORT option is not supported for multi-threaded sorts.
When you specify the TAGSORT option, only sort keys (that is, the variables specified in the BY statement) and the observation number for each observation are stored in the temporary files. The sort keys, together with the observation number, are referred to as tags. At the completion of the sorting process, the tags are used to retrieve the records from the input data set in sorted order. Thus, in cases where the total number of bytes of the sort keys is small compared with the length of the record, temporary disk use is reduced considerably. However, you should have enough disk space to hold another copy of the data (the output data set) or two copies of the tags, whichever is greater. Note that although using the TAGSORT option can reduce temporary disk use, the processing time might be much higher.
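The tag-sort idea described above can be sketched in Python. This is a conceptual model only. It assumes the input is an in-memory list of records, and tag_sort is a hypothetical name. SAS writes the tags to temporary files rather than keeping them in a list.

```python
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def tag_sort(records: Sequence[Record], by: Sequence[str]) -> List[Record]:
    """Sort records by the BY variables using (key, observation number) tags."""
    # Build the tags: only the BY-variable values plus the observation number.
    tags: List[Tuple[Tuple[Any, ...], int]] = [
        (tuple(rec[v] for v in by), obs)
        for obs, rec in enumerate(records, start=1)
    ]
    # Sort the small tags instead of the full records.
    tags.sort()
    # Use the observation numbers to retrieve full records in sorted order.
    return [records[obs - 1] for _, obs in tags]
```

Because the tags are sorted on (key, observation number), records with equal keys keep their original relative order. The final retrieval step reads records by observation number in an order that generally differs from their physical order, which is one likely reason a tag sort can take much longer than a regular sort.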

Choosing a Location for the Sorted File

When you sort a SAS data set, SAS creates a temporary utility file. If the sort uses multiple threads, you can specify the location of the utility file by using the UTILLOC system option. The default location for utility files is the Work data library for both single-threaded and multi-threaded sorts. If two or more locations are specified for the UTILLOC system option, the next available location is used as the location for the utility file. For sorts that use a single thread, the temporary utility file is opened in the Work data library if there is not enough memory to hold the data set during the sort. The utility file has a .sas7butl file extension. Before you sort, ensure that your Work data library has room for this temporary utility file.
The sorted data set replaces the input data set whenever the OUT= option is omitted or specifies the same name as the DATA= data set. In that case, the OVERWRITE option allows the input data set to be deleted before the output data set is created, which decreases the peak disk space requirement.
The output data set is populated from the contents of the utility file. When the output data set replaces the input data set and OVERWRITE is not specified, the original data set is deleted only after the sort is complete.
Use the following rules to determine where the .sas7butl file and the resulting sorted data set are created:
  • If you omit the OUT= option in the PROC SORT statement, the sorted data set replaces the input data set on the drive and in the directory or subdirectory where it is located. The .sas7butl file, however, is created in the subdirectory that was created for the SAS session within the specified Work directory. For example, if you submit the following statements (note the two-level data set name), the sorted data set replaces MYLIB.REPORT in c:\sas\mydata, and the .sas7butl file is created in the Work subdirectory.
    libname mylib 'c:\sas\mydata';
    proc sort data=mylib.report;
       by name;
    run;
    Similarly, if you specify a one-level data set name, the .sas7butl file is created in your Work data library.
  • If you use the OUT= option in the PROC SORT statement, the sorted data set is created in the directory associated with the libref used in the OUT= option. If you use a one-level name (that is, no libref), the sorted data set is created in the Work data library. In both cases, the .sas7butl file is created in the SAS Work subdirectory. For example, in the following SAS program, both sorts occur in the SAS Work subdirectory, even though the second sort writes its output to f:\jandata:
    proc sort data=report out=newrpt;
       by name;
    run;
    libname january 'f:\jandata';
    proc sort data=report out=january.newrpt;
       by name;
    run;
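The following Python sketch models where the pieces of a sort end up. It follows the examples in this section and the earlier statement that utility files default to the Work data library. It simplifies things under stated assumptions. sort_locations and split_name are hypothetical helpers. UTILLOC is reduced to a single location, and it is honored only for multi-threaded sorts, as described earlier.

```python
from typing import Optional, Tuple


def split_name(name: str) -> Tuple[str, str]:
    """Split 'lib.member' into (LIB, MEMBER); a one-level name belongs to WORK."""
    if "." in name:
        lib, member = name.split(".", 1)
        return lib.upper(), member.upper()
    return "WORK", name.upper()


def sort_locations(data: str,
                   out: Optional[str] = None,
                   multithreaded: bool = False,
                   utilloc: Optional[str] = None) -> Tuple[str, str]:
    """Return (utility-file location, output data set) for one PROC SORT step."""
    # Utility files go to WORK unless a multi-threaded sort has UTILLOC set.
    util = utilloc if (multithreaded and utilloc) else "WORK"
    # Without OUT=, the sorted data replaces DATA= in its own library.
    lib, member = split_name(out if out is not None else data)
    return util, f"{lib}.{member}"
```

For the second sort in the program above, sort_locations("report", out="january.newrpt") returns ("WORK", "JANUARY.NEWRPT").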

Calculating Data Set Size

For single-threaded sorts, you need free disk space equal to three to four times the data set size. For example, if your data set takes up 1MB of disk space, you need 3 to 4MB of free disk space to complete the sort.
In multi-threaded environments, if you use the OVERWRITE option in the PROC SORT statement, you need free space equal to the data set size. If you do not specify the OVERWRITE option, you need free space equal to two times the data set size. For more information about the OVERWRITE option, see “SORT Procedure” in Base SAS Procedures Guide.
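The rules of thumb above translate directly into arithmetic. In this Python sketch, sort_space_needed is a hypothetical helper that returns a (minimum, maximum) range of free disk space in bytes.

```python
from typing import Tuple


def sort_space_needed(data_set_bytes: int,
                      multithreaded: bool,
                      overwrite: bool = False) -> Tuple[int, int]:
    """Return (minimum, maximum) free disk space in bytes for a sort."""
    if not multithreaded:
        # Single-threaded: three to four times the data set size.
        return 3 * data_set_bytes, 4 * data_set_bytes
    # Multi-threaded: one times the size with OVERWRITE, otherwise two times.
    factor = 1 if overwrite else 2
    return factor * data_set_bytes, factor * data_set_bytes
```

Treating 1MB as 1,000,000 bytes, sort_space_needed(1_000_000, multithreaded=False) returns (3000000, 4000000), which matches the single-threaded example.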
To estimate the amount of disk space that is needed for a SAS data set:
  1. Create a dummy SAS data set that contains one observation and the variables that you need.
  2. Run the CONTENTS procedure on the dummy data set.
  3. Determine the data set size by performing simple math with information from the CONTENTS procedure output.
For example, for a data set that has one character variable and four numeric variables, you would submit the following statements:
data oranges;
  input variety $ flavor texture looks;
  total=flavor+texture+looks;
  datalines;
navel 9 8 6
;
proc contents data=oranges;
  title 'Example for Calculating Data Set Size';
run;
These statements produce the following output:
Example for Calculating Data Set Size with PROC CONTENTS
                    Example for Calculating Data Set Size                    1
                                            19:39 Wednesday, February 12, 2003

                            The CONTENTS Procedure

Data Set Name        WORK.ORANGES                     Observations          1 
Member Type          DATA                             Variables             5 
Engine               V9                               Indexes               0 
Created              Wednesday, October               Observation Length    40
                     3, 2007 07:41:04                                         
Last Modified        Wednesday, October               Deleted Observations  0 
                     10, 2007 07:41:04                                        
Protection                                            Compressed            NO
Data Set Type                                         Sorted                NO
Label                                                                         
Data Representation  WINDOWS_32                                               
Encoding             wlatin1  Western (Windows)                               


                      Engine/Host Dependent Information

Data Set Page Size          4096                                              
Number of Data Set Pages    1                                                 
First Data Page             1                                                 
Max Obs per Page            101                                               
Obs in First Data Page      1                                                 
Number of Data Set Repairs  0                                                 
File Name                   C:\TEMP\SAS Temporary Files\_TD246\oranges.sas7bdat 
                          
Release Created             9.0201B0                                          
Host Created                XP_PRO                                            


                  Alphabetic List of Variables and Attributes
 
                         #    Variable    Type    Len

                         2    flavor      Num       8
                         4    looks       Num       8
                         3    texture     Num       8
                         5    total       Num       8
                         1    variety     Char      8
The size of the resulting data set depends on the data set page size and the number of observations. The following formula can be used to estimate the data set size:
  • number of data pages = 1 + (floor(number of obs / Max Obs per Page))
  • size = 1024 + (Data Set Page Size * number of data pages)
(floor represents a function that rounds the value down to the nearest integer.)
Using the information shown in Example for Calculating Data Set Size with PROC CONTENTS, you can calculate the size of the example data set:
  • number of data pages = 1 + (floor(1/101)) = 1 + 0 = 1
  • size = 1024 + (4096 * 1) = 5120
Thus, the example data set uses 5,120 bytes of storage space.
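The same calculation can be expressed as a small Python function. data_set_pages and data_set_size are hypothetical helpers. They simply apply the formula above, assuming that Max Obs per Page and Data Set Page Size from the PROC CONTENTS output stay the same as the data set grows.

```python
def data_set_pages(n_obs: int, max_obs_per_page: int) -> int:
    """number of data pages = 1 + floor(number of obs / Max Obs per Page)"""
    # Integer division is floor division for non-negative integers.
    return 1 + n_obs // max_obs_per_page


def data_set_size(n_obs: int, max_obs_per_page: int, page_size: int) -> int:
    """size = 1024 + Data Set Page Size * number of data pages, in bytes"""
    return 1024 + page_size * data_set_pages(n_obs, max_obs_per_page)


# Values taken from the PROC CONTENTS output for WORK.ORANGES.
print(data_set_size(n_obs=1, max_obs_per_page=101, page_size=4096))  # 5120
```

To project a larger version of the same data set, change only the observation count. For instance, data_set_size(1000, 101, 4096) returns 41984, because 1 + floor(1000/101) = 10 pages.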

Increasing the Efficiency of Interactive Processing

If you run a SAS job interactively and the job generates numerous log messages or extensive output, consider using the AUTOSCROLL command to suppress the scrolling of windows. This can make your job run faster because SAS does not have to spend resources updating the display of the LOG and OUTPUT windows during the job. For example, issuing autoscroll 0 in the LOG window prevents the LOG window from scrolling until your job is finished. (For the OUTPUT window, AUTOSCROLL is set to 0 by default.)
Minimizing the LOG window also might make your job run faster, especially if SAS is generating numerous log messages.