Performance Considerations under Windows
This section presents some advanced performance topics, such as improving the performance of the SORT procedure and calculating data set size. Use these methods only if you are an experienced SAS user and you are familiar with the way SAS is configured on your machine.
Improving Performance of the SORT Procedure
Two options for the PROC SORT statement are available under Windows, the SORTSIZE= and TAGSORT options. These two options control the amount of memory the SORT procedure uses during a sort and are discussed in the next two sections. Also included is a discussion of determining where the sorting process occurs for a given data set and determining how much disk space you need for the sort. For more information about the SORT procedure, see SORT Procedure: Windows.
The PROC SORT statement supports the SORTSIZE= option, which limits the amount of memory available for PROC SORT to use.
If you do not use the SORTSIZE= option in the PROC SORT statement, PROC SORT uses the value of the SORTSIZE system option. If the SORTSIZE system option is not set, PROC SORT uses the amount of memory specified by the REALMEMSIZE system option. If PROC SORT needs more memory than you specify, it creates a temporary utility file in your SAS Work directory to complete the sort.
The default value of the SORTSIZE option is 64 megabytes (MB).
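The two ways of setting the memory limit can be sketched as follows. This is a minimal illustration, not a tuning recommendation: the 128M and 256M values are arbitrary, and the MYLIB libref and REPORT data set are borrowed from the examples later in this section.

```sas
/* Hypothetical library and data set, reused from the examples in this section */
libname mylib 'c:\sas\mydata';

/* Limit this one sort to 128 MB of memory (illustrative value) */
proc sort data=mylib.report sortsize=128M;
   by name;
run;

/* Set the SORTSIZE system option, which applies to later sorts */
/* that do not specify SORTSIZE= themselves (illustrative value) */
options sortsize=256M;

proc sort data=mylib.report;
   by name;
run;
```

The statement-level option is convenient when only one large sort needs a different limit; the system option changes the limit for the rest of the session.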
The TAGSORT option is useful in single-threaded situations where there might not be enough disk space to sort a large SAS data set. The TAGSORT option is not supported for multi-threaded sorts.
When you specify the TAGSORT option, only sort keys (that is, the variables specified in the BY statement) and the observation number for each observation are stored in the temporary files. The sort keys, together with the observation number, are referred to as tags. At the completion of the sorting process, the tags are used to retrieve the records from the input data set in sorted order. Thus, in cases where the total number of bytes of the sort keys is small compared with the length of the record, temporary disk use is reduced considerably. However, you should have enough disk space to hold another copy of the data (the output data set) or two copies of the tags, whichever is greater. Note that although using the TAGSORT option can reduce temporary disk use, the processing time might be much higher.
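A minimal sketch of a tag sort follows. The MYLIB libref and REPORT data set are assumptions borrowed from the examples in this section, and NOTHREADS is added here only to make the single-threaded requirement explicit.

```sas
/* Hypothetical library and data set, reused from the examples in this section */
libname mylib 'c:\sas\mydata';

/* Store only the BY variable (name) and the observation number */
/* in the temporary files, then retrieve full records in sorted order */
proc sort data=mylib.report tagsort nothreads;
   by name;
run;
```

This approach is most likely to pay off when the BY variables are short relative to the observation length, as described above.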
Where the physical sort occurs for a given data set depends on how you reference the data set name and whether you use the OUT= option in the PROC SORT statement. You might want to know where the sort occurs if you think there might not be enough disk space available for the sort.
When you sort a SAS data set, SAS creates a temporary utility file. If the sort uses multiple threads, you can specify the location of the utility file by using the UTILLOC system option. The default location for utility files is the Work data library. If two or more locations are specified for the UTILLOC option, the second location is used as the location for the utility file. For sorts that use a single thread, the temporary utility file is opened in the Work data library if there is not enough memory to hold the data set during the sort. The utility file has a .sas7butl file extension. Before you sort, ensure that your Work data library has room for this temporary utility file.
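Before a large sort, it can help to confirm where utility files will be written. The following sketch uses the GETOPTION and PATHNAME functions to write the current UTILLOC setting and the physical path of the Work library to the SAS log; it only reports the settings and does not change them.

```sas
/* Write the current UTILLOC value to the log */
%put UTILLOC = %sysfunc(getoption(utilloc));

/* Write the physical path of the Work data library to the log */
%put WORK    = %sysfunc(pathname(work));
```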
If you specify the OVERWRITE option in the PROC SORT statement, SAS replaces the input data set with the sorted data set.
If you do not specify the OVERWRITE option in the PROC SORT statement, a second file that has a .sas7butl file extension is created. If the sort completes successfully, this file is renamed to the data set name of the file being sorted (with a .sas7bdat file extension). The original data set is deleted after the sort is complete. Before you sort a data set, be sure that you have space for this .sas7butl file.
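The difference between the two cases can be illustrated with the following sketch. The MYLIB libref and REPORT data set are assumptions borrowed from the examples in this section.

```sas
/* Hypothetical library and data set, reused from the examples in this section */
libname mylib 'c:\sas\mydata';

/* Without OVERWRITE: the sorted copy is written to a second file */
/* and renamed only after the sort completes successfully         */
proc sort data=mylib.report;
   by name;
run;

/* With OVERWRITE: the input data set is replaced by the sorted   */
/* data set, which avoids the need for space for a second copy    */
proc sort data=mylib.report overwrite;
   by name;
run;
```

Because OVERWRITE replaces the input data set, consider it only when the input can be re-created or is backed up.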
Use the following rules to determine where the .sas7butl file and the resulting sorted data set are created:
If you omit the OUT= option in the PROC SORT statement, the data set is sorted on the drive and in the directory or subdirectory where it is located. For example, if you submit the following statements (note the two-level data set name), the .sas7butl file is created on the C: drive in the MYDATA subdirectory:
libname mylib 'c:\sas\mydata';
proc sort data=mylib.report;
   by name;
run;
Similarly, if you specify a one-level data set name, the .sas7butl file is created in your Work data library.
If you use the OUT= option in the PROC SORT statement, the .sas7butl file is created in the directory associated with the libref used in the OUT= option. If you use a one-level name (that is, no libref), the .sas7butl file is created in the Work data library. For example, in the following SAS program, the first sort occurs in the SAS Work subdirectory, while the second occurs on the F: drive in the JANDATA directory:
proc sort data=report out=newrpt;
   by name;
run;

libname january 'f:\jandata';
proc sort data=report out=january.newrpt;
   by name;
run;
Calculating Data Set Size
In single-threaded environments, you always need free disk space that equals three to four times the data set size. For example, if your data set takes up 1MB of disk space, you need 3 to 4MB of disk space to complete the sort.
In multi-threaded environments, if you use the OVERWRITE option in the PROC SORT statement, you need space equal to the data set size. If you do not specify the OVERWRITE option, the space you need is equal to two times the data set size. For more information about the OVERWRITE option, see the SORT procedure in Base SAS Procedures Guide.
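These multipliers can be turned into a quick estimate with a DATA step. The sketch below assumes a hypothetical 500 MB data set; substitute the actual size of the data set that you plan to sort.

```sas
data _null_;
   dataset_mb = 500;                      /* hypothetical data set size in MB    */

   single_low      = 3 * dataset_mb;      /* single-threaded, lower bound        */
   single_high     = 4 * dataset_mb;      /* single-threaded, upper bound        */
   multi_overwrite = 1 * dataset_mb;      /* multi-threaded, OVERWRITE specified */
   multi_default   = 2 * dataset_mb;      /* multi-threaded, no OVERWRITE        */

   /* Write the estimates (in MB) to the SAS log */
   put single_low= single_high= multi_overwrite= multi_default=;
run;
```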
To estimate the amount of disk space that is needed for a SAS data set:
1. Create a dummy SAS data set that contains one observation and the variables you need.
2. Determine the data set size by performing simple math using information from the CONTENTS procedure output.
For example, for a data set that has one character variable and four numeric variables, you would submit the following statements:
data oranges;
   input variety $ flavor texture looks;
   total=flavor+texture+looks;
   datalines;
navel 9 8 6
;
proc contents data=oranges;
   title 'Example for Calculating Data Set Size';
run;
These statements generate the following output:
Example for Calculating Data Set Size with PROC CONTENTS
Example for Calculating Data Set Size                                        1
                                            19:39 Wednesday, February 12, 2003

The CONTENTS Procedure

Data Set Name        WORK.ORANGES                           Observations          1
Member Type          DATA                                   Variables             5
Engine               V9                                     Indexes               0
Created              Wednesday, October 3, 2007 07:41:04    Observation Length    40
Last Modified        Wednesday, October 10, 2007 07:41:04   Deleted Observations  0
Protection                                                  Compressed            NO
Data Set Type                                               Sorted                NO
Label
Data Representation  WINDOWS_32
Encoding             wlatin1  Western (Windows)

Engine/Host Dependent Information

Data Set Page Size          4096
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            101
Obs in First Data Page      1
Number of Data Set Repairs  0
File Name                   C:\TEMP\SAS Temporary Files\_TD246\oranges.sas7bdat
Release Created             9.0201B0
Host Created                XP_PRO

Alphabetic List of Variables and Attributes

#    Variable    Type    Len
2    flavor      Num       8
4    looks       Num       8
3    texture     Num       8
5    total       Num       8
1    variety     Char      8
The size of the resulting data set depends on the data set page size and the number of observations. The following formula can be used to estimate the data set size:
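number of data pages = 1 + (floor(number of obs / Max Obs per Page))
size = 1024 + (Data Set Page Size * number of data pages)
Taking the information shown in Example for Calculating Data Set Size with PROC CONTENTS, you can calculate the size of the example data set:
number of data pages = 1 + (floor(1 / 101)) = 1
size = 1024 + (4096 * 1) = 5120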
Thus, the example data set uses 5,120 bytes of storage space.
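The same arithmetic can be scripted so that it is easy to repeat for other observation counts. The sketch below plugs in the Data Set Page Size (4096) and Max Obs per Page (101) values from the PROC CONTENTS output, and assumes a fixed 1,024-byte overhead, which is the difference between the 5,120-byte result and one 4,096-byte page. The variable names are illustrative.

```sas
data _null_;
   page_size        = 4096;   /* Data Set Page Size from PROC CONTENTS      */
   max_obs_per_page = 101;    /* Max Obs per Page from PROC CONTENTS        */
   n_obs            = 1;      /* number of observations to estimate for     */
   overhead         = 1024;   /* assumed fixed overhead in bytes (see text) */

   /* number of data pages = 1 + floor(number of obs / Max Obs per Page) */
   pages = 1 + floor(n_obs / max_obs_per_page);

   /* estimated size in bytes */
   size = overhead + page_size * pages;

   put pages= size=;          /* writes pages=1 size=5120 to the log */
run;
```

To estimate the size of the full data set, change n_obs to the number of observations that you expect.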
Increasing the Efficiency of Interactive Processing
If you are running a SAS job using SAS interactively and the job generates numerous log messages or extensive output, consider using the AUTOSCROLL command to suppress the scrolling of windows. This action makes your job run faster because SAS does not have to use resources to update the display of the LOG and OUTPUT windows during the job. For example, issuing autoscroll 0 in the LOG window causes the LOG window not to scroll until your job is finished. (For the OUTPUT window, AUTOSCROLL is set to 0 by default.)
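If you prefer to issue the command from program code rather than typing it on the command line, one possibility is the DM statement, which submits display manager commands to a named window. This is a sketch that assumes an interactive windowing session; it has no effect in batch mode.

```sas
/* Send the AUTOSCROLL 0 command to the LOG window so that the */
/* window does not scroll until the job is finished            */
dm log 'autoscroll 0';
```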
Minimizing the LOG window also might make your job run faster, especially if SAS is generating numerous log messages.
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.