|
|
Use the improved SAS®9 sort
algorithm
|
SAS®9 includes a rewritten
SORT algorithm that incorporates threading and data latency reduction
algorithms. The SAS®9 sort uses multiple threads and outperforms
a SAS 8 sort in almost all circumstances.
|
|
Perform the following steps:
-
-
Drop unnecessary columns.
-
|
Direct sort utility files to fast
storage devices
|
Use the WORK invocation option,
the UTILLOC invocation option, or both options to direct SORT procedure
utility files to fast, less-utilized storage devices. Some procedure
utility files are accessed heavily, and separating them from other
active files might improve performance.
|
Distribute sort utility files
across multiple devices
|
Distribute SORT procedure utility
files across multiple fast, less-utilized devices. Direct the SORT
procedure utility file of each job to a different device. Use the
WORK invocation option, the UTILLOC invocation option, or both options.
|
Pre-sort explicitly on the most
common sort key
|
SAS Data Integration Studio might
arrange a table in sort order, one or multiple times. For large tables
in which sort order is required multiple times, look for a common
sort order. Use the MSGLEVEL=I option to expose information that is
in the SAS log to determine where sorts occur.
|
Change the default SORTSIZE value
|
For large tables, set SORTSIZE
to 256 MB or 512 MB. For extremely large tables (a billion or more
wide rows), set SORTSIZE to 1 GB or higher. Tune these recommended
values further based on empirical testing or based on in-depth knowledge
of your hardware and operating system.
|
Change the default MEMSIZE value
|
Set MEMSIZE at least 50% larger
than SORTSIZE.
|
Set the NOSORTEQUALS system option
|
In an ETL process flow, maintaining
relative row order is rarely a requirement. If maintaining the relative
order of rows with identical key values is not important, set the
system option NOSORTEQUALS to save resources.
|
Set the UBUFNO option to the maximum
of 20
|
The UBUFNO option specifies the
number of utility I/O buffers. In some cases, maximizing UBUFNO increases
sort performance up to 10%. Increasing UBUFNO has no negative ramifications.
|
Use the TAGSORT option for nearly
sorted data
|
TAGSORT is an alternative SAS 8
sort algorithm that is useful for data that is almost in sort order.
The option is most effective when the sort-key width is no more than
5 percent of the total uncompressed column width. Using the TAGSORT
option on a large unsorted data set results in extremely long sort
times compared to a SAS®9 sort that uses multiple threads.
|
Use relational database sort engines
to pre-sort tables without data order issues
|
Pre-sorting in relational databases
might outperform sorting that is based on SAS. Use options of the
SAS Data Integration Studio Extract transformation to generate an
ORDER BY clause in the SAS SQL. The ORDER BY clause asks the relational
database to return the rows in that particular sorted order.
|
Determine disk space requirements
to complete a sort
|
Size the following sort data components:
-
-
SORT procedure utility file
-
|
|
Because sorting is so I/O intensive,
it is important to start with only the rows and columns that are needed
for the sort. The SORT procedure WORK files and the output file are
dependent on the input file size.
|
Size SORT procedure utility files
|
Consider a number of factors to
size the SORT procedure utility files:
-
sizing information of the input
data
-
any pad bytes added to character
columns
-
any pad bytes added to short numeric
columns
-
pad bytes that align each row by
8 bytes (for SAS data sets)
-
8 bytes per row overhead for EQUALS
processing
-
per-page unused space in the SORT
procedure utility files
-
multi-pass merge: doubling of SORT
procedure utility files (or sort failure)
|
|
To size the output data, apply
the sizing rules of the destination data store to the columns that
are produced by the sort.
|