Optimizing Performance for the SAS System under OpenVMS for AXP Systems
Table of Contents

Introduction
Data Set I/O
  Allocate Data Set Space Appropriately
  Turn off Disk Volume Highwater Marking
  Eliminate Disk Fragmentation
  Decrease the Data Set Observation Length
  Set Larger Buffer Size for Sequential Writes and Reads
  Use Asynchronous Read-Ahead when Processing Data Sets Sequentially
External I/O
  Allocate File Space Appropriately
  Turn off Disk Volume Highwater Marking
  Eliminate Disk Fragmentation
System Startup
Bibliography

Introduction

System speed and resource consumption are concerns of all software users. This document provides tuning tips for Release 6.09 of the SAS System under OpenVMS to increase performance and reduce resource consumption. Suggestions are grouped by SAS System function: data set I/O, external file I/O, and system startup.

SAS System performance partially depends on the performance of the underlying OpenVMS system. This document does not discuss how to tune your OpenVMS system, but it does offer some suggestions, such as installing SAS images, that can aid OpenVMS systemwide performance.

Note: This document does not discuss OpenVMS hardware-related solutions to problems related to I/O.

Tuning suggestions in this document were tested on an OpenVMS AXP workstation with 64 MB of memory and an attached SCSI disk drive. A combination of SAS STIMER data and information from the DCL SHOW STATUS command was used to collect data points for this study.

Each performance optimization discussed includes the following information in a tabular summary at the beginning of its section: job type, user, usage, benefit, and cost.
Use this information to help you decide which tuning options are suitable for your SAS System applications. Most tuning tips have an associated cost, which can often involve a tradeoff between resources. For example, reduced I/O may require more memory consumption or greater CPU time. Pay careful attention to the possible costs of each tuning tip. Some tips can cause performance degradation if misapplied.

Data Set I/O

The tuning information presented in this section applies to reading and writing SAS data sets. In general, the larger your data sets, the more potential performance gain for your entire SAS job. Performance gains described here were observed on data sets that were approximately 100,000 blocks in size.

Allocate Data Set Space Appropriately

Job type Jobs that write data sets
User SAS programmer
Usage ALQ=x and DEQ=x as SAS libname or data set
options where x is a value representing
the number of blocks
Benefit You may see up to a 50% decrease in
elapsed time on write operations,
reflected in fewer direct I/Os. File
fragmentation is also reduced, thereby
helping performance when you read the data
set.
Cost Performance degradation when ALQ= or DEQ= values are incompatible with data set size.

By default, the SAS System initially allocates 33 disk blocks for a data set. Each time the data set is extended, another 32 blocks must be located on the disk. OpenVMS maintains a bit map on each disk that identifies the blocks that are available for use. When a data set is written and then extended, OpenVMS alternates between scanning the bit map to locate free blocks and actually writing the data set. However, if the data set is written with larger initial and extent allocations, writes to the data set proceed uninterrupted for longer periods of time. At the hardware level, this means that disk activity is concentrated on the data set, and disk head seeks that alternate between the bit map and the data set are minimized. The user sees fewer I/Os and faster elapsed time.

Large initial and extent values can also reduce disk fragmentation. SAS data sets are written using the RMS algorithm "contiguous best try." With large preallocation, the space is reserved for the data set and does not become fragmented as it does when inappropriate ALQ= and DEQ= values are used.

SAS Institute recommends setting ALQ= to the size of the data set to be written. If you are uncertain of the size, underestimate and use DEQ= for extents. Values as large as 5000 blocks are reasonable for DEQ=. Larger DEQ= values are not recommended. For assistance predicting data set size, refer to Appendix 3 of the SAS Companion for the VMS Environment, Version 6, First Edition.

The following is an example using the ALQ= and DEQ= options:
libname x '[]';
/* Know this is a big data set. */
data x.big (alq=100000 deq=5000);
   length a b c d e f g h i j k l m n o p q r s t
          u v w x y z $200;
   do ii=1 to 13000;
      output;
   end;
run;
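
As a rough check on why ALQ= and DEQ= values of this size are plausible, the arithmetic can be sketched in a DATA step. This is a minimal sketch, not a sizing method from SAS Institute. It assumes 512-byte disk blocks, counts only the raw observation bytes (26 character variables of 200 bytes plus one 8-byte numeric), ignores data set page overhead and unused page space, and assumes each extension adds exactly DEQ= blocks. The variable names are illustrative. Use Appendix 3 of the SAS Companion for a real estimate.

```sas
/* Rough lower-bound estimate; all names here are illustrative. */
data _null_;
   obslen  = 26*200 + 8;                /* bytes per observation      */
   nobs    = 13000;
   rawblks = ceil(nobs*obslen/512);     /* 132235 blocks of raw data  */

   /* Extensions needed with the defaults (33 initial, 32 per extend). */
   dfltext = ceil((rawblks - 33)/32);          /* 4132 */

   /* Extensions needed with ALQ=100000 and DEQ=5000. */
   tunedext = ceil((rawblks - 100000)/5000);   /* 7 */

   put rawblks= dfltext= tunedext=;
run;
```

Under these assumptions, ALQ=100000 deliberately underestimates the data set, and DEQ=5000 covers the remainder in a handful of extensions instead of several thousand.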
References Chapter 9, "Data Set Options," and
Appendix 3, "Estimating Data Set Size," in
SAS Companion for the VMS Environment,
Version 6, First Edition.
Guide to OpenVMS File Applications,
AA-PV6PA-TK

Turn off Disk Volume Highwater Marking

Job type Any SAS application that writes data sets. Data set size is not important.
User System manager
Usage Use the /NOHIGHWATER_MARKING qualifier
when initializing disks. For active
disks, issue the DCL command SET
VOLUME/NOHIGHWATER_MARKING.
Benefit You will see a greater percentage gain for
jobs that are write intensive. The savings
in elapsed time can be as great as 40%.
Direct I/Os are reduced.
Cost There is no performance penalty. However,
some OpenVMS sites may require that this
OpenVMS highwater marking feature be
running for security.

Highwater marking is an OpenVMS security feature that is enabled by default. It forces pre-zeroing of disk blocks for files that are opened for random access. All SAS data sets are random access files, so they pay the performance penalty of pre-zeroing: increased I/Os and increased elapsed time.

Either of two DCL commands can be used to disable highwater marking on a disk. When initializing a new volume, use the /NOHIGHWATER_MARKING qualifier (abbreviated in the command) to inhibit highwater marking, as in the following example:

$ initialize/nohighwater AXP$DKA470 mydisk

To disable highwater marking on an active disk, use a command similar to the following:

$ set volume/nohighwater AXP$DKA200

References OpenVMS System Manager's Manual:
Tuning, Monitoring, and Complex Systems,
AA-PV5NA-TK
OpenVMS DCL Dictionary A-M, AA-PV5KA-TK

Eliminate Disk Fragmentation

Job type Any jobs that frequently access common data sets
User SAS programmer and system manager
Usage Devote a disk to frequently accessed data
sets, or keep your disks defragmented.
Benefit The savings in elapsed time varies with
the current state of the disk, but can
exceed 50% on writes and 25% on reads.
Cost The cost to the user is the time and
effort to better manage disk access. For
the system manager, it can involve
regularly defragmenting disks or obtaining
additional disk drives.

Any software that reads from and writes to disk benefits from a well-managed disk, and SAS data sets are no exception. On an unfragmented disk, files are kept contiguous, so after one I/O operation the disk head is well positioned for the next I/O operation.

A disk drive that is frequently defragmented can provide performance benefits. Use this disk to store commonly accessed SAS data sets. In some situations, adding an inexpensive SCSI drive to the configuration allows the system manager to maintain a clean, unfragmented environment more easily than using a large disk farm. Data sets maintained on this unfragmented SCSI disk may perform better than heavily fragmented data sets on larger disks.

Decrease the Data Set Observation Length

Job type SAS procedures and data steps that read or write large data sets
User SAS programmer
Usage Use the DROP= or KEEP= data set options
to keep temporary variables from being
written to the data set. Use the
COMPRESS= option to compact repeated data.
Benefit By decreasing the length of the SAS
observation, less data is read or written
to disk, reducing I/Os and elapsed time.
Gains are data dependent.
Cost Provided you do not need the variables
later, dropping them has no drawbacks.
Data set compression increases CPU time.
It also hurts performance when the data
are not suitable for compression: the
data set can actually grow larger on
disk, increasing both I/Os and elapsed
time.

DROP= and KEEP= exclude unwanted variables from data set observations, resulting in smaller data sets. This improves performance when reading and writing the data set. The following example uses the DROP= option; KEEP= works the same way but names the variables to retain rather than the variables to discard:
data a (drop=scratch);
   length char1 $80 char2 $80 scratch $160;
   input char1;
   input char2;
   scratch = char1 || char2;
   if (substr(scratch, 75, 10) ^= 'throw away')
      then output;
   cards;
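
For comparison, a KEEP= version might look like the following. This is a minimal sketch: the data sets WIDE and NARROW and all variable names are hypothetical and exist only to illustrate the option. KEEP= on the SET statement limits the variables read, and KEEP= on the DATA statement limits the variables written.

```sas
/* Hypothetical wide input data set with a large scratch variable. */
data wide;
   length id 8 x1-x3 8 note $200;
   do id = 1 to 100;
      x1 = id;
      x2 = 2*id;
      x3 = 3*id;
      note = 'scratch text not needed downstream';
      output;
   end;
run;

/* Read only the variables needed; write only ID and TOTAL. */
data narrow (keep=id total);
   set wide (keep=id x1-x3);
   total = sum(of x1-x3);
run;
```

NARROW then carries two 8-byte numeric variables per observation rather than the full observation read from WIDE.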
Data set compression is valuable when you have repeating numeric or character data. For example, if zero is the prevalent numeric value, or if the character data contains many blanks, compression is helpful. Compression is only supported for the Base data set engine in Release 6.09 of the SAS System. There are some restrictions on using compressed data sets. Refer to your SAS System documentation for details. The following is an example data step using the compression option:
/* Input sparse matrix, mostly zeroes. */
data b (compress=yes);
input a1-a10000;
input b1-b10000;
cards;
References Chapter 15, "SAS Data Set Options" in
SAS Language Reference, Version 6, First
Edition.
Chapter 4, "Read and Write Data
Selectively," and Chapter 7, "Know SAS
System Defaults" in SAS Programming Tips:
A Guide to Efficient SAS Processing.

Set Larger Buffer Size for Sequential Writes and Reads

Job type SAS steps that do sequential I/O operations on large data sets
User SAS programmer
Usage The host CACHESIZ= option controls the
buffering of data set pages during I/O
operations. CACHESIZ= can appear as a
data set option, or on a LIBNAME statement
that uses the Base engine. The SAS
BUFSIZE= option sets the data set page
size when the data set is created. The
BUFSIZE= option can be used in the same
manner as the CACHESIZ= option, or as a
SAS system option.
Benefit You may see as much as a 30% decrease in
elapsed time in some steps when an
appropriate value is chosen for a
particular data set.
Cost If the data set observation size is large,
there can be substantial wasted space in
the data set if you do not choose an
appropriate BUFSIZE=. Also, memory is
consumed for the data cache, and a
separate cache is used for each data set
opened.

BUFSIZE= sets the SAS internal page size for the data set. Once set, this becomes a permanent attribute of the file that cannot be changed. This option is only meaningful when creating the data set.

If the user does not specify a BUFSIZE= option, the SAS System selects a value that contains as many observations as possible with the least amount of wasted space. Note that an observation cannot span page boundaries. Therefore, there can be unused space at the end of a page unless the observations pack evenly into the page. By default, the SAS System tries to choose a page size between 8192 and 32768 if an explicit BUFSIZE= option has not been specified. By increasing the BUFSIZE= value, more observations can be stored on a page, and the same amount of data can be accessed with fewer I/Os. When explicitly choosing a BUFSIZE= value, be sure to choose one that does not leave unused space in each data set page, because that wastes disk space. The highest recommended value for BUFSIZE= is 65024.

The SAS System maintains a cache that is used to buffer multiple data set pages in memory. This allows multiple pages to be read or written in a single operation, reducing I/Os. A separate cache is maintained for each data set opened. The CACHESIZ= option specifies the size of this cache. The CACHESIZ= value is a temporary attribute that only applies to the currently open data set. You can use a different CACHESIZ= value at different times when accessing the same file. In order to conserve memory, a maximum of 32768 bytes is allocated for the cache by default. The default allows as many pages as can be completely contained in the 32768-byte cache to be buffered and accessed with a single I/O. However, a user can specify a CACHESIZ= value up to 65024 bytes, the largest amount that can be accessed in a single I/O on a VMS system.

The following is an example of an efficiently written large data set, using the BUFSIZE= option. Note that BUFSIZE=63488 becomes a permanent attribute of the data set:
libname buf '[]';
data buf.big (bufsize=63488);
   length a b c d e f g h i j k l m n o p q r s t
          u v w x y z $200;
   do ii=1 to 13000;
      output;
   end;
run;
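
Because an observation cannot span a page boundary, the space left over on each page depends on how the observation length divides into the page size. The sketch below estimates that leftover space for a 5208-byte observation (26 character variables of 200 bytes plus one 8-byte numeric) at several candidate page sizes. It assumes the whole page is available for observations, which ignores any per-page overhead the SAS System reserves, so treat the results as approximate. The variable names are illustrative.

```sas
/* Approximate unused bytes per page; ignores page overhead. */
data _null_;
   obslen = 26*200 + 8;                    /* 5208 bytes per observation  */
   do bufsize = 8192, 32768, 63488, 65024;
      perpage  = floor(bufsize/obslen);    /* whole observations per page */
      unused   = bufsize - perpage*obslen; /* bytes left over per page    */
      pctwaste = 100*unused/bufsize;
      put bufsize= perpage= unused= pctwaste= 5.1;
   end;
run;
```

Under these assumptions, a page size of 63488 leaves 992 unused bytes per page, while 65024 leaves 2528, even though both hold 12 observations.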

The following is an example of an efficiently written large data set, using the CACHESIZ= option. Note that the CACHESIZ= value is not permanent:
libname cache '[]';
data cache.big (cachesiz=65024);
   length a b c d e f g h i j k l m n o p q r s t
          u v w x y z $200;
   do ii=1 to 13000;
      output;
   end;
run;
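
Since CACHESIZ= can also appear on a LIBNAME statement that uses the Base engine, and the value applies only to the current open, the two forms might be combined as sketched below. This is an illustration under assumptions: the placement of CACHESIZ= after the library path follows the usual host-option position on LIBNAME and should be checked against the SAS Companion, and the data set CACHE.SMALL is hypothetical.

```sas
/* Library-level cache size for data sets opened through CACHE. */
libname cache '[]' cachesiz=65024;

/* Hypothetical small data set, created only for illustration. */
data cache.small;
   do i = 1 to 1000;
      output;
   end;
run;

/* A later step can open the same file with a different cache size. */
data _null_;
   set cache.small (cachesiz=32768);
run;
```

Because the cache size is not stored with the file, the second step does not change how later steps read CACHE.SMALL.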
References "Accessing Data Libraries Using
Engines" in Chapter 5, "Using SAS Files"
in SAS Companion for the VMS Environment,
Version 6, First Edition.

Use Asynchronous Read-Ahead when Processing Data Sets Sequentially

Job type PROC and data steps that read the data set sequentially
User SAS programmer
Usage The Base engine RAH= option can be used
only as a data set option. It accepts
three values:
- YES (turn asynchronous read-ahead on)
- NO (default; do not perform
  asynchronous read-aheads)
- LOG (turn the option on and write
  statistics to the SAS log when the data
  set is closed)
This option works in conjunction with the
CACHESIZ= option. Asynchronous read-ahead
is enabled only if caching is turned on.
(See the previous section discussing the
CACHESIZ= option.)
Benefit You may see a 10% to 20% run-time
improvement over simple caching alone, but
this will vary greatly depending on the
data set geometry and step implementation.
The RAH= option is designed to improve
sequential data set read operations. You
will see the greatest run-time gains when
the data set is large and the page size is
small, or the amount of processing done
per observation is significant.
Cost Since the CACHESIZ= option must be in
effect for the RAH= option to function,
memory consumption is increased as
described in the discussion of the
CACHESIZ= option. Also, an additional
caching buffer is allocated for the data
set.
The Read-Ahead option (RAH=) causes the SAS System to "prefetch" data set observations asynchronously. The option assists the user in tuning sequential data set processing. This is an experimental data set option with Release 6.09 of the SAS System. A RAH= value of YES or LOG is only effective when
The following example demonstrates the usage of the RAH= option:
data b;
set a (rah=log);
run;
proc compare data=b (rah=yes)
compare=c (rah=yes);
run;
Be aware that some steps show a decrease in run-time performance, most notably DATA steps with WHERE statements. It is advisable to monitor RAH= processing closely with the RAH=LOG setting.
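
Because read-ahead depends on caching, one way to trial it is to state the cache size explicitly alongside RAH=LOG and then review the statistics written to the log. The following is a minimal sketch: the data set A, its contents, and the CACHESIZ= value are illustrative assumptions, and RAH= is experimental in Release 6.09, so results may differ on your system.

```sas
/* Hypothetical input data set, created only for illustration. */
data a;
   do i = 1 to 100000;
      x = ranuni(0);
      output;
   end;
run;

/* Sequential read with an explicit cache size and read-ahead logging. */
data b;
   set a (cachesiz=65024 rah=log);
   y = sqrt(x);
run;
```

Comparing elapsed time for this step with and without RAH= helps show whether read-ahead benefits a particular data set.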
References "Accessing Data Libraries Using
Engines" in Chapter 5, "Using SAS Files"
in the SAS Companion for the VMS
Environment, Version 6, First Edition.

External I/O

The following guidelines apply to reading and writing OpenVMS native files using the SAS System. For several of the suggestions, the larger your files, the more performance gain for your entire SAS job. These suggestions parallel several of the data set I/O suggestions.

Allocate File Space Appropriately

Job type SAS procedures and data steps that write external files
User SAS programmer
Usage ALQ= and DEQ= as part of the FILENAME or
FILE statement
Benefit Specifying appropriate values can decrease
elapsed time up to 50%, as well as
reducing disk fragmentation.
Cost Performance degradation when ALQ= and
DEQ= values are incompatible with file
size.

The SAS System allocates disk space for external files based on the value of ALQ=. It then extends the file, if needed, based on the value of DEQ=. By default, these values are 33 blocks for ALQ= and 32 blocks for DEQ=. Every time a file has to be extended, the system has to search the disk for free space, which requires I/Os. When this is done repeatedly for large files, it degrades performance. Setting larger ALQ= and DEQ= values for large files reduces this overhead.

Optimal I/O occurs when ALQ= is equal to the size of the file. Since this is not always feasible, it is better to underestimate the value for ALQ= and set a larger DEQ= value. This allocates enough space for a smaller file, while extending it occasionally to meet the demands of a larger file. Allocating too much space may be costly if /HIGHWATER_MARKING is set on the disk. (See the discussion of /HIGHWATER_MARKING later in this section.)
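
The following sketch shows how ALQ= and DEQ= might be supplied on a FILENAME statement. It is an illustration under assumptions: the fileref OUT, the file name BIGFILE.DAT, and the option values are hypothetical. Each record here is at least 42 bytes, so 100,000 records need about 8,200 blocks before any record overhead. ALQ=5000 therefore underestimates the file, and DEQ=1000 covers the rest in a few extensions.

```sas
/* BIGFILE.DAT and the ALQ=/DEQ= values are illustrative only. */
filename out 'bigfile.dat' alq=5000 deq=1000;

data _null_;
   file out;
   do i = 1 to 100000;
      put i 8. +1 'sample text for one output record';
   end;
run;
```

Size your own values from the expected file size in 512-byte blocks.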
References Chapter 6, "Using External Files" in
SAS Companion for the VMS Environment,
Version 6, First Edition.
Guide to OpenVMS File Applications,
AA-PV6PA-TK

Turn off Disk Volume Highwater Marking

Job type Jobs that write external files
User System manager
Usage Use the /NOHIGHWATER_MARKING qualifier
when initializing disks. For active
disks, issue the DCL command SET
VOLUME/NOHIGHWATER_MARKING.
Benefit Elapsed time can be improved by up to 40%. Direct I/Os are reduced.
Cost There is no performance penalty. However,
some OpenVMS sites may require this
feature for security purposes.

The SAS System uses the random access method when opening external files. This means that allocated disk space does not have to be processed sequentially. Highwater marking (/HIGHWATER_MARKING) is a safeguard that clears disk space before it is allocated, removing the residue of former files. To do this, the entire allocated space has to be overwritten, which costs some elapsed time and I/Os. If the data stored on the disk are not of a truly confidential nature, a performance gain can be achieved by disabling highwater marking on the disk.

Either of two DCL commands can be used to disable highwater marking on a disk. When initializing a new volume, use the following to inhibit highwater marking:

$ initialize/nohighwater AXP$DKA470 mydisk

To disable highwater marking on an active disk, use a command similar to the following:

$ set volume/nohighwater AXP$DKA200

References OpenVMS System Manager's Manual:
Tuning, Monitoring, and Complex Systems,
AA-PV5NA-TK

Eliminate Disk Fragmentation

Job type Any jobs that access common external files frequently
User System manager
Usage Devote a disk to often accessed files or
keep your disks defragmented.
Benefit The savings depend on the current state
of the disk, but elapsed time can be
reduced by up to 40%.
Cost The cost to the user is the time and
effort to better manage disk access rather
than letting OpenVMS do all of the work.
For the system manager, this may involve
regularly defragmenting disks or obtaining
additional disk drives.

On an unfragmented disk, files are contiguous, so after one I/O operation the disk head is well positioned for the next I/O operation. Also, split I/Os are rare, which decreases the elapsed time needed to perform I/O.

Where possible, dedicating a disk drive that can be frequently defragmented can provide performance benefits. Use this disk to store commonly accessed SAS external files. In some situations, adding an inexpensive SCSI drive to the configuration may allow the system manager to maintain a clean, unfragmented environment more easily than maintaining a large disk farm. Files maintained on this unfragmented SCSI disk may perform better than heavily fragmented files on larger disks.

System Startup

Job type All jobs
User System manager
Usage The OpenVMS Install facility can be used
to make core SAS images memory resident.
Benefit Elapsed time of system startup may
decrease by as much as 30% when running
the SAS System under X-Windows, and by 40%
in other modes.
Cost With Release 6.09 of the SAS System,
installing all images listed in the
sample commands in this section consumes
approximately 5500 global pages and 9
global sections.
Installing SAS System images can decrease the elapsed time of SAS System startup by up to 40%. Installing images is most effective on systems where two or more users are using SAS simultaneously. Use the following commands to install the core set of SAS images:
$ install == "$sys$system:install/command"
$ install add/open/header/share SAS$IMAGE
$ install add/open/header/share SAS$LIBRARY:SASSHR.EXE
$ install add/open/header/share SAS$LIBRARY:SABXSPH.EXE
$ install add/open/header/share SAS$LIBRARY:SASBUNDL.EXE
$ install add/open/header/share SAS$LIBRARY:SASIMPTR.EXE
$ install add/open/header/share SAS$LIBRARY:SASMSG.EXE
When using the terminal-based full screen driver, include the following:
$ install add/open/header/share SAS$LIBRARY:SABVU.EXE
When using Motif (X-Windows), include the following:
$ install add/open/header/share SAS$LIBRARY:SABMOTIF.EXE
References OpenVMS System Manager's Manual:
Tuning, Monitoring, and Complex Systems,
AA-PV5NA-TK
Installation Instructions and System
Manager's Guide, Release 6.09 of the SAS
System under OpenVMS for AXP Systems.

Bibliography

OpenVMS System Configuration:
Installation Instructions and System Manager's Guide, Release 6.09 of the SAS System under OpenVMS for AXP Systems.
SAS Language Reference, Version 6, First Edition, Chapter 15, "SAS Data Set Options."

SAS options related to SAS images:
SAS Companion for the VMS Environment, Version 6, First Edition, Chapter 12.

Options and related topics for SAS files and external files:
SAS Companion for the VMS Environment, Version 6, First Edition, Chapter 8.

Host Specific Options and related topics:
SAS Companion for the VMS Environment, Version 6, First Edition, Chapter 5.
Tips and Techniques for Using I/O Subsystems in Version 6 of the SAS System under VMS, SUGI '90.
Installation Instructions and System Manager's Guide, Release 6.09 of the SAS System under OpenVMS for AXP Systems.

SAS Catalog options and related topics:
SAS Technical Report P-220, Changes and Enhancements to the SAS System for the VMS Environment, Release 6.07, Appendix 2, "Additional Changes and Enhancements."
SAS Programming: A Programmer's Guide to Sorting in the VMS Environment.
SAS Companion for the VMS Environment, Version 6, First Edition, Chapter 12, "Optimizing System Performance."
Tips and Techniques for Using I/O Subsystems in Version 6 of the SAS System under VMS, SUGI '90.
Introduction to Efficient Programming Techniques, SUGI '91.
Installation Instructions and System Manager's Guide, Release 6.09 of the SAS System under OpenVMS for AXP Systems.
SAS Programming Tips: A Guide to Efficient SAS Processing, Chapter 4, "Read and Write Data Selectively," and Chapter 7, "Know SAS System Defaults."

Other OpenVMS Documentation:
Guide to OpenVMS File Applications, AA-PV6PA-TK
OpenVMS System Manager's Manual: Tuning, Monitoring, and Complex Systems, AA-PV5NA-TK
OpenVMS DCL Dictionary, A-M, AA-PV5KA-TK