SAS Institute


TS-298

Optimizing Performance for the SAS System under OpenVMS for AXP Systems

Table of Contents

Introduction

Data Set I/O

     Allocate Data Set Space Appropriately
     Turn off Disk Volume Highwater Marking
     Eliminate Disk Fragmentation
     Decrease the Data Set Observation Length
     Set Larger Buffer Size for Sequential Writes and Reads
     Use Asynchronous Read-Ahead when Processing Data Sets Sequentially

External I/O

     Allocate File Space Appropriately
     Turn off Disk Volume Highwater Marking
     Eliminate Disk Fragmentation

System Startup

Bibliography


Introduction

System speed and resource consumption are concerns of all software users. This document provides tuning tips for Release 6.09 of the SAS System under OpenVMS to increase performance and reduce resource consumption. Suggestions are grouped by SAS System function: data set I/O, external file I/O, and system startup.

SAS System performance depends partly on the performance of the underlying OpenVMS system. This document does not discuss how to tune OpenVMS itself, but it does offer some suggestions, such as installing SAS images, that can improve systemwide OpenVMS performance.

Note: This document does not discuss hardware solutions to OpenVMS I/O problems.

Tuning suggestions in this document were tested on an OpenVMS AXP workstation with 64 MB of memory and an attached SCSI disk drive. A combination of SAS STIMER data and output from the DCL SHOW STATUS command was used to collect data points for this study.

Each performance optimization begins with a tabular summary that lists the following information:

  • the type of SAS application affected (Job type)
  • who can implement the suggestion (User)
  • a usage example (Usage)
  • potential benefits (Benefit)
  • costs (Cost)
  • references for additional information (References)

Use this information to help you decide which tuning options are suitable for your SAS System applications.

Most tuning tips have an associated cost, often a tradeoff between resources. For example, reducing I/O may require more memory or more CPU time. Pay careful attention to the possible costs of each tip; some tips can degrade performance if misapplied.

Data Set I/O

The tuning information in this section applies to reading and writing SAS data sets. In general, the larger your data sets, the greater the potential performance gain for your entire SAS job. The performance gains described here were observed on data sets of approximately 100,000 blocks.

Allocate Data Set Space Appropriately

Job type Jobs that write data sets

User SAS programmer

Usage     ALQ=x and DEQ=x as SAS LIBNAME or data set
          options, where x is the number of blocks

Benefit   You may see up to a 50% decrease in
          elapsed time on write operations,
          reflected in fewer direct I/Os. File
          fragmentation is also reduced, which
          helps performance when you read the data
          set.

Cost      Performance degradation when ALQ= or DEQ=
          values are incompatible with the data set
          size.

By default, the SAS System initially allocates 33 disk blocks for a data set. Each time the data set is extended, another 32 blocks must be located on the disk. OpenVMS maintains a bit map on each disk that identifies the blocks that are available for use. When a data set is written and repeatedly extended, OpenVMS alternates between scanning the bit map to locate free blocks and actually writing the data set. If the data set is written with larger initial and extent allocations, writes to the data set proceed uninterrupted for longer periods. At the hardware level, disk activity stays concentrated on the data set, and disk head seeks back and forth between the bit map and the data set are minimized. The user sees fewer I/Os and shorter elapsed time.

Large initial and extent values can also reduce disk fragmentation. SAS data sets are written using the RMS allocation algorithm "contiguous best try." With a large preallocation, the space is reserved for the data set and does not become fragmented, as it does when inappropriate ALQ= and DEQ= values are used.

SAS Institute recommends setting ALQ= to the size of the data set to be written. If you are uncertain of the size, underestimate it and let DEQ= supply the extents. DEQ= values as large as 5000 blocks are reasonable; larger values are not recommended. For help predicting data set size, refer to Appendix 3 of the SAS Companion for the VMS Environment, Version 6, First Edition.

The following is an example using the ALQ= and DEQ= options:

     libname x '[]';
     /* This data set is known to be large. */
     data x.big (alq=100000 deq=5000);
        length a b c d e f g h i j k l m n o p q r s t
               u v w x y z $200;
        do ii=1 to 13000;
           output;
        end;
     run;
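To see why the default allocation is costly for a data set of the size used in this study, the Python sketch below counts how many times a file must be extended. It is a simplified model written for this note, not SAS or RMS code: it assumes each extension adds exactly DEQ= blocks and ignores any rounding of extents to the disk cluster size.

```python
import math


def extend_count(file_blocks: int, alq: int = 33, deq: int = 32) -> int:
    """Return how many extensions are needed to grow a file to file_blocks.

    Simplified model: the file starts with alq blocks and each extension
    adds exactly deq blocks (cluster-size rounding is ignored).
    """
    if file_blocks <= alq:
        return 0
    return math.ceil((file_blocks - alq) / deq)


if __name__ == "__main__":
    size = 100_000  # roughly the data set size used in this study
    print("defaults (33/32):          ", extend_count(size))
    print("ALQ=100000, DEQ=5000:      ", extend_count(size, 100_000, 5_000))
    print("ALQ=50000 (under), DEQ=5000:", extend_count(size, 50_000, 5_000))
```

Under this model, a 100,000-block data set written with the defaults is extended 3,124 times, each extension requiring a search of the bit map; ALQ=100000 needs no extensions, and even an underestimate of ALQ=50000 with DEQ=5000 needs only 10.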

References Chapter 9, "Data Set Options," and
          Appendix 3, "Estimating Data Set Size," in
          SAS Companion for the VMS Environment,
          Version 6, First Edition.

          Guide to OpenVMS File Applications,
          AA-PV6PA-TK

Turn off Disk Volume Highwater Marking

Job type  Any SAS application that writes data sets;
          data set size is not important.

User      System manager

Usage     Use the /NOHIGHWATER_MARKING qualifier
          when initializing disks. For active
          disks, issue the DCL command SET
          VOLUME/NOHIGHWATER_MARKING.

Benefit   The percentage gain is greater for
          write-intensive jobs. The savings in
          elapsed time can be as great as 40%.
          Direct I/Os are reduced.

Cost      There is no performance penalty.  However,
          some OpenVMS sites may require that this
          OpenVMS highwater marking feature be
          running for security.

Highwater marking is an OpenVMS security feature that is enabled by default. It forces pre-zeroing of disk blocks for files that are opened for random access. All SAS data sets are random access files, so they pay the performance penalty of pre-zeroing: more I/Os and longer elapsed time.

Either of two DCL commands can be used to disable highwater marking on a disk. When initializing a new volume, use the /NOHIGHWATER qualifier to inhibit the highwater function, as in the following example:

$ initialize/nohighwater AXP$DKA470 mydisk

To disable highwater marking on an active volume, use a command similar to the following:

$ set volume/nohighwater AXP$DKA200

References     OpenVMS System Manager's Manual:
          Tuning, Monitoring, and Complex Systems,
          AA-PV5NA-TK

          OpenVMS DCL Dictionary A-M, AA-PV5KA-TK

Eliminate Disk Fragmentation

Job type Any jobs that frequently access common data sets

User SAS programmer and system manager

Usage     Devote a disk to frequently accessed data
          sets, or keep your disks defragmented.

Benefit   The savings in elapsed time vary with
          the current state of the disk, but can
          exceed 50% on writes and 25% on reads.

Cost      The cost to the user is the time and
          effort to better manage disk access. For
          the system manager, it can involve
          regularly defragmenting disks or obtaining
          additional disk drives.

Any software that reads from and writes to disk benefits from a well-managed disk, and SAS data sets are no exception. On an unfragmented disk, files are kept contiguous, so after one I/O operation the disk head is well positioned for the next.

Dedicating a disk drive that can be defragmented frequently can provide performance benefits. Use this disk to store commonly accessed SAS data sets. In some situations, adding an inexpensive SCSI drive to the configuration lets the system manager maintain a clean, unfragmented environment more easily than with a large disk farm. Data sets maintained on this unfragmented SCSI disk may perform better than heavily fragmented data sets on larger disks.

Decrease the Data Set Observation Length

Job type  SAS procedures and data steps that read or
          write large data sets

User      SAS programmer

Usage     Use the DROP= or KEEP= data set options
          to keep temporary variables from being
          written to the data set. Use the
          COMPRESS= option to compact repeated data.

Benefit   By decreasing the length of the SAS
          observation, less data is read from or
          written to disk, reducing I/Os and elapsed
          time. Gains are data dependent.

Cost      Provided you do not need the variables
          later, dropping them has no drawbacks.
          Data set compression increases CPU time,
          and it hurts performance when the data
          are not suitable for compression: the
          data set can actually grow larger on
          disk, increasing both I/Os and elapsed
          time.

DROP= and KEEP= exclude unwanted variables from data set observations, resulting in smaller data sets. This improves performance when reading and writing the data set.

The following example uses the DROP= option:

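     data a (drop=scratch);   /* or: keep=char1 char2 */
        length char1 $80 char2 $80 scratch $160;
        input char1;
        input char2;
        /* SCRATCH is needed only for this test; it is not written. */
        scratch = char1 || char2;
        if (substr(scratch, 75, 10) ^= 'throw away')
           then output;
        /* Raw data lines follow CARDS; end them with a line */
        /* containing only a semicolon.                      */
     cards;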
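Every byte of an observation is written for every observation, so dropping a variable shrinks the data set in proportion to its length. The Python sketch below applies that arithmetic to the variables in the preceding example. It is only a lower-bound estimate: it ignores page overhead and unused space at the end of each page, and the 100,000-observation count is a hypothetical figure.

```python
import math

BLOCK_BYTES = 512  # OpenVMS disk block size


def obs_length(var_lengths: dict[str, int]) -> int:
    """Observation length in bytes: the sum of the variable lengths."""
    return sum(var_lengths.values())


def min_blocks(nobs: int, obs_len: int) -> int:
    """Lower bound on data set size in blocks (page overhead ignored)."""
    return math.ceil(nobs * obs_len / BLOCK_BYTES)


all_vars = {"char1": 80, "char2": 80, "scratch": 160}
# DROP=SCRATCH keeps everything except the scratch variable.
kept_vars = {k: v for k, v in all_vars.items() if k != "scratch"}

nobs = 100_000  # hypothetical observation count
print(min_blocks(nobs, obs_length(all_vars)))   # with SCRATCH: 62500
print(min_blocks(nobs, obs_length(kept_vars)))  # without SCRATCH: 31250
```

Dropping SCRATCH halves the observation length from 320 to 160 bytes, and with it the minimum number of blocks written.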

Data set compression is valuable when you have repeated numeric or character data, for example when zero is the prevalent numeric value or the character data contain many blanks. In Release 6.09 of the SAS System, compression is supported only for the Base engine. There are some restrictions on using compressed data sets; refer to your SAS System documentation for details.

The following is an example data step using the COMPRESS= option:

     /* Input a sparse matrix: mostly zeros. */
     data b (compress=yes);
        input a1-a10000;
        input b1-b10000;
        /* Raw data lines follow CARDS; end them with a line */
        /* containing only a semicolon.                      */
     cards;
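Compression pays off only when the data contain runs of repeated bytes. The Python sketch below uses a generic run-length encoding, chosen here for illustration; it is not the SAS System's on-disk compression format. It shows why blank-padded character values and zero-valued numerics shrink, and why data without repeats can grow. The sample values are hypothetical, and the numerics are packed as IEEE doubles, in which zero is all zero bits.

```python
import struct


def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, byte) pairs; runs are capped at 255 bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode by expanding each (count, byte) pair."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        out += bytes((encoded[k + 1],)) * encoded[k]
    return bytes(out)


blank_padded = b"SMITH".ljust(80)                 # an $80 value, mostly blanks
zero_numerics = struct.pack("<10d", *[0.0] * 10)  # ten 8-byte zeros
no_repeats = b"A1B2C3D4"                          # nothing to compress

for label, raw in [("blank-padded", blank_padded),
                   ("zero numerics", zero_numerics),
                   ("no repeats", no_repeats)]:
    packed = rle_encode(raw)
    assert rle_decode(packed) == raw  # round trip is lossless
    print(f"{label}: {len(raw)} -> {len(packed)} bytes")
```

The blank-padded value shrinks from 80 to 12 bytes and the zeros from 80 to 2, but the 8 bytes with no repeats grow to 16, which is the effect described in the Cost entry for this section.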

References     Chapter 15, "SAS Data Set Options" in
          SAS Language Reference, Version 6, First
          Edition.

          Chapter 4, "Read and Write Data
          Selectively," and Chapter 7, "Know SAS
          System Defaults" in SAS Programming Tips:
          A Guide to Efficient SAS Processing.

Set Larger Buffer Size for Sequential Writes and Reads

Job type SAS steps that do sequential I/O operations on large data sets

User SAS programmer

Usage     The host-specific CACHESIZ= option controls
          the buffering of data set pages during I/O
          operations. CACHESIZ= can appear as a data
          set option or on a LIBNAME statement that
          uses the Base engine. The SAS BUFSIZE=
          option sets the data set page size when
          the data set is created. BUFSIZE= can be
          used in the same ways as CACHESIZ=, or as
          a SAS system option.

Benefit   You may see as much as a 30% decrease in
          elapsed time in some steps when an
          appropriate value is chosen for a
          particular data set.

Cost      If the data set observation size is large,
          there can be substantial wasted space in
          the data set if you do not choose an
          appropriate BUFSIZE=. Also, memory is
          consumed for the data cache, and a
          separate cache is used for each data set
          opened.

BUFSIZE= sets the SAS internal page size for the data set. Once set, the page size is a permanent attribute of the file that cannot be changed, so the option is meaningful only when the data set is created. If you do not specify BUFSIZE=, the SAS System selects a value, generally between 8192 and 32768 bytes, that holds as many observations as possible with the least amount of wasted space. An observation cannot span page boundaries, so there is unused space at the end of a page unless the observations pack evenly into it.

By increasing the BUFSIZE= value, you can store more observations on each page and access the same amount of data with fewer I/Os. When choosing BUFSIZE= explicitly, pick a value that leaves little unused space at the end of each page, because that space is wasted disk space. The highest recommended value for BUFSIZE= is 65024.

The SAS System maintains a cache that buffers multiple data set pages in memory, so that multiple pages can be read or written in a single operation, reducing I/Os. A separate cache is maintained for each open data set, and the CACHESIZ= option specifies its size. The CACHESIZ= value is a temporary attribute that applies only while the data set is open, so you can use a different CACHESIZ= value each time you access the same file. To conserve memory, a maximum of 32768 bytes is allocated for the cache by default; as many pages as fit completely in this cache are buffered and accessed with a single I/O. You can specify a CACHESIZ= value of up to 65024 bytes, the largest amount that can be accessed in a single I/O on a VMS system.

The following is an example of an efficiently written large data set, using the BUFSIZE= option:

Note: BUFSIZE=63488 becomes a permanent attribute of the data set.

     libname buf '[]';
     data buf.big (bufsize=63488);
     length a b c d e f g h i j k l m n o p q r s t
     u v w x y z $200;
     do ii=1 to 13000;
     output;
     end;
     run;
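The packing rule described above (an observation cannot span a page) determines how much of each page is used. The Python sketch below applies it to the BUF.BIG example, whose observations hold 26 character variables of 200 bytes plus the numeric loop variable II (8 bytes), for 5208 bytes. The page_overhead parameter is a placeholder assumption: the actual per-page overhead of the SAS data set format is not given in this note, so the sketch defaults it to zero.

```python
import math


def page_fit(bufsize: int, obs_len: int, page_overhead: int = 0) -> tuple[int, int]:
    """Return (observations per page, unused bytes per page).

    Observations may not span pages, so any remainder is wasted.
    page_overhead is an assumed per-page header size (default 0).
    """
    usable = bufsize - page_overhead
    per_page = usable // obs_len
    return per_page, usable - per_page * obs_len


def pages_needed(nobs: int, per_page: int) -> int:
    """Pages required to hold nobs observations."""
    return math.ceil(nobs / per_page)


OBS_LEN = 26 * 200 + 8  # A-Z as $200 plus the numeric II
NOBS = 13_000

for bufsize in (8192, 32768, 63488):
    per_page, waste = page_fit(bufsize, OBS_LEN)
    print(f"BUFSIZE={bufsize}: {per_page} obs/page, "
          f"{waste} bytes unused, {pages_needed(NOBS, per_page)} pages")
```

Under this model, BUFSIZE=63488 holds 12 observations per page and needs 1,084 pages, about half the 2,167 pages needed at 32768; at 8192 each page holds only one observation and leaves 2,984 of its 8,192 bytes unused.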

The following is an example of an efficiently written large data set, using the CACHESIZ= option:

Note: The CACHESIZ= value is not permanent; it applies only while this step has the data set open.

     libname cache '[]';
     data cache.big (cachesiz=65024);
     length a b c d e f g h i j k l m n o p q r s t
     u v w x y z $200;
     do ii=1 to 13000;
     output;
     end;
     run;
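The cache arithmetic in the CACHESIZ= discussion can be sketched the same way: only whole pages fit in the cache, and one I/O transfers one cache load. The Python model below is an approximation. Two of its rules are assumptions, because this note does not say how SAS behaves in those cases: a page larger than the cache is treated as one page per I/O, and values above 65024 are clamped to 65024.

```python
import math

MAX_CACHESIZ = 65024  # largest single I/O on VMS, per this note


def pages_per_io(cachesiz: int, bufsize: int) -> int:
    """Whole pages that fit in one cache load (at least 1, by assumption)."""
    return max(1, min(cachesiz, MAX_CACHESIZ) // bufsize)


def io_count(total_pages: int, cachesiz: int, bufsize: int) -> int:
    """Number of I/Os to transfer total_pages sequentially."""
    return math.ceil(total_pages / pages_per_io(cachesiz, bufsize))


pages = 1000  # hypothetical data set of 1000 pages of 8192 bytes
print(io_count(pages, 32768, 8192))  # default cache: 4 pages per I/O
print(io_count(pages, 65024, 8192))  # CACHESIZ=65024: 7 pages per I/O
```

For this hypothetical data set, the default 32768-byte cache moves 4 pages per I/O (250 I/Os), while CACHESIZ=65024 moves 7 pages per I/O (143 I/Os).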

References     "Accessing Data Libraries Using
          Engines" in Chapter 5, "Using SAS Files"
          in SAS Companion for the VMS Environment,
          Version 6, First Edition.

Use Asynchronous Read-Ahead when Processing Data Sets Sequentially

Job type PROC and data steps that read the data set sequentially

User SAS programmer

Usage     The Base engine RAH= option can be used
          only as a data set option. It accepts
          three values:

            • YES (turn asynchronous read-ahead on)
            • NO (default; do not perform
              asynchronous read-aheads)
            • LOG (turn the option on and write
              statistics to the SAS log when the
              data set is closed)

          This option works in conjunction with the
          CACHESIZ= option: asynchronous read-ahead
          is enabled only if caching is turned on.
          (See the previous section, which discusses
          the CACHESIZ= option.)

Benefit   You may see a 10% to 20% run-time
          improvement over simple caching alone, but
          this varies greatly depending on the data
          set geometry and step implementation.

          The RAH= option is designed to improve
          sequential data set read operations. You
          will see the greatest run-time gains when
          the data set is large and the page size is
          small, or when the amount of processing
          done per observation is significant.

Cost      Because the CACHESIZ= option must be in
          effect for RAH= to function, memory
          consumption increases as described for
          the CACHESIZ= option. Also, an additional
          caching buffer is allocated for the data
          set.

The Read-Ahead option (RAH=) causes the SAS System to "prefetch" data set observations asynchronously, which helps you tune sequential data set processing. RAH= is an experimental data set option in Release 6.09 of the SAS System.
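The benefit of prefetching comes from overlapping disk reads with processing. The Python sketch below is a simple two-stage timing model, not a description of the SAS System's internals: it assumes every page takes io_time to read and cpu_time to process, and that one page is read ahead, matching the single additional buffer mentioned in the Cost entry above. The times are in arbitrary, hypothetical units.

```python
def sequential_time(pages: int, io_time: float, cpu_time: float) -> float:
    """Read a page, then process it, with no overlap."""
    return pages * (io_time + cpu_time)


def read_ahead_time(pages: int, io_time: float, cpu_time: float) -> float:
    """Process page i while page i+1 is read asynchronously.

    The first read and the last page's processing cannot overlap with
    anything; every step in between costs the slower of the two.
    """
    if pages == 0:
        return 0.0
    return io_time + (pages - 1) * max(io_time, cpu_time) + cpu_time


for cpu in (1.0, 5.0, 10.0):
    seq = sequential_time(100, 10.0, cpu)
    rah = read_ahead_time(100, 10.0, cpu)
    print(f"cpu={cpu}: {seq:.0f} -> {rah:.0f} "
          f"({100 * (1 - rah / seq):.0f}% faster)")
```

The model shows why the gain grows as per-page processing approaches the I/O time. Because it is idealized, its gains (roughly 9%, 33%, and 50% for the three cases) are upper bounds rather than predictions; the summary above reports 10% to 20% in practice.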

A RAH= value of YES or LOG is only effective when

  • the member is of type DATA
  • the data set is opened for input
  • the data set resides in a disk library on the local OpenVMS node
  • data set caching is in effect.

The following example demonstrates the usage of the RAH= option:

     data b;
     set a (rah=log);
     run;
     proc compare data=b (rah=yes)
     compare=c (rah=yes);
     run;

Be aware that some steps show a decrease in run-time performance, most notably data steps with WHERE statements. Monitor RAH= processing closely by using the RAH=LOG setting.

References "Accessing Data Libraries Using
          Engines" in Chapter 5, "Using SAS Files,"
          in the SAS Companion for the VMS
          Environment, Version 6.

External I/O

The following guidelines apply to reading and writing native OpenVMS files with the SAS System. For several of the suggestions, the larger your files, the greater the performance gain for your entire SAS job. These suggestions parallel several of the data set I/O suggestions.

Allocate File Space Appropriately

Job type SAS procedures and data steps that write external files

User SAS programmer

Usage     ALQ= and DEQ= as part of the FILENAME or
          FILE statement

Benefit   Specifying appropriate values can decrease
          elapsed time by up to 50% and reduce disk
          fragmentation.

Cost      Performance degradation when ALQ= and
          DEQ= values are incompatible with the file
          size.

The SAS System allocates disk space for external files based on the value of ALQ=. It then extends the file if needed based on the value of DEQ=. By default these values are 33 blocks for ALQ= and 32 blocks for DEQ=.

Every time a file has to be extended, the system must search the disk for free space, which requires I/Os. Repeated for large files, this degrades performance. Setting larger ALQ= and DEQ= values for large files reduces this overhead. Optimal I/O occurs when ALQ= equals the size of the file. Because that is not always feasible, it is better to underestimate ALQ= and set a larger DEQ= value: this allocates enough space for a smaller file while extending it occasionally to meet the demands of a larger one. Allocating too much space may be costly if highwater marking is enabled on the disk. (See "Turn off Disk Volume Highwater Marking" later in this section.)
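The tradeoff between too many extensions and too much preallocated space can be compared directly. The Python sketch below is a simplified model, not RMS code: it assumes ALQ= blocks are allocated first, each extension adds exactly DEQ= blocks, and no cluster rounding occurs. The 20,000-block file size and the candidate settings are hypothetical.

```python
import math


def allocation_cost(file_blocks: int, alq: int, deq: int) -> tuple[int, int]:
    """Return (extensions, excess blocks) for a file of file_blocks.

    Excess blocks are allocated to the file but never written.
    """
    if file_blocks <= alq:
        extensions = 0
    else:
        extensions = math.ceil((file_blocks - alq) / deq)
    allocated = alq + extensions * deq
    return extensions, allocated - file_blocks


size = 20_000  # hypothetical external file size in blocks
for alq, deq in [(33, 32), (15_000, 2_000), (50_000, 32)]:
    ext, excess = allocation_cost(size, alq, deq)
    print(f"ALQ={alq}, DEQ={deq}: {ext} extensions, {excess} excess blocks")
```

Under this model, the defaults need 624 extensions; underestimating with ALQ=15000 and DEQ=2000 needs 3 extensions and leaves 1,000 excess blocks, while an ALQ= value that is far too large needs no extensions but leaves 30,000 excess blocks, the kind of over-allocation this note warns about when highwater marking is enabled.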

References Chapter 6, "Using External Files," in
          SAS Companion for the VMS Environment,
          Version 6, First Edition.

          Guide to OpenVMS File Applications,
          AA-PV6PA-TK

Turn off Disk Volume Highwater Marking

Job type Jobs that write external files

User System manager

Usage     Use the /NOHIGHWATER_MARKING qualifier
          when initializing disks.  For active
          disks, issue the DCL command SET
          VOLUME/NOHIGHWATER_MARKING.

Benefit   Elapsed time can improve by up to 40%.
          Direct I/Os are reduced.

Cost      There is no performance penalty. However,
          some OpenVMS sites may require this
          feature for security purposes.

The SAS System uses the random access method when opening external files, which means that allocated disk space does not have to be accessed sequentially. Highwater marking is a safeguard that clears disk space before it is allocated, removing residue of former files. To do this, the entire allocated space must be overwritten, which costs elapsed time and I/Os. If the data stored on the disk are not truly confidential, you can gain performance by disabling highwater marking on that disk.

Either of two DCL commands can be used to disable highwater marking on a disk. When initializing a new volume, use the following to inhibit the highwater function:

$ initialize/nohighwater AXP$DKA470 mydisk

To disable volume highwater marking on an active disk, use a command similar to the following:

$ set volume/nohighwater AXP$DKA200

References     OpenVMS System Manager's Manual:
          Tuning, Monitoring, and Complex Systems,
          AA-PV5NA-TK

Eliminate Disk Fragmentation

Job type Any jobs that access common external files frequently

User System manager

Usage     Devote a disk to often accessed files or
          keep your disks defragmented.

Benefit   The savings in elapsed time depend on the
          current state of the disk, but elapsed
          time can be reduced by up to 40%.

Cost      The cost to the user is the time and
          effort to better manage disk access rather
          than letting OpenVMS do all of the work.
          For the system manager, this may involve
          regularly defragmenting disks or obtaining
          additional disk drives.

On an unfragmented disk, files are contiguous, so after one I/O operation the disk head is well positioned for the next I/O operation. Also, split I/Os are rare, which decreases elapsed time to perform I/O.

Where possible, dedicating a disk drive that can be frequently defragmented can provide performance benefits. Use this disk to store commonly accessed SAS external files. In some situations, adding an inexpensive SCSI drive to the configuration may allow the system manager to maintain a clean, unfragmented environment more easily than maintaining a large disk farm. Files maintained on this unfragmented SCSI disk may perform better than heavily fragmented files on larger disks.
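The split I/Os mentioned above can be illustrated with a small model. The Python sketch below is a simplification written for this note, not the OpenVMS file system's actual logic (window turns, for example, are not modeled): it reads a file sequentially in fixed-size requests and counts one disk transfer for each extent a request touches, so any request that crosses an extent boundary is split. The extent layouts and the 64-block request size are hypothetical.

```python
from itertools import accumulate


def physical_ios(extent_sizes: list[int], request_blocks: int) -> int:
    """Count disk transfers for a sequential read in request_blocks chunks.

    A request that crosses an extent boundary is split into one
    transfer per extent it touches.
    """
    ends = list(accumulate(extent_sizes))  # exclusive end block of each extent
    total = ends[-1] if ends else 0
    ios = 0
    start = 0
    while start < total:
        stop = min(start + request_blocks, total)
        ext_start = 0
        for ext_end in ends:
            if ext_start < stop and start < ext_end:  # request overlaps extent
                ios += 1
            ext_start = ext_end
        start = stop
    return ios


contiguous = [1024]             # one extent
fragmented = [100] * 10 + [24]  # the same 1024 blocks in 11 extents
print(physical_ios(contiguous, 64))  # 16
print(physical_ios(fragmented, 64))  # 26
```

Reading the same 1,024 blocks takes 16 transfers from the contiguous file but 26 from the fragmented one, because 10 of the 16 requests straddle an extent boundary.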

System Startup

Job type All jobs

User System manager

Usage     The OpenVMS Install facility can be used
          to make core SAS images memory resident.

Benefit   Elapsed time of SAS System startup may
          decrease by as much as 30% when running
          under the X Window System, and by 40% in
          other modes.

Cost      With Release 6.09 of the SAS System,
          installing all of the images listed in
          the sample commands consumes approximately
          5500 global pages and 9 global sections.

Installing SAS System images can decrease the elapsed time of SAS System startup by up to 40%. Installing images is most effective on systems where two or more users are using SAS simultaneously. Use the following commands to install the core set of SAS images:

     $ install == "$sys$system:install/command"
     $ install add/open/header/share SAS$IMAGE
     $ install add/open/header/share SAS$LIBRARY:SASSHR.EXE
     $ install add/open/header/share SAS$LIBRARY:SABXSPH.EXE
     $ install add/open/header/share SAS$LIBRARY:SASBUNDL.EXE
     $ install add/open/header/share SAS$LIBRARY:SASIMPTR.EXE
     $ install add/open/header/share SAS$LIBRARY:SASMSG.EXE

When using the terminal-based full-screen driver, also install the following:

     $ install add/open/header/share SAS$LIBRARY:SABVU.EXE

When using Motif (X Window System), also install the following:

     $ install add/open/header/share SAS$LIBRARY:SABMOTIF.EXE

References     OpenVMS System Manager's Manual:
          Tuning, Monitoring, and Complex Systems,
          AA-PV5NA-TK

          Installation Instructions and System
          Manager's Guide, Release 6.09 of the SAS
          System under OpenVMS for AXP Systems.

Bibliography

OpenVMS System Configuration:

Installation Instructions and System Manager's Guide, Release 6.09 of the SAS System under OpenVMS for AXP Systems.

SAS Language Reference, Version 6, First Edition, Chapter 15, "SAS Data Set Options."

SAS options related to SAS images:

SAS Companion for the VMS Environment, Version 6, First Edition, Chapter 12, "Optimizing System Performance."

Options and related topics for SAS files and external files:

General SAS options and related topics:

SAS Companion for the VMS Environment, Version 6, First Edition, Chapter 8, "SAS System Options;" Chapter 9, "Data Set Options;" Chapter 12, "Optimizing System Performance;" and Appendix 3, "Estimating Data Set Size."

Host-specific options and related topics:

SAS Companion for the VMS Environment, Version 6, First Edition, Chapter 5, "Using SAS Files;" Chapter 9, "Data Set Options;" and Appendix 4, "Advanced Topics."

Tips and Techniques for Using I/O Subsystems in Version 6 of the SAS System under VMS, SUGI '90.

Installation Instructions and System Manager's Guide, Release 6.09 of the SAS System under OpenVMS for AXP Systems.

SAS catalog options and related topics:

SAS Technical Report P-220, Changes and Enhancements to the SAS System for the VMS Environment, Release 6.07, Appendix 2, "Additional Changes and Enhancements."

SAS Programming: A Programmer's Guide to Sorting in the VMS Environment.

SAS Companion for the VMS Environment, Version 6, First Edition, Chapter 12, "Optimizing System Performance."

Tips and Techniques for Using I/O Subsystems in Version 6 of the SAS System under VMS, SUGI '90.

Introduction to Efficient Programming Techniques, SUGI '91.

Installation Instructions and System Manager's Guide, Release 6.09 of the SAS System under OpenVMS for AXP Systems.

SAS Programming Tips: A Guide to Efficient SAS Processing, Chapter 4, "Read and Write Data Selectively;" Chapter 7, "Know SAS System Defaults."

Other OpenVMS documentation:

Guide to OpenVMS File Applications, AA-PV6PA-TK.

OpenVMS System Manager's Manual: Tuning, Monitoring, and Complex Systems, AA-PV5NA-TK.

OpenVMS DCL Dictionary, A-M, AA-PV5KA-TK.