
Optimizing System Performance

Data Set I/O under OpenVMS

The information that is presented in this section applies to reading and writing SAS data sets. In general, the larger your data sets, the greater the potential performance gain for your entire SAS job. The performance gains that are described here were observed on data sets of approximately 100,000 blocks.


Allocating Data Set Space Appropriately

Job type

Jobs that write data sets.

User

SAS programmer.

Usage

Use ALQ=x and DEQ=y (or ALQMULT=x and DEQMULT=y) as LIBNAME statement options or as data set options. For ALQ= and DEQ=, the values x and y represent numbers of blocks.

Benefit

There is up to a 50 percent decrease in elapsed time on write operations as reflected in fewer direct I/Os. File fragmentation is also reduced, thereby enhancing performance when you read the data set.

Cost

You will experience performance degradation when ALQ= or DEQ= values are incompatible with the data set size.

SAS initially allocates enough space for 10 pages of data for a data set. Each time the data set is extended, another five pages of space is allocated on the disk. OpenVMS maintains a bitmap on each disk that identifies the blocks that are available for use. When a data set is written and then extended, OpenVMS alternates between scanning the bitmap to locate free blocks and actually writing the data set. However, if the data sets were written with larger initial and extent allocations, then write operations to the data set would proceed uninterrupted for longer periods of time. At the hardware level, this means that disk activity is concentrated on the data set, and disk head seek operations that alternate between the bitmap and the data set are minimized. The user sees fewer I/Os and faster elapsed time.

Large initial and extent values can also reduce disk fragmentation. SAS data sets are written using the RMS algorithm "contiguous best try." With large preallocation, the space is reserved to the data set and does not become fragmented as it does when inappropriate ALQ= and DEQ= values are used.

SAS recommends setting ALQ= to the size of the data set to be written. If you are uncertain of the size, underestimate and use DEQ= for extents. Values of DEQ= larger than 5000 blocks are not recommended. For information about predicting data set size, see Estimating the Size of a SAS Data Set under OpenVMS.

The following is an example of using the ALQ= and DEQ= options:

libname x '[]';
/* We know this will be a large data set. */
data x.big (alq=100000 deq=5000);
   length a b c d e f g h i j k l m 
          n o p q r s t u v w x y z $200;
   do ii=1 to 13000;
      output;
   end;
run;

Note:   If you do not want to specify an exact number of blocks for the data set, use the ALQMULT= and DEQMULT= options.
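The following sketch shows ALQMULT= and DEQMULT= specified as LIBNAME statement options so that they apply to every data set written to the library. The values 20 and 10 are placeholders for illustration, not recommendations:

libname x '[]' alqmult=20 deqmult=10;

/* Data sets subsequently written to library X use the
   larger allocations requested by ALQMULT= and DEQMULT=. */
data x.big2;
   set x.big;
run;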




Turning Off Disk Volume High-Water Marking

Job type

Any SAS application that writes data sets. Data set size is not important.

User

System manager.

Usage

Use the /NOHIGHWATER_MARKING qualifier when initializing disks. For active disks, issue the DCL command SET VOLUME/NOHIGHWATER_MARKING.

Benefit

The percentage gain is greatest for jobs that are write intensive. The savings in elapsed time can be as great as 40 percent, and direct I/Os are reduced.

Cost

There is no performance penalty. However, some OpenVMS sites might require that high-water marking remain enabled for security reasons.

High-water marking is an OpenVMS security feature that is enabled by default. It forces prezeroing of disk blocks for files that are opened for random access. All SAS data sets are random access files and, therefore, pay the performance penalty of prezeroing, increased I/Os, and increased elapsed time.

Two DCL commands can be used, independently of each other, to disable high-water marking on a disk. When initializing a new volume, use the /NOHIGHWATER_MARKING qualifier to disable high-water marking, as in the following example:

$ initialize/nohighwater $DKA470 mydisk

To disable volume high-water marking on an active disk, use a command similar to the following:

$ set volume/nohighwater $DKA200
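
One way to check whether the change took effect is to display the full device characteristics; the volume status information indicates whether file high-water marking is in effect (the exact wording varies by OpenVMS version):

$ show device/full $DKA200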




Eliminating Disk Fragmentation

Job type

Any jobs that frequently access common data sets.

User

SAS programmer and system manager.

Usage

Devote a disk to frequently accessed data sets, or keep your disks defragmented.

Benefit

The savings in elapsed time varies with the current state of the disk, but it can exceed 50 percent on write operations and 25 percent on read operations.

Cost

The cost to the user is the time and effort to better manage disk access. For the system manager, it can involve regularly defragmenting disks or obtaining additional disk drives.

Any software that reads from and writes to disk, including SAS, benefits from a well-managed disk. On an unfragmented disk, files are kept contiguous; thus, after one I/O operation, the disk head is well positioned for the next one.

A disk drive that is defragmented regularly can provide performance benefits, so use such a disk to store commonly accessed SAS data sets. In some situations, adding an inexpensive SCSI drive to the configuration allows the system manager to maintain a clean, unfragmented environment more easily than using a large disk farm. Data sets maintained on an unfragmented SCSI disk might perform better than heavily fragmented data sets on larger disks.

To defragment a disk, use the OpenVMS Backup utility to perform an image backup of the disk after regular business hours, when disk activity is likely to be minimal. Submit the following command sequence to create a defragmented copy of the source disk on the destination disk:

$ mount/foreign 'destination-disk'
$ backup/image 'source-disk' 'destination-disk'

When the image backup operation is complete, dismount the destination disk and remount it using a normal mount operation (without the /FOREIGN qualifier) so that the disk can be used again for I/O operations. SAS does not recommend the use of dynamic defragmenting tools that run in the background of an active system because such programs can corrupt files.
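
For example, a minimal sketch of returning the destination disk to normal use after the image backup completes ('volume-label' is a placeholder for the disk's actual volume label):

$ dismount 'destination-disk'
$ mount 'destination-disk' 'volume-label'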




Setting Larger Buffer Size for Sequential Write and Read Operations

Job type

SAS steps that do sequential I/O operations on large data sets.

User

SAS programmer.

Usage

The CACHESIZE= data set option controls the buffering of data set pages during I/O operations. CACHESIZE= can be used either as a data set option or in a LIBNAME statement that uses the BASE engine. The BUFSIZE= data set option sets the data set page size when the data set is created. BUFSIZE= can be used as a data set option, in a LIBNAME statement, or as a SAS system option.

Benefit

There is as much as a 30 percent decrease in elapsed time in some steps when an appropriate value is chosen for a particular data set.

Cost

If the data set observation size is large, substantial space in the data set might be wasted if you do not choose an appropriate value for BUFSIZE=. Also, memory is consumed for the data cache, and multiple caches might be used for each data set opened.


Using the BUFSIZE= Option

The BUFSIZE= data set option sets the SAS internal page size for the data set. Once set, this becomes a permanent attribute of the file that cannot be changed. This option is meaningful only when you are creating a data set. If you do not specify a BUFSIZE= option, SAS selects a value that contains as many observations as possible with the least amount of wasted space.

An observation cannot span page boundaries. Therefore, unused space can be left at the end of a page unless the observations pack evenly into it. By default, SAS chooses a page size between 8192 and 32768 bytes if you do not specify BUFSIZE=. If you increase the BUFSIZE= value, more observations can be stored on a page, and the same amount of data can be accessed with fewer I/Os. When you set BUFSIZE= explicitly, choose a value that does not leave a large amount of unused space on each data set page, because that unused space wastes disk space. The highest recommended value for BUFSIZE= is 65024.
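
As a rough illustration (ignoring page and observation overhead), consider the data set that is created in the example below: 26 character variables of length 200 plus the numeric loop variable ii give an observation length of about 5208 bytes. With BUFSIZE=63488, roughly 12 observations fit on each page (12 x 5208 = 62496 bytes), so fewer than 1000 bytes of each page go unused.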

The following example creates a large data set efficiently by using the BUFSIZE= data set option. Note that BUFSIZE=63488 becomes a permanent attribute of the data set:

libname buf '[]';
data buf.big (bufsize=63488);
   length a b c d e f g h i j k l m 
          n o p q r s t u v w x y z $200;
   do ii=1 to 13000;
      output;
   end;
run;


Using the CACHENUM= Option

For each SAS file that you open, SAS maintains a set of caches to buffer the data set pages. The size of each of these caches is controlled by the CACHESIZE= option. The number of caches used for each open file is controlled by the CACHENUM= option. The ability to maintain more data pages in memory potentially reduces the number of I/Os that are required to access the data. The number of caches that are used to access a file is a temporary attribute. It might be changed each time you access the file.

By default, up to 10 caches are used for each SAS file that is opened; each cache is CACHESIZE= bytes in size. On a memory-constrained system, you might want to reduce the number of caches.

The following example uses the CACHENUM= option to specify that 8 caches of 65024 bytes each are used to buffer data pages in memory:

proc sort data=cache.big (cachesize=65024 cachenum=8);
   by serial;
run;
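
Conversely, when memory is limited you can request fewer caches. The following sketch rereads the same data set with only two caches; the subsetting condition is purely illustrative:

/* Use only two caches for this open file to conserve memory. */
data work.subset;
   set cache.big (cachenum=2);
   if ii <= 1000;
run;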


Using the CACHESIZE= Option

SAS maintains a cache that is used to buffer multiple data set pages in memory. This reduces I/O operations by enabling SAS to read or write multiple pages in a single operation. SAS maintains multiple caches for each data set that is opened. The CACHESIZE= data set option specifies the size of each cache.

The CACHESIZE= value is a temporary attribute that applies only to the data set that is currently open. You can use different CACHESIZE= values at different times when accessing the same file. To conserve memory, a maximum of 65024 bytes is allocated for the cache by default. The default allows as many pages as can be completely contained in the 65024-byte cache to be buffered and accessed with a single I/O.

Here is an example that uses the CACHESIZE= data set option to write a large data set efficiently. Note that the CACHESIZE= value is not a permanent attribute of the data set:

libname cache '[]';
data cache.big (cachesize=65024);
   length a b c d e f g h i j k l m 
          n o p q r s t u v w x y z $200;
   do ii=1 to 13000;
      output;
   end;
run;
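
Because the attribute is temporary, a later step can read the same data set with a different cache size, as in the following sketch:

/* Reread the file with a smaller cache; the earlier
   CACHESIZE= value does not carry over. */
data work.copy;
   set cache.big (cachesize=32768);
run;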


Using Asynchronous I/O When Processing SAS Data Sets

Job type

Jobs that read or write SAS files.

User

SAS programmer.

Usage

The BASE engine now performs asynchronous reading and writing by default. This allows overlap between SAS data set I/O and computation time.

Note:   Asynchronous reading and writing is enabled only if caching is turned on.

Benefit

Asynchronous I/O allows other processing to continue while SAS is waiting for I/O completion. If there is a large gap between the CPU time used and the elapsed time reported in the FULLSTIMER statistics, asynchronous I/O can help reduce that gap.

Cost

Because data page caching must be in effect, the memory usage of the I/O cache must be incurred. For more information about controlling the size and number of caches used for a particular SAS file, see CACHENUM= Data Set Option: OpenVMS and CACHESIZE= Data Set Option: OpenVMS.

Asynchronous I/O is enabled by default; no additional options need to be specified to use this feature. For all SAS files that use a data cache, SAS performs asynchronous I/O. Because multiple caches are available for each SAS file, SAS can continue processing with one cache while an I/O operation is in progress on another. For example, when SAS writes to a file, an asynchronous I/O is initiated as soon as the first cache becomes full, but SAS does not have to wait for that I/O to complete. While the transfer is in progress, SAS can continue processing new data pages and store them in one of the other available caches. When that cache becomes full, an asynchronous I/O can be initiated on it as well.

Similarly, when SAS reads a file, additional caches of data can be read from the file asynchronously in anticipation of those pages being requested by SAS. When those pages are required, they will have already been read from disk, and no I/O wait will occur.
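
To gauge how much a step stands to benefit, you can turn on the FULLSTIMER system option (mentioned under Benefit above) and compare the CPU time and elapsed time that are reported for the step:

/* Request detailed resource statistics in the SAS log. */
options fullstimer;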

Caching (with multiple caches) must be enabled for asynchronous I/O to take effect. If the cache is disabled with the CACHESIZE=0 option or the CACHENUM=0 option, no asynchronous I/O can occur.
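
For example, the following sketch reads a data set with caching disabled, so no asynchronous I/O occurs:

/* CACHESIZE=0 turns off the data cache, which also
   disables asynchronous I/O for this file. */
data work.nocache;
   set cache.big (cachesize=0);
run;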


