Optimizing SAS I/O

Put Catalogs and Data Sets into Separate Libraries, Using the Optimal Block Size for Each

The physical block size (BLKSIZE=) of a SAS bound library determines both the minimum page size and the minimum unit of space allocation for the library. The 6KB default is relatively efficient across a range of device types, and it leads to lower memory requirements for catalog buffers. However, when you use the 6KB default, more DASD space is needed to hold a given amount of data because smaller blocks leave more of each track unused in inter-block gaps. In one test case on a 3380, an MXG daily PDB required 8% more tracks when it was stored in 6KB physical blocks instead of in half-track blocks.
Because the optimal block sizes for SAS catalogs and SAS data sets are not necessarily the same, consider putting catalogs and data sets into separate libraries. For catalog libraries, 6KB is a good general physical block size on any device. For data sets, choose either a full-track or half-track block size, depending on whether the library is stored on a device that supports full-track blocks.
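For example, a minimal sketch of this arrangement might use LIBNAME statements like the following. The physical data set names are hypothetical, and 27998 bytes is assumed as the half-track block size for a 3390:
     libname mycats 'MYID.CATALOGS.SAS' blksize=6144;    /* 6KB blocks for the catalog library     */
     libname mydata 'MYID.SASDATA.SAS'  blksize=27998;   /* half-track blocks for data sets (3390) */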

Optimize I/O for Direct Access Bound Libraries

Overview of Optimize I/O for Direct Access Bound Libraries

Determining whether your primary access pattern is sequential or random, and then selecting an appropriate page size for that pattern, helps you optimize the performance of your SAS session. Select a page size according to the guidelines in Sequential Processing Pattern and Random Processing Pattern.
The BUFSIZE= data set option enables you to establish a non-default page size for a new SAS data set, but there are some limitations. Once determined, the page size becomes a permanent attribute of the SAS data set and influences the efficiency of both the output operation that creates the data set and subsequent read or update operations.
The minimum page size that can be specified for a SAS data set is the block size of the library that contains it. Because the library block size is fixed when the library is created, achieving optimal performance might require creating new libraries with special block sizes. You might also have to divide into separate libraries those members that you access sequentially and those members that you access randomly.
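As a sketch, assuming a library SEQLIB that was allocated with a half-track block size on a 3390, the BUFSIZE= data set option might be used like this (the library, member, and input names are hypothetical):
     data seqlib.history (bufsize=27998);   /* page size matches the library block size */
        set work.staging;                   /* WORK.STAGING is a hypothetical input     */
     run;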

Sequential Processing Pattern

Use the following recommendations to optimize performance when using a sequential access pattern:
  • Choose a library block size that corresponds to half-track blocking, that is, two blocks per track. For example, specify:
     options blksize(3380)=half blksize(3390)=half;
  • Choose a BUFNO value larger than the default; a value between 6 and 10 is a good starting point. Setting BUFNO to a value from 10 to 30 might yield small additional gains in the number of bytes transferred per unit of elapsed time, although the benefit varies depending on the cache scheme that is used by the controller on which the library resides. However, this gain might come at the expense of monopolizing the channel or the device. Consult with your system administrators to evaluate the likelihood of this problem occurring, as well as its impact on the system.
  • Consider using the In-Memory File (IMF) feature for a SAS file that is accessed across many SAS steps (DATA steps or procedures) if the file is small enough to fit into the available region size. Load the file with the SASFILE statement before the SAS steps that process the file, as shown in the sketch after this list. The file is read and, if necessary, written only once; without IMF, the file would be read once per step. For more information about how to reserve enough space to hold the entire data set in memory while it is being processed, see SASFILE Statement: z/OS.
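The following sketch combines these recommendations for a sequentially processed library (the library and member names are hypothetical; the BUFNO value of 10 is the suggested starting point):
     options bufno=10;                /* transfer more pages per physical I/O operation */

     sasfile seqlib.master load;      /* IMF: read the member into memory once          */

     proc means data=seqlib.master;   /* both steps reuse the in-memory copy            */
     run;

     proc freq data=seqlib.master;
     run;

     sasfile seqlib.master close;     /* release the in-memory copy                     */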

Random Processing Pattern

Use the following criteria to optimize performance when using a random access pattern:
  • Choose a library block size of 6K, if that block size is practical. However, for some DASD controller configurations, half-track blocking performs nearly as well as a 6K block size for random access. Half-track blocking, which results in fewer inter-block gaps, allows more data to be packed on a track.
  • If necessary, set the member page size (the BUFSIZE= value) equal to the library block size.
  • Consider using the SASFILE statement to load a repetitively accessed file, such as a master file, into memory (see the sketch after this list). IMF dramatically reduces the elapsed time for such operations because each member page that is accessed needs to be read into memory only once. However, take care to ensure that the region size for the job is large enough to contain the file being loaded. It might also be necessary to consult with your z/OS system administrator to ensure that the job is protected against having the working-set size for its virtual storage trimmed.
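A minimal sketch of these recommendations follows (the library name, physical data set name, index variable, and input data are assumptions):
     libname rndlib 'MYID.RANDOM.SAS' blksize=6144;      /* 6K library block size           */

     data rndlib.master (bufsize=6144 index=(custid));   /* page size = library block size; */
        set work.staging;                                /* indexed for keyed retrieval     */
     run;

     sasfile rndlib.master load;      /* keep the repetitively accessed file in memory */
     /* ... steps that perform keyed lookups against rndlib.master ... */
     sasfile rndlib.master close;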

Optimize I/O for Sequential Libraries

Sequential format bound libraries are those libraries that are processed with the TAPE engine.
  • Use the default BUFSIZE when you access sequential format bound libraries. The default BUFSIZE is always the most appropriate choice.
  • For both new and existing sequential format bound libraries on disk, specify the optimal half-track block size. The block size must be specified as part of the allocation parameters (that is, the DD statement or the LIBNAME statement). The BLKSIZE= and BLKSIZE(device-type)= system options are not used for sequential format bound libraries.
  • For libraries on tape, if possible, structure your SAS job to write all library members as part of a single PROC COPY operation (see the sketch after this list). Using one PROC COPY operation avoids the I/O delays that result when SAS repositions to the beginning of the tape data set between every SAS procedure or DATA step.
  • For libraries on tape that are assigned internally, specify the engine in the LIBNAME statement. This specification avoids an extra tape mount.
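As a sketch, assuming a disk library MYLIB and a tape library allocated through the LIBNAME statement (the physical data set name, UNIT, and DISP values are assumptions), a single PROC COPY writes every member in one pass:
     libname mylib 'MYID.SASDATA.SAS';
     libname backup tape 'MYID.BACKUP.SASLIB' disp=(new,catlg) unit=tape;

     proc copy in=mylib out=backup;   /* one pass; no repositioning between members */
     run;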

Determine Whether You Should Compress Your Data

Overview of Compressing Data

Compressing data reduces I/O and disk space but increases CPU time. Therefore, whether data compression is worthwhile to you depends on the resource cost-allocation policy in your data center. Often your decision must be based on which resource is more valuable or more limited, DASD space or CPU time.
You can use the portable SAS system option COMPRESS= to compress all data sets that are created during a SAS session. Or, use the SAS data set option COMPRESS= to compress an individual data set. Data sets that contain many long character variables generally are excellent candidates for compression.
The following tables illustrate the results of compressing SAS data sets under z/OS. In both cases, PROC COPY was used to copy data from an uncompressed source data set into uncompressed and compressed result data sets, using the system option values COMPRESS=NO and COMPRESS=YES, respectively.
Note: When you use PROC COPY to compress a data set, you must include the NOCLONE option in your PROC statement. Otherwise, PROC COPY propagates all the attributes of the source data set, including its compression status.
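For example, a sketch of this kind of test might look like the following (the library and member names are hypothetical):
     options compress=yes;                     /* compress all data sets created in this session */

     proc copy in=srclib out=outlib noclone;   /* NOCLONE prevents the source's compression      */
        select problems;                       /* status from being propagated to the copy       */
     run;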
In the following tables, the CPU row shows how much time was used by an IBM 3090-400S to copy the data, and the SPACE values show how much storage (in megabytes) was used.
For the first table, the source data set was a problem-tracking data set. This data set contained mostly long, character data values, which often contained many trailing blanks.
Compressed Data Comparison 1

     Resource     Uncompressed     Compressed     Change
     CPU          4.27 sec         27.46 sec      +23.19 sec
     Space        235 MB           54 MB          -181 MB
For the preceding table, the CPU cost is roughly 0.1 seconds per megabyte of space saved (23.19 seconds for 181 MB).
For the next table, the source data set contained mostly numeric data from a MICS performance database. The results were again good, although not as good as when mostly character data was compressed.
Compressed Data Comparison 2

     Resource     Uncompressed     Compressed     Change
     CPU          1.17 sec         14.68 sec      +13.51 sec
     Space        52 MB            39 MB          -13 MB
For the preceding table, the CPU cost is roughly 1 second per megabyte of space saved (13.51 seconds for 13 MB).
For more information about compressing SAS data, see SAS(R) Programming Tips: A Guide to Efficient SAS(R) Processing.

Consider Using SAS Software Compression in Addition to Hardware Compression

Some storage devices perform hardware data compression dynamically. Because this hardware compression is always performed, you might decide not to enable the SAS COMPRESS option when you are using these devices. However, if DASD space charges are a significant portion of your total bill for information services, you might benefit by using SAS software compression in addition to hardware compression. The hardware compression is transparent to the operating environment. If you use hardware compression only, then space charges are assessed for uncompressed storage.

Consider Placing SAS Libraries in Hiperspaces

One effective method of avoiding I/O operations is to use SAS software's HIPERSPACE engine option. This option is specific to z/OS and enables you to place a SAS library in a hiperspace instead of on disk.
The major factor that affects hiperspace performance is the amount of expanded storage on your system. The best candidates for using hiperspace are jobs that execute on a system that has plenty of expanded storage. If expanded storage on your system is constrained, the hiperspaces are moved to auxiliary storage. Moving the hiperspaces to auxiliary storage eliminates much of the potential benefit of using the hiperspaces.
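A minimal sketch, assuming a hypothetical library name and physical data set name, might specify the option on the LIBNAME statement as follows:
     libname perf 'MYID.PERFDATA.SAS' hiperspace;   /* library pages reside in a hiperspace instead of on disk */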
For more information about using hiperspaces under z/OS, see Hiperspace and DIV Libraries and Tuning SAS(R) Applications in the OS/390 and z/OS Environments.

Consider Designating Temporary SAS Libraries as Virtual I/O Data Sets

Treating libraries as "virtual I/O" data sets is another effective method of avoiding I/O operations. This method works well with any temporary SAS library--especially WORK. To use this method, specify UNIT=VIO as an engine option in the LIBNAME statement or LIBNAME function.
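For example, a sketch of allocating a temporary library with virtual I/O might look like this (the library name, physical name, and space values are assumptions):
     libname scratch 'MYID.SCRATCH.SAS' unit=vio space=(cyl,(10,10)) disp=(new,delete);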
The VIO method is always effective for small libraries (<10 cylinders). If your installation has set up your system to allow VIO to go to expanded storage, then VIO can also be effective for large temporary libraries (up to several hundred cylinders). Using VIO is most practical during evening and night shifts when the demands on expanded storage and on the paging subsystem are typically light.
The VIO method can also save disk space because it is an effective way of putting large paging data sets to double use. During the day, these data sets serve their normal function as backing storage for paging and swapping; during the night, they become a form of temporary scratch space.