
Using SAS Engines

The CONCUR Engine under OpenVMS


What Is the CONCUR Engine?

The concurrency (CONCUR) engine allows concurrent READ and WRITE access to data sets. Note that the concurrency engine supports only SAS data sets. It does not support SAS files of member types other than DATA, such as INDEX or CATALOG.


Differences between the CONCUR Engine and the V9 Engine

In contrast to the V9 engine, the CONCUR engine does not support indexing and compression of observations. The CONCUR engine can access files only within a single machine or OpenVMS cluster; access to SAS data sets on other operating environments and concurrent READ and WRITE access to SAS data sets across DECnet are features that are provided by SAS/SHARE software. For more information about using SAS/SHARE software, see the SAS/SHARE User's Guide. The CONCUR engine is optimized for random concurrent access, while the V9 engine is better suited to sequential access. So, for example, if you intend to use the FSEDIT procedure or the POINT= option in the SET statement to access your data randomly, the CONCUR engine might be the best choice for you, even if you do not need any of the concurrent access capabilities.

Version 8 SAS introduced support for several new features related to data sets. The CONCUR engine supports many of these features: member names with lengths up to 32 characters; variable names with lengths up to 32 characters; and member or variable labels with lengths up to 256 characters. Note that while the CONCUR engine supports the creation and access of Version 6 format files, the long character strings are not allowed when accessing or creating a Version 6 concurrency engine file. For more information about support for these longer character strings, see SAS Language Reference: Concepts.
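As a minimal sketch of the longer names, the following step creates a concurrency engine data set with a long member name, long variable names, and a variable label. The library path, member name, and variable names are hypothetical and are chosen only for illustration. Adding FILEFMT=607 to this step would not be expected to work, because the long character strings are not allowed in Version 6 format files.

```sas
/* Hypothetical library path and names, for illustration only. */
libname clib concur '[mydir.share]';

data clib.quarterly_order_summary;
   /* Variable names longer than 8 characters need the default (Version 8) format. */
   length customer_account_number 8 shipping_region_name $ 20;
   label shipping_region_name = 'Region to which the order was shipped';
   customer_account_number = 1001;
   shipping_region_name = 'Northeast';
run;
```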


Reading Aligned and Unaligned Data Sets

Beginning with SAS 9.2 Phase 2, the CONCUR engine reads only SAS data sets that are aligned by data type. Data sets that were created with the CONCUR engine before SAS 9.2 Phase 2 are unaligned. These data sets must be aligned before they can be read by the CONCUR engine in SAS 9.2 Phase 2 or later.

The CONCURB engine is a read-only engine that reads only unaligned data sets in a library. To convert all unaligned data sets in a library to aligned data sets, use the COPY procedure. Using PROC COPY, the unaligned data sets in the input library are read by the CONCURB engine, and then the aligned data sets are written to the output library by the CONCUR engine.

libname a concur aligned-library-name;
libname b concurb unaligned-library-name;
proc copy in=b out=a;
run;


How to Select the CONCUR Engine

There are three ways to select the CONCUR engine:


Record-Level Locking and File-Sharing with the CONCUR Engine

The CONCUR engine creates and accesses SAS data sets in a format that allows record-level locking and file-sharing.

CAUTION:
SAS data sets that are created with the CONCUR engine are not interchangeable with SAS data sets that are accessed and created with any other engine.

If you plan to share a particular SAS data set, create it using the CONCUR engine.

If you have a SAS data set that you want to share after it is created, you can copy it, using the CONCUR engine as the output engine. Then it will be in the correct format for sharing. For example, if you want shared UPDATE access to a data set that was created using the V9 engine, you can use the following statements to convert it:

libname inlib v9 '[mydir.base]';
libname outlib concur '[mydir.share]';
proc copy in=inlib out=outlib;
run;

After you run this SAS program, all SAS data sets that are created with the V9 engine in the library that is referenced by INLIB are copied to the library referenced by OUTLIB using the CONCUR engine. To create data sets using the CONCUR engine, your directory must have a version limit greater than 1.


Member Types Supported

The CONCUR engine supports the SAS 9.2 member type DATA.


Engine/Host Options for the CONCUR Engine

Several concurrency engine options control the creation and access of SAS data sets. Most of these options have direct correlation to options available through OpenVMS Record Management Services (RMS). The CONCUR engine creates relative organization files with record-level locking enabled.

Note:   Data sets created with the CONCUR engine have a maximum observation length of 32,767 bytes.

You can use the following engine/host options with the CONCUR engine:

ALQ=

specifies the number of OpenVMS disk blocks to allocate initially to a data set when it is created. The value can range from 0 to 2,147,483,647. If the value is 0, the minimum number of blocks required for a sequential file is used. The ALQ= option defaults to the bucket size. OpenVMS RMS always rounds the value up to the next disk cluster boundary.

The ALQ= option (allocation quantity) corresponds to the FAB$L_ALQ field in OpenVMS RMS. For additional details, see ALQ= Data Set Option: OpenVMS and Guide to OpenVMS File Applications.

BKS=

specifies the number of OpenVMS disk blocks in each bucket of the file. The value can range from 0 to 63. If the value is 0, the bucket size used is the minimum number of blocks needed to contain a single observation. The default value is 32.

When deciding on the bucket size to use, consider whether the file is usually accessed randomly (small bucket size), sequentially (large bucket size), or both (medium bucket size). The bucket size is a permanent attribute of the file, so this option applies to output files only.

The BKS= option (bucket size) corresponds to the FAB$B_BKS field in OpenVMS RMS or the FILE BUCKET_SIZE attribute when using File Definition Language (FDL). For additional details, see BKS= Data Set Option: OpenVMS and Guide to OpenVMS File Applications.

DEQ=

specifies the number of OpenVMS disk blocks to add each time OpenVMS RMS automatically extends a data set during a write operation. The value can range from 0 to 65,535. OpenVMS RMS always rounds the value up to the next disk cluster boundary. A large value can result in fewer file extensions over the life of the file; a small value results in numerous file extensions over the life of the file. Numerous file extensions, which might be noncontiguous, slow record access.

If the value specified is 0, OpenVMS RMS uses the default value for the process. The DEQ= option defaults to the bucket size.

The DEQ= option (default file extension quantity) corresponds to the FAB$W_DEQ field in OpenVMS RMS. For additional details, see DEQ= Data Set Option: OpenVMS and Guide to OpenVMS File Applications.

FILEFMT=

specifies the file format, or version of the engine, to use. Allowed values are 606, 607, 801, and 901. The default value is 801. There was an internal file format change between Release 6.06 and Release 6.07, and again between Version 6 and Version 8. The Version 8 and SAS 9 formats are identical. The concurrency (CONCUR) engine can create and access all versions of the file format. When you access a file for input or update, the CONCUR engine detects the correct version of the existing file. When you create a new file, the CONCUR engine defaults to creating a Version 8 format file unless overridden by the FILEFMT= option.

The following example shows how to create a file in Release 6.07 format:

libname clib concur '[]';
data clib.v607 (filefmt=607);
... more SAS statements ...
run;

MBF=

specifies the number of I/O buffers that you want OpenVMS RMS to allocate for a particular file. The value can range from 0 to 127. By default, this option is set to 2 for files opened for update and 1 for files opened for input or output. If the value is 0, the default value for the process is used.

The MBF= option (multibuffer count) corresponds to the RAB$B_MBF field in OpenVMS RMS or the CONNECT MULTIBUFFER_COUNT attribute when using FDL. For additional details, see MBF= Data Set Option: OpenVMS and Guide to OpenVMS File Applications.


Data Set Options Supported by the CONCUR Engine

The CONCUR engine recognizes all data set options that are documented in SAS Language Reference: Dictionary except the FILECLOSE=, COMPRESS=, OUTREP=, and REUSE= options. Of special importance to the CONCUR engine is the portable data set option CNTLLEV=. (For details, see CNTLLEV= Data Set Option: OpenVMS.) Other data set options that are likely to be useful include LOCKREAD= and LOCKWAIT=. (For details, see LOCKREAD= Data Set Option: OpenVMS and LOCKWAIT= Data Set Option: OpenVMS.) For more information, see SAS Language Reference: Dictionary.

The engine/host options that are discussed in Engine/Host Options for the CONCUR Engine can also be used as data set options when you use the CONCUR engine. For details, see Specifying Data Set Options.
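As a rough illustration of using the engine/host options as data set options, the following step sets the bucket size, initial allocation, and extension quantity when a data set is created. The library path and the input data set WORK.ORDERS are hypothetical, and the option values are arbitrary choices within the documented ranges rather than recommendations.

```sas
/* Hypothetical library path. */
libname clib concur '[mydir.share]';

/* BKS= and ALQ= take effect when the file is created; DEQ= controls  */
/* how many blocks are added when OpenVMS RMS extends the file.       */
/* The values are illustrative only.                                  */
data clib.orders (bks=16 alq=2000 deq=500);
   set work.orders;   /* hypothetical input data set */
run;
```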


System Option Values Used by the CONCUR Engine

The CONCUR engine does not use the values of any SAS system options.


DECnet Access

The CONCUR engine supports both creation and reading of files across DECnet, but not the updating of files across DECnet. You are allowed to create and read files because the engine uses multistreaming only when the file is opened for update. Support of DECnet access means you can now specify a node name in the physical pathname of your SAS library, as long as you do not plan to update the data sets stored in the library. The following is an example:

libname mylib concur 'mynode::bldgc:[testdata]';
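Because updating across DECnet is not supported, one possible approach is to read the remote library and copy its data sets to a local library, and then update the local copies. The following is a sketch under that assumption. It reuses the node and directory names from the preceding example, and the local directory is hypothetical.

```sas
/* Remote library: read access across DECnet. */
libname remote concur 'mynode::bldgc:[testdata]';

/* Hypothetical local directory where updates are allowed. */
libname local concur '[mydir.share]';

/* Copy only members of type DATA to the local library. */
proc copy in=remote out=local memtype=data;
run;
```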


Passwords

The CONCUR engine supports SAS passwords. The syntax and behavior are the same as for passwords used with the V9 engine.


Internals of a Concurrency Engine Data Set


Contents and Organization of a Concurrency Engine Data Set

If you are familiar with OpenVMS RMS, it might be helpful to know the internal file format of a concurrency engine data set. A concurrency engine data set is a relative format file. The record length is determined by the length of one observation, with a minimum length of 8 bytes. Because the data set is a relative format file, the maximum observation length of a concurrency engine data set is 32,767 bytes. The first portion of the file contains header records that provide information to the engine concerning the number of observations in the file, the number of variables, some positioning information to optimize access, the date and time, SAS software release, operating environment the data set was created on, and so on.

Following the header information is information pertaining to each individual variable in the file. A NAMESTR is stored for each variable in the data set. The NAMESTR includes the variable name, type, label, and size. Multiple NAMESTRs are stored in a single record, up to the maximum number of NAMESTRs that the record length accommodates.

After the NAMESTRs, the observations begin. There is always one observation per record. With one exception, the record length is the observation length. If the observation length is less than 8 bytes, the record length defaults to 8. If you delete a record in a relative format file, the record still exists in the file, but it is marked as deleted.

Note:   In a concurrency engine data set, a data set of deleted observations takes the same amount of disk space as a data set of valid observations. To remove the deleted observations, you must use the COPY procedure to copy the data set to a data set of another type, such as one created with the V9 engine.
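The following sketch shows one way to apply this technique: a single concurrency engine data set is copied to a library that uses the V9 engine. The directory names and the member name ORDERS are hypothetical.

```sas
/* Hypothetical directories. */
libname shr  concur '[mydir.share]';
libname base v9     '[mydir.base]';

/* Copy one member to a V9 engine library. */
proc copy in=shr out=base memtype=data;
   select orders;   /* hypothetical member name */
run;
```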


Notes on File-Sharing Capabilities

Although all record-level locking capabilities are provided through the use of OpenVMS RMS features, some file-sharing capabilities are provided by OpenVMS RMS and some are provided by the engine itself. The engine can correctly set the share options of a file when the file is opened for input or update, because SAS uses the name of the existing data set directly. However, output data sets are created with a temporary name and then renamed to the actual data set name after the data set is closed. This ensures the integrity of existing data sets of the same name in case an error occurs during creation of the new data set. Therefore, the engine must handle all file-sharing issues that disallow sharing of output files. This is done through the locking of specific filenames, which is why your directory must have a version limit of at least 2 to create concurrency engine data sets.
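If the directory version limit is too low, one option is to raise it from within the SAS session by issuing a DCL command through the X statement. The following is a sketch only. It assumes that the DCL SET DIRECTORY command with the /VERSION_LIMIT qualifier is available to your process and that you have the privilege to modify the directory. The directory name and the limit of 3 are hypothetical.

```sas
/* Hypothetical directory; requires privilege to modify the directory. */
/* The new limit applies to files created after the command is issued. */
x 'set directory/version_limit=3 [mydir.share]';

libname clib concur '[mydir.share]';
```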


Optimizing the Performance of the CONCUR Engine


Introduction to Optimizing Engine Performance

Engine performance is often a trade-off between various factors. You can optimize the performance of the CONCUR engine in your operating environment. By controlling the size and number of buffers, you can specify how SAS accesses your data. By specifying the data set options, you can control the level and amount of data that is accessed. The amount of disk space available for these operations also affects engine performance.


Controlling the Size and Number of Buffers

Depending on the type of record access your SAS application performs, you need to consider both the size of buffers (bucket size) and the number of buffers (multibuffer count). For complete details about specifying the size and number of buffers, see BKS= Data Set Option: OpenVMS and MBF= Data Set Option: OpenVMS.

The two extremes of record access are records that are accessed completely sequentially or completely randomly. For example, many SAS procedures typically access data sets sequentially, processing the records from first to last. On the other hand, you might access observations in a completely random order when using the FSEDIT procedure to edit or browse observations in a data set.

There are also cases in which records are accessed randomly but might be reaccessed frequently. One example is an application that uses a data set in which particular observations contain information that is referred to frequently. Again, using the FSEDIT procedure as an example, the data set can be designed in such a way that you must access the first observation followed by observation 200, then the first observation again followed by observation 300, and so on.

Finally, there are cases in which records are accessed randomly, but then adjacent records are likely to be accessed. An application can use the POINT= option in a SET statement to selectively input the first 10 observations out of every 100 observations.

Most often, an application accesses a data set by a combination of several of these methods. The following list gives suggestions for the number of buffers and bucket size you should use for each method:

completely sequential or random access

is most efficient with a single buffer. However, the bucket size differs:

random access

is more efficient with a smaller bucket size.

sequential access

is more efficient with a larger bucket size.

random access with reaccessed records

is most efficient with multiple buffers to keep the reaccessed records in the buffer cache. You should use a small bucket size in this instance.

random access with subsequent adjacent access

is most efficient with a single buffer. However, use a larger bucket size so that more records are stored in the buffer cache. This increases the probability that the required records have been read into memory with a single I/O.

If your program accesses the data set by several methods, you must find a compromise between the number of buffers and bucket sizes. This is what SAS attempts to do with the defaults, because the intended use of the file is unknown. Because you know the intended use of your CONCUR engine data sets, you can improve the CONCUR engine's performance by optimizing the buffer settings.
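As a rough sketch of these suggestions, the following statements create one data set intended mostly for sequential use and another intended for random access with frequently reaccessed records. The library path, data set names, and option values are hypothetical; suitable values depend on your observation length and access pattern.

```sas
/* Hypothetical library path. */
libname clib concur '[mydir.share]';

/* Mostly sequential access: larger bucket size, default buffer count. */
/* The bucket size is permanent, so BKS= is specified at creation.     */
data clib.history (bks=48);
   set work.history;   /* hypothetical input data set */
run;

/* Random access with reaccessed records: small bucket size. */
data clib.lookup (bks=4);
   set work.lookup;    /* hypothetical input data set */
run;

/* Multiple buffers keep the reaccessed records in the buffer cache. */
proc fsedit data=clib.lookup (mbf=8);
run;
```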


Using Portable Data Set Options

Several data set options are portable options that are available for all engines, but they are particularly useful with the concurrency engine.

CNTLLEV=

specifies the level of access (control level) to the data set, whether concurrent or exclusive. If you decide to create a concurrency engine data set to take advantage of its random access optimizations, but you do not need to provide for concurrent access at this time, you can use the CNTLLEV= data set option to further improve performance. By default, when using the concurrency engine, data sets that are opened for input allow shared READ access, data sets that are opened for output allow no sharing, and data sets that are opened for update allow shared UPDATE access. When sharing is allowed, record-level locking is enabled. When you do not need this feature, you can reduce the overhead of record locking by using CNTLLEV=MEM to disable the sharing.

The CNTLLEV= data set option takes one of two values:

MEM

specifies that the application requires exclusive access to the data set. Member-level control restricts any other application from accessing the data set until the step has completed.

REC

specifies that concurrent access is allowed and OpenVMS RMS record-level locking is enabled. This option entails more processing overhead and should be used only when necessary.

Each SAS procedure specifies a required control level to the engine, depending on the intended access of the observations. If you use CNTLLEV=REC and the SAS procedure requires member-level control to ensure the integrity of the data during processing, a warning is written to the SAS log indicating that inaccurate or unpredictable results can occur if the data set is being updated by another process during the analysis.

A common example of improving performance by overruling the CNTLLEV= default of the procedure is with the FSEDIT procedure, which uses a default of CNTLLEV=REC. A session using the FSEDIT procedure with a concurrency engine data set does not need to incur the overhead of record-level locking if concurrent access is not required. By using the data set option CNTLLEV=MEM, the application tells the engine to override the control level specification of the procedure because exclusive access at the member level is wanted. This disables record-level locking, decreases the overhead for processing the data set, and improves performance. In tests using the SET statement to input a concurrency engine data set, the step that used the CNTLLEV=MEM option ran in one-third of the CPU time of the same step that used the CNTLLEV=REC option.

For syntax and usage examples for the CNTLLEV= data set option, see CNTLLEV= Data Set Option: OpenVMS and SAS Language Reference: Dictionary.

FIRSTOBS= and OBS=

specify a beginning and ending observation to subset your data set.

The value of the FIRSTOBS= data set option specifies the first observation that should be included for processing in the SAS DATA step. Some engines have to read the records sequentially, discarding them until the requested observation is reached. Because a concurrency engine data set is a relative format file, the engine can directly access the beginning observation without having to first read any other observations in the file.

Using the OBS= data set option to specify the last observation that you want to process can improve performance by terminating the input of observations without having to read records until the end of the file is reached.

For more information about the FIRSTOBS= and OBS= data set options, see SAS Language Reference: Dictionary.
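The following sketch shows how these portable options might be specified. The library path and the data set CLIB.ORDERS are hypothetical, and the observation numbers are arbitrary.

```sas
/* Hypothetical library path. */
libname clib concur '[mydir.share]';

/* Exclusive access: CNTLLEV=MEM avoids record-level locking overhead. */
proc fsedit data=clib.orders (cntllev=mem);
run;

/* Process only observations 1001 through 2000. */
data work.subset;
   set clib.orders (firstobs=1001 obs=2000);
run;
```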


Using the POINT= Option

You can use the POINT= option in a SET statement to access contiguous ranges of observations. For example, with the POINT= option, the SAS program can read observations 10 through 50, then observations 90 through 150, and so on. Reading only the records that you actually need improves performance by decreasing the number of records that you must access. Because of the physical format of a concurrency engine data set, the engine can access the required records directly.
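A minimal sketch of the ranges mentioned above follows. It assumes a hypothetical data set CLIB.ORDERS that contains at least 150 observations; the variable name OBSNUM is also an arbitrary choice.

```sas
/* Hypothetical library path. */
libname clib concur '[mydir.share]';

data work.ranges;
   /* Read observations 10 through 50, then 90 through 150. */
   do obsnum = 10 to 50, 90 to 150;
      set clib.orders point=obsnum;
      output;
   end;
   stop;   /* POINT= access never reaches end of file, so stop explicitly */
run;
```

The POINT= variable is temporary and is not written to the output data set.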


Disk Space Usage

For most data sets, the disk space that is required for a CONCUR engine data set is comparable to the disk space that is required for a V9 engine data set. For data sets in which the number of observations is greater than the number of variables, concurrency engine data sets are usually smaller. Conversely, in a concurrency engine data set that has many variables and only a few observations, space might be wasted.

However, there is a file format for both uncompressed and compressed data sets that makes the V9 engine disk space usage more efficient.


Performance Comparisons

Performance is a main concern for many applications, so it is useful to know how the CONCUR engine compares to the V9 engine when various features of SAS are used:

Creating data sets

When you compare the creation and sequential input of data sets using each engine, the V9 engine tends to be faster when the data sets are small. However, as the size of the data set increases, the V9 and CONCUR engines are comparable in CPU time used. In all cases, the CONCUR engine incurs substantially fewer page faults than the V9 engine.

Accessing existing data sets

When you compare random access of an existing file using both engines, the concurrency engine is much faster. When you use a large bucket size in the concurrency engine, with a comparable page size in the V9 engine, the concurrency engine takes approximately one-half as much CPU time. When the bucket size and page size are small, the concurrency engine takes about one-third as much CPU time. Again, the concurrency engine incurs substantially fewer page faults.
