SAS/ACCESS Interface to DB2 Under z/OS |
Assessing When to Tune Performance |
Among the factors that affect DB2 performance are the size of the table that is being accessed and the form of the SQL SELECT statement. If the table that is being accessed is larger than 10,000 rows (or 1,000 pages), you should evaluate all SAS programs that access the table directly. When you evaluate the programs, consider these questions.
Does the program need all columns that the SELECT statement retrieves?
Do the WHERE clause criteria retrieve only those rows that are needed for subsequent analysis?
Is the data going to be used by more than one procedure in one SAS session? If so, consider extracting the data into a SAS data file for SAS procedures to use instead of allowing the data to be accessed directly by each procedure.
Do the rows need to be in a particular order? If so, can an indexed column be used to order them? If there is no index column, is DB2 doing the sort?
Do the WHERE clause criteria allow DB2 to use the available indexes efficiently?
Can your DB2 system use parallel processing to access the data more quickly?
In addition, the DB2 Resource Limit Facility limits execution time of dynamic SQL statements. If the time limit is exceeded, the dynamic statement is terminated and the SQL code -905 is returned. This list describes several situations in which the RLF could stop a user from consuming large quantities of CPU time.
An extensive search by the FSEDIT, FSVIEW, or FSBROWSE procedures or an SCL application.
An extensive load into a DB2 table. In this case, you can break up the load by lowering the commit frequency, or you can use the bulk-load facility through SAS/ACCESS Interface to DB2 under z/OS.
Methods for Improving Performance |
You can do several things in your SAS application to improve DB2 engine performance.
Set the SAS system option DB2DBUG. This option prints to the SAS log the dynamic SQL that is generated by the DB2 engine and all other SQL that is executed by the DB2 engine. You can then verify that all WHERE clauses, PROC SQL joins, and ORDER BY clauses are being passed to DB2. This option is for debugging purposes and should not be set once the SAS application is used in production. The NODB2DBUG option disables this behavior.
Verify that all SAS procedures and DATA steps that read DB2 data share connections where possible. You can do this by using one libref to reference all SAS applications that read DB2 data and by accepting the default value of SHAREDREAD for the CONNECTION= option.
If your DB2 subsystem supports parallel processing, you can assign a value to the CURRENT DEGREE special register. Setting this register might enable your SQL query to use parallel operations. You can set the special register by using the LIBNAME options DBCONINIT= or DBLIBINIT= with the SET statement as shown in this example:
libname mydb2 db2 dbconinit="SET CURRENT DEGREE='ANY'";
Use the view descriptor WHERE clause or the DBCONDITION= option to pass WHERE clauses to DB2. You can also use these methods to pass sort operations to DB2 with the ORDER BY clause instead of performing a sort within SAS.
If you are using a SAS application or an SCL application that reads the DB2 data twice, let the DB2 engine spool the DB2 data. This happens by default because the default value for the SPOOL= option is YES.
The spool file is read both when the application rereads the DB2 data and when the application scrolls forward or backward through the data. If you do not use spooling, and you need to scroll backward through the DB2 table, the DB2 engine must start reading from the beginning of the data and read down to the row that you want to scroll back to.
Use the SQL procedure to pass joins to DB2 instead of using the MATCH MERGE capability (that is, merging with a BY statement) of the DATA step.
Use the DBKEY= option when you are doing SAS processing that involves the KEY= option. When you use the DBKEY= option, the DB2 engine generates a WHERE clause that uses parameter markers. During the execution of the application, the values for the key are substituted into the parameter markers in the WHERE clause.
If you do not use the DBKEY= option, the entire table is retrieved into SAS, and the join is performed in SAS.
Consider using stored procedures when they can improve performance in client/server applications by reducing network traffic. You can execute a stored procedure by using the DBCONINIT= or DBLIBINIT= LIBNAME options.
Use the READBUFF= LIBNAME option to retrieve records in blocks instead of one at a time.
Optimizing Your Connections |
Since SAS 7, the DB2 engine supports more than one connection to DB2 per SAS session. This is an improvement over SAS 6 in a number of ways, especially in a server environment. One advantage is being able to separate tasks that fetch rows from a cursor from tasks that must issue commits. This separation eliminates having to resynchronize the cursor, prepare the statement, and fetch rows until you are positioned back on the row you were on. It also enables tasks that must issue commits to eliminate locking contention to do so sooner because they are not delayed until after cursors are closed to prevent having to resynchronize. In general, tables that are opened for input fetch from cursors do not issue commits, while update openings might, and output openings do, issue commits.
You can control how the DB2 engine uses connections by using the CONNECTION= option in the LIBNAME statement. At one extreme is CONNECTION=UNIQUE, which causes each table access, whether it is for input, update, or output, to create and use its own connection. Conversely, CONNECTION=SHARED means that only one connection is made, and that input, update, and output accesses all share that connection.
The default value for the CONNECTION= option is CONNECTION=SHAREDREAD, which means that tables opened for input share one connection, while update and output openings get their own connections. CONNECTION=SHAREDREAD allows for the best separation between tasks that fetch from cursors and tasks that must issue commits, eliminating the resynchronizing of cursors.
The values GLOBAL and GLOBALREAD perform similarly to SHARED and SHAREDREAD. The difference is that you can share the given connection across any of the librefs that you specify as GLOBAL or GLOBALREAD.
Although the default value of CONNECTION=SHAREDREAD is usually optimal, at times another value might be better. If you must use multiple librefs, you might want to set them each as GLOBALREAD. In this case, you have one connection for all of your input openings, regardless of which libref you use, as opposed to one connection per libref for input openings. In a single-user environment (as opposed to a server session), you might know that you do not have multiple openings occurring at the same time. In this case, you might want to use SHARED--or GLOBAL for multiple librefs. By using such a setting, you eliminate the overhead of creating separate connections for input, update, and output transactions. If you have only one opening at a time, you eliminate the problem of resynchronizing input cursors if a commit occurs.
Another reason for using SHARED or GLOBAL is the case of opening a table for output while opening another table within the same database for input. This can result in a -911 deadlock situation unless both opens occur in the same connection.
As explained in DB2 Under z/OS Information for the Database Administrator, the first connection to DB2 is made from the main SAS task. Subsequent connections are made from corresponding subtasks, which the DB2 engine attaches; DB2 allows only one connection per task. Due to the system overhead of intertask communication, the connection established from the main SAS task is a faster connection in terms of CPU time. Because this is true, you can expect better performance (less CPU time) if you use the first connection for these operations when you read or write large numbers of rows. If you read only rows, SHAREDREAD or GLOBALREAD can share the first connection. However, if you are both reading and writing rows (input and output opens), you can use CONNECTION=UNIQUE to make each opening use the first connection. UNIQUE causes each opening to have its own connection. If you have only one opening at a time, and some are input while others are output (for large amounts of data), the performance benefit of using the main SAS task connection far outweighs the overhead of establishing a new connection for each opening.
The utility connection is another type of connection that the DB2 engine uses, which the use does not control. This connection is a separate connection that can access the system catalog and issue commits to release locks. Utility procedures such as DATASETS and CONTENTS can cause this connection to be created, although other actions necessitate it as well. There is one connection of this type per libref, but it is not created until it is needed. If you have critical steps that must use the main SAS task connection for performance reasons, refrain from using the DEFER=YES option in the LIBNAME statement. It is possible that the utility connection can be established from that task, causing the connection you use for your opening to be from a slower subtask.
In summary, no one value works best for the CONNECTION= option in all possible situations. You might need to try different values and arrange your SAS programs in different ways to obtain the best performance possible.
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.