Using Properties Window Options to Optimize SQL Processing Performance

Problem

You want to set specific options in the SQL Properties pane or table properties panes that are located in the Designer window for an SQL Join transformation. These options are intended to improve the performance of SQL processes that are included in a SAS Data Integration Studio job.

Tasks

Bulk Load Tables

The fastest way to insert data into a relational database when using the SAS/ACCESS engine is to use the bulk-loading capabilities of the database. By default, the SAS/ACCESS engines load data into tables by preparing an SQL INSERT statement, executing the INSERT statement for each row, and issuing a COMMIT. If you specify BULKLOAD=YES as a DATA step or LIBNAME option, then the database load utility is invoked. This invocation enables you to bulk load rows of data as a single unit, which can significantly enhance performance. You can set the BULKLOAD option on the Bulkload to DBMS property pane for the target table. Some databases require that the table be empty in order to load records with their bulk-load utilities. Check your database documentation for these restrictions.
For smaller tables, the extra overhead of the bulk-load process might slow performance. For larger tables, the speed of the bulk-load process outweighs the overhead costs. Each SAS/ACCESS engine invokes a different load utility and uses different options. For information about using the bulk-load option for each SAS/ACCESS engine, see the online documentation for each engine.
The Use Bulkload for Uploading and Bulkload Options properties are available on the properties window for each table in a query. The Use Bulkload for Uploading property applies to the source table. It is a valid option only when the source table is being uploaded to the DBMS to create a homogeneous join. The Bulkload to DBMS property applies to target tables and turns bulk loading on and off. The Bulkload to DBMS property is not valid when the Target Table is Pass Through property on the SQL Properties pane is set to Yes.
The option to bulk load tables applies only to source tables that are participating in a heterogeneous join. Also, the user must be uploading the table to the DBMS where the join is performed.

Optimize the SELECT Statement

If you set the Select * property to Yes in the Select Properties pane, a Select * statement selects all columns in the order in which they are stored in a table and then runs when the query is submitted. If you set the Select * property to No and enter only the columns that you need for the query in the SELECT statement, you can improve performance. You can also enhance performance by carefully ordering columns so that non-character columns (such as numeric, DATE, and DATETIME) come first and character columns come last.

Set Buffering Options

You can adjust I/O buffering. Set the Buffer Size property to 128 KB to promote fast I/O performance (or 64 KB to enhance large, sequential processes). The Buffer Size property is available in the SQL Properties pane. Other buffering options are database-specific and are available in the properties pane for each of the individual tables in the query. For example, you can set the READBUFF option by entering a number in the Number of Rows in DBMS Read property in the properties pane, which buffers the database records read before passing them to SAS. INSERTBUFF is an example of another option that is available on some database management systems.
You should experiment with different settings for these options to find optimal performance for your query. These options apply to data sets. Therefore, do not specify them unless you know that explicit pass-through or implicit pass-through is not used on that portion of the query because they could actually slow performance. If these options are present in the query at all, they prevent implicit pass-through processing. If these options are present on the part that is being explicitly passed through, a database error occurs because the database cannot recognize these options.
For example, if the Target Table is Pass Through property on the SQL Properties pane is set to Yes, then using INSERTBUFF data set option on this target table causes an error on the database. If the Pass Through property in the SQL Properties pane is set to Yes and a number is specified in the Buffer Size property, then the database returns an error because it does not recognize this option in the query's FROM clause. To avoid the risk of preventing implicit pass-through, specify these options in the LIBNAME statement, which applies to all tables that use that LIBNAME and anything that accesses those tables. These buffering data set options are great performance boosters if the database records are all copied to SAS before the query runs in SAS (with no pass-through) because it buffers the I/O between the database and SAS into memory.

Use Threaded Reads

Threaded reads divide resource-intensive tasks into multiple independent units of work and execute those units simultaneously. SAS can create multiple threads, and a read connection is established between the DBMS and each SAS thread. The results from each of these threads, know as a result set, is partitioned across the connections, and rows are passed to SAS simultaneously (in parallel) across the connections. This approach improves performance.
To perform a threaded read, SAS first creates threads, which are standard operating system tasks that are controlled by SAS, within the SAS session. Next, SAS establishes a DBMS connection on each thread. SAS then causes the DBMS to partition the result set and reads one partition per thread. To cause the partitioning, SAS appends a WHERE clause to the SQL so that a single SQL statement becomes multiple SQL statements, one for each thread. The DBSLICE option specifies user-supplied WHERE clauses to partition a DBMS query for threaded reads. The DBSLICEPARM option controls the scope of DBMS threaded reads and the number of DBMS connections. You can enable threaded reads with the Parallel Processing with Threads property on the SQL Properties pane.

Write User-Written Code

The User Written property determines whether the query is user-written or generated. When the User Written property on the SQL Properties pane is set to Yes, you can edit the code on the Source tab, and the entire job is saved as user written. When the User Written property in the Where, Having, or Join Properties pane is set to Yes, you can then enter code directly into the field. Therefore, you can either write a new SQL query from scratch or modify a query that is generated when conditions are added to the top section of the Where/Having/Join tab. When User Written is set to No in any properties pane, the SQL field is read-only. It displays only the generated query. User-written code can be used as a last resort because the code cannot be regenerated from the metadata when there are changes. The User Written property is available in the SQL Properties pane and in the Where/Having/Join Properties pane.