SAS/ACCESS Interface to Oracle |
Overview |
Without user-specified partitioning from the DBSLICE= option, SAS/ACCESS Interface to Oracle tries to use its own partitioning techniques. The technique it chooses depends on whether the table is physically partitioned on the Oracle server.
For general information about this feature, see Autopartitioning Techniques in SAS/ACCESS.
Note: Threaded reads for the Oracle engine on z/OS are not supported.
Partitioned Oracle Tables |
If you are working with a partitioned Oracle table, it is recommended that you let the Oracle engine partition the table for you. The Oracle engine gathers all partition information needed to perform a threaded read on the table.
A partitioned Oracle table is a good candidate for a threaded read because each partition in the table can be read in parallel with little contention for disk resources. If the Oracle engine determines that the table is partitioned, it makes the same number of connections to the server as there are partitions, as long as the maximum number of threads that are allowed is higher than the number of partitions. Each connection retrieves rows from a single partition.
If the value of the maximum number of allowed threads is less than the number of partitions on the table, a single connection reads multiple partitions. Each connection retrieves rows from a single partition or multiple partitions. However, you can use the DB_ONE_CONNECT_PER_THREAD= data set option so that there is only one connection per thread.
The following example shows how to do this. First, create the SALES table in Oracle.
CREATE TABLE SALES (acct_no NUMBER(5), acct_name CHAR(30), amount_of_sale NUMBER(6), qtr_no INTEGER) PARTITION BY RANGE (qtr_no) (PARTITION sales1 VALUES LESS THAN (2) TABLESPACE ts0, PARTITION sales2 VALUES LESS THAN (2) TABLESPACE ts1, PARTITION sales3 VALUES LESS THAN (2) TABLESPACE ts2, PARTITION sales4 VALUES LESS THAN (2) TABLESPACE ts3)
Performing a threaded read on this table with the following code, SAS makes four separate connections to the Oracle server and each connection reads from each partition. Turning on SASTRACE= shows the SQL that is generated for each connection.
libname x oracle user=testuser path=oraserver; data new; set x.SALES (DBSLICEPARM=(ALL,10)); run; ORACLE: SELECT "ACCT_NO","ACCT_NAME", "AMOUNT_OF_SALE", "QTR_NO" FROM SALES partition (SALES2) ORACLE: SELECT "ACCT_NO","ACCT_NAME", "AMOUNT_OF_SALE", "QTR_NO" FROM SALES partition (SALES3) ORACLE: SELECT "ACCT_NO","ACCT_NAME", "AMOUNT_OF_SALE", "QTR_NO" FROM SALES partition (SALES1) ORACLE: SELECT "ACCT_NO","ACCT_NAME", "AMOUNT_OF_SALE", "QTR_NO" FROM SALES partition (SALES4)
Using the following code, SAS instead makes two separate connections to the Oracle server and each connection reads from two different partitions.
libname x oracle user=testuser path=oraserver; data new; set x.SALES (DBSLICEPARM=(ALL,2)); run; ORACLE: SELECT "ACCT_NO","ACCT_NAME", "AMOUNT_OF_SALE", "QTR_NO" FROM SALES partition (SALES2) UNION ALL SELECT "ACCT_NO","ACCT_NAME", "AMOUNT_OF_SALE", "QTR_NO" FROM SALES partition (SALES3) ORACLE: SELECT "ACCT_NO","ACCT_NAME", "AMOUNT_OF_SALE", "QTR_NO" FROM SALES partition (SALES1) UNION ALL SELECT "ACCT_NO","ACCT_NAME", "AMOUNT_OF_SALE", "QTR_NO" FROM SALES partition (SALES4)
Using DB_ONE_CONNECT_PER_THREAD=NO, however, you can override the default behavior of limiting the number of connections to the number of threads. As shown below, SAS makes four separate connections to the Oracle server and each connection reads from each of the partition.
libname x oracle user=testuser path=oraserver; data new; set x.SALES (DBSLICEPARM=(ALL,2) DB_ONE_CONNECT_PER_THREAD=NO ); run; ORACLE: SELECT "ACCT_NO","ACCT_NAME", "AMOUNT_OF_SALE", "QTR_NO" FROM SALES partition (SALES2) ORACLE: SELECT "ACCT_NO","ACCT_NAME", "AMOUNT_OF_SALE", "QTR_NO" FROM SALES partition (SALES3) ORACLE: SELECT "ACCT_NO","ACCT_NAME", "AMOUNT_OF_SALE", "QTR_NO" FROM SALES partition (SALES1) ORACLE: SELECT "ACCT_NO","ACCT_NAME", "AMOUNT_OF_SALE", "QTR_NO" FROM SALES partition (SALES4)
The second parameter of the DBSLICEPARM= LIBNAME option determines the number of threads to read the table in parallel. The number of partitions on the table, the maximum number of allowed threads, and the value of DB_ONE_CONNECT_PER_THREAD= determine the number of connections to the Oracle server for retrieving rows from the table.
Nonpartitioned Oracle Tables |
If the table is not partitioned, and the DBSLICE= option is not specified, Oracle resorts to the MOD function (see Autopartitioning Techniques in SAS/ACCESS With this technique, the engine makes N connections, and each connection retrieves rows based on a WHERE clause as follows:
WHERE ABS(MOD(ModColumn,N))=R
ModColumn is a column in the table of type integer and is not used in any user specified WHERE clauses. (The engine selects this column. If you do not think this is the ideal partitioning column, you can use the DBSLICE= data set option to override this default behavior.)
N defaults to 2, and N can be overridden with the second parameter in the DBSLICEPARM= data set option.
The Oracle engine selects the ModColumn to use in this technique. Any numeric column with zero scale value can qualify as the ModColumn. However, if a primary key column is present, it is preferred over all others. Generally, values in the primary key column are in a serial order and yield an equal number of rows for each connection. This example illustrates the point:
create table employee (empno number(10) primary key, empname varchar2(20), hiredate date, salary number(8,2), gender char(1));
Performing a threaded read on this table causes Oracle to make two separate connections to the Oracle server. SAS tracing shows the SQL generated for each connection:
data new; set x.EMPLOYEE(DBSLICPARM=ALL); run; ORACLE: SELECT "EMPNO", "EMPNAME", "HIREDATE", "SALARY", "GENDER" FROM EMPLOYEE WHERE ABS(MOD("EMPNO",2))=0 ORACLE: SELECT "EMPNO", "EMPNAME", "HIREDATE", "SALARY", "GENDER" FROM EMPLOYEE WHERE ABS(MOD("EMPNO",2))=1
EMPNO, the primary key, is selected as the MOD column.
The success of MOD depends on the distribution of the values within the selected ModColumn and the value of N. Ideally, the rows are distributed evenly among the threads.
You can alter the N value by changing the second parameter of DBSLICEPARM= LIBNAME option.
Performance Summary |
There are times you might not see an improvement in performance with the MOD technique. It is possible that the engine might not be able to find a column that qualifies as a good MOD column. In these situations, you can explicitly specify DBSLICE= data set option to force a threaded read and improve performance.
It is a good policy to let the engine autopartition and intervene with DBSLICE= only when necessary.
Copyright © 2010 by SAS Institute Inc., Cary, NC, USA. All rights reserved.