Autopartitioning Scheme for ODBC

Overview

Autopartitioning for SAS/ACCESS Interface to ODBC is a modulo (MOD) function method. For general information about this feature, see Autopartitioning Techniques in SAS/ACCESS.

Autopartitioning Restrictions

SAS/ACCESS Interface to ODBC places additional restrictions on the columns that you can use for the partitioning column during the autopartitioning phase. Here is how columns are partitioned.
  • SQL_INTEGER, SQL_BIT, SQL_SMALLINT, and SQL_TINYINT columns are given preference.
  • You can use SQL_DECIMAL, SQL_DOUBLE, SQL_FLOAT, SQL_NUMERIC, and SQL_REAL columns for partitioning under these conditions:
    • The ODBC driver supports converting these types to SQL_INTEGER by using the INTEGER cast function.
    • The precision minus the scale of the column is greater than 0 but less than 10—that is, 0<(precision-scale)<10.
The exception to the above rule is for Oracle SQL_DECIMAL columns. As long as the scale of the SQL_DECIMAL column is 0, you can use the column as the partitioning column.

Nullable Columns

If you select a nullable column for autopartitioning, the OR<column-name>IS NULL SQL statement is appended at the end of the SQL code that is generated for the threaded Read. This ensures that any possible NULL values are returned in the result set. Also, if the column to be used for the partitioning is SQL_BIT, the number of threads are automatically changed to two, regardless of how the DBSLICEPARM= option is set.

Using WHERE Clauses

Autopartitioning does not select a column to be the partitioning column if it appears in the WHERE clause. For example, the following DATA step could not use a threaded Read to retrieve the data because all numeric columns in the table are in the WHERE clause:
data work.locemp;
  set trlib.MYEMPS;
  where EMPNUM<=30 and ISTENURE=0 and
        SALARY<=35000 and NUMCLASS>2;
run;

Using DBSLICEPARM=

SAS/ACCESS Interface to ODBC defaults to three threads when you use autopartitioning but do not specify a maximum number of threads in DBSLICEPARM= to use for the threaded Read.

Using DBSLICE=

You might achieve the best possible performance when using threaded Reads by specifying the DBSLICE= option for ODBC in your SAS operation. This is especially true if your DBMS supports multiple database partitions and provides a mechanism to allow connections to individual partitions. If your DBMS supports this concept, you can configure an ODBC data source for each partition and use the DBSLICE= clause to specify both the data source and the WHERE clause for each partition, as shown in this example:
proc print data=trilib.MYEMPS(DBSLICE=(DSN1="EMPNUM BETWEEN 1 AND 33"
DSN2="EMPNUM BETWEEN 34 AND 66"
DSN3="EMPNUM BETWEEN 67 AND 100"));
run;
See your DBMS or ODBC driver documentation for more information about configuring for multiple partition access. You can also see Configuring SQL Server Partitioned Views for Use with DBSLICE= for an example of configuring multiple partition access to a table.
Using the DATASOURCE= syntax is not required to use DBSLICE= with threaded Reads for the ODBC interface. The methods and examples described in DBSLICE= work well in cases where the table that you want to read is not stored in multiple partitions in your DBMS. These methods also give you flexibility in column selection. For example, if you know that the STATE column in your employee table only contains a few distinct values, you can customize your DBSLICE= clause accordingly:
datawork.locemp;
set trlib2.MYEMP(DBSLICE=("STATE='FL'" "STATE='GA'"
  "STATE='SC'" "STATE='VA'" "STATE='NC'"));
where EMPNUM<=30 and ISTENURE=0 and SALARY<=35000 and NUMCLASS>2;
run;

Configuring SQL Server Partitioned Views for Use with DBSLICE=

Microsoft SQL Server implements multiple partitioning by creating a global view across multiple instances of a Microsoft SQL Server database. For this example, assume that Microsoft SQL Server has been installed on three separate machines (SERVER1, SERVER2, SERVER3), and three ODBC data sources (SSPART1, SSPART2, SSPART3) have been configured against these servers. Also, a linked server definition for each of these servers has been defined. This example uses SAS to create the tables and associated views, but you can accomplish this outside of the SAS environment.
  1. Create a local SAS table to build the Microsoft SQL Server tables.
    data work.MYEMPS;
    format HIREDATE mmddyy 0. SALARY 9.2
       NUMCLASS 6. GENDER $1. STATE $2. EMPNUM 10.;
    do EMPNUM=1 to 100;
         morf=mod(EMPNUM,2)+1;
         if(morf eq 1) then
             GENDER='F';
         else
             GENDER='M';
         SALARY=(ranuni(0)*5000);
         HIREDATE=int(ranuni(13131)*3650);
         whatstate=int(EMPNUM/5);
         if(whatstate eq 1) then
             STATE='FL';
         if(whatstate eq 2) then
             STATE='GA';
         if(whatstate eq 3) then
             STATE='SC';
         if(whatstate eq 4) then
             STATE='VA';
         else
             state='NC';
         ISTENURE=mod(EMPNUM,2);
         NUMCLASS=int(EMPNUM/5)+2;
         output;
    end;
    run;
  2. Create a table on each of the SQL server databases with the same table structure, and insert one–third of the overall data into each table. These table definitions also use CHECK constraints to enforce the distribution of the data on each of the subtables of the target view.
    libname trlib odbc user=ssuser pw=sspwd dsn=sspart1;
    proc datasets library=trlib;
      delete MYEMPS1;run;
    run;
    data trlib.MYEMPS1(drop=morf whatstate
       DBTYPE=(HIREDATE="datetime" SALARY="numeric(8,2)"
       NUMCLASS="smallint" GENDER="char(1)" ISTENURE="bit" STATE="char(2)"
       EMPNUM="int NOT NULL Primary Key CHECK (EMPNUM BETWEEN 0 AND 33)"));
    set work.MYEMPS;
    where (EMPNUM BETWEEN 0 AND 33);
    run;
    
    libname trlib odbc user=ssuer pw=sspwd dsn=sspart2;
    proc datasets library=trlib;
      delete MYEMPS2;run;
    data trlib.MYEMPS2(drop=morf whatstate
       DBTYPE=(HIREDATE="datetime" SALARY="numeric(8,2)"
       NUMCLASS="smallint" GENDER="char(1)" ISTENURE="bit" STATE="char(2)"
       EMPNUM="int NOT NULL Primary Key CHECK (EMPNUM BETWEEN 34 AND 66)"));
    set work.MYEMPS;
    where (EMPNUM BETWEEN 34 AND 66);
    run;
    
    libname trlib odbc user=ssuer pw=sspwd dsn=sspart3;
    proc datasets library=trlib;
      delete MYEMPS3;run;
    data trlib.MYEMPS3(drop=morf whatstate
       DBTYPE=(HIREDATE="datetime" SALARY="numeric(8,2)"
       NUMCLASS="smallint" GENDER="char(1)" ISTENURE="bit" STATE="char(2)"
       EMPNUM="int NOT NULL Primary Key CHECK (EMPNUM BETWEEN 67 AND 100)"));
    set work.MYEMPS;
    where (EMPNUM BETWEEN 67 AND 100);
    run;
  3. Create a view using the UNION ALL construct on each Microsoft SQL Server instance that references the other two tables. This creates a global view that references the entire data set.
    /*SERVER1,SSPART1*/
    proc sql noerrorstop;
    connect to odbc (UID=ssuser PWD=sspwd DSN=SSPART1);
    execute (drop view MYEMPS) by odbc;
    execute (create view MYEMPS AS
             SELECT * FROM users.ssuser.MYEMPS1
             UNION ALL
             SELECT * FROM SERVER2.users.ssuser.MYEMPS2
             UNION ALL
             SELECT * FROM SERVER3.users.ssuser.MYEMPS3) by odbc;
    quit;
    
    /*SERVER2,SSPART2*/
    proc sql noerrorstop;
    connect to odbc (UID=ssuser PWD=sspwd DSN=SSPART2);
    execute (drop view MYEMPS) by odbc;
    execute (create view MYEMPS AS
             SELECT * FROM users.ssuser.MYEMPS2
             UNION ALL
             SELECT * FROM SERVER1.users.ssuser.MYEMPS1
             UNION ALL
             SELECT * FROM SERVER3.users.ssuser.MYEMPS3) by odbc;
    quit;
    
    /*SERVER3,SSPART3*/
    proc sql noerrorstop;
    connect to odbc (UID=ssuser PWD=sspwd DSN=SSPART3);
    execute (drop view MYEMPS) by odbc;
    execute (create view MYEMPS AS
             SELECT * FROM users.ssuser.MYEMPS3
             UNION ALL
             SELECT * FROM SERVER2.users.ssuser.MYEMPS2
             UNION ALL
             SELECT * FROM SERVER1.users.ssuser.MYEMPS1) by odbc;
    quit;
  4. Set up your SAS operation to perform the threaded Read. The DBSLICE= option contains the Microsoft SQL Server partitioning information.
    proc print data=trlib.MYEMPS(DBLICE=(sspart1="EMPNUM BETWEEN 1 AND 33"
    sspart2="EMPNUM BETWEEN 34 AND 66"
    sspart3="EMPNUM BETWEEN 67 AND 100"));
    run;
This configuration lets the ODBC interface access the data for the MYEMPS view directly from each subtable on the corresponding Microsoft SQL Server instance. The data is inserted directly into each subtable, but this process can also be accomplished by using the global view to divide up the data. For example, you can create empty tables and then create the view as seen in the example with the UNION ALL construct. You can then insert the data into the view MYEMPS. The CHECK constraints allow the Microsoft SQL Server query processor to determine which subtables should receive the data.
Other tuning options are available when you configure Microsoft SQL Server to use partitioned data. For more information, see the "Creating a Partitioned View" and "Using Partitioned Views" sections in Creating and Maintaining Databases (SQL Server 2000).