Autopartitioning Scheme for DB2 under UNIX and PC Hosts

Overview

Autopartitioning for SAS/ACCESS Interface to DB2 for UNIX and PC Hosts is a modulo (MOD) function method. For general information about this feature, see Autopartitioning Techniques in SAS/ACCESS.

Autopartitioning Restrictions

SAS/ACCESS Interface to DB2 under UNIX and PC Hosts places additional restrictions on the columns that you can use for the partitioning column during the autopartitioning phase. Here is how columns are partitioned.
  • INTEGER and SMALLINT columns are given preference.
  • You can use other DB2 numeric columns for partitioning as long as the precision minus the scale of the column is between 0 and 10—namely, 0<(precision-scale)<10.

Nullable Columns

If you select a nullable column for autopartitioning, the OR<column-name> IS NULL SQL statement is appended at the end of the SQL code that is generated for the threaded Reads. This ensures that any possible NULL values are returned in the result set.

Using WHERE Clauses

Autopartitioning does not select a column to be the partitioning column if it appears in a SAS WHERE clause. For example, this DATA step cannot use a threaded Read to retrieve data because all numeric columns in the table (see the table definition in Using DBSLICE=) are in the WHERE clause:
data work.locemp;
set trlib.MYEMPS;
where EMPNUM<=30 and ISTENURE=0 and
 SALARY<=35000 and NUMCLASS>2;
run;

Using DBSLICEPARM=

Although SAS/ACCESS Interface to DB2 under UNIX and PC Hosts defaults to three threads when you use autopartitioning, do not specify a maximum number of threads for the threaded Read in the DBSLICEPARM= LIBNAME option.

Using DBSLICE=

You might achieve the best possible performance when using threaded Reads by specifying the DBSLICE= data set option for DB2 in your SAS operation. This is especially true if your DB2 data is evenly distributed across multiple partitions in a DB2 Enterprise Extended Edition (EEE) database system. When you create a DB2 table under the DB2 EEE model, you can specify the partitioning key that you want to use by appending the clause PARTITIONING KEY(column-name) to your CREATE TABLE statement. Here is how you can accomplish this by using the LIBNAME option, DBCREATE_TABLE_OPTS=, within the SAS environment.
/*points to a triple node server*/
libname trlib2 db2 user=db2user pw="db2pwd" datasrc=sample3c
DBCREATE_TABLE_OPTS='PARTITIONING KEY(EMPNUM);

proc datasets library=trlib;
  delete MYEMPS1;run;

data trlib.myemps(drop=morf whatstate
   DBTYPE=(HIREDATE="date" SALARY="numeric(8,2)"
   NUMCLASS="smallint" GENDER="char(1)" ISTENURE="numeric(1)" STATE="char(2)"
   EMPNUM="int NOT NULL Primary Key"));
format HIREDATE mmddyy10.;
do EMPNUM=1 to 100;
     morf=mod(EMPNUM,2)+1;
     if(morf eq 1) then
         GENDER='F';
     else
         GENDER='M';
     SALARY=(ranuni(0)*5000);
     HIREDATE=int(ranuni(13131)*3650);
     whatstate=int(EMPNUM/5);
     if(whatstate eq 1) then
         STATE='FL';
     if(whatstate eq 2) then
         STATE='GA';
     if(whatstate eq 3) then
         STATE='SC';
     if(whatstate eq 4) then
         STATE='VA';
     else
         state='NC';
     ISTENURE=mod(EMPNUM,2);
     NUMCLASS=int(EMPNUM/5)+2;
     output;
end;
run;
After the table MYEMPS is created on this three-node database, a third of the rows reside on each of the three nodes.
Optimization of the threaded Read against this partitioned table depends on the location of the DB2 partitions. If the DB2 partitions are on the same machine, you can use DBSLICE= with the DB2 NODENUMBER function in the WHERE clause:
proc print data=trlib2.MYEMPS(DBSLICE=("NODENUMBER(EMPNO)=0"
   "NODENUMBER(EMPNO)=1" "NODENUMBER(EMPNO)=2"));
run;
If the DB2 partitions reside on different physical machines, you can usually obtain the best results by using the DBSLICE= option with the SERVER= syntax in addition to the DB2 NODENUMBER function in the WHERE clause.
In the next example, DBSLICE= contains specific partitioning information for DB2. Also, Sample3a, Sample3b, and Sample3c are DB2 database aliases that point to individual DB2 EEE database nodes that exist on separate physical machines. For more information about the configuration of these nodes, see Configuring DB2 EEE Nodes on Physically Partitioned Databases.
proc print data=trlib2.MYEMPS(DBSLICE=(sample3a="NODENUMBER(EMPNO)=0"
   samble3b="NODENUMBER(EMPNO)=1" sample3c="NODENUMBER(EMPNO)=2"));
run;
NODENUMBER is not required to use threaded Reads for SAS/ACCESS Interface to DB2 under UNIX and PC Hosts. The methods and examples described in DBSLICE= work well in cases where the table that you want to read is not stored in multiple partitions to DB2. These methods also give you full control over which column is used to execute the threaded Read. For example, if the STATE column in your employee table contains only a few distinct values, you can modify your DBSLICE= clause accordingly:
data work.locemp;
set trlib2.MYEMPS (DBSLICE=("STATE='GA'"
    "STATE='SC'" "STATE='VA'" "STATE='NC'"));
where EMPNUM<=30 and ISTENURE=0 and SALARY<=35000 and NUMCLASS>2;
run;

Configuring DB2 EEE Nodes on Physically Partitioned Databases

Assuming that the database SAMPLE is partitioned across three different machines, you can create a database alias for it at each node from the DB2 Command Line Processor by issuing these commands:
catalog tcpip node node1 remote <hostname> server 50000
catalog tcpip node node2 remote <hostname> server 50000
catalog tcpip node node3 remote <hostname> server 50000
catalog database sample as samplea at node node1
catalog database sample as sampleb at node node2
catalog database sample as samplec at node node3
This enables SAS/ACCESS Interface to DB2 to access the data for the SAMPLE table directly from each node. For more information about configuring DB2 EEE to use multiple physical partitions, see the DB2 Administrator's Guide.