Autopartitioning Scheme for Greenplum

Overview

Autopartitioning for SAS/ACCESS Interface to Greenplum is a modulo (MOD) function method. For general information about this feature, see Autopartitioning Techniques in SAS/ACCESS.

Autopartitioning Restrictions

SAS/ACCESS Interface to Greenplum places additional restrictions on the columns that you can use for the partitioning column during the autopartitioning phase. Here is how columns are partitioned.
  • INTEGER and SMALLINT columns are given preference.
  • You can use other numeric columns for partitioning if the precision minus the scale of the column is greater than 0 but less than 10—namely, 0<(precision-scale)<10.

Nullable Columns

If you select a nullable column for autopartitioning, the OR<column-name>IS NULL SQL statement is appended at the end of the SQL code that is generated for the threaded Read. This ensures that any possible NULL values are returned in the result set.

Using WHERE Clauses

Autopartitioning does not select a column to be the partitioning column if it appears in a SAS WHERE clause. For example, this DATA step cannot use a threaded Read to retrieve the data because all numeric columns in the table are in the WHERE clause:
data work.locemp;
set trlib.MYEMPS;
where EMPNUM<=30 and ISTENURE=0 and
 SALARY<=35000 and NUMCLASS>2;
run;

Using DBSLICEPARM=

Although SAS/ACCESS Interface to Greenplum defaults to three threads when you use autopartitioning, do not specify a maximum number of threads for the threaded Read in DBSLICEPARM= LIBNAME option.

Using DBSLICE=

You might achieve the best possible performance when using threaded Reads by specifying the DBSLICE= data set option for Greenplum in your SAS operation. This is especially true if your Greenplum data is evenly distributed across multiple partitions in a Greenplum database system.
When you create a Greenplum table using the Greenplum database partition model, you can specify the partitioning key that you want to use by appending the PARTITION BY<column-name> clause to your CREATE TABLE statement. Here is how you can accomplish this by using the DBCREATE_TABLE_OPTS=LIBNAME option within the SAS environment.
/* Points to a triple-node server. */
libname mylib sasiogpl user=myuser pw=mypwd db=greenplum;
DBCREATE_TABLE_OPTS='PARTITION BY(EMPNUM);

proc datasets library=mylib;
  delete MYEMPS1;run;

data mylib.myemps(drop=morf whatstate
   DBTYPE=(HIREDATE="date" SALARY="numeric(8,2)"
   NUMCLASS="smallint" GENDER="char(1)" ISTENURE="numeric(1)" STATE="char(2)"
   EMPNUM="int NOT NULL Primary Key"));
format HIREDATE mmddyy10.;
do EMPNUM=1 to 100;
     morf=mod(EMPNUM,2)+1;
     if(morf eq 1) then
         GENDER='F';
     else
         GENDER='M';
     SALARY=(ranuni(0)*5000);
     HIREDATE=int(ranuni(13131)*3650);
     whatstate=int(EMPNUM/5);
     if(whatstate eq 1) then
         STATE='FL';
     if(whatstate eq 2) then
         STATE='GA';
     if(whatstate eq 3) then
         STATE='SC';
     if(whatstate eq 4) then
         STATE='VA';
     else
         state='NC';
     ISTENURE=mod(EMPNUM,2);
     NUMCLASS=int(EMPNUM/5)+2;
     output;
end;
run;
After the MYEMPS table is created on this three-node database, a third of the rows reside on each of the three nodes.
Using DBSLICE= works well when the table that you want to read is not stored in multiple partitions. It gives you flexibility in column selection. For example, if you know that the STATE column in your employee table contains only a few distinct values, you can modify your DBSLICE= option accordingly.
data work.locemp;
set mylib.MYEMPS (DBSLICE=("STATE='GA'"
    "STATE='SC'" "STATE='VA'" "STATE='NC'"));
where EMPNUM<=30 and ISTENURE=0 and SALARY<=35000 and NUMCLASS>2;
run;