You might achieve the
best possible performance when using threaded Reads by specifying
the DBSLICE= data set option for Greenplum in your SAS operation. This
is especially true if your Greenplum data is evenly distributed across
multiple partitions in a Greenplum database system.
When you create a Greenplum
table using the Greenplum database partition model, you can specify
the partitioning key that you want to use by appending the
PARTITION
BY<column-name>
clause
to your CREATE TABLE statement. Here is how you can accomplish this
by using the DBCREATE_TABLE_OPTS=LIBNAME option within the SAS environment.
/* Points to a triple-node server. */
libname mylib sasiogpl user=myuser pw=mypwd db=greenplum;
DBCREATE_TABLE_OPTS='PARTITION BY(EMPNUM);
proc datasets library=mylib;
delete MYEMPS1;run;
data mylib.myemps(drop=morf whatstate
DBTYPE=(HIREDATE="date" SALARY="numeric(8,2)"
NUMCLASS="smallint" GENDER="char(1)" ISTENURE="numeric(1)" STATE="char(2)"
EMPNUM="int NOT NULL Primary Key"));
format HIREDATE mmddyy10.;
do EMPNUM=1 to 100;
morf=mod(EMPNUM,2)+1;
if(morf eq 1) then
GENDER='F';
else
GENDER='M';
SALARY=(ranuni(0)*5000);
HIREDATE=int(ranuni(13131)*3650);
whatstate=int(EMPNUM/5);
if(whatstate eq 1) then
STATE='FL';
if(whatstate eq 2) then
STATE='GA';
if(whatstate eq 3) then
STATE='SC';
if(whatstate eq 4) then
STATE='VA';
else
state='NC';
ISTENURE=mod(EMPNUM,2);
NUMCLASS=int(EMPNUM/5)+2;
output;
end;
run;
After the MYEMPS table
is created on this three-node database, a third of the rows reside
on each of the three nodes.
Using DBSLICE= works
well when the table that you want to read is not stored in multiple
partitions. It gives you flexibility in column selection. For example,
if you know that the STATE column in your employee table contains
only a few distinct values, you can modify your DBSLICE= option accordingly.
data work.locemp;
set mylib.MYEMPS (DBSLICE=("STATE='GA'"
"STATE='SC'" "STATE='VA'" "STATE='NC'"));
where EMPNUM<=30 and ISTENURE=0 and SALARY<=35000 and NUMCLASS>2;
run;