DISTRIBUTED_BY= Data Set Option

Uses one or multiple columns to distribute table rows across database segments.
Valid in: DATA and PROC steps (when accessing DBMS data using SAS/ACCESS software)
Default: DISTRIBUTED RANDOMLY
Data source: Greenplum

Syntax

DISTRIBUTED_BY='column-1 <…,column-n>' | DISTRIBUTED RANDOMLY

Syntax Description

column-1 <…,column-n>
specifies one or more DBMS column names, separated by commas, that form the distribution key.
DISTRIBUTED RANDOMLY
specifies that the Greenplum database distributes table rows across database segments without using a distribution key. This is known as round-robin distribution.

Details

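For uniform distribution, that is, for table records to be stored evenly across the segments (machines) that are part of the database configuration, choose a distribution key whose values are as unique as possible. A key with many repeated values places rows that share a value on the same segment, which can leave some segments holding far more data than others.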
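
When no single column is unique enough, a key built from several columns may distribute rows more evenly. The following is a minimal sketch, not a tested recipe. It assumes the same connection settings and the same 'distributed by (...)' value form that this section's example uses. The ORDERS table and the store_id and order_id columns are hypothetical names chosen for illustration.

```sas
/* Hypothetical table and column names: ORDERS, store_id, order_id. */
libname x sasiogpl user=myuser password=mypwd dsn=Greenplum;

/* store_id alone repeats for every order from a store;       */
/* pairing it with order_id gives a more unique key.           */
data x.orders (dbtype=(store_id=int order_id=int amt=int)
               distributed_by='distributed by (store_id, order_id)');
   store_id = 7;
   order_id = 1001;
   amt = 250;
run;
```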

Example: Create a Table By Specifying a Distribution Key

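/* Assign a libref to the Greenplum database through SAS/ACCESS. */
libname x sasiogpl user=myuser password=mypwd dsn=Greenplum;

/* Create SALES with explicit column types, distributed on ID. */
data x.sales (dbtype=(id=int qty=int amt=int) distributed_by='distributed by (id)');
   id = 1;
   qty = 100;
   sales_date = '27Aug2009'd;
   amt = 20000;
run;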
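This DATA step creates the SALES table with the following statement. Because no DBTYPE= entry is given for SALES_DATE, the column is created as DOUBLE PRECISION.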
CREATE TABLE SALES
(id int,
 qty int,
 sales_date double precision,
 amt int
) distributed by (id)
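
To check how evenly a table's rows are spread, one option is to count rows per segment. The sketch below assumes that libref X from the example is still assigned, that the PROC SQL CONNECT USING statement is available for this engine, and that the table exposes the Greenplum gp_segment_id system column. Treat it as a starting point and not as a verified procedure.

```sas
proc sql;
   /* Reuse the connection settings of libref X (assumed to be assigned). */
   connect using x;

   /* The inner query runs in Greenplum and returns one row per segment. */
   select * from connection to x
      (select gp_segment_id, count(*) as row_count
         from sales
        group by gp_segment_id
        order by gp_segment_id);

   disconnect from x;
quit;
```

Row counts that are roughly equal across segments suggest that the distribution key is spreading the data evenly. A few segments with much larger counts suggest that the key is not unique enough.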