Data Set Options for Relational Databases

DISTRIBUTED_BY= Data Set Option

Uses one or more columns to distribute table rows across database segments.

Default value:	RANDOMLY DISTRIBUTED
Valid in:	DATA and PROC steps (when accessing DBMS data using SAS/ACCESS software)
DBMS support:	Greenplum

Syntax

DISTRIBUTED_BY='column-1 <... ,column-n>' | RANDOMLY DISTRIBUTED

Syntax Description

column-1 <... ,column-n>: specifies one or more DBMS column names. The Greenplum database uses this column or set of columns as the distribution key to distribute table rows across database segments.
DISTRIBUTED RANDOMLY: distributes table rows across database segments without using a distribution key. This is known as round-robin distribution.

Details

For uniform distribution, meaning that table records are stored evenly across the segments (machines) that are part of the database configuration, the values of the distribution key should be as unique as possible.

Example

This example shows how to create a table by specifying a distribution key.

libname x sasiogpl user=myuser password=mypwd dsn=Greenplum;

data x.sales (dbtype=(id=int qty=int amt=int) distributed_by='distributed by (id)');
          id = 1;
          qty = 100;
          sales_date = '27Aug2009'd;
          amt = 20000;
run;

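This DATA step creates the SALES table with the following statement. Because SALES_DATE has no entry in DBTYPE=, it is created with the default numeric type, double precision.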

CREATE TABLE SALES 
(id int,
 qty int,
 sales_date double precision,
 amt int
) distributed by (id)
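
Because the syntax allows more than one column, the same pattern can be sketched with a composite distribution key. This is a minimal sketch, not a verified example: the table name SALES_MULTI and the column REGION_ID are hypothetical, and it assumes that the option value is written in the same form as in the example above.

```sas
libname x sasiogpl user=myuser password=mypwd dsn=Greenplum;

/* Hypothetical table: distribute rows on the (id, region_id) pair. */
/* Assumes the option text takes the same form as the example above. */
data x.sales_multi (dbtype=(id=int region_id=int qty=int)
                    distributed_by='distributed by (id, region_id)');
          id = 1;
          region_id = 10;
          qty = 100;
run;
```

A composite key can help when no single column is unique enough on its own to spread rows evenly across segments.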
