GP_DISTRIBUTED_BY= Table Option

Specifies the distribution key for the table being created.

Category: Table Control
Alias: DISTRIBUTED_BY=
Data source: Greenplum, HAWQ

Syntax

Arguments

DISTRIBUTED BY (column [, ...column])

specifies one or more DBMS column names to use as the distribution key.

DISTRIBUTED RANDOMLY

specifies that table rows are distributed across database segments without using a column or set of columns as the distribution key. This is known as round-robin distribution.

Details

DISTRIBUTED BY uses hash distribution with one or more columns declared as the distribution key. For the most even data distribution, the distribution key should be the primary key of the table or a unique column (or set of columns). If that is not possible, you can choose DISTRIBUTED RANDOMLY, which sends the data round-robin to the segment instances. If no value is supplied, hash distribution is used, with the primary key (if the table has one) or the first eligible column of the table as the distribution key.

DISTRIBUTED_BY can be submitted as shown above or within the DBCREATE_TABLE_OPTS= table option. Here is an example of how it is specified in DBCREATE_TABLE_OPTS=:

dbcreate_table_opts='distributed by ("b")'
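
The advice about choosing a unique column as the distribution key can be illustrated with a small simulation. The Python sketch below is only a model and does not reproduce the hashing that Greenplum actually performs: the segment count, the SHA-256-based hash, and the sample rows (with hypothetical columns id and region) are all assumptions made for illustration.

```python
import hashlib
from collections import Counter

NUM_SEGMENTS = 4  # assumed number of segment instances (hypothetical)


def segment_for(value):
    """Map a key value to a segment using a stand-in hash (not Greenplum's)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SEGMENTS


def hash_distribute(rows, key):
    """Count rows per segment when rows are placed by hashing the key column."""
    return Counter(segment_for(row[key]) for row in rows)


def round_robin_distribute(rows):
    """Count rows per segment when rows are dealt out in turn."""
    return Counter(i % NUM_SEGMENTS for i in range(len(rows)))


# Hypothetical sample data: "id" is unique, "region" has only two values.
rows = [{"id": i, "region": "US" if i % 10 == 0 else "EU"} for i in range(1000)]

by_id = hash_distribute(rows, "id")          # unique key
by_region = hash_distribute(rows, "region")  # low-cardinality key
rr = round_robin_distribute(rows)            # no key at all

print("hash on id:    ", sorted(by_id.items()))
print("hash on region:", sorted(by_region.items()))
print("round-robin:   ", sorted(rr.items()))
```

In this model the unique id column spreads rows across all segments fairly evenly, the two-valued region column places at least 900 of the 1000 rows on a single segment, and round-robin gives each segment exactly 250 rows. That skew is the reason to fall back to DISTRIBUTED RANDOMLY when no unique or near-unique column is available.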

Last updated: February 23, 2017