It might be beneficial
to rebalance the rows of a table if the data access patterns do not
take advantage of partitioning or if the HDFS block distribution becomes
uneven.
The DISTRIBUTIONINFO statement displays the number
of rows from Table1 on each machine in the cluster.
The DROPTABLE statement drops the active table,
Table1.
The BALANCE statement rebalanced Table1 into a temporary
table. The TABLE statement is used with the &_TEMPLAST_ macro
variable to access the temporary table.
The PROMOTE statement changes the temporary table
into a regular in-memory table with the original table name, Table1.
After setting Table1 as the active table with
the TABLE statement, the DISTRIBUTIONINFO statement displays the nearly
homogeneous distribution of rows.
The SAVE statement can be used to save the table back
to HDFS with the homogeneous block distribution.
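The statements described above can be combined into a single PROC IMSTAT
session. The following is a minimal sketch, not the definitive program: the
libref (example), the host, port, and tag values, and the HDFS path are
hypothetical placeholders. It also assumes that BALANCE runs while Table1 is
still the active table, so BALANCE is placed before DROPTABLE.

```sas
/* Hypothetical connection: libref, host, port, and tag are placeholders. */
libname example sasiola host="grid001.example.com" port=10010 tag='hps';

proc imstat;
    table example.table1;          /* make Table1 the active table        */
    distributioninfo;              /* rows per machine before rebalancing */
run;

    balance;                       /* rebalance into a temporary table    */
    droptable;                     /* drop the active table, Table1       */
    table example.&_templast_;     /* make the temporary table active     */
    promote table1;                /* regular table named Table1          */
run;

    table example.table1;          /* make the promoted table active      */
    distributioninfo;              /* rows per machine after rebalancing  */
    save path="/hps/table1" replace;   /* optional; path is a placeholder */
quit;
```

Running DISTRIBUTIONINFO both before and after the rebalance makes it easy
to confirm that the row counts per machine have evened out before the table
is saved.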
Output
The following output
shows the partial display for the first DISTRIBUTIONINFO statement.
One machine has zero rows and another machine has approximately twice
the number of rows of the other machines.
Uneven Row Distribution
The following output
shows the homogeneous distribution of rows after the BALANCE statement
is used.