IMSTAT Procedure

BALANCE Statement

The BALANCE statement creates a temporary table from the active table and re-balances it so that the rows are distributed as evenly as possible across the worker nodes. After re-balancing, the row counts on any two worker nodes differ by at most one row.
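The "within one row" guarantee follows from simple integer division. The following Python sketch is not SAS code and does not reproduce the server's implementation. It only illustrates the arithmetic. The function name `balanced_row_counts` is hypothetical, and the choice to give the leftover rows to the first nodes is an assumption made for the illustration.

```python
def balanced_row_counts(total_rows: int, num_nodes: int) -> list[int]:
    """Return per-node row counts that differ by at most one row."""
    if total_rows < 0:
        raise ValueError("total_rows must not be negative")
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    # Every node gets `base` rows; `extra` rows are left over.
    base, extra = divmod(total_rows, num_nodes)
    # Assumption for illustration: the first `extra` nodes take one more row.
    return [base + 1 if i < extra else base for i in range(num_nodes)]


counts = balanced_row_counts(1_000_003, 8)
print(counts)                     # per-node row counts
print(max(counts) - min(counts))  # largest difference between two nodes
```

When the row count is an exact multiple of the number of nodes, every node holds the same number of rows. Otherwise the largest and smallest counts differ by exactly one.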

Rebalancing a Table

Syntax

BALANCE;

Without Arguments

Re-balancing removes any observations that are marked as deleted or marked for purging in the active table. If a WHERE clause is in effect, it is applied when the data are re-balanced, so only the rows that satisfy it are carried into the temporary table.
Re-balancing is useful when the data distribution for a table has become uneven because of block movement within the Hadoop Distributed File System (HDFS). This can occur when nodes fail in Hadoop or when Hadoop processes have exited on some nodes. Re-balancing is also useful when a partitioned table is distributed unevenly across the worker nodes because the partitions differ in size. An uneven distribution can affect the performance of all actions that run in the LASR Analytic Server, because the nodes with the most records typically determine the overall performance.
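The effect of an uneven distribution can be estimated with a rough calculation. The following Python sketch is not SAS code. It assumes, as a simplification, that the time for an action grows in proportion to the row count on the busiest node. The function name `skew_ratio` and the row counts are hypothetical.

```python
def skew_ratio(rows_per_node: list[int]) -> float:
    """Ratio of the busiest node's row count to the mean row count.

    A value of 1.0 means a perfectly even distribution. Under the
    simplifying assumption that the busiest node sets the pace, a value
    of 1.6 suggests an action takes roughly 1.6 times as long as it
    would on a balanced table.
    """
    total = sum(rows_per_node)
    if not rows_per_node or total == 0:
        raise ValueError("need at least one node and one row")
    mean = total / len(rows_per_node)
    return max(rows_per_node) / mean


# Hypothetical row counts on four worker nodes, before and after balancing.
uneven = [400_000, 250_000, 200_000, 150_000]
even = [250_000, 250_000, 250_000, 250_000]

print(skew_ratio(uneven))
print(skew_ratio(even))
```

In this example both tables hold one million rows, but the uneven table places 400,000 rows on one node, while the balanced table places 250,000 rows on each node.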
Re-balancing a table removes partition and ordering information from the table.
The BALANCE statement can also be used with non-distributed servers. It is less important there because all records of a table reside on the same machine. It can still be useful for deriving, from a partitioned table, a new table that is subject to a WHERE clause, has deleted records removed, and is not partitioned.