IMSTAT Procedure
BALANCE Statement
The BALANCE statement creates a temporary table
from the active table and re-balances it so that the rows are
distributed across the worker nodes as evenly as possible. After
re-balancing, the row counts on the nodes are within ±1 row of each other.
Syntax
Without Arguments
Re-balancing removes
any observations that are marked as deleted or marked for purging in the
active table. If a WHERE clause is in effect, it is applied when the data
are re-balanced.
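The behavior described above can be illustrated with a small Python model. This is only a sketch of the arithmetic, not the server's implementation: the `rebalance` function, the `deleted` flag, and the `where` predicate are hypothetical names introduced for illustration. The model drops rows flagged as deleted, applies the filter, and then splits the remaining rows so that node sizes differ by at most one row.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def rebalance(
    nodes: List[List[Row]],
    where: Callable[[Row], bool] = lambda row: True,
) -> List[List[Row]]:
    """Toy model of re-balancing (hypothetical, for illustration only)."""
    # Keep only rows that are not flagged as deleted and that pass the filter.
    kept = [
        row
        for node in nodes
        for row in node
        if not row.get("deleted", False) and where(row)
    ]
    # Every node gets `base` rows; the first `extra` nodes get one more.
    base, extra = divmod(len(kept), len(nodes))
    balanced: List[List[Row]] = []
    start = 0
    for i in range(len(nodes)):
        size = base + (1 if i < extra else 0)
        balanced.append(kept[start:start + size])
        start += size
    return balanced


# Three nodes with a very uneven layout; rows 0 and 5 are flagged as deleted.
nodes = [
    [{"id": i, "deleted": i % 5 == 0} for i in range(10)],
    [{"id": 10, "deleted": False}],
    [],
]
balanced = rebalance(nodes, where=lambda row: row["id"] != 3)
sizes = [len(node) for node in balanced]
print(sizes)  # [3, 3, 2]
```

Because the rows are gathered and split again without regard to their original grouping, a model like this also loses any partitioning or ordering that the input had.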
One case for re-balancing
is when the data distribution for a table has become uneven because of
block movement within the Hadoop Distributed File System (HDFS). This can
occur when nodes fail in Hadoop or when Hadoop processes have exited on
some nodes. Re-balancing is also useful when a partitioned table is
unevenly distributed across the worker nodes because the partitions
differ in size. An uneven distribution can affect the performance
of all actions that run in the LASR Analytic Server, because the nodes
with the most records typically determine the overall performance.
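A rough way to see why the largest node matters is to compare its row count with the average across nodes. The sketch below assumes, as a simplification, that the time for an action grows in proportion to the number of rows on a node and that the action finishes only when the slowest node finishes. The `skew_ratio` function and the row counts are hypothetical.

```python
from typing import List


def skew_ratio(rows_per_node: List[int]) -> float:
    """Ratio of the largest node's row count to the mean row count.

    A value of 1.0 means a perfectly even distribution. Larger values
    indicate that one node holds a disproportionate share of the rows.
    """
    total = sum(rows_per_node)
    if total == 0:
        return 1.0  # an empty table is trivially balanced
    mean = total / len(rows_per_node)
    return max(rows_per_node) / mean


# Hypothetical layouts of the same 1000 rows across three worker nodes.
uneven = [900, 50, 50]
even = [334, 333, 333]

uneven_ratio = skew_ratio(uneven)  # roughly 2.7
even_ratio = skew_ratio(even)      # very close to 1.0
print(round(uneven_ratio, 2), round(even_ratio, 2))
```

Under these assumptions, the uneven layout takes about 2.7 times as long as an ideal layout, while the balanced layout is within a fraction of a percent of it.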
Re-balancing of a table
removes partition and ordering information from the table.
The BALANCE statement
can be used with non-distributed servers as well. However, it is less
important there because all records of a table reside on the same machine.
It can still be useful for deriving, from a partitioned table, a new
table that is filtered by a WHERE clause, has deleted records removed,
and is not partitioned.
Copyright © SAS Institute Inc. All rights reserved.