IMSTAT Procedure

Example 3: Rebalancing a Table

Details

It might be beneficial to rebalance the rows of a table if the data access patterns do not take advantage of partitioning or if the HDFS block distribution becomes uneven.

Program

libname example sasiola host="grid001.example.com" port=10010 tag='hps';

proc imstat immediate;
  table example.table1;
  distributioninfo;                     /* 1 */
  balance;
  droptable;                            /* 2 */
  table example.&_templast_;            /* 3 */
  promote table1;                       /* 4 */
  table example.table1;
  distributioninfo;                     /* 5 */
  /* save path="/hps" replace; */       /* 6 */
quit;

Program Description

  1. The DISTRIBUTIONINFO statement displays the number of rows from Table1 on each machine in the cluster.
  2. The DROPTABLE statement is used to drop the active table, Table1.
  3. The BALANCE statement rebalances Table1 into a temporary table. The TABLE statement is used with the &_TEMPLAST_ macro variable to access the temporary table.
  4. The PROMOTE statement changes the temporary table into a regular in-memory table with the original table name, Table1.
  5. After setting Table1 as the active table with the TABLE statement, the DISTRIBUTIONINFO statement displays the nearly homogeneous distribution of rows.
  6. The SAVE statement, which is commented out in the program, can be used to save the table back to HDFS with the homogeneous block distribution.
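
If the rebalanced table should also be written back to HDFS, the SAVE statement from step 6 can be run as a separate follow-up step. The following is a minimal sketch rather than a definitive program. It assumes that the rebalancing program has already completed, that the same host, port, and tag still apply, and that the /hps directory is writable. The path and the REPLACE option are copied from the commented-out statement in the example.

```sas
/* Sketch only: assumes Table1 has already been rebalanced and promoted */
/* on the same server that the example uses.                            */
libname example sasiola host="grid001.example.com" port=10010 tag='hps';

proc imstat immediate;
  table example.table1;          /* make the rebalanced table active      */
  save path="/hps" replace;      /* write the active table back to HDFS   */
quit;
```

Because REPLACE overwrites an existing file of the same name, it is reasonable to review the second DISTRIBUTIONINFO output before running this step.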

Output

The following output shows the partial display for the first DISTRIBUTIONINFO statement. One machine has zero rows and another machine has approximately twice as many rows as the others.

Uneven Row Distribution
[Image: DISTRIBUTIONINFO results for an unbalanced table]

The following output shows the homogeneous distribution of rows after the BALANCE statement is used.

Homogeneous Row Distribution
[Image: DISTRIBUTIONINFO results after rebalancing]