Ixutil Examples

Create a Hybrid Index

Assume there is an SPD Server domain named my domain assigned to the directory /spds on a machine named spot. A user has created a table with the following SAS program:

LIBNAME my_data sasspds 'mydomain'
   server=spot.spdsname
   user='anonymous';

data my_data.test(index=(x));
   do i = 1 to 30000;
     x = mod(i,3);
     output;
   end;
run;

data my_data.test1;
   do i = 1 to 10000;
    x = mod(i,2);
    output;
   end;
run;

PROC APPEND
   base=my_data.test
   data=my_data.test1;
run;

PROC SQL;
   delete from my_data.test
     where x=1;
quit;

The SAS program above creates a hybrid index for column X of the table named test, on the machine named spot, in the directory named /spds.

Retrieve Disk Usage and Fragmentation Statistics

Use the -stats option of ixutil to get the disk usage and segment list fragmentation statistics for the index:

> ixutil -stats test -dsn test -libpath /spds

SAS Scalable Performance Data Server 4.5(TS M0) 
Build(Feb 26 2009, 11:50:08)
Index File Utility
Copyright (c) 1996-2009 by SAS Institute Inc, Cary NC 27513 USA

Statistics for Index X:
---------------------------------------------------
+--segment_size         = 8192
+--n_segments_in_tbl    = 5
+--n_values_in_index    = 2
+--n_vdeleted_values    = 1
+--percent_vdeleted     = 33.33
+--n_seglist_values     = 2
+--n_seglist_chunks     = 3
+--avg_chunks_per_list  = 1.00
+--idx_file_bytes       = 13304
+--idx_garbage_bytes    = 4272
+--percent_idx_garbage  = 32.11


Ixutil completed successfully

The statistics include the following:

the segment size of the index.
the number of segments in the table.
the number of distinct values for the index.
the number of virtually deleted values (values that are no longer recognized by query indexes).
the percentage of virtually deleted values.
the number of values that require segment lists (a value that is in only one segment does not require a segment list).
the number of segment list chunks for all values of the index.
the average number of chunks for any value in the index.
the size of the idx file for the index. The idx file maintains the value segment lists and bitmaps.
the number of garbage bytes in the idx file. This is space in the file that was thrown away and cannot be reclaimed. Garbage bytes can result from deleting values, updating values, or appending values.
the percentage of garbage bytes in the idx file.

The average number of chunks for a segment list is a good indicator of the fragmentation level of the value segment lists. As this value increases, it can affect the query performance for the index. If the average number of chunks exceeds 10, you should consider reorganizing the index to consolidate the segment lists.

The number of garbage bytes and the percentage of garbage bytes indicate the amount of unused disk space being consumed by the index. To conserve and consolidate disk space, consider reorganizing the index. Reorganizing the index frees up disk space when the garbage content is high.

Retrieve Index Distribution Statistics

Use the -dist option of ixutil -stats to get the index distribution for the index:

> ixutil -stats x -dsn test -libpath /spds -dist
SAS Scalable Performance Data Server 
   4.5(TS M0) Build(Feb 26 2009, 11:50:08)
Index File Utility
Copyright (c) 1996-2008 by SAS Institute Inc, Cary NC 27513 USA

Statistics for Index X:
---------------------------------------------------
+--segment_size         = 8192
+--n_segments_in_tbl    = 5
+--n_values_in_index    = 2
+--n_vdeleted_values    = 1
+--percent_vdeleted     = 33.33
+--n_seglist_values     = 2
+--n_seglist_chunks     = 3
+--avg_chunks_per_list  = 1.00
+--idx_file_bytes       = 13304
+--idx_garbage_bytes    = 4272
+--percent_idx_garbage  = 32.11

+--Distribution Stats for Non Unique Values
+----minimum segments for all values = 4
+----maximum segments for all values = 5
+----average segments of any value = 4
+----average percentage of segments of any value = 90.00

Ixutil completed successfully

The distribution statistics include the following:

the number of unique values in the index
the number of non-unique values in the index
the minimum number of segments that any value is in
the maximum number of segments that any value is in
the average number of segments that all values are in
the average percentage of segments that all values are in

The distribution statistics can be used to determine the effectiveness of the index. The index will perform better if the distribution of the index values is clustered in a minimum number of segments. In general, the lower the average percentage of segments that all values are in, the better the index will perform.

Reorganize the Index

Use the -reorg option to reorganize the index to consolidate segment lists and retrieve unused disk space:

> ixutil -reorg x -dsn test -libpath /spds
SAS Scalable Performance Data Server 
   4.5(TS M0) Build(Feb 26 2009, 11:50:08)
Index File Utility
Copyright (c) 1996-2009 by SAS Institute Inc, Cary NC 27513 USA

Reorg for Index x:
Reorg successfully completed
Ixutil completed successfully

Running the index utility program again to get the statistics shows that the segment lists for all of the values have been aggregated (the avg_chunks_per_list is 1.0) and the unused disk space has been freed (the idx_garbage_bytes is 0), resulting in a proportional decrease in the size of the index file.

Aggregating the segment lists and compacting the index file minimizes the reads on the index for a query. It will also increase the locality of segment data for an index key. The combination of these will give the best query performance for the index.

Review Disk Usage Statistics

Use the -stats option once more to review the index and segment list data, in order to view the improved performance statistics.

> ixutil -stats x -dsn test -libpath /spds
SAS Scalable Performance Data Server 
   4.5(TS M0) Build(Feb 26 2009, 11:50:08)
Index File Utility
Copyright (c) 1996-2009 by SAS Institute Inc, Cary NC 27513 USA

Statistics for Index X:
-----------------------------------------
+--segment_size        = 8192
+--n_segments_in_tbl   = 5
+--n_values_in_index   = 2
+--n_vdeleted_values   = 0
+--percent_vdeleted    = 0.00
+--n_seglist_values    = 2
+--n_seglist_chunks    = 2
+--avg_chunks_per_list = 1.00
+--idx_file_bytes      = 9008
+--idx_garbage_bytes   = 0
+--percent_idx_garbage = 0.00

Create a Join Index

Assume there are SPD Server tables in a domain in directory /tmp. A user has created two tables, Table1 and Table2 that can be joined on column ID. An SPD Server index exists on the column ID for both tables. A join index is created on the tables to allow a Parallel Range Join on column ID.

Use the -crejidx option of the SPD Server ixutil command to create the join index.

> ixutil -crejidx Table1,ID Table2,ID
  -libpath /tmp
  -joinparts 4

SAS Scalable Performance Data Server 
   4.5(TS M0) Build(Feb 26 2009, 11:50:08)
Index File Utility
Copyright (c) 1996-2009 by SAS Institute Inc, Cary NC 27513  USA

Ixutil completed successfully.

Generate Join Index Statistics

Now, get statistics on the join index that you created, using the -statjidx option of the ixutil command. The statistics are printed for each join range of the index, as well as for the overall index. The range statistics identify each range (sobs=starting observation, eobs=ending observation), the number of unique join keys that exist in the range, and the number of keys that will be joined in the range for each table.

> ixutil -statjidx Table1,ID Table2,ID
  -libpath /tmp

SAS Scalable Performance Data Server 
   4.5(TS M0) Build(Feb 26 2009, 11:50:08)
Index File Utility
Copyright (c) 1996-2009 by SAS Institute Inc, Cary NC 27513  USA


Stat of Join Index Table1.jdxid.table2.jdxid.0.0.0.spds9: Nranges=4
-----------------------------------------------------------------------
+-Range 0
+----<Table1,ID>: sobs=1 eobs=25000 (Sorted)
+-------unique_keys=25000, max_occurance=1
+-------obs=25000, joinobs=25000, rangepct=100.00
+----<Table2,ID>: sobs=1 eobs=10000 (Sorted)
+-------unique_keys=10000, max_occurance=1
+-------obs=10000, joinobs=10000, rangepct=100.00
+-Range 1
+----<Table1,ID>: sobs=25001 eobs=50000 (Sorted)
+-------unique_keys=25000, max_occurance=1
+-------obs=25000, joinobs=25000, rangepct=100.00
+----<Table2,ID>: sobs=-1 eobs=0
+-------unique_keys=0, max_occurance=0
+-------obs=2, joinobs=0, rangepct=  0.00
+-Range 2
+----<Table1,ID>: sobs=50001 eobs=75000 (Sorted)
+-------unique_keys=25000, max_occurance=1
+-------obs=25000, joinobs=25000, rangepct=100.00
+----<Table2,ID>: sobs=-1 eobs=0
+-------unique_keys=0, max_occurance=0
+-------obs=2, joinobs=0, rangepct=  0.00
+-Range 3
+----<Table1,ID>: sobs=75001 eobs=100000 (Sorted)
+-------unique_keys=25000, max_occurance=1
+-------obs=25000, joinobs=25000, rangepct=100.00
+----<Table2,ID>: sobs=-1 eobs=0
+-------unique_keys=0, max_occurance=0
+-------obs=2, joinobs=0, rangepct=  0.00

Table Table1, Column ID average range join row pct=100.00
Table Table2, Column ID average range join row pct= 25.00

Ixutil completed successfully

Delete the Join Index

Use the -deljidx option of the ixutil command to delete the join index.

> ixutil -deljidx Table1,ID Table2,ID
-libpath /tmp

SAS Scalable Performance Data Server 
   4.5(TS M0) Build(Feb 26 2009, 11:50:08)
Index File Utility
Copyright (c) 1996-2009 by SAS Institute Inc, Cary NC 27513  USA

Ixutil completed successfully

Parallel join work units are based on the ranges of the join keys. For example, range 0 will join ranges 1 through 100, range 1 can join range 101 to 200, and so on Ranges can overlap observations if the tables are not sorted by the join key. The more sorted the table is by the join key, the fewer rows a range work unit will need to read, in order to gather all of the rows in its range. The overall performance of the parallel join index depends on how well the table is sorted by the join key. The stronger the join key sort, the better the performance. If a range work unit has a range percentage of 0 for either table, then there are no rows in the table for that range, and that range will be discarded by a parallel work thread.