Ixutil Examples :: SAS(R) Scalable Performance Data Server 5.1: Administrator's Guide, Second Edition

Create an Index

Assume that an SPD Server domain named my domain is assigned to the directory /spds on a machine named spot. A user has created a table with the following SAS program:

LIBNAME my_data sasspds 'mydomain'
   server=spot.spdsname
   user='anonymous';

data my_data.test(index=(x));
   do i = 1 to 30000;
     x = mod(i,3);
     output;
   end;
run;

data my_data.test1;
   do i = 1 to 10000;
    x = mod(i,2);
    output;
   end;
run;

PROC APPEND
   base=my_data.test
   data=my_data.test1;
run;

PROC SQL;
   delete from my_data.test
     where x=1;
quit;

This SAS program creates an index for column X of the table named test, on the machine named spot, in the directory named /spds.

Retrieve Disk Usage and Fragmentation Statistics

Use the -stats option of ixutil to get the disk usage and segment list fragmentation statistics for the index:

> ixutil -stats test -dsn test -libpath /spds

SAS Scalable Performance Data Server 5.1(TS M0) 
Build(Apr 26 2013, 11:50:08)
Index File Utility
Copyright (c) 1996-2013 by SAS Institute Inc, Cary NC 27513 USA

Statistics for Index X:
---------------------------------------------------
+--segment_size         = 8192
+--n_segments_in_tbl    = 5
+--n_values_in_index    = 2
+--n_vdeleted_values    = 1
+--percent_vdeleted     = 33.33
+--n_seglist_values     = 2
+--n_seglist_chunks     = 3
+--avg_chunks_per_list  = 1.00
+--idx_file_bytes       = 13304
+--idx_garbage_bytes    = 4272
+--percent_idx_garbage  = 32.11


Ixutil completed successfully

The statistics include the following information:

the segment size of the index.
the number of segments in the table.
the number of distinct values for the index.
the number of virtually deleted values (values that are no longer recognized by query indexes).
the percentage of virtually deleted values.
the number of values that require segment lists (a value that is in only one segment does not require a segment list).
the number of segment list chunks for all values of the index.
the average number of chunks for any value in the index.
the size of the .idx file for the index. The .idx file maintains the value segment lists and bitmaps.
the number of garbage bytes in the .idx file. Garbage bytes are the space in the file that was thrown away and cannot be reclaimed. Garbage bytes can result from deleting values, updating values, or appending values.
the percentage of garbage bytes in the .idx file.

The average number of chunks for a segment list is a good indicator of the fragmentation level of the index. As this value increases, query performance can degrade when SPD Server must retrieve per-value information by making multiple reads of the index. If the average number of chunks exceeds 10, you should consider reorganizing the index to consolidate the segment lists.

The number of garbage bytes and the percentage of garbage bytes indicate the amount of unused disk space that is being consumed by the index. To conserve and consolidate disk space, consider reorganizing the index. Reorganizing the index frees up disk space when the garbage content is high.

Retrieve Index Distribution Statistics

Use the -dist option of ixutil -stats to get the index distribution for the index:

> ixutil -stats x -dsn test -libpath /spds -dist
SAS Scalable Performance Data Server 
   5.1 (TS M0) Build(Apr 26 2013, 11:50:08)
Index File Utility
Copyright (c) 1996-2013 by SAS Institute Inc, Cary NC 27513 USA

Statistics for Index X:
---------------------------------------------------
+--segment_size         = 8192
+--n_segments_in_tbl    = 5
+--n_values_in_index    = 2
+--n_vdeleted_values    = 1
+--percent_vdeleted     = 33.33
+--n_seglist_values     = 2
+--n_seglist_chunks     = 3
+--avg_chunks_per_list  = 1.00
+--idx_file_bytes       = 13304
+--idx_garbage_bytes    = 4272
+--percent_idx_garbage  = 32.11

+--Distribution Stats for Non Unique Values
+----minimum segments for all values = 4
+----maximum segments for all values = 5
+----average segments of any value = 4
+----average percentage of segments of any value = 90.00

Ixutil completed successfully

The distribution statistics include the following information:

the number of unique values in the index
the number of nonunique values in the index
the minimum number of segments that any value is divided into
the maximum number of segments that any value is divided into
the average number of segments that all values are divided into
the average percentage of segments that all values are divided into

You can use the distribution statistics to determine the effectiveness of the index. The index performs better if the distribution of the index values is clustered in a minimum number of segments. In general, the lower the average percentage of segments that all values are divided into, the better the index performs.

Reorganize the Index

Use the -reorg option to reorganize the index to consolidate segment lists and retrieve unused disk space:

> ixutil -reorg x -dsn test -libpath /spds
SAS Scalable Performance Data Server 
   5.1 (TS M0) Build(Apr 26 2013, 11:50:08)
Index File Utility
Copyright (c) 1996-2013 by SAS Institute Inc, Cary NC 27513 USA

Reorg for Index x:
Reorg successfully completed
Ixutil completed successfully

If you run the index utility program again to get the statistics, you will find that the segment lists for all of the values have been aggregated (the avg_chunks_per_list is 1.0) and that the unused disk space has been freed (the idx_garbage_bytes is 0). This outcome results in a proportional decrease in the size of the index file.

Aggregating the segment lists and compacting the index file minimizes the reads on the index for a query. The locality of segment data for an index key also increases. The combination of these process results in the best query performance for the index.

Review Disk Usage Statistics

Use the -stats option to review the index and segment list data, in order to view the improved performance statistics.

> ixutil -stats x -dsn test -libpath /spds
SAS Scalable Performance Data Server 
   5.1 (TS M0) Build(Apr 26 2013, 11:50:08)
Index File Utility
Copyright (c) 1996-2013 by SAS Institute Inc, Cary NC 27513 USA

Statistics for Index X:
-----------------------------------------
+--segment_size        = 8192
+--n_segments_in_tbl   = 5
+--n_values_in_index   = 2
+--n_vdeleted_values   = 0
+--percent_vdeleted    = 0.00
+--n_seglist_values    = 2
+--n_seglist_chunks    = 2
+--avg_chunks_per_list = 1.00
+--idx_file_bytes      = 9008
+--idx_garbage_bytes   = 0
+--percent_idx_garbage = 0.00

Create a Join Index

Assume that SPD Server tables are in a domain in the directory /tmp. A user has created two tables, Table1 and Table2 that can be joined on the column ID. An SPD Server index exists on the column ID for both tables. A join index is created on the tables to allow a parallel range join on column ID.

Use the -crejidx option of the SPD Server ixutil command to create the join index.

> ixutil -crejidx Table1,ID Table2,ID
  -libpath /tmp
  -joinparts 4

SAS Scalable Performance Data Server 
   5.1 (TS M0) Build(Apr 26 2013, 11:50:08)
Index File Utility
Copyright (c) 1996-2013 by SAS Institute Inc, Cary NC 27513  USA

Ixutil completed successfully.

Generate Join Index Statistics

Obtain statistics on the join index that you created by using the -statjidx option of the ixutil command. The statistics are printed for each join range of the index and for the overall index. The range statistics identify each range (sobs=starting observation, eobs=ending observation), the number of unique join keys that exist in the range, and the number of keys that are joined in the range for each table.

> ixutil -statjidx Table1,ID Table2,ID
  -libpath /tmp

SAS Scalable Performance Data Server 
   5.1 (TS M0) Build(Apr 26 2013, 11:50:08)
Index File Utility
Copyright (c) 1996-2013 by SAS Institute Inc, Cary NC 27513  USA


Stat of Join Index Table1.jdxid.table2.jdxid.0.0.0.spds9: Nranges=4
-----------------------------------------------------------------------
+-Range 0
+----<Table1,ID>: sobs=1 eobs=25000 (Sorted)
+-------unique_keys=25000, max_occurance=1
+-------obs=25000, joinobs=25000, rangepct=100.00
+----<Table2,ID>: sobs=1 eobs=10000 (Sorted)
+-------unique_keys=10000, max_occurance=1
+-------obs=10000, joinobs=10000, rangepct=100.00
+-Range 1
+----<Table1,ID>: sobs=25001 eobs=50000 (Sorted)
+-------unique_keys=25000, max_occurance=1
+-------obs=25000, joinobs=25000, rangepct=100.00
+----<Table2,ID>: sobs=-1 eobs=0
+-------unique_keys=0, max_occurance=0
+-------obs=2, joinobs=0, rangepct=  0.00
+-Range 2
+----<Table1,ID>: sobs=50001 eobs=75000 (Sorted)
+-------unique_keys=25000, max_occurance=1
+-------obs=25000, joinobs=25000, rangepct=100.00
+----<Table2,ID>: sobs=-1 eobs=0
+-------unique_keys=0, max_occurance=0
+-------obs=2, joinobs=0, rangepct=  0.00
+-Range 3
+----<Table1,ID>: sobs=75001 eobs=100000 (Sorted)
+-------unique_keys=25000, max_occurance=1
+-------obs=25000, joinobs=25000, rangepct=100.00
+----<Table2,ID>: sobs=-1 eobs=0
+-------unique_keys=0, max_occurance=0
+-------obs=2, joinobs=0, rangepct=  0.00

Table Table1, Column ID average range join row pct=100.00
Table Table2, Column ID average range join row pct= 25.00

Ixutil completed successfully

Delete the Join Index

Use the -deljidx option of the ixutil command to delete the join index.

> ixutil -deljidx Table1,ID Table2,ID
-libpath /tmp

SAS Scalable Performance Data Server 
   5.1 (TS M0) Build(Apr 26 2013, 11:50:08)
Index File Utility
Copyright (c) 1996-2013 by SAS Institute Inc, Cary NC 27513  USA

Ixutil completed successfully

Parallel join work units are based on the ranges of the join keys. For example, range 0 joins ranges 1 through 100, range 1 can join range 101 to 200, and so on. Ranges can overlap observations if the tables are not sorted by the join key. Join keys result in table sorting. The nature of the join key determines how much sorting is performed on the table. The more extensive the table sorting performed on behalf of the join key, the fewer rows a range work unit needs to read in order to gather all of the rows in its range. The overall performance of the parallel join index depends on how well the table is sorted by the join key. The stronger the join key sort, the better the performance. If a range work unit has a range percentage of 0 for either table, then there are no rows in the table for that range, and that range is discarded by a parallel work thread.