Indexing SPD Server Tables

SPD Server efficiently indexes tables of varying size and data distributions. The SPD Server SPD index supports queries that require global table views (such as queries that contain BY clause processing or SQL joins), or queries that require segmented views (such as parallel processing of WHERE clause statements).

The SPD Server Index

The SPD Server index maintains two views of the index values: a global view and a segmented view. SPD Server maintains the global view by using a unique global B-tree that has a single entry for each discrete value. The segmented view is maintained by the data for each value in the global B-tree, which includes a list of segments that contain the value, and includes for each segment a bitmap that identifies which rows in the segment contain the value. The global view is maintained in the SPD Server index.hbx file, and the segmented data is maintained in the SPD Server index.idx file.
For queries that require a global view, SPD Server searches the hybrid global B-tree for a particular value. SPD Server scans the segment lists for the value, and then reads the bitmaps from each segment that contains the value. SPD Server uses the bitmap to locate and retrieve the observations for that segment. This type of query returns results sorted first by value and then by observation number. This sorting is optimal for BY clause processing and SQL joins.
A parallel WHERE clause on a table that is indexed is done in two phases. The first phase, pre-evaluation, uses the SPD Server indexes to build a list of segments that satisfy the query. The list drops segments from the WHERE clause scan queue when those segments contain no data in the clause range. As more and more segments are excluded from the scan queue, the benefit of the pre-evaluation phase increases proportionally. The second phase in the evaluation launches threads that read an index in parallel. Each thread queries a particular segment of the index, using information from the pre-evaluation phase. The thread uses the SPD Server index to read the segment bitmap. The per-segment bitmaps identify the segment rows that satisfy the query for that particular column. If you include more than one indexed column in the WHERE clause, SPD Server retrieves the per-segment bitmaps for each column in parallel (as are the segments for each column). After SPD Server retrieves all the bitmaps for each column of the segment, it determines which rows satisfy the query and returns those segment rows to the client. The multi-threaded per-segment queries begin execution at the same time, but their finishing order varies and cannot be reasonably predicted. As a result, the overall order of the results cannot be guaranteed when you are using this type of query. For more information about using indexed columns with WHERE clause evaluations, see WHERE Clause Planner.
When a table is modified as the result of an append or update, all SPD Server indexes on the table are updated. When the index is updated, the per-value segment lists can potentially fragment or some disk space might be wasted. A highly fragmented SPD Server index can negatively impact the performance of queries that use the index. In this case, you should reorganize the index to eliminate the fragmentation and reclaim wasted disk space, using the ixutil utility program. For more information about SPD Server index utilities, see Managing SAS Scalable Performance Data (SPD) Server Passwords and Users in SAS Scalable Performance Data Server: Administrator's Guide.