Indexing a Table

SPD Server provides a single SPD Server index type that efficiently indexes tables of varying size and data distributions. The SPD Server SPD index optimally supports queries that require global table views (such as queries that contain BY clause processing and SQL joins), or queries that require segmented views (such as parallel processing of WHERE clause statements).

The SPD Server Index

The SPD Server index maintains two views of the index values, a global view and a segmented view. The global view is maintained using a unique global B-tree that has a single entry for each discrete value. The segmented view is maintained by the data for each value in the global B-tree, which includes a list of segments that contain the value, and for each segment a bitmap that identifies which rows in the segment contain the value. The global view is maintained in the SPD Server index.hbx file, and the segmented data is maintained in the SPD Server index.idx file.
For queries that require a global view, SPD Server searches the hybrid global B-tree for a particular value. The segment lists are scanned for the value, then the bitmaps from each segment containing the value are read. SPD Server uses the bitmap to locate and retrieve the observations for that segment. This type of query returns results sorted first by value and then by observation number. This sorting is optimal for BY Clause processing and SQL joins.
A parallel WHERE clause on a table that is indexed is done in two phases. The first phase, pre-evaluation, uses the SPD Server indexes to build a list of segments that satisfy the query. The list drops segments from the WHERE clause scan queue when those segments contain no data in the clause range. As more and more segments are excluded from the scan queue, the benefit of the pre-evaluation phase increases proportionally. The second phase in the evaluation launches threads that read an index in parallel. Each thread queries a particular segment of the index, using information from the pre-evaluation phase. Using the SPD Server index, the thread reads the segment bitmap. The per-segment bitmaps identify the segment rows which satisfy the query for that particular column. If you include more than one indexed column in the WHERE clause, SPD Server retrieves the per-segment bitmaps for each column in parallel (as are the segments for each column). After retrieving all the bitmaps for each column of the segment, SPD Server determines which rows satisfy the query, and returns those segment rows to the client. The multi-threaded per-segment queries begin execution at the same time, and their finishing order varies and cannot be reasonably predicted. As a result, the overall order of the results cannot be guaranteed when you are using this type of query. See the documentation chapter on WHERE Clause Planner for a more detailed description on using indexed columns with WHERE clause evaluations.
When a table is modified due an append or update, all SPD Server indexes on the table are updated. Updating the index can potentially fragment the per-value segment lists or cause some disk space to be wasted. A highly fragmented SPD Server index can negatively impact the performance of queries that use the index. In this case, you should reorganize the index to eliminate the fragmentation and reclaim wasted disk space, using the ixutil utility program. For further information about SPD Server index utilities, see Chapter 15, "Managing SAS Scalable Performance Data (SPD) Server Passwords, Users, and Table ACLs," of the SAS Scalable Performance Data (SPD) Server 4.5: Administrator's Guide.