Server Indexing with WHERE Clause :: SAS® Scalable Performance Data Server 5.3: User's Guide

Overview of Server Indexing with WHERE

The six evaluation strategies that the WHERE clause planner uses are EVAL 1, EVAL 2, EVAL 3, EVAL 4, EVAL 5, and EVAL 6. The different EVAL strategies calculate the number of rows that will be required to execute a given query.

True rows contain the variable values specified in a WHERE clause. False rows do not contain the variable value specified in the clause. EVAL 1, EVAL 3, EVAL 4, and EVAL 5 evaluate true rows in the table using indexes. EVAL 2 and EVAL 6 evaluate true rows of a table without using indexes. EVAL strategies are explored in more detail in WHERE Clause EVAL Strategies.

Server tables can have one or more indexes. A table can use a combination of four different indexing strategies. The choice depends on the data populating the table, the size of the table, and the types of queries that will be executed against the table. Server indexing evaluates the processor cost of a WHERE clause. The section WHERE-Costing Using Cardinality Ratio and Distribution Values shows how factors of cardinality and distribution are used to choose the evaluation strategy that will perform the WHERE clause at the smallest processor cost.

SPD Indexes

The server uses segmented indexes. A segmented index is created by dividing the index of a table into equally sized ranges of rows. Each range of rows is called a segment, or slot. You use the SEGSIZE= setting to define the size of the segment. A series of sub-indexes each points to blocks of rows in the table. By default, the server creates an index segment for every 8192 rows in a table.

The SPD segmented index facilitates the server's parallel evaluation of WHERE clauses with an indexed predicate. First, the SPD index supports a pre-evaluation phase to determine which segments contain values that satisfy the predicate. Pre-evaluation speeds queries by eliminating segments that do not contain any possible values. Then, a number of threads up to the value of the SPDSTCNT macro variable are launched to query the remaining index segments. The threads query the segments of the SPD index in parallel to retrieve the segment rows that satisfy the predicate. When all segments have been queried, the per-segment results are accumulated to determine the rows that satisfy the predicate. If the query contains multiple indexed predicates, then those predicates are also evaluated in parallel. When all predicates have been completed, their results are accumulated to determine the rows that satisfy the query.

Indexing with WHERE

A parallel WHERE clause on a table that is indexed is done in two phases. The first phase, pre-evaluation, uses the server indexes to build a list of segments that satisfy the query. The list drops segments from the WHERE clause scan queue when those segments contain no data in the clause range. As more and more segments are excluded from the scan queue, the benefit of the pre-evaluation phase increases proportionally. The second phase in the evaluation launches threads that read an index in parallel. Each thread queries a particular segment of the index, using information from the pre-evaluation phase. The thread uses the server index to read the segment bitmap. The per-segment bitmaps identify the segment rows that satisfy the query for that particular column. If you include more than one indexed column in the WHERE clause, the server retrieves the per-segment bitmaps for each column in parallel (as are the segments for each column). After the server retrieves all the bitmaps for each column of the segment, it determines which rows satisfy the query and then returns those segment rows to the client. The multi-threaded per-segment queries begin execution at the same time, but their finishing order varies and cannot be reasonably predicted. As a result, the overall order of the results cannot be guaranteed when you are using this type of query. For more information about using indexed columns with WHERE clause evaluations, see Understanding the WHERE Clause Planner .

When a table is modified as the result of an append or update, all server indexes on the table are updated. When the index is updated, the per-value segment lists can potentially fragment or some disk space might be wasted. A highly fragmented server index can negatively impact the performance of queries that use the index. In this case, you should reorganize the index to eliminate the fragmentation and reclaim wasted disk space, using the ixutil utility program. For more information, see Index Utility in SAS Scalable Performance Data Server: Administrator’s Guide.

MINMAX Variable List

SPD Server has a table option called MINMAXVARLIST=. The primary purpose of MINMAXVARLIST= is for use with server dynamic cluster tables, where specific members in the dynamic cluster contain a set or range of values, such as sales data for a given month. When a server SQL subsetting WHERE clause specifies specific months from a range of sales data, the WHERE planner checks the MIN and MAX variable list. Based on the MIN and MAX list information, the server WHERE planner includes or eliminates member tables in the dynamic cluster for evaluation.

Use the MINMAXVARLIST= table option with either numeric or character-based columns. MINMAXVARLIST= uses the list of columns that you submit to build a variable list. The MINMAXVARLIST= list contains only the minimum and maximum values for each column. The WHERE clause planner uses the index to filter SQL predicates quickly, and to include or eliminate member tables belonging to the cluster table from the evaluation.

Although the MINMAXVARLIST= table option is primarily intended for use with dynamic clusters, it also works on standard server tables. MINMAXVARLIST= can help reduce the need to create many indexes on a table, which can save valuable resources and space.

The MINMAXVARLIST= table option is available only when a table is being created or defined. If a table has a MINMAXVARLIST= variable list, moving or copying the table will destroy the variable list unless MINMAXVARLIST= is specified in the table output.

For more information, see MINMAXVARLIST= Table Option.