SPD Server Indexing

Overview of Server Indexing

SPD Server tables can have one or more indexes. There is a combination of four different indexing strategies a table can use, and the choice depends on the data populating the table, the size of the table, and the types of queries that will be executed against the table.

SPD Server indexing evaluates the processor cost of a WHERE clause. The section WHERE-Costing Using Cardinality Ratio and Distribution Values shows how factors of cardinality and distribution are used to choose the evaluation strategy that will perform the WHERE clause at the smallest processor cost. The five evaluation strategies that the WHERE clause planner uses are EVAL 1, EVAL 2, EVAL 3, EVAL 4, and EVAL 5. The different EVAL strategies calculate the number of rows that will be required to execute a given query.

True rows are rows that contain the variable values specified in a WHERE clause. False rows do not contain the variable value specified in the clause. EVAL 1, EVAL 3, EVAL 4, and EVAL 5 evaluate true rows in the table using indices. EVAL 2 evaluates true rows of a table without using indices. EVAL strategies are explored in more detail in WHERE Clause EVAL Strategies.

SPD Indexes

SPD Server uses segmented indices. A segmented index is created by dividing the index of a table into equally sized ranges of rows. Each range of rows is called a segment, or slot. You use the SEGSIZE= setting to define the size of the segment. A series of sub-indices each point to blocks of rows in the table. By default, SPD Server creates an index segment for every 8192 rows in a table.

The SPD segmented index facilitates SPD Server's parallel evaluation of WHERE clauses with an indexed predicate. First, the SPD index supports a pre-evaluation phase to determine which segments contain values that satisfy the predicate. Pre-evaluation speeds queries by eliminating segments that do not contain any possible values. Then, a number of threads up to the value of the SPDSTCNT= variable are launched to query the remaining index segments. The threads query the segments of the SPD index in parallel to retrieve the segment rows that satisfy the predicate. When all segments have been queried, the per-segment results are accumulated to determine the rows that satisfy the predicate. If the query contains multiple indexed predicates, then those predicates are also evaluated in parallel. When all predicates have been completed, their results are accumulated to determine the rows that satisfy the query.

MINMAX Variable List

SPD Server has a table option called MINMAXVARLIST=. The primary purpose of MINMAXVARLIST= is for use with SPD Server dynamic cluster tables, where specific members in the dynamic cluster contain a set or range of values, such as sales data for a given month. When an SPD Server SQL subsetting WHERE clause specifies specific months from a range of sales data, the WHERE planner checks the MIN and MAX variable list. Based on the MIN and MAX list information, the SPD Server WHERE planner includes or eliminates member tables in the dynamic cluster for evaluation.

Use the MINMAXVARLIST= table option with either numeric or character-based columns. MINMAXVARLIST= uses the list of columns that you submit to build a variable list. The MINMAXVARLIST= list contains only the minimum and maximum values for each column. The WHERE clause planner uses the index to filter SQL predicates quickly, and to include or eliminate member tables belonging to the cluster table from the evaluation.

Although the MINMAXVARLIST= table option is primarily intended for use with dynamic clusters, it also works on standard SPD Server tables. MINMAXVARLIST= can help reduce the need to create many indexes on a table, which can save valuable resources and space.

The MINMAXVARLIST= table option is available only when a table is being created or defined. If a table has a MINMAXVARLIST= variable list, moving or copying the table will destroy the variable list unless MINMAXVARLIST= is specified in the table output.

%let domain=path3 ;
%let host=kaboom ;
%let port=5201 ;

LIBNAME &domain sasspds "&domain"
   server=&host..&port
   user='anonymous' ;

/* Create three tables called */
/* xy1, xy2, and xy3.         */

data &domain..xy1(minmaxvarlist=(x y));
  do x = 1 to 10;
  do y = 1 to 3;
  output;
  end;
end;
run;

data &domain..xy2(minmaxvarlist=(x y));
  do x = 11 to 20;
  do y = 4 to 6 ;
  output;
  end;
end;
run;

data &domain..xy3(minmaxvarlist=(x y));
  do x = 21 to 30;
  do y = 7 to 9 ;
  output;
  end;
end;
run;


/* Create a dynamic cluster table */
/* called cluster_table out of    */
/* new tables xy1, xy2, and xy3   */

PROC SPDO library=&domain ;
   cluster create cluster_table
      mem=xy1
      mem=xy2
      mem=xy3;
      
quit;


/* Enable WHERE evaluation to see  */
/* how the SQL planner selects     */
/* members from the cluster. Each  */
/* member is evaluated using the   */
/* min-max variable list.          */

%let SPDSWDEB=YES;


/* The first member has true rows  */

PROC PRINT data=&domain..cluster_table ;
   where x eq 3 and y eq 3;
run;


/* Examine the other tables */

PROC PRINT data=&domain..cluster_table ;
   where x eq 19
   and y eq 4 ;
run;

PROC PRINT data=&domain..cluster_table ;
   where x eq 22
   and y eq 9;
run;

PROC PRINT data=&domain..cluster_table ;
   where x between 1 and 10
   and y eq 3;
run;

PROC PRINT data=&domain..cluster_table ;
   where x between 11 and 30
   and y eq 8 ;
run;


/* Delete the dynamic cluster table. */


PROC SPDO library=&domain ;
   cluster undo cluster_table ;
quit;

PROC DATASETS lib=&domain nolist;
   delete xy1 xy2 xy3 ;
quit ;