Optimizing WHERE Clauses

Overview of Optimizing WHERE Clauses

SPD Server includes advanced methods to optimize WHERE clauses. Before SAS SPD Server 4.0, the rule-based, heuristic WHERE clause planner WHINIT was used to manually tune queries for performance. Now the server provides dynamic WHERE clause costing, which is an automatic feature that can replace the need to manually tune queries. The server’s dynamic WHERE-costing uses factors of cardinality and distribution to calculate relative processor costs of various WHERE clause options. The server administrators can set server parameter commands in the spdsserv.parm parameter file, or users can set macro variables to turn dynamic WHERE-costing on and off. If dynamic WHERE-costing is turned off, the server reverts to using the rules-based WHERE clause planner.

WHERE Clause Definitions and Terminology

WHERE clauses

selection criteria for a query that specify one or more Boolean predicates. Implementing the criteria, the server selects only rows that satisfy the WHERE clause.

Predicates

the building blocks of WHERE clauses. Use them stand-alone or combine them with the operators AND and OR to form complex WHERE clauses. Here is an example of a WHERE clause:

"where x > 1 and y in (1 2 3)"
In this example, there are two predicates, x > 1 and y in (1 2 3). You specify the negative of a predicate by using NOT. For example, use where x > 1 and not (y in (1 2 3)).

Boolean logic

determines whether two predicates joined with an AND or OR are true (satisfy the specification) or false (do not satisfy the specification). The AND operator requires that all predicates be true for the entire expression to be true. For example, the expression p1 AND p2 AND p3 is true only if all three predicates (p1, p2, and p3) are true. In contrast, the OR operator requires only one predicate to be true for the entire expression to be true.

For the WHERE clause (x < 5 or y in (1 2 3)) and z = 10, the following truth table describes the overall result (truth):
"x < 5 ?"    "y in (1 2 3) ?"    "z = 10 ?"    Result
=========    ================    ==========    ======
  False          False            False        False
  False          False            True         False
  False          True             False        False
  False          True             True         True
  True           False            False        False
  True           False            True         True
  True           True             False        False
  True           True             True         True

Indexes

structures associated with tables that permit the server to quickly access rows that satisfy an indexed predicate. In an example WHERE clause, where x = 10 and y > 11, the server selects the best index on column X to directly retrieve rows that have a value of 10 in the X column. If no index exists for X, the server must sequentially read each row in the table searching for X equal to 10.

Simple index

index on a single column.

Composite indexes

composite indexes index two or more columns. The list of columns in an index is sometimes called the index key.

Parallelism

the server capability that enables multiple threads to execute in parallel. Using multiple processors in parallel mode is sometimes called “divide and conquer” processing. The server uses parallelism to evaluate the multiple indexes that are involved in more complicated WHERE clauses.

Last updated: February 8, 2017