The example star schema
in
Example Star Schema has four dimension tables (Supplier, Products, Location,
and Time) and one fact table (Sales). The schema has simple indexes
on the SUPPLIER_ID, PRODUCT_CODE, STORE_NUMBER, and SALES_DATE columns
in the Sales fact table.
Consider the following
SQL query to create a January sales report for an organization's
stores that are in North Carolina:
PROC SQL;
select
sum(s.sales_amt) as sales_amt
sum(s.units_sold) as units_sold
s.product)code,
t.sales_month
from
spdslib.sales s,
spdslib.supplier sup,
spdslib.products p,
spdslib.location l,
spdslib.time t
where
s.store_number = l.store_number
and s.sales_date = t.sales_date
and s.product_code = p.product_code
and s.supplier_id = sup.supplier_id
and l.state = 'NC'
and t.sales_date
between '01JAN2005'd and '31JAN2005'd;
quit;
During optimization,
the STARJOIN Planner examines the WHERE clause subsetting in the query
to determine which dimension tables are processed first.
The WHERE clause subsetting
of the STATE column of the Location dimension table (
where
... l.state = 'NC'
) and the subsetting of the
SALES_DATE column of the Time dimension table (
where ...
t.sales_date between '01JAN2005'd and '31JAN2005'd
)
cause SPD Server to process the Location and Time tables first. The
remaining dimension tables, Supplier and Products, are processed second.
The SPD Server STARJOIN
facility uses the first dimension tables to reduce the rows in the
fact table to candidate rows that contain the matching criteria. The
facility uses the values in each dimension table key to create a list
of values that meet the subsetting criteria of the fact table.
For example, the previous
SQL query is intended to create a January sales report for stores
located in North Carolina. The WHERE clause in the SQL code joins
the Location and Sales tables on the STORE_NUMBER column. Suppose
that there are 10 unique North Carolina stores, with consecutively
ordered STORE_NUMBER values that range from 101 to 110. When the WHERE
clause is evaluated, the results will include a list of the 10 North
Carolina store IDs that existed in January 2005.
Because the fact table
and dimension tables for the STORE_NUMBER column have simple indexes,
the STARJOIN facility chooses the IN-SET strategy. The facility subsets
the STATE column values to
'NC'
in
order to build the set of store numbers that are associated with North
Carolina locations. The STARJOIN facility can use the set of North
Carolina store numbers to generate an SQL
where ...
in
expression. SQL uses the
where ... in
expression
to efficiently subset the candidate rows in the fact table before
the final SQL expression is evaluated.