STARJOIN Facility Reference

Overview: SPD Server STARJOIN Facility

SPD Server provides an enhanced SQL planner that includes the STARJOIN facility. The SPD Server STARJOIN facility validates, optimizes, and executes SQL queries on data that is configured in a star schema. Star schemas consist of two or more normalized dimension tables that surround a centralized fact table. The centralized fact table contains data elements of interest that are derived from the dimension tables.

In data warehouses with large numbers of tables and millions or billions of rows of data, a properly constructed STARJOIN can minimize overhead data redundancy during query evaluation. If the SPD Server STARJOIN facility is not enabled, or if SPD Server SQL does not detect a star schema, then the SQL is processed using pairwise joins.

How does a STARJOIN differ from a pairwise join? In SPD Server, a properly configured STARJOIN requires only three steps to complete, regardless of the number of dimension tables. SPD Server pairwise joins require one step for each table to complete the join. If a star schema consists of 25 dimension tables and one fact table, the STARJOIN is accomplished in three steps; joining the tables in the star schema using pairwise joins requires 26 steps.

When data is configured in a valid SPD Server star schema, and the STARJOIN facility is not disabled, the SPD Server STARJOIN facility can produce quicker and more processor-efficient SQL query performance than SQL pairwise joins.

Star Schemas

Overview of Star Schemas

To exploit the power of the SPD Server STARJOIN facility, the data must be configured as a star schema, and it must meet specific SPD Server SQL star schema requirements.

Star schemas are the simplest data warehouse schema, consisting of a central fact table that is surrounded by multiple normalized dimension tables. Fact tables contain the measures of interest. Dimension tables provide detailed information about the attributes within each dimension. The columns in fact tables are either foreign key columns that define the links between the fact table and individual dimension tables, or they are columns that calculate numeric values that are based on foreign key data.

Figure 1 is an example of a simple star schema. The dimension tables Products, Supplier, Location, and Time surround the fact table Sales.

Example Star Schema

diagram of a star schema with one fact table SALES and four dimension tables PRODUCTS, LOCATION, TIME, SUPPLIER

The dimension tables, fact table, and keys in Figure 1 are used in the examples in this document.

Dimension Tables Information

Products is a table of products, with one row per unique product SKU. The row for each unique SKU contains information such as product name, height, width, depth, weight, pallet cube, and so on. The example Products table contains 1,500 rows.

Supplier is a table of the suppliers that supply the products. The row for each unique supplier contains information such as supplier name, address, state, contact representative, and so on. The example Supplier table contains 25 rows.

Location is a table of the stores selling the products. The row for each unique location contains information such as store number, store name, store address, store manager, store sales volume, and so on. The Location table contains 500 rows.

Time is a sequential sales transaction table. Each row in the Time table represents one day out of a rolling three-year, 365-day-per-year calendar. The row for each day contains information such as date, day of week, month, quarter, year, and so on. The Time table contains 1,095 rows.

Fact Table Information

The fact table Sales is a table that combines information from the four dimension tables, Products, Supplier, Location, and Time. Its foreign keys are imported, one from each dimension table: PRODUCT_CODE from Products, STORE_NUMBER from Location, SUPPLIER_ID from Supplier, and SALES_DATE from Time. The fact table Sales might have other columns with facts or information that are not found in any dimension table. Examples of fact table columns that are not foreign keys from a dimension table are columns such as QTY_SOLD or NET_SALES. The fact table in this example could contain as many as 1,500 x 25 x 500 x 1,095 = 20,531,250,000 rows.

SPD Server STARJOIN Requirements

For SPD Server SQL to take advantage of the STARJOIN planner, the following conditions must be true:

STARJOIN optimization must be enabled in SPD Server.
The SPD Server star schema must use a single central fact table.
All dimension tables in the SPD Server star schema must be connected to the fact table.
SPD Server dimension tables can appear in only one join condition.
SPD Server fact tables are equally joined to dimension tables.
SPD Server SQL infers fact tables by topology (common equally joined predicates).
Dimension tables that have no subsetting require a simple index on the dimension table's join column.

When SPD Server SQL is submitted that does not meet these STARJOIN conditions, SPD Server reverts to performing the requested SQL task using SPD Server's pairwise join strategy. The SPD Server STARJOIN Examples section of this document provides three examples that show valid, invalid, and restricted candidates for the SPD Server STARJOIN facility.

Enabling STARJOIN Optimization in SPD Server

SPD Server STARJOIN optimization is enabled by default. The SPD Server STARJOIN RESET Statement Options section provides detailed information about statement options that enable or disable the STARJOIN facility in SPD Server.

Invoking the SPD Server STARJOIN Facility

SPD Server knows when to use the STARJOIN facility because it is topology based. SPD Server invokes STARJOIN based on the SQL that is submitted. When SQL is submitted and STARJOIN optimization is enabled, SPD Server checks the submitted SQL for admissible STARJOIN patterns. SPD Server SQL identifies a fact table by scanning for a common equally joined table among multiple join predicates in a WHERE clause. When SPD Server SQL detects patterns that have multiple equally joined operators sharing a common table, the common table becomes the star schema's fact table.

When an SQL statement that is submitted to SPD Server uses structures that indicate the presence of a star schema, the STARJOIN validation checks begin.

SPD Server STARJOIN Optimization

Overview of STARJOIN Optimization

The SPD Server STARJOIN optimization process searches for the most efficient SQL strategy to use for computations. The STARJOIN optimization process consists of three steps, regardless of the number of dimension tables that are joined to the fact table in the star schema.

Classify dimension tables that are called by SQL as Phase I tables or Phase II tables.
Phase I probes fact table indexes and selects a STARJOIN strategy.
Phase II performs index lookups and joins subsetted fact table rows with Phase II tables.

Classify Dimension Tables That Are Called by SQL as Phase I Tables or Phase II Tables

After the STARJOIN planner validates the join sub-tree, join optimization begins. Join optimization is the process that searches for the most efficient SQL strategy to use when joining the tables in the star schema.

The first step in optimization is to examine the dimension tables that were called by SQL for structures that SPD Server can use to improve performance. Each dimension table is classified as a Phase I table or a Phase II table. The structure of a dimension table and whether the submitted SQL filters or subsets the table's contents determine its classification. SPD Server uses different processes to handle Phase I and Phase II dimension tables.

Phase I tables can improve performance. A Phase I table is a dimension table that is either very small (nine rows or less), or a dimension table whose SQL queries contain one or more filtering criteria that is expressed with a WHERE clause. A Phase II table is any dimension table that does not meet Phase I criteria. Rows in Phase II tables that are referenced in the SQL query are not subsetted.

Consider the star schema that is illustrated in Example Star Schema, with the fact table Sales and the dimension tables Products, Supplier, Location, and Time.

Suppose a submitted SQL query requests transaction reports from the fact table Sales for all stores where the location is the state of North Carolina, for the time period of the month of January, for all products, and for all suppliers. The SQL query subsets the Location and Time tables, so SPD Server classifies the Location and Time tables as Phase I tables. The query requests information from all of the rows in the Product and Supplier tables. Because those tables are not subsetted by a WHERE clause in the submitted SQL, STARJOIN classifies the Products and Supplier tables in this query as Phase II tables.

Now, using the same star schema, add more detail to the SQL query. Set up a new query that requests transaction reports from the fact table Sales for all stores where the location is the state of North Carolina, for the time period of the month of January, and for products where the supplier is from the state of North Carolina. The subsetted dimension tables Location, Time, and Supplier are classified as Phase I tables. The Products table, unfiltered by the submitted SQL query, is classified as a Phase II table.

Dimension tables are classified as Phase I or Phase II tables because the two types of tables require different index probe methods.

Phase I Probes Fact Table Indexes and Selects a STARJOIN Strategy

Phase I uses the SQL join keys from the subsetted Phase I dimension tables to get a smaller set of candidate rows to query in the central fact table. After optimizing the candidate rows in the fact table, the Phase I index probe examines index structures to determine the best STARJOIN strategy to use. There are two SPD Server STARJOIN strategies: the IN-SET strategy and the COMPOSITE strategy. In all but a few cases, the IN-SET strategy is the most robust and efficient processing strategy. The user can determine which strategy SPD Server chooses by providing the required table index types in the submitted SQL.

Phase I creates the smaller set of candidate rows in the central fact table by eliminating fact table rows that do not match the SQL join keys from the subsetted Phase I dimension tables. For example, if the SQL query requests information about transactions that occurred only in North Carolina store locations, the candidate rows that are retained in the fact table uses the SQL that subsets the Location dimension table:

WHERE location.STATE = "NC";

If the Sales fact table contains sales records for all 50 states, Phase I uses the SQL that subsets the Location dimension table to eliminate the sales records of all stores in states other than North Carolina from the fact table candidate rows. The example is simple, but powerful -- reducing the fact table candidate row set to transactions from only North Carolina stores eliminates massive amounts of nonproductive data processing.

The Phase I index probe inventories the number and types of indexes on the fact table and dimension tables as it attempts to identify the best STARJOIN strategy. To use the STARJOIN IN-SET strategy, Phase I must find simple indexes on all SQL join columns in the fact table and dimension tables. Otherwise, to use the STARJOIN COMPOSITE strategy, Phase I searches for the best composite index that is available on the fact table. The best composite index for the fact table is the composite index that spans the largest set of join predicates from the aggregated Phase I dimension tables.

Based on the fact table and dimension table index probe, SPD Server selects the STARJOIN strategy using the following logic:

If one or more simple indexes are found on fact table and dimension table SQL join columns, and no spanning composite indexes are found on the fact table, SPD Server selects the STARJOIN IN-SET strategy.
If an optimal spanning composite index is found on the fact table, and no simple indexes are found on fact table and dimension table SQL join columns, SPD Server selects the STARJOIN COMPOSITE strategy.
If both simple and spanning composite indexes are found, SPD Server generally selects the STARJOIN IN-SET strategy, unless the composite index is an exact match for all of the Phase I join predicates, and only lesser matches are available with the IN-SET strategy.
If no suitable indexes are found for either STARJOIN strategy, SPD Server does not use STARJOIN; it joins the sub-tree using the standard SPD Server pairwise join.

The IN-SET and COMPOSITE join strategies have some underlying differences.

The IN-SET join strategy will cache temporary Phase 1 probes in memory, when possible, for use by Phase 2. The caching can result in significant performance improvements by using in-memory lookups into the dimension tables for Phase 2 probes, rather than performing more costly file system probes of the dimension tables, which can result in significant performance improvements. The amount of memory allocated for Phase 1 IN-SET caching is controlled by the STARSIZE= server parameter. You can use the STARJOIN DETAILS option to see which partial results of the Phase 1 IN-SET strategy are cached, and whether sufficient memory was allocated for STARJOIN to cache all partial results.

The IN-SET join strategy uses an IN-SET transformation of dimension table metadata to produce a powerful compound WHERE clause to be used on the STARJOIN fact table. The "IN" part of the term "IN-SET" refers to an IN specification in the SQL WHERE clause. The IN-SET is the set of values that populate the contents of the SQL IN query expression. For example, in the following SQL WHERE clause, the cities Raleigh, Cary, and Clayton are the values of the IN-SET:

WHERE location.CITY in ("Raleigh", "Cary", "Clayton");

For the IN-SET strategy, Phase I dimension tables are subsetted, and then the resulting set of join keys form the SQL IN expression for the fact table's corresponding join column. You must have simple indexes on all SQL join columns in both the fact table and dimension tables before STARJOIN Phase I can select the IN-SET strategy.

If the dimension table Location has six rows for Raleigh, Cary, and Clayton, then six STORE_NUMBER values are applied to the IN-SET WHERE clause that is used to select the candidate rows from the central fact table. The STARJOIN IN-SET facility transforms the dimension table's CITY values into STORE_NUMBER values that can be used to select candidate rows from the Sales fact table. The transformed WHERE clause to be applied to the fact table might resemble the following code:

WHERE fact.STORE_NUMBER in
 (100,101,102,103,104,105,106);

You can use IN-SET transformations in a star schema that has any number of dimension tables and a fact table. Consider the following example dimension table subsetting statement:

WHERE location.CITY in
("Raleigh","Cary","Clayton")
 and Time.SALES_WEEK = 1;

Because the Sales fact table has no matching CITY column to join with the Location dimension table, and no matching SALES_WEEK column to join with the Time table, the IN-SET strategy uses transformations to create a WHERE clause that the Sales fact table can resolve:

WHERE fact.STORE_NUMBER in
 (100,101,102,103,104,105,106)
and Time.SALES_DATE in
 ('01JAN2005'd,'02JAN2005'd,'03JAN2005'd,
  '04JAN2005'd,'05JAN2005'd,'06JAN2005'd,
  '07JAN2005'd,);

The advantage of the STARJOIN facility is that it handles all of the transformations on a fact table, from dimension table subsetting to IN-SET WHERE clauses.

The COMPOSITE join strategy uses a composite index on the fact table to exhaustively probe the full Cartesian product of the combined join keys that is produced by the aggregated dimension table subsetting. SPD Server compares the composite indexes on the fact table to the theoretical composite index that is made from all of the join keys in the Phase I dimension tables. Phase I selects the best composite index on the fact table, based on the join requirements of the dimension tables.

A disadvantage of using the COMPOSITE join strategy is that when more than a few join keys exist, the Cartesian product map can become large geometric matrixes that can interfere with processing performance. You must have a composite index on the fact table that consists of Phase I dimension table join columns before STARJOIN Phase I can select the COMPOSITE join strategy.

If any Phase I dimension tables contain join predicates that do not have supporting simple or composite indexes on the fact table, those Phase I dimension tables are dropped from Phase I processing and are moved to the Phase II group.

Phase II Performs Index Lookups and Joins Subsetted Fact Table Rows with Phase II Tables

Phase I optimizes the join strategies between the Phase I dimension tables and the candidate rows from the fact table . After Phase I terminates, Phase II takes over. Phase II completes the indicated joins between the candidate rows from the fact table and the corresponding rows in the subsetted Phase I dimension tables. After completing the joins with the Phase I dimension tables, Phase II performs index lookups from the fact table to the Phase dimension II tables. Phase II dimension tables should have indexes created on all columns that join with the fact table.

When SPD Server completes the STARJOIN Phase I and Phase II tasks, the STARJOIN optimizations have been performed, the STARJOIN strategy has been selected, and the subsetted dimension tables and fact table joins are ready to run and produce the desired SQL results set.

Indexing Strategies to Optimize STARJOIN Query Performance

Overview of Indexing Strategies

Once the baseline criteria to create an SQL STARJOIN in SPD Server have been satisfied, you can configure indexing to influence which strategy the SPD Server STARJOIN facility chooses.

With the IN-SET strategy, the SPD Server STARJOIN facility can use multiple simple indexes on the fact table. The IN-SET strategy is the simplest to configure, and usually provides the best performance. To configure your work to choose the STARJOIN IN-SET strategy, create a simple index on each fact table and dimension table SQL column that you want to use in a join relation. Creating simple indexes prevents STARJOIN Phase I from rejecting a Phase I dimension table so that it becomes a non-optimized Phase II table. In addition, simple indexes facilitate the Phase II fact-table-to-dimension-table join lookup.

Consider the following SQL code for a star schema with one fact table and two dimension tables:

PROC SQL;
select F.FID, D1.DKEY, D2.DKEY
from fact F, DIM1 D1, DIM2 D2
where D1.DKEY EQ F.D1KEY
and D2.DKEY EQ F.D2KEY
and D1.REGION EQ 'Midwest'
and D2.PRODUCT EQ 'TV';

Indexing to Optimize the IN-SET Join Strategy

The SPD Server IN-SET join strategy is the preferred strategy for almost every STARJOIN. If you want the previous example code to trigger the IN-SET STARJOIN strategy, create simple indexes on the join columns for the star schema's fact table and dimension tables:

On the fact table F, create simple indexes on columns F.D1KEY and F.D2KEY.
On the dimension tables D1 and D2, create simple indexes on columns D1.DKEY and D2.DKEY.

Other fact table and dimension table indexes might exist that could filter WHERE clauses, but those simple indexes are the indexes that are needed to enable the STARJOIN IN-SET join strategy.

Indexing to Optimize the COMPOSITE Join Strategy

Using the COMPOSITE join strategy, the dimension tables with WHERE clause subsetting are collected from the set of equally joined predicates. A fact table composite index is needed for the fact table columns that correspond to the subsetted dimension table columns. The composite index on the fact table is necessary to facilitate the dimension tables' Cartesian product probes on the fact table rows. The STARJOIN optimizer code looks for the best composite index, which is based on the best and simplest left-to-right match of the columns in the COMPOSITE join.

If the subsetting in a STARJOIN is limited to a single dimension table, then the COMPOSITE join strategy can be enabled by creating a simple index on the join column of the single dimension table. That index is used to perform the Phase II index lookup on the fact table candidate rows. The fact table candidate row set is the result of the Phase I composite index probe.

For the previous example code to trigger the COMPOSITE STARJOIN strategy, create a composite index named COMP1 on the fact table for each of the dimension table keys: F.COMP1=(D1KEY D2KEY).

Other fact table and dimension table indexes might exist that could filter WHERE clauses, but the COMPOSITE index named COMP1 is the type of index that is needed to enable the STARJOIN COMPOSITE join strategy.

Although the COMPOSITE join strategy might appear to be a simpler configuration, the strongest utility of the COMPOSITE join strategy is limited to join relations between the fact table and dimension tables that are based on a Cartesian matrix of outcomes. As the number of dimension tables and join relations increases, the resulting Cartesian matrixes increase geometrically in size and can become unmanageable. The superior performance of the IN-SET strategy is so dramatic and robust that you should consider using the COMPOSITE join strategy only if you have good evidence that it compares favorably with the IN-SET strategy.

Example: Indexing Using the IN-SET Join Strategy

The example star schema in Example Star Schema has four dimension tables (Supplier, Products, Location, and Time) and one fact table (Sales) with simple indexes on the SUPPLIER_ID, PRODUCT_CODE, STORE_NUMBER, and SALES_DATE columns in the Sales fact table.

Consider the following SQL query to create a January sales report for an organization's North Carolina stores:

PROC SQL;
select
 sum(s.sales_amt) as sales_amt
 sum(s.units_sold) as units_sold
 s.product)code,
 t.sales_month

from
 spdslib.sales s,
 spdslib.supplier sup,
 spdslib.products p,
 spdslib.location l,
 spdslib.time t

where
 s.store_number = l.store_number
and s.sales_date = t.sales_date
and s.product_code = p.product_code
and s.supplier_id = sup.supplier_id
and l.state = 'NC'
and t.sales_date
 between '01JAN2005'd and '31JAN2005'd;

quit;

During optimization, the STARJOIN planner examines the WHERE clause subsetting in the SQL to determine which dimension tables qualify as Phase I tables and which are Phase II tables.

The WHERE clause subsetting of the STATE column of the Location dimension table (where ... l.state = 'NC') and the subsetting of the SALES_DATE column of the Time dimension table (where ... t.sales_date between '01JAN2005'd and '31JAN2005'd) cause SPD Server to process the Location and Time tables as Phase I tables. The remaining dimension tables Supplier and Products are processed as Phase II tables.

SPD Server STARJOIN uses the Phase I dimension tables to reduce the rows in the fact table to candidate rows that contain the matching criteria. The values in each dimension table key are used to create a list of values that meet the subsetting criteria of the fact table.

For example, the previous SQL query is intended to create a January sales report for stores located in North Carolina. Note that the WHERE clause in the SQL code joins the Location and Sales tables on the STORE_NUMBER column. Suppose that there are 10 unique North Carolina stores, with consecutively ordered STORE_NUMBER values that run from 101 to 110. When the WHERE clause is evaluated, the results will include a list of the 10 North Carolina store IDs that existed in January 2005.

With simple indexes on the fact table and dimension tables for the STORE_NUMBER column, STARJOIN chooses the IN-SET strategy. Subsetting the STATE column values to 'NC' enables STARJOIN to build the set of store numbers that are associated with North Carolina locations. STARJOIN can use the set of North Carolina store numbers to generate a where ... in SQL expression. SQL uses the where ... in expression to efficiently subset the candidate rows in the fact table before the final SQL expression evaluation.

In other words, STARJOIN uses a matrix of database relationships and index combinations to reorganize the SQL expression for more internal processing that can take advantage of the IN-SET join strategy. For the previous example code, the internal STARJOIN SQL reorganization resembles the following example code. The WHERE clause IN-SET statements for the STORE_NUMBER and TIME columns can be rapidly processed to subset the candidate rows in the Sales fact table. (The optimized code sections are highlighted.)

PROC SQL;
 select sum(s.sales_amt) as sales_amt
 sum(s.units_sold) as units_sold
s.product)code,
 t.sales_month

from spdslib.sales s,
 spdslib.supplier sup,
 spdslib.products p,
 spdslib.location l,
 spdslib.time t

 where s.store_number = l.store_number
 and s.sales_date = t.sales_date
 and s.product_code = p.product_code
 and s.supplier_id = sup.supplier_id
 and s.store_number in (101,102,103,104,105,106,107,108,109,110)
 and s.time
 in ('01JAN2005'd,'02JAN2005'd,'03JAN2005'd,
     '04JAN2005'd,      <...>, '28JAN2005'd,
     '29JAN2005'd,'30JAN2005'd,'31JAN2005'd);

quit;

After Phase I completes the candidate row optimization on the Sales fact table, Phase II processes the optimized query from the fact table outward. Phase II uses the values in the fact table's subsetted candidate rows to perform index lookups on the dimension tables' contents to complete the join in the most efficient manner.

SPD Server STARJOIN RESET Statement Options

Overview of STARJOIN Reset Statement Options

SPD Server recognizes several RESET statements that can configure or provide information about the STARJOIN facility in SPD Server SQL.

RESET NOSTARJOIN=[0/1]

The NOSTARJOIN option suppresses the use of the SPD Server STARJOIN optimizer in the planning and running of SQL statements that have valid STARJOIN patterns or star schemas. The statements NOSTARJOIN and NOSTARJOIN=1 are equivalent. When NOSTARJOIN is enabled, SPD Server ignores STARJOIN and uses pairwise joins to plan and run SQL statements. The default setting is NOSTARJOIN=0, meaning that in SPD Server, STARJOIN is enabled unless reset, and STARJOIN optimization occurs when SQL recognizes a valid SPD Server pattern or star schema.

RESET STARMAGIC=nnn

STARMAGIC is the STARJOIN counterpart to the SQL MAGIC number option. You can set magic numbers that direct STARJOIN to override internal heuristics, which results in enhanced join strategies. The STARMAGIC option uses bit flags to configure the STARJOIN code. You can select different controls by adding the values for the different bit flags in the following STARMAGIC set:

STARMAGIC Bit Flags

Value	Meaning
1	forces all dimension tables to be classified as Phase I tables.
2	obsolete; not used.
4	requires exact matches on the fact table composite index to meet STARJOIN Phase I conditions.
8	disables IN-SET join strategy. (Default setting is enabled.)
16	disables COMPOSITE join strategy. (Default setting is enabled.)

RESET DETAILS="stj$"

All internal STARJOIN debugging information is tied to RESET DETAILS="stj$". Issuing this statement displays available information as SPD Server attempts to validate a join sub-tree. The RESET DETAILS="stj$" statement is useful for debugging STARJOIN and SQL statements.

Example: STARJOIN RESET Statements

The following example connects to sasspds, and then issues the "stj$" RESET option to display all available information as SPD Server attempts to validate the join sub-tree for the submitted SQL on a star schema. The STARMAGIC=16 setting disables the STARJOIN COMPOSITE join strategy (STARJOIN COMPOSITE joins are enabled by default in SPD Server). The NOSTARJOIN=0 setting means that STARJOIN is enabled (or resets a disabled STARJOIN facility) and ensures that STARJOIN optimization occurs if SQL recognizes a valid SPD Server pattern or star schema. (The STARJOIN facility is enabled by default in SPD Server.)

After submitting the following SQL statements, the code disconnects from sasspds and quits:

    PROC SQL;
      connect to sasspds
        (dbq="star"
         server=sunburn.5007
         user='anonymous');

      execute (reset
         DETAILS="stj$"
         STARMAGIC=16
         NOSTARJOIN=0)

      by sasspds;

      execute (
         ...
         SQL statements
         ...);
      by sasspds;

      disconnect from sasspds;
    quit;

SPD Server STARJOIN Examples

Example 1: Valid SQL STARJOIN Candidate

The following code is an example of an SQL submission that SPD Server is able to use as a star schema. The submission is a valid candidate because:

a single central fact table, Sales, exists
the dimension tables Time, Products, Location, and Supplier all join with the fact table Sales
each dimension table appears in only one join condition
all dimension tables link to the fact table using equally joined operators

PROC SQL;
  create table Sales_Report as
  select a.STORE_NUMBER,
         b.quarter
         c.month,
         d.state,
         e.SUPPLIER_ID

  sum(a.total_sold) as tot_qtr_mth_sales
  from   Sales a,
         Time b,
         Products c,
         Location d,
         Supplier e

  where a.sales_date   = b.sales_date
    and a.STORE_NUMBER = d.store_number
    and a.PRODUCT_CODE = c.product_code
    and a.SUPPLIER_ID  = d.supplier_id
    and b.quarter in (3, 4)
    and c.PRODUCT_CODE in (23, 100)

  group by b.quarter,
           a.STORE_NUMBER,
           b.month;
  quit;

Example 2: Invalid SQL STARJOIN Candidate

The following code is an example of an SQL submission that SPD Server is not able to use as a star schema because no single central fact table can be identified. Changes to the previous code example are highlighted:

  PROC SQL;
  create table Sales_Report as
  select a.STORE_NUMBER,
         b.quarter
         c.month,
         d.state,
         e.SUPPLIER_ID

  sum(a.total_sold) as tot_qtr_mth_sales
  from   Sales a,
         Time b,
         Products c,
         Location d,
         Supplier e

  where a.sales_date   = b.sales_date
    and a.STORE_NUMBER = d.store_number
    and a.PRODUCT_CODE = c.product_code
    and c.SUPPLIER_ID  = d.supplier_id
    and b.quarter in (3, 4)
    and c.PRODUCT_CODE in (23, 100)

  group by b.quarter,
           a.STORE_NUMBER,
           b.month;
  quit;

SPD Server is not able to use the SQL submission in this example as a star schema This submitted code joins the dimension tables for Time, Products, and Location to the Sales table, but the table for Supplier is joined to the Sales table through the Products table. As a result, the topology does not define a single central fact table.

Example 3: STARJOIN Candidate with Created or Calculated Columns

The STARJOIN facility in SPD Server supports calculated or created columns. The following code is an example of an SQL submission that creates columns, but still uses STARJOIN optimization, if the central fact table and the dimension tables contain indexes on the join columns for the STARJOIN:

  PROC SQL;
  create table &Regional_Report as
  select case d.state
    when 1 then 'NC'
    when 2 then 'SC'
    when 3 then 'GA'
    when 4 then 'VA'
    else '  '

    end as state_abv,
    b.quarter,
    sum (a.tot_amt) as total_amt

  from wk_str_upd_t a,
       week_t b,
       location_t d,

  where a.we_dt       = b.we_dt
    and a.chn_str_nbr = d.chn_str_nbr
    and b.quarter     = 2

  group by d.state,
           b.quarter
    having d.state in (1,2,3,4);
  quit;

The highlighted code creates a column called state_abv. The SPD Server STARJOIN facility supports created columns if the appropriate indexes on the join columns exist in the fact table and dimension tables.