Maintaining OLAP Groups and OLAP Summary Data Stores |
Understanding Detail Data |
Detail data is information that is at or near the fact level in a database. It is the data that an OLAP application intends to analyze. In SAS/Warehouse Administrator, detail data is stored in
To understand what detail data consists of, consider an OLAP application to analyze the sales of toys for which the detail data represents toy sales transactions across several product lines and manufacturers. For the Toy Store Warehouse, a Subject named Toy Sales contains the Detail Logical Table named Sales Detail Grp, which is physically stored as a SAS SQL view named DATA.STAR. Sales Detail Grp Detail Logical Table shows some of the detail data in the Detail Logical Table.
Note: For an explanation of how the Detail Logical Table is created and a complete description of its contents, see Example: Creating a Detail Logical Table as a View to Multiple Detail Tables.
Sales Detail Grp Detail Logical Table
To understand the detail data, consider the following:
As a whole, the toy sales transactions form a population. For each transaction, specific information is captured about the context of that transaction including the time and location of the transaction. Information about the participants in the transaction is also part of the context, such as age, gender, and income of the purchaser, and promotional campaigns implemented by the vendor.
All of that information can be divided into subpopulations or classifications. The fields used to record that information are referred to as classification columns or class columns.
To be useful, the class columns need to break the population into a manageably small number of subpopulations. In order to make selection possible, several class columns with different granularity are often stored. For example, geographic classifications can be done at the Country, Region, State, City, and Zipcode level. The number of discrete values for any class column within the population is referred to as its cardinality.
The idea of OLAP is to explore a large population looking for trends in the measurements within or between many different subpopulations. There can be many different subpopulations defined by combinations of values of the class columns.
In addition to class columns, there are certain properties of a transaction that are measured and stored numerically. These are the measurements, which for the toy sales data are the number of toys sold, the cost to the seller, and the price paid by the buyer. The fields that store the raw data are referred to as analysis columns because they are what you want to analyze using OLAP techniques. When you combine the measurements across a number of individuals in a subpopulation, one or more statistic columns can be produced per analysis column. That is, the values for an analysis column are used to compute the output summary statistics, which become the values for the statistic column(s) in the summary data store. Typical statistics are sum, minimum, and maximum.
It is possible to store other information about a transaction that is neither classification nor measurement. These are referred to as identification columns or ID columns, because they are often used to further identify groups of transactions that are otherwise identified by class column values. For example, along with the class column state_code you could have the ID column state_name.
Determining Dimensions in the Data |
To perform OLAP reporting, the detail data must be summarized into multidimensional views, which is a way of looking at data that is organized into dimensions. A dimension acts as an index by which you can access facts according to the value (or values) you want. A dimension is a logical grouping of related columns.
For example, continuing with the Sales Detail Grp detail data, you could organize the data into these dimensions by grouping class columns based on related characteristics:
Determining Hierarchical Relationships within Dimensions |
After you have class columns grouped into dimensions, you should then determine whether there are hierarchical relationships among the class columns within each dimension. That is, if the class columns within a dimension have a hierarchical relationship, each hierarchy would provide a navigational path in order to drill down (or up) to increasing (or decreasing) levels of detail.
For example, for the Detail Logical Table Sales Detail Grp, assume that the OLAP application intends to explore the Time dimension (which groups class columns year, season, quarter, month, week, and date) in multiple ways. Therefore, three separate hierarchies can be defined as follows:
Determining Crossing(s) |
A crossing is a unique list of one or more class columns that defines a summarization level (subtable) to be stored in one or more OLAP summary data stores. That is, a crossing represents a grouping on which summary statistics are calculated. A crossing represents the physically stored data, which provides the quick response when displaying a report in an OLAP application. For each crossing, there is a single record for each unique combination of values of all named class columns in the original input detail data (population).
There are several methods you can use to help determine crossings. For example, to create a starting point for defining crossings, you can model the data using a spiral diagram. Using the toy sales hierarchies, the steps below create the spiral diagram shown in Spiral Diagram for Toy Sales Detail Data:
Spiral Diagram for Toy Sales Detail Data
First, draw an axis for each hierarchy. Then, place the class columns on the appropriate axes (working from the outside to the center) in ascending order of cardinality (number of unique values).
Note: The placement of axes in relation to each other can be significant. Try several arrangements to find one that works best. For example, you could arrange the axes in descending order of likelihood of use, in descending order of cardinality at the top dimensional level, or in descending order of number of levels.
Draw a spiral on the diagram to indicate a general ordering scheme for the columns. Start on the outside with the class column with the lowest cardinality, which is also the one most likely to be of interest to OLAP users, and draw a line from this column to an adjacent column. Continue spiraling in toward the center, as shown in Spiral Diagram for Toy Sales Detail Data.
Looking at the spiral diagram, you can produce an ordered list of class columns. Most likely, the more important class columns with the lowest cardinality are at the top and the most detailed class columns with high cardinality are at the bottom. For this example, the order would be:
From this list, you can develop a reasonable initial choice of crossings. Start with the entire list and successively drop the highest cardinality column in a "stair-step" fashion to form additional crossings as follows:
Year Country Company Gender G_Age G_Income Target Promo_Ty Season Quarter Region PClass Year Country Company Gender G_Age G_Income Target Promo_Ty Season Quarter Region Year Country Company Gender G_Age G_Income Target Promo_Ty Season Quarter Year Country Company Gender G_Age G_Income Target Promo_Ty Season Year Country Company Gender G_Age G_Income Target Promo_Ty Year Country Company Gender G_Age G_Income Target Year Country Company Gender G_Age G_Income Year Country Company Gender G_Age Year Country Company Gender Year Country Company Year Country Year
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.