Previous Page | Next Page

Maintaining OLAP Groups and OLAP Summary Data Stores

Summarizing Data Using SAS/Warehouse Administrator


Generating the Appropriate OLAP Summary Data

The type and number of summary data stores you create depends largely on the data access patterns. That is, there are a wide variety of strategies that you can use ranging from a single OLAP Table with one crossing, or a single OLAP MDDB with multiple crossings, to a proxy MDDB with several associated OLAP MDDBs and OLAP Tables, some of which might reside on an external DBMS.

To facilitate the different strategies required for OLAP reporting, an OLAP Group can be one of the following types, which depends on whether you intend to store summary data in OLAP Tables, OLAP MDDBs, or both:

HOLAP

supports a hybrid OLAP solution that combines the best features of both ROLAP and MOLAP. HOLAP provides access to diverse data sources on local and remote servers. An OLAP Group of type HOLAP groups both OLAP Tables and OLAP MDDBs, which together represent the data for one OLAP application.

When an OLAP Group of type HOLAP is specified as an output data store in a Process Editor Job, SAS/Warehouse Administrator generates a proxy MDDB, which is a physical file that represents the structure of the data in an OLAP Group. The proxy MDDB can be used by SAS/EIS software to provide more efficient access to multiple OLAP Tables and OLAP MDDBs.

MOLAP

supports OLAP performed on a multidimensional database, such as a SAS MDDB. SAS/Warehouse Administrator supports MOLAP with an OLAP Group of type MOLAP. Such a group is a grouping mechanism intended to contain only OLAP MDDBs. Multiple MDDBs can be contained in the group, but each MDDB generally represents the data for separate OLAP applications.

ROLAP

supports OLAP performed on a relational database, such as a SAS table or a DBMS table. SAS/Warehouse Administrator supports ROLAP with an OLAP Group of type ROLAP. Such a group is a grouping mechanism intended to contain only OLAP Tables. Multiple OLAP Tables can be contained in the group, but each table generally represents the data for separate OLAP applications.

Note:   SAS/EIS software does not support ROLAP.  [cautionend]

MIXED

groups both OLAP Tables and OLAP MDDBs. Unlike HOLAP, the summary data stores in a MIXED group do not have to be used together. For example, you might choose the MIXED type if you do not want to define several OLAP Groups, with each having only one OLAP Table or OLAP MDDB.


Choosing the Appropriate Data Store

In SAS/Warehouse Administrator, summary data is stored in OLAP Tables and OLAP MDDBs. These are the physical storage units that will contain the derived values for crossings and statistic columns. Here are some considerations when you are determining the type of OLAP summary data store to create:


Assigning OLAP Summary Roles and Defining OLAP Structure

In addition to defining the physical properties of the OLAP Group and the summary data store, you must also

The summary roles and structure definitions include the following:

class column

is an OLAP summary role that is a numeric or character column used to group data into subpopulations. The values for each class column define groups for analysis. That is, the rows in the detail data store are grouped according to the values of the column, and a separate analysis is run for each group. Class columns typically have a relatively small number of discrete values that define the classification levels of the column.

For example, if columns state and county are class columns, you can order the columns so that states come first, and SAS/Warehouse Administrator will summarize data for each county within each state.

Each class column has an associated sort order. The following sort orders are supported:

ASCENDING

for OLAP Tables and OLAP MDDBs, sorts in ascending order by unformatted value. This is the default.

ASCFORMATTED

for OLAP Tables and OLAP MDDBs, sorts in ascending order by formatted value.

DESCENDING

for OLAP MDDBs, sorts in descending order by unformatted value.

DESFORMATTED

for OLAP MDDBs, sorts in descending order by formatted value.

DSORDER

for OLAP Tables and OLAP MDDBs, sorts in the order that the values occur in the input source.

statistic column

is an OLAP summary role that is a numeric column for storing computed summary statistics, which are the results of the analysis. Values for an input column (analysis column) are used to compute the output summary statistic, which then become the values for the statistic column in the summary data store.

For example, you could add a column named minsales, assign it as a statistic column using the MIN statistic, then define a Mapping process to compute the derived statistic from an analysis column like sales to the statistic column minsales.

Each statistic column has a specific keyword associated with it that specifies which statistic to compute. The following statistics are supported:

SUM

is the sum of nonmissing values for the column. This is the default.

MIN

is the smallest value for the column.

MAX

is the largest value for the column.

N

is the number of rows for the column having nonmissing values.

NMISS

is the number of rows in the column having missing values.

USS

is the uncorrected sum of squares.

ID column

is an OLAP summary role to include additional columns in the summary data store. You can specify ID columns to an OLAP Table only; it is not supported for an OLAP MDDB.

OLAP Cube

represents the logical relationships (dimensions and hierarchies) of the OLAP data so that you can run an OLAP report.

For HOLAP, the cube is associated with the OLAP Group and is registered in SAS/EIS software as associated with the proxy MDDB. The result is one OLAP Cube describing the relationships of all the class columns. For MOLAP and ROLAP, there is normally one cube associated with each OLAP MDDB and OLAP Table.

Note:   You can decide not to have an OLAP Cube; however, in your OLAP application, you will not be able to drill down without manual intervention. If your intention is to report directly from OLAP data using an OLAP application, then you should define an OLAP Cube.  [cautionend]

dimension

groups related class columns, which are organized as hierarchies. For example, you could organize sales data into dimensions Geography, Time, and Product. The Time dimension could include multiple hierarchies, such as Time-by-Week and Time-by-Month, which provide different paths in order to drill down to increasing levels of detail.

hierarchy

is a unique, ordered list of class columns that specifies related data and is a member of a dimension. Each hierarchy provides a navigational path in order to drill down to increasing levels of detail. For example, for a dimension named Time, you could define a hierarchy named Time-by-Month that consists of the class columns year, month, and date.

Note:   The term hierarchy as used here is not the same as a stored summary level, which is a crossing.  [cautionend]

crossing

is a unique list of one or more class columns that defines a summarization level (subtable) to be stored in one or more OLAP summary data stores. That is, a crossing represents a grouping on which summary statistics are calculated. You must have at least one crossing for an OLAP Table or an OLAP MDDB, and both summary data stores can have multiple crossings. All class columns must be in at least one crossing.

A crossing represents the physically stored data, which provides the quick response when displaying a report in an OLAP application. For each crossing, there is a single record for each unique combination of values of all named class columns in the original raw input data. Note that too many crossings result in lots of initial summarization time and disk space for storage, while too few crossings result in slower processing and reporting time for end users.

You can define an NWAY crossing, which is the most detailed type of crossing. An NWAY crossing consists of all the assigned class columns.

Note the following:

  • An OLAP MDDB must have an NWAY crossing, and it must be named NWAY if you intend on using the SAS/Warehouse Administrator code generator.

  • For an OLAP Table, you can define a crossing and not specify any class columns in it. The result is summary statistics across all rows in the input source.

When the Process Editor Job for an OLAP object (OLAP Group, OLAP Table, or OLAP MDDB) is executed, SAS/Warehouse Administrator loads the summary data store with the derived, summarized data. As each crossing is accumulated, all records of the input data are sorted into groups by the values of the class columns specified for the crossing. Each group represents a specific subpopulation. Within each group, the summary statistics are derived from the analysis columns in the input data and stored in the summary data store statistic columns. A single record is written to the crossing for each subpopulation.

Previous Page | Next Page | Top of Page