Maintaining OLAP Groups and OLAP Summary Data Stores

Summarizing Data Using SAS/Warehouse Administrator

Generating the Appropriate OLAP Summary Data

The type and number of summary data stores you create depends largely on the data access patterns. That is, there are a wide variety of strategies that you can use ranging from a single OLAP Table with one crossing, or a single OLAP MDDB with multiple crossings, to a proxy MDDB with several associated OLAP MDDBs and OLAP Tables, some of which might reside on an external DBMS.

To facilitate the different strategies required for OLAP reporting, an OLAP Group can be one of the following types, which depends on whether you intend to store summary data in OLAP Tables, OLAP MDDBs, or both:

HOLAP	supports a hybrid OLAP solution that combines the best features of both ROLAP and MOLAP. HOLAP provides access to diverse data sources on local and remote servers. An OLAP Group of type HOLAP groups both OLAP Tables and OLAP MDDBs, which together represent the data for one OLAP application. When an OLAP Group of type HOLAP is specified as an output data store in a Process Editor Job, SAS/Warehouse Administrator generates a proxy MDDB, which is a physical file that represents the structure of the data in an OLAP Group. The proxy MDDB can be used by SAS/EIS software to provide more efficient access to multiple OLAP Tables and OLAP MDDBs.
MOLAP	supports OLAP performed on a multidimensional database, such as a SAS MDDB. SAS/Warehouse Administrator supports MOLAP with an OLAP Group of type MOLAP. Such a group is a grouping mechanism intended to contain only OLAP MDDBs. Multiple MDDBs can be contained in the group, but each MDDB generally represents the data for separate OLAP applications.
ROLAP	supports OLAP performed on a relational database, such as a SAS table or a DBMS table. SAS/Warehouse Administrator supports ROLAP with an OLAP Group of type ROLAP. Such a group is a grouping mechanism intended to contain only OLAP Tables. Multiple OLAP Tables can be contained in the group, but each table generally represents the data for separate OLAP applications. Note: SAS/EIS software does not support ROLAP.
MIXED	groups both OLAP Tables and OLAP MDDBs. Unlike HOLAP, the summary data stores in a MIXED group do not have to be used together. For example, you might choose the MIXED type if you do not want to define several OLAP Groups, with each having only one OLAP Table or OLAP MDDB.

Choosing the Appropriate Data Store

In SAS/Warehouse Administrator, summary data is stored in OLAP Tables and OLAP MDDBs. These are the physical storage units that will contain the derived values for crossings and statistic columns. Here are some considerations when you are determining the type of OLAP summary data store to create:

If the amount of summary data that you want to store in one data store is very large, consider using an OLAP Table. The amount of data that is considered very large will depend on the storage and processing resources available at your site. The amount could be anywhere from 500 megabytes to 2 gigabytes.
If you have a specific tool that can access only relational DBMS tables, and you want to use this tool to access your summary data, store that data in an OLAP Table that is a DBMS table.
To create an OLAP MDDB, SAS/MDDB Server software must be licensed on the machine where the OLAP MDDB will be stored.
SAS MDDBs use less storage space than SAS or DBMS tables.
To access data stored in an OLAP MDDB, use the MDDB viewer or a multidimensional SAS/EIS or Web EIS application. (Open the DIR window and type S or B in front of the MDDB name to view the MDDB's header information or data.) If you want to use other SAS software features to access the data in your summary tables, then use OLAP Tables to store the data.
Existing records in an MDDB can be updated. However, if you add or drop any statistics or analysis columns (if the metadata definition of the table changes), then the table must be refreshed (rebuilt). If the table is not refreshed, the changes will be ignored.

Assigning OLAP Summary Roles and Defining OLAP Structure

In addition to defining the physical properties of the OLAP Group and the summary data store, you must also

assign the OLAP summary roles, which determine how the columns are used in the summarization process. The summary roles are class columns, statistic columns, and ID columns.
specify the logical structure of the data, which is how the data is to be used by an OLAP report. The structure definitions are the OLAP Cube, dimensions, and hierarchies.

The summary roles and structure definitions include the following:

class column

is an OLAP summary role that is a numeric or character column used to group data into subpopulations. The values for each class column define groups for analysis. That is, the rows in the detail data store are grouped according to the values of the column, and a separate analysis is run for each group. Class columns typically have a relatively small number of discrete values that define the classification levels of the column.

For example, if columns state and county are class columns, you can order the columns so that states come first, and SAS/Warehouse Administrator will summarize data for each county within each state.

Each class column has an associated sort order. The following sort orders are supported:

ASCENDING	for OLAP Tables and OLAP MDDBs, sorts in ascending order by unformatted value. This is the default.
ASCFORMATTED	for OLAP Tables and OLAP MDDBs, sorts in ascending order by formatted value.
DESCENDING	for OLAP MDDBs, sorts in descending order by unformatted value.
DESFORMATTED	for OLAP MDDBs, sorts in descending order by formatted value.
DSORDER	for OLAP Tables and OLAP MDDBs, sorts in the order that the values occur in the input source.

statistic column

is an OLAP summary role that is a numeric column for storing computed summary statistics, which are the results of the analysis. Values for an input column (analysis column) are used to compute the output summary statistic, which then become the values for the statistic column in the summary data store.

For example, you could add a column named minsales, assign it as a statistic column using the MIN statistic, then define a Mapping process to compute the derived statistic from an analysis column like sales to the statistic column minsales.

Each statistic column has a specific keyword associated with it that specifies which statistic to compute. The following statistics are supported:

SUM	is the sum of nonmissing values for the column. This is the default.
MIN	is the smallest value for the column.
MAX	is the largest value for the column.
N	is the number of rows for the column having nonmissing values.
NMISS	is the number of rows in the column having missing values.
USS	is the uncorrected sum of squares.

ID column

is an OLAP summary role to include additional columns in the summary data store. You can specify ID columns to an OLAP Table only; it is not supported for an OLAP MDDB.

OLAP Cube

represents the logical relationships (dimensions and hierarchies) of the OLAP data so that you can run an OLAP report.

For HOLAP, the cube is associated with the OLAP Group and is registered in SAS/EIS software as associated with the proxy MDDB. The result is one OLAP Cube describing the relationships of all the class columns. For MOLAP and ROLAP, there is normally one cube associated with each OLAP MDDB and OLAP Table.

Note: You can decide not to have an OLAP Cube; however, in your OLAP application, you will not be able to drill down without manual intervention. If your intention is to report directly from OLAP data using an OLAP application, then you should define an OLAP Cube. [cautionend]

dimension

groups related class columns, which are organized as hierarchies. For example, you could organize sales data into dimensions Geography, Time, and Product. The Time dimension could include multiple hierarchies, such as Time-by-Week and Time-by-Month, which provide different paths in order to drill down to increasing levels of detail.

hierarchy

is a unique, ordered list of class columns that specifies related data and is a member of a dimension. Each hierarchy provides a navigational path in order to drill down to increasing levels of detail. For example, for a dimension named Time, you could define a hierarchy named Time-by-Month that consists of the class columns year, month, and date.

Note: The term hierarchy as used here is not the same as a stored summary level, which is a crossing. [cautionend]

crossing

is a unique list of one or more class columns that defines a summarization level (subtable) to be stored in one or more OLAP summary data stores. That is, a crossing represents a grouping on which summary statistics are calculated. You must have at least one crossing for an OLAP Table or an OLAP MDDB, and both summary data stores can have multiple crossings. All class columns must be in at least one crossing.

A crossing represents the physically stored data, which provides the quick response when displaying a report in an OLAP application. For each crossing, there is a single record for each unique combination of values of all named class columns in the original raw input data. Note that too many crossings result in lots of initial summarization time and disk space for storage, while too few crossings result in slower processing and reporting time for end users.

You can define an NWAY crossing, which is the most detailed type of crossing. An NWAY crossing consists of all the assigned class columns.

Note the following:

An OLAP MDDB must have an NWAY crossing, and it must be named NWAY if you intend on using the SAS/Warehouse Administrator code generator.
For an OLAP Table, you can define a crossing and not specify any class columns in it. The result is summary statistics across all rows in the input source.

When the Process Editor Job for an OLAP object (OLAP Group, OLAP Table, or OLAP MDDB) is executed, SAS/Warehouse Administrator loads the summary data store with the derived, summarized data. As each crossing is accumulated, all records of the input data are sorted into groups by the values of the class columns specified for the crossing. Each group represents a specific subpopulation. Within each group, the summary statistics are derived from the analysis columns in the input data and stored in the summary data store statistic columns. A single record is written to the crossing for each subpopulation.

Top of Page