Aggregation Storage

Overview

When determining how to efficiently deliver data in a multidimensional cube, how data is stored is an important factor. Summarizing data with aggregations improves query response. Any combination of dimension levels can become a stored aggregation. Which aggregations are being stored has a direct effect on the SAS OLAP Server CPU usage, file I/O, and query response times. The aggregations that are being stored also affect cube build time and the absolute cube file size. Therefore, it is a trade-off between a single instance of resource use at cube build time and multiple instances of resource use at cube query time.

The aggregated data values for SAS OLAP cubes can be stored either with the cube in the cube's internal format or external to the cube in relational summary tables.

MOLAP

MOLAP aggregation storage is the cube-internal storage for aggregations. MOLAP aggregation tables are created as part of the cube creation step.

ROLAP

ROLAP aggregation storage is the cube-external summary tables. ROLAP summary tables need to be precalculated from the input data (using tools such as SQL or PROC MEANS/PROC SUMMARY) and made known to the cube at cube creation time.

MOLAP Aggregation Storage

SAS MOLAP aggregation storage maintains the cube data in the same table format as the format that is used by the SAS Scalable Performance Data (SPD) Engine. MOLAP aggregation storage takes advantage of key contraction and allows data access by using the cube's internal data representation directly.

MOLAP aggregation storage of SAS OLAP cubes has the same threading and scalability features as the files used by the SAS SPD Engine. The data and the index section of the files are stored in different physical files. This enables parallel access to the data and index sections. The data and index files themselves are stored in partitions, enabling parallel data retrieval within the same file. The partitions of the data and the index section can be distributed over multiple disc controllers, thus adding a further boost to the threaded and partitioned architecture by reducing contention and possible bottlenecks in the physical I/O.

ROLAP Aggregation Storage

ROLAP tables used in SAS OLAP cubes can be SAS data sets, SAS data views, or any tables or views accessible through a SAS engine. This extends the choice of available storage options for SAS OLAP cubes to include SPDE, SPDS, and any RDBMS product for which a SAS/ACCESS software product is available.

ROLAP aggregation tables must conform to the structure of the input data. The columns that feed the dimension levels must have the same column names and attributes that were used in the input data when loading the cube. In addition, all aggregations must be stored in fully de-normalized form. Here are some guidelines to make aggregation columns for measures available:

Each ROLAP aggregation table must include all columns for the cube's measures with stored statistics.
SAS OLAP cube aggregations store the following statistics: SUM, N, NMISS, USS, MIN, and MAX. Other available statistics are derived from the stored statistics by internal calculations. For example, in order to include a measure for the AVG statistic in your cube, you need to make columns available in your ROLAP aggregation tables that were generated by using SUM and N (count).

ROLAP data is stored in either a flat file or with a star schema. With ROLAP, each instance of slicing and dicing of data is part of an SQL query (or multiple SQL queries) and is comparable to a WHERE clause in the SQL statement.

ROLAP data requests can also run against the data that was used to create and load the cube, whether from a detail table or a star schema. The cube's input data can be used in place of the aggregation with the combination of the lowest level of all dimensions (often called NWAY or NWAY aggregation, which is a name borrowed from PROC MEANS/PROC SUMMARY where it denotes the combination of all CLASS variables).

Choosing MOLAP or ROLAP Aggregation Storage

MOLAP aggregation storage is optimized for SAS OLAP Server internal processing and has a minimal data-size footprint. It uses threaded, parallel data access and is tunable to any hardware environment. MOLAP aggregation storage is convenient because it doesn't require additional data management steps.

ROLAP aggregation storage enables you to use existing ROLAP schemas and reuse legacy SAS OLAP Server, SAS 8 HOLAP structures. ROLAP aggregation storage enables users to use database systems to store and serve cube aggregation data and to off-load data access, I/O, and roll-up.

A hybrid approach is possible. For example, users with existing ROLAP structures can build a “light” SAS OLAP cube with no additional stored aggregations and add MOLAP aggregations to further tune the cube performance.