Aggregation Design

Overview

Efficient drilling or traversing of the cube data is a key factor in flexible and swift decision making and analysis. In order to maintain speed and consistency in reporting, data is usually pre-calculated or aggregated. An important factor in query performance is good aggregation design, which includes decisions about total storage space, available build time, storage location, and storage format.

Aggregation Size

When planning your data storage and design, it is helpful to approximate the size of aggregations. A basis for estimating aggregation size is the number of distinct values in a dimension level, otherwise known as cardinality. The other factor that determines aggregations size is density. Density is a measure of how many members of each dimension in an aggregation occur in combination with the members of the other dimensions (For example, there might not be sales of a specific product on a specific date). The total cube size, as well as the resources that are available for the cube build process, determine the build time that is needed. It is also important to note that build time should not exceed the cube update interval.
Aggregation size and available hardware influence your choices for aggregation partitioning. You can separate aggregations into multiple files. A reduced file size might accelerate OLAP server access time, particularly if multiple processors are available for multi-threaded processing. You can use either pre-aggregated summary tables, the cube's own efficient aggregation storage, or a combination of both. Using indexes on either storage type might increase query performance, while also increasing storage space and build time.

User Query Patterns and ARM Logging

Overview

In choosing the best aggregations to summarize and store with the cube, the most important factor to take into account is user query behavior. It is recommended that you start with an initial aggregation design that is based on a minimal set of aggregations or on your best assumptions about usage patterns.
After the cube is deployed to users, you can use the ARM (Application Response Management) logging capabilities of the SAS OLAP Server to collect data about the usage pattern and the performance of individual queries. You can analyze the collected data to find out which cube aggregations are used most, which aggregations you can safely eliminate without harming query performance, and which aggregations are often requested but don't exist in the cube. The ARM data provides all the information that you need in order to get the optimal query improvement for your build time and cube storage space.

Administering ARM Logs

When using ARM logs to generate aggregations for a cube, you must consider the administration and maintenance of the ARM logs. If you are using multiple load-balanced servers, you must plan for the naming of the ARM logs that these servers write to. Each load-balanced server that you use writes to a designated ARM log file. If the ARM log filename is the same for the different load-balanced servers, then the servers will write to the same ARM log simultaneously. This can produce an invalid ARM log. It is recommended that you provide a unique ARM log name for each load-balanced server to write ARM log data to. You can then select each ARM log file to analyze in the Aggregation Tuning dialog box.
Note: For further information about ARM logging, see the topic “ARM Logging” in the chapter “Administering SAS OLAP Servers” in the SAS Intelligence Platform: Application Server Administration Guide and the SAS Application Response Measurement (ARM) Reference.

Aggregation Performance Settings

When planning and creating aggregations for a cube, you can customize various performance settings for a single aggregation or for all aggregations in the cube. You can change the following aggregation-specific settings:
  • ASYNCINDEXLIMIT=
  • COMPACT_NWAY
  • COMPRESS | NOCOMPRESS
  • CONCURRENT=
  • DATAPATH=
  • INDEXPATH=
  • INDEXSORTSIZE=
  • MAXTHREADS=
  • INDEX|NOINDEX
  • PARTSIZE=
  • SEGSIZE=
Note: ASYNCINDEXLIMIT=, CONCURRENT=, INDEXSORTSIZE=, and MAXTHREADS= are available only in the PROC OLAP statement.
You can change these settings using either the OLAP procedure or in SAS OLAP Cube Studio. In SAS OLAP Cube Studio, you can apply these settings from these functions:
  • Cube Designer – Aggregations dialog box
  • Aggregation Tuning dialog box