Maintaining OLAP Groups and OLAP Summary Data Stores |
The type and number of summary data stores you create
depends largely on the data access patterns. That is, there are a wide variety
of strategies that you can use ranging from a single OLAP Table with one crossing,
or a single OLAP MDDB with multiple crossings, to a proxy MDDB with several
associated OLAP MDDBs and OLAP Tables, some of which might reside on an external
DBMS.
To facilitate the different strategies required for
OLAP reporting, an OLAP Group can be one of the following types, which depends
on whether you intend to store summary data in OLAP Tables, OLAP MDDBs, or
both:
HOLAP |
supports a hybrid OLAP solution that
combines the best features of both ROLAP and MOLAP. HOLAP provides access
to diverse data sources on local and remote servers. An OLAP Group of type
HOLAP groups both OLAP Tables and OLAP MDDBs, which together represent the
data for one OLAP application.
When an OLAP Group of type HOLAP is specified as an
output data store in a Process Editor Job, SAS/Warehouse
Administrator generates a proxy MDDB, which
is a physical file that represents the structure of the data in an OLAP Group.
The proxy MDDB can be used by SAS/EIS software to provide more efficient access
to multiple OLAP Tables and OLAP MDDBs. |
MOLAP |
supports OLAP performed on a multidimensional
database, such as a SAS MDDB. SAS/Warehouse Administrator supports
MOLAP with an OLAP Group of type MOLAP. Such a group is a grouping mechanism
intended to contain only OLAP MDDBs. Multiple MDDBs can be contained in the
group, but each MDDB generally represents the data for separate OLAP applications. |
ROLAP |
supports OLAP performed on a relational
database, such as a SAS table or a DBMS table. SAS/Warehouse
Administrator supports ROLAP with an OLAP Group of type
ROLAP. Such a group is a grouping mechanism intended to contain only OLAP
Tables. Multiple OLAP Tables can be contained in the group, but each table
generally represents the data for separate OLAP applications.
Note: SAS/EIS software does not support
ROLAP. |
MIXED |
groups both OLAP Tables and OLAP
MDDBs. Unlike HOLAP, the summary data stores in a MIXED group do not have
to be used together. For example, you might choose the MIXED type if you do
not want to define several OLAP Groups, with each having only one OLAP Table
or OLAP MDDB. |
In SAS/Warehouse Administrator, summary data is stored in OLAP
Tables and OLAP MDDBs. These are the physical storage units that will contain
the derived values for crossings and statistic columns. Here are some considerations
when you are determining the type of OLAP summary data store to create:
-
If the amount of summary data that you want to
store in one data store is very large, consider using an OLAP Table. The amount
of data that is considered very large will depend on the storage and processing
resources available at your site. The amount could be anywhere from 500 megabytes
to 2 gigabytes.
-
If you have a specific tool that can access only
relational DBMS tables, and you want to use this tool to access your summary
data, store that data in an OLAP Table that is a DBMS table.
-
To create an OLAP MDDB, SAS/MDDB Server software
must be licensed on the machine where the OLAP MDDB will be stored.
-
SAS MDDBs use less storage space than SAS or DBMS
tables.
-
To access data stored in an OLAP MDDB, use the
MDDB viewer or a multidimensional SAS/EIS or Web EIS application. (Open the
DIR window and type S
or B
in front of the MDDB
name to view the MDDB's header information or data.) If you want to use other
SAS software features to access the data in your summary tables, then use
OLAP Tables to store the data.
-
Existing records in an MDDB can be updated. However,
if you add or drop any statistics or analysis columns (if the metadata definition
of the table changes), then the table must be refreshed (rebuilt). If the
table is not refreshed, the changes will be ignored.
In addition to defining the physical properties of the OLAP Group
and the summary data store, you must also
-
assign the OLAP summary roles, which determine
how the columns are used in the summarization process. The summary roles are
class columns, statistic columns, and ID columns.
-
specify the logical structure of the data, which
is how the data is to be used by an OLAP report. The structure definitions
are the OLAP Cube, dimensions, and hierarchies.
The summary roles and structure definitions include
the following:
class column |
is an OLAP summary role that is a
numeric or character column used to group data into subpopulations. The values
for each class column define groups for analysis. That is, the rows in the
detail data store are grouped according to the values of the column, and a
separate analysis is run for each group. Class columns typically have a relatively
small number of discrete values that define the classification levels of the
column.
For example, if columns state
and county are class columns, you can order
the columns so that states come first, and SAS/Warehouse
Administrator will summarize data for each county within
each state.
Each class column has an associated sort order. The
following sort orders are supported:
ASCENDING |
for OLAP Tables and OLAP MDDBs, sorts
in ascending order by unformatted value. This is the default. |
ASCFORMATTED |
for OLAP Tables and OLAP MDDBs, sorts
in ascending order by formatted value. |
DESCENDING |
for OLAP MDDBs, sorts in descending
order by unformatted value. |
DESFORMATTED |
for OLAP MDDBs, sorts in descending
order by formatted value. |
DSORDER |
for OLAP Tables and OLAP MDDBs, sorts
in the order that the values occur in the input source. |
|
statistic column |
is an OLAP summary role that is a
numeric column for storing computed summary statistics, which are the results
of the analysis. Values for an input column (analysis column) are used to
compute the output summary statistic, which then become the values for the
statistic column in the summary data store.
For example, you could add a column named minsales, assign it as a statistic column using the
MIN statistic,
then define a Mapping process to compute the derived statistic from an analysis
column like sales to the statistic column minsales.
Each statistic column has a specific keyword associated
with it that specifies which statistic to compute. The following statistics
are supported:
SUM |
is the sum of nonmissing values for
the column. This is the default. |
MIN |
is the smallest value for the column. |
MAX |
is the largest value for the column. |
N |
is the number of rows for the column
having nonmissing values. |
NMISS |
is the number of rows in the column
having missing values. |
USS |
is the uncorrected sum of squares. |
|
ID column |
is an OLAP summary role to include
additional columns in the summary data store. You can specify ID columns to
an OLAP Table only; it is not supported for an OLAP MDDB. |
OLAP Cube |
represents the logical relationships
(dimensions and hierarchies) of the OLAP data so that you can run an OLAP
report.
For HOLAP, the cube is associated with the OLAP Group
and is registered in SAS/EIS software as associated with the proxy MDDB. The
result is one OLAP Cube describing the relationships of all the class columns.
For MOLAP and ROLAP, there is normally one cube associated with each OLAP
MDDB and OLAP Table.
Note: You can decide not to have
an OLAP Cube; however, in your OLAP application, you will not be able to drill
down without manual intervention. If your intention is to report directly
from OLAP data using an OLAP application, then you should define an OLAP Cube. |
dimension |
groups related class columns, which
are organized as hierarchies. For example, you could organize sales data into
dimensions Geography, Time, and Product. The Time dimension could include
multiple hierarchies, such as Time-by-Week and Time-by-Month, which provide
different paths in order to drill down to increasing levels of detail. |
hierarchy |
is a unique, ordered list of class
columns that specifies related data and is a member of a dimension. Each hierarchy
provides a navigational path in order to drill down to increasing levels of
detail. For example, for a dimension named Time, you could define a hierarchy
named Time-by-Month that consists of the class columns year, month, and
date.
Note: The term hierarchy as used
here is not the same as a stored summary level, which is a crossing. |
crossing |
is a unique list of one or more class
columns that defines a summarization level (subtable) to be stored in one
or more OLAP summary data stores. That is, a crossing represents a grouping
on which summary statistics are calculated. You must have at least one crossing
for an OLAP Table or an OLAP MDDB, and both summary data stores can have multiple
crossings. All class columns must be in at least one crossing.
A crossing represents the physically stored data, which
provides the quick response when displaying a report in an OLAP application.
For each crossing, there is a single record for each unique combination of
values of all named class columns in the original raw input data. Note that
too many crossings result in lots of initial summarization time and disk space
for storage, while too few crossings result in slower processing and reporting
time for end users.
You can define an NWAY crossing, which is the most detailed
type of crossing. An NWAY crossing consists of all the assigned class columns.
Note the following:
-
An OLAP MDDB must have an
NWAY crossing, and it
must be named NWAY if you intend on using the SAS/Warehouse
Administrator code generator.
-
For an OLAP Table, you can define a crossing and
not specify any class columns in it. The result is summary statistics across
all rows in the input source.
|
When the Process Editor Job for an OLAP object (OLAP
Group, OLAP Table, or OLAP MDDB) is executed, SAS/Warehouse
Administrator loads the summary data store with the derived,
summarized data. As each crossing is accumulated, all records of the input
data are sorted into groups by the values of the class columns specified for
the crossing. Each group represents a specific subpopulation. Within each
group, the summary statistics are derived from the analysis columns in the
input data and stored in the summary data store statistic columns. A single
record is written to the crossing for each subpopulation.
Copyright © 2012 by SAS Institute Inc., Cary, NC, USA. All rights reserved.