IMSTAT Procedure

SCHEMA Statement

The SCHEMA statement is used to define a simple star schema in the server from a single fact table and one or more dimension tables.

Example:

Creating a Star Schema

Syntax

Required Argument

SCHEMA Statement Options

Details

Assumptions

About Views and Tables

Syntax

SCHEMA dim-specification1 <dim-specification2 ...> </ options>;

Required Argument

dim-specification

specifies how to use the dimension table with the fact table. You must specify the variables to use as keys for the fact table (fact-key) and the dimension table (dim-key). The variables do not need to have the same name, but they do need to have the same type.

dim-table-name (fact-key = dim-key </ <PREFIX = dim-prefix>
<TAG='server-tag'> <, variable-list>>)

dim-table-name

specifies the name of the dimension table.

fact-key

specifies the variable name in the fact table to use.

dim-key

specifies the variable name from the dimension table to use.

PREFIX=dim-prefix

specifies a prefix to use for naming variables in the schema. If you do not specify PREFIX=, then up to the first sixteen characters of the dim-table-name are used as the dimension prefix for naming the variables in the schema.

Alias

NAME=

TAG='server-tag'

specifies the server tag to use for identifying the dimension table.

variable-list

specifies the variables from the dimension table to join with the fact table. By default, all variables except the dimension key are transferred from the dimension table. The dimension key is never transferred because a corresponding value is available through the fact-key.

SCHEMA Statement Options

MODE=VIEW | TABLE

specifies whether the rows of the schema are materialized when the statement executes in the server. The default is MODE=VIEW and implies that the server resolves the relations in the tables but defers the resolution (formation) of the rows until the view is accessed. If you specify MODE=TABLE, then the table is resolved (flattened) when the statement executes. A view consumes much fewer resources (almost none), but data access is slower compared to a flattened table.

Default

VIEW

SAVE=table-name

saves the result table so that you can use it in other IMSTAT procedure statements like STORE, REPLAY, and FREE. The value for table-name must be unique within the scope of the procedure execution. The name of a table that has been freed with the FREE statement can be used again in subsequent SAVE= options.

TEMPEXPRESS="SAS-expressions"

TEMPEXPRESS=file-reference

specifies either a quoted string that contains the SAS expression that defines the temporary variables or a file reference to an external file with the SAS statements.

Alias

TE=

TEMPNAMES=variable-name

TEMPNAMES=(variable-list)

specifies the list of temporary variables for the request. Each temporary variable must be defined through SAS statements that you supply with the TEMPEXPRESS= option.

Alias

TN=

Details

Assumptions

About Views and Tables

Assumptions

The server makes the following assumptions with regard to fact and dimension tables:

Dimension tables do not have repeat values for the dimension keys. If a key value appears multiple times, the first record that matches the key is used.
The relation between the fact table and a dimension table is expressed by one pair of keys. That is, one variable in the fact table defines the relation to one variable in the dimension table.
The variable names of keys in the fact table and dimension tables do not have to be the same.
The look ups for the keys are performed based on raw values with fuzzing for doubles.
If the fact table or a dimension table contains pre-levelized CLASS variables, the class-variable signature is removed when the schema is created.
Partitioning and order-by information is preserved when the schema is created. However, only partitioning of the fact table is taken into consideration and the resulting table or view is partitioned by the same variables as the fact table.
The relations are resolved when the schema is created. This strategy makes passes through the data more efficient.

About Views and Tables

When the SCHEMA statement executes, a temporary table is generated and the name of that temporary table is returned to the client as the result set. You use the &_TEMPLAST_ macro variable to refer to the star schema.

By default, the server creates a view from the schema definition. The temporary table then has columns for all variables in the schema. The relations have been resolved, but the rows of the view have not been formed.

You can request that row resolution takes place when the temporary table is formed. The result is a flattened temporary table where the rows of the schema are materialized.

There are advantages and disadvantages to using views and flattened tables. The primary consideration is whether there is enough memory for the data volume. The following list identifies some of the considerations:

A flattened table consumes memory when the statement executes. If the memory requirement of the fully joined schema exceeds the capacity of the machine, the statement fails. For example, if you intend to work with relations that expand to seven terabytes of memory, then you cannot flatten the table unless you have that much memory on your system.
If a flattened table can be held in memory, data access is faster because it is a regular in-memory table.
A view does not consume memory until it is accessed. At that time, the table is never materialized fully in memory. The joined rows are formed only when a buffer is needed. This enables you to work with views that exceed the memory capacity of the system.
The performance difference between resolving a view at run time and accessing a flattened table is difficult to quantify. It depends, for example, on the number of columns to resolve and the data access pattern. A request that passes through the data multiple times suffers a greater performance hit (compared to a flat table) than a single-pass request.

Some operations are not supported with views (but are supported with materialized schemas):

You cannot append tables or rows to a view.
You cannot perform row updates of the view.
You cannot re-partition a view.
You cannot use a view in another schema.

If a view is based on a partitioned fact table and you want to change the partition key, then re-partition the fact table and re-create the view with another SCHEMA statement.

A view is static. For example, if you append rows to the fact table, the append operation succeeds and every new access to the fact table can use the appended rows. However, the view is not affected by the addition of rows to the fact table. The view resolves to the state of the fact table when the view was formed.

If you want a schema to change with appends and updates, then you can materialize it and then append or update the flattened table. Likewise, you can append or update the fact table and dimension tables, drop the view, and re-create it.

Using a view as the fact table or as a dimension table in a SCHEMA statement is not supported.