About Slowly Changing Dimensions

Slowly Changing Dimensions Defined

Slowly changing dimensions (SCD) is the name of a process that loads data into dimension tables. This data changes slowly, rather than changing on a time-based, regular schedule. The dimension tables are structured so that they retain a history of changes to their data. This record of data changes provides a basis for analysis.
As shown in the following diagram, dimension tables combine with fact tables to form star schemas. Fact tables store numeric events. Dimension tables store the detail data that describes the events. Key columns in the tables connect events to details. For example, a star schema might store product sales numbers in a fact table, and use dimension tables to store information about customers, suppliers, and retail locations.
You can use SAS Data Integration Studio to load data into star schemas and analyze data to extract knowledge from the star schema.
The Star Schema and SAS Data Integration Studio
The Star Schema and SAS Data Integration Studio
In SAS Data Integration Studio, the process of loading dimension tables takes place in the SCD Type 1 Loader and SCD Type 2 Loader transformations. Fact tables are loaded with the Lookup transformation.

Types of Slowly Changing Dimensions

The three most common types of slowly changing dimensions are defined as follows:
Type 1 SCD: no history of data changes
overwrites specified columns in dimension tables without retaining a history of changes. Type 1 SCD is useful for maintaining less-significant columns that are not used in historical analysis. In SAS Data Integration Studio, the SCD Type 1 Loader transformation performs Type 1 updates. You can use the SCD Type 2 Loader transformation to combine Type 1 and Type 2 updates in a single operation.
Type 2 SCD: full history of data changes
maintains multiple records for each business key in the dimension table. The latest entry is the current entry for that business key. Other rows comprise the historical record of data changes. New entries create new current rows. This comprehensive record of data changes is the primary purpose of the SCD Type 2 Loader transformation.
Type 3 SCD: limited history of data changes
maintains a limited history of changes using multiple columns for selected variables. For example, a Type 3 dimension table containing customer information has columns named New Postal Code, Old Postal Code, and Oldest Postal Code. Data is moved from column to column during the loading process. Type 3 SCD has less analytical value than Type 2 SCD.

Transformations That Support Slowly Changing Dimensions

SAS Data Integration Studio provides the following transformations that you can use to implement slowly changing dimensions:
SCD Type 1 Loader
inserts new rows, updates existing rows, and generates surrogate key values in a dimension table without maintaining a history of data changes. Each business key is represented by a single row in the dimension table.
SCD Type 2 Loader
inserts new and Type 2 rows, updates existing rows, and generates surrogate key values in a dimension table. At the same, it maintains a full history of data changes. Each business key is represented by a current row and 0 through n number of closed out rows. The closed out rows enable change analysis over time.
Compare Tables
detects differences between matching rows in specified columns in two tables. Outputs include changed, new, unchanged, and missing records tables. These outputs can be used as the basis for performing Type 1 or Type 2 updates in a dimension table.
Lookup
loads source data into fact tables and loads foreign keys from dimension tables, with configurable exception handling. The lookup process accesses dimension tables by using hash objects for optimal performance.
Key Effective Date
updates dimension tables based on changes to the business key, when change detection is unnecessary.
Surrogate Key Generator
generates unique key numbers for dimension tables in a manner that is similar but less feature-rich than the SCD Type 2 Loader transformation. Use the Surrogate Key Generator when key generation is the sole task that is required at that point in the job.

SCD Project Stages

The process for loading a star schema for slowly changing dimensions follows these general steps:
  1. Stage operational data. In this initial step you capture data and validate the quality of that data. Your staging jobs make use of the Data Validation transformation, along with other data quality transformations and processes.
  2. Load dimension tables. Data from the staging area is moved into the dimension tables of the star schema. Dimension tables are loaded before the fact table in order to generate the primary key values that are needed in the fact table.
  3. Load the fact table. In this final step, you run a job that includes the Lookup transformation. This job loads numerical columns from the staging area into the fact table. Then the Lookup transformation captures foreign key values from the dimension tables.