Data extraction is an integral
part of all data warehousing projects. Data is often extracted on
a nightly or regularly scheduled basis from transactional systems
in bulk and transported to the data warehouse. Typically, all the
data in the data warehouse is refreshed with data extracted from the
source system. However, an entire refresh involves the extraction
and transportation of huge volumes of data and is very expensive in
both resources and time. With data volumes now doubling yearly in
some organizations, a mechanism known as change data capture (CDC)
is increasingly becoming the only viable way to deliver timely
information to the warehouse and make it available to decision
makers. CDC is the process of capturing changes made at the data source
and applying them throughout the enterprise. CDC minimizes the resources
required for ETL processes because it deals only with data changes.
The goal of CDC is to keep the source and target data synchronized. SAS
offers a number of CDC options.
-
Some databases (for example, Oracle 10g)
provide tables that contain only the changed records. These tables can be
registered in SAS Data Integration Studio and used in jobs to capture changes.
-
SAS Data Integration Studio allows
the user to determine changes and take appropriate action.
-
SAS has partnered with Attunity, a company that specializes
in CDC. Its Attunity Stream software provides agents that non-intrusively
monitor and capture changes to mainframe and enterprise data sources
such as VSAM, IMS, ADABAS, DB2, and Oracle. SAS Data Integration
Studio provides a dedicated transformation for Attunity.
The Attunity-based solution
does the following:
-
moves only the changes to the data
-
requires no window of operation
-
provides higher-frequency, reduced-latency
transfers. Multiple updates can occur each day, providing
a near-real-time, continuous flow of changes.
-
reduces the performance impact
of the following activities:
-
rebuilding of target table indexes
-
recovering from a process failure
that happens mid-stream
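
As a rough illustration of the general CDC idea described above, the following Python sketch compares two keyed snapshots of a source table, emits only the differences as change records, and applies them to a target while tracking a checkpoint so that an interrupted load can resume without starting over. It is a minimal sketch under stated assumptions, not how SAS Data Integration Studio or Attunity Stream work internally: the names `Change`, `detect_changes`, and `apply_changes`, the sequence-number checkpoint, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Row = Dict[str, object]


@dataclass(frozen=True)
class Change:
    """One change record (hypothetical layout, for illustration only)."""
    seq: int            # position in the change stream
    op: str             # "insert", "update", or "delete"
    key: object         # primary key of the affected row
    row: Optional[Row]  # new row image; None for deletes


def detect_changes(old: Dict[object, Row], new: Dict[object, Row]) -> List[Change]:
    """Compare two keyed snapshots and return only the differences."""
    changes: List[Change] = []
    seq = 1
    # Keys are assumed to be sortable so the change order is deterministic.
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            op, row = "insert", new[key]
        elif key not in new:
            op, row = "delete", None
        elif old[key] != new[key]:
            op, row = "update", new[key]
        else:
            continue  # unchanged rows are never moved
        changes.append(Change(seq, op, key, row))
        seq += 1
    return changes


def apply_changes(target: Dict[object, Row], changes: List[Change],
                  checkpoint: int = 0) -> int:
    """Apply changes newer than the checkpoint; return the new checkpoint."""
    for change in changes:
        if change.seq <= checkpoint:
            continue  # already applied before an earlier interruption
        if change.op == "delete":
            target.pop(change.key, None)
        else:
            target[change.key] = dict(change.row)
        checkpoint = change.seq
    return checkpoint


# Made-up sample data: yesterday's and today's state of a source table.
yesterday = {
    1: {"name": "Ann", "city": "Cary"},
    2: {"name": "Bob", "city": "Oslo"},
    3: {"name": "Cy", "city": "Rome"},
}
today = {
    1: {"name": "Ann", "city": "Cary"},
    2: {"name": "Bob", "city": "Bonn"},
    4: {"name": "Di", "city": "Lima"},
}

changes = detect_changes(yesterday, today)
warehouse = {key: dict(row) for key, row in yesterday.items()}

# Simulate a failure after the first change, then resume from the checkpoint.
checkpoint = apply_changes(warehouse, changes[:1])
checkpoint = apply_changes(warehouse, changes, checkpoint)
```

In this sketch only three change records are moved instead of the whole table, and the resumed run skips the record that was already applied. Snapshot comparison is just one way to find changes; log-based or trigger-based capture would produce the same kind of change stream without reading both snapshots.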