Control data sets (temporary,
intermediate, and permanent) are the basis on which duplicate data
is rejected. Therefore, it is important that you understand how the
control data sets are created, used, updated, and stored.
-
temporary control data sets
When input data is
processed, the %RMDUPCHK macro creates a temporary control data set,
WORK._DUPCNTL, which stores information about the raw data from one
or more adapters. Specifically, for each machine or system that generated
data, the temporary control data set stores the datetime ranges and
record counts in the raw data. If raw data from more than one data
source is processed in the same job, then information is appended
to the temporary control data set with each execution of the %RMDUPCHK macro.
-
intermediate control data sets
When all the input
data has been read and stored in the temporary control data set, the
%RMDUPUPD macro writes the data from the temporary control data set,
WORK._DUPCNTL, to separate intermediate data sets, which are located
in the staging library for that staging job.
Note: The first time a staging table is added to a staging
transformation, a library is created to contain the control data
information for all the staged tables that are created by that
staging transformation. The library is named adapter-name Staging
nnnn, where nnnn is a random number that ensures that the library
name is unique within the IT data mart. For example, a library name
might be “DT Perf Sentry Staging 8926”. (This library also contains
other types of data.) If all the staged tables that are created by
that staging transformation are subsequently deleted, and one or more
new staged tables are added to the transformation, then a new library
is created for the new staged tables. The first library that was
created (for the original staged tables) is not used again. It is not
automatically deleted, but you can delete it manually.
If an intermediate data set already exists, then it is reused and
the new data overwrites the old data. Otherwise, the data set is
created and the new data is written to it.
-
permanent control data sets
The data in the intermediate control data sets is then merged by
%RMDUPUPD into the corresponding permanent control data sets, which
are named sourceCNTRL and are stored and maintained in the staging
library for that staging job. One permanent control data set can
exist for each adapter. Each data set contains information about
that adapter's machines or systems, datetime ranges, and record
counts.
If a
sourceCNTRL data set exists, the new data is
merged into it. Otherwise, the data set is created and the new data
is written to that data set.
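
The flow described above, from temporary through intermediate to
permanent control data sets, can be sketched in Python as follows.
This is an illustration only, not the implementation of %RMDUPCHK or
%RMDUPUPD. The names ControlRow, summarize, update_control, and
is_duplicate are hypothetical, as is the "_INTERMEDIATE" suffix. The
merge is modeled as a simple append, and the duplicate test (a
timestamp that falls inside a stored datetime range for the same
system) is an assumption about how the stored ranges might be used.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ControlRow:
    """One control row: a system's datetime range and record count."""
    system: str
    start: datetime
    end: datetime
    records: int


def summarize(raw):
    """Build temporary control rows from (system, timestamp) records."""
    rows = {}
    for system, ts in raw:
        row = rows.get(system)
        if row is None:
            rows[system] = ControlRow(system, ts, ts, 1)
        else:
            row.start = min(row.start, ts)
            row.end = max(row.end, ts)
            row.records += 1
    return list(rows.values())


def update_control(library, source, temporary):
    """Overwrite the intermediate data set, then merge into the permanent one."""
    # Intermediate: created if missing, otherwise the old data is overwritten.
    intermediate_name = source + "_INTERMEDIATE"  # hypothetical name
    library[intermediate_name] = list(temporary)
    # Permanent: created if missing, otherwise the new rows are merged in
    # (modeled here as an append, which is an assumption).
    permanent = library.setdefault(source + "CNTRL", [])
    permanent.extend(library[intermediate_name])
    return permanent


def is_duplicate(permanent, system, ts):
    """Assumed test: ts lies inside a stored range for the same system."""
    return any(r.system == system and r.start <= ts <= r.end
               for r in permanent)
```

In this sketch the staging library is a plain dictionary. Calling
update_control a second time for the same source replaces the
intermediate entry but only adds rows to the sourceCNTRL entry, which
mirrors the overwrite-versus-merge distinction described above.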