As raw data is being read, one of the macros that perform
duplicate-data checking reviews
the datetime information in each record and stores the information
in a SAS data set called a temporary control data set. Later, by
using intermediate control data sets, another macro merges the information
in the temporary control data set into one or more SAS data sets that
are called permanent control data sets.
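The flow from temporary to permanent control data can be pictured with a small Python sketch. It is only a conceptual illustration, not the SAS macros' actual implementation: the function names, the per-system (earliest, latest) range layout, and the sample system names are all assumptions, and the intermediate control data sets are left out.

```python
from datetime import datetime

def build_temp_control(records):
    """Track the earliest and latest timestamp seen per system while reading.

    Hypothetical stand-in for the temporary control data set.
    """
    temp = {}
    for system, ts in records:
        lo, hi = temp.get(system, (ts, ts))
        temp[system] = (min(lo, ts), max(hi, ts))
    return temp

def merge_into_permanent(permanent, temp):
    """Append each system's newly observed range to the permanent store.

    Hypothetical stand-in for the merge into permanent control data sets.
    """
    for system, rng in temp.items():
        permanent.setdefault(system, []).append(rng)
    return permanent

# Sample records as (system name, record timestamp) pairs.
records = [
    ("SYSA", datetime(2024, 1, 1, 0, 15)),
    ("SYSA", datetime(2024, 1, 1, 23, 45)),
    ("SYSB", datetime(2024, 1, 1, 6, 0)),
]
permanent = merge_into_permanent({}, build_temp_control(records))
```

Starting from an empty permanent store, as on a first run, the merge simply records the ranges that were read.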
When additional data
is processed into the IT data mart, the timestamps of the incoming
data are compared with the datetime information in the permanent control
data sets in order to determine whether the new data has already been
processed. If it has, the duplicate data is handled in the way that
you specify.
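As a rough illustration of this comparison, the Python sketch below checks each incoming timestamp against previously recorded datetime ranges. It assumes, purely for illustration, that the permanent control information can be modeled as a mapping from system name to a list of (start, end) ranges. The names `already_processed` and `split_incoming` are hypothetical and do not correspond to any SAS macro.

```python
from datetime import datetime

def already_processed(control, system, ts):
    """Return True if ts falls inside a range recorded for this system."""
    return any(start <= ts <= end for start, end in control.get(system, []))

def split_incoming(control, records):
    """Separate incoming (system, timestamp) records into new and duplicate."""
    new, dups = [], []
    for system, ts in records:
        target = dups if already_processed(control, system, ts) else new
        target.append((system, ts))
    return new, dups

# Hypothetical permanent control data: SYSA was processed for all of 1 January.
control = {
    "SYSA": [(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 23, 59))],
}
incoming = [
    ("SYSA", datetime(2024, 1, 1, 12, 0)),   # inside a recorded range
    ("SYSA", datetime(2024, 1, 2, 0, 15)),   # after the recorded range
    ("SYSB", datetime(2024, 1, 1, 12, 0)),   # system with no recorded ranges
]
new, dups = split_incoming(control, incoming)
```

What happens to the records in `dups` depends on the handling that you specify, so the sketch only separates them and takes no further action.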
A duplicate-data report is printed in the SAS log after the data is
read. The report describes how many records were read for each machine
or system and how many duplicates were found, if any.
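The kind of per-system tally that such a report summarizes might be computed as in the following Python sketch. The output format and the function name `duplicate_report` are invented for illustration and do not reproduce the layout of the report in the SAS log.

```python
from collections import Counter

def duplicate_report(results):
    """Summarize (system, is_duplicate) pairs, one pair per record read."""
    read, dups = Counter(), Counter()
    for system, is_dup in results:
        read[system] += 1
        if is_dup:
            dups[system] += 1
    # One line per system, in alphabetical order (hypothetical layout).
    return [
        f"{s}: {read[s]} records read, {dups[s]} duplicates"
        for s in sorted(read)
    ]

lines = duplicate_report([("SYSA", False), ("SYSA", True), ("SYSB", False)])
```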
Note: The first time that you use
the macros, the permanent control data sets have not been built, so
the macro %RMDUPCHK cannot check the input records. Your data is not
checked or rejected for duplicates, but the permanent control data
sets are created and the datetime information for this data is saved
to them. Data is checked only on datetime, although SMF data is also
checked for the system name. (For example, if you try to add a new
record type, but you have already read other record types from that
adapter for that time period, the records will not be kept, because
their datetime values fall within a period that has already been
processed.) The duplicate-data
report contains only a limited amount of information about your data.