What Is Duplicate-Data Checking?

As raw data is being read, one of the macros that performs duplicate-data checking reviews the datetime information in each record and stores the information in a SAS data set called a temporary control data set. Later, by using intermediate control data sets, another macro merges the information in the temporary control data set into one or more SAS data sets that are called permanent control data sets.

When additional data is processed into the IT data mart, the timestamps of the incoming data are compared with the datetime information in the permanent control data sets in order to determine whether the new data has already been processed. If it has, the duplicate data is handled in the way that you specify.
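The comparison described above can be illustrated with a small conceptual sketch. It is written in Python rather than SAS and is not the implementation used by the macros. It assumes, purely for illustration, that a permanent control data set can be modeled as a list of already-processed datetime ranges for each system; the names `is_duplicate` and `split_incoming` are hypothetical.

```python
from datetime import datetime

# Hypothetical stand-in for a permanent control data set:
# a dict that maps each system name to a list of (start, end)
# datetime ranges that have already been processed.

def is_duplicate(control, system, timestamp):
    """Return True if the timestamp falls inside a range already
    recorded for this system (assumed model, not the SAS logic)."""
    return any(start <= timestamp <= end
               for start, end in control.get(system, []))

def split_incoming(control, records):
    """Separate incoming (system, timestamp) records into those to keep
    and those flagged as duplicates."""
    kept, duplicates = [], []
    for system, timestamp in records:
        if is_duplicate(control, system, timestamp):
            duplicates.append((system, timestamp))
        else:
            kept.append((system, timestamp))
    return kept, duplicates
```

In this sketch, a system with no entry in the control structure has nothing to compare against, so none of its records are flagged. What happens to the flagged records is left to the caller, which mirrors the fact that duplicates are handled in the way that you specify.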

A duplicate-data report is printed in the SAS log after the data is read. The report describes how many records were read for each machine or system and how many duplicates were found, if any.
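The kind of tally that such a report contains can be sketched as follows. This is a conceptual Python illustration only: the function names `summarize` and `format_report` are hypothetical, and the column layout is invented for the example and does not reproduce the actual report in the SAS log.

```python
from collections import Counter

def summarize(records):
    """Count records read and duplicates found for each system.

    records: iterable of (system, is_duplicate) pairs (assumed input shape).
    Returns a dict that maps system -> (records_read, duplicates_found).
    """
    read = Counter()
    dups = Counter()
    for system, is_dup in records:
        read[system] += 1
        if is_dup:
            dups[system] += 1
    # A Counter returns 0 for systems that had no duplicates.
    return {system: (read[system], dups[system]) for system in sorted(read)}

def format_report(summary):
    """Render the summary as plain-text lines (illustrative layout only)."""
    lines = [f"{'System':<10}{'Read':>8}{'Duplicates':>12}"]
    for system, (n_read, n_dup) in summary.items():
        lines.append(f"{system:<10}{n_read:>8}{n_dup:>12}")
    return "\n".join(lines)
```

Calling `format_report(summarize(records))` produces one line for each machine or system, with the number of records read and the number of duplicates found.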

Note: The first time that you use the macros, the permanent control data sets have not yet been built, so the %RMDUPCHK macro cannot check the input records. Your data is not checked or rejected for duplicates, but the permanent control data sets are created and the datetime information for this data is saved to them. Data is checked only on datetime, although SMF data is also checked for the system name. (For example, if you try to add a new record type, but you have already read other record types from that adapter for that time period, the new records are not kept.) The duplicate-data report contains only a limited amount of information about your data.