How to Implement Duplicate-Data Checking from a Staging Transformation

To implement duplicate-data checking, perform the following steps:
  1. Right-click the staging transformation and select Properties.
  2. On the Properties dialog box, select the Staging Parameters tab.
  3. On the Duplicate Checking page, ensure that the Enable duplicate checking field is set to Yes. This setting enables you to specify the parameters that govern the duplicate-data checking process.
    Note: The SNMP adapter requires that duplicate checking be turned on. This setting is necessary because neither method of gathering raw data for SNMP (HPNNM and RRDtool) can ensure that only the most recent raw data is saved. Therefore, invoking the duplicate-data checking code of SAS IT Resource Management is the only way to determine what is new data and what is duplicate data.
    If you do not want to implement duplicate-data checking, set the Enable duplicate checking field to No. This setting makes the duplicate-data checking parameters unavailable.
  4. Specify the following parameters:
    • Duplicate checking option
      specifies how duplicate data is handled. Select one of the following options:
      TERMINATE
      stops processing if duplicate data is encountered.
      DISCARD
      continues processing while rejecting duplicate data if it is encountered. This is the default value for this parameter.
      Note: For best results, the value for the Duplicate checking option parameter for the SNMP adapter should always be set to Discard.
      FORCE
      continues processing and accepts duplicate data if it is encountered.
      Note: Duplicate-data checking macros are designed to prevent the same data from being processed into the IT data mart twice. However, sometimes you might need to backload data, that is, to process data in a datetime range for which the permanent control data sets have already recorded machine or system data. (For example, you might need to process data into one or more tables that you did not use earlier, or into tables that you accidentally purged or deleted.) In such cases, you can temporarily set this parameter to Force so that the backloaded data is accepted. Make sure that you restore the Duplicate checking option setting to its original value after you finish the backloading task.
    • IDVAR
      identifies the SAS variable that is used to denote the origin of each incoming record.
      Note: This parameter is visible only for the CSV, RRDtool, and user-written adapters.
    • INT
      specifies the maximum time gap (or interval) that is allowed between the timestamps on any two consecutive records from the same system or machine. If the interval between the timestamp values exceeds the value of this parameter, then an observation with the new time range is created in the control data set. This is referred to as a gap in the data.
      The value for this parameter must be provided in the format hh:mm, where hh represents hours and mm represents minutes. For example, to specify an interval of 14 minutes, use INT=0:14. To specify an interval of 1 hour and 29 minutes, use INT=1:29.
    • Keep
      specifies the number of weeks for which control data is kept. This value must be an integer. Because retention is measured by counting Sundays between two dates rather than elapsed days, a value of 2 results in a maximum retention period of 20 days.
    • Report
      specifies whether to display the duplicate-data checking messages in the SAS log or to save them in an audit table. If this parameter is set to Yes, all messages from duplicate-data checking are displayed in the SAS log. If it is set to No, the messages are saved in an audit data table that is stored in the staging library. The name of the audit table is sourceAUDIT (where source is the three-character data source code).
      Note: If you are monitoring a very large number of resources, setting this option to No can be beneficial. Eliminating the report reduces CPU consumption, shortens elapsed time, and makes the SAS log more manageable.
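Taken together, the parameters above drive a simple record-by-record decision. The following sketch is not SAS IT Resource Management code; it is a hypothetical Python illustration of the logic that the Duplicate checking option and INT parameters govern, with a plain dictionary standing in for the permanent control data set.

```python
from datetime import datetime, timedelta

def parse_int(value: str) -> timedelta:
    """Parse the INT parameter's hh:mm format (for example, "0:14") into a timedelta."""
    hh, mm = value.split(":")
    return timedelta(hours=int(hh), minutes=int(mm))

def check_record(ranges, system, ts, option, max_gap):
    """Decide whether one incoming record should be processed.

    ranges: dict mapping a system or machine ID to a list of [start, end]
            datetime ranges (a toy stand-in for the permanent control data set).
    Returns True if the record should be processed, False if it is discarded.
    Raises RuntimeError under TERMINATE when a duplicate is encountered.
    """
    spans = ranges.setdefault(system, [])
    duplicate = any(start <= ts <= end for start, end in spans)
    if duplicate:
        if option == "TERMINATE":
            raise RuntimeError(f"duplicate data for {system} at {ts}")
        if option == "DISCARD":
            return False
        return True  # FORCE: accept the duplicate record
    # New data: extend the most recent range if the gap from its end
    # is within INT; otherwise record a gap by opening a new range.
    if spans and ts > spans[-1][1] and ts - spans[-1][1] <= max_gap:
        spans[-1][1] = ts
    else:
        spans.append([ts, ts])
    return True
```

For example, with INT=0:14, a record 10 minutes after the previous one extends the current time range, a record inside an already-recorded range is rejected under Discard, and a record 50 minutes later opens a new range (a gap in the data).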
Note: Prior to SAS IT Resource Management 3.3, you were required to create catalog entries or files in the MXG source library of your operating system in order to handle duplicate-data checking. Although these members or files are no longer necessary, if they exist, SAS IT Resource Management continues to honor them. However, it is preferable to manage duplicate-data checking by specifying the appropriate values on the staging transformation.
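The retention arithmetic behind the Keep parameter can also be sketched. This is not SAS code; it is a hypothetical Python illustration that assumes Sundays are counted inclusively from the recording date, an interpretation that reproduces the 20-day maximum retention for a Keep value of 2.

```python
from datetime import date, timedelta

def sundays_between(start: date, end: date) -> int:
    """Count Sundays in the inclusive range [start, end]."""
    return sum(
        1
        for i in range((end - start).days + 1)
        if (start + timedelta(days=i)).weekday() == 6  # 6 = Sunday
    )

def retention_days(recorded: date, keep: int) -> int:
    """Days until the record's Sunday count first exceeds the Keep value."""
    d = recorded
    while sundays_between(recorded, d) <= keep:
        d += timedelta(days=1)
    return (d - recorded).days
```

Under this assumption, control data recorded on a Monday with Keep=2 survives through the Saturday after the second following Sunday, for 20 days in total; data recorded later in the week is dropped sooner, so 20 days is the maximum.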