To implement duplicate-data
checking, perform the following steps:
- Right-click the staging transformation and select Properties.
- In the Properties dialog box, select the Staging Parameters tab.
- On the Duplicate Checking page, ensure that the Enable duplicate checking field is set to Yes. This setting enables you to specify the parameters that govern the duplicate-data checking process.
Note: The SNMP adapter requires that duplicate checking be turned on. This setting is necessary because neither method of gathering raw data for SNMP (HPNNM and rrdtool) can ensure that only the most recent raw data is saved. Therefore, invoking the duplicate-data checking code of SAS IT Resource Management is the only way to determine which data is new and which is duplicate.
If you do not want to implement duplicate-data checking, set the Enable duplicate checking field to No. This setting makes the duplicate-data checking parameters unavailable.
- Specify the following parameters:
- Duplicate checking option
specifies how duplicate data is handled. Select one of the following options:
TERMINATE
stops processing if duplicate data is encountered.
DISCARD
continues processing and rejects duplicate data if it is encountered. This is the default value for this parameter.
Note: For best results, always set the Duplicate checking option parameter for the SNMP adapter to Discard.
FORCE
continues processing and accepts duplicate data if it is encountered.
(A simplified sketch of these three behaviors appears at the end of this topic.)
Note: Duplicate-data checking macros are designed to prevent the same data from being processed into the IT data mart twice. However, sometimes you might need to backload data. Backloading data means processing data that is in a datetime range for which the permanent control data sets have already recorded machine or system data. (For example, you might need to process data into one or more tables that you did not use earlier, or into one or more tables that you accidentally purged or deleted.) To backload data, temporarily set this parameter to Force, and make sure that you restore the Duplicate checking option setting to its original value after you finish the backloading task.
- IDVAR
identifies the SAS variable that is used to denote the origin of each incoming record.
Note: This parameter is visible only for the CSV, RRDtool, and user-written adapters.
- INT
specifies the maximum time gap (or interval) that is allowed between the timestamps on any two consecutive records from the same system or machine. If the interval between the timestamp values exceeds the value of this parameter, then an observation with the new time range is created in the control data set. This is referred to as a gap in the data.
The value for this parameter must be provided in the format hh:mm, where hh represents hours and mm represents minutes. For example, to specify an interval of 14 minutes, use INT=0:14. To specify an interval of 1 hour and 29 minutes, use INT=1:29.
- Keep
specifies the number of weeks for which control data are kept. Because this value represents the number of Sundays between two dates, a value of 2 results in a maximum retention period of 20 days. This value must be an integer. (A sketch that illustrates this calculation follows the parameter list.)
- Report
specifies whether to display the duplicate-data checking messages in the SAS log or to save the messages in an audit table. If this parameter is set to Yes, all messages from duplicate-data checking are displayed in the SAS log. If it is set to No, the duplicate-data checking messages are saved in an audit data table that is stored in the staging library. The name of the audit table is sourceAUDIT (where source is the 3-character data source code).
Note: If you are monitoring very high numbers of resources, setting this option to No can be beneficial. Eliminating the report reduces CPU consumption, shortens elapsed time, and makes the SAS log more manageable.
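The retention that the Keep parameter describes is based on counting Sundays, which is also what the SAS INTCK function does with WEEK intervals. The following DATA step is a minimal sketch of that calculation; the dates are examples and the code is not part of SAS IT Resource Management.

   /* Illustration only: count the Sundays between an assumed oldest        */
   /* control date and an assumed current date.                             */
   data _null_;
      oldest  = '01JUL2013'd;                    /* a Monday (example date)   */
      current = '20JUL2013'd;                    /* a Saturday (example date) */
      sundays = intck('week', oldest, current);  /* Sunday boundaries = 2     */
      /* A span that contains exactly 2 Sundays is at most 20 days long      */
      /* (a Monday through the Saturday after the second Sunday), which is   */
      /* why Keep=2 yields a maximum retention period of 20 days.            */
      put oldest= date9. current= date9. sundays=;
   run;

For these example dates, the step writes a Sunday count of 2 to the SAS log, even though the span from 01JUL2013 through 20JUL2013 covers 20 days.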
Note: Prior to SAS IT Resource
Management 3.3, you were required to create catalog entries or files
in the MXG source library of your operating system in order to handle
duplicate-data checking. Although these members or files are no longer
necessary, if they exist, SAS IT Resource Management continues to
honor them. However, it is preferable to manage duplicate-data checking
by specifying the appropriate values on the staging transformation.
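To make the Duplicate checking option and INT behaviors easier to picture, the following DATA step is a simplified, hypothetical sketch. It is not the SAS IT Resource Management macro code: the data sets WORK.CONTROL and WORK.INCOMING, their variables, and the macro variables DUP_OPTION and INT_SECONDS are invented for this illustration. The sketch compares each incoming record's datetime with the latest datetime already recorded for its system and then applies a TERMINATE, DISCARD, or FORCE style of handling; it also notes a gap when the jump exceeds the INT value.

   %let dup_option  = DISCARD;   /* TERMINATE, DISCARD, or FORCE (assumed)   */
   %let int_seconds = 840;       /* INT=0:14 expressed as 840 seconds        */

   /* WORK.CONTROL is assumed to hold, per system, the latest datetime that  */
   /* was already processed (max_dt). WORK.INCOMING holds the new records.   */
   proc sql;
      create table work.merged as
      select i.system, i.datetime, c.max_dt
      from work.incoming as i left join work.control as c
           on i.system = c.system
      order by i.system, i.datetime;
   quit;

   data work.accepted work.rejected;
      set work.merged;
      if datetime <= max_dt then do;       /* record is already covered      */
         if "&dup_option" = "TERMINATE" then do;
            put "ERROR: duplicate data encountered for " system= datetime= datetime20.;
            abort cancel;                  /* stop processing                */
         end;
         else if "&dup_option" = "DISCARD" then output work.rejected;
         else output work.accepted;        /* FORCE: accept the duplicate    */
      end;
      else do;
         /* New data. If the jump from the last recorded datetime exceeds    */
         /* INT, the control data would record a new time range (a gap).     */
         if max_dt ne . and datetime - max_dt > &int_seconds then
            put "NOTE: gap detected for " system= ", a new time range starts.";
         output work.accepted;
      end;
   run;

In the staging transformation itself, this behavior is controlled entirely by the parameters described above; no user-written code is required.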