The %RMDUPCHK macro
checks for duplicate data and deletes it. It also builds up record
counts of incoming and deleted data and datetime ranges for each system
or machine. These record counts are stored in the control data set.
If the control data set indicates that a gap was detected in the data,
a report is generated.
The control data set
is stored in the same library as the staged tables. This data set
is created and managed by the %RMDUPxxx macros. (Users do not usually
access this library.)
specifies the name
of the SAS variable that is used as the END= keyword for the SAS INFILE
statement that reads the raw data.
IDVAR=variable-name
specifies the name
of the SAS variable that identifies the system or machine that generated
the input data.
SOURCE=identifier
specifies a unique
three-character code that identifies the type of data.
TIMESTMP=timestamp-variable-name
specifies the name
of the SAS variable that contains the datetime stamp that uniquely
identifies the time of the event or interval being recorded.
The SOURCE entries for
the supported adapters are listed in the following table.
Source Names for Each Adapter

Adapter                    Value for the SOURCE Parameter for %RMDUPCHK
-------------------------  --------------------------------------------
ASG TMON2CIC               TM2
ASG TMONDB2                TMD
BMC Mainview               IMF
BMC Perf Mgr               PAT
CA TMS                     TMS
DT Perf Sentry             NTS
DT Perf Sentry with MXG    NTS
HP Perf Agent              (Multiple values are needed so that %RMDUPCHK
                           can be invoked with each value.)
HP Reporter                (Multiple values are needed so that %RMDUPCHK
                           can be invoked with each value.)
IBM DCOLLECT               DCO
IBM EREP                   ERP
IBM IMS                    IMS
IBM SMF                    SMF
IBM TPF                    TPF
IBM VMMON                  VMM
MS SCOM                    SCO
SAP ERP                    BAT, SAP, and others. (Multiple values are
                           needed so that %RMDUPCHK can be invoked with
                           each value.)
SAR                        SAR
SNMP                       SNM
VMware vCenter             (Multiple values are needed so that %RMDUPCHK
                           can be invoked with each value.)
Web Log                    WWW
%RMDUPCHK Options
FORCE=YES | NO
specifies whether input data
that is detected as a duplicate should still be processed.
FORCE=YES indicates that, if a
duplicate is detected, the duplicate data should be processed.
FORCE=NO indicates that duplicate
data should not be processed. The default value for this option is NO.
INT=interval
represents the maximum
time gap (or interval) that is to be allowed between the timestamps
on any two consecutive records from the same system or machine. If
the interval between the timestamp values exceeds the value of this
option, then an observation with the new time range is created in
the control data set. This is referred to as a gap in the data.
The value for this
option must be provided in the format hh:mm, where hh represents hours
and mm represents minutes. For example, to specify an interval of
14 minutes, use INT=0:14. To specify an interval of 1 hour and 29
minutes, use INT=1:29.
The default value for
this option is 0:29, or 29 minutes.
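
The gap rule described for INT= can be illustrated with a short Python sketch. This is not the macro's implementation. The helper names `parse_int` and `build_ranges` are hypothetical, and the sketch assumes only what is stated above: the value has the form hh:mm, and a new time range begins whenever two consecutive timestamps from the same system are farther apart than the interval.

```python
from datetime import datetime, timedelta


def parse_int(value):
    """Convert an INT= style string such as '0:29' into a timedelta."""
    hours, minutes = value.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))


def build_ranges(timestamps, max_gap):
    """Group timestamps from one system into (start, end) ranges.

    A new range starts when the distance to the previous timestamp
    exceeds max_gap, which corresponds to a gap in the data.
    """
    ranges = []
    for ts in sorted(timestamps):
        if ranges and ts - ranges[-1][1] <= max_gap:
            ranges[-1] = (ranges[-1][0], ts)  # extend the current range
        else:
            ranges.append((ts, ts))  # first record, or a gap was found
    return ranges


# Hypothetical timestamps for a single system
stamps = [
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 0, 15),
    datetime(2024, 1, 1, 0, 30),
    datetime(2024, 1, 1, 1, 30),  # 60 minutes after the previous record
    datetime(2024, 1, 1, 1, 45),
]
ranges = build_ranges(stamps, parse_int("0:29"))
```

With the default interval of 0:29, the 60-minute jump splits these five records into two time ranges.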
KEEP=number-of-weeks
specifies the number
of weeks for which control data will be kept. Because this value represents
the number of Sundays between two dates, a value of 2
results in a maximum retention period of 20 days.
The default value for
this option is 2.
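
The 20-day figure can be checked with a small Python sketch. It assumes that "the number of Sundays between two dates" excludes the start date and includes the end date. The source does not state the counting convention, and `sundays_between` is a hypothetical helper, not part of SAS.

```python
from datetime import date, timedelta


def sundays_between(start, end):
    """Count Sundays d with start < d <= end (assumed convention)."""
    count = 0
    day = start + timedelta(days=1)
    while day <= end:
        if day.weekday() == 6:  # Monday is 0, Sunday is 6
            count += 1
        day += timedelta(days=1)
    return count


start = date(2023, 1, 1)  # a Sunday
after_20_days = sundays_between(start, start + timedelta(days=20))  # 2
after_21_days = sundays_between(start, start + timedelta(days=21))  # 3
```

Under this assumption, a record dated on a Sunday is still within two Sundays 20 days later, and crosses a third Sunday on day 21.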
TERM=YES | NO
controls whether SAS
terminates if any duplicate input data is detected.
The default value of
this option is NO.
%RMDUPCHK Notes
The Adapter Setup wizard prompts
the user to specify how to handle duplicate records. Valid entries
for the mode of duplicate-data checking are Inactive, Discard, Force,
or Terminate.
Inactive: Duplicate-data checking is not performed. No macros are executed.
Discard: Duplicate-data-checking macros are executed. FORCE=NO and TERM=NO
are implied.
Force: Duplicate-data-checking macros are executed. FORCE=YES and TERM=NO
are implied.
Terminate: Duplicate-data-checking macros are executed. FORCE=NO and TERM=YES
are implied.
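
The relationship between the four wizard modes and the FORCE= and TERM= options can be summarized as a lookup table. The Python sketch below only restates the list above; `MODE_OPTIONS` and `options_for_mode` are hypothetical names, and `None` stands for the Inactive case, in which no macros are executed.

```python
# Wizard mode -> implied %RMDUPCHK option values (None: macros are not run)
MODE_OPTIONS = {
    "Inactive": None,
    "Discard": {"FORCE": "NO", "TERM": "NO"},
    "Force": {"FORCE": "YES", "TERM": "NO"},
    "Terminate": {"FORCE": "NO", "TERM": "YES"},
}


def options_for_mode(mode):
    """Return the implied FORCE= and TERM= values for a wizard mode."""
    if mode not in MODE_OPTIONS:
        raise ValueError(f"Unknown duplicate-data-checking mode: {mode}")
    return MODE_OPTIONS[mode]
```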
You can change the mode
of duplicate-data checking for a table in the Properties dialog box for that table.