Examples

Example 1: Processing Data in a DATETIME Range for Which a DATETIME and IDVAR Was Already Processed

This example describes how to process data into a table that was not used earlier or into a table that was accidentally deleted.
To process data in a datetime range for which a particular combination of DATETIME and IDVAR values was already processed, select a value of FORCE for the Duplicate checking option parameter. (That parameter is located on the Duplicate Checking page of the Staging Parameters tab of the staging transformation.) With FORCE, the data is accepted even though it appears to be duplicate data.

Example 2: Processing Data for Two IBM SMF Files

This example describes how to process data for two separate IBM SMF files. In this example, a site has two separate, contiguous IBM SMF data sets for the same IDVAR variable. The site attempts to stage them into a supplied SAS IT Resource Management table. The first IBM SMF input file covers a datetime stamp range from 9:00 a.m. to 10:00 a.m.; the second covers a range from 10:00 a.m. to 11:00 a.m.
In addition, the following conditions apply to this example:
  • The Duplicate checking option parameter is set to DISCARD for both jobs.
  • The records that describe the last IBM SMF interval are split across the two SMF data sets.
  • Both of these IBM SMF files are processed for the same IT data mart.
  • A 15-minute IBM SMF interval is being used (for example, an interval that starts at 9:45 a.m. and ends at 9:59:59 a.m.). The RMF 70-79 records that describe the statistics for this interval are only partially written in the first data set. When the SYS1.MAN1 data set is full, the RMF 70-79 records that start at 9:59:59 a.m. are written in the SYS1.MAN2 data set.
  • Each time an IBM SMF file is read, the data is processed into SAS IT Resource Management and aggregated.
There are two methods that you can use to process this data:
  • Load all IBM SMF data in a single job, concatenating the two IBM SMF data sets. This is the preferred method because no data is lost.
  • Load IBM SMF data in two separate JCL steps or batch jobs. Each job should read one IBM SMF file at a time.
    Note: With either method, the first IBM SMF data set is loaded into the IT data mart, and the duplicate-data checking macros mark the data from 9:45:00 a.m. to 9:59:59 a.m. as correctly loaded. Because the second method processes the SMF data files individually, the records in the second file whose DATETIME and IDVAR values fall in the range from 9:45:00 a.m. to 9:59:59 a.m. are considered duplicates and are discarded.
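The difference between the two methods can be illustrated with a small model. The following Python sketch is not SAS IT Resource Management code. It assumes a simplified rule in which a record is a duplicate when its datetime falls inside a range that an earlier job already recorded for the same IDVAR value. The function stage, the helper at, the system name SYS1, and the sample date are all hypothetical.

```python
from datetime import datetime

def stage(records, control, option="DISCARD"):
    """Stage one job's records and return the records that were kept.

    records: list of (idvar, datetime) tuples read by this job.
    control: dict mapping idvar -> list of (low, high) ranges already processed.
    option:  "DISCARD" drops apparent duplicates, "FORCE" accepts them.
    """
    # Duplicates are judged against the ranges recorded before this job started.
    existing = {idvar: list(ranges) for idvar, ranges in control.items()}
    kept = []
    for idvar, stamp in records:
        duplicate = any(low <= stamp <= high
                        for low, high in existing.get(idvar, []))
        if duplicate and option == "DISCARD":
            continue
        kept.append((idvar, stamp))
    # Record the datetime range that this job covered for each IDVAR value.
    for idvar in sorted({i for i, _ in kept}):
        stamps = [s for i, s in kept if i == idvar]
        control.setdefault(idvar, []).append((min(stamps), max(stamps)))
    return kept

def at(hour, minute):
    """Hypothetical sample date; only the time of day matters here."""
    return datetime(2012, 3, 17, hour, minute)

# The 9:45 interval is split: part in the first file, the rest in the second.
first_file = [("SYS1", at(9, m)) for m in (0, 15, 30, 45)]
second_file = [("SYS1", at(9, 45))] + [("SYS1", at(10, m)) for m in (0, 15, 30, 45)]

# Method 2: two jobs with DISCARD. The second job loses its 9:45 record.
separate_control = {}
separate_kept = (stage(first_file, separate_control)
                 + stage(second_file, separate_control))

# Method 1: one job reads the concatenated files. Nothing is discarded.
single_kept = stage(first_file + second_file, {})

# FORCE (as in Example 1) accepts the second file's records anyway.
forced_control = {}
stage(first_file, forced_control)
forced_kept = stage(second_file, forced_control, option="FORCE")

print(len(separate_kept), len(single_kept), len(forced_kept))
```

In this model, the concatenated job keeps every record because no range exists for SYS1 when the job starts, whereas the second of the two separate jobs finds the 9:45 record inside the range that the first job recorded.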

Example 3: Using Macro Variables to Subset Data for HP Reporter, MS SCOM, SAS EV, and VMware Adapters

Duplicate-data checking is always enabled for the HP Reporter, MS SCOM, and VMware adapters. Therefore, the staging transformations for those adapters do not provide a parameter for duplicate-data checking. However, you might want to override the default duplicate-data checking action.
SAS IT Resource Management provides two macro variables that enable you to subset data for these adapters: ITRM_LoadFromDate and ITRM_LoadToDate. These macro variables enable you to override the default action, which is to subset the incoming data based on the duplicate-data control data sets.
The ITRM_LoadFromDate and ITRM_LoadToDate macro variables can be used in the following situations:
  • to backload data into tables that are added to a staging job after it has already run once against a given set of data
  • to specify a datetime range so that staging extracts only the data from the input database whose datetime stamps fall within that range
Note: When the ITRM_LoadFromDate and ITRM_LoadToDate macro variables are set, the duplicate-data checking code is still executed. SAS IT Resource Management discards any data that is detected as duplicates.
The following code sets the ITRM_LoadFromDate and ITRM_LoadToDate macro variables to valid start and end datetime values. These values are used to subset the data from the database instead of the ranges in the duplicate-data control data sets. This code should be added to the generated code or to the deployed job code for the staging job:
%let ITRM_LoadFromDate=14FEB2010:00:00:00;
%let ITRM_LoadToDate=15FEB2010:23:59:00;
Note: When these macro variables are used with the VMware adapter, you must specify their values in Coordinated Universal Time (UTC). (UTC is the same as Greenwich Mean Time (GMT).)
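For the VMware case, the local datetime range has to be converted to UTC before the values are written into the %let statements. The following Python sketch shows one way to do that conversion. It assumes a hypothetical site at a fixed offset of UTC-5 and does not account for daylight saving time. The helper names sas_datetime and utc_let_statements are illustrative and are not part of SAS IT Resource Management.

```python
from datetime import datetime, timedelta, timezone

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def sas_datetime(dt):
    """Format a datetime as ddMONyyyy:hh:mm:ss, as in the %let statements above."""
    return (f"{dt.day:02d}{MONTHS[dt.month - 1]}{dt.year:04d}:"
            f"{dt.hour:02d}:{dt.minute:02d}:{dt.second:02d}")

def utc_let_statements(start_local, end_local, utc_offset_hours):
    """Return the two %let statements with the local range converted to UTC."""
    # Assumption: a fixed offset from UTC (no daylight saving adjustment).
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    start_utc = start_local.replace(tzinfo=local_tz).astimezone(timezone.utc)
    end_utc = end_local.replace(tzinfo=local_tz).astimezone(timezone.utc)
    return [
        f"%let ITRM_LoadFromDate={sas_datetime(start_utc)};",
        f"%let ITRM_LoadToDate={sas_datetime(end_utc)};",
    ]

# Local range 14FEB2010 00:00:00 through 15FEB2010 23:59:00 at UTC-5.
statements = utc_let_statements(
    datetime(2010, 2, 14, 0, 0, 0),
    datetime(2010, 2, 15, 23, 59, 0),
    utc_offset_hours=-5,
)
for line in statements:
    print(line)
```

A fixed offset keeps the sketch free of time zone database dependencies. A site that observes daylight saving time would need to apply the offset that was in effect for the dates being loaded.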
Note: The SAP ERP and SAS EV adapters do not support using macro variables to backload or subset data based on specific datetime ranges. These adapters can process new data only by using duplicate-data control data sets. If you want to backload data, you must delete the duplicate-data control tables. The staging job can then process all the data that is available in the database or in the raw data tables.
The following list explains when the data is subset:
  • For the HP Reporter adapter, the data is subset as it is extracted from the raw data tables.
  • For the MS SCOM adapter, the data is subset as it is extracted from the database.
  • For the SAS EV adapter, the data is subset after it is initially extracted from the raw data tables.
  • For the VMware adapter, the data is subset after it is initially extracted from the raw data tables.
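The interaction between the datetime range and duplicate-data checking can be modeled as two steps: extraction, then the duplicate check that still runs afterward. The following Python sketch is an illustrative model only. It assumes that both ends of the range are inclusive and that, without the macro variables, extraction starts after the latest datetime that was already loaded. The functions parse_sas_datetime, extract, and drop_duplicates, and the hourly sample rows, are hypothetical.

```python
from datetime import datetime, timedelta

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def parse_sas_datetime(text):
    """Parse a value such as 14FEB2010:00:00:00 into a datetime."""
    day = int(text[0:2])
    month = MONTHS.index(text[2:5].upper()) + 1
    year = int(text[5:9])
    hour, minute, second = (int(part) for part in text[10:].split(":"))
    return datetime(year, month, day, hour, minute, second)

def extract(rows, load_from=None, load_to=None, last_loaded=None):
    """Select rows by the explicit range if given, else by the control data."""
    if load_from is not None and load_to is not None:
        # Assumption: both ends of the range are inclusive.
        return [r for r in rows if load_from <= r <= load_to]
    if last_loaded is None:
        return list(rows)
    return [r for r in rows if r > last_loaded]

def drop_duplicates(rows, processed_ranges):
    """Duplicate checking still runs after extraction and discards matches."""
    return [r for r in rows
            if not any(low <= r <= high for low, high in processed_ranges)]

# Hourly rows from 13FEB2010 00:00 through 16FEB2010 23:00.
rows = [datetime(2010, 2, 13) + timedelta(hours=h) for h in range(96)]
load_from = parse_sas_datetime("14FEB2010:00:00:00")
load_to = parse_sas_datetime("15FEB2010:23:59:00")

# Default behavior: everything was already extracted, so nothing is new.
default_rows = extract(rows, last_loaded=rows[-1])

# Backload: the range selects two days of rows regardless of the control data.
windowed = extract(rows, load_from, load_to, last_loaded=rows[-1])

# A newly added table has no processed ranges, so every row is kept.
new_table_rows = drop_duplicates(windowed, [])

# A table that already holds this data has every row discarded as a duplicate.
old_table_rows = drop_duplicates(windowed, [(rows[0], rows[-1])])

print(len(default_rows), len(windowed), len(new_table_rows), len(old_table_rows))
```

The model shows why the macro variables are suited to backloading newly added tables: the range brings the older rows back into the job, and only the tables that have not yet recorded those datetimes accept them.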
For information about backloading, see How to Backload Raw Data.

Example 4: Using Macro Variables to Process Observations with Equal Timestamps

The duplicate checking process discards an observation whose timestamp value is less than or equal to the upper limit of a datetime range. However, in some cases, SMF records with exactly the same datetime stamp are split across tapes. In these circumstances, the first few records of the second tape would be discarded because they match the criteria of being less than or equal to the upper limit of an existing datetime range.
You can prevent these records from being discarded by specifying the following two macro variables:
  • CPDUP_ALLOW_EQ
    This macro variable specifies whether observations with equal timestamp values can be processed. When it is set to Yes, such observations are processed if they are the first records processed for the value that is specified in the IDVAR parameter. The default value for this macro variable is No.
  • CPDUP_ALLOW_MSG
    This macro variable specifies whether messages should be produced when both of the following conditions are met:
    • CPDUP_ALLOW_EQ is set to Yes.
    • Observations with equal timestamps are encountered.
    The default value for this macro variable is No.
    The following text shows the format of the messages that are produced:
    Potential duplicate has been processed _n_=<observation number> 
     system=<system name> 
     smftime=<SMF timestamp>
To set these macro variables, add the following statements to the beginning of the batch job:
%let CPDUP_ALLOW_EQ=YES;
%let CPDUP_ALLOW_MSG=YES;

The following text is an example of the messages that might be produced:
Potential duplicate has been processed _N_=28833 SYSTEM=ABC1 ID=30 
SMFTIME=17MAR2012:01:00:00.00
Note: If the macro variables are not specified in the batch job, their absence is interpreted as No.
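The effect of the two macro variables can be sketched with a small model. The following Python code is not the SAS implementation. It assumes one interpretation of "the first records processed": an observation whose timestamp equals the upper limit is accepted only until the first observation with a later timestamp has been read. The function check and the helper sas_stamp are hypothetical, and the message follows the format shown earlier (without the ID field that appears in the sample output).

```python
from datetime import datetime, timedelta

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def sas_stamp(dt):
    """Format a datetime as ddMONyyyy:hh:mm:ss.hh (hundredths of a second)."""
    return (f"{dt.day:02d}{MONTHS[dt.month - 1]}{dt.year:04d}:"
            f"{dt.hour:02d}:{dt.minute:02d}:{dt.second:02d}"
            f".{dt.microsecond // 10000:02d}")

def check(records, upper, allow_eq=False, allow_msg=False):
    """Apply the duplicate check to records for one IDVAR value.

    records: list of (system, smftime) tuples in the order they are read.
    upper:   upper limit of the datetime range that was already processed.
    Returns (kept_records, messages).
    """
    kept, messages = [], []
    leading = True  # no observation later than the upper limit read yet
    for n, (system, smftime) in enumerate(records, start=1):
        if smftime > upper:
            leading = False
            kept.append((system, smftime))
        elif smftime == upper and allow_eq and leading:
            kept.append((system, smftime))
            if allow_msg:
                messages.append(
                    "Potential duplicate has been processed "
                    f"_N_={n} SYSTEM={system} SMFTIME={sas_stamp(smftime)}"
                )
        # Anything else is at or below the upper limit and is discarded.
    return kept, messages

upper = datetime(2012, 3, 17, 1, 0, 0)
second_tape = [
    ("ABC1", upper),                          # split across tapes
    ("ABC1", upper),                          # split across tapes
    ("ABC1", upper + timedelta(minutes=15)),  # new data
    ("ABC1", upper),                          # equal timestamp, but not leading
]

default_kept, default_msgs = check(second_tape, upper)
eq_kept, eq_msgs = check(second_tape, upper, allow_eq=True, allow_msg=True)

print(len(default_kept), len(eq_kept))
for message in eq_msgs:
    print(message)
```

With both settings left at No, only the observation with the later timestamp is kept. With both set to Yes, the two leading equal-timestamp observations are also kept, and one message is produced for each of them.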