Staging Parameters

The following topics describe the parameters that adapters use to stage the raw data. The combination of staging parameters that applies to an adapter or staging transformation is available from the Adapter Setup wizard and from the Staging Parameters tab of its Properties dialog box. If a parameter includes a browse function, that function might be disabled in some cases, such as when the input data is associated with an application server that resides in a traditional MVS operating environment or on a UFS file system.

Age Limit for Data Acquisition Table

Age limit for data acquisition table (in days) specifies the number of days that data is kept in the data acquisition table that populates the staged tables.
This parameter is relevant only to the VMware Data Acquisition staging transformation. Other staging transformations purge the existing data in their staged tables and replace it with new data each time the staging job executes. The VMware Data Acquisition staging transformation, however, populates staged tables that are not purged when the staging job executes. This parameter therefore lets you set an age limit for the data in the data acquisition table. When the staging job executes, data that is older than this limit is purged, so the table does not grow indefinitely.
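The purge rule can be pictured with a minimal Python sketch. It is an illustration of the age-limit behavior only, not the transformation's actual code. It assumes that each row carries its timestamp under a hypothetical `datetime` key and that age is measured from a supplied current time `now`.

```python
from datetime import datetime, timedelta


def purge_aged_rows(rows, age_limit_days, now):
    """Return only the rows that are still within the age limit.

    Rows whose 'datetime' value (a hypothetical key) is older than
    now - age_limit_days are dropped, mirroring the purge that runs
    when the staging job executes.
    """
    cutoff = now - timedelta(days=age_limit_days)
    return [row for row in rows if row["datetime"] >= cutoff]


now = datetime(2024, 1, 31)
rows = [
    {"datetime": datetime(2024, 1, 30), "cpu": 40},  # 1 day old: kept
    {"datetime": datetime(2023, 12, 1), "cpu": 55},  # 61 days old: purged
]
kept = purge_aged_rows(rows, age_limit_days=30, now=now)
print(len(kept))  # 1
```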

Allow Duplicate ID Variables

Allow duplicate ID variables specifies whether duplicate ID variables are permitted when transposing data for the adapter. The value for this parameter can be Y (yes) or N (no). N is the default value.
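A long-to-wide transpose turns each ID value into a column, so a repeated ID value within one group would produce two columns with the same name. The following minimal Python sketch shows what the Y/N setting guards against. The record layout `(group, id_value, value)` and the function name are hypothetical. The "keep the later value" policy for Y is an assumption of this sketch, not a documented behavior of the adapters.

```python
def transpose(records, allow_duplicate_ids="N"):
    """Pivot (group, id_value, value) records into one dict per group.

    Each id_value becomes a column name. With "N", a repeated id_value
    within a group is an error. With "Y", the later value replaces the
    earlier one (an assumed policy for this sketch).
    """
    wide = {}
    for group, id_value, value in records:
        row = wide.setdefault(group, {})
        if id_value in row and allow_duplicate_ids == "N":
            raise ValueError(f"duplicate ID {id_value!r} in group {group!r}")
        row[id_value] = value
    return wide


records = [("host1", "cpu", 40), ("host1", "cpu", 45), ("host1", "mem", 70)]
print(transpose(records, allow_duplicate_ids="Y"))
# {'host1': {'cpu': 45, 'mem': 70}}
```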

Choose Access Command

Choose access command, relevant only to the SNMP adapter, specifies the source of the raw data, such as NNM, NetView, or RRDTool. Your choice between the following two options also determines which other staging parameters are available for configuration:
  • HP NNM / Netview, the default value, indicates that the raw data is from HP NNM or NetView. If you select this value, then the corresponding field snmpColDump executable appears and enables you to specify the location path and command for snmpColDump. The snmpColDump executable parameter does not have a default.
    Note: If you select HP NNM / Netview, then you must have HP OpenView installed on the server where the staging job is running either interactively or in batch mode.
  • RRDTool indicates that the raw data is stored in a Round Robin Database (RRD) that is managed by RRDTool. If you select this value, then the following fields appear:
    • rrdtool executable appears and enables you to specify the location path and command for RRDTool. The rrdtool executable parameter does not have a default.
    • Number of days to load specifies the number of days of raw data that you want to backload into the staged table. This field accepts only integers from 1 to 365. The default value is 2.
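The default and range for Number of days to load can be expressed as a small validation routine. This is a hypothetical Python sketch. The function name is illustrative, and `None` stands in for a field that was left empty.

```python
DEFAULT_DAYS_TO_LOAD = 2  # documented default


def days_to_load(value=None):
    """Return the Number of days to load, applying the default and the 1-365 range."""
    if value is None:
        return DEFAULT_DAYS_TO_LOAD
    if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 365:
        raise ValueError("Number of days to load must be an integer from 1 to 365")
    return value


print(days_to_load())    # 2
print(days_to_load(30))  # 30
```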

Choose Raw Data Input Type

Choose raw data input type specifies whether the raw data is a file or directory that is available from the client network or is available through the FTP access method. Your choice also determines which of several other staging parameters are available for configuration. The options that are available for selection vary based on the adapter.
  • File indicates that the raw data input for the adapter is a file. If you select this value, then the corresponding field Raw data input file or directory appears and enables you to specify the full pathname of the raw data file or the directory. You can enter the path directly or use the browsing function to locate the path and enter it automatically. This parameter does not have a default value.
  • File or directory indicates that the raw data input for the adapter is a file or directory. If you select this value, then the corresponding field Raw data input file or directory appears and enables you to specify the full pathname of the raw data file or the directory. You can enter the path directly or use the browsing function to locate the path and enter it automatically. This parameter does not have a default value.
  • FTP indicates that the raw data input for the adapter is available using the FTP access method. For more information about additional parameters that are required if you select the FTP access method, see FTP.

Default Duration

Default duration, relevant only to the HP Reporter adapter, specifies the value (in seconds) for the duration of the intervals if the input data does not contain an INTERVAL variable. If the input data does not contain an INTERVAL variable and a value for the default duration is not specified, then the interval duration defaults to 3600 seconds, or one hour.
If specified, the value must be an integer from 1 to 2,147,483,647.
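The order in which the interval duration is resolved can be sketched in Python as follows. The function and argument names are hypothetical. `interval_from_data` stands for the value of the INTERVAL variable when the input data contains one, and `None` means "not present" or "not specified".

```python
MAX_DEFAULT_DURATION = 2_147_483_647  # documented upper bound, in seconds
FALLBACK_DURATION = 3600  # one hour, used when nothing else is available


def interval_duration(interval_from_data=None, default_duration=None):
    """Resolve the interval duration (seconds) for HP Reporter input.

    Precedence: the INTERVAL variable in the data, then the Default
    duration parameter, then 3600 seconds.
    """
    if default_duration is not None and (
        isinstance(default_duration, bool)
        or not isinstance(default_duration, int)
        or not 1 <= default_duration <= MAX_DEFAULT_DURATION
    ):
        raise ValueError("Default duration must be an integer from 1 to 2,147,483,647")
    if interval_from_data is not None:
        return interval_from_data
    if default_duration is not None:
        return default_duration
    return FALLBACK_DURATION


print(interval_duration())                      # 3600
print(interval_duration(default_duration=900))  # 900
```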

Delimiter in Raw Data

Delimiter in raw data specifies the delimiter (Space, Comma, or Tab) that is used in the raw input data. The values that are available for this parameter vary based on the adapter.
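To make the three settings concrete, here is a minimal Python sketch that splits one raw-data line according to the selected delimiter. The function name is hypothetical. Treating a run of spaces as a single separator for Space is an assumption of this sketch, not documented adapter behavior.

```python
def split_record(line, delimiter="Comma"):
    """Split one line of raw data according to the Delimiter in raw data setting."""
    line = line.rstrip("\r\n")
    if delimiter == "Space":
        # Assumption: consecutive spaces count as one separator.
        return line.split()
    if delimiter == "Comma":
        return line.split(",")
    if delimiter == "Tab":
        return line.split("\t")
    raise ValueError(f"unsupported delimiter: {delimiter!r}")


print(split_record("host1,cpu,40\n"))  # ['host1', 'cpu', '40']
```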

Duplicate Checking

Duplicate checking specifies whether to check for duplicate data and indicates what to do when duplicate data is encountered.
Note: Duplicate checking is automatically enabled for the adapters that use a database as a raw data source (such as HP Reporter, SAP ERP, MS SCOM, and VMware). This parameter is not available for configuration with these staging transformations. For more information about how to override the default value for duplicate checking with these adapters, see Example 3: Using Macro Variables to Subset Data for HP Reporter, MS SCOM, and VMware Adapters.
Here are the values that are available for this parameter:
Discard
removes duplicates. This is the default value for most adapters.
Force
loads data regardless of whether duplicates are found.
Inactive
does not check for duplicates.
Terminate
ends the job if duplicates are found.
CAUTION:
Although the job terminates, the resulting staged table might contain data. However, termination of the job indicates an error in processing, and any data in the resulting staged table might be invalid.
For more information about duplicate data checking, see Duplicate-Data Checking Overview.
Note: If you update the source code of a job and modify the mode for duplicate checking, then you can save the modified source code to the local file system. However, SAS Data Integration Studio preserves the original source code. The new mode for duplicate checking is not updated in the repository and is not reflected in the user interface.
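The four duplicate-checking values can be compared side by side in the following hypothetical Python sketch. It simplifies the rules by treating a record as a duplicate when its key has already been seen in the same run. The product's actual criteria are described in Duplicate-Data Checking Overview. Rows are appended to `staged` one at a time, which shows why a terminated job can still leave data in the staged table.

```python
class DuplicateDataError(Exception):
    """Raised in Terminate mode when a duplicate is encountered."""


MODES = {"Discard", "Force", "Inactive", "Terminate"}


def stage(records, staged, mode="Discard", key=lambda record: record):
    """Append records to `staged` according to the duplicate-checking mode.

    A record is a duplicate when its key was already seen in this run
    (a simplification for this sketch).
    """
    if mode not in MODES:
        raise ValueError(f"unknown duplicate-checking mode: {mode!r}")
    seen = set()
    for record in records:
        k = key(record)
        if mode != "Inactive" and k in seen:
            if mode == "Discard":
                continue  # drop the duplicate and keep going
            if mode == "Terminate":
                # Rows appended before this point remain in `staged`.
                raise DuplicateDataError(f"duplicate record: {k!r}")
            # Force: the duplicate is detected but loaded anyway.
        seen.add(k)
        staged.append(record)
    return staged


data = [("host1", 1), ("host1", 1), ("host2", 1)]
print(stage(data, [], "Discard"))  # [('host1', 1), ('host2', 1)]
```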

FTP

Overview

If you select FTP for the Choose raw data input type parameter, then additional parameters appear on the parameters page to facilitate the FTP process. The following topics describe these additional parameters that are available for configuration:

Host

Host specifies the name of the remote host. This parameter does not have a default value.

Port

Port specifies the FTP port number. The default value is 21.

User

User specifies the user ID for the FTP server. This field accepts alphanumeric characters. This parameter does not have a default value.

Password

Password specifies the password of the given user ID for the FTP server. This parameter does not have a default.
CAUTION:
The Adapter Setup wizard generates a job log that displays all information that you specify in the wizard, including any passwords for accessing FTP data.
If you have concerns about this password showing in the job log, then you can set up a user ID that you use only for accessing the files via FTP.

External File Name

External file name specifies the filename of the raw data. The maximum number of characters in this field (including the dots in a z/OS filename, if applicable) is 44. This parameter does not have a default value.
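The 44-character limit can be checked before a job runs, as in the following Python sketch. The function name and the sample data set name are hypothetical. Dots count toward the limit because Python's `len` counts every character in the string.

```python
MAX_EXTERNAL_FILE_NAME = 44  # dots in a z/OS name count toward the limit


def check_external_file_name(name):
    """Reject external file names longer than 44 characters."""
    if len(name) > MAX_EXTERNAL_FILE_NAME:
        raise ValueError(f"{name!r} has {len(name)} characters; the maximum is 44")
    return name


print(check_external_file_name("PROD.SMF.DAILY.DUMP"))  # hypothetical data set name
```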

Tape

Tape specifies whether the data file is on tape. The default value is No.

RCMD

RCMD specifies the FTP SITE or service command that is sent to the FTP server to provide services that are system-specific or essential to transfer files but not common enough to be included in the protocol. The default value is SITE RDW.

Debug

Debug specifies whether to write to the SAS log any messages that are sent to and from the FTP server. The default value is No.

Future Data

Future data specifies whether to check for future data and what to do when it is encountered. This parameter sets the FUTURE parameter of the RMFUTURE macro.
This parameter controls the processing of incoming data that has a datetime variable that is greater than 48 hours in the future. (That is, the datetime variable is more than 48 hours after the current time on the system where data is being staged for processing into the IT data mart.) The 48-hour buffer provides for different time zones, daylight saving time, Greenwich Mean Time, and so on.
If future data is encountered, a note is written to the SAS log. This note identifies the future data option that is selected, shows the datetime that was encountered, and explains the status of the future data, such as whether it was added to the IT data mart or whether the job was terminated.
Here are the values that are available for this parameter:
Accept
specifies that incoming data is staged for processing and is processed into the IT data mart. If any of the data has a datetime value of 48 hours or more in the future, then a note that future data was encountered is written to the SAS log. This value enables an IT data mart to accept future data. For example, you might want to use this setting to perform end-of-year testing with a test IT data mart.
Note: Age limits take effect from the most recent data. Therefore, dates in the future might cause at least some of the existing data to be aged out of the IT data mart.
Discard
specifies that data with a datetime value of 48 hours or more in the future is not staged for processing and is not processed into the IT data mart. This value is the default. This value prevents future data from being processed into the IT data mart. Future data might cause existing data to be aged out (the existing data would appear to be older than it is, in comparison with the future data).
Terminate
specifies that if any incoming data has a datetime value of 48 hours or more in the future, then staging of the data stops, an error message is written to the SAS log, and the job terminates. Like Discard, this value keeps future data out of the IT data mart, where it might cause existing data to be aged out (the existing data would appear to be older than it is, in comparison with the future data). Because it stops processing, this value calls more attention to the future data than the Discard value does.
CAUTION:
Although the job terminates, the resulting staged table might contain data. However, termination of the job indicates an error in processing, and any data in the resulting staged table might be invalid.
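The three values can be contrasted in a minimal Python sketch of the 48-hour check. It is an illustration only, not the RMFUTURE macro. It assumes that each row stores its timestamp under a hypothetical `datetime` key. It treats a datetime that is more than 48 hours after `now` as future data, because the exact boundary case is an assumption here. Log notes are returned as strings instead of being written to the SAS log.

```python
from datetime import datetime, timedelta

FUTURE_BUFFER = timedelta(hours=48)


class FutureDataError(Exception):
    """Raised in Terminate mode when future data is encountered."""


def apply_future_rule(rows, mode, now):
    """Return (staged_rows, notes) for the Accept, Discard, or Terminate value."""
    if mode not in ("Accept", "Discard", "Terminate"):
        raise ValueError(f"unknown future-data value: {mode!r}")
    limit = now + FUTURE_BUFFER
    staged, notes = [], []
    for row in rows:
        if row["datetime"] > limit:
            note = f"Future data ({mode}): {row['datetime']:%Y-%m-%d %H:%M}"
            if mode == "Terminate":
                raise FutureDataError(note)
            notes.append(note)
            if mode == "Discard":
                continue  # not staged, so not processed into the IT data mart
        staged.append(row)
    return staged, notes


now = datetime(2024, 1, 1, 12, 0)
rows = [{"datetime": datetime(2024, 1, 1, 11, 0)}, {"datetime": datetime(2024, 1, 10, 0, 0)}]
staged, notes = apply_future_rule(rows, "Discard", now)
print(len(staged), notes)  # 1 ['Future data (Discard): 2024-01-10 00:00']
```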

JES

JES specifies the version (JES2 or JES3) of the z/OS job entry subsystem that is in use by the system where the input data was recorded.

Logical Record Length

Logical record length specifies the logical record length for the input file. If specified, the value must be an integer greater than 1. This parameter applies to adapters whose raw data sources are based on z/OS.
For several adapters, the default value is left blank to indicate that the logical record length is dependent on the type of input file and operating system that is used.
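The following hypothetical Python sketch shows the rule for this field. `None` represents a blank value, which defers the record length to the input file type and operating system. Any value that is specified must be an integer greater than 1.

```python
def check_lrecl(value=None):
    """Validate Logical record length; None represents a blank field."""
    if value is None:
        return None  # blank: length depends on the input file type and operating system
    if isinstance(value, bool) or not isinstance(value, int) or value <= 1:
        raise ValueError("Logical record length must be an integer greater than 1")
    return value


print(check_lrecl(32756))  # 32756
```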

Machine

Machine specifies the name of the machine that generated the raw data. This parameter is primarily relevant to the Web Log adapter. If the Web Log data already contains a value for machine name, then the value in the raw data is used with the staged table and you do not need to specify a value for this parameter. If the Web Log data does not already specify a machine name, then you can use this parameter to specify a machine name and that value is then associated with the staged table. This parameter does not have a default value.

Presummarization Duration

Presummarization duration specifies the duration, in seconds, of the intervals into which you want to summarize the raw data before it is staged. For example, if you enter 3600, then the raw data is summarized in intervals of one hour. If specified, the value must be a positive integer less than or equal to 86,400, which is the number of seconds in a day.
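Presummarization can be pictured as grouping samples into fixed-length intervals. In this minimal Python sketch, samples are hypothetical `(epoch_seconds, value)` pairs. Each sample is aligned to the start of its interval, and each interval is averaged. Averaging is an assumption of this sketch, and the adapter's own summary statistics might differ.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86_400


def presummarize(samples, duration):
    """Average (epoch_seconds, value) samples into intervals of `duration` seconds."""
    if not isinstance(duration, int) or not 1 <= duration <= SECONDS_PER_DAY:
        raise ValueError("Presummarization duration must be an integer from 1 to 86,400")
    buckets = defaultdict(list)
    for ts, value in samples:
        start = ts - ts % duration  # align to the start of the interval
        buckets[start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


samples = [(0, 10), (1800, 20), (3600, 30), (5400, 50)]
print(presummarize(samples, 3600))  # {0: 15.0, 3600: 40.0}
```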

Raw Data Input Directory

Raw data input directory specifies the full pathname of the directory for the raw data. You can enter the path directly or use the browsing function to locate and select it.

Raw Data Input File

Raw data input file specifies the full pathname of the raw data file for the adapter. You can enter the path directly or use the browsing function to locate and select it.

Raw Data Input Library

Raw data input library includes two fields (Library and Libref) that specify the SAS library and corresponding libref for the appropriate adapter database. You can enter the library path or use the browsing function to locate and select it.
Note: For the HP Reporter adapter, this parameter supports only libraries with engine types of ODBC, OLE DB, and Oracle. For the SAP ERP adapter, this parameter supports only libraries with engine types of SAS or BASE.

rsh Host Command

rsh host command, relevant only to the SNMP adapter, specifies the rsh command and the name of the host on which to run the NNM snmpColDump command or the rrdtool command. The value of this field is placed at the beginning of the rrdtool, snmpColDump, or snmpwalk command. Leave this field blank if an rsh command is not required to run the rrdtool or snmpColDump command.
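The prefixing rule can be sketched in Python as simple string assembly. The function name, the host name `nnmhost`, and the file `router1.rrd` are hypothetical. The sketch only builds the command line and does not run it.

```python
def build_command(tool_command, rsh_host_command=""):
    """Prefix a collection command with the optional rsh host command.

    A blank rsh_host_command leaves tool_command unchanged.
    """
    prefix = rsh_host_command.strip()
    return f"{prefix} {tool_command}" if prefix else tool_command


print(build_command("rrdtool info router1.rrd", "rsh nnmhost"))
# rsh nnmhost rrdtool info router1.rrd
```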

Site Name

Site name, relevant to the Web Log adapter, specifies the name of the Web site that generated the raw data. If the Web Log data already contains a Web site value, then the value in the raw data is used with the staged table and you do not need to specify a value for this parameter. If the Web Log data does not already specify a Web site, then you can use this parameter to specify a Web site value and that value is then associated with the staged table. This parameter does not have a default value.

Use Intermediate Staging View

Use intermediate staging view specifies whether to use the temporary view form when instantiating staged tables. Enter the staged table name or leave this field blank.
This optional parameter can contain the SAS data set name of a staged table or a macro variable that resolves to a SAS data set name. If you do not specify a value, the parameter defaults to blank, which indicates that you do not want presummarization.

Use snmpwalk to Gather Character Data

Overview

Use snmpwalk to gather character data specifies whether to use the snmpwalk command to capture data during the execution of the job. The following values are available:
  • No does not use the snmpwalk command.
  • Yes uses the snmpwalk command. If you select this value, then additional parameters appear. The following subtopics describe each of these additional parameters.
This parameter and others associated with snmpwalk are relevant only to the SNMP adapter.

snmpwalk Executable

The snmpwalk executable parameter indicates the location and command for snmpwalk. This parameter does not have a default value.

HostFile for snmpwalk

HostFile for snmpwalk indicates the location of the snmphost file that lists the hosts from which you want to get data. This parameter does not have a default.

Community Value for snmpwalk

Community value for snmpwalk indicates the community name that is required to run the snmpwalk command. The default value is public.

User-written Staging Parameters

For a detailed description of the staging parameters for the User-written Staging transformation, see Staging Parameters Tab.