Staging Parameters

The following topics describe the various parameters that adapters use to stage the raw data. The appropriate combination of these staging parameters is available from the Adapter Setup wizard. These parameters are also available from the Staging Parameters tab of the Properties dialog box for an adapter or staging transformation. If a parameter includes a browse function, that function might be disabled in some instances. For example, browsing might be disabled when the input data is associated with an application server that resides on either a traditional z/OS operating environment or a zFS file system.

Age Limit for Data Acquisition Table

Allow Duplicate ID Variables

Choose Access Command

Choose Raw Data Input Type

Class Columns

Consolidation Function (CF)

Default Duration

Delimiter Characters

Delimiter in Raw Data

Delimiter String

Duplicate Checking Options

Enable Duplicate Checking

How Many Rows of Data Should Be Used as Guessing Rows

IDVAR

Is the Delimiter String Case Sensitive

Is There a Header Row

Input File Parameters

INT

JES

Keep

Library for Temporary Work Space

Machine

Minimum Number of Files to Read on Each Processor

Normalize Datetime

Number of Processors to Use

Presummarization Duration

Raw Data Input Directory

Raw Data Input File

Raw Data Input Library

Report

rrdtool fetch -end option

rrdtool Executable

rrdtool fetch -start option

RSH/SSH Host Command

Site Name

Source

Temporary Work Space Library

TIMESTMP

Type of Delimiter

Use Intermediate Staging View

Use snmpwalk to Gather Character Data

Overview

snmpwalk Executable

HostFile for snmpwalk

Community value for snmpwalk

What Row Does the Data Start On

What Row Is the Header On

User-written Staging Parameters

Age Limit for Data Acquisition Table

Age limit for data acquisition table (in days) specifies the number of days that data is kept in the data acquisition table that populates the staged tables.

This parameter is relevant only to the VMware Data Acquisition staging transformation. Other staging transformations purge the existing data of their corresponding staged tables and replace it with new data each time the staging job executes. However, the VMware Data Acquisition staging transformation populates staged tables that are not purged when the staging job executes. This parameter enables you to specify an age limit for the data in the data acquisition table, after which the data is purged when the staging job executes. Thus, the table does not grow indefinitely.

Allow Duplicate ID Variables

Allow duplicate ID variables specifies whether duplicate ID variables are permitted when transposing data for the adapter. The value for this parameter can be Y (yes) or N (no). N is the default value.

Choose Access Command

Choose access command specifies the type of raw data such as NNM, NetView, or RRDtool. The following two options for this parameter also govern other staging parameters that are available for configuration:

HP NNM / Netview, the default value, indicates that the raw data is from HP NNM or NetView. If you select this value, then the corresponding field snmpColDump executable appears. You can then specify the location path and command for snmpColDump. The snmpColDump executable parameter does not have a default.

Note: If you select HP NNM / Netview, then you must have installed one of those products on the server where the staging job runs, whether interactively or in batch mode.

RRDtool indicates that the raw data is stored in a round-robin database (RRD) that is managed by RRDtool. If you select this value, then the following fields appear:
- rrdtool executable enables you to specify the location path and command for RRDtool. The rrdtool executable parameter does not have a default.
- Number of days to load specifies the number of days of raw data that you want to backload into the staged table. This field accepts only integers from 1 to 365. The default value is 2.

Choose Raw Data Input Type

Choose raw data input type specifies whether the raw data is a file or directory that is available from the client network or if the raw data input is available using an FTP access method. The following options for this parameter also govern several of the other staging parameters that are available for configuration. The options that are available for selection vary based on the adapter.

File indicates that the raw data input for the adapter is a file. If you select this value, then the corresponding field Raw data input file or directory appears. You can then specify the full pathname of the raw data file or the directory. You can enter the path directly or browse to locate the path and enter it automatically. This parameter does not have a default value.
File or directory indicates that the raw data input for the adapter is a file or directory. If you select this value, then the corresponding field Raw data input file or directory appears. You can then specify the full pathname of the raw data file or the directory. You can enter the path directly or browse to locate the path and enter it automatically. This parameter does not have a default value.
FTP indicates that the raw data input for the adapter is available using the FTP access method. For more information about additional parameters that are required if you select the FTP access method, see FTP.

Class Columns

Class columns specifies the list of additional class columns that you want to use when staging the data. If any of these columns are computed columns, they must be based on the filename or datetime columns. This parameter requires a space-delimited list of class columns. This parameter is relevant only to the RRDtool adapter.

Consolidation Function (CF)

Consolidation Function (CF) specifies the consolidation function values that should be retrieved from the round-robin database (RRD). Select the function whose data you want to fetch from the drop-down menu. You can also enter a value or leave the field blank. If the value is blank, the FETCH command retrieves all the consolidation functions that are in the round-robin database. This parameter is relevant only to the RRDtool adapter.

Default Duration

Default duration specifies the value (in seconds) for the duration of the intervals if the input data does not contain an INTERVAL variable. If the input data does not contain an INTERVAL variable and a value for the default duration is not specified, then the interval duration defaults to 3600 seconds, or one hour. This parameter is relevant only to the HP Reporter adapter.

This value must be numeric. If specified, the integer value must be greater than or equal to 1 and less than or equal to 2,147,483,647.
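
The fallback rule described above can be sketched as follows. This is a minimal illustration in Python, not the adapter's own code; the function name interval_duration and the dictionary that stands in for an input record are hypothetical.

```python
DEFAULT_INTERVAL_SECONDS = 3600      # one hour, used when nothing is specified
MAX_DURATION = 2_147_483_647         # upper bound stated for this parameter


def interval_duration(record, default_duration=None):
    """Return the interval length (seconds) to use for one input record."""
    # An INTERVAL value in the input data always wins.
    if "INTERVAL" in record:
        return record["INTERVAL"]
    # No INTERVAL and no Default duration: fall back to one hour.
    if default_duration is None:
        return DEFAULT_INTERVAL_SECONDS
    # A specified Default duration must be an integer in the allowed range.
    if not isinstance(default_duration, int) or not 1 <= default_duration <= MAX_DURATION:
        raise ValueError("Default duration must be an integer from 1 to 2,147,483,647")
    return default_duration


print(interval_duration({"INTERVAL": 300}, 900))   # value from the data
print(interval_duration({}, 900))                  # value from the parameter
print(interval_duration({}))                       # one-hour fallback
```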

Delimiter Characters

Delimiter characters specifies one or more characters that are used as delimiters in the raw input data. If you enter several characters (for example, !#*), then each of these characters is treated as a delimiter in your data. A value is not required for this parameter. This parameter is relevant only to the CSV adapter.

Note: The selected characters must not be separated from each other by spaces or any other character.
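
To illustrate how a list of delimiter characters behaves, here is a small Python sketch. It is an approximation of the described behavior, not the CSV adapter's implementation, and the function name split_on_delimiter_characters is hypothetical.

```python
import re


def split_on_delimiter_characters(line, delimiter_chars):
    """Split a line wherever any one of the listed characters occurs."""
    if not delimiter_chars:
        # No delimiter characters were given, so the line is a single field.
        return [line]
    # re.escape keeps characters such as * or # from being read as regex syntax.
    pattern = "[" + re.escape(delimiter_chars) + "]"
    return re.split(pattern, line)


# Each of !, #, and * acts as a separate delimiter.
print(split_on_delimiter_characters("host1!cpu#95*2024", "!#*"))
```

Because each character is a delimiter on its own, the value !#* splits on any single occurrence of !, #, or *. It does not require the three characters to appear together.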

Delimiter in Raw Data

Delimiter in raw data specifies the delimiter (Space, Comma, or Tab) that is used in the raw input data. The values that are available for this parameter vary based on the adapter.

Delimiter String

Delimiter string specifies a string of characters that is used as the delimiter in the raw input data. A value is not required for this parameter. This parameter is relevant only to the CSV adapter.

Duplicate Checking Options

Duplicate checking options specifies whether to check for duplicate data and indicates what to do when duplicate data is encountered.

Note: Duplicate checking is automatically enabled for the adapters that use a database as a raw data source (such as HP Reporter, SAP ERP, SAS EV, MS SCOM, and VMware). This parameter is not available for configuration with these staging transformations. For more information about how to override the default value for duplicate checking with these adapters, see Example 3: Using Macro Variables to Subset Data for HP Reporter, MS SCOM, SAS EV, and VMware Adapters.

Here are the values that are available for this parameter:

Discard

removes duplicates. This is the default value for most adapters.

Force

loads data regardless of whether duplicates are found.

Terminate

ends the job if duplicates are found.

CAUTION:

Although the job terminates, the resulting staged table might contain data. However, termination of the job indicates an error in processing, and any data in the resulting staged table might be invalid.

For more information about duplicate data checking, see Duplicate-Data Checking Overview.

Note: If you update the source code of a job and modify the mode for duplicate checking, then you can save the functionality of the new source code to the local file system. However, SAS Data Integration Studio preserves the original source code. The new value of the mode for duplicate checking is not updated in the repository, and the new value is not reflected in the user interface.

Enable Duplicate Checking

Enable Duplicate Checking specifies whether to perform duplicate checking of the data. If this parameter is set to Yes, additional duplicate checking parameters appear and are enabled for specification. These parameters are INT, Keep, and Report. If this parameter is set to No, no other duplicate checking parameters appear.

ENDFILE

ENDFILE specifies the name of the SAS variable that is used as the END= keyword for the SAS INFILE statement that reads the raw data.

FTP

Overview

If you select FTP for the Choose raw data input type parameter, then additional parameters appear on the parameters page to facilitate the FTP process. The following topics describe these additional parameters that are available for configuration:

Host

Host specifies the name of the remote host. This parameter does not have a default value.

Port

Port specifies the FTP port number. The default value is 21.

User

User specifies the user ID for the FTP server. This field accepts alphanumeric characters. This parameter does not have a default value.

Password

Password specifies the password of the given user ID for the FTP server. This parameter does not have a default.

CAUTION:

The Adapter Setup wizard generates a job log that displays all information that you specify in the wizard, including any passwords for accessing FTP data.

If you have concerns about this password showing in the job log, then you can set up a user ID that you use only for accessing the files via FTP.

External File Name

External file name specifies the filename of the raw data. The maximum number of characters in this field (including the dots in a z/OS filename, if applicable) is 44. This parameter does not have a default value.

Tape

Tape specifies whether the data file is on tape. The default value is No.

RCMD

RCMD specifies the FTP SITE or command that is sent to the FTP server to provide services that are system-specific or essential to transfer files but not common enough to be included in the protocol. The default value is SITE RDW.

Debug

Debug specifies whether to write to the SAS log any messages that are sent to and from the FTP server. The default value is No.
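
As a rough picture of how these FTP parameters fit together, the following Python sketch uses the standard ftplib module. It is an assumption-laden illustration, not the code that the staging job generates: the class FtpStagingSettings and the function fetch_raw_data are hypothetical names, and the Tape parameter is not modeled.

```python
from dataclasses import dataclass
from ftplib import FTP


@dataclass
class FtpStagingSettings:
    host: str
    user: str
    password: str
    external_file_name: str
    port: int = 21            # default FTP port
    rcmd: str = "SITE RDW"    # default RCMD value
    debug: bool = False       # default Debug value (No)

    def __post_init__(self):
        # The External file name field allows at most 44 characters.
        if len(self.external_file_name) > 44:
            raise ValueError("External file name is limited to 44 characters")


def fetch_raw_data(settings, local_path):
    """Download the raw data file described by the settings."""
    ftp = FTP()
    if settings.debug:
        ftp.set_debuglevel(1)     # echo the FTP conversation
    ftp.connect(settings.host, settings.port)
    try:
        ftp.login(settings.user, settings.password)
        if settings.rcmd:
            ftp.sendcmd(settings.rcmd)   # system-specific command, for example SITE RDW
        with open(local_path, "wb") as out:
            ftp.retrbinary("RETR " + settings.external_file_name, out.write)
    finally:
        ftp.close()
```

Calling fetch_raw_data requires a reachable FTP server, so the sketch only defines the function. In line with the caution above, a dedicated FTP-only user ID limits the exposure if the password appears in a log.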

Future Data

Future data specifies whether to check for future data and indicates what to do when it is encountered. This parameter specifies the FUTURE parameter in the RMFUTURE macro.

This parameter controls the processing of incoming data that has a datetime variable that is greater than 48 hours in the future. (That is, the datetime variable is more than 48 hours after the current time on the system where data is being staged for processing into the IT data mart.) The 48-hour buffer provides for different time zones, daylight saving time, Greenwich Mean Time, and so on.

If future data is encountered, a note is written to the SAS log. This note provides the future data option that is selected and shows the datetime that was encountered. It also explains the status of the future data, such as whether it was added to the IT data mart or if the job was terminated.

Here are the values that are available for this parameter:

Accept

specifies that incoming data is staged for processing and is processed into the IT data mart. If any of the data has a datetime value of 48 hours or more in the future, then a note that future data was encountered is written to the SAS log. This value enables an IT data mart to accept future data. For example, you might want to use this setting to perform end-of-year testing with a test IT data mart.

Note: Age limits take effect from the most recent data. Therefore, dates in the future might cause at least some of the existing data to be aged out of the IT data mart.

Discard

specifies that data with a datetime value of 48 hours or more in the future is not staged for processing and is not processed into the IT data mart. This value is the default. This value prevents future data from being processed into the IT data mart. Future data might cause existing data to be aged out (the existing data would appear to be older than it is, in comparison with the future data).

Terminate

specifies that if any incoming data has a datetime value of 48 hours or more in the future, then staging of the data stops, an error message is written to the SAS log, and the job terminates. This value prevents future data from being processed into the IT data mart, which might cause existing data to be aged out. (The existing data would appear to be older than it is, in comparison with the future data.) This value stops processing and thus calls more attention to the future data than the Discard value.
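
The three values can be compared side by side in a short Python sketch. This is only an illustration of the described behavior and is not the RMFUTURE macro; the names apply_future_data_option and FutureDataError are hypothetical, and records are reduced to bare datetime values.

```python
from datetime import datetime, timedelta

FUTURE_BUFFER = timedelta(hours=48)   # buffer described for this parameter


class FutureDataError(Exception):
    """Raised when the Terminate option encounters future data."""


def apply_future_data_option(timestamps, option, now):
    """Return the timestamps that would be staged under the given option."""
    cutoff = now + FUTURE_BUFFER
    # Assumption: "48 hours or more in the future" is treated as >= cutoff.
    future = [t for t in timestamps if t >= cutoff]
    if not future:
        return list(timestamps)
    if option == "Accept":
        print("NOTE: future data encountered and accepted:", max(future))
        return list(timestamps)
    if option == "Discard":
        print("NOTE: future data encountered and discarded:", max(future))
        return [t for t in timestamps if t < cutoff]
    if option == "Terminate":
        raise FutureDataError("future data encountered: %s" % max(future))
    raise ValueError("unknown option: %r" % option)


now = datetime(2024, 1, 1, 12, 0)
data = [now - timedelta(hours=1), now + timedelta(hours=72)]
print(len(apply_future_data_option(data, "Discard", now)))
```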

How Many Rows of Data Should Be Used as Guessing Rows

How many rows of data should be used as guessing rows specifies the number of rows to be read from the raw data in order to determine the type and length of each column. This parameter is relevant only to the CSV adapter.
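
The effect of the number of guessing rows can be seen in a small Python sketch. The inference rule shown here (numeric if every sampled value parses as a number, otherwise character) is an assumption for illustration, and guess_columns is a hypothetical name; the CSV adapter's actual rules are not described in this topic.

```python
def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def guess_columns(rows, guessing_rows):
    """Guess (type, length) for each column from the first guessing_rows rows."""
    sample = rows[:guessing_rows]
    width = max((len(row) for row in sample), default=0)
    guesses = []
    for i in range(width):
        # Ignore missing and empty values when guessing.
        values = [row[i] for row in sample if i < len(row) and row[i] != ""]
        kind = "numeric" if values and all(_is_number(v) for v in values) else "character"
        length = max((len(v) for v in values), default=0)
        guesses.append((kind, length))
    return guesses


rows = [["host1", "10"], ["host2", "20"], ["longhostname", "n/a"]]
print(guess_columns(rows, 2))   # the third row is not sampled
print(guess_columns(rows, 3))   # the third row changes both guesses
```

With too few guessing rows, a later row that is longer or of a different type than the sampled rows is not reflected in the guessed type and length.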

IDVAR

IDVAR specifies the name of the SAS variable that identifies the system or machine that generated the input data. This value can be used as a class variable for duplicate checking.

Note: If the value is not a valid column in the staged table, the staging job will fail.

Is the Delimiter String Case Sensitive

Is the delimiter string case sensitive specifies whether the delimiter in the raw input data must match the case of the value that is specified in the Delimiter string parameter exactly, or whether the case of the string in the raw data does not matter. This parameter is relevant only to the CSV adapter. It is available only if you have set the Type of delimiter parameter to Delimiter string.
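
The difference between a case-sensitive and a case-insensitive delimiter string can be sketched in Python as follows. This is an illustration only, and split_on_delimiter_string is a hypothetical name rather than part of the CSV adapter.

```python
import re


def split_on_delimiter_string(line, delimiter, case_sensitive):
    """Split a line on a multi-character delimiter string."""
    if not delimiter:
        # No delimiter string was given, so the line is a single field.
        return [line]
    flags = 0 if case_sensitive else re.IGNORECASE
    # re.escape makes the delimiter match literally, not as a regex.
    return re.split(re.escape(delimiter), line, flags=flags)


line = "host1ENDcpuend95"
print(split_on_delimiter_string(line, "END", case_sensitive=True))
print(split_on_delimiter_string(line, "END", case_sensitive=False))
```

With case sensitivity on, only the uppercase END separates fields. With it off, the lowercase end is treated as a delimiter as well.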

Is There a Header Row

Is there a header row specifies whether a header row that contains column headings is in the raw input data. If you specify Yes, the What row is the header on field is enabled. If you specify No, the What row is the header on field is not enabled and the columns are named Column1 to ColumnN. This parameter is relevant only to the CSV adapter.

Input File Parameters

Input File Parameters specifies the record format and logical record length of the input file. This parameter is relevant only to the MXG adapters.

INT

INT specifies the maximum time gap (or interval) that is to be allowed between the timestamps on any two consecutive records from the same system or machine. If the interval between the timestamp values exceeds the value of this parameter, then an observation with the new time range is created in the control data set. This is referred to as a gap in the data.

The value for this parameter must be provided in the format hh:mm, where hh represents hours and mm represents minutes. For example, to specify an interval of 14 minutes, use INT=0:14. To specify an interval of 1 hour and 29 minutes, use INT=1:29.

Note: If this time is not a valid time value, the staging job will fail.
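
A minimal Python sketch of the hh:mm format and of gap detection follows. It is an illustration under assumptions, not the duplicate-checking macro: parse_int_value and find_gaps are hypothetical names, and rejecting minute values above 59 is an assumed reading of "valid time value".

```python
from datetime import datetime, timedelta


def parse_int_value(value):
    """Convert an hh:mm string such as '1:29' into a timedelta."""
    hours_text, sep, minutes_text = value.partition(":")
    if not sep or not hours_text.isdigit() or not minutes_text.isdigit():
        raise ValueError("INT must have the form hh:mm")
    minutes = int(minutes_text)
    if minutes > 59:
        # Assumption: minutes above 59 are not a valid time value.
        raise ValueError("minutes must be from 0 to 59")
    return timedelta(hours=int(hours_text), minutes=minutes)


def find_gaps(timestamps, max_gap):
    """Return (previous, next) pairs whose spacing exceeds max_gap."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


stamps = [datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 10), datetime(2024, 1, 1, 1, 0)]
print(find_gaps(stamps, parse_int_value("0:14")))
```

In this example the 10-minute step is within the 14-minute limit, while the 50-minute step is reported as a gap.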

JES

JES specifies the version (JES2 or JES3) of the z/OS job entry subsystem that is in use by the system where the input data was recorded.

Keep

KEEP specifies the number of weeks for which control data will be kept. Because this value represents the number of Sundays between two dates, a value of 2 results in a maximum retention period of 20 days. This value must be an integer.
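
The 20-day figure can be checked with a short Python sketch. It assumes that a record is retained while the number of Sundays that fall after its date, up to and including the current date, does not exceed the KEEP value. That counting rule is an assumption made for illustration, and the function names are hypothetical.

```python
from datetime import date, timedelta


def sundays_between(start, end):
    """Count the Sundays after start, up to and including end."""
    count = 0
    day = start + timedelta(days=1)
    while day <= end:
        if day.weekday() == 6:   # date.weekday(): Monday is 0, Sunday is 6
            count += 1
        day += timedelta(days=1)
    return count


def is_retained(record_date, today, keep_weeks):
    return sundays_between(record_date, today) <= keep_weeks


record = date(2024, 1, 7)                               # a Sunday
print(is_retained(record, date(2024, 1, 27), 2))        # 20 days later
print(is_retained(record, date(2024, 1, 28), 2))        # 21 days later
```

A record dated on a Sunday survives through the Saturday 20 days later, because only two further Sundays have occurred. On day 21 a third Sunday is counted and the record falls outside a KEEP value of 2.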

Library for Temporary Work Space

Library for temporary work space specifies the library to use if Temporary work space library is set to Other Library. Click Browse to locate and select a library that you have already defined.

Machine

Machine specifies the name of the machine that generated the raw data. This parameter is relevant only to the Web Log adapter. If the Web Log data already contains a value for machine name, then the value in the raw data is used with the staged table. In this case, you do not need to specify a value for this parameter.

If the data from the Web Log adapter does not already specify a machine name, then you can use this parameter to specify a machine name. In this case, that value is then associated with the staged table. This parameter does not have a default value.

Minimum Number of Files to Read on Each Processor

Minimum number of files to read on each processor specifies the minimum number of files to read in each MPConnect session. This parameter is relevant only to the SAR adapter.

Normalize Datetime

Normalize datetime specifies how to save the datetime stamps in the data. This parameter requires a value. If this parameter is set to Yes, the datetime stamps are adjusted to an even number, which enables a more efficient combination of data from multiple round-robin databases. If this parameter is set to No, the exact datetime values are saved in the data. This parameter is relevant only to the RRDtool and Ganglia adapters.

Number of Processors to Use

Number of processors to use specifies the number of processors that can be used for the MPConnect parallel processing. This parameter is relevant only to the SAR adapter.

If set to 0, all processors on the machine are used to stage the data.
If set to 1, MPConnect is not used to stage the data and the staging code is run on a single processor.
If set to more processors than are available on the machine, then the staging code uses the number of processors that are available.

Tip

For best performance, set this option to a number less than the total number of processors that are available on the machine.
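
The three rules for this parameter can be summarized in a short Python sketch. The function names are hypothetical, and the sketch only mirrors the rules stated above; it does not reflect how MPConnect sessions are actually started.

```python
import os


def effective_processor_count(requested, available):
    """Return how many processors the staging code would use."""
    if requested < 0:
        raise ValueError("Number of processors to use cannot be negative")
    if requested == 0:
        return available                 # 0 means use every processor
    return min(requested, available)     # never more than the machine has


def uses_parallel_sessions(requested):
    """A value of 1 means the staging code runs on a single processor."""
    return requested != 1


available = os.cpu_count() or 1          # os.cpu_count() can return None
print(effective_processor_count(0, available))
print(uses_parallel_sessions(1))
```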

Presummarization Duration

Presummarization duration specifies the duration, in seconds, of the intervals into which you want to summarize the raw data before it is staged. For example, if you enter 3600, then the raw data is summarized in intervals of one hour. If specified, the value must be a positive integer less than or equal to 86,400, which is the number of seconds in a day.
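
As an illustration of what summarizing into fixed intervals means, the following Python sketch groups samples by the start of their interval. The use of the mean as the summary statistic is an assumption, since this topic does not state which statistics are applied, and presummarize is a hypothetical name.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400


def presummarize(samples, duration):
    """Average (epoch_seconds, value) samples within intervals of `duration` seconds."""
    if not isinstance(duration, int) or not 1 <= duration <= SECONDS_PER_DAY:
        raise ValueError("duration must be a positive integer no greater than 86,400")
    buckets = defaultdict(list)
    for epoch_seconds, value in samples:
        # Round the timestamp down to the start of its interval.
        interval_start = epoch_seconds - (epoch_seconds % duration)
        buckets[interval_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


samples = [(0, 10.0), (1800, 20.0), (3600, 30.0)]
print(presummarize(samples, 3600))
```

With a duration of 3600, the first two samples fall in the same one-hour interval and are combined, and the third sample starts a new interval.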

Raw Data Input Directory

Raw data input directory specifies the full pathname of the directory for the raw data. You can enter the path directly or browse to locate and select it.

Raw Data Input File

Raw data input file specifies the full pathname of the raw data file for the adapter. You can enter the path directly or browse to locate and select it.

Raw Data Input Library

Raw data input library includes two fields (Library and Libref) that specify the SAS library and corresponding libref for the appropriate adapter database. You can enter the library path or browse to locate and select it.

Note: For the HP Reporter adapter, this parameter supports only libraries with engine types of ODBC, OLE DB, and Oracle. For the SAP ERP and SAS EV adapters, this parameter supports only libraries with engine types of SAS or BASE.

Report

REPORT value for %RMDUPCHK macro specifies whether to display the duplicate-data checking messages in the SAS log or to save the messages in an audit table. If set to Yes, this parameter causes all the messages from duplicate-data checking to appear in the SAS log. If set to No, the duplicate-data checking messages are saved in an audit data table that is stored in the staging library. The name of the audit table is source AUDIT (where source is the 3-character data source code).

Note: If you are monitoring very high numbers of resources, setting this option to No can be beneficial. Eliminating the report reduces CPU consumption, shortens elapsed time, and makes the SAS log more manageable.

rrdtool fetch -end option

rrdtool fetch -end option specifies the ending point for the data that is retrieved from the round-robin database. A value for this parameter is not required. If this value is blank, the data from the start time to the current time is retrieved. This parameter is relevant only to the RRDtool and Ganglia adapters.

Note: For information about the formats that are valid for this parameter, see http://oss.oetiker.ch/rrdtool//doc/rrdfetch.en.html.

rrdtool Executable

rrdtool executable specifies the location of the executable for the RRDtool adapter. You can enter the path to the executable or browse to locate and select it. This parameter requires a value. This parameter is relevant only to the RRDtool and Ganglia adapters.

rrdtool fetch -start option

rrdtool fetch -start option specifies the starting point for the data that is retrieved from the round-robin database. A value for this parameter is not required. This parameter is relevant only to the RRDtool and Ganglia adapters.

Note: For information about the formats that are valid for this parameter, see http://oss.oetiker.ch/rrdtool//doc/rrdfetch.en.html.
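
To show how the rrdtool parameters might combine into one command line, here is a hedged Python sketch that builds an argument list for rrdtool fetch. The function build_fetch_command, the file name, the host command, and the time values are all hypothetical. The sketch requires a consolidation function, and it does not model how the adapter retrieves all consolidation functions when the CF field is blank.

```python
import shlex


def build_fetch_command(rrdtool_executable, rrd_file, cf, start="", end="", host_command=""):
    """Build the argument list for one `rrdtool fetch` call."""
    # An optional RSH/SSH host command goes at the beginning of the command.
    argv = shlex.split(host_command) if host_command else []
    argv += [rrdtool_executable, "fetch", rrd_file, cf]
    if start:
        argv += ["--start", start]
    if end:
        argv += ["--end", end]   # when omitted, no --end option is passed
    return argv


# Illustrative values only; see the rrdfetch documentation for valid time formats.
print(build_fetch_command("/usr/bin/rrdtool", "cpu.rrd", "AVERAGE", start="-1d"))
print(build_fetch_command("rrdtool", "cpu.rrd", "MAX", start="-1d", end="now",
                          host_command="ssh monitor01"))
```

The resulting list can be passed to a process launcher such as subprocess.run. Building a list instead of a single string avoids quoting problems in paths that contain spaces.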

RSH/SSH Host Command

RSH/SSH host command specifies the RSH version or the SSH version of the command and the name of the host for running the data collection commands, such as the NNM snmpColDump command. The command in this field is used at the beginning of the rrdtool, snmpColDump, or snmpwalk commands. This field can be left blank if an RSH or SSH command is not required to run these commands. This parameter is relevant only to the Ganglia, RRDtool, and SNMP adapters.

Site Name

Site name, relevant to the Web Log adapter, specifies the name of the website that generated the raw data. If the Web Log data already contains a website value, then the value in the raw data is used with the staged table and you do not need to specify a value for this parameter. If the Web Log data does not already specify a website, then you can use this parameter to specify a website value. That value is then associated with the staged table. This parameter does not have a default value.

Source

SOURCE specifies the data source for this adapter.

Temporary Work Space Library

Temporary work space library specifies how the MPConnect process defines a library for temporary files. This parameter is relevant only to the SAR adapter.

The choices are:

WORK Library (use the standard SAS work library)
Staging Library (use the staging library to temporarily store files)
Other Library (use another library that the user defines to point to another location).

Note: You must predefine this library in metadata using the New Library wizard. If you choose this option, then the Library for temporary work space option appears.

TIMESTMP

TIMESTMP specifies the name of the SAS variable that contains the datetime stamp that uniquely identifies the time of the event or interval that is being recorded.

Type of Delimiter

Type of delimiter specifies the type of delimiter to use in the raw input data. The type can either be a list of single-character delimiters or a character string. Select the type of delimiter from the drop-down list. This parameter is relevant only to the CSV adapter.

If you select List of delimiter characters, the Delimiter characters field is enabled. The Delimiter string and Is the delimiter string case sensitive? fields are not enabled.

If you select Delimiter string, the Delimiter string and Is the delimiter string case sensitive? fields are enabled. The Delimiter characters field is not enabled.

Use Intermediate Staging View

Use intermediate staging view specifies whether to use the temporary view form when instantiating staged tables. Enter the staged table name or leave this field blank.

This optional parameter can contain the SAS data set name of a staged table or a macro variable that resolves to a SAS data set name. If not specified, this parameter defaults to a blank, which implies that you do not want presummarization.

Use snmpwalk to Gather Character Data

Overview

Use snmpwalk to gather character data specifies whether to use the snmpwalk command to capture data during job execution. The following values are available:

No does not use the snmpwalk command.
Yes uses the snmpwalk command. If you select this value, then additional parameters appear. The following subtopics describe each of these additional parameters.

This parameter and others associated with the snmpwalk command are relevant only to the SNMP adapter.

snmpwalk Executable

snmpwalk Executable specifies the location and command for snmpwalk. This parameter does not have a default value.

HostFile for snmpwalk

HostFile for snmpwalk indicates the location of the snmphost file that lists the hosts from which you want to get data. This parameter does not have a default.

Community value for snmpwalk

Community value for snmpwalk specifies the community name that is required to run the snmpwalk command. The default value is public.

What Row Does the Data Start On

What row does the data start on specifies the number of the row where the data in the raw data file starts. A value for this parameter is required. This parameter is relevant only to the CSV adapter.

What Row Is the Header On

What row is the header on specifies the number of the row that contains the headers. This parameter is relevant only to the CSV adapter.
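
The interplay of the header row, the data start row, and the default column names can be sketched in Python. The sketch assumes that row numbers are 1-based and uses the hypothetical function read_staged_rows; it is not the CSV adapter's own logic.

```python
import csv
import io


def read_staged_rows(text, has_header, header_row=1, data_start_row=2, delimiter=","):
    """Return (column_names, data_rows) from delimited text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    # Assumption: row numbers are 1-based, so row 1 is rows[0].
    data = rows[data_start_row - 1:]
    if has_header:
        names = rows[header_row - 1]
    else:
        # Without a header row, columns are named Column1 to ColumnN.
        width = max((len(row) for row in data), default=0)
        names = ["Column%d" % i for i in range(1, width + 1)]
    return names, data


text = "generated by export tool\nhost,cpu\nh1,10\nh2,20\n"
print(read_staged_rows(text, has_header=True, header_row=2, data_start_row=3))
print(read_staged_rows("h1,10\nh2,20\n", has_header=False, data_start_row=1))
```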

User-written Staging Parameters

For a detailed description of the staging parameters for the User-written Staging transformation, see Staging Parameters Tab.