Generic Collector Appendix 2: Using Character-Delimited Data


PURPOSE:

The CHARDELIM option of the GENERIC collector extends IT Service Vision support to virtually any non-binary, flat-file data log, provided that the data can be reformatted into a standard character-delimited format.

This allows programmers who are not familiar with SAS to use whatever tools they like to format logged systems data and process it into an IT Service Vision PDB. It also allows quick on-site additions of data sources that are not already supported by IT Service Vision.

DATA FORMAT:

  1. Flat (non-binary) source, default maximum record length 8192 bytes.

  2. The first line should contain a delimiter-separated list of variable names. If a variable name in the first line does not match a variable name in the IT Service Vision PDB data dictionary, then it must match that variable's external name.

    If the first line of the input file does not contain a list of variable names, the variable names can be supplied through macro variables.  See the section on Macro Variables later in this appendix for more information.

  3. Delimiters must follow the rules for the DELIMITER= option of the SAS INFILE statement. See SAS Language: Reference, Version 6 for more information.

  4. You must supply a timestamp for each data record, either as a single variable called DATETIME or EXTDTM, or as two variables called DATE and TIME. Although the variable names must be DATETIME, or DATE and TIME, your raw data may use other names; in that case you must set the external names of these variables in the IT Service Vision data dictionary to match your raw data.  If you use the EXTDTM variable to provide the timestamp, you must set the external name of the DATETIME variable to EXTDTM.  If your raw data uses "datetime" or "date" and "time", then you can leave the external names blank. You must also set each variable's INFORMAT specification in the data dictionary to match the format of your raw data. See the SAS Language documentation for your current version of SAS for more information on acceptable informats. You can change the informat property of a variable interactively by selecting: Manage Tables -> right mouse click on a table -> List Variables -> right mouse click on a variable -> Properties -> Advanced -> Informat. You can also change the informat using the %CPDDUTL macro.

  5. If your raw data is formatted as something other than simple numeric or string values, then you must set the variable's INFORMAT in the data dictionary. For example, if your raw data specifies a CPU busy value as 75.6%, then you will need to set INFORMAT to "PERCENT.".

  6. If a string variable contains a delimiter character, then the string must be enclosed in double quotes ("). Otherwise, enclosing strings in double quotes is optional.

  7. Two consecutive delimiter characters indicate a missing value.

  8. If your table contains interval data (as opposed to event data), you may want to include a variable called DURATION to indicate the length of the interval. If you do not supply a value for DURATION, it is calculated for you from successive DATETIME values, but only if at least one variable has an interpretation type of C2RATE or D2RATE.
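
Because the input can be prepared with any tool, the rules above can be illustrated with a short Python sketch that writes a conforming file. This is only an illustration under stated assumptions: the variable names MACHINE, CPUBUSY and COMMENT, the "|" delimiter, and the ddMONyyyy:hh:mm:ss timestamp layout are hypothetical choices, and the timestamp layout must agree with the informat set for DATETIME in your data dictionary.

```python
import csv
import io
from datetime import datetime

# Fixed month abbreviations, so the output does not depend on the locale.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def format_timestamp(ts):
    """Render a datetime as ddMONyyyy:hh:mm:ss (assumed layout)."""
    return (f"{ts.day:02d}{MONTHS[ts.month - 1]}{ts.year:04d}:"
            f"{ts.hour:02d}:{ts.minute:02d}:{ts.second:02d}")


def write_delimited(rows, fieldnames, out, delim="|"):
    """Write a header line, then one delimited record per row."""
    writer = csv.writer(out, delimiter=delim, quotechar='"',
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(fieldnames)              # rule 2: variable names first
    for row in rows:
        record = []
        for name in fieldnames:
            value = row.get(name)
            if isinstance(value, datetime):  # rule 4: timestamp per record
                value = format_timestamp(value)
            # rule 7: a missing value becomes an empty field,
            # which shows up as two consecutive delimiters
            record.append("" if value is None else value)
        # rule 6: csv quotes any string that contains the delimiter
        writer.writerow(record)


# Hypothetical sample data.
samples = [
    {"DATETIME": datetime(2024, 1, 1, 0, 5, 0), "MACHINE": "web01",
     "CPUBUSY": 75.6, "COMMENT": "ok"},
    {"DATETIME": datetime(2024, 1, 1, 0, 10, 0), "MACHINE": "web01",
     "CPUBUSY": None, "COMMENT": "disk a|b"},
]

buf = io.StringIO()
write_delimited(samples, ["DATETIME", "MACHINE", "CPUBUSY", "COMMENT"], buf)
text = buf.getvalue()
print(text, end="")
```

In the second record the missing CPUBUSY value appears as two adjacent delimiters, and the comment is quoted because it contains the delimiter character.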

CONTROLLING THE BEHAVIOUR OF THE CHARACTER DELIMITED COLLECTOR:

Macro Variables

By default, the first line of the file should contain the field names, and the data should follow on the second line.  The logical record length should not exceed 16384  bytes (16K).  However, there are situations where there may be extra non-data lines before or after the header, or the record length may exceed 16384 bytes.  By setting specific macro variables prior to running %CPPROCES, the character-delimited collector can read this data.  The macro variables are as follows:

Variable name   Default value   Description
CDCLRECL        16384           Record length.
CDCHSKIP        0               Number of lines to skip in order to read the header record.
CDCDSKIP        0               Number of lines to skip AFTER the header record, in order to read the first line of data.
CDCIFLDS        _blank_         Space-separated list of the variables contained in the input file, in the order that they appear in the input data.
CSDELIM         _blank_         The character or characters delimiting the incoming data.  This variable is used in conjunction with the user exits to override the value supplied on the %CPPROCES macro invocation.
CDCFOBS         0               Number of the first line of input data.  This variable is used with the macro variable CDCIFLDS for the special case when the input data does not contain a header.
CPUSEVEW        _blank_         Set to NO to force %CPPROCES to stage the data into a SAS data set instead of utilizing a SAS data set view.
CDCDTFMT        _blank_         The informat used when reading the datetime value from the input data.  The value of this macro variable overrides the informat assigned in the IT Service Vision data dictionary.
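
When a raw file carries extra lines around the header, the skip counts can be worked out by inspecting the file before running %CPPROCES. The following Python sketch is one way to do that. It assumes that CDCHSKIP counts the lines before the header and that CDCDSKIP counts the lines between the header and the first data line, as the descriptions above suggest. The helper name suggest_skips, the sample lines, and the rule that the header is the first line starting with a known field name are all hypothetical.

```python
def suggest_skips(lines, first_field, delim="|"):
    """Return (CDCHSKIP, CDCDSKIP, longest record length in bytes).

    Assumes the header is the first line that starts with
    first_field + delim, and that the first non-blank line after
    the header is the first data record.
    """
    header_idx = None
    for i, line in enumerate(lines):
        if line.startswith(first_field + delim):
            header_idx = i
            break
    if header_idx is None:
        raise ValueError("no header line found")

    data_idx = None
    for i in range(header_idx + 1, len(lines)):
        if lines[i].strip():
            data_idx = i
            break
    if data_idx is None:
        raise ValueError("no data line found after the header")

    # Longest record, to compare against the CDCLRECL setting.
    longest = max(len(line.encode("utf-8")) for line in lines)
    return header_idx, data_idx - header_idx - 1, longest


# Hypothetical raw file: one banner line, the header, two blank lines, data.
raw_lines = [
    "vmstat export, host web01",
    "DATETIME|MACHINE|CPUBUSY",
    "",
    "",
    "01JAN2024:00:05:00|web01|75.6",
]

hskip, dskip, longest = suggest_skips(raw_lines, "DATETIME")
print(f"%LET CDCHSKIP={hskip};")
print(f"%LET CDCDSKIP={dskip};")
print(f"longest record: {longest} bytes")
```

The two printed %LET statements can then be placed ahead of the %CPPROCES invocation; the longest-record figure only indicates whether CDCLRECL needs to be raised.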

Exits

The character-delimited collector supports several exit points.  The first exit point is invoked prior to the data step used to stage the incoming data.  This exit can be used to parse the headers of the input files and set any necessary macro variables, such as CDCIFLDS.  This exit must contain stand-alone SAS data step code.

The code for the exit should be stored in a SAS catalog entry called libname.collector.CDCINIT.SOURCE, where libname is either ADMIN, SITELIB, or PGMLIB, and collector is the value specified for the COLLECTR= parameter on the %CPPROCES macro invocation.  The exit should be stored in the ADMIN library if it is intended to be used for just this PDB, in the SITELIB library if it is intended to be used for all PDBs that use the same SITELIB, or in the PGMLIB library if it is intended to be used for all PDBs.

The second exit point is invoked in the data statement of the generated data step program.  This exit point can be used to add additional data sets to the data statement so that multiple data sets can be staged from a single pass of the input data.  A corresponding output statement should be added to the CDCINPUT exit point in order to populate each additional data set.  The content of this exit is the name of a SAS data set, including any applicable data set options such as a keep list.  The code for the exit should be stored in a SAS catalog entry called libname.collector.CDC020.SOURCE.

The next exit point for the character delimited collector is invoked prior to the input statement of the data step before any variables have been read.   This exit point can be used to parse the filename being processed in order to set the value of any variables that are encoded in the file names and are not contained in the data file.  The name of the file being processed is captured in the variable _FNAME.   The contents of the exit are any valid SAS data step statements.  The code for the exit should be stored in a SAS catalog entry called libname.collector.CDC025.SOURCE.

The character-delimited collector also supports an exit point that is invoked immediately after each line of data has been read from the input file.  This exit can be used to perform data manipulation, such as adjusting the value of DATETIME for handling different time zones.  The contents of the exit are any valid SAS data step statements.  The code for the exit should be stored in a SAS catalog entry called libname.collector.CDCINPUT.SOURCE.

Invoking the Process Macro

A typical macro invocation might look like this:

%LET CDCLRECL=4096;  * records are only 4K;
%LET CDCHSKIP=1;     * skip one line to read the header;
%LET CDCDSKIP=2;     * two blank lines between header and data;
%CPPROCES(/u/me/rawdata, tablename, collectr=generic, toolnm=chardelim, delim='|');

Example

To help you get started with processing character-delimited data, follow the "Unix vmstat" example in the IT Service Vision Showroom.