Generic Collector Appendix 2: Using Character-Delimited Data
PURPOSE:
The CHARDELIM option of the GENERIC collector extends IT Service Vision support to virtually any non-binary, flat-file data log, as long as the data can be reformatted into a standard character-delimited format.
This allows programmers who are not familiar with SAS to use whatever tools they like to format logged system data and process it into an IT Service Vision PDB. It also allows quick on-site additions of data sources that are not already supported by IT Service Vision.
DATA FORMAT:
By default, the first line of the file should contain the field names, and the data should begin on the second line. The logical record length should not exceed 16384 bytes (16K). If a file has extra non-data lines before or after the header, or records longer than 16384 bytes, the character-delimited collector can still read it, provided that specific macro variables are set before running %CPPROCES. The macro variables are as follows:
Variable name | Default Value | Description |
CDCLRECL | 16384 | Record Length |
CDCHSKIP | 0 | Number of lines to skip in order to read the header record. |
CDCDSKIP | 0 | Number of lines to skip AFTER the header record, in order to read the first line of data. |
CDCIFLDS | _blank_ | Space-separated list of the variables contained in the input file, in the order that they appear in the input data. |
CSDELIM | _blank_ | The character or characters delimiting the incoming data. This variable is used in conjunction with the user exits to override the value supplied on the CPPROCES macro invocation. |
CDCFOBS | 0 | Number of the first line of input data. This variable is used with the macro variable CDCIFLDS for the special case when the input data does not contain a header. |
CPUSEVEW | _blank_ | Set to NO to force CPPROCES to stage the data into a SAS data set instead of utilizing a SAS data set view. |
CDCDTFMT | _blank_ | The informat used when reading in the datetime value from the input data. The value of this macro variable overrides the informat assigned in the IT Service Vision data dictionary. |
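As an illustration of the headerless case described for CDCIFLDS and CDCFOBS, the settings might look like the following sketch. The field names DATETIME, MACHINE, and CPUBUSY are hypothetical and stand in for whatever the input file actually contains; the %CPPROCES call reuses the parameters from the typical invocation shown later in this appendix.

```sas
%LET CDCIFLDS=DATETIME MACHINE CPUBUSY;  * hypothetical field names, in file order;
%LET CDCFOBS=1;                          * no header, so data starts on line 1;
%CPPROCES(/u/me/rawdata, tablename, collectr=generic, toolnm=chardelim, delim='|');
```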
The character-delimited collector supports several exit points. The first exit point is invoked prior to the data step used to stage the incoming data. This exit can be used to parse the headers of the input files and set any necessary macro variables, such as CDCIFLDS. This exit must contain stand-alone SAS data step code.
The code for the exit should be stored in a SAS catalog entry called libname.collector.CDCINIT.SOURCE, where libname is either ADMIN, SITELIB, or PGMLIB, and collector is the value specified for the COLLECTR= parameter on the %CPPROCES macro invocation. Store the exit in the ADMIN library if it is intended for just this PDB, in the SITELIB library if it is intended for all PDBs that use the same SITELIB, or in the PGMLIB library if it is intended for all PDBs.
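A minimal sketch of what a CDCINIT exit might contain is shown here, assuming a pipe-delimited file whose first line is the header. The file path is hypothetical, and the sketch does not show how the exit would learn the real input file name. If the exit runs inside a macro, the macro variable may also need to be declared global for the value to be visible to the collector; that is an assumption to verify on site.

```sas
/* Hypothetical contents of ADMIN.GENERIC.CDCINIT.SOURCE */
data _null_;
  /* Assumed path to one raw log file; replace with a real file */
  infile '/u/me/rawdata/sample.log' obs=1 lrecl=4096;
  input;                                   /* load line 1 into _INFILE_ */
  /* Turn "A|B|C" into "A B C" and store it in CDCIFLDS */
  call symput('CDCIFLDS', trim(translate(_infile_, ' ', '|')));
run;
```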
The second exit point is invoked in the data statement of the generated data step program. This exit point can be used to add additional data sets to the data statement so that multiple data sets can be staged from a single pass of the input data. A corresponding output statement should be added to the CDCINPUT exit point in order to populate this data set. The contents of this exit are the name of a SAS data set, including any applicable data set options such as a keep list. The code for the exit should be stored in a SAS catalog entry called libname.collector.CDC020.SOURCE.
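The pairing of the two exits might look like the sketch below. The data set name WORK.HIGHCPU, the variable names, and the threshold are all hypothetical. The two fragments belong in separate catalog entries and are shown together only for comparison.

```sas
/* Hypothetical contents of ADMIN.GENERIC.CDC020.SOURCE:          */
/* an extra data set name, with options, for the data statement   */
WORK.HIGHCPU (keep=DATETIME MACHINE CPUBUSY)

/* Hypothetical matching statement in ADMIN.GENERIC.CDCINPUT.SOURCE: */
/* write the current record to the extra data set when it qualifies  */
if CPUBUSY > 90 then output WORK.HIGHCPU;
```

In a SAS data step, any explicit output statement turns off the implicit output at the end of the step, so it is worth checking that the generated program still writes the main staged data set as expected.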
The next exit point for the character-delimited collector is invoked prior to the input statement of the data step, before any variables have been read. This exit point can be used to parse the name of the file being processed in order to set the value of any variables that are encoded in the file names and are not contained in the data file. The name of the file being processed is captured in the variable _FNAME. The contents of the exit are any valid SAS data step statements. The code for the exit should be stored in a SAS catalog entry called libname.collector.CDC025.SOURCE.
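For instance, if the raw files were named after the machine that produced them (an assumed layout such as /u/me/rawdata/host01.vmstat), a CDC025 exit could recover the machine name from _FNAME as sketched here. MACHINE is a hypothetical variable name.

```sas
/* Hypothetical contents of ADMIN.GENERIC.CDC025.SOURCE */
length MACHINE $ 32;                           /* assumed variable name */
/* Take the last "/"-separated piece of the path, then the text   */
/* before its first period: ".../host01.vmstat" gives "host01"    */
MACHINE = scan(scan(_FNAME, -1, '/'), 1, '.');
```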
The final exit point is invoked immediately after each line of data has been read from the input file. This exit can be used to perform data manipulation, such as adjusting the value of DATETIME to handle different time zones. The contents of the exit are any valid SAS data step statements. The code for the exit should be stored in a SAS catalog entry called libname.collector.CDCINPUT.SOURCE.
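A time zone adjustment in the CDCINPUT exit might be as simple as the following sketch. It assumes that DATETIME already holds a SAS datetime value (a count of seconds) logged in UTC and that the site is five hours behind UTC; both are illustrative assumptions.

```sas
/* Hypothetical contents of ADMIN.GENERIC.CDCINPUT.SOURCE */
/* Shift the logged UTC timestamp to local time (UTC minus 5 hours) */
DATETIME = DATETIME - 5 * 3600;
```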
A typical macro invocation might look like this:
%LET CDCLRECL=4096; * records are only 4K;
%LET CDCHSKIP=1; * skip one line to read the header;
%LET CDCDSKIP=2; * two blank lines between header and data;
%CPPROCES(/u/me/rawdata, tablename, collectr=generic, toolnm=chardelim, delim='|');
To help you get started with processing character-delimited data, follow the "Unix vmstat" example in the IT Service Vision Showroom.