Generic Collector Appendix 4: Other Considerations
Other users at your site may want to use the same table definition(s) in their PDBs. Every user could repeat the steps we went through to create the definitions and updates and then apply them to the other PDBs. It is easier, however, to use %CPDDUTL to apply an INSTALL TABLE control statement that copies the table and variable definitions from the data dictionary in our PDB into the supplied (master) data dictionary. For more information about the INSTALL TABLE statement, see the CPDDUTL control statement INSTALL TABLE in the Macro Reference documentation for IT Service Vision.
In order for the INSTALL TABLE control statement to work, you must have a SAS libref named PGMLIBW. The libref must be established with write access and point to the same physical location that the PGMLIB libref points to.
For our example, the following SAS statements, which can be submitted from the PROGRAM EDITOR window or in batch, use the %CPCAT macro to store the INSTALL TABLE control statement and then use the %CPDDUTL macro to run the INSTALL TABLE statement. This example assumes that the IT Service Vision server software is in SAS.ITSV.PGMLIB. It also assumes that the active PDB was specified earlier in the job (by a %CPSTART invocation).
libname pgmlibw 'SAS.ITSV.PGMLIB' ;

%cpcat;
cards4;
INSTALL TABLE NAME=UFAXES REPLACE ;
;;;;
/* ;;;; must begin in column 1 to terminate the input stream */
%cpcat(cat=sasuser.cpddutl.install.source) ;
%cpddutl(entrynam=sasuser.cpddutl.install.source);
Figure 1 - Invocation of INSTALL TABLE in Order to Copy Table and Variable Definitions into the Master Data Dictionary
When you install a new version of the IT Service Vision server software, the PGMLIB library is replaced. Thus, after you have installed a new version of IT Service Vision, you will need to re-install your table definitions, as shown in Figure 1.
As we have seen from our example, the staging code for generic collectors is user-written. As such, you can also use your own SAS formats.
SAS display formats provide a means to transform the data into meaningful descriptions for reporting purposes. In our example, we could have defined a SAS display format to be used with the fax machine name and have the format identify the office location of the machine. Because this SAS display format would be user-specific and not shipped with the IT Service Vision product, it would be the user's responsibility to maintain the format.
IT Service Vision finds SAS display formats by searching first in the PDB's catalog DICTLIB.CPFMTS, and then in the SITELIB's catalog SITELIB.CPFMTS. Therefore, any user-defined SAS display formats should be stored in either SITELIB.CPFMTS (for all users at the site to use) or DICTLIB.CPFMTS (if the format is PDB-specific).
For our example, we could implement a SAS display format for the STATUS variable, to display a meaningful description for each value, as follows:
Value of STATUS variable | Description to display
------------------------ | ----------------------
A                        | Aborted Due to Errors
B                        | Receiver Busy
C                        | Completed Successfully
Table 3 - Display Values for the STATUS Variable
To implement this, we would first construct a SAS display format. For our example, the following SAS statements, which can be submitted from the PROGRAM EDITOR window or in batch/background, define our SAS display format and store it in the catalog DICTLIB.CPFMTS. This code must be submitted after %CPSTART has been invoked with update access to the PDB.
proc format library=dictlib.cpfmts;
   value $STATUS
      'A' = 'Aborted Due to Errors'
      'B' = 'Receiver Busy'
      'C' = 'Completed Successfully'
      ;
run;
Figure 2 - Defining the SAS Display Format for the STATUS Variable
Note that, for any values other than A, B, or C, the actual value of the STATUS variable will be displayed instead of the description given by the format.
The following statements then use %CPDDUTL to associate the $STATUS format with the STATUS variable in the UFAXES table definition:

%cpcat;
cards4;
SET TABLE NAME=UFAXES ;
UPDATE VARIABLE NAME=STATUS FORMAT=$STATUS ;
;;;;
/* ;;;; must begin in column 1 to terminate the input stream */
%cpcat(cat=sasuser.cpddutl.status.source);
%cpddutl(entrynam=sasuser.cpddutl.status.source);

Figure 3 - Associating the $STATUS Format with the STATUS Variable
For more information on SAS display formats, see the Reference Documentation for your current version of SAS.
The staging code is user-written and must be managed by the user. It is the responsibility of the user to run the staging code before processing the data for the generic collector. The user is also responsible for any modifications, changes, and/or fixes to the staging code. At present, IT Service Vision has no facility of its own for managing user-written staging code.
For ease of management of the code, you may want to wrap the staging code in a SAS macro:

%MACRO FAXES;
   /* staging code goes here */
%MEND;
We can now use the macro to run our staging code by simply invoking the macro. This technique is used in Fax Appendix 6: Sample Daily Processing and Reduction Job.
You may choose to enhance the staging code at some point in the future, or you may want to implement support now for handling duplicate data.
In order to discuss the handling of duplicate data, it is necessary to distinguish between new data being presented to the detail level of the PDB (that is, the staged data) and data already existing in the detail level of the PDB. By default, IT Service Vision obeys the following conventions when processing data by means of %CPPROCES:
It is possible to alter this default behavior.
In the situation where the staged data should be allowed to contain (intentionally) duplicate observations, process exit proc095 can be used to override the NODUP option of the SAS SORT procedure (PROC SORT), as shown in this figure:
%macro proc095;
  %if &cptable = UFAXES %then %do;
    ;   /* terminate PROC SORT statement */
    *   /* comment out NODUP option */
  %end;
%mend;
%proc095
Figure 4 - Example of Using Process Exit PROC095 to Allow Duplicate Input Data
For more information about process exits, see Shared Appendix 8: Exits for the Process Task -- General Information and Shared Appendix 11: Creating a Derived Variable Using Process Exits.
The issue of duplicate data is essentially an issue of data management, and it also depends on the collector being used. Several input filtering macros are included with IT Service Vision for MVS to aid in dealing with duplicate data. For more information, see the help entry entitled "CPDUP: Input filtering routines" in the help index.
The C2RATE and D2RATE interpretation types are often the most difficult to understand and use, so let's go over an example to clarify the distinction between the two. Consider a situation where a collector regularly collects data every 15 minutes. Each record collected describes the entire 15-minute interval just completed. If a metric exists as some sort of accumulated count, you may want to see it as a rate in the detail level. An example would be the number of pages a fax machine sends during the recorded interval. Intuitively, we would consider this metric to be a COUNT. However, because we know that this count will be converted to a rate in the reduction levels (see the Statistical Notes on Weighting in Shared Appendix 6: Characteristics of Variables for more information), we may want to work with it as a RATE in the detail level as well. Therefore, we could choose to assign the C2RATE or D2RATE interpretation type to this metric.
The C2RATE interpretation type should be used in situations where the collector reports a metric in each interval as a counter which was not reset at the start of the interval. For example, suppose that in the first interval the metric reports a value of 50, and in the second interval the metric reports a value of 75. The true value for the count reported by the metric is 50 for the first interval (because the first interval is assumed to start from 0), and 25 for the second interval (because the second interval starts where the first interval ends).
The D2RATE interpretation type should be used in situations where the collector reports a metric in each interval as a counter which was reset at the start of the interval. Using the same example, the metric recorded in the first interval would report a value of 50, and in the second interval it would report a value of 25.
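The two conventions can be sketched outside SAS. The following Python fragment is purely illustrative (it is not part of IT Service Vision); it shows how the same underlying activity from the text -- 50 events in the first interval, 25 in the second -- appears under each convention, and how each series recovers the true per-interval counts:

```python
def counts_from_c2rate(cumulative):
    """C2RATE-style metric: a running counter that is never reset.
    The per-interval count is the difference between successive
    readings; the first interval is assumed to start from zero."""
    previous = 0
    counts = []
    for reading in cumulative:
        counts.append(reading - previous)
        previous = reading
    return counts

def counts_from_d2rate(per_interval):
    """D2RATE-style metric: the counter is reset at the start of each
    interval, so each reading already is the per-interval count."""
    return list(per_interval)

# The collector's readings for the example in the text:
c2rate_readings = [50, 75]   # never reset: 50, then 50 + 25
d2rate_readings = [50, 25]   # reset at the start of each interval

print(counts_from_c2rate(c2rate_readings))  # [50, 25]
print(counts_from_d2rate(d2rate_readings))  # [50, 25]
```

Either way, the true per-interval counts are the same; the interpretation type simply tells IT Service Vision which convention the collector uses.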
Using the C2RATE and/or D2RATE interpretation types requires an extra step with regard to staging the data before running %CPPROCES. The %CPC2RATE macro must run after the staging code runs and before the invocation of %CPPROCES. %CPC2RATE creates a SAS DATA step view that calculates the rates for variables that have the C2RATE or D2RATE interpretation type. An example of using %CPC2RATE is shown below.
/* run your staging code here */

%CPC2RATE(in=WORK.SDATA,
          outview=WORK.VSDATA,
          tabname=UTABLE) ;

%CPPROCES(UTABLE,genlib=WORK) ;
Figure 5 - Example of Using the %CPC2RATE macro to handle C2RATE and D2RATE Interpretation Types
Several explanatory notes about this example are necessary:
As part of its processing, %CPC2RATE divides the event count by the value of the variable DURATION in order to determine the rate. As such, the C2RATE and D2RATE interpretation types are only meaningful for type-interval tables, because only type-interval tables have a DURATION variable.
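As a rough illustration (not the macro's actual implementation), the conversion %CPC2RATE performs amounts to dividing each interval's event count by that interval's DURATION in seconds:

```python
def count_to_rate(count, duration_seconds):
    """Convert an interval's event count to a per-second rate, as
    %CPC2RATE conceptually does using the DURATION variable."""
    return count / duration_seconds

# A 15-minute (900-second) interval in which 450 pages were sent:
print(count_to_rate(450, 900))  # 0.5 pages per second
```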
For more information about the %CPC2RATE and %CPPROCES macros, see the Macro Reference documentation for IT Service Vision.
At this point our example is complete. The fax example was fairly straightforward with regard to staging the data. In this section we discuss other data-staging situations that you may encounter. Each of these cases requires deeper experience with writing SAS DATA step code for staging data.
You may find that the data you want to use for your generic collector is already in a SAS data set. If so, you can immediately use GENERATE SOURCE to build your table and variable definitions. You will need to be sure that you have a variable that can serve as your DATETIME variable. If you are creating a type-interval table, you will also need a variable to serve as your DURATION variable. Again, these variables need not be named DATETIME and DURATION, respectively, in the staged data; by using the DATETIME= and DURATION= parameters on the GENERATE SOURCE statement, you can identify the names used in the staged data.
You may find that you need to construct the DATETIME and/or DURATION variables. In such a case, you may find it easier to create a SAS DATA step view in which you build the DATETIME and DURATION variables. In %CPPROCES, you would use the data view in place of the data set in the first parameter.
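For example, if the raw records carry only a date string and a time string, the view's job is to combine them into a single datetime value and supply the (known) interval length. This Python sketch shows the equivalent computation; the field names `logdate` and `logtime` and the fixed 15-minute interval are assumptions for illustration only:

```python
from datetime import datetime

def build_datetime_and_duration(record):
    """Combine separate date and time fields into one DATETIME value
    and attach a DURATION, as a SAS DATA step view might do.
    The 'logdate'/'logtime' names and the fixed 15-minute interval
    are hypothetical."""
    dt = datetime.strptime(record["logdate"] + " " + record["logtime"],
                           "%Y-%m-%d %H:%M:%S")
    return {"DATETIME": dt, "DURATION": 15 * 60}  # duration in seconds

row = build_datetime_and_duration(
    {"logdate": "2024-03-01", "logtime": "08:15:00"})
print(row["DATETIME"], row["DURATION"])
```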
For more information on the use of SAS DATA step views, see SAS Technical Report P-222 Changes and Enhancements to Base SAS Software.
Regardless of whether you have the data already in a SAS data set or use a SAS DATA step view, one side effect is that there is no staging code that needs to be run, because the existing data set or view is essentially already the staged data set or view.
You may also encounter a situation in which the data that you want to stage resides in an external database management system (DBMS). You can use the SAS/ACCESS product to create a SAS/ACCESS view descriptor, which can be used just like a SAS data set or view. SAS/ACCESS can read data from the following data sources:
For more information about SAS/ACCESS, contact your SAS Sales Representative.
In situations where you are staging data for one table from one external data source (as in our example), you may find it advantageous to define a SAS DATA step view instead of a SAS data set. In essence, your staging code defines the view. This only works in situations where the entire staging process can be accomplished in one DATA step (that is, no extra steps, such as sorts, are required). The advantage of using a view is that only one pass through the data is required: a pass when processing the data into the PDB. In the flow shown in the fax example, two passes through the data are required: one pass to run the staging code, and one pass when processing the staged data into the PDB.
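The one-pass-versus-two-pass distinction can be illustrated outside SAS with a Python analogy (purely illustrative): staging to a data set materializes an intermediate copy that processing must read again, whereas a view, like a generator, runs the staging transform as each record is consumed:

```python
reads = {"n": 0}

def read_raw():
    """Yield raw records, counting every record read so we can
    compare how many times the data is touched."""
    for line in ["A,50", "B,25", "C,10"]:
        reads["n"] += 1
        yield line

# "Data set" style: staging writes an intermediate copy, and the
# processing step then reads that copy -- every record is touched twice.
staged = [line.split(",") for line in read_raw()]   # staging pass
touched_twice = reads["n"] + len(staged)            # + processing pass

# "View" style: the staging transform runs inline while processing
# consumes each record -- every record is touched exactly once.
reads["n"] = 0
total = sum(int(line.split(",")[1]) for line in read_raw())
touched_once = reads["n"]

print(touched_twice, touched_once, total)  # 6 3 85
```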
For more information on the use of SAS DATA step views, see SAS Technical Report P-222 Changes and Enhancements to Base SAS Software.
Note: Suppose you have data for multiple tables (such as CICS tables) and suppose the amount of data for one particular table (such as CICS transaction data) is extremely large. To avoid one pass of the large amount of data, you can stage the large amount of data as a view and stage the rest of the data as data sets. To make this work, in the invocation of the process macro you must explicitly list the tables (in the tablist parameter) and the first table must be the one whose data is staged with a view.
Sometimes the data that you want to read are not in a simple log or data file. Instead, the data are actually part of a report. Staging the data essentially becomes an exercise in parsing the report to extract the data values. There is no standard method for doing this. An advanced knowledge of SAS DATA step programming may be required in such cases. If these skills do not exist at your site, you may want to contact the Professional Services Division of SAS Institute to contract for the necessary services.
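As a sketch of the parsing approach, a DATA step -- or, here, equivalent Python -- would skip headings and rule lines and pick the data values out of each detail line, often with pattern matching. The report layout below is entirely hypothetical:

```python
import re

report = """\
Daily Fax Activity Report                 Page 1
Machine    Status   Pages
--------   ------   -----
FAX01      C           12
FAX02      B            0
"""

# A detail line: a machine name, a one-letter status, and a page count.
detail = re.compile(r"^(FAX\w+)\s+([A-Z])\s+(\d+)\s*$")

rows = []
for line in report.splitlines():
    m = detail.match(line)
    if m:                      # heading and rule lines fail the match
        name, status, pages = m.groups()
        rows.append((name, status, int(pages)))

print(rows)  # [('FAX01', 'C', 12), ('FAX02', 'B', 0)]
```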
In our example, we had a simple log file that was used to populate a single table. However, we could have decided to create two tables: one for faxes sent, and another for faxes received. Whether to create multiple tables from a data source depends upon the nature and intended use of the data.
If you decide to stage data for multiple tables from one data source, you only need to run the staging code once per daily production job. And you only need to invoke the %CPPROCES macro once to process the staged data into the multiple tables.
Some collectors provide the ability to store the raw data in a character-delimited flat file; quite often the delimiter is a comma, a space, and/or a tab. IT Service Vision supports the use of character-delimited data as input for processing data from a generic collector. When you use this support, you can omit the staging code, but you must still define the table and variables into which the data is to be processed. In this case, you cannot use GENERATE SOURCE to build the necessary table and variable definitions, because there is no model data set to use.
For more information on using the support for character-delimited data, see Generic Collector Appendix 2: Using Character-Delimited Data and Generic Collector Appendix 3: Defining Tables and Variables without Using GENERATE SOURCE.