Section
1, Task 3: Create a Test PDB and Process, Reduce, and
Report on Data Action 1: Create your test PDB Step b: Add or create table and variable definitions |
|
Before data can be processed into the PDB, appropriate table and variable definitions must be provided in the PDB.
Because generic collectors do not have supplied table and variable definitions, you will need to create the definitions for one or more tables and their variables.
Creation of the staging code is technically part of preparing to process the data (Action 3). But the easiest way to create the definitions involves using the staging code. Thus, this action includes creating the staging code and running it to create the staged data.
Note: Because the staged data is not used until Action 3, you may not want to bother with running a day's worth of observations into the staged data. In that case, the staged data actually may contain the descriptors of the data and little or no actual data, in which case the term model staged data may be used instead.
In this step, you will create the code that reads the raw data log file and writes the staged data or model staged data.
Staging the data means to transform the raw log file into a SAS data set or view, from which the data can then be loaded into the PDB.
In most cases, the creation of the data set or view is easily accomplished by a simple SAS DATA step. Usually, only rudimentary SAS DATA step programming skills are required.
Note: In some cases on Open Systems, you are not required to stage the data. In these cases, you must be using character-delimited data and specific formatting criteria. For more information about this kind of data, see Generic Collector Appendix 2: Using Character-Delimited Data.
You should have some understanding the meaning of each field in the raw data. For example:
Each table must have a variable called DATETIME, whose value is the date-time of the record.
For more information about interpretation types, see Shared Appendix 6: Characteristics of Variables and Generic Collector Appendix 1: Algorithm Used by GENERATE SOURCE.
It is important to understand the interpretation types as implemented by IT Service Vision so that you can check the default interpretation types for the variables in your table and edit (or update) the interpretation types, if necessary. Also, it is important that you understand the summary statistics that are available within IT Service Vision so that you can check, for each variable, which summary statistics will be calculated by default and modify (update) the summary statistics specification, if necessary.
Another important variable attribute is the external name, which is the name of the corresponding field in the staged data. A variable's internal name (in a PDB) is limited to 7 characters, but the variables external name (in the staged data) can be 8 characters.
Note: Typically, staging the data is as simple as this. In some cases, other issues are involved. For more details, see Advanced concepts related to staging data.
Note: Later you will need a file that contains the code that stages the data. If you used the PROGRAM EDITOR window to submit that code, you may not have a copy. In that case, in the PROGRAM EDITOR window, recall the code and use the SAS pulldown menus as follows:
File -> Save -> Write to File...
Typically, you will use the %CPDDUTL utility macro to run a GENERATE SOURCE control statement against the staged data or model staged data, and generate CREATE TABLE and CREATE VARIABLE control statements. Later in this action, you will apply these CREATE TABLE and CREATE VARIABLE control statements to the PDB.
In this technique, the DATASET= parameter on the GENERATE SOURCE statement points to the staged data or model staged data that will be created by the code that you created in step a. Other parameters of the GENERATE SOURCE control statement provide information such as the name of the table you want to create (NAME=), whether the table is INTERVAL or EVENT (TYPE=), where to store the generated statements (a SAS catalog .SOURCE entry pointed to by ENTRYNAME=), and what variable to use for the DATETIME variable (DATETIME=) and DURATION variable (DURATION=). For more information about the GENERATE SOURCE control statement, see reference documentation for IT Service Vision.
Note: In some cases on Open Systems, you are not required to stage the data. In that case, you have no staged data to run against. In these cases, you will need to create the table and variable definitions by using a manual method. For more details, see Generic Collector Appendix 2: Using Character-Delimited Data and Generic Collector Appendix 3: Defining Tables and Variables without Using GENERATE SOURCE.
As previously discussed, the interpretation type chosen for a variable is critical in guiding the summarization and display of the data. You should review the generated control statements, paying attention to the value of the INTERPRET= parameter on each of the CREATE VARIABLE statements. You may find that the algorithm used by GENERATE SOURCE did not specify the interpretation type that is most appropriate. In such a case, you should use the UPDATE VARIABLE control statement to specify the correct value for the INTERPRET= parameter. Valid values for this parameter can be found in Generic Collector Appendix 1 - Variable Interpretations and Default Statistics.
The variable format (specified by the FORMAT= parameter of the CREATE VARIABLE control statement) specifies the SAS display format to be used when displaying the data. By default, this is set according to the interpretation type selected by GENERATE SOURCE. If you change the interpretation type, you may also want to change the display format. In such a case, you should use the UPDATE VARIABLE control statement to specify the appropriate value for the FORMAT= parameter. For more information on SAS display formats, see Reference documentation for IT Service Vision.
Each variable definition includes a list of requested statistics. You should review this list to make sure that any statistics that you want will be calculated. Likewise, if statistics have been requested that you would prefer not to keep, you can modify the list to specify that the statistics not be calculated.
If no summary statistics are selected for a given reduction level of the PDB (day, week, month, or year), then no data will be reduced for the table at that reduction level.
By default, all variables are marked with a Kept status of YES so that the PDB detail level keeps data for the variable. However, you may not have any need for the data for some of the variables in the staged data. In such cases, you can save disk space by marking the variable(s) as KEPT=NO instead of the default KEPT=YES. This allows the definition of the variable to be stored in the data dictionary, without storing the data for this particular variable.
In our example we create some new variables that do not exist in the raw data. Such data variables are called derived variables, in that they are derived from existing variables. Derived variables store real data in the PDB, just as normal variables do. Therefore, you can use a derived variable as a BY or CLASS variable. You can also request that statistics be calculated automatically for a derived variable.
In contrast, a formula variable can be defined in a table at one or more levels of the PDB. IT Service Vision uses views to access the data in each table. Formula variables are included in the view definition and thus are calculated dynamically. This means that formula variables cannot be used as BY or CLASS variables nor can statistics be calculated for them. Formula variable are, however, sometimes preferable to derived variables because formula variables do not require any disk space for data storage.
The BY variables list designates the sort order for detail-level data for this table. The BY variables list can also be used for detecting data with duplicate values (of the variables in the set of BY variables). The BY variables list is also important for reporting purposes. The generated source control statements use the sort order of the model data set (only if SAS knows about the sort order because SAS sorted it) to determine the BY variables list for the detail level. The list can be specified in the UPDATE TABLE control statement. If the model data set is not sorted (or, to be more precise, not sorted as known by SAS), the default BY variables list consists of the variables DATETIME HOUR SHIFT.
Be sure that DATETIME is in the updated BY variables list.
To change the BY variables list, use the UPDATE TABLE control statement. Be sure that DATETIME is in the updated BY variables list.
The CLASS variables lists designate the sort order for data in the reduction levels of this table (week, month, day, and year) in the PDB and control the granularity of the reduction levels. The CLASS variables lists are used to accumulate statistics for distinct classes of data, based upon the unique values of the CLASS variables. The CLASS variables lists are also used for reporting purposes. By default, the generated control statements specify, for each reduction level, a CLASS variables list that is identical to the sort order (as known to SAS) of the model data set. If the model data set is not sorted (or, to be more precise, not sorted as known by SAS), the default CLASS variables lists consist of the variables DATETIME HOUR SHIFT. The CLASS variables lists are not required to match each other nor are they required to match the BY list for the detail level; however, the reduction step (%CPREDUCE) runs fastest when the BY variables list and all CLASS variables lists are the same.
To change the CLASS variables lists, use the UPDATE TABLE control statement. Be sure that DATETIME is in each updated CLASS variables list.
Use the %CPDDUTL macro to apply to the PDB the control statements that you generated in step c and then to apply the control statement that you wrote in step d. For an example of using the %CPCAT macro to store control statements and the %CPDDUTL macro to store and apply control statements, see Fax Appendix 3 - Sample Code for Generating the Definitions. For more information on these macros, see the reference documentation for IT Service Vision.