Section 1, Task 3: Create a Test PDB and Process, Reduce, and Report on Data
Action 1: Create your test PDB
Step b: Add or create table and variable definitions

Table Of ContentsIT Service Vision Help


Before data can be processed into the PDB, appropriate table and variable definitions must be provided in the PDB.

Because generic collectors do not have supplied table and variable definitions, you will need to create the definitions for one or more tables and their variables.

Creation of the staging code is technically part of preparing to process the data (Action 3). But the easiest way to create the definitions involves using the staging code. Thus, this action includes creating the staging code and running it to create the staged data.

Note: Because the staged data is not used until Action 3, you may not want to bother with running a day's worth of observations into the staged data. In that case, the staged data actually may contain the descriptors of the data and little or no actual data, in which case the term model staged data may be used instead.

  1. Create the code that stages the data.

    In this step, you will create the code that reads the raw data log file and writes the staged data or model staged data.

    Staging the data means to transform the raw log file into a SAS data set or view, from which the data can then be loaded into the PDB.

    In most cases, the creation of the data set or view is easily accomplished by a simple SAS DATA step. Usually, only rudimentary SAS DATA step programming skills are required.

    Note: In some cases on Open Systems, you are not required to stage the data. In these cases, you must be using character-delimited data and specific formatting criteria. For more information about this kind of data, see Generic Collector Appendix 2: Using Character-Delimited Data.

    1. Analyze the Raw Data

    2. Plan the table.

      IT Service Vision organizes data into tables within the PDB. Typically, each table corresponds to a specific type of record from a collector.

    3. Plan the remaining variables.

      The variable definitions that are associated with a table definition contain information that corresponds to the fields in the record.

    4. On the basis of this discussion, create the code that stages the data.

      Wait to run the code until step c, below.

    Note: Typically, staging the data is as simple as this. In some cases, other issues are involved. For more details, see Advanced concepts related to staging data.

    Note: Later you will need a file that contains the code that stages the data. If you used the PROGRAM EDITOR window to submit that code, you may not have a copy. In that case, in the PROGRAM EDITOR window, recall the code and use the SAS pulldown menus as follows:

    File -> Save -> Write to File...

  2. Create the code that generates table and variable definition control statements.

    Typically, you will use the %CPDDUTL utility macro to run a GENERATE SOURCE control statement against the staged data or model staged data, and generate CREATE TABLE and CREATE VARIABLE control statements. Later in this action, you will apply these CREATE TABLE and CREATE VARIABLE control statements to the PDB.

    In this technique, the DATASET= parameter on the GENERATE SOURCE statement points to the staged data or model staged data that will be created by the code that you created in step a. Other parameters of the GENERATE SOURCE control statement provide information such as the name of the table you want to create (NAME=), whether the table is INTERVAL or EVENT (TYPE=), where to store the generated statements (a SAS catalog .SOURCE entry pointed to by ENTRYNAME=), and what variable to use for the DATETIME variable (DATETIME=) and DURATION variable (DURATION=). For more information about the GENERATE SOURCE control statement, see reference documentation for IT Service Vision.

    Note: In some cases on Open Systems, you are not required to stage the data. In that case, you have no staged data to run against. In these cases, you will need to create the table and variable definitions by using a manual method. For more details, see Generic Collector Appendix 2: Using Character-Delimited Data and Generic Collector Appendix 3: Defining Tables and Variables without Using GENERATE SOURCE.

  3. Run the staging code and the code that generates table and variable definition control statements.

    To run them interactively, in the SAS PROGRAM EDITOR window submit the code that stages the data and then submit the code that creates and runs the entry. To run them in batch job/background script, submit a SAS procedure that contains the code to stage the data followed by the code that generates the control statements.

  4. Review the generated control statements, and build a set of update control statements that correct any undesired table and/or variable characteristics.

    Depending on how you create the control statements that define the table and variable definitions, you are likely to find it necessary to create control statements that make small adjustments to the table and variables.

    It is important that you review the generated control statements in order to check that the following characteristics are set appropriately for your data source.

    1. Variable Interpretation Type

      As previously discussed, the interpretation type chosen for a variable is critical in guiding the summarization and display of the data. You should review the generated control statements, paying attention to the value of the INTERPRET= parameter on each of the CREATE VARIABLE statements. You may find that the algorithm used by GENERATE SOURCE did not specify the interpretation type that is most appropriate. In such a case, you should use the UPDATE VARIABLE control statement to specify the correct value for the INTERPRET= parameter. Valid values for this parameter can be found in Generic Collector Appendix 1 - Variable Interpretations and Default Statistics.

    2. Variable Format

      The variable format (specified by the FORMAT= parameter of the CREATE VARIABLE control statement) specifies the SAS display format to be used when displaying the data. By default, this is set according to the interpretation type selected by GENERATE SOURCE. If you change the interpretation type, you may also want to change the display format. In such a case, you should use the UPDATE VARIABLE control statement to specify the appropriate value for the FORMAT= parameter. For more information on SAS display formats, see Reference documentation for IT Service Vision.

    3. Statistics

      Each variable definition includes a list of requested statistics. You should review this list to make sure that any statistics that you want will be calculated. Likewise, if statistics have been requested that you would prefer not to keep, you can modify the list to specify that the statistics not be calculated.

      If no summary statistics are selected for a given reduction level of the PDB (day, week, month, or year), then no data will be reduced for the table at that reduction level.

    4. KEPT Status

      By default, all variables are marked with a Kept status of YES so that the PDB detail level keeps data for the variable. However, you may not have any need for the data for some of the variables in the staged data. In such cases, you can save disk space by marking the variable(s) as KEPT=NO instead of the default KEPT=YES. This allows the definition of the variable to be stored in the data dictionary, without storing the data for this particular variable.

    5. Derived Variables and Formula Variables

      In our example we create some new variables that do not exist in the raw data. Such data variables are called derived variables, in that they are derived from existing variables. Derived variables store real data in the PDB, just as normal variables do. Therefore, you can use a derived variable as a BY or CLASS variable. You can also request that statistics be calculated automatically for a derived variable.

      In contrast, a formula variable can be defined in a table at one or more levels of the PDB. IT Service Vision uses views to access the data in each table. Formula variables are included in the view definition and thus are calculated dynamically. This means that formula variables cannot be used as BY or CLASS variables nor can statistics be calculated for them. Formula variable are, however, sometimes preferable to derived variables because formula variables do not require any disk space for data storage.

    6. BY Variables

      The BY variables list designates the sort order for detail-level data for this table. The BY variables list can also be used for detecting data with duplicate values (of the variables in the set of BY variables). The BY variables list is also important for reporting purposes. The generated source control statements use the sort order of the model data set (only if SAS knows about the sort order because SAS sorted it) to determine the BY variables list for the detail level. The list can be specified in the UPDATE TABLE control statement. If the model data set is not sorted (or, to be more precise, not sorted as known by SAS), the default BY variables list consists of the variables DATETIME HOUR SHIFT.

      Be sure that DATETIME is in the updated BY variables list.

      To change the BY variables list, use the UPDATE TABLE control statement. Be sure that DATETIME is in the updated BY variables list.

    7. CLASS Variables

      The CLASS variables lists designate the sort order for data in the reduction levels of this table (week, month, day, and year) in the PDB and control the granularity of the reduction levels. The CLASS variables lists are used to accumulate statistics for distinct classes of data, based upon the unique values of the CLASS variables. The CLASS variables lists are also used for reporting purposes. By default, the generated control statements specify, for each reduction level, a CLASS variables list that is identical to the sort order (as known to SAS) of the model data set. If the model data set is not sorted (or, to be more precise, not sorted as known by SAS), the default CLASS variables lists consist of the variables DATETIME HOUR SHIFT. The CLASS variables lists are not required to match each other nor are they required to match the BY list for the detail level; however, the reduction step (%CPREDUCE) runs fastest when the BY variables list and all CLASS variables lists are the same.

      To change the CLASS variables lists, use the UPDATE TABLE control statement. Be sure that DATETIME is in each updated CLASS variables list.

    8. Age Limits

      When IT Service Vision uses the built-in defaults to create a table, it uses the following age limits for determining how much data to store at each PDB level:

      Table 2 - Default PDB Age Limits
      PDB Level Default Age Limit of Data to Keep
      DETAIL 10 days
      DAY 45 days
      WEEK 15 weeks
      MONTH 18 months
      YEAR 5 years

      The tables you added may have used the built-in defaults or may have had control statements associated with them that overwrote the built-in defaults.

      Regardless of where your present age limits originated, if you prefer to change the age limit for any or all of the levels, use the UPDATE TABLE control statement with the AGELIMIT= parameter to specify your preferred age limits.

  5. Apply the control statement definitions and updates to your PDB.

    Use the %CPDDUTL macro to apply to the PDB the control statements that you generated in step c and then to apply the control statement that you wrote in step d. For an example of using the %CPCAT macro to store control statements and the %CPDDUTL macro to store and apply control statements, see Fax Appendix 3 - Sample Code for Generating the Definitions. For more information on these macros, see the reference documentation for IT Service Vision.