Shared Appendix 12: Analyzing Your System
The document discusses analysis methods useful for evaluating the performance of both networks and workstations using IT Service Vision software and the SAS System.
For workstations, the chapter looks at measurements of CPU activity, memory, disk I/O, and network interface I/O.
For networks, the chapter examines the measurements and methods of analyzing traffic patterns, utilization, error rates, and congestion.
The chapter also describes why performance analysis needs to include an analysis of service levels, and it describes the measurements used to report on service levels, such as availability, response time, throughput, and utilization.
In addition to these discussions, this document summarizes, in table form, the specific data sources for measurements of interest, which collectors gather these measurements, and where the data are stored in the IT Service Vision performance database.
Analyzing and understanding the performance of your computer system is a necessary first step to diagnosing, solving, and preventing performance problems. The first step in analyzing your system is to establish workload and performance baselines when the system is running smoothly so that you have healthy comparison values for your system when there is a problem. Having a clear idea of baseline values in your network and noticing significant deviations from these values helps you to address problems before they occur. Publicizing baselines enables you to clearly establish users' expectations of service levels. Finally, when you know your system's baseline values, you can predict performance at future load levels.
In its most basic form, the evaluation of system performance, whether for a computer network, an airline reservation system, or a five-chair beauty salon, consists of this simple process:
Though this simplistic approach does not define how to improve performance, it is essential to keep this model in mind when designing a performance study. This model is also useful to remember when you prepare reports for users and management; you want to be sure to emphasize relevant measurements and avoid unnecessary reports that obscure the primary information you need to convey.
IT Service Vision software helps you analyze your system's performance by simplifying the mechanics (such as data cleanup, data validation, and data transformation) of managing and presenting performance data. This frees you to concentrate on selecting the evaluation techniques, measurements, and reports appropriate for your situation. IT Service Vision software helps you diagnose problems, set baseline values, perform capacity planning, and present results in easy-to-understand graphs, charts, and printed reports without programming. Because IT Service Vision software uses the SAS System to manage the data, you can also handle more complex experimental designs and perform additional sophisticated statistical analyses of your data using DATA step programming, procedures, and the SQL interface in SAS software.
A complete analysis of the performance of a computer networking system must include examining resource consumption patterns of the network and the workstations attached to it as well as the service levels delivered by the system. The following sections describe suggested measurements for tracking resource consumption and indicate how to access these measurements with IT Service Vision software.
Each section that describes the kind of analysis you might perform also includes a table that summarizes useful information for each relevant data collection software product. Table 4.1 through Table 4.14 contain this information: the specific data sources for the measurements of interest, which collectors gather these measurements, and where the data are stored in the IT Service Vision performance database.
These tables provide a workable but not exhaustive list of places to find useful performance measurements in IT Service Vision data. Many other measurements are also available; you can look through a more comprehensive list by selecting items on the Administration tab in the IT Service Vision window interface. In addition, your hardware vendor can provide more information about the devices and how their performance measurements are recorded.
The information in each of the small tables in the following sections is also summarized in the comprehensive tables at the end of this chapter, Table 4.15 through Table 4.18.
You must become familiar with the logical organization of the system to draw accurate conclusions about the performance of a distributed computer system and to select what factors to change to affect its performance. When you understand the general operation of the system, you can examine many measurements internal to the operation of the system (such as queue lengths, error counts, and paging rates). These measurements indicate what happens to the system under load. Analyzing this type of data helps you to arrive at sound recommendations for change.
Performance problems with client/server applications are more complicated to investigate than those of applications that run on a single system. You need to analyze the resource consumption of both the client and server and all networks connecting them. For example, if you receive complaints about poor response time at a workstation that uses NFS to access files on a remote file server, you might ask these questions to investigate the problem:
IT Service Vision software can help you analyze the problem by helping with data reduction and data presentation.
Consider this example: to get an overview of distributed analysis, you could examine data from several different collectors at once.
These collectors produce enormous volumes of data, and the measurements are difficult to correlate because the collectors use different time stamps. IT Service Vision software addresses these problems by summarizing and reducing the data volume and by using standardized time stamps in the performance database for the data from all supported collectors.
IT Service Vision software also provides example reports and an interactive report builder to facilitate exploring the data after it is in your PDB. IT Service Vision formula variables can be used to calculate measurements that are not native to the collected data. For example, your data collection software may capture the number of incoming packets and the number of error packets; you can then define a formula variable that calculates the error rate as error_packets/packetsIn instead of reporting either of the collected variables alone.
Most SNMP data are reported in a data type called COUNT. These counters steadily accumulate the number of occurrences of an event such as packets out, packets in, or collisions. Because graphs of count data do not clearly show trends in the data, IT Service Vision software automatically converts count values to rates per second by subtracting the value of the measurement in the previous interval record from the current value and dividing by the length of the interval (DURATION). IT Service Vision software performs this conversion before it records the data in the PDB and discards the old count value. You can plot or print these calculated rates without further processing.
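The following DATA step is a minimal sketch of this conversion, shown only to illustrate the arithmetic; IT Service Vision software performs the equivalent calculation for you automatically. It assumes a hypothetical raw data set WORK.RAW that contains MACHINE, DATETIME, and a counter variable INPKTS and that is sorted by MACHINE and DATETIME.

   data work.rates;
      set work.raw;
      by machine;
      duration = dif(datetime);   /* seconds since the previous record */
      delta    = dif(inpkts);     /* change in the counter since the previous record */
      if first.machine then do;   /* no previous record exists for this machine */
         duration = .;
         delta    = .;
      end;
      if duration > 0 then rate = delta / duration;   /* packets per second */
   run;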
The following sections describe measurements from the collectors supported by IT Service Vision software that help you analyze complex heterogeneous computer networks, workstations, and the applications that run on them.
To analyze the performance of a distributed computing network, you must analyze the individual workstations on the network. The four hardware resources that experience contention and thus affect performance of the workstation are the CPU, memory, disk, and network interface. In addition, major subsystems used by an application, such as NFS, often have performance measurements recorded for them individually. Thus, you can examine these subsystems separately to determine if they are a source of delay in the system.
Many of the devices and collectors supported by IT Service Vision software provide measurements that are appropriate for workstation performance analysis. Several examples of these measurements are discussed in the following sections.
To analyze the CPU for a workstation, consider these measurements: CPU utilization (percent busy), average ready-queue length, and the number of context switches.
CPU utilization is easiest to understand when it is reported as the average percent of time that a CPU is busy over some time interval. In SunNet Manager's host performance table (table SNMHPF, variable HPFCPUP), this measurement is a gauge, that is, a supposedly instantaneous reading of how busy the CPU is at a particular moment (like a speedometer). But, a CPU is either 100% busy or 100% idle at any given instant, so a true gauge reading is not very useful. Consequently, this measurement is usually a moving average calculated over a time interval that can range from a few seconds to a few minutes. The exact size of the interval varies from system to system and is not available to the monitor, so treating the CPU busy rate as a gauge is the practical compromise.
The rate at which the CPU is busy is not directly available from some collectors; they supply only the number of CPU seconds used in an interval. You can create a formula variable to calculate the CPU busy rate as CPU seconds divided by DURATION (the variable names vary with different collectors). For HP Measureware, use variable GLBCPSC from table PCSGLB to calculate the CPU busy rate:
cpubusy=glbcpsc/duration
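If you want to experiment with the calculation outside of the formula-variable facility, the following DATA step is a minimal sketch of the same arithmetic. It assumes the HP Measureware global detail table is available as DETAIL.PCSGLB with GLBCPSC (CPU seconds) and DURATION (interval length in seconds).

   data work.cpubusy;
      set detail.pcsglb;
      if duration > 0 then
         cpubusy = glbcpsc / duration;   /* fraction of the interval the CPU was busy */
   run;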
Another commonly available measurement for the CPU is average ready-queue length. The ready-queue length, which is also called load average, measures the number of processes waiting to get CPU cycles. An increase of more than two or three in the length of the ready queue may indicate an overloaded CPU. Like CPU busy, the average ready-queue length is a moving average. Unlike CPU busy, you know the length of the interval for the moving average: intervals of 1, 5, and 15 minutes are generally used for calculating the moving average of the queue length. The interval is part of the name of the statistic. For example, SunNet Manager collects ready-queue length statistics in its host performance table, SNMHPF, in the variables HPFAV1, HPFAV5, and HPFAV15.
An increase in the number of context switches in a system may indicate an overcommitment of some resource. Context switches can be correlated to other utilization measurements (such as disk I/O) to identify the problem resource. This measurement appears in SunNet Manager's host performance table (table SNMHPF, variable HPFCSWT).
Table 4.1 summarizes the measurements for analyzing workstation CPUs.
Table 4.1 Measurements for Analyzing Workstation CPUs
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
CPU SunNetMgr SNMHPF hostperf.data HPFCPUP Percent CPU busy HPFAV1 CPU queue length (1-min avg) HPFAV5 CPU queue length (5-min avg) HPFAV15 CPU queue length (15-min avg) HPFCSWT Context switches HP Measureware PCSGLB global data GLBCPSC CPU seconds PROBE/Net PRXCPU CPU CPUAV1 CPU queue length (1-min avg) CPUAV5 CPU queue length (5-min avg) CPUAV15 CPU queue length (15-min avg) CPUIDLP Percent CPU idle
Another resource that experiences contention and can affect workstation performance is memory. Generally, performance data logs report memory utilization indirectly as average paging rates. An exception is HP Measureware, which provides memory wait times for processes and applications (tables PCSAPP and PCSPRO, variables APPSVM and PROSVM) and the queue length of the memory subsystem (table PCSGLB, variable GLBMEMQ). For other collectors, you can detect problems with memory overcommitment by tracking paging rates. Suspect a memory shortage if the rates become significantly higher than normal. Paging measurements are available in SunNet Manager's host performance table (table SNMHPF, variables HPFPGI, HPFPGO, HPFPSWI, and HPFPSWO). The PROBE/Net buffer cache table contains measurements indicating the effectiveness of the cache on the system (table PRXBUF, variables BUFRCAC and BUFWCAC).
Table 4.2 summarizes the measurements for analyzing workstation memory.
Table 4.2 Measurements for Analyzing Workstation Memory
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
MEMORY Accton ACCESA pacct data ACCKMIN Process memory use (K-core min) ACCMEMA Process average memory use ACCHP7 pacct data ACCKMIN Process memory use (K-core min) ACCMEMA Process memory use ACCRS6 pacct data ACCKMIN Process memory use (K-core min) ACCMEMA Process average memory use ACCSUN pacct data ACCKMIN Process memory use (K-core min) ACCMEMA Process average memory use HP Measureware PCSAPP application data APPSVM Application memory wait time PCSPRO process data PROSVM Process memory wait time PCSGLB global data GLBMEMQ Memory queue length SunNetMgr SNMHPF hostperf.data HPFPGI Pages read in HPFPGO Pages written out HPFPSWI Pages swapped in HPFPSWO Pages swapped out PROBE/Net PRXBUF buffer cache BUFRCAC cache hit ratio BUFWCAC write cache hit ratio PRXSYS system statistics SYSPINR page in rate SYSPOTR page out rate
Disk I/O, the level of disk activity, is another significant factor in workstation performance. Disk I/O measurements are generally recorded in performance data as inputs and outputs per second rather than as percentages. Measurements that report on the total disk activity of all disks on a host are available in SunNet Manager's host performance data (table SNMHPF, variable HPFDISK). Measurements that report on disk activity by disk volume appear in SunNet Manager's disk I/O table (table SNMIOD) and HP Measureware's global disk metrics table (table PCSGDK). You might also want to examine data on seek times. When seek times increase, they may indicate that data on the disk is fragmented. For this information, also check SunNet Manager's disk I/O table (table SNMIOD).
Table 4.3 summarizes the measurements for analyzing workstation disk I/O.
Table 4.3 Measurements for Analyzing Workstation Disk I/O
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
Disk I/O SunNetMgr SNMHPF hostperf.data HPFDISK Total system disk I/O SNMIOD iostat.disk IOD* Multiple variables pertaining to disk I/O (seek times, read/write rates and times, and so on) per disk volume PROBE/Net PRXDSK disk information DSK* Multiple variables pertaining to disk I/O (seek times, read/write rates and times, and so on) per disk volume. PRXFBL I/O load FBL* Multiple variables pertaining to disk I/O (seek times, read/write rates and times, and so on) per disk volume. HP Measureware PCSGDK global disk metrics GDK* Multiple variables pertaining to disk I/O (seek times, read/write rates and times, and so on) per disk volume.
Network interface I/O can also affect workstation performance. Large mainframe systems typically perform work on one host, access data on disks attached to that system, and use the network only to handle terminal traffic. Open computer systems depend heavily on their network interface to service disk data requests and to perform terminal I/O. Client/server applications and distributed processing, in general, have made it important to monitor both the network itself and the network interface on the processor. Network interface I/O rates appear in several collectors' data.
Note: In this and following discussions of SNMP data, you will see the symbol * as the prefix for some table names. These are table definitions for the SNMP data collectors (SunNet Manager, HP Network Node Manager, and IBM NetView for AIX) that group the SNMP metrics they collect according to the MIB table definitions. IT Service Vision contains similar table and variable definitions for the tables of these collectors. These tables are labeled "SNMP data collector" below. For SNMP data gathered by SunNet Manager, substitute "S" for the prefix in the table names. For HP Network Node Manager and IBM NetView for AIX, substitute "H".
Table 4.4 summarizes the measurements for analyzing network interface I/O.
Table 4.4 Measurements for Analyzing Network Interface I/O
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
Network Interface I/O SunNetMgr SNMHPF hostperf.data HPFIPKT Input packets HPFOPKT Output packets SNME* Etherif.* Multiple variables SNML* layers.* Multiple variables SNMT* traffic.* Multiple variables HP Measureware PCSGLN global network GLN* Multiple variables interface
Client/server applications rely heavily on network resources, so when you investigate performance problems with these applications, you must examine the utilization of the network as well as of the workstation. To analyze network performance, you monitor traffic patterns, utilization, error rates, and congestion.
If you periodically examine measurements in each of these areas, you can establish baseline values that you can use to detect and address problems before they affect performance. To more readily understand the cause of the problem, you can also compare these baseline values to the values you gather when a problem occurs.
Many of the devices and collectors supported by IT Service Vision software provide network performance measurements. Several examples of these measurements are discussed in the following section.
You can gain insight into the source of network performance problems by examining the type of traffic on a network segment as well as the quantity of traffic. Two types of traffic that are monitored in MIB-II data are broadcast and non-broadcast (also called unicast) packets. Broadcast packets are addressed to every node; non-broadcast are addressed to only one node. A high ratio of broadcast to non-broadcast packets on a network could indicate that network segment bandwidth is being wasted by inappropriately forwarded broadcast messages. In a transparent or source-route bridged environment, broadcasts must be forwarded to all segments. However, many Ethernet networks can restrict broadcast message propagation and free considerable bandwidth with the judicious addition of intelligent routers or hubs. In addition to restricting the range of the broadcast packets, you can recover considerable useful bandwidth by investigating hosts that are generating excessive broadcasts. You may then be able to stop or slow the rate of packet generation by these hosts.
The SNMP MIB-II interface table (*N2IFT) contains packet counts for broadcast and nonbroadcast packets. To convert the input variables to rates, divide each variable by the sum of the broadcast and nonbroadcast input packets; for output variables, divide by the sum of the output packets. These measurements are also available in TRAKKER and SPECTRUM data.
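As a minimal sketch of this calculation for input packets, the following DATA step computes the percentage of input traffic that is broadcast. It assumes the SunNet Manager variant of the interface table (SN2IFT, formed by the prefix substitution described in the note after Table 4.5) and the per-second variables IFIUCP and IFINUCP from the PDB.

   data work.bcast;
      set detail.sn2ift;               /* SunNet Manager MIB-II interface table */
      pkts_in = ifiucp + ifinucp;      /* unicast + broadcast input packets/second */
      if pkts_in > 0 then
         bcast_pct = 100 * ifinucp / pkts_in;   /* percent of input that is broadcast */
   run;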
Table 4.5 summarizes the measurements for analyzing network traffic.
Table 4.5 Measurements for Analyzing Network Traffic
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
Network Traffic Patterns SNMP data *N2IFT ifInterfaces IFIUCP Input unicast collector packets/second IFINUCP Input broadcast packets/second IFOUCPT Output unicast packets/second IFONCTP Output broadcast packets/second SPECTRUM C* <device CSILOAD Percent of bandwidth tables> utilized CCI* <Cisco LOCIBSC Input bits/second router (5-min avg) tables> <Cisco LOCOBSC Output bits/second router (5-min avg) tables> <Cisco LOCIPSC Input pkts/second router (5-min avg) tables> <Cisco LOCOPSC Output pkts/second router (5-min avg) tables> TRAKKER TKRSTS segment IPOSPR IP off-segment pkts statistics received/second IPOSPX IP off-segment pkts xmitted/second IPPKTS IP packets/second IPTRAN IP transit packets/second
Note: In this table, the symbol * is the prefix for table names. For SunNet Manager MIB-II SNMP data tables, substitute S for the prefix to the table names; for IBM NetView and HP Network Node Manager, substitute H.
Utilization is another measurement to monitor when you analyze the network. You monitor the utilization of network segments and links to establish performance baselines, to compare individual samples to established baselines, and to assist in trend analysis and capacity planning. Use the SNMP MIB-II interface I/O table (*N2IFT) to calculate link utilization. These measurements give the number of bytes sent and received on an interface. You can calculate other useful values from these variables.
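The following DATA step is a hedged sketch of one such calculation, link utilization as a percentage. It assumes the SunNet Manager table name SN2IFT, that IFIOCTS and IFOOCTS are octets per second, and that IFSPEED is the interface speed in bits per second (as MIB-II defines it); on a full-duplex link you would treat the two directions separately.

   data work.linkutil;
      set detail.sn2ift;
      if ifspeed > 0 then
         utilpct = 100 * 8 * (ifiocts + ifoocts) / ifspeed;   /* octets converted to bits */
   run;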
Table 4.6 summarizes the measurements for analyzing network utilization.
Table 4.6 Network Utilization
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
Utilization SNMP data *N2IFT ifInterfaces IFIOCTS Input octets/second collector IFOOCTS Output octets/second IFSPEED Interface speed SPECTRUM C* <device CSILOAD Percent of bandwidth tables> utilized CCI* <Cisco LOCIBSC Input bits/second router (5-min avg) tables> <Cisco LOCOBSC Output bits/second router (5-min avg) tables> <Cisco LOCIPSC Input pkts/second router (5-min avg) tables> <Cisco LOCOPSC Output pkts/second router (5-min avg) tables> TRAKKER TKRSTS segment DLLBYTS Segment bytes/second statistics
Note: In this table, the symbol * is the prefix for table names. For SunNet Manager MIB-II SNMP data tables, substitute S for the prefix to the table names. For HP-OV, substitute H.
High error rates on a network segment have much the same effect on network performance as high utilization: they cause slow response time, low throughput, and, ultimately, congestion. Errors are reported for all protocols and at different protocol levels (for example, the interface level, IP level, and TCP level). Errors for different protocols are counted separately. That is, a UDP error does not affect the IP error counter. Because errors are counted separately, higher layers in the protocol do not necessarily count errors detected at the lower layers. For example, an IP address error is not counted in the TCP table (unless it causes a TCP retransmission to occur).
Errors generally appear as the rate of error packets per second in IT Service Vision data sets. You can monitor and compare these rates to baseline values to detect changes, or you may find it useful to convert these rates to a percentage of the total packets handled. For example, encountering 1000 error packets when 2000 packets were sent (50 percent errors) indicates a problem, but the same number of errors, 1000, is probably not significant if a million packets were sent.
Error counts on an interface are available in the SNMP MIB-II interface tables (*N2IFT). Error percentage rates are also available as formula variables (ifInErrors/(ifInUCastPkts+ifInNUCastPkts) and ifOutErrors/(ifOutUCastPkts+ifOutNUCastPkts)). A high interface input error percentage may indicate a problem with the network software on a node on the network (for example, sending packets that are too large or too small) or transmission clocking issues. Interface output errors generally indicate a problem with the physical network medium.
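A minimal sketch of these error-percentage calculations, using the PDB rate variables listed in Table 4.7 and again assuming the SunNet Manager table name SN2IFT:

   data work.iferr;
      set detail.sn2ift;
      pkts_in  = ifiucp + ifinucp;     /* total input packets/second */
      pkts_out = ifoucpt + ifonctp;    /* total output packets/second */
      if pkts_in  > 0 then in_err_pct  = 100 * ifierrs / pkts_in;
      if pkts_out > 0 then out_err_pct = 100 * ifoerrs / pkts_out;
   run;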
At the IP level, you can use bad input header and address error rates to determine if datagrams were discarded because of bad headers or an invalid destination address. These measurements are in SNMP MIB-II IP input table (*N2IP_). To convert these values to rates, divide them by the total datagrams received (ipInAddrErrors/ipInDelivers and ipInHdrErrors/ipInDelivers).
ICMP error rates appear in SNMP MIB-II ICMP tables (*N2ICM). To convert these rates to percentages, divide them by the total ICMP messages (icmpInErrors/icmpInMsgs or icmpOutErrors/icmpOutMsgs). If you compare the ICMP error percentage to the IP error percentage, you can determine how many of the errors were caused by ICMP-detected errors and how many had other causes. This may help you isolate the cause of the errors.
TCP error rates appear in the SNMP MIB-II TCP table (*N2TCP). To convert these variables to percentages, divide them by the total TCP segments (tcpInErrs/tcpInSegs or tcpRetransSegs/tcpOutSegs).
UDP error rates appear in the SNMP MIB-II UDP table (*N2UDP). To convert these variables to percentages, divide them by the total UDP datagrams ((udpInErrors+udpNoPorts)/udpInDatagrams).
EGP error rates appear in the SNMP MIB-II EGP table (*N2EGP). To convert these variables to percentages, divide them by the total EGP messages (egpInErrors/egpInMsgs or egpOutErrors/egpOutMsgs). The EGP neighbor table (*N2EGN) provides the same rates by originator (egnInErrors/egnInMsgs or egnOutErrors/egnOutMsgs) and thus can be used to isolate which EGP neighbor is causing the errors.
The SNMP MIB-II SNMP group (*N2SNP) counts many different kinds of errors that are encountered when sending and receiving SNMP packets. To convert these rates to percentages, divide them by the total SNMP packet rate (snmpInPkts or snmpOutPkts).
Table 4.7 summarizes the measurements for analyzing errors on the network.
Table 4.7 Measurements for Analyzing Network Errors
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
Error Rates SNMP data *N2IFT ifInterfaces IFIERRS Input errors/second collector IFIUCP Input unicast packets/sec IFINUCP Input broadcast packets/sec IFOERRS Output errors/second IFOUCPT Output unicast packets/sec IFONCTP Output broadcast packets/sec SNMP data *N2IP_ ip IPIADER Input address errors/sec collector IPIHDER Input header errors/sec IPIDELV Input packets delivered/sec SNMP data *N2ICM icmp ICIERRS Input errors/sec collector ICIMSGS Input messages/sec ICOERRS Output errors/sec ICOMSGS Output messages/sec SNMP data *N2TCP tcp TCPIERR Input errors/sec collector TCPORST Retransmissions/sec TCPINSG Input segments/sec TCPOTSG Output segments/sec SNMP data *N2UDP udp UDPIERR Input errors/sec collector UDPINDG Input datagrams/sec SNMP data *N2EGP egp EGPIERR Input errors/sec collector EGPIMSG Input messages/sec EGPOERR Output errors/sec EGPOMSG Output messages/sec SNMP data *N2EGN egn EGNIERR Input errors/sec collector EGNIMSG Input messages/sec EGNOERR Output errors/sec EGNOMSG Output messages/sec SNMP data *N2SNP snmp SNPIPKT Input packets/sec collector SNPI* Multiple variables pertaining to various SNMP input errors/second SNPOPKT Output packets/sec SNPO* Multiple variables pertaining to various SNMP output errors/second SPECTRUM C* <device CSISERT Soft error rate/second tables> IPIADER Input address errors/second IPIHDER Bad header errors/second CSIPKRT Packet rate/second IPIDELV Datagrams delivered/second TRAKKER TKRSTS segment DLLEFRM Ethernet frames/second statistics ELLERRS Errors/second
Note: In this table, the symbol * is the prefix for table names. For SunNet Manager MIB-II SNMP data tables, substitute SM2 for the prefix to the table names. Use H for HP-OV data.
Monitoring congestion is another effective way to analyze network performance. Congestion in a network is the point at which throughput on a segment or link is significantly degraded because data cannot be transmitted fast enough to avoid retransmission. Detecting congestion on a network is similar to detecting overutilization of a CPU: you monitor the length of the output queues associated with the network interfaces of the devices attached to a segment. Look for these two signs of congestion: an increasing output queue length and an increasing rate of discarded packets.
Network congestion is even more problematic than CPU overload. If a packet waits too long, is discarded, or is destroyed by a collision, it is retransmitted by higher-level software. This multiplies the load on the already-overloaded network.
Congestion is a symptom of a problem in the network rather than a cause of poor response time or low throughput. It is usually caused by overutilization or by error rates that are too high.
The number of packets discarded is available in the SNMP MIB-II interface tables (table *N2IFT, variables ifInDiscards and ifOutDiscards). The interface output queue length (table *N2IFT, variable IFOQLEN) and the number of retransmissions in a TCP/IP network (table *N2TCP, variable TCPRTSG) are also available in SNMP MIB-II data.
SPECTRUM data provides information on discards and retransmitted segments. Table 4.8 summarizes the measurements for analyzing congestion on the network.
Table 4.8 Measurements for Analyzing Network Congestion
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
Congestion SNMP data *N2IFT ifInterfaces IFODISC Discards on output collector IFIDISC Discards on input IFOQLEN Output queue length tcp IFODISC Discards on output TCPRTSG Segments retransmitted TCPRALG Retransmission algorithm SPECTRUM C* <device CSIIIND Interface input discards tables> CSIIOTD Interface output discards IPIDISC IP discards TCPRTSG Segments retransmitted TCPOTSG TCP segments transmitted
Note: In this table, the symbol * is the prefix for table names. For SunNet Manager MIB-II SNMP data tables, substitute SM2 for the prefix to the table names. Use H for HP-OV data.
Service-level reports can be used to monitor the effectiveness of any changes you make to your computer system as well as to establish an explicit service-level agreement with your user community. Though you may not have a formal, written service-level agreement with users at your site, an idea of what constitutes acceptable throughput and responsiveness of a system always exists in the minds of the users. Service-level reports clarify the expectations of all concerned and provide concrete proof to your customers of the progress you have made in improving performance.
System service levels are generally reported in four areas: availability, response time, throughput, and utilization.
These measurements can also be reported to your users for the system as a whole or for a specific application.
Many of the devices and data collection software supported by IT Service Vision software provide measurements that are appropriate for service-level reporting. The following sections discuss several examples of these measurements.
System availability is reported as the percentage of time the system (or a major component of the system) is available to service the users' requests. SunNet Manager ping status (table SNMPGS, variable PGSRCHA) is a good source of availability information because the ICMP request acknowledge packet indicates whether a specific network device (workstation, router, and so on) is available and also shows whether it is reachable over the network. HP Measureware configuration information (table PCSCFG, all variables) contains information about the existence of hardware components at certain times (such as during a system boot) rather than periodic status reports on the components. Thus, HP Measureware configuration information can be used for inventory and maintenance contract purposes. Table 4.9 summarizes the measurements for analyzing availability.
Table 4.9 Measurements for Analyzing Availability
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
Availability SunNetMgr SNMPGS ping.stats PGSRCHA Reachable HP Measureware PCSCFG configuration CFG* Multiple variables data showing the status of various hardware components at boot SPECTRUM C* <device tables> CSICNST contact status
System response time is the amount of time between the completion of a request for service and the system's response to that request. For interactive work on open systems, this may be defined as the time between the user pressing a mouse button or return key and the system appearing to respond. Whether response time should be measured to the first or last character of the response is sometimes hotly debated in CPE circles, but in open systems networks you rarely have the choice of whether output time is included in the response-time value. In addition, the first-versus-last-character distinction is irrelevant for windowing applications that buffer all characters of the response and rewrite the entire screen at once.
Response times are generally reported for four types of work in your system: individual commands (or processes), user-defined applications, noninteractive (batch) work, and requests serviced by internal system components.
Response times for individual commands (or processes) are widely available in UNIX systems in accton process data (IT Service Vision tables prefixed ACC, variable name ACCETM) and in HP Measureware process records (table PCSPRO, variables PRO1RSP and PRORUNT). Monitoring response time for individual commands does convey some information about the level of service being offered, but it is difficult to set up a consistently repeatable test in a heterogeneous, multitasking operating system environment. For example, accton data might be used to track the usage for large applications, such as the SAS System. But the responsiveness of the application may depend on external factors such as the file system or the network. This limitation is addressed somewhat by combining process response data with network response data generated by SNMP ping requests from SunNet Manager (table SNMPGS, variables PGSTAVG, PGSTMAX, and PGSTMIN).
Though not as commonly available as process elapsed-time data, response-time measurements associated with a particular, user-defined application are even more informative. These applications are usually more homogeneous, and you are probably more familiar with them. Statistics for these applications reflect both command response time within the application and response times for user-defined transaction boundaries, which are typically of more interest to your customers than those of individual commands. Application boundaries allow data records to be logged at informative times within an application. For example, each PROC or DATA step within a SAS session is an application boundary. Transaction response times appear in HP Measureware application records (table PCSAPP, variables APP1RSP, APPURSP, and APPRUTM).
For noninteractive work (such as an application invoked by a cron task), response time equates to turnaround time, the time between submitting the command script and its completion. Because true batch mode is not native to UNIX environments, measurements that show turnaround time are not directly available.
You can calculate the approximate total turnaround time used by a script by gathering and summing accton process accounting data for the commands used in the batch script (variable ACCETM in tables ACCESA, ACCHP7, ACCRS6, and ACCSUN), but a command that forks may hide some resource usage. To overcome this problem, you can use special userids for each script, but this can become an administrative burden. Using a monitor (such as HP Measureware) that enables you to associate an application ID with certain commands is a cleaner solution. HP Measureware application data (table PCSAPP) provides information on application response and process time (variables APP1RSP, APPURSP and APPRUTM).
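The following step is a minimal sketch of the summing approach. It assumes that the accton detail table DETAIL.ACCSUN contains a user identification variable (called USERID here) and that the hypothetical userid BATCH01 is dedicated to the script; check the actual variable names in your table before using it.

   proc summary data=detail.accsun nway;
      where userid = 'BATCH01';        /* hypothetical userid dedicated to the script */
      class userid;
      var accetm;                      /* process elapsed time */
      output out=work.turnaround sum=total_elapsed;
   run;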
Response time can also mean the time it takes for various components within a system to respond to a request. For example, the response time of a disk drive is the average length of time it takes for the drive to respond to an I/O request. This type of internal measurement is useful for analyzing the performance of your system and locating bottlenecks, but it is not usually meaningful to your users. Disk response time appears in PROBE/Net disk information records (table PRXDSK, variable DSKRESP).
Table 4.10 summarizes the measurements for analyzing response time.
Table 4.10 Measurements for Analyzing Response Time
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
Response Time Accton ACCESA pacct data ACCETM Process elapsed time ACCHP7 pacct data ACCETM Process elapsed time ACCRS6 pacct data ACCETM Process elapsed time ACCSUN pacct data ACCETM Process elapsed time HP Measureware PCSAPP application data APP1RSP Application first response APPURSP Application transaction response APPRUTM Application runtime PCSPRO process data PRO1RSP Process first response PRORUNT Process runtime SunNetMgr SNMPGS ping.stats PGSTAVG Average roundtrip time PGSTMAX Maximum roundtrip time PGSTMIN Minimum roundtrip time PROBE/Net PRXDSK disk info DSKRESP Disk response time
Another service level of interest is system throughput. System throughput is the rate at which requests for work are serviced by a system. Up to a certain point, throughput generally increases as the workload on a system increases. Eventually, your system reaches the point where the overhead of managing additional tasks causes less work to be accomplished.
You monitor throughput for two primary reasons: to verify that the system is accomplishing the work demanded of it, and to detect the point at which the overhead of managing additional tasks causes less work to be accomplished.
In addition to balancing the throughput to avoid excessive overhead, you must also keep track of the response time. Unfortunately, throughput and response time are almost always inversely related, so improving one degrades the other. You can determine the point at which response time is not significantly degraded but throughput is still kept high by plotting the measurements.
Examples of throughput measurements are readily available in open systems networks. They include accton process accounting data (tables prefixed ACC, nearly all variables), HP Measureware process and application records (tables PCSPRO and PCSAPP, nearly all variables), and PROBE/Net process disk access records (table PRXDPR, variables DPRPCNT, DPRRDS, and DPRWRTS).
Table 4.11 summarizes the measurements for analyzing throughput.
Table 4.11 Measurements for Analyzing Throughput
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
Throughput Accton ACCESA pacct data ACCUTM Process user CPU time ACCSTM Process system CPU time ACCRW Process disk I/O ACCHP7 pacct data ACCUTM Process user CPU time ACCSTM Process system CPU time ACCRW Process disk I/O ACCRS6 pacct data ACCUTM Process user CPU time ACCSTM Process system CPU time ACCRW Process disk I/O ACCSUN pacct data ACCUTM Process user CPU time ACCSTM Process system CPU time ACCRW Process disk I/O HP Measureware PCSPRO process data PROCPSC Process CPU time PRODKTO Process disk I/O PCSAPP application data APPCPSC Application CPU seconds APPCTNO Application process count APPDKIO Application disk I/O APPTRNO Application transactions PROBE/Net PRXDPR process disk DPRPCNT Process percent disk ops DPRRDS Process disk reads DPRWRTS Process disk writes
System utilization of a resource can be either a rate of usage or a percentage of the resource's capacity.
Why should you examine utilization as part of service-level reporting? Response time and throughput are clearly evident to your users; therefore, it makes sense to report them when you describe the service the system delivers. However, utilization is usually invisible to users of the system.
The reason for analyzing utilization is to maximize the use of the system and provide this information to the people who purchase the equipment. Keeping system resources busy (especially the expensive ones) is a high priority to those who pay for the resources. You also need to measure utilization to be able to effectively plan capacity. These measurements help you to locate bottlenecks and to project the effect of future workloads on response time and throughput. Monitoring utilization of major system components also helps you decide what additional acquisitions or system design changes are appropriate to improve system performance.
The most appropriate utilization measurements to monitor are those for the expensive, well-known system components, such as CPU utilization, disk I/O rates for a server, or bytes per second on a network communication line. Measurements for smaller components, such as port utilization on a router, are generally not as appropriate to include in service-level reports to users and management. Examples of utilization measurements that are appropriate to track for an audience of users and management are SunNet Manager host performance data (table SNMHPF, variables HPFCPUP and HPFDISK) and SNMP MIB-II data on a hub or router (table *N2IFT, variables IFIOCTS/IFSPEED and IFOOCTS/IFSPEED).
Utilization data is usually reported as rates of usage. A rate does not convey the capacity of the device, however, so it is not clear what the rate really means. One method of analyzing this data is to calculate the percentage of the device's top speed. You can convert rates to percentages by defining an IT Service Vision formula variable that divides the rate by the nominal capacity of the device or line, as in the sketch below. You may prefer to record baseline values for these rates and then simply compare rates to the baseline values on a regular basis to spot exceptions or new trends in the rates.
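The following DATA step is a hedged sketch of the rate-to-percentage conversion, using the TRAKKER segment statistics table from Table 4.12. The 10 Mb/s capacity is an assumption for illustration only; substitute the rated capacity of your own segment or line.

   data work.segutil;
      set detail.tkrsts;
      capacity_bps = 10000000;                        /* assumed 10 Mb/s capacity */
      utilpct = 100 * (dllbyts * 8) / capacity_bps;   /* bytes/second converted to bits */
   run;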
Table 4.12 summarizes the measurements for analyzing utilization.
Table 4.12 Measurements for Analyzing Utilization
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
Utilization SunNetMgr SNMHPF hostperf.data HPFCPUP Percent CPU busy HPFDISK Disk I/O's/second SNMP data *N2IFT ifInterfaces IFOOCTS Output octets/second collector IFIOCTS Input octets/second IFSPEED Line speed of interface TRAKKER TKRSTS segment statistics DLLBYTS Segment bytes/second
Note: In this table, the symbol * is the prefix for table names. For SunNet Manager MIB-II SNMP data tables, substitute SM2 for the prefix to the table names. Use H for HP-OV data.
For most users, the most interesting indicator of service is the performance (the responsiveness and throughput) of the application they use. For many users, the performance of the application is more important than system service levels. Specific measurements for user applications are collected by HP Measureware (table PCSAPP, variables APP1RSP and APPRUTM for response time and variables APPCPSC, APPCTNO, APPDKIO, and APPTRNO for throughput). Other collectors record process or program service levels that you can use to calculate application service levels.
Table 4.13 summarizes the measurements for analyzing user applications.
Table 4.13 Measurements for Analyzing User Applications
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
User Applications Accton ACCESA, pacct data ACC* Multiple variables ACCHP7, pertaining to resource ACCRS6, consumption (CPU ACCSUN seconds, disk I/O's, net I/O's, memory utilization) HP Measureware PCSPRO process data PRO* Multiple variables pertaining to resource consumption (CPU seconds, disk I/O's, net I/O's, memory utilization) PCSAPP application APP* Multiple variables data pertaining to resource consumption (CPU seconds, disk I/O's, net I/O's, memory utilization) PROBE/Net PRXDPR process disk DPRPCNT Process percent access disk ops DPRRDS Process disk reads DPRWRTS Process disk writes PRXPGM program PGM* Multiple variables information pertaining to resource consumption for a program TRAKKER TKRDLL Data link table APPBYTS Application bytes/second APPPKTS Application packets/second
You can monitor the resource consumption of applications and individual processes on your system to determine which applications and processes consume the most resources. These high-resource applications are good targets for further analysis; improving them can significantly improve system response time.
Information on processes is recorded in accton process data (tables ACC*) and in HP Measureware process records (table PCSPRO). Information on user-identified applications is recorded in HP Measureware application records (table PCSAPP).
In addition to general-purpose process and application information, some data collection software records performance measurements for system applications such as NFS. SunNet Manager records performance information on NFS in its remote procedure call tables (tables SNMRNC and SNMRNS). HP Measureware records this information in its global table (table PCSGLB, variable GLBNRQ). You can compare the values of the call rate variables in these tables to known baseline values to see if NFS is busier than normal when a problem occurs.
Table 4.14 summarizes the measurements for analyzing system applications.
Table 4.14 Measurements for Analyzing System Applications
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
User Applications Accton ACCESA, pacct data ACC* Multiple variables ACCHP7, pertaining to resource ACCRS6, consumption (CPU seconds, ACCSUN disk I/O's, net I/O's, memory utilization) HP Measureware PCSPRO process data PRO* Multiple variables pertaining to resource consumption (CPU seconds, disk I/O's, net I/O's, memory utilization) PCSAPP application data APP* Multiple variables pertaining to resource consumption (CPU seconds, disk I/O's, net I/O's, memory utilization) PROBE/Net PRXDPR process disk DPRPCNT Process percent disk ops DPRRDS Process disk reads DPRWRTS Process disk writes PRXPGM program PGM* Multiple variables information pertaining to resource consumption for a program TRAKKER TKRDLL Data link table APPBYTS Application bytes/second APPPKTS Application packets/second System Applications HP Measureware PCSGLB global data GLBNRQ NFS queue length SunNetMgr SNMRNC rpcnfs.client RNC* Multiple variables pertaining to NFS client activity (call and wait times, errors, and so on) SNMRNS rpcnfs.server RNS* Multiple variables pertaining to NFS server activity (call and wait times, errors, and so on) PROBE/Net PRXNFS Client NFS table NFSREQP % requests of all clients NFSRWSV Service time (read/write) NFSTIMP % server time of all clients PRXPCS User/proc NFS PCSARSP Average response time table
This section summarizes the tables included in the separate sections earlier in the chapter. The tables contain this information: the specific data sources for the measurements of interest, which collectors gather these measurements, and where the data are stored in the IT Service Vision performance database.
These tables provide a workable but not exhaustive list of places to find useful performance measurements in IT Service Vision data.
Table 4.15 Measurements for Analyzing Service Levels
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
Availability SunNetMgr SNMPGS ping.stats PGSRCHA Reachable HP Measureware PCSCFG configuration data CFG* Multiple variables showing the status of various hardware components at boot SPECTRUM C* <device tables> CSICNST contact status Response Time Accton ACCESA pacct data ACCETM Process elapsed time ACCHP7 pacct data ACCETM Process elapsed time ACCRS6 pacct data ACCETM Process elapsed time ACCSUN pacct data ACCETM Process elapsed time HP Measureware PCSAPP application data APP1RSP Application first response APPURSP Application transaction response APPRUTM Application runtime PCSPRO process data PRO1RSP Process first response PRORUNT Process runtime SunNetMgr SNMPGS ping.stats PGSTAVG Average roundtrip time PGSTMAX Maximum roundtrip time PGSTMIN Minimum roundtrip time PROBE/Net PRXDSK disk info DSKRESP Disk response time Throughput Accton ACCESA pacct data ACCUTM Process user CPU time ACCSTM Process system CPU time ACCRW Process disk I/O ACCHP7 pacct data ACCUTM Process user CPU time ACCSTM Process system CPU time ACCRW Process disk I/O ACCRS6 pacct data ACCUTM Process user CPU time ACCSTM Process system CPU time ACCRW Process disk I/O ACCSUN pacct data ACCUTM Process user CPU time ACCSTM Process system CPU time ACCRW Process disk I/O HP Measureware PCSPRO process data PROCPSC Process CPU time PRODKTO Process disk I/O PCSAPP application data APPCPSC Application CPU seconds APPCTNO Application process count APPDKIO Application disk I/O APPTRNO Application transactions PROBE/Net PRXDPR process disk DPRPCNT Process percent disk ops access DPRRDS Process disk reads DPRWRTS Process disk writes Utilization SunNetMgr SNMHPF hostperf.data HPFCPUP Percent CPU busy HPFDISK Disk I/O's/second TRAKKER TKRSTS segment DLLBYTS Segment bytes/second statistics
Table 4.16 Measurements for Analyzing the Workstation
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
CPU SunNetMgr SNMHPF hostperf.data HPFCPUP Percent CPU busy HPFAV1 CPU queue length (1-min avg) HPFAV5 CPU queue length (5-min avg) HPFAV15 CPU queue length (15-min avg) HPFCSWT Context switches HP Measureware PCSGLB global data GLBCPSC CPU seconds PROBE/Net PRXCPU CPU CPUAV1 CPU queue length (1-min avg) CPUAV5 CPU queue length (5-min avg) CPUAV15 CPU queue length (15-min avg) CPUIDLP Percent CPU idle MEMORY Accton ACCESA pacct data ACCKMIN Process memory use (K-core min) ACCMEMA Process average memory use ACCHP7 pacct data ACCKMIN Process memory use (K-core min) ACCMEMA Process average memory use ACCRS6 pacct data ACCKMIN Process memory use (K-core min) ACCMEMA Process average memory use ACCSUN pacct data ACCKMIN Process memory use (K-core min) ACCMEMA Process average memory use HP Measureware PCSAPP application APPSVM Application memory data wait time PCSPRO process data PROSVM Process memory wait time PCSGLB global data GLBMEMQ Memory queue length SunNetMgr SNMHPF hostperf.data HPFPGI Pages read in HPFPGO Pages written out HPFPSWI Pages swapped in HPFPSWO Pages swapped out PROBE/Net PRXBUF buffer cache BUFRCAC Cache hit ratio BUFWCAC Write cache hit ratio PRXSYS system SYSPINR Page in rate statistics SYSPOTR Page out rate Disk I/O SunNetMgr SNMHPF hostperf.data HPFDISK Total system disk I/O SNMIOD iostat.disk IOD* Multiple variables pertaining to disk I/O (seek times, read/write rates and times, and so on) per disk volume PROBE/Net PRXDSK disk information DSK* Multiple variables pertaining to disk I/O (seek times, read/write rates and times, and so on) per disk volume PRXFBL I/O load FBL* Multiple variables pertaining to disk I/O (seek times, read/write rates and times, and so on) per disk volume HP Measureware PCSGDK global disk GDK* Multiple variables pertaining metrics to disk I/O (seek times, read/write rates and times, and so on) per disk volume Network Interface I/O SunNetMgr SNMHPF hostperf.data HPFIPKT Input packets HPFOPKT Output packets SNME* Etherif.* Multiple variables SNML* layers.* Multiple variables SNMT* traffic.* Multiple variables HP Measureware PCSGLN global network GLN* Multiple variables interface
Table 4.17 Measurements for Analyzing Network Levels
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
Congestion SNMP data *N2IFT ifInterfaces IFODISC Discards on output collector IFIDISC Discards on input IFOQLEN Output queue length tcp IFODISC Discards on output TCPRTSG Segments retransmitted TCPRALG Retransmission algorithm SPECTRUM C* <device CSIIIND Interface input discards tables> CSIIOTD Interface output discards IPIDISC IP discards TCPRTSG Segments retransmitted TCPOTSG TCP segments transmitted Error Rates SNMP data *N2IFT ifInterfaces IFIERRS Input errors/second collector IFIUCP Input unicast packets/sec IFINUCP Input broadcast packets/sec IFOERRS Output errors/second IFOUCPT Output unicast packets/sec IFONCTP Output broadcast packets/sec SNMP data *N2IP_ ip IPIADER Input address errors/sec collector IPIHDER Input header errors/sec IPIDELV Input packets delivered/sec SNMP data *N2ICM icmp ICIERRS Input errors/sec collector ICIMSGS Input messages/sec ICOERRS Output errors/sec ICOMSGS Output messages/sec SNMP data *N2TCP tcp TCPIERR Input errors/sec collector TCPORST Retransmissions/sec TCPINSG Input segments/sec TCPOTSG Output segments/sec SNMP data *N2UDP udp UDPIERR Input errors/sec collector UDPINDG Input datagrams/sec SNMP data *N2EGP egp EGPIERR Input errors/sec collector EGPIMSG Input messages/sec EGPOERR Output errors/sec EGPOMSG Output messages/sec SNMP data *N2EGN egn EGNIERR Input errors/sec collector EGNIMSG Input messages/sec EGNOERR Output errors/sec EGNOMSG Output messages/sec SNMP data *N2SNP snmp SNPIPKT Input packets/sec collector SNPI* Multiple variables pertaining to various SNMP input errors/second SNPOPKT Output packets/sec SNPO* Multiple variables pertaining to various SNMP output errors/second SPECTRUM C* <device CSISERT Soft error rate/second tables> IPIADER Input address errors/second IPIHDER Bad header errors/second CSIPKRT Packet rate/second IPIDELV Datagrams delivered/second TRAKKER TKRSTS segment DLLEFRM Ethernet frames/second statistic ELLERRS Errors/second Utilization SNMP data *N2IFT ifInterfaces IFOOCTS Output octets/second collector IFIOCTS Input octets/second IFSPEED Line speed of interface SPECTRUM C* <device CSILOAD Percent of bandwidth tables> utilized CCI* <Cisco LOCIBSC Input bits/second router (5-min avg) tables> <Cisco LOCOBSC Output bits/second router (5-min avg) tables> <Cisco LOCIPSC Input pkts/second router (5-min avg) tables> <Cisco LOCOPSC Output pkts/second router (5-min avg) tables> Network Traffic Patterns SNMP data *N2IFT ifInterfaces IFIUCP Input unicast collector packets/second IFINUCP Input broadcast packets/second IFOUCPT Output unicast packets/second IFONCTP Output broadcast packets/second SPECTRUM C* <device CSILOAD Percent of bandwidth tables> utilized CCI* <Cisco LOCIBSC Input bits/second router (5-min avg) tables> <Cisco LOCOBSC Output bits/second router (5-min avg) tables> <Cisco LOCIPSC Input pkts/second router (5-min avg) tables> <Cisco LOCOPSC Output pkts/second router (5-min avg) tables> TRAKKER TKRSTS segment IPOSPR IP off-segment pkts statistics received/second IPOSPX IP off-segment pkts xmitted/second IPPKTS IP packets/second IPTRAN IP transit packets/second
Table 4.18 Measurements for Analyzing Applications
Data Collection Software | PDB Table Name | Table Description | Variable Name | Variable Description
User Applications Accton ACCESA, pacct data ACC* Multiple variables ACCHP7, pertaining to resource ACCRS6, consumption (CPU seconds, ACCSUN disk I/O's, net I/O's, memory utilization) HP Measureware PCSPRO process data PRO* Multiple variables pertaining to resource consumption (CPU seconds, disk I/O's, net I/O's, memory utilization) PCSAPP application APP* Multiple variables data pertaining to resource consumption (CPU seconds, disk I/O's, net I/O's, memory utilization) PROBE/Net PRXDPR process disk DPRPCNT Process percent disk ops access DPRRDS Process disk reads DPRWRTS Process disk writes PRXPGM program PGM* Multiple variables information pertaining to resource consumption for a program TRAKKER TKRDLL Data link table APPBYTS Application bytes/second APPPKTS Application packets/second System Applications HP Measureware PCSGLB global data GLBNRQ NFS queue length SunNetMgr SNMRNC rpcnfs.client RNC* Multiple variables pertaining to NFS client activity (call and wait times, errors, and so on) SNMRNS rpcnfs.server RNS* Multiple variables pertaining to NFS server activity (call and wait times, errors, and so on) PROBE/Net PRXNFS Client NFS table NFSREQP % requests of all clients NFSRWSV Service time (read/write) NFSTIMP % server time of all clients PRXPCS User/proc NFS PCSARSP Average response time table
Before you begin using the reports in IT Service Vision software, you need to understand how the reports classify data, what types of tables contain the data, and how variables can be used to group data. The following sections provide an overview of this information. For more information on table types, variables, or using statistics to summarize data, refer to the Macro Reference, which is available from the following path in the IT Service Vision interface:
OnlineHelp -> Documentation Contents -> Usage -> Macro Reference
Performance data can be classified by its measurement level (nominal or interval) and by its scale (discrete or continuous).
Nominal variables have a discrete set of values. For example, shift is a nominal measure consisting of a set of labels defining the work shifts within an organization (for example, Prime Time or Off-Hour). Character data always have a nominal measurement level.
Interval measurements are numeric data that vary across a continuous range. Differences between values are significant, and the data have an inherent order. Network error rate and the total number of disk I/O's are examples of interval measurements.
Discrete scales have a fixed set of values. Continuous data can have an infinite number of values. Numeric CPE data is typically measured on a continuous scale.
One additional characteristic of performance data is time. Most performance measurements are collected sequentially and represent a quantity over an interval of time. You can detect significant trends or cycles by plotting the collected data in time-sequence.
IT Service Vision software groups performance measurements into two types of tables: interval and event. Interval tables contain performance measurements that represent the quantity or count of a resource that occurs over time. Most data collected and analyzed with IT Service Vision software are stored in interval tables.
Event tables contain the amount of resources consumed during the life of the event. Event table records are recorded only at the termination of the event, whereas interval tables typically contain data sampled at regular intervals. Since many reports analyze data over a time period, you must be careful when you select reports for event tables. Be sure to always specify PERIOD=ASIS for event tables.
When you use the Reporting facility to create reports, or when you invoke one of the report macros, you can specify BY and class variables to change the appearance of the report.
BY variables group observations with the same value for the BY variable. For example, if you specify MACHINE as a BY variable, all observations with the same value for machine are treated as a single group. When you specify BY variables for a report, the system generates a separate analysis for each group defined by the values of the BY variables. SAS procedures expect the input dataset to be sorted in order of the BY variables. Graphical procedures typically produce a separate plot for each unique value of the BY variable. The axis for all plots is based on the maximum and minimum values of the analysis variable in the data population.
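For example, the following minimal sketch produces a separate summary of CPU busy for each shift; it assumes that the detail table DETAIL.SNMHPF contains a SHIFT variable.

   proc sort data=detail.snmhpf out=work.snmhpf;
      by shift;
   run;

   proc means data=work.snmhpf n mean max;
      by shift;            /* one analysis section per shift */
      var hpfcpup;
   run;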
Class variables can also group data. Class variables are typically used to summarize observations that contain the same value for the class variable. The CLASS statement identifies class variables, which typically have a small number of discrete values or unique levels. Graphical procedures produce one plot for all class values and assign a unique symbol or color to distinguish each of the values of the class variables.
Typically, you use either MACHINE or SHIFT as the class or BY variable. If you use SHIFT as a BY variable, the report produces one plot or analysis section per shift, which enables you to distinguish load patterns unique to prime or off-hour work shifts. If you specify MACHINE as a class variable, a single plot produces one line (distinguished by color or symbol) per machine in the data. From this plot you can compare the performance patterns of individual machines against the other machines included in the data.
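The following sketch contrasts the two groupings with the MEANS procedure, assuming that the detail.snmhpf table contains SHIFT and MACHINE variables; substitute your own table and variable names as needed.

/* BY processing requires input sorted by the BY variables */
proc sort data=detail.snmhpf out=work.snmhpf;
   by shift;
run;

/* one analysis section per SHIFT (BY), with summary lines
   per MACHINE (CLASS) within each section */
proc means data=work.snmhpf mean max;
   by shift;
   class machine;
   var hpfcpup;
run;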
Summarizing measured data is one of the most common problems faced by the performance analyst. A monitor may produce anywhere from hundreds to millions of records, so you need descriptive statistics or percentiles to generalize and compare the collected information. IT Service Vision software provides two principal methods for summarizing data: ad-hoc summary statistics, which are available in the Reporting facility, and performance database (PDB) reduction.
In addition, base SAS software provides several procedures, such as CORR, FREQ, MEANS, SUMMARY, and UNIVARIATE, which calculate descriptive statistics.
PDB reductions, the Reporting facility in IT Service Vision software, and base SAS statistical procedures all support a number of descriptive statistics. For more information, refer to the statistical codes in the Macro Reference, which is available from this path in the IT Service Vision window interface:
OnlineHelp -> Documentation Contents -> Usage -> Macro Reference
In addition to the capabilities available in IT Service Vision software, the SAS System provides a wealth of analytical tools. Several of these tools are discussed here and in the following sections.
The UNIVARIATE procedure in base SAS software provides a comprehensive statistical analysis of the distribution of performance measurements, including moments, quantiles, extreme values, and missing-value counts.
The CORR procedure, also in base SAS software, provides insight into variable relationships.
SAS/INSIGHT software produces most descriptive and distribution statistics in an interactive, multi-window exploration format.
Several SAS procedures produce descriptive statistics (for example, the FREQ, MEANS, and UNIVARIATE procedures), but the UNIVARIATE procedure produces the most extensive summary. For example, to summarize the CPU usage for all machines collected by SunNet Manager hostperf, execute this step:
proc univariate data=detail.snmhpf plot;
   var hpfcpup;
   id machine;
   title '% CPU Usage';
run;
The PROC UNIVARIATE statement asks for a summary of the detail HOSTPERF dataset and requests a histogram with the PLOT option. The VAR statement selects the variable HPFCPUP (% CPU usage), and the ID statement selects MACHINE to identify the extreme observations. Finally, the TITLE statement generates a descriptive title on each page of output.
The moments section of the output contains a number of descriptive statistics. Descriptive statistics provide insight into the character and range of the data and are essential to understanding complex data distributions and relationships. Mean, the arithmetic average, is the most common measure. N is the number of observations with nonmissing values. Variance and Std Dev measure the variability within the data. Skewness measures the tendency of the distribution to be spread out more on one side of the mean than on the other; kurtosis measures the heaviness of the distribution's tails relative to a normal distribution. Most of the other statistics are normality and distribution tests. Refer to the SAS procedures documentation for your current version of SAS for definitions of these statistics.
The mean presents the average value within the data population and is commonly used to summarize performance measurements. However, does the mean represent 5 samples or 1,000? N provides the answer by displaying the number of nonmissing values used to calculate the mean. If presented with a mean of 5 for 100 samples, is the value 5 for all 100 samples, or is there large variability within the samples? Variance, standard deviation, and coefficient of variation all provide additional information about the variability within the data sample. Range, the minimum subtracted from the maximum, can also be used.
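As a quick alternative to the full UNIVARIATE output, the MEANS procedure can print just these measures. This sketch uses the standard base SAS statistic keywords and assumes the same detail.snmhpf table and HPFCPUP variable used elsewhere in this section.

proc means data=detail.snmhpf n mean var std cv min max range;
   var hpfcpup;   /* % CPU usage */
run;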
The quantiles section of the output gives distribution information in the form of the following percentiles: the maximum (100%), 99%, 95%, 90%, the upper quartile (75%, Q3), the median (50%), the lower quartile (25%, Q1), 10%, 5%, 1%, and the minimum (0%).
The range is the difference between the largest and smallest value.
For %CPU usage, the quantiles section clearly shows a positive skew. The positive skewness value (0.55569) in the moments section confirms this. The histogram on the fourth page of output shows this skewing.
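If you need selected percentiles in a dataset rather than in printed output, the OUTPUT statement of the UNIVARIATE procedure can capture them. This sketch again assumes the detail.snmhpf table and HPFCPUP variable:

proc univariate data=detail.snmhpf noprint;
   var hpfcpup;
   /* write the 5th, 25th, 50th, 75th, and 95th percentiles to
      WORK.PCTLS as variables P5, P25, P50, P75, and P95 */
   output out=work.pctls pctlpts=5 25 50 75 95 pctlpre=p;
run;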
The Extremes section lists the five lowest and five highest values in the population. The ID column shows the value of the ID variable (in this example, MACHINE) for each extreme observation. In this case, all extreme values belong to machines SOL and SUPERNOVA.
Finally, if your data contain missing values, the Missing values section displays the label used for missing values (in this case, a period), the count of missing values, and the percentage of missing values relative to the number of observations in the sample.
The CORR procedure, a base SAS procedure, measures the strength of the linear relationship between two variables. If one variable can be expressed exactly as an increasing linear function of another, the correlation is 1; if the variables are exactly inversely related, the correlation is -1. If there is no linear relationship between the two variables, the correlation is 0. For example, to determine whether CPU usage, disk I/O, interface packet traffic, and interface output collisions show a strong relationship, execute the following statements:
proc corr data=detail.snmhpf noprob nomiss;
   var hpfcpup hpfdisk hpfopkt hpfcoll hpfipkt;
run;
The PROC CORR statement asks for a correlation analysis of the detail dataset HOSTPERF; the VAR statement selects the variables HPFCPUP, HPFDISK, HPFOPKT, HPFCOLL, and HPFIPKT. The NOMISS option drops an observation if a missing value exists for any requested variable. The NOPROB option suppresses the printing of significance probabilities. For more information on the CORR procedure and its options, see the SAS Procedures Guide for your current version of SAS.
SAS/INSIGHT software is an interactive, graphical software package for data exploration. Descriptive statistics, percentiles, charts, and plots can be produced and linked across multiple analysis windows. SAS/INSIGHT software also produces correlation matrices, scatter plots, data transformations, and general linear model analysis for data fitting, distribution tests, and regression analysis. For more information, refer to the SAS/INSIGHT User's Guide for your current version of SAS software.