Resource
|
Alert name
|
Description
|
---|---|---|
Linux
|
CPU Count
|
Triggered if the number
of CPUs on the platform changes. This alert indicates a possible hardware
problem.
|
CPU Usage >70
|
Triggered if the overall
CPU usage in the system exceeds 70%.
|
|
CPU Usage >95
|
Triggered if the overall
CPU usage in the system exceeds 95%.
|
|
Pct Free Memory
|
Triggered if the percentage
of free memory falls below 20% of the maximum free memory.
|
|
Pct Free Swap
|
Triggered if the percentage
of free swap memory falls below 20% of the maximum free swap memory.
|
|
Swap Out Rate
|
Triggered if the number
of pages swapped out of memory exceeds 20% of the baseline value of
page swaps.
This alert indicates
that your system is memory constrained. Swapping occurs when the system
requires more memory than is physically available.
|
|
TCP Attempt Fails
|
Triggered if the number
of failed attempts to connect to the TCP service exceeds 20% of the
baseline value of attempted connections. The number of failed attempts
should normally be close to zero.
|
|
TCP In Errors
|
Triggered if the number
of TCP interface errors exceeds 20% of the baseline value of TCP interface
requests. The number of TCP interface errors should normally be close
to zero.
|
|
Zombie Processes
|
Triggered if the number
of zombie processes exceeds 20% of the baseline value of total processes.
Zombie processes are processes that have completed execution but still
have entries in the process table. This alert indicates an application
problem.
|
|
Win32
|
CPU Count
|
Triggered if the number
of CPUs on the platform changes. This alert indicates a possible hardware
problem.
|
CPU Usage >70
|
Triggered if the overall
CPU usage in the system exceeds 70%.
|
|
CPU Usage >95
|
Triggered if the overall
CPU usage in the system exceeds 95%.
|
|
Pct Free Memory
|
Triggered if the percentage
of free memory falls below 20% of the maximum free memory.
|
|
Pct Free Swap
|
Triggered if the percentage
of free swap memory falls below 20% of the maximum free swap memory.
|
|
Swap Out Rate
|
Triggered if the number
of pages swapped out of memory exceeds 20% of the baseline value of
page swaps.
This alert indicates
that your system is memory constrained. Swapping occurs when the system
requires more memory than is physically available.
|
|
TCP Attempt Fails
|
Triggered if the number
of failed attempts to connect to the TCP service exceeds 20% of the
baseline value of attempted connections. The number of failed attempts
should normally be close to zero.
|
|
TCP In Errors
|
Triggered if the number
of TCP interface errors exceeds 20% of the baseline value of TCP interface
requests. The number of TCP interface errors should normally be close
to zero.
|
|
Zombie Processes
|
Triggered if the number
of zombie processes exceeds 20% of the baseline value of total processes.
Zombie processes are processes that have completed execution but still
have entries in the process table. This alert indicates an application
problem.
|
|
AIX
|
CPU Count
|
Triggered if the number
of CPUs on the platform changes. This alert indicates a possible hardware
problem.
|
CPU Usage >70
|
Triggered if the overall
CPU usage in the system exceeds 70%.
|
|
CPU Usage >95
|
Triggered if the overall
CPU usage in the system exceeds 95%.
|
|
Pct Free Memory
|
Triggered if the percentage
of free memory falls below 20% of the maximum free memory.
|
|
Pct Free Swap
|
Triggered if the percentage
of free swap memory falls below 20% of the maximum free swap memory.
|
|
Swap Out Rate
|
Triggered if the number
of pages swapped out of memory exceeds 20% of the baseline value of
page swaps.
This alert indicates
that your system is memory constrained. Swapping occurs when the system
requires more memory than is physically available.
|
|
TCP Attempt Fails
|
Triggered if the number
of failed attempts to connect to the TCP service exceeds 20% of the
baseline value of attempted connections. The number of failed attempts
should normally be close to zero.
|
|
TCP In Errors
|
Triggered if the number
of TCP interface errors exceeds 20% of the baseline value of TCP interface
requests. The number of TCP interface errors should normally be close
to zero.
|
|
Zombie Processes
|
Triggered if the number
of zombie processes exceeds 20% of the baseline value of total processes.
Zombie processes are processes that have completed execution but still
have entries in the process table. This alert indicates an application
problem.
|
|
SAS Application Server
Tier
|
Metadata Cluster Avail
|
Triggered if the availability
of the SAS Metadata Server cluster falls below 100%.
|
Metadata Quorum Chg
|
Triggered if the SAS
Metadata Server cluster is not in quorum.
|
|
SAS License Termination
|
Triggered if there are
fewer than 30 days remaining before the SAS license terminates. If
this alert is triggered, it recurs once every 12 hours.
|
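Several of the platform alerts above compare a current value with a percentage of a baseline value (for example, zombie processes exceeding 20% of the baseline value of total processes). The following is a minimal sketch of how such a condition could be evaluated. The function name and the handling of a missing baseline are assumptions made for illustration; they are not taken from the product.

```python
def exceeds_pct_of_baseline(value: float, baseline: float, pct: float) -> bool:
    """Return True when value is greater than pct percent of baseline."""
    if baseline <= 0:
        # Assumption: with no usable baseline yet, the condition is not met.
        return False
    return value > baseline * pct / 100.0


# Hypothetical sample: 30 zombie processes against a baseline of 100 processes.
print(exceeds_pct_of_baseline(30, 100, 20))
```

A value exactly equal to the threshold does not satisfy the condition in this sketch, because the descriptions use the word "exceeds".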
Resource
|
Alert name
|
Description
|
---|---|---|
HQ Agent
|
HQ Agent ERROR message
in log
|
Triggered if an error
message appears in the HQ agent log.
|
HQ Agent Memory
|
Triggered if the JVM
free memory for the HQ agent falls below 14.3 MB.
|
|
HQ Time Agent Spends
Fetching Metrics
|
Triggered if the time
that the HQ agent spends collecting metric data exceeds five seconds
per minute. This alert might indicate an overloaded agent or a problem
with the scheduling thread. These problems might be present whenever
this metric exceeds 3 or 4 seconds per minute.
|
|
PostgreSQL 9.x
|
PostgreSQL 9.x - Availability
|
Triggered if the availability
of PostgreSQL falls below 100%.
|
pg: Buffer Hits % <50%
of Max
|
Triggered if the number
of buffer hits is less than 50% of the total block read requests.
(A buffer hit is a block read request that is avoided because the block
is in the buffer cache.) This alert might indicate that more system
memory is needed or that you need to adjust the shared buffers.
|
|
pg: Commits per Second
>20
|
Triggered if the number
of commits to PostgreSQL is greater than 20 per second. This alert
indicates that you might need to provide a durable write cache to
prevent potential data loss.
|
|
pg: Connection Usage
>80% of Max
|
Triggered if the number
of connections used is greater than 80% of the maximum number allowed.
This alert indicates that you might need to increase the maximum number
of available connections in order to prevent denial of service.
|
|
pg: Memory Size changed
|
Triggered if the memory
used by PostgreSQL falls below 90% of the baseline value. If this
condition is met, the alert is triggered once every 12 hours.
|
|
SAS Config Level Directory
9.4
|
SASConfig Disk Use %
> 95
|
Triggered if the volume
that contains the SASConfig directory is more than 95% full.
|
SAS Connect Spawner
9.4
|
Connect Spawner Health
% < 100
|
Triggered if the health
of the SAS Connect Spawner falls below 100%. This metric is the equivalent
of the Validate command in SAS Management
Console, and confirms that the server is responding.
|
SAS Home Directory 9.4
|
SASHome Disk Use % >
95
|
Triggered if the volume
that contains the SASHome directory is more than 95% full.
|
SAS Metadata Server
9.4
|
Metadata - Availability
|
Triggered if the availability
of the SAS Metadata Server falls below 100%.
|
Metadata Major (page)
Faults
|
Triggered if the number
of page faults that require disk activity is above 10% of the baseline
value of total page faults. This alert might indicate a memory constraint
that is causing slow performance.
|
|
Metadata Server ERROR
message in log
|
Triggered if an error
message appears in the SAS Metadata Server log.
|
|
Metadata Server Health
% < 100
|
Triggered if the health
of the SAS Metadata Server falls below 100%. This metric is the equivalent
of the Validate command in SAS Management
Console, and confirms that the server is responding.
|
|
Metadata Time in Calls
per Minute
|
Triggered if the time
taken by calls to the SAS Metadata Server exceeds 300% of the baseline
value of calls to the server. This alert might be an indication of
slow performance.
|
|
Metadata User Lockout
|
Triggered if the message
“locked out due to excessive log on failures” appears
in the SAS Metadata Server log.
|
|
SAS OLAP Server 9.4
|
OLAP - Availability
|
Triggered if the availability
of the OLAP server falls below 100%.
|
OLAP Server ERROR message
in log
|
Triggered if an error
message appears in the OLAP server log.
|
|
OLAP Server Health %
< 100
|
Triggered if the health
of the OLAP server falls below 100%. This metric is the equivalent
of the Validate command in SAS Management
Console, and confirms that the server is responding.
|
|
OLAP Server User Lockout
|
Triggered if the message
“locked out due to excessive log on failures” appears
in the OLAP server log.
|
|
SAS Object Spawner 9.4
|
Object Spawner ERROR
message in log
|
Triggered if an error
message appears in the SAS Object Spawner log.
|
Object Spawner User
Lockout
|
Triggered if the message
“locked out due to excessive log on failures” appears
in the SAS Object Spawner log.
|
|
Object Spawner - Availability
|
Triggered if the availability
of the SAS Object Spawner falls below 100%.
|
|
Object Spawner Failed
Connections
|
Triggered if the SAS
Object Spawner fails to spawn a server.
|
|
Object Spawner Major
(page) Faults
|
Triggered if the number
of page faults that require disk activity is above 10% of the baseline
value of total page faults. This alert might indicate a memory constraint
that is causing slow performance.
|
|
Object Spawner Server
Health % < 100
|
Triggered if the health
of the SAS Object Spawner falls below 100%. This metric is the equivalent
of the Validate command in SAS Management
Console, and confirms that the server is responding.
|
|
SAS SMP LASR Server
|
LASR SMP Major (page)
Faults
|
Triggered if the number
of page faults that require disk activity is above 10% of the baseline
value of total page faults. This alert might indicate a memory constraint
that is causing slow performance.
|
SMP LASR - Availability
|
Triggered if the availability
of the SAS LASR Analytic Server falls below 100%.
|
|
SAS System Info
|
EMI Event Log Alert
|
This alert is a template
for detecting a string in the EMI Events log (sasev.events). This
log contains messages generated by the SAS Environment Manager application.
Through the use of macros, you can also write log messages from SAS
applications to this log.
To have the alert trigger
when a specific string appears in the log, edit the alert and replace
the string “match this text” with the string that you
want to use.
|
SpringSource tc Runtime
6.0
|
Deadlocks Detected
|
Triggered if a deadlock
is detected. A deadlock occurs when multiple actions are each waiting for
another to complete, so none of the actions ever finishes.
|
Excessive Time Spent
in Garbage Collection
|
Triggered if the amount
of time spent in garbage collection exceeds 40% of the total process
time.
|
|
SpringSource tc Runtime
7.0
|
Deadlocks Detected
|
Triggered if a deadlock
is detected. A deadlock occurs when multiple actions are each waiting for
another to complete, so none of the actions ever finishes.
|
Excessive Time Spent
in Garbage Collection
|
Triggered if the amount
of time spent in garbage collection exceeds 40% of the total process
time.
|
|
Webapp CPU Time in Garbage
Collection >30%
|
Triggered if the amount
of time spent in garbage collection exceeds 30% of the total process
time.
|
|
Webapp Heap Free Memory
< 5% of Max
|
Triggered if the free
JVM heap memory falls below 5% of the total memory. It is recommended
that you have a minimum of 400 MB of free heap space (calculated after
garbage collection).
|
|
vFabric Web Server 5.2
|
Web Server: Apache Idle
Workers <20
|
Triggered if the number
of available idle workers falls below 20% of the maximum number of
workers. If no idle workers are available, a service interrupt occurs.
|
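The pg: Buffer Hits alert above depends on the share of block read requests that are served from the buffer cache. As a rough illustration, the sketch below computes that percentage from two counters. It assumes that the counters correspond to the `blks_hit` and `blks_read` columns of the PostgreSQL `pg_stat_database` view and that the total number of requests is their sum. How the product itself derives the metric is not stated here, so treat the function names and the zero-request behavior as hypothetical.

```python
def buffer_hit_pct(blks_hit: int, blks_read: int) -> float:
    """Percentage of block read requests served from the buffer cache."""
    total = blks_hit + blks_read
    if total == 0:
        # Assumption: with no requests at all, report 100% (no cache misses).
        return 100.0
    return 100.0 * blks_hit / total


def buffer_hits_alert(blks_hit: int, blks_read: int, threshold_pct: float = 50.0) -> bool:
    """Return True when the hit percentage is below the threshold."""
    return buffer_hit_pct(blks_hit, blks_read) < threshold_pct


# Hypothetical counters: 300 hits and 700 reads from disk give a 30% hit rate.
print(buffer_hit_pct(300, 700), buffer_hits_alert(300, 700))
```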
Resource
|
Alert name
|
Description
|
---|---|---|
FileServer Mount
|
File Mount Use Pct
|
Triggered if the percentage
of space used on the file mount exceeds 95%.
|
HTTP
|
HTTP Response Server
Error Code => 500
|
Triggered if the response
code for an HTTP service ping is 500 or greater, indicating a server error.
Possible response codes include:
500: Internal Server Error
501: Not Implemented
502: Bad Gateway
503: Service Unavailable
504: Gateway Timeout
|
Network Server Interface
|
NetIF Rcv Dropped
|
Triggered if the number
of dropped network receive packets exceeds 20% of the baseline value
of total network receive packets. This alert requires that the metrics
be monitored long enough to establish a baseline value of network
packets for your system.
|
NetIF Rcv Errors
|
Triggered if the number
of network receive errors exceeds 20% of the baseline value of total
network attempts. This alert requires that the metrics be monitored
long enough to establish a baseline value of network traffic for your
system.
|
|
NetIF Tx Collisions
|
Triggered if the number
of network interface transmit collisions exceeds 20% of the baseline
value of total network attempts. This alert requires that the metrics
be monitored long enough to establish a baseline value of network
traffic for your system.
|
|
HQ Agent
|
HQ Agent ERROR message
in log
|
Triggered if an error
message appears in the HQ agent log.
|
HQ Agent Memory
|
Triggered if the JVM
free memory for the HQ agent falls below 14.3 MB.
|
|
HQ Time Agent Spends
Fetching Metrics
|
Triggered if the time
that the HQ agent spends collecting metric data exceeds five seconds
per minute. This alert might indicate an overloaded agent or a problem
with the scheduling thread. These problems might be present whenever
this metric exceeds 3 or 4 seconds per minute.
|
|
PostgreSQL 9.x
|
PostgreSQL 9.x - Availability
|
Triggered if the availability
of PostgreSQL falls below 100%.
|
pg: Buffer Hits % <50%
of Max
|
Triggered if the number
of buffer hits is less than 50% of the total block read requests.
(A buffer hit is a block read request that is avoided because the
block is in the buffer cache.) This alert might indicate that more
system memory is needed or that you need to adjust the shared buffers.
|
|
pg: Commits per Second
>20
|
Triggered if the number
of commits to PostgreSQL is greater than 20 per second. This alert
indicates that you might need to provide a durable write cache to
prevent potential data loss.
|
|
pg: Connection Usage
>80% of Max
|
Triggered if the number
of connections used is greater than 80% of the maximum number allowed.
This alert indicates that you might need to increase the maximum number
of available connections in order to prevent denial of service.
|
|
pg: Memory Size changed
|
Triggered if the memory
used by PostgreSQL falls below 90% of the baseline value. If this
condition is met, the alert is triggered once every 12 hours.
|
|
SAS Environment Manager
Data Mart 9.4 ACM ETL Processing
|
Data Mart ACM ETL
|
Triggered if the availability
of the ACM ETL process falls below 100%.
|
SAS Environment Manager
Data Mart 9.4 APM ETL Processing
|
Data Mart APM ETL
|
Triggered if the availability
of the APM ETL process falls below 100%.
|
SAS Environment Manager
Data Mart 9.4 Kits ETL Processing
|
Data Mart Kits ETL
|
Triggered if the availability
of the kits ETL process falls below 100%.
|
SAS Home Directory 9.4
SAS Directory
|
SASWork Disk Use % >
70
|
Triggered if the volume
that contains the SASWork directory is more than 70% full.
|
SASWork Disk Use % >
95
|
Triggered if the volume
that contains the SASWork directory is more than 95% full.
|
|
SAS Object Spawner 9.4
SAS Logical Pooled Workspace Server
|
Logical Pooled Workspace
Server Timed Out Clients
|
Triggered if there are
any failed connections between the logical pooled workspace server
and applications that are trying to connect to the server.
|
Logical Pooled Workspace
Server Unauthorized Accesses
|
Triggered if there are
any unauthorized accesses to the logical pooled workspace server.
|
|
SAS Object Spawner 9.4
SAS Logical Stored Process Server
|
Logical Stored Process
Server Timed Out Clients
|
Triggered if there are
any failed connections between the logical stored process server and
applications that are trying to connect to the server.
|
Logical Stored Process
Server Unauthorized Accesses
|
Triggered if there are
any unauthorized accesses to the logical stored process server.
|
|
SAS Object Spawner 9.4
SAS Logical Workspace Server
|
Logical Workspace Server
Unauthorized Accesses
|
Triggered if there are
any unauthorized accesses to the logical workspace server.
|
SAS Object Spawner 9.4
SAS Pooled Workspace Server
|
Pooled Workspace Server
ERROR message in log
|
Triggered if an error
message appears in the pooled workspace server log.
|
SAS Object Spawner 9.4
SAS Stored Process Server
|
Stored Process Server
ERROR message in log
|
Triggered if an error
message appears in the stored process server log.
|
SAS Object Spawner 9.4
SAS Workspace Server
|
Workspace Server ERROR
message in log
|
Triggered if an error
message appears in the workspace server log.
|
Spring Insight Application
|
Application error rate
is high
|
Triggered if the application
error rate for the past five minutes exceeds 10%.
|
SpringSource tc Runtime
6.0 Thread Diagnostics Context
|
Slow or Failed Request
|
Triggered if a record
is written to the log for the service. This alert indicates that a
request is taking too long or has failed.
|
SpringSource tc Runtime
6.0 Thread Diagnostics Engine
|
Slow or Failed Request
|
Triggered if a request
is taking too long or has failed, which is indicated by an entry appearing
in the service’s log.
|
SpringSource tc Runtime
6.0 Thread Diagnostics Host
|
Slow or Failed Request
|
Triggered if a record
is written to the log for the service. This alert indicates that a
request is taking too long or has failed.
|
SpringSource tc Runtime
6.0 Tomcat JDBC Connection Pool Context
|
JDBC Connection Abandoned
|
Triggered if a JDBC
connection was abandoned, identified by a “CONNECTION ABANDONED”
entry in the log.
|
JDBC Connection Failed
|
Triggered if a JDBC
connection failed, identified by a “CONNECTION FAILED”
entry in the log.
|
|
JDBC Query Failed
|
Triggered if a JDBC
query failed, identified by a “FAILED QUERY” entry in
the log.
|
|
Slow JDBC Query
|
Triggered if some JDBC
queries are taking a long time to execute, identified by a “SLOW
QUERY” entry in the log.
|
|
SpringSource tc Runtime
6.0 Tomcat JDBC Connection Pool Global
|
JDBC Connection Abandoned
|
Triggered if a JDBC
connection was abandoned, identified by a “CONNECTION ABANDONED”
entry in the log.
|
JDBC Connection Failed
|
Triggered if a JDBC
connection failed, identified by a “CONNECTION FAILED”
entry in the log.
|
|
JDBC Query Failed
|
Triggered if a JDBC
query failed, identified by a “FAILED QUERY” entry in
the log.
|
|
Slow JDBC Query
|
Triggered if some JDBC
queries are taking a long time to execute, identified by a “SLOW
QUERY” entry in the log.
|
|
SpringSource tc Runtime
7.0 Executor
|
Webapp Active Thread
Count >250
|
Triggered if the number
of active threads exceeds 250, which indicates heavy use. You can
add additional servers to provide load balancing.
The maximum number of
threads allowed is 300, and the minimum is 50. If the number of active
threads exceeds 300, the thread queue resets to 100, and then additional
threads are refused.
|
SpringSource tc Runtime
7.0 Manager
|
Webapp Manager Rejected
Sessions
|
Triggered if the number
of rejected sessions exceeds 10% of the baseline number of sessions.
|
SpringSource tc Runtime
7.0 Thread Diagnostics Context
|
Slow or Failed Request
|
Triggered if a record
is written to the log for the service. This alert indicates that a
request is taking too long or has failed.
|
SpringSource tc Runtime
7.0 Thread Diagnostics Engine
|
Slow or Failed Request
|
Triggered if a record
is written to the log for the service. This alert indicates that a
request is taking too long or has failed.
|
SpringSource tc Runtime
7.0 Thread Diagnostics Host
|
Slow or Failed Request
|
Triggered if a record
is written to the log for the service. This alert indicates that a
request is taking too long or has failed.
|
SpringSource tc Runtime
7.0 Tomcat JDBC Connection Pool Context
|
JDBC Connection Abandoned
|
Triggered if a JDBC
connection was abandoned, identified by a “CONNECTION ABANDONED”
entry in the log.
|
JDBC Connection Failed
|
Triggered if a JDBC
connection failed, identified by a “CONNECTION FAILED”
entry in the log.
|
|
JDBC Query Failed
|
Triggered if a JDBC
query failed, identified by a “FAILED QUERY” entry in
the log.
|
|
Slow JDBC Query
|
Triggered if some JDBC
queries are taking a long time to execute, identified by a “SLOW
QUERY” entry in the log.
|
|
SpringSource tc Runtime
7.0 Tomcat JDBC Connection Pool Global
|
JDBC Connection Abandoned
|
Triggered if a JDBC
connection was abandoned, identified by a “CONNECTION ABANDONED”
entry in the log.
|
JDBC Connection Failed
|
Triggered if a JDBC
connection failed, identified by a “CONNECTION FAILED”
entry in the log.
|
|
JDBC Query Failed
|
Triggered if a JDBC
query failed, identified by a “FAILED QUERY” entry in
the log.
|
|
Slow JDBC Query
|
Triggered if some JDBC
queries are taking a long time to execute, identified by a “SLOW
QUERY” entry in the log.
|
|
Application health is
degrading
|
Triggered if the application
health metric (measured over the past five minutes) falls below 85%.
|
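Many of the service alerts above are triggered when a specific string appears in a log, such as the four JDBC connection pool alerts. The sketch below shows one way such string matching could work. The marker strings are the ones listed in the table. The function name, the plain substring match, and the sample log lines are hypothetical and are not taken from the product.

```python
# Alert names mapped to the log strings given in the table above.
JDBC_MARKERS = {
    "JDBC Connection Abandoned": "CONNECTION ABANDONED",
    "JDBC Connection Failed": "CONNECTION FAILED",
    "JDBC Query Failed": "FAILED QUERY",
    "Slow JDBC Query": "SLOW QUERY",
}


def triggered_alerts(log_lines):
    """Return the names of alerts whose marker string occurs in any log line."""
    lines = list(log_lines)  # allow generators to be scanned once per marker
    return [
        name
        for name, marker in JDBC_MARKERS.items()
        if any(marker in line for line in lines)
    ]


# Hypothetical log lines, invented for this example.
sample = [
    "WARN pool-1 SLOW QUERY took 4200 ms",
    "INFO pool-1 connection returned",
]
print(triggered_alerts(sample))
```

A substring match keeps the sketch simple. A real log tracker might also filter by log level or use regular expressions.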