CPU and Memory Utilization alarms set and clear within seconds in CA Spectrum

Document ID : KB000021370
Last Modified Date : 14/02/2018

Issue:

When you monitor CPU or memory utilization in CA Spectrum, CPU and/or memory violation alarms may be generated and cleared in the same second, or cleared only a few seconds after they are generated.

Cause:

The device poll interval and the Duration value are set to exactly the same interval.

Resolution:

Increase the Duration value so that it is greater than the device polling interval by at least 5-10 seconds. This gives CA Spectrum enough time to poll the device and process the result before the timer expires and the alarm is generated.
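The rule above can be written as a simple check. This is a minimal sketch with hypothetical helper names; in CA Spectrum the poll interval and Duration (0x12bce) are model attributes that you set in the product, not through code like this. The 300-second values come from the example in this article.

```python
MIN_MARGIN_S = 5  # lower end of the recommended 5-10 second margin


def duration_is_safe(poll_interval_s: int, duration_s: int, margin_s: int = MIN_MARGIN_S) -> bool:
    """Hypothetical check: True if Duration exceeds the poll interval by at least margin_s."""
    return duration_s - poll_interval_s >= margin_s


def suggested_duration(poll_interval_s: int, margin_s: int = 10) -> int:
    """Hypothetical helper: poll interval plus a margin (default: upper end of 5-10 s)."""
    return poll_interval_s + margin_s


print(duration_is_safe(300, 300))  # False: Duration equals the poll interval
print(suggested_duration(300))     # 310
```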

Additional Info:

When CA Spectrum polls a host and finds that CPU utilization is above the threshold, it generates event 0x10f07 and starts a timer whose length is set by the Duration attribute (0x12bce). If the normal (reset) event 0x10f08 is not received before the timer expires, the alarm is generated. Because the Duration equals the poll interval in this case, the timer expires at the same moment the next poll starts, so the alarm is generated just as that poll is initiated. The poll then finds that CPU utilization is back below the threshold (normal) and generates the clear event (0x10f08) within seconds of the alarm, clearing it.

The EventDisp entries define the process:

0x00010f07 R Aprisma.EventPairTimeAttr, 0x00010f08, "0x00010f09 -:-", 0x12bce

  • 0x10f07 is generated when the CPU threshold is crossed. If the reset event, 0x10f08, is not received within the time specified in the Duration attribute (0x12bce), the alarm event 0x10f09 is generated.

0x00010f09 E 50 A 2,0x00010f09,N

  • 0x10f09 is logged and raises an alarm with severity 2 (Major) and probable cause 0x10f09.

0x00010f08 E 50 C 0x00010f09

  • 0x10f08 is logged and clears any alarm with probable cause 0x10f09.
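The interaction between the polls and the event pair timer can be replayed with a small simulation. This is a hypothetical model of the 0x10f07/0x10f08/0x10f09 rules, not CA Spectrum code. It assumes a single threshold for both crossing and reset, that each poll result is processed at the instant the poll starts, and that when the timer expires in the same second as a poll result, the timer is processed first (which is what the event log in this article shows).

```python
# Event codes from the EventDisp entries above.
ABOVE, NORMAL, ALARM = 0x10F07, 0x10F08, 0x10F09


def simulate(poll_interval, duration, cpu_samples, threshold=85):
    """Replay one CPU sample per poll; return (seconds, event code) pairs.

    Hypothetical model: one threshold, results processed at poll start,
    timer expiry processed before a poll result in the same second.
    """
    events = []
    deadline = None  # expiry time of the pending Duration (0x12bce) timer
    above = False    # whether the last processed poll was above threshold
    for i, cpu in enumerate(cpu_samples):
        now = i * poll_interval
        # A timer that has already expired fires before this poll's result.
        if deadline is not None and deadline <= now:
            events.append((deadline, ALARM))
            deadline = None
        if cpu > threshold and not above:
            above = True
            events.append((now, ABOVE))
            deadline = now + duration  # start the Duration timer
        elif cpu <= threshold and above:
            above = False
            events.append((now, NORMAL))  # cancels the timer or clears the alarm
            deadline = None
    if deadline is not None:  # no further polls: the pending timer expires
        events.append((deadline, ALARM))
    return events


# Duration == poll interval: alarm raised and cleared in the same second.
print([(t, hex(c)) for t, c in simulate(300, 300, [89, 60])])
# [(0, '0x10f07'), (300, '0x10f09'), (300, '0x10f08')]

# Duration 10 seconds longer: the reset arrives first, so no alarm.
print([(t, hex(c)) for t, c in simulate(300, 310, [89, 60])])
# [(0, '0x10f07'), (300, '0x10f08')]
```

Sustained high CPU still raises the alarm with the longer Duration: `simulate(300, 310, [89, 89, 60])` produces the alarm at 310 seconds and the clear at 600 seconds.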

In this example the Duration attribute, 0x12bce, is set to 300 seconds, the same as the device poll interval.

The event sequence below shows this. The event indicating that CPU utilization is above the threshold (0x10f07) is generated at 6:58:29. At 7:03:29, exactly 5 minutes later, the alarm event (0x10f09) is generated. In the same second, after the 0x10f09 event has been processed, the reset event (0x10f08) is generated and clears the alarm.

  1. Event Time: Sep 19, 2011 6:58:29 AM PDT
    Model Name: host01.ca.com
    Event Message: High Aggregate CPU Utilization.
    The average CPU Utilization of 89% for all CPU instances exceeds the 85% threshold on model host01.ca.com
    Event Type: 0x10f07

  2. Severity: Major
    Event Time: Sep 19, 2011 7:03:29 AM PDT
    Clear Time: Sep 19, 2011 7:03:29 AM PDT
    Model Name: host01.ca.com
    Event Message: High Aggregate CPU Utilization.
    The average CPU Utilization of 89% for all CPU instances has exceeded the 85% threshold on model host01.ca.com for more than the acceptable time period.
    Event Type: 0x10f09

  3. Event Time: Sep 19, 2011 7:03:29 AM PDT
    Model Name: host01.ca.com
    Event Message: Normal Aggregate CPU Utilization.
    The average CPU Utilization for all CPU instances is now below the % reset threshold for model host01.ca.com
    Event Type: 0x10f08
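The timing in the log above can be checked with a short calculation. This sketch assumes the poll that detected the high CPU ran at 6:58:29 and that the poll interval is 300 seconds (equal to the Duration, as stated in the Cause); the time zone is omitted because every timestamp is PDT.

```python
from datetime import datetime, timedelta

above_threshold = datetime(2011, 9, 19, 6, 58, 29)  # 0x10f07 generated
duration = timedelta(seconds=300)                   # Duration attribute 0x12bce
poll_interval = timedelta(seconds=300)              # assumed equal to Duration

alarm_time = above_threshold + duration             # when 0x10f09 is generated
next_poll = above_threshold + poll_interval         # when the reset poll runs

print(alarm_time.time(), alarm_time == next_poll)   # 07:03:29 True
```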