alarm_enrichment probe becomes unresponsive

Document ID : KB000004215
Last Modified Date : 10/08/2018
Issue:

New alarms are not generated even though both the nas probe and the alarm_enrichment probe are running.


In the hub probe GUI, on the [Status] tab under [Subscribers/Queues], the number of messages with [Queued] status in the "alarm_enrichment" queue keeps increasing; the queue is either not draining or not draining fast enough.

Environment:
UIM 8.x
Cause:

In an environment with heavy alarm traffic, the alarm_enrichment probe becomes unresponsive to incoming alarms.

Resolution:

The problem is seen when the alarm_enrichment probe is close to its maximum allocated RAM while a large volume of alarms keeps arriving. In this busy state, the alarm_enrichment probe struggles to allocate the internal resources it needs to continue processing.

Please add the following keys in the main <setup> section of nas.cfg.

lower_memory_usage_threshold_percentage=0.90
upper_memory_usage_threshold_percentage=0.90
memory_usage_exceeded_threshold=1


This causes the alarm_enrichment probe to detect when it is using 90% of its allocated memory and then restart itself.
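
As a rough illustration, the three keys might appear in nas.cfg as shown below. This is only a sketch: it assumes the usual layout in which the section is delimited by <setup> and </setup>, and it omits the keys that already exist in that section, which should be left unchanged. Deactivating the nas probe before editing the file by hand and activating it again afterwards is a common precaution, not something this article prescribes.

```
<setup>
   lower_memory_usage_threshold_percentage = 0.90
   upper_memory_usage_threshold_percentage = 0.90
   memory_usage_exceeded_threshold = 1
</setup>
```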

Any alarms that are queued but have not yet been processed by alarm_enrichment are held in the queue and processed after the restart, so no alarms or enrichment are lost.

The restart causes the probe to clear its memory and re-subscribe to the queue, which seems to resolve the issue quite reliably and allows alarm processing to continue after a momentary delay (10 seconds or so) caused by the restart.

If the key values recommended above do not resolve the issue, use the following values instead:

lower_memory_usage_threshold_percentage = 0.80
upper_memory_usage_threshold_percentage = 0.80
memory_usage_exceeded_threshold = 3