Tuning the dashboard engine to minimize alarm synchronization issues

Document ID : KB000033579
Last Modified Date : 14/02/2018
First and foremost, the AlarmConsole portlet in UMP is deprecated as of the release of UMP 7.5. One major reason for the deprecation is that the portlet is known to be unreliable and suffers from synchronization issues. Synchronization between AlarmConsole and IM/NAS should never be expected to be perfect, and customers will generally have a better experience using the Alarm Views in USM.

The following information may be useful for minimizing this problem but is not likely to eliminate it completely.

Alarm processing and display in the AlarmConsole portlet is the responsibility of the dashboard_engine probe. The dashboard_engine contains a component called the Alarm Processor, which can be tuned to some extent for better performance.

The alarm processor is a thread that runs in the background of dashboard_engine and does the following:
-pulls alarms off the message bus in batches and puts them into an internal queue
-processes these batches of alarms from the internal queue to determine ACL/assignment/status/etc information for display in the UMP
-pushes the processed messages out for display
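
The three steps above can be modeled with a short sketch. This is an illustrative Python model only, not the dashboard_engine implementation: the function name, its parameters, and the "rejected" counter are hypothetical, and what the real probe does with alarms that arrive while the queue is full is not described in this article.

```python
import queue
import time


def run_alarm_processor(bus_batches, batch_size, batch_delay_ms, queue_capacity):
    """Toy model of the alarm processor loop (hypothetical, for illustration).

    bus_batches: iterable of lists, each list standing in for one batch of
    alarms pulled off the message bus.
    Returns (displayed, rejected, backlog).
    """
    internal = queue.Queue(maxsize=queue_capacity)
    displayed = []
    rejected = 0  # alarms that arrived while the internal queue was full

    for incoming in bus_batches:
        # Step 1: pull a batch off the bus and put it into the internal queue.
        for alarm in incoming:
            try:
                internal.put_nowait(alarm)
            except queue.Full:
                rejected += 1

        # Step 2: take at most batch_size alarms and "process" them.
        batch = []
        while len(batch) < batch_size and not internal.empty():
            batch.append(internal.get_nowait())
        processed = [{"alarm": a, "status": "processed"} for a in batch]

        # Step 3: push the processed alarms out for display.
        displayed.extend(processed)

        # Sleep between batches, as alarm_processor_batch_delay would.
        time.sleep(batch_delay_ms / 1000.0)

    return displayed, rejected, internal.qsize()
```

With two incoming batches of 3 and 2 alarms and a batch size of 2, this model displays 4 alarms and leaves 1 waiting in the queue, which shows how a backlog builds when alarms arrive faster than they are processed.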

The following configuration keys (in the <data> section of dashboard_engine.cfg) are relevant to this process:
alarm_processor_batch_delay controls how long the thread 'sleeps' in between batches of alarms, in milliseconds. If this is set too low it can sometimes overwhelm the CPU/memory on the system, but if it is set too high alarms can back up and take longer to appear in the UMP. In general it is better to set it fairly low, but if alarms are coming in faster than the batch processing can handle them, you need to go a little higher. Start with a value of 100 and go from there.
alarm_processor_batch_size controls the maximum number of alarms pulled off the internal queue in a single batch. When this is set to 0, the batch size is determined dynamically; sometimes it is better to set this to 500 or 1000 for better performance. In theory you can't set this too high. In practice you may be limited by CPU/memory on the machine, but in most cases you could set it as high as 10000 with no ill effects. Setting it lower than 100 is probably not a good idea.
alarm_processor_queue_capacity controls how many alarms can be stored in the internal queue at once while they wait for processing. This limit basically helps prevent OutOfMemory exceptions. It usually doesn't need to be adjusted, but sometimes setting it higher will help alarms process more quickly or reliably. A rule of thumb is that every alarm message takes about 4 KB of memory, so if you set this to 100000 you will use about 400,000 KB (400 MB) of memory to hold this queue. If you set this too high it will exceed the maximum heap size and generate OutOfMemory exceptions, so we don't usually recommend going over 100000 here. If it is set too low you will see log messages (at log level 2) indicating that the Alarm Queue is Full.
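
The memory rule of thumb for alarm_processor_queue_capacity can be checked with a few lines of Python. The 4 KB per alarm figure comes from this article; the function names are hypothetical, and the conversion uses 1 MB = 1000 KB to match the article's own rounding.

```python
KB_PER_ALARM = 4  # rule-of-thumb size of one alarm message, from this article


def queue_memory_kb(queue_capacity, kb_per_alarm=KB_PER_ALARM):
    """Estimated memory (KB) used by a completely full internal queue."""
    return queue_capacity * kb_per_alarm


def queue_memory_mb(queue_capacity, kb_per_alarm=KB_PER_ALARM):
    """Same estimate in MB, using 1 MB = 1000 KB as the article does."""
    return queue_memory_kb(queue_capacity, kb_per_alarm) / 1000


if __name__ == "__main__":
    for capacity in (10000, 50000, 100000):
        print(capacity, "alarms ->", queue_memory_mb(capacity), "MB")
```

The estimate only covers the queue itself. The heap also has to hold everything else dashboard_engine needs, so the result should be compared against the maximum heap size with some margin.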

The trick here is to keep this queue from becoming full by making sure the alarm processor is processing alarms fast enough, usually by setting the batch size higher and the delay lower (without going too high or too low).
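
As a rough way to reason about "fast enough", the sketch below estimates an upper bound on how many alarms per second a given batch size and delay could drain. This is a simplified model and an assumption, not a documented formula: it treats the processing time of each batch as zero, so real throughput will be lower, and it does not cover a batch size of 0 (dynamic sizing). The function names are hypothetical.

```python
def max_drain_rate(batch_size, batch_delay_ms):
    """Upper bound on alarms processed per second.

    Assumes one batch of at most batch_size alarms per sleep interval and
    ignores the time spent processing each batch (assumption).
    """
    if batch_size <= 0 or batch_delay_ms <= 0:
        raise ValueError("batch_size and batch_delay_ms must be positive")
    return batch_size * 1000.0 / batch_delay_ms


def can_keep_up(incoming_per_second, batch_size, batch_delay_ms):
    """True if the upper-bound drain rate covers the incoming alarm rate."""
    return max_drain_rate(batch_size, batch_delay_ms) >= incoming_per_second


if __name__ == "__main__":
    # Batch size 500 with the suggested starting delay of 100 ms.
    print(max_drain_rate(500, 100))
    print(can_keep_up(6000, 500, 100))
    print(can_keep_up(6000, 1000, 100))
```

If the incoming alarm rate is above this bound, the internal queue will eventually fill regardless of its capacity, which is when raising the batch size or lowering the delay is worth trying.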