The vmware probe is monitoring 5000+ VMs. The applied monitoring template monitors 4 metrics and generates Time over Threshold (ToT) alarms for them. The alarm_enrichment probe stops publishing these alarms after the first or second monitoring interval, even though the QoS metric has stayed over the configured threshold for the configured ToT duration.
When VMs are added or deleted, the vmware probe publishes ToT Configure Rule messages for all monitored VMs. In response, the alarm_enrichment probe was purging the ToT tracking information for all QoS metrics being monitored. Without that tracking information, the probe could not generate the ToT alarms. It was also discovered that the baseline_engine probe was not handling Out of Memory exceptions, so they went uncaught and failed silently.
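The effect of purging the tracking state can be illustrated with a small sketch. This is not the probe's actual code: the `Rule` and `ToTTracker` classes, the `purge_on_configure` flag, the metric name, and the threshold and interval values are all hypothetical. The simulation also assumes, as a deliberate simplification, that a Configure Rule message arrives before every sample. It only shows why resetting the breach start time on each message keeps the ToT window from ever being reached.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical ToT rule: value must stay over threshold for tot_seconds."""
    threshold: float
    tot_seconds: int


@dataclass
class ToTTracker:
    """Hypothetical tracker; not the alarm_enrichment implementation."""
    purge_on_configure: bool
    rules: Dict[str, Rule] = field(default_factory=dict)
    breach_start: Dict[str, int] = field(default_factory=dict)

    def configure_rule(self, metric_id: str, rule: Rule) -> None:
        unchanged = self.rules.get(metric_id) == rule
        self.rules[metric_id] = rule
        # Purging variant: drop tracking state on every configure message.
        # Preserving variant: drop it only when the rule actually changed.
        if self.purge_on_configure or not unchanged:
            self.breach_start.pop(metric_id, None)

    def on_sample(self, metric_id: str, value: float, now: int) -> bool:
        """Return True when a ToT alarm should be raised for this sample."""
        rule = self.rules[metric_id]
        if value <= rule.threshold:
            # Back under the threshold: the breach window restarts.
            self.breach_start.pop(metric_id, None)
            return False
        start = self.breach_start.setdefault(metric_id, now)
        return now - start >= rule.tot_seconds


def simulate(purge_on_configure: bool) -> List[bool]:
    """Five 300-second intervals with the metric constantly over threshold."""
    tracker = ToTTracker(purge_on_configure)
    rule = Rule(threshold=90.0, tot_seconds=600)
    alarms = []
    for now in range(0, 1500, 300):
        # Assumption: a configure message is re-sent before each sample.
        tracker.configure_rule("vm-1.cpu", rule)
        alarms.append(tracker.on_sample("vm-1.cpu", 95.0, now))
    return alarms


print(simulate(purge_on_configure=True))
print(simulate(purge_on_configure=False))
```

In the purging variant the breach start time is reset before every sample, so the elapsed time never reaches the ToT window and no alarm is raised. In the preserving variant the alarm is raised once the metric has been over the threshold for the full 600 seconds.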