We recently experienced a UIM outage. It showed up as existing alarms in the Infrastructure Management alarm console no longer being updated (i.e., the Time Received value did not change when polling cycles were expected to occur), and no new alarms were created.
The options for addressing a scenario where alarms are no longer being generated due to an issue within the UIM environment include:
1. Deploy a second instance of UIM to monitor the first one, using the dirscan probe to check whether the nas/AE queue files are growing or changing, or
2. Use a subscription service such as Runscope or AWS Lambda to monitor UIM health externally, or
3. Follow the attached KB Article titled 'Best Practices for Monitoring CA UIM (self-health monitoring)' and use logmon to monitor the nas and AE log files for fatal errors and for the string 'max restarts' (at loglevel 1).
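
To illustrate the kind of matching option 3 relies on, here is a minimal Python sketch that flags log lines containing "fatal" or "max restarts". It is not a logmon profile and not the KB article's configuration; the function name find_alert_lines and the sample log lines are hypothetical, made up only to show the idea, and real nas/AE log formats may differ.

```python
import re

# Strings taken from option 3 above: fatal errors and "max restarts".
# Case-insensitive matching is an assumption, not something the KB article states.
ALERT_PATTERN = re.compile(r"fatal|max restarts", re.IGNORECASE)


def find_alert_lines(lines):
    """Return (line_number, text) pairs for every line that matches ALERT_PATTERN."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ALERT_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical sample lines; real log output will look different.
sample = [
    "Jan 01 10:00:00 nas: processing alarm queue",
    "Jan 01 10:00:05 Controller: Max restarts reached for probe 'nas'",
    "Jan 01 10:00:06 nas: FATAL error opening database",
]

for number, text in find_alert_lines(sample):
    print(f"line {number}: {text}")
```

In practice the same two patterns would be entered as watcher rules in logmon, and the check would need to run from somewhere that still works when the primary hub's alarm pipeline is down.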