After applying 10.7 SP1 my Alerts and Management Modules are not loading

Document ID : KB000100914
Last Modified Date : 15/06/2018
Issue:
In APM 10.7 SP1, a limit was introduced on the "Periods Over Threshold" and "Observed Periods" fields of alert definitions in Management Modules. Previously these fields accepted very large values (for example, 999999999), which could lead to out-of-memory exceptions (java.lang.OutOfMemoryError: Java heap space). As a side effect of the new limit, existing Alerts defined as "at least x out of y" no longer load when y > 20 (20 periods is 5 minutes). This is due to the following change introduced in DE353429: Introducing a limit for alert periods

We introduced a new property called "introscope.enterprisemanager.alerts.maxPeriods" in IntroscopeEnterpriseManager.properties. Any Alert Danger/Caution period larger than maxPeriods will be considered invalid. 

The following message is seen in the log.

[WARN] [main] [Manager.Bootstrap] Error in loading Management Module "NOC TI"
com.wily.introscope.spec.server.descriptor.ESEDescriptorException: The Dashboard "MyDashboard" references an unknown Dashboard.
 
This means that one of the referenced Dashboards did not load properly, as the following log messages show.

[ERROR] [main] [Manager.ManagementModule] Unable to load the Alert named "MyAlert" in Management Module "MyManagementModule" - 'The Alert "MyAlert" cannot have danger periods of at least 4 out of 40 or caution periods of at least 4 out of 40.'. 
[WARN] [main] [Manager.Bootstrap] Error in linking Management Module "MyManagementModule"

 
Environment:
APM 10.7 SP1
Cause:
DE353429: Introducing a limit for alert periods
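We've introduced a new property called "introscope.enterprisemanager.alerts.maxPeriods". Any Alert Danger/Caution period larger than maxPeriods will be considered invalid. The default value is 20. In the log example above, the Alert is defined as "at least 4 out of 40"; 40 exceeds the default of 20, so the Alert fails to load.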
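The rule can be illustrated with a small shell sketch. This is not the Enterprise Manager's code, only a model of the behavior described above; the function names periods_allowed and describe_alert are hypothetical, and the sketch assumes both period values are compared against the same limit.

```shell
#!/bin/sh
# Illustration only: models the maxPeriods rule described in this article.
# It is NOT the Enterprise Manager's implementation.

# Default limit stated in this article.
MAX_PERIODS_DEFAULT=20

# periods_allowed OVER_THRESHOLD OBSERVED [MAX]
# Succeeds only if both period values are within the limit.
periods_allowed() {
    over=$1
    observed=$2
    max=${3:-$MAX_PERIODS_DEFAULT}
    [ "$over" -le "$max" ] && [ "$observed" -le "$max" ]
}

# describe_alert OVER_THRESHOLD OBSERVED [MAX]
# Prints whether an "at least x out of y" definition would pass the check.
describe_alert() {
    limit=${3:-$MAX_PERIODS_DEFAULT}
    if periods_allowed "$1" "$2" "$limit"; then
        echo "at least $1 out of $2: allowed (maxPeriods=$limit)"
    else
        echo "at least $1 out of $2: rejected (maxPeriods=$limit)"
    fi
}
```

With the default limit, describe_alert 4 40 reports the definition as rejected; passing a third argument of 200 reports it as allowed.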
Resolution:
The failure cascades. When one Alert fails to load, every element that references that Alert also fails to load, and so does every element that references those. As a result, other Alerts, Dashboards, and entire Management Modules can be affected.
 
The workaround is to set maxPeriods, after applying SP1 and before starting the EM, to a value large enough to cover all existing Alerts. Consider setting it to 200 or 2000.

You must add the property introscope.enterprisemanager.alerts.maxPeriods to the IntroscopeEnterpriseManager.properties file.
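As a minimal sketch, the property could be added (or updated, if already present) with a small shell helper. The function name set_max_periods is hypothetical, and the value 200 in the usage note is simply one of the values suggested above; choose a value that covers your own Alerts.

```shell
#!/bin/sh
# Sketch: add or update introscope.enterprisemanager.alerts.maxPeriods
# in a properties file. Run it while the EM is stopped.

# set_max_periods FILE VALUE
set_max_periods() {
    file=$1
    value=$2
    key="introscope.enterprisemanager.alerts.maxPeriods"

    if grep -q "^${key}=" "$file"; then
        # Property already present: rewrite its value in place.
        tmp="${file}.tmp.$$"
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && mv "$tmp" "$file"
    else
        # Make sure the file ends with a newline before appending.
        if [ -n "$(tail -c 1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

For example, assuming the properties file lives under the config directory of your EM installation (adjust the path to your environment): set_max_periods "$EM_HOME/config/IntroscopeEnterpriseManager.properties" 200. Back up the file before editing it.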
 
Additional Information:
CA Services wrote a script that determines the value needed for maxPeriods before you upgrade to SP1.

You can download the script (check_mm.sh) from here:  https://github.com/CA-APM/ca-apm-automation
  1. Set EM_PATH in environment.properties
  2. Make sure JAVA_HOME is set and “jar” is available
  3. Run ./check_mm.sh.
  4. The script prints what it is doing:
    1. checks all Management Modules in EM_PATH,
    2. prints the maximum period duration encountered, and
    3. writes all alerts that exceed introscope.enterprisemanager.alerts.maxPeriods (or 20, if the property is not set) to checkmm.csv.
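
Steps 1 and 2 can be verified before running the script. The following is a sketch only: the function name check_mm_prereqs is hypothetical, and it assumes that environment.properties sits in the current directory and sets EM_PATH on a line of the form EM_PATH=..., which may differ from the actual file layout in the repository.

```shell
#!/bin/sh
# Sketch: verify the prerequisites listed above before running check_mm.sh.

check_mm_prereqs() {
    # Step 2: JAVA_HOME must be set.
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    # Step 2: "jar" must be available, on PATH or under JAVA_HOME/bin.
    if ! command -v jar >/dev/null 2>&1 && [ ! -x "$JAVA_HOME/bin/jar" ]; then
        echo "jar was not found on PATH or in JAVA_HOME/bin" >&2
        return 1
    fi
    # Step 1: EM_PATH must be set in environment.properties
    # (assumed format: a line starting with EM_PATH=).
    if ! grep -q '^EM_PATH=' environment.properties 2>/dev/null; then
        echo "EM_PATH is not set in environment.properties" >&2
        return 1
    fi
    return 0
}
```

A typical invocation would then be: check_mm_prereqs && ./check_mm.sh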