Alarms were not suppressed during the maintenance period for hundreds of servers in a maintenance mode group used for Windows patching.
- UIM 9.0.2
- maintenance_mode 9.0.2
- nas 9.06
The customer's maintenance mode script, which uses the REST API, was working: maintenance was applied during the specified time frame, as shown by the grey-shaded areas in the metric graphs displayed in USM. We therefore turned our attention to the nas.
We updated nas.cfg and added the following parameter to the nas <setup> section:
maint_max_resp_time = 60
This setting can be added to override the default timeout period for re-registration with the maintenance_mode probe.
Setting the value to 60 makes the nas wait 60 seconds instead of the default 20 seconds. This fixes an issue where the nas intermittently allowed alarms through for devices in maintenance when the maintenance_mode probe failed to respond to the re-registration request from the nas within the default timeout period.
When the nas re-registers with the maintenance_mode probe, it drops all devices currently in maintenance from memory. If the request to maintenance_mode fails, no devices are treated as being in maintenance until the next time the nas attempts a re-registration with the maintenance_mode probe.
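The behaviour described above can be illustrated with a small toy model in Python. It is only a sketch of the described failure mode, not nas code: the class name, method names, device names, and the 35-second response time are all hypothetical, and only the 20-second default and the value 60 come from this article.

```python
class ToyNas:
    """Toy model of the re-registration behaviour described above (not real nas code)."""

    def __init__(self, maint_max_resp_time=20):
        # Seconds to wait for maintenance_mode to answer; 20 is the default cited above.
        self.maint_max_resp_time = maint_max_resp_time
        self.devices_in_maintenance = set()

    def re_register(self, probe_devices, probe_response_time):
        # Re-registration first drops every device held in memory.
        self.devices_in_maintenance = set()
        if probe_response_time <= self.maint_max_resp_time:
            # The probe answered in time, so the list is rebuilt.
            self.devices_in_maintenance = set(probe_devices)
        # On a timeout the set stays empty until the next attempt.

    def suppresses(self, device):
        # Alarms are suppressed only for devices currently held in memory.
        return device in self.devices_in_maintenance


patching_group = {"win-srv-001", "win-srv-002"}  # hypothetical device names

default_nas = ToyNas()                      # 20-second timeout
tuned_nas = ToyNas(maint_max_resp_time=60)  # value used in this fix

# Assume the probe takes 35 seconds to answer under load (hypothetical figure).
default_nas.re_register(patching_group, probe_response_time=35)
tuned_nas.re_register(patching_group, probe_response_time=35)

print(default_nas.suppresses("win-srv-001"))  # False: alarms get through
print(tuned_nas.suppresses("win-srv-001"))    # True: alarms stay suppressed
```

In this model, a slow response with the default timeout leaves the in-memory list empty, which matches the symptom of alarms passing through for devices that are in maintenance.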
The customer then tested the change by running their maintenance (Python) script, which uses the REST API. The test succeeded and no alarms were generated during the maintenance period.
We also corrected the nas configuration because a loglevel parameter had been set. The nas does not use 'loglevel' to set the log level; it uses the parameter named 'debug'. As a side effect, the GUI still showed 0, reflecting the value that debug was set to. We deleted the bad key and set debug to 3 in nas.cfg.
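A quick way to spot these two configuration problems is to scan the <setup> section for the keys involved. The following Python sketch assumes that nas.cfg uses plain `<section>` ... `</section>` tags with `key = value` lines; the function names and the sample text are hypothetical, and the script only reads text, it does not modify the file.

```python
def read_section(cfg_text, section):
    """Return key/value pairs found directly inside <section> ... </section>."""
    values = {}
    stack = []  # names of the sections currently open
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if line.startswith("</") and line.endswith(">"):
            if stack:
                stack.pop()
        elif line.startswith("<") and line.endswith(">"):
            stack.append(line[1:-1].strip())
        elif "=" in line and stack == [section]:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def check_nas_setup(cfg_text):
    """List the issues from this article that are present in the <setup> section."""
    setup = read_section(cfg_text, "setup")
    findings = []
    if "loglevel" in setup:
        findings.append("remove 'loglevel': nas reads 'debug' instead")
    if "debug" not in setup:
        findings.append("'debug' is not set")
    if "maint_max_resp_time" not in setup:
        findings.append("'maint_max_resp_time' is not set, default timeout applies")
    return findings


# Hypothetical sample resembling the configuration before the fix.
sample = """
<setup>
   loglevel = 3
   debug = 0
</setup>
"""

for finding in check_nas_setup(sample):
    print(finding)
```

For the sample above, the check reports the stray loglevel key and the missing maint_max_resp_time. To check a real file, read its contents into a string and pass that to `check_nas_setup`.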