How to generate an alarm when a group of devices goes down using Service Manager and Condition Correlation Editor.

Document ID : KB000021210
Last Modified Date : 14/02/2018

Description:

If devices are not modeled correctly in the Spectrum topology, fault isolation may not work as expected, and a single link failure can generate many alarms.

One option here is to use Service Manager and Condition Correlation Editor to suppress the unwanted alarms and generate a single alarm.

The "Device has stopped responding to polls" alarm is used in the example below.

Solution:

Follow these steps to get the desired result:

  1. Create a Global Collection with a group of devices.

    When one device goes down, you need an individual device down alarm on that device.

    When all the devices in the group go down, you need only one alarm, and the "Device has stopped responding to polls" alarms on the individual devices need to be suppressed or hidden.

    Use a combination of Service Manager and Condition Correlation Editor to achieve this.

  2. Create a Global Collection and add the required devices to it.

  3. Create a service that uses a Condition Low Sensitivity rule, with the devices in the Global Collection as its monitored resources.

    Now, when one device goes down, Spectrum generates an individual device down alarm, and when all the devices go down, you get an alarm on the service model.

    In this case, you still get the individual device down alarms, which should be hidden.

  4. Use Condition Correlation Editor to suppress the individual device down alarms and assert a single service down alarm on the service.
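
The behavior after step 3, before any correlation is configured, can be sketched in a few lines of Python. This is only an illustration of the logic described above, not Spectrum code or a Spectrum API; the function name and the device names (rtr1, rtr2, rtr3) are hypothetical.

```python
from typing import Dict, List


def alarms_before_correlation(device_up: Dict[str, bool]) -> List[str]:
    """Alarms expected after step 3, with no correlation policy in place."""
    # Every device that is down raises its own device down alarm.
    alarms = [
        f"device down: {name}"
        for name, up in sorted(device_up.items())
        if not up
    ]
    # Low-sensitivity behavior as described in this article:
    # the service alarms only when every monitored device is down.
    if device_up and not any(device_up.values()):
        alarms.append("service down")
    return alarms


# One device down: only that device alarms.
print(alarms_before_correlation({"rtr1": False, "rtr2": True, "rtr3": True}))
# All devices down: the service alarm appears, but the device alarms remain.
print(alarms_before_correlation({"rtr1": False, "rtr2": False, "rtr3": False}))
```

The second call shows why step 4 is needed: the service alarm is raised, but it sits alongside one alarm per device.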

Steps to follow in Condition Correlation Editor:

  1. Create two conditions:

    Condition 1: Use the default contact lost condition as the first condition. It consists of the set and clear event codes of the device down alarm (0x10d35).

    Condition 2: Create a new condition named Test3 with the set event 0x4500006 and the clear event 0x4500007.

    Optionally, use the model handle as a parameter.

  2. Create a rule:

    Symptom condition: contact lost condition

    Relationship: Caused by

    Root cause condition: service down condition (Test3)

    Advanced rule criteria: none

  3. Map the conditions and the rule to a policy in Condition Correlation Editor, and then map the policy to a domain that contains this Global Collection.

    Now, when the service goes down, the device down alarms are suppressed and you get only a single service down alarm on the service model. When any device in the Global Collection comes back up, the service down alarm is cleared and the device down alarm (0x10d35) reappears on the models in this Global Collection that are still down.

    In this way, you can avoid bulk alarms in the OneClick console when a link or an upstream device goes down in the client network.
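
The effect of the "Caused by" rule can be sketched as a small simulation. This is a toy model under stated assumptions, not how Spectrum implements correlation: the class, its methods, and the device names are hypothetical. Only the two event codes for the Test3 condition come from the steps above.

```python
from typing import List, Set

SERVICE_DOWN_SET = 0x4500006    # set event of the Test3 condition
SERVICE_DOWN_CLEAR = 0x4500007  # clear event of the Test3 condition


class CorrelationSketch:
    """Toy model: 'contact lost' (symptom) caused by 'service down' (root cause)."""

    def __init__(self) -> None:
        self.devices_down: Set[str] = set()  # models with an active symptom
        self.service_down = False            # is the root cause active?

    def device_contact_lost(self, name: str) -> None:
        self.devices_down.add(name)

    def device_contact_restored(self, name: str) -> None:
        self.devices_down.discard(name)

    def service_event(self, code: int) -> None:
        if code == SERVICE_DOWN_SET:
            self.service_down = True
        elif code == SERVICE_DOWN_CLEAR:
            self.service_down = False

    def visible_alarms(self) -> List[str]:
        # While the root cause is active, the symptom alarms are hidden.
        if self.service_down:
            return ["service down"]
        return [f"device down: {name}" for name in sorted(self.devices_down)]


sketch = CorrelationSketch()
for device in ("rtr1", "rtr2", "rtr3"):
    sketch.device_contact_lost(device)
sketch.service_event(SERVICE_DOWN_SET)
print(sketch.visible_alarms())   # ['service down']

# One device recovers, so the service condition clears.
sketch.device_contact_restored("rtr2")
sketch.service_event(SERVICE_DOWN_CLEAR)
print(sketch.visible_alarms())   # ['device down: rtr1', 'device down: rtr3']
```

The symptom conditions stay active while they are suppressed, which is why the remaining device down alarms reappear as soon as the root cause clears.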