MCS troubleshooting

Document ID : KB000072506
Last Modified Date : 13/08/2018
Question:
 
  • How do I configure the mon_config_service probe to fail over to my fail over hub?
  • Why am I getting duplicate alarms?
  • When I click on a Group or a Device in USM and then select the Monitoring Tab how come I can't see my profile template?
  • Why did my new or updated Device Profile end up in an error state?
  • Why did one or more of my Group Device Profiles become full fledged Device Profiles when I deleted my Group Profile?
  • How do I modify the MCS profile deployment behavior?
  • How do I enable the REST API WADL?
Answer:

How do I configure the mon_config_service probe to fail over to my fail over hub?

Assumptions:

  1. You have already installed UIM on your fail over hub.
  2. You have already installed the HA ("Nimsoft High Availability") probe on your fail over hub, and the HA probe has already been configured to fail over the other probes required to support fail over.
  3. The data_engine probe has been configured for fail over on your fail over hub.
  4. The automated_deployment_engine has been configured for fail over on your fail over hub.

Process:

  1. Perform the steps outlined in the Set Up High Availability for the Primary Hub.
    1. For additional information see the HA probe documentation.
  2. Next, ensure that the same version of the mon_config_service probe (or selfservice_cm probe) that is installed on your primary hub is also installed on the fail over hub.
    1. Note that this should not be required for the mon_config_service probe because the mon_config_service probe is installed by UIM on UIM versions greater than 8.40.
  3. Ensure that the mon_config_service probe (or selfservice_cm probe) is deactivated.
  4. Copy the mon_config_service.cfg (or the selfservice_cm.cfg) file from the primary hub mon_config_service probe (or selfservice_cm probe) instance to the <UIM Install Root>/Nimsoft/probes/service/mon_config_service (or the <UIM Install Root>/Nimsoft/probes/application/selfservice_cm) directory on the fail over hub.
  5. Deactivate the primary hub mon_config_service probe (or the selfservice_cm probe).
  6. Activate the fail over hub mon_config_service probe (or selfservice_cm probe). 
  7. Ensure that the mon_config_service probe (or selfservice_cm probe) is running on the fail over hub.
    1. Note: this step should be performed after every upgrade of the mon_config_service probe (or selfservice_cm probe) so that any required UIM database migrations are performed before a fail over occurs in the production environment.
  8. Deactivate the fail over mon_config_service probe (or the selfservice_cm probe).
  9. Activate the primary hub mon_config_service probe (or the selfservice_cm probe).
  10. Add the mon_config_service probe (or selfservice_cm probe) to the HA probe configuration on the fail over hub.
    1. Using the Infrastructure Manager or Admin Console, edit the HA probe raw configuration and add the mon_config_service probe (or selfservice_cm probe) to the probes_up configuration section.
    2. Add the mon_config_service probe (or selfservice_cm probe) by creating a new key in the probes_up section whose key is "probe_<integer>" and whose value is mon_config_service (or selfservice_cm).
      1. The <integer> in the key is a placeholder for an integer that you must select. The selected integer must be unique and greater than the integers used for the data_engine and automated_deployment_engine probes.
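
The key selection rule in step 10 can be sketched in a few lines of Python. This is only an illustration: the dictionary stands in for the probes_up section, and its existing keys (probe_0, probe_1) are assumed values, not taken from a real HA probe configuration.

```python
import re

def next_probe_key(probes_up):
    """Return a 'probe_<integer>' key greater than every integer already in use."""
    used = []
    for key in probes_up:
        match = re.fullmatch(r"probe_(\d+)", key)
        if match:
            used.append(int(match.group(1)))
    # Start at 0 when the section is empty; otherwise go one past the highest.
    return "probe_%d" % (max(used, default=-1) + 1)

# Hypothetical probes_up contents; check your own HA probe raw configuration.
probes_up = {
    "probe_0": "data_engine",
    "probe_1": "automated_deployment_engine",
}
new_key = next_probe_key(probes_up)
probes_up[new_key] = "mon_config_service"
print(new_key, "=", probes_up[new_key])
```

Choosing one more than the highest existing integer satisfies both requirements at once: the key is unique, and it is greater than the keys used for data_engine and automated_deployment_engine.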

 Additional Considerations:

  1. Once you have the mon_config_service probe (or selfservice_cm probe) configured for fail over, you should be prepared for a potential fail over event; however, there may be newer or patched releases of the mon_config_service probe (or selfservice_cm probe) that you wish to install. If you decide to install a newer version of the mon_config_service probe (or selfservice_cm probe) on your primary hub, you must also install the same version on the fail over hub. This can be done using steps 2-9 from the instructions above.
  2. If the mon_config_service probe (or selfservice_cm probe) fails to start after fail over, use the Infrastructure Manager or Admin Console, targeting the fail over hub, to deactivate the mon_config_service probe (or selfservice_cm probe) and then activate it again.
  3. If the mon_config_service probe (or selfservice_cm probe) fails to start after fail back, use the Infrastructure Manager or Admin Console, targeting the primary hub, to deactivate the mon_config_service probe (or selfservice_cm probe) and then activate it again.

Why am I getting duplicate alarms?

  1. If multiple profiles that use the same probe template are applied to a shared device, whether from different groups or from the same group, and their configurations overlap, you can get duplicate alarms from the different profiles.
  2. For example: a Disk(s) Alarms profile Profile1 with mount point '/apps/oracle' and a profile Profile2 with mount point '/apps/' can both produce an alarm on the /apps/oracle disk.
  3. If a Unix file system is specified with a regular expression such as /apps/oracle/.* in the Disk(s) Alarm template, the probe monitors ALL unique disks mounted below the /apps/oracle/ file system path. Profiles defined as /apps/oracle/.* or as (/apps/oracle/.*/d/disk1 and /apps/oracle/.*/d/disk2) will therefore produce duplicate alarms, because the disk1 and disk2 file systems are mounted below the /apps/oracle/ file system path.
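
The overlap described above can be checked before profiles are deployed. The following Python sketch is illustrative only: the profile names and mount points are made up, and it assumes the patterns are matched against the whole mount point, which may differ from how the probe actually evaluates them.

```python
import re

def matching_profiles(mount_point, profiles):
    """Return the names of all profiles whose pattern matches the mount point."""
    return [name for name, pattern in profiles.items()
            if re.fullmatch(pattern, mount_point)]

# Hypothetical profiles using the patterns from the example above.
profiles = {
    "Profile1": r"/apps/oracle/.*",
    "Profile2": r"/apps/oracle/.*/d/disk1",
}

# Hypothetical mounted file systems.
for mount in ["/apps/oracle/a/d/disk1", "/apps/oracle/a/d/disk2", "/apps/data"]:
    hits = matching_profiles(mount, profiles)
    if len(hits) > 1:
        print(mount, "is covered by", hits, "(duplicate alarms likely)")
```

Any mount point reported by more than one profile is a candidate for duplicate alarms; narrowing one of the patterns removes the overlap.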

General 

When I click on a Group or a Device in USM and then select the Monitoring Tab how come I can't see my profile template?

  1. Have you installed the mon_config_service_templates package?

    1. To check, connect to your database and execute the SQL statement

      select * from ssrv2template
  2. For the profile template in question is the <packagetemplates> element defined?
    1. To check, connect to your database and execute the SQL statement

      select * from ssrv2packagetemplate
      where template = (select templateId from SSRV2Template where templateName = '<your template name>' and production = 1)
      1. If the result of the above query DOES NOT return a single record 

        1. Add the following to your template definition.

          <packagetemplates>
             <packageName>Server</packageName>
             <!-- If your probe monitors remote systems then set the nimbus_type value to 0 -->
             <nimbus_type>1</nimbus_type>
             <!-- If your probe works on multiple operating systems, do not specify os_type -->
             <os_type>Windows</os_type>
             <defaultTemplate>false</defaultTemplate>
           </packagetemplates>
        2. Be sure to update the <version> elements of the root template and all child templates to ensure the template will be imported.
      2. If the result of the above query returns a single record
        1. Is the value of the os_type field set and does it match either the os_type of the Device you have selected or the os_type of the Device selected as the Device Type for the selected Group?
          1. To check the Device, connect to your database and execute the SQL statement

            select * from ssrv2packagetemplate
            where template = (select templateId from SSRV2Template where templateName = '<your template name>' and production = 1) and
                  os_type = (select os_type from cm_computer_system where caption = '<name as seen in usm>' or name = '<name as seen in usm>' or ip = '<device ip address>')
          2. To check the Group, connect to your database and execute the SQL statement

            select * from ssrv2packagetemplate
            where template = (select templateId from SSRV2Template where templateName = '<your template name>' and production = 1) and
                  os_type = (select os_type from cm_computer_system where cs_id = (select model_device from SSRV2DeviceGroup where name = '<selected group name>'))
        2. Is the value of the nimbus_type field set and is it greater than or equal to the nimbus_type of the Device you have selected or the nimbus_type of the Device selected as the Device Type for the selected Group?
          1. To check the Device, connect to your database and execute the SQL statement

            select * from ssrv2packagetemplate
            where template = (select templateId from SSRV2Template where templateName = '<your template name>' and production = 1) and
                  nimbus_type >= (select nimbus_type from cm_computer_system where caption = '<name as seen in usm>' or name = '<name as seen in usm>' or ip = '<device ip address>')
          2. To check the Group, connect to your database and execute the SQL statement

            select * from ssrv2packagetemplate
            where template = (select templateId from SSRV2Template where templateName = '<your template name>' and production = 1) and
                  nimbus_type >= (select nimbus_type from cm_computer_system where cs_id = (select model_device from SSRV2DeviceGroup where name = '<selected group name>'))
        3. Has the Device you have selected or the Device selected as the Device type for the selected Group been assigned the value of the <packageName> element for the profile template in question?
          1. To check the Device, connect to your database and execute the SQL statement

            select * from ssrv2packagetemplate pt
            left join SSRV2Template t on t.templateId = pt.template
            left join SSRV2DevicePackage dp on dp.package = pt.package
            left join CM_COMPUTER_SYSTEM cs on cs.cs_id = dp.cs_id
            where t.templateName = '<your template name>' and t.production = 1 and (cs.caption = '<name as seen in usm>' or cs.name = '<name as seen in usm>' or cs.ip = '<device ip address>')
          2. To check the Group, connect to your database and execute the SQL statement

            select * from ssrv2packagetemplate pt
            left join SSRV2Template t on t.templateId = pt.template
            left join SSRV2DevicePackage dp on dp.package = pt.package
            left join CM_COMPUTER_SYSTEM cs on cs.cs_id = dp.cs_id
            left join SSRV2DeviceGroup dg on dg.model_device = cs.cs_id
            where t.templateName = '<your template name>' and t.production = 1 and dg.name = '<selected group name>'

Profile Deployment

Why did my new or updated Device Profile end up in an error state?

When MCS attempts to deploy a Group Profile:

  1. A Device Profile is created for each device in the group.  
  2. MCS attempts to deploy all of the Device Profiles.  
    1. If a device is unavailable or unreachable while MCS is attempting to deploy the associated Device Profile
      1. MCS will keep track of the deployment attempt.  
      2. If after 30 attempts MCS is still unable to deploy the Device Profile
        1. The Device Profile state is set to 'error' and MCS abandons deployment of this profile.
        2. If no modifications are made to the Group Profile associated with the Device Profile
          1. The profile will be deleted (without configuration removal) after the robot on the device has been active for 7 days.
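
The retry behavior described above can be pictured with a short Python sketch. It is a simplified model under stated assumptions, not MCS code: try_deploy is a hypothetical callback standing in for one deployment attempt, the 'deployed' state name is invented for the illustration, and the wait between attempts is left out.

```python
MAX_RETRIES = 30  # attempt limit described above

def deploy_with_retries(try_deploy, max_retries=MAX_RETRIES):
    """Call try_deploy until it succeeds or the attempt limit is reached.

    Returns (state, attempts_used). The state is 'error' once every
    attempt has failed, mirroring the behavior described in the list above.
    """
    for attempt in range(1, max_retries + 1):
        if try_deploy():
            return "deployed", attempt
    return "error", max_retries

# A device that never becomes reachable ends in the error state.
print(deploy_with_retries(lambda: False))

# A device that becomes reachable on the third attempt is deployed normally.
outcomes = iter([False, False, True])
print(deploy_with_retries(lambda: next(outcomes)))
```

The sketch shows why a device that is unreachable for an extended period ends up with a Device Profile in the 'error' state: once the attempt limit is exhausted, no further deployment is tried.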

 

Why did one or more of my Group Device Profiles become full fledged Device Profiles when I deleted my Group Profile?

When MCS attempts to delete a Group Profile:

  1. All associated Group Device Profiles are marked for deletion.
  2. MCS attempts to delete all of the Group Device Profiles.
    1. If a device is unavailable or unreachable while MCS is attempting to delete the associated Group Device Profile
      1. MCS will detach the Group Device Profile from its Group Profile and create the equivalent full fledged Device Profile
      2. MCS will attempt to delete the Device Profile if the device becomes available
        1. MCS keeps track of each deletion attempt.
        2. If after 30 attempts MCS is still unable to delete the Device Profile
          1. The Device Profile state is set to 'error' and MCS abandons the deletion of the Device Profile.
          2. If no modifications are made to the Device Profile
            1. The profile will be deleted (without configuration removal) after the robot on the device has been active for 7 days.

How do I modify the MCS profile deployment behavior?

  1. Edit the <UIM Install Root>/probes/service/mon_config_service/mon_config_service.cfg file.
    1. To increase the number of deployment attempts
      1. Increase the /timed/max_retries configuration parameter
    2. To increase the amount of time between deployment attempts
      1. Increase the /timed/retry_multiplier configuration parameter
    3. To allow users more time to update Group Profiles before their Device Profiles are deleted
      1. Increase the /timed/retry_multiplier configuration parameter
  2. Restart the mon_config_service probe.
     

How do I enable the REST API WADL?

Edit the <UIM Install Root>/probes/service/wasp/webapps/mcsws/WEB-INF/web.xml file and change the <param-value> of the jersey.config.server.wadl.disableWadl parameter, shown here with its current value, from true to false.
 

<!-- Disable WADL generation. Use Swagger API docs and swagger.json instead. -->
<init-param>
    <param-name>jersey.config.server.wadl.disableWadl</param-name>
    <param-value>true</param-value>
</init-param>


After modifying the web.xml file the WADL will be available at <UMP_URL>/mcsws/application.wadl.
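
To confirm the setting before and after the edit, a small Python check such as the following can be run against the contents of web.xml. It is a sketch under assumptions: the <web-app> and <servlet> wrapper in the sample is a simplified stand-in for the real file, and the function only reads the value; it does not modify anything.

```python
import xml.etree.ElementTree as ET

WADL_PARAM = "jersey.config.server.wadl.disableWadl"

def wadl_disabled(web_xml_text):
    """Return True/False for the disableWadl setting, or None if it is absent."""
    root = ET.fromstring(web_xml_text)
    # "{*}" matches the element in any XML namespace (or none).
    for init_param in root.findall(".//{*}init-param"):
        name = init_param.findtext("{*}param-name", default="").strip()
        if name == WADL_PARAM:
            value = init_param.findtext("{*}param-value", default="").strip()
            return value.lower() == "true"
    return None

# Simplified, hypothetical web.xml content for illustration.
sample = """<web-app><servlet>
  <init-param>
    <param-name>jersey.config.server.wadl.disableWadl</param-name>
    <param-value>true</param-value>
  </init-param>
</servlet></web-app>"""

print("WADL disabled:", wadl_disabled(sample))
```

In practice the text would be read from the web.xml file itself; a result of False indicates that WADL generation is no longer disabled.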