Missing Robots and/or Robot metrics in USM
- NMS and UMP versions 6.x.
First, open the dashboard_engine in Raw Configure mode and confirm that it is using the nis_server.
Under the 'data' section, 'use_nis_server' MUST be set to 1.
When you define the groups in USM the nis_server manages them -- NOT the group_server.
Note that the group_server table is NOT used by NMS Discovery and USM relies quite heavily on Discovery.
Note also that the group_server is at end-of-life and is no longer supported. If the group_server is running then deactivate it.
The nis_server and the group_server were not designed to be used/running at the same time and this configuration is not supported.
Note that the cm_nimbus_robot table most likely will not have entries for the Robots that are missing in USM.
You can check the robot table by running this query:
select * from cm_nimbus_robot
Or, if the issue is related to a specific origin or to robots with specific user tags, you can add a where clause such as:
select * from cm_nimbus_robot where user_tag_1 = 'xyz'
select * from cm_nimbus_robot where origin = 'ABC'
The most likely cause of missing robots in USM is duplicate device IDs (dev_ids). Cloning VMs has been identified as the cause of duplicate dev_ids for robots.
Customers like to keep VM templates with robots already installed on them. If the niscache on the template is already populated with entries,
the robot controller on a clone will not detect this and will not automatically clean up or re-create those files. Therefore, when cloning a VM, we recommend deleting the niscache contents first.
You can identify duplicate dev ids for Robots in the discovery_server.log at log level 3.
Here are two example entries from the discovery_server log:
Jan 30 11:23:13:898 WARN [hubWorker4, com.nimsoft.nimbus.probe.service.discoveryserver.nimbusscan.HubInfoFetcherDelegate] robot lcnwvahdgs1 / 10.64.<x>.<xxx> (status=0) has duplicate dev_id DB00C38BB3EA481CF292EDD9D9670DBBF
172.10.<xx>.<xxx> (status=0) has duplicate dev_id
Develop a list of robots showing dev_id duplication errors in the discovery_server log and make sure you have all of them documented.
Run the discovery_server at loglevel 3 long enough to capture every Robot that needs its duplicate dev_id eliminated.
You may need to increase the logsize for the discovery_server so that the log retains sufficient information.
After making your changes, restart the Robot and then check the log again to be sure you have identified them all.
You can also check against the Robots that should have been listed in the cm_nimbus_robot table in the first place.
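Building the list by hand from a large log is error-prone, so here is a minimal Python sketch that collects robot names per duplicated dev_id. It assumes the log lines follow the layout of the sample WARN entry above; the function name and the regular expression are illustrative and are not part of any Nimsoft tooling.

```python
import re
from collections import defaultdict

# Pattern modeled on the sample WARN line shown above.
# Assumption: "robot <name> / <ip> (status=N) has duplicate dev_id <hex id>".
DUP_RE = re.compile(
    r"robot\s+(?P<robot>\S+)\s*/\s*(?P<ip>\S+)\s+\(status=\d+\)\s+"
    r"has duplicate dev_id\s+(?P<dev_id>[0-9A-Fa-f]+)"
)


def find_duplicate_dev_ids(lines):
    """Return {dev_id: sorted robot names} for every matching log line."""
    robots_by_dev_id = defaultdict(set)
    for line in lines:
        match = DUP_RE.search(line)
        if match:
            robots_by_dev_id[match.group("dev_id")].add(match.group("robot"))
    return {dev_id: sorted(names) for dev_id, names in robots_by_dev_id.items()}
```

To use it, pass an open file handle for discovery_server.log (or any iterable of lines) to find_duplicate_dev_ids and print the result. Lines that do not match the assumed layout, such as entries that omit the dev_id value, are skipped.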
Once you have the list of robots with duplicate dev_ids, resolve the issue on each robot machine by deleting the contents of the niscache directory and restarting the robot:
1. Log in (for example, via RDP) to each robot machine.
2. Stop the Nimsoft Robot Watcher service.
3. Delete the contents of the niscache folder, for example in: ...\Program Files (x86)\Nimsoft\niscache
4. Start the Nimsoft Robot Watcher service.
5. Restart the discovery_server probe
6. Restart the nis_server probe
7. Recheck the discovery_server log for duplicate dev_id errors and repeat if necessary.
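If many robots need the change, step 3 can be scripted. The following is a minimal Python sketch, assuming Python is available on the robot machine and that the Nimsoft Robot Watcher service has already been stopped. The function name is hypothetical, and the niscache path must be supplied by you because it varies by installation.

```python
import shutil
from pathlib import Path


def clear_niscache(niscache_dir):
    """Delete everything inside niscache_dir but keep the directory itself.

    Returns the number of top-level entries removed.
    """
    root = Path(niscache_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"niscache directory not found: {root}")
    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a subdirectory and its contents
        else:
            entry.unlink()  # remove a regular file or a symlink
        removed += 1
    return removed
```

The sketch only empties the folder. Stopping and starting the service and restarting the discovery_server and nis_server probes (steps 2 and 4 to 6) still have to be done as described above.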
USM is not displaying metrics/monitoring values for the given Robots
Additionally, if after resolving the issue of missing Robots, USM is not displaying the metrics for those robots,
then check the ci metric ID mappings between S_QOS_DATA and cm_configuration_item_metric tables.
If the metric IDs for a given metric in S_QOS_DATA do not have a corresponding row in cm_configuration_item_metric, they simply won't display in USM.
For those hosts, you can run a query like:
select s.ci_metric_id, cm.ci_metric_id from s_qos_data s left join cm_configuration_item_metric cm on s.ci_metric_id = cm.ci_metric_id
The first column is the ci_metric_id from S_QOS_DATA. The second is either the matching ci_metric_id from cm_configuration_item_metric (which is good) or NULL, which means the ci_metric_id in S_QOS_DATA is bad and must itself be set to NULL, followed by a restart of the data_engine.
Check the two resulting ci_metric_id columns in each row. Any blank in the second column marks a bad entry. Take those 'bad' ci_metric_id values from the first column and run the following, replacing the placeholder with them:
update s_qos_data set ci_metric_id = NULL where ci_metric_id in ('<bad ci_metric_id values>')
Be sure to include the where clause; without it, the update clears ci_metric_id on every row in the table.
Use one robot as a test, then restart the data_engine.
This will take some time, but the mappings should be corrected afterward. Please try it on a single robot first to be sure.
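To illustrate the check-and-fix logic without touching a production database, here is a small Python sketch that runs against an in-memory SQLite database. The two toy tables contain only the columns needed for the example (the 'robot' and 'table_id' columns and all sample values are assumptions, not the real schema), and the function names are hypothetical.

```python
import sqlite3


def find_orphan_metric_ids(conn):
    """Return s_qos_data.ci_metric_id values with no row in cm_configuration_item_metric."""
    rows = conn.execute(
        """
        select distinct s.ci_metric_id
        from s_qos_data s
        left join cm_configuration_item_metric cm
          on s.ci_metric_id = cm.ci_metric_id
        where s.ci_metric_id is not null
          and cm.ci_metric_id is null
        order by s.ci_metric_id
        """
    ).fetchall()
    return [row[0] for row in rows]


def null_out_metric_ids(conn, bad_ids):
    """Set ci_metric_id to NULL only for the listed values; return rows changed."""
    if not bad_ids:
        return 0
    placeholders = ", ".join("?" for _ in bad_ids)
    cursor = conn.execute(
        "update s_qos_data set ci_metric_id = NULL "
        f"where ci_metric_id in ({placeholders})",
        list(bad_ids),
    )
    conn.commit()
    return cursor.rowcount


def build_toy_database():
    """Create a toy in-memory database; the schema is illustrative only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table s_qos_data "
        "(table_id integer primary key, robot text, ci_metric_id text)"
    )
    conn.execute(
        "create table cm_configuration_item_metric (ci_metric_id text primary key)"
    )
    conn.executemany(
        "insert into s_qos_data (robot, ci_metric_id) values (?, ?)",
        [("robotA", "M1"), ("robotA", "M2"), ("robotB", "M3"), ("robotB", None)],
    )
    conn.executemany(
        "insert into cm_configuration_item_metric values (?)", [("M1",), ("M3",)]
    )
    conn.commit()
    return conn
```

In the toy data, only "M2" has no counterpart in cm_configuration_item_metric, so it is the only value the update touches. The real tables live in the NMS database and the data_engine still has to be restarted afterward; this sketch only demonstrates the join and the scoped update.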