How Do I Troubleshoot Missing Data?

Document ID : KB000045476
Last Modified Date : 14/02/2018

Question:

How can I troubleshoot missing data in USM?

 

Answer:

1) Database

The first thing to do is to query the database directly to see if the data exists.  This can be accomplished via the SLM portlet or the SQL interface on your database server.

 

There are two tables to check: S_QOS_DATA and the corresponding RN table(s). The first query checks the S_QOS_DATA table to confirm that QOS_DEFINITION messages are being received and processed.

SELECT * FROM S_QOS_DATA WHERE probe = '<probe name>';

If no data is returned, jump to topics 2 and 3 below to troubleshoot hub queues and data_engine.
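If the query returns nothing, one possibility worth ruling out first is that the probe name in the WHERE clause does not match what is stored. As a minimal sketch, using only the probe column referenced above, the following lists the probe names that currently have definitions:

```sql
-- List each probe that has at least one row in S_QOS_DATA,
-- with the number of QoS definitions recorded for it.
SELECT probe, COUNT(*) AS definition_count
FROM S_QOS_DATA
GROUP BY probe
ORDER BY probe;
```

If the probe appears here under a slightly different name, rerun the first query with that name before moving on.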

 

Note the table_id and r_table values from the query above for use in the second query. The r_table value is a table name, so it must not be enclosed in quotes:

SELECT * FROM <r_table> WHERE table_id = <table_id> ORDER BY sampletime DESC;

If there is no recent data, jump to topics 2 and 3 below.

If there is current data, go to TOPIC 4.
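To judge quickly whether the data is recent, the second query can be reduced to a single timestamp. This is a sketch that uses the same placeholders as above; substitute the r_table and table_id values returned by the first query.

```sql
-- Most recent sample stored for one QoS series.
-- <r_table> and <table_id> are placeholders for values taken from S_QOS_DATA.
SELECT MAX(sampletime) AS latest_sample
FROM <r_table>
WHERE table_id = <table_id>;
```

A NULL result means the RN table holds no rows at all for that table_id.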

 

2) Hub queues

The parent hub of the robot needs an “ATTACH” queue that, at minimum, listens for the subjects QOS_MESSAGE,QOS_DEFINITION.

The hub that retrieves data from that hub also needs a queue as described above, unless it is the primary hub, in which case the data_engine probe creates its own listening queue.

If the hubs are not configured with the queues, then those queues need to be created along with the corresponding “GET” queues and the probe needs to be restarted.

Here is the documentation on hub queues:

https://docops.ca.com/ca-unified-infrastructure-management-probes/en/alphabetical-probe-articles/hub/hub-versions-7-8-7-6/v7-8-hub-im-configuration#v7.8HubIMConfiguration-CreateHub-to-HubQueues#SettingupQueues

 

3) data_engine

Provided that the queues are set up correctly, the next place to inspect is data_engine.  data_engine has two jobs related to this topic.  One job is to prepare the schema by setting up the entries in S_QOS_DEFINITION and S_QOS_DATA.  These are created from the QOS_DEFINITION messages that the probe generates on startup.  The other job is to save the monitored data carried by QOS_MESSAGE messages.

 

If the query above returned no entries in the S_QOS_DATA table, restart the problem probe and look for errors in the data_engine log.  Set the data_engine log level to 3 to catch them.

 

If you have already seen data in the S_QOS_DATA table, then QOS_DEFINITION messages are being processed and set up correctly.  In that case, there may be a problem with the QOS definition that prevents data_engine from saving the monitored data.  We sometimes see issues when the definition has been set up with a hasmax value but the probe isn’t sending data with a max value.  Again, this will be logged in the data_engine log.  The fix depends on the situation, and a support ticket is probably the best way to approach it.

 

4) UMP

If the data is in the database, it should usually also appear in Performance Reports Designer (PRD), because PRDs are closely aligned with the S_QOS_DATA table.  It’s a good idea to check a PRD to make sure it will graph your data, but most problems appear in USM.

 

If the data is not visible in USM, there are three likely causes:

a) The device doesn't exist in inventory

If the device doesn't exist in inventory, then it could be a failure on discovery_server's part.  There are a few reasons why this might happen.  Depending on the probe architecture, it could be a queue issue or it could be an inability of the discovery_server to contact the robot that the probe is installed on.

Probes that rely on discovery queues to publish inventory are:

cisco_ucs

clariion

cm_data_import

discovery_agent

hyperv

ibmvm

icmp

salesforce

snmpcollector

vmware

xenserver

In this case, it is necessary to ensure that there are discovery queues in place to pass the discovery messages up to your primary hub.  The parent hub of the robot needs to have an “ATTACH” queue that listens for probe_discovery messages.

The hub that retrieves data from that hub also needs a queue as described above, unless it is the primary hub, in which case discovery_server creates its own listening queue.

If the hubs are not configured with the queues, then those queues need to be created along with the corresponding “GET” queues and the probe needs to be restarted.

Here is the documentation on hub queues:

https://docops.ca.com/ca-unified-infrastructure-management-probes/en/alphabetical-probe-articles/hub/hub-versions-7-8-7-6/v7-8-hub-im-configuration#v7.8HubIMConfiguration-CreateHub-to-HubQueues#SettingupQueues

 

If the problem probe is not one of those listed above, discovery_server may be unable to contact the robot.  If you restart discovery_server and watch its log at level 5, you'll see discovery_server reporting problems contacting the robot.  This is unusual, but could be caused by a firewall blocking communications.

 

b) There are correlation problems with devices in inventory and the data is matched to an unexpected entry or the data is attached to an unexpected device.

Several tables are linked through JOIN statements to form a complete chain from CM_COMPUTER_SYSTEM to S_QOS_DATA.  The queries below verify that this chain is complete.

Log back into the database and run the following queries:

SELECT * FROM S_QOS_DATA WHERE probe = '<probe name>';

Choose one of those results and copy the ci_metric_id value.  Then run the following query.

SELECT * FROM CM_CONFIGURATION_ITEM_METRIC WHERE ci_metric_id = '<ci_metric_id>';

If data is not returned, jump down to topic c).

If data is returned, take the ci_id value from the returning record and run

SELECT * FROM CM_CONFIGURATION_ITEM WHERE ci_id = '<ci_id>';

Then take the dev_id from the returning record and run

SELECT * FROM CM_DEVICE WHERE dev_id = '<dev_id>';

Then take the cs_id from the returned record and run

SELECT * FROM CM_COMPUTER_SYSTEM WHERE cs_id = '<cs_id>';

This will return the entry in USM that you will find the QOS data listed under.  Sometimes, it is not the device you are expecting.
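The step-by-step lookups above can also be sketched as a single query. This is an illustrative combination, not a vendor-supplied statement: it assumes only the tables and key columns named in this article, and it uses LEFT JOINs so that a broken link shows up as NULL columns instead of a missing row.

```sql
-- Walk the chain from S_QOS_DATA to CM_COMPUTER_SYSTEM for one probe.
-- The first NULL in a row marks where the chain breaks for that metric.
SELECT q.table_id,
       q.ci_metric_id,
       m.ci_id,
       ci.dev_id,
       d.cs_id   AS device_cs_id,
       cs.cs_id  AS computer_system_cs_id
FROM S_QOS_DATA q
LEFT JOIN CM_CONFIGURATION_ITEM_METRIC m ON m.ci_metric_id = q.ci_metric_id
LEFT JOIN CM_CONFIGURATION_ITEM ci       ON ci.ci_id = m.ci_id
LEFT JOIN CM_DEVICE d                    ON d.dev_id = ci.dev_id
LEFT JOIN CM_COMPUTER_SYSTEM cs          ON cs.cs_id = d.cs_id
WHERE q.probe = '<probe name>';
```

Rows where ci_id is NULL correspond to the "data is not returned" case for CM_CONFIGURATION_ITEM_METRIC and point to topic c).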

 

c) There is a ci_metric_id mismatch

ci_metric_id mismatches can be figured out fairly quickly.  The first step is to go to the robot, clear out the niscache folder and restart the robot.  This ensures that we don't have an old robot device ID, which all metric IDs are ultimately based on.  This commonly happens on cloned VMs that already have a robot installed on them.

Then pull up DrNimbus and watch for any QOS_MESSAGE from the target probe.  When you see a message from that probe, click on it.  Look for a field called met_id.  You’ll need to manually type the met_id into your query below as DrNimbus does not allow copy/paste.

SELECT * FROM S_QOS_DATA WHERE ci_metric_id = '<met_id>';

If this query doesn't return data, reset the stored ci_metric_id values for that probe:

UPDATE S_QOS_DATA SET ci_metric_id = NULL WHERE probe = '<probe name>';

Then restart data_engine and wait for the probe to send metrics again.  Check USM and see if your data shows up.
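Because the UPDATE above changes every S_QOS_DATA row for the probe, it may be worth recording the current values before running it. A minimal sketch, using only columns already referenced in this article:

```sql
-- Run before the UPDATE: shows the rows it will change and their current IDs.
SELECT table_id, probe, ci_metric_id
FROM S_QOS_DATA
WHERE probe = '<probe name>';
```

Saving this output gives you a record of the previous ci_metric_id values should support ask for them.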

 

If you still don’t have data showing up after resetting the ci_metric_id, then it’s time to examine the discovery_server side of things.

SELECT * FROM CM_CONFIGURATION_ITEM_METRIC WHERE ci_metric_id = '<met_id>';

If this query doesn't return data, check the discovery_server logs for errors related to the robot that hosts that probe.  The cause may be one of the issues discussed in topic a) above.