I can see QoS data coming in from my probe to the primary hub via Dr. Nimbus, but I cannot see it in SLM or in PRD.
Steps already done:
- restarted data_engine;
- restarted NAS;
- redeployed the probe.
What could be causing the issue?
To find the cause, do the following:
1) Increase the loglevel to 3 and the logsize to 50000 on the data_engine.
2) Deactivate the data_engine and clear the current logs.
3) Activate the data_engine and wait 5 minutes.
4) Deactivate the problem probe.
5) Activate the problem probe and wait 5 minutes.
In the data_engine logs we saw entries such as the following:
Aug 18 16:39:52:238  de: ADO_QoSInsert::InsertQosObjectEx - exists already qos=QOS_CPU_USAGE, source=ComputerName, target=Total
Aug 18 16:39:52:239  de: QoSInsert::AddQoSObjectToMaps - table=886174 qos=QOS_CPU_USAGE source=ComputerName target=Total metric=MD82CBBD26C4661B3795896A4903BDBBA Complete: yes
Aug 18 16:39:52:239  de: QoSInsert::CreateQoSObjectDB - QoS object creation failed, unknown reason
Aug 18 16:39:52:258  de: HandleMessage: Insert Data - nimid: ZN95990931-01412 (2016-08-18 16:39:50) table_id: 0 value= 22275.00 time: 2016-08-18 16:39:50 qos=QOS_DISK_USAGE source=ComputerName target=D:\
Aug 18 16:39:52:259  de: ADO_QoSInsert::InsertQosObjectEx - exists already qos=QOS_DISK_USAGE_PERC, source=ComputerName, target=D:\
Aug 18 16:39:52:259  de: QoSInsert::AddQoSObjectToMaps - table=886187 qos=QOS_DISK_USAGE_PERC source=ComputerName target=D:\ metric=M8EB9F6BDCD43D9068B4C9C3CA1C7345E Complete: yes
Aug 18 16:39:52:259  de: QoSInsert::CreateQoSObjectDB - QoS object creation failed, unknown reason
Aug 18 16:39:52:259  de: QoSInsert::InsertData Failed to create data object in DB. id=-1 defid 2 qos=QOS_DISK_USAGE_PERC
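With loglevel 3 the data_engine log gets large, so it can help to pull out only the QoS objects that failed. The following is a minimal sketch, not a supported tool: it assumes the log lines have the same shape as the excerpt above, and that each "QoS object creation failed" line refers to the AddQoSObjectToMaps line logged just before it. The function name find_failed_objects is made up for this example.

```python
import re

# Matches the "AddQoSObjectToMaps" lines as they appear in the excerpt above.
MAP_LINE = re.compile(
    r"AddQoSObjectToMaps - table=(?P<table>\d+) qos=(?P<qos>\S+) "
    r"source=(?P<source>\S+) target=(?P<target>.+?) metric="
)
FAIL_MARKER = "QoS object creation failed"


def find_failed_objects(lines):
    """Return unique (table, qos, source, target) tuples whose creation failed.

    Assumption: a failure line belongs to the most recent
    AddQoSObjectToMaps line that came before it.
    """
    failed = []
    last_seen = None
    for line in lines:
        match = MAP_LINE.search(line)
        if match:
            last_seen = (match["table"], match["qos"],
                         match["source"], match["target"])
        elif FAIL_MARKER in line and last_seen is not None:
            if last_seen not in failed:
                failed.append(last_seen)
            last_seen = None
    return failed


# Two pairs of lines taken from the excerpt above.
sample = [
    r"Aug 18 16:39:52:239  de: QoSInsert::AddQoSObjectToMaps - table=886174 qos=QOS_CPU_USAGE source=ComputerName target=Total metric=MD82CBBD26C4661B3795896A4903BDBBA Complete: yes",
    r"Aug 18 16:39:52:239  de: QoSInsert::CreateQoSObjectDB - QoS object creation failed, unknown reason",
    r"Aug 18 16:39:52:259  de: QoSInsert::AddQoSObjectToMaps - table=886187 qos=QOS_DISK_USAGE_PERC source=ComputerName target=D:\ metric=M8EB9F6BDCD43D9068B4C9C3CA1C7345E Complete: yes",
    r"Aug 18 16:39:52:259  de: QoSInsert::CreateQoSObjectDB - QoS object creation failed, unknown reason",
]

for table, qos, source, target in find_failed_objects(sample):
    print(table, qos, source, target)
```

To run it against a real log, pass an open file object instead of the sample list; the location of the data_engine log depends on your installation.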
This was tracked down to a problem in the S_QOS_DATA table, where the checksum stored in the database is no longer valid.
When an entry in S_QOS_DATA is first created, a checksum is calculated from a set of its fields.
If any of those values later changes, the checksum validation fails and you see the messages above.
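To see why an edited field breaks the match, here is a small illustration. It is only a sketch of the general idea: the actual fields and hash algorithm that the data_engine uses are not given in this article, so the field choice (qos, source, target), the use of MD5, and the host names are all assumptions made for the example.

```python
import hashlib


def row_checksum(qos, source, target):
    """Hypothetical checksum over three identifying fields.

    The real data_engine calculation is not documented here; this only
    shows that changing any input changes the result.
    """
    joined = "|".join((qos, source, target))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Checksum stored when the record was first created (FQDN as source).
stored = row_checksum("QOS_CPU_USAGE", "host01.example.com", "Total")

# Same record after a manual UPDATE changed source to the short name.
recomputed = row_checksum("QOS_CPU_USAGE", "host01", "Total")

print(stored == recomputed)  # False
```

The stored value was never updated along with the field, so the two can no longer agree.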
The best way to correct this is to delete the affected record from S_QOS_DATA and then deactivate and activate the problem probe.
This will cause a new record to be created and the data can then be inserted again.
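The delete can be sketched as follows. This runs against an in-memory SQLite table as a stand-in, not against a real UIM database, and the column names (table_id, qos, source, target) are assumptions taken from the fields printed in the log lines above; check the actual S_QOS_DATA schema before running anything similar. The helper name delete_qos_record is made up for the example.

```python
import sqlite3


def delete_qos_record(conn, qos, source, target):
    """Select the matching rows, delete them, and return what was removed."""
    where = "qos = ? AND source = ? AND target = ?"
    params = (qos, source, target)
    rows = conn.execute(
        f"SELECT table_id, qos, source, target FROM s_qos_data WHERE {where}",
        params,
    ).fetchall()
    conn.execute(f"DELETE FROM s_qos_data WHERE {where}", params)
    conn.commit()
    return rows


# Stand-in table with assumed column names, filled from the log excerpt.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE s_qos_data (table_id INTEGER, qos TEXT, source TEXT, target TEXT)"
)
conn.executemany(
    "INSERT INTO s_qos_data VALUES (?, ?, ?, ?)",
    [
        (886174, "QOS_CPU_USAGE", "ComputerName", "Total"),
        (886187, "QOS_DISK_USAGE_PERC", "ComputerName", "D:\\"),
    ],
)

removed = delete_qos_record(conn, "QOS_CPU_USAGE", "ComputerName", "Total")
remaining = conn.execute("SELECT COUNT(*) FROM s_qos_data").fetchone()[0]
print(removed, remaining)
```

Selecting the rows first gives you a record of exactly what was removed before the probe recreates the entry.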
In this case the client had run a manual update query on the source field to change it from the FQDN to the short host name, which caused the problem.
To change from the FQDN to the short host name, the approach below is a better solution.
The default for any probe is to use the source of the robot that it sits on, which by default is the server's hostname (such as uslil740.am.jllnet.com). Some probes have a source override; these are usually remote monitoring probes such as interface_traffic, rsp, and net_connect, as well as cdm, since it can monitor remote shares. Probes that are limited to monitoring the robot machine itself generally do not have a source override and simply use the robot's source.
The robot's source is controlled through the config GUI, under Setup > Misc. If the "Set QoS source to robot name instead of computer hostname" is checked, it will use the short name. If it is unchecked, it will use the hostname from DNS (usually the FQDN). You can determine exactly what the controller's source is by using the controller's probe utility (highlight the controller and press ctrl+p) and issuing the get_info callback, which will have the source.
Where possible, I would avoid custom sources inside the probes and use them only where necessary, as with the remote monitoring probes mentioned earlier. Get the robot to report the desired source (FQDN or short name) and the rest of the probes will follow, unless they override the source themselves.
As for whether FQDN or short name is the best practice, the direction of engineering and PM is to encourage FQDN. This is especially important when you are working with multiple customers or domains, where the same name may appear more than once if it is not fully qualified (e.g. exchange.apj.jll.com and exchange.us.jll.com). This is typically more of a concern for MSPs than for single businesses. However, that does not preclude the use of the short name if it is determined to be a better answer.
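Before moving to short names, it can be worth checking whether any of your hosts would end up sharing a source. The sketch below assumes the short name is simply the first label of the FQDN; the function name short_name_collisions is made up for this example, and the host list reuses the names mentioned in this article.

```python
from collections import defaultdict


def short_name_collisions(fqdns):
    """Group FQDNs by first label and return only the groups that clash."""
    groups = defaultdict(list)
    for fqdn in fqdns:
        short = fqdn.split(".", 1)[0].lower()
        groups[short].append(fqdn)
    return {short: names for short, names in groups.items() if len(names) > 1}


hosts = [
    "exchange.apj.jll.com",
    "exchange.us.jll.com",
    "uslil740.am.jllnet.com",
]

print(short_name_collisions(hosts))
# {'exchange': ['exchange.apj.jll.com', 'exchange.us.jll.com']}
```

An empty result means the short names are unique for the hosts you listed.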