The vmware probe makes two kinds of queries to the monitored vCenter to collect metrics:
- real-time performance
- summary performance
The real-time performance queries are made to each individual VM/ESX host managed by the monitored vCenter and do not touch the vCenter database. The number of metrics retrieved by each of these queries is controlled by the perf_request_batch_size key in the setup section of the vmware probe's configuration file, which you can edit through the probe's Raw Configure GUI. The default is 64, and it typically does not need to be changed.
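To illustrate what a batch size means, the sketch below splits a list of metric requests into groups of at most perf_request_batch_size items, so that with the default of 64 no single real-time request carries more than 64 metrics. This is a hypothetical Python illustration of batching, not the probe's actual code; the `batch` function and the metric identifiers are made up for the example.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Default value of perf_request_batch_size, per the probe's setup section.
PERF_REQUEST_BATCH_SIZE = 64


def batch(items: Sequence[T], size: int = PERF_REQUEST_BATCH_SIZE) -> List[List[T]]:
    """Split items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


# Hypothetical metric identifiers: 150 metrics become three requests.
metric_ids = [f"metric-{n}" for n in range(150)]
requests = batch(metric_ids)
```

With the default size, the 150 hypothetical metrics are sent as requests of 64, 64, and 22 metrics.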
The summary performance metrics are collected separately for the following 4 vCenter items:
These metrics are collected from the vCenter database, so they are subject to the maxQueryMetrics setting in your vCenter. If the vmware log files contain the "Max query size exceeded" error when the probe tries to pull the metrics for any of these four items, there are more metrics available for that item than the vCenter maxQueryMetrics setting allows.
The maxQueryMetrics recommendation in VMware KB 2107096 will only work for the resource pool summary performance metrics if no resources are associated with the virtual machines or hosts in the cluster. When the vmware probe retrieves the summary performance metrics for resource pools, the query requests all resources associated with each virtual machine, such as CPUs, memory, disks, and network adapters. So if your vCenter has 254 virtual machines, each with 4 CPUs plus memory, disks, and network adapters, and you set maxQueryMetrics to 1024, the vmware probe will keep raising this error, because those 254 virtual machines have more than 1024 resources in total.
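The arithmetic behind this example, assuming each VM contributes at least one resource each for memory, disks, and network adapters in addition to its 4 CPUs, is a lower bound on the total:

$$N_{\text{resources}} \ge 254 \times (4 + 1 + 1 + 1) = 1778 > 1024$$

The CPUs alone already account for 254 × 4 = 1016 resources, so any additional resource per VM pushes the total past 1024.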
The vpxd.log file on the vCenter reports the actual number of metrics in the query made by the vmware probe. The message looks similar to the following:
error vpxd[7F1E957A1700] [Originator@6876 sub=MoPerfManager opID=2d0077b] The query size of 1964 metrics exceeded the vpxd.stats.maxQueryMetrics limit of 1024 metrics. Dropping.
Setting maxQueryMetrics to -1 disables the limit check. If you are concerned that this could degrade vCenter performance, you will instead need to choose a value that lets the vmware probe retrieve these metrics without exceeding maxQueryMetrics. You can estimate one with a formula such as:
<average number of resources on a VM> * <number of VMs in the vCenter>
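A minimal Python sketch of this estimate follows. The `estimate_max_query_metrics` name and the 10% `headroom_pct` default are assumptions added here to leave room for growth; they are not a VMware recommendation. Pass `headroom_pct=0` to get the bare formula.

```python
import math


def estimate_max_query_metrics(avg_resources_per_vm: float, vm_count: int,
                               headroom_pct: int = 10) -> int:
    """<average number of resources on a VM> * <number of VMs>, plus headroom.

    headroom_pct is an assumed safety margin, not a VMware-documented value.
    """
    # Round up so a fractional average never yields a limit that is too small.
    base = math.ceil(avg_resources_per_vm * vm_count)
    return base + math.ceil(base * headroom_pct / 100)
```

For the 254-VM example with 7 resources per VM, `estimate_max_query_metrics(7, 254, 0)` gives the bare product, 1778, and the default headroom raises it to 1956.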
The other option is to look for the "query size" error messages in vpxd.log and use a value slightly larger than the largest query size reported. Either way, you will have to revisit the maxQueryMetrics setting as the number of VMs in the monitored vCenter grows.
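The log-based approach can be scripted by scanning vpxd.log for the "query size" message format shown earlier and taking the largest reported value. This is a hypothetical Python sketch: the function names are illustrative, the regular expression assumes the message wording matches the example line exactly, and the 64-metric `margin` is an arbitrary stand-in for "slightly larger".

```python
import re
from typing import Iterable, Optional

# Matches vpxd.log entries such as:
# ... The query size of 1964 metrics exceeded the vpxd.stats.maxQueryMetrics limit of 1024 metrics. Dropping.
QUERY_SIZE_RE = re.compile(
    r"The query size of (\d+) metrics exceeded the "
    r"vpxd\.stats\.maxQueryMetrics limit of (\d+) metrics"
)


def largest_query_size(lines: Iterable[str]) -> Optional[int]:
    """Return the largest rejected query size found in the lines, or None."""
    sizes = [int(m.group(1)) for line in lines if (m := QUERY_SIZE_RE.search(line))]
    return max(sizes) if sizes else None


def suggest_max_query_metrics(lines: Iterable[str], margin: int = 64) -> Optional[int]:
    """Largest reported query size plus an assumed margin ("slightly larger")."""
    largest = largest_query_size(lines)
    return None if largest is None else largest + margin
```

For example, with a local copy of the log, `suggest_max_query_metrics(open("vpxd.log"))` returns a candidate value, or None if no such message was logged.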
If you are not interested in any of the summary metrics collected by the vmware probe and do not want to increase the maxQueryMetrics setting in your vCenter, you can ignore this error message in the vmware.log file.
The probe, by design, will continue to attempt to retrieve these summary performance metrics every polling cycle.
In this case there was one more change that had to be made to resolve the issue.
The hostname of the host that the vmware probe was running on did not resolve via DNS, so it had to be added to the /etc/hosts file.
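One quick way to confirm whether a name resolves, before and after editing /etc/hosts, is Python's standard `socket.gethostbyname`. It goes through the system resolver, which normally consults /etc/hosts as well as DNS. The `resolves` helper is a hypothetical name used for illustration.

```python
import socket


def resolves(hostname: str) -> bool:
    """True if the system resolver (normally /etc/hosts plus DNS) can resolve hostname."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        # Raised when the name cannot be resolved by any configured source.
        return False
```

Assuming the missing name is the probe host's own hostname, `resolves(socket.gethostname())` run on that host should return True once the /etc/hosts entry is in place.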