What do you look at when troubleshooting the vmware probe's performance, stability, and response?
Several components and complexities affect the vmware probe's performance, stability, and response. A few of the most common are addressed here.
Unified Infrastructure Management (UIM) vmware probe
Review the following and apply as relevant:
Establish resources at the vCenter Level
The vmware probe operates most efficiently when resources are configured at the vCenter level or, at the absolute minimum, the ESX host level. Typically, we do not recommend monitoring more than one vCenter (up to 10 ESX hosts) per vmware probe. If you have 10 ESX hosts all under the same vCenter, it is most efficient to set up the resource at the vCenter level rather than one resource per host.
Automonitors vs. Templates vs. Explicitly Set Monitors
It is typically most efficient to set monitors up as automonitors. This lets the probe manage its memory and CPU resources more efficiently when handling monitors. On occasion you may need to set up a monitor explicitly, because a specific resource requires a highly customized monitor, but we recommend setting up all monitors as automonitors when possible. You can also drag templates onto automonitors as a way of creating automonitors. However, applying templates to resources directly is just as inefficient for the vmware probe as setting monitors explicitly.
Configuration File Size
Start by looking at the size of the vmware configuration file. If it is on the order of megabytes, there is likely a workload-balancing issue. You can often split either the monitored resources or the monitor volume across multiple instances of the vmware probe deployed on multiple robots. Balancing the workload in this way helps.
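As a quick sanity check, a short script can flag an oversized configuration file. The path below is an assumption; adjust it to your UIM installation directory, as is the 1 MB threshold used to approximate "megabyte range":

```python
import os

# Example path -- adjust to your UIM/Nimsoft installation directory.
CFG_PATH = "/opt/nimsoft/probes/application/vmware/vmware.cfg"

def check_cfg_size(path, threshold_bytes=1_000_000):
    """Return True when the probe configuration file is in the megabyte
    range, which suggests a workload-balancing issue."""
    size = os.path.getsize(path)
    print(f"{path}: {size} bytes")
    return size >= threshold_bytes
```

If this returns True, consider splitting the resources or monitors across additional vmware probe instances on other robots.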
Performance Log File
Under the vmware probe directory you will find a log file titled 'performance.log' (there may be more than one, suffixed with a number). In these logs, response times should never reach the tens of thousands of milliseconds. When the milliseconds start translating into minutes, investigate whether too many resources or monitors are configured for that probe and/or whether the check interval is set too low.
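As an illustrative sketch, you could scan these logs for entries whose elapsed time crosses into the minutes range. The line format assumed here (a number followed by "ms") is hypothetical; the actual performance.log layout varies by probe version, so adapt the pattern to what you see in your logs:

```python
import re

# Hypothetical log-line format for illustration only -- the real
# performance.log layout may differ; adapt the regex to match it.
MS_PATTERN = re.compile(r"(\d+)\s*ms")

def slow_entries(lines, threshold_ms=60_000):
    """Yield lines whose reported elapsed time meets or exceeds the
    threshold (60,000 ms = 1 minute; times this long warrant review)."""
    for line in lines:
        m = MS_PATTERN.search(line)
        if m and int(m.group(1)) >= threshold_ms:
            yield line.rstrip()
```

For example, feeding it the lines of a performance.log file would surface only the entries that took a minute or longer.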
You may want to consider increasing the resource check interval(s) first. A 1-minute check interval is considered too tight for a response back from the ESX host or vCenter. If certain monitors are critical enough that they need a 1-minute check interval, consider setting up two resources for the same ESX host or vCenter: a shorter check interval on one resource for the more critical monitors, and a longer check interval on the other resource for the monitors that are less critical.
Memory and CPU available to the probe
Review resource utilization and availability on the host/VM where the vmware probe is deployed and make sure ample memory and CPU are available to the probes running on that robot. Typically, if less than 20 percent of memory or CPU is available, you may need to either move probes off that robot to a less loaded robot, or increase the CPU and/or memory resources available on that robot.
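On a Linux robot, for example, memory headroom can be read from /proc/meminfo. This sketch assumes the standard Linux meminfo format (including the MemAvailable field, present in modern kernels) and is just one way to spot-check the 20 percent guideline:

```python
def mem_available_percent(meminfo_text):
    """Parse /proc/meminfo content (Linux) and return the percentage of
    memory still available. Below roughly 20 percent, the robot may need
    more memory or fewer probes."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            fields[key] = int(rest.split()[0])  # values are in kB
    return 100.0 * fields["MemAvailable"] / fields["MemTotal"]
```

Usage on a live system would be something like `mem_available_percent(open("/proc/meminfo").read())`.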
If resource availability isn't the issue, you can go into the vmware probe's Raw Configure and set the starting and maximum Java heap sizes in the 'properties' -> 'java_options' section. The '-Xms' value is the amount of memory the probe claims at startup, and the '-Xmx' value is the maximum it will claim if needed. Always remember to append the number with the letter 'm' (megabytes).
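For illustration, the setting might look like the fragment below. The values are examples only (size them for your environment), and the exact key layout in Raw Configure can vary by probe version:

```
java_options = -Xms256m -Xmx1024m
```

Here '-Xms256m' gives the probe a 256 MB starting heap and '-Xmx1024m' caps it at 1024 MB.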