Tip for load balancing configuration when upgrading an Enterprise Manager cluster.

Document ID : KB000031358
Last Modified Date : 14/02/2018

Description:

 

This documents a suggested configuration change when upgrading an APM cluster.

 

 

 

Solution:

 

When upgrading an APM cluster, whether through an in-place upgrade of the software or by installing a new cluster that uses existing historical data, it can take time for agents to settle on a collector.

Consequently, agents can be sent to more than one collector, with the effect that they post historical data to multiple collectors. As a result, the MOM will be extremely busy dealing with that process.

You may notice that many agents do not show reporting data while the MOM attempts to process the load. Agents may also show up in the Denied Agents list in the APM Status Console until they can be allocated to a collector.

This movement of agents between collectors can greatly increase the historical data on a collector, possibly leading to a metric clamp on one or more collectors. In the worst case, all collectors become clamped. Once that happens, the agents cannot connect to any collector and you will not see any data in the APM Workstation/WebView.

Once all collectors are clamped, the only option is to raise the live and historical metric clamps. However, continually raising the clamps can lead to performance degradation in the APM UI.

 

 To help avoid this situation, there are special properties related to load balancing:

 

Since APM 9.1 there is a hot property called "introscope.enterprisemanager.loadbalancing.staywithhistoricalcollector" with three possible values: "always", "notoverloaded", and "rarely". The default value is "notoverloaded".

 

The three options behave as follows:

 

always: Always assign an agent to the collector that previously hosted it, even if that collector is overloaded. This lowers the load on the MOM during a restart or upgrade in which the agents are not brought down; the agents simply wait to connect until their original collectors catch up.
notoverloaded: Assign an agent to the collector that previously hosted it, unless that collector is overloaded and some other collector is not. This is the default for 9.1.
rarely: Assign an agent to the collector that previously hosted it only if there is no under-loaded alternative collector in the cluster. This is the pre-9.1 mechanism.
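
The difference between the three values can be illustrated with a small Python model. This is only a sketch of the descriptions above, not the Enterprise Manager's actual algorithm: the function name pick_collector, the numeric thresholds for "overloaded" and "under-loaded", and the choice of the least-loaded alternative are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical thresholds: the article does not define "overloaded" or
# "under-loaded" numerically, so these values are assumptions.
OVERLOADED = 0.9
UNDERLOADED = 0.5


def pick_collector(mode: str, historical: str, loads: Dict[str, float]) -> str:
    """Return the collector an agent is sent to in this toy model.

    loads maps each collector name to a load fraction between 0.0 and 1.0;
    historical is the collector that previously hosted the agent.
    """
    others = {name: load for name, load in loads.items() if name != historical}
    # Assumption: when the agent is moved, it goes to the least-loaded alternative.
    least_loaded = min(others, key=others.get) if others else historical

    if mode == "always":
        # Stay with the historical collector even if it is overloaded.
        return historical
    if mode == "notoverloaded":
        # Move only if the historical collector is overloaded and another is not.
        hist_overloaded = loads[historical] >= OVERLOADED
        other_ok = any(load < OVERLOADED for load in others.values())
        return least_loaded if hist_overloaded and other_ok else historical
    if mode == "rarely":
        # Stay only if no under-loaded alternative exists.
        has_underloaded = any(load < UNDERLOADED for load in others.values())
        return least_loaded if has_underloaded else historical
    raise ValueError(f"unknown mode: {mode}")
```

In this model, with a busy historical collector and an idle alternative, only "always" keeps the agent where its historical data already lives, which is the behaviour wanted during an upgrade.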

 

 

Based on these descriptions, it is recommended to set the property to "always" prior to an upgrade.

                                     introscope.enterprisemanager.loadbalancing.staywithhistoricalcollector=always

 

This only needs to be configured in the IntroscopeEnterpriseManager.properties file in the config folder of the MOM. As it is an extra property, it will need to be added to the file.
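
If you prefer to script the change rather than edit the file by hand, the following Python sketch sets the property in a properties file, replacing an existing uncommented entry or appending the line if it is absent. The helper names set_property and update_file are hypothetical, and the file path in the usage note is a placeholder for your own MOM installation; back up the file before running anything like this.

```python
from pathlib import Path

PROP = "introscope.enterprisemanager.loadbalancing.staywithhistoricalcollector"


def set_property(text: str, key: str, value: str) -> str:
    """Return properties-file text with key set to value, appending it if absent."""
    lines = text.splitlines()
    found = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments and lines that are not key=value pairs.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            found = True
    if not found:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def update_file(path: str, value: str) -> None:
    """Rewrite the properties file at path with PROP set to value."""
    p = Path(path)
    # latin-1 round-trips every byte, so other content is left unchanged.
    original = p.read_text(encoding="latin-1")
    p.write_text(set_property(original, PROP, value), encoding="latin-1")
```

For example, update_file("<MOM home>/config/IntroscopeEnterpriseManager.properties", "always") before the upgrade, and the same call with "notoverloaded" afterwards, where "<MOM home>" stands for the MOM installation directory.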

Once all of the agents have reconnected, you can change the property value back to "notoverloaded" to restore the default APM 9.1 load balancing behaviour.