hub configuration - timeout and retry settings (explained)

Document ID : KB000097954
Last Modified Date : 04/10/2018
Introduction:
This article summarizes the UIM hub timeout and retry settings, explains each one in detail, and offers advice on the circumstances in which you may want to increase or decrease them.
Background:
UIM customers need to understand the hub timeout and retry settings, their defaults, and the general recommendations so that they can make appropriate modifications based on hub behavior.
Environment:
- UIM v8.5.1
- hub v7.93
- robot v7.93
Instructions:
hub timeouts
The hub timeouts collectively control the hub's behavior when it encounters slow responses from queues and other hubs. The postroute_* timeouts apply specifically to queues.

postroute_interval
Controls how frequently, in seconds, the hub checks queue subscriptions. The default is 30.

postroute_reply_timeout
Determines how long, in seconds, the hub waits for a reply from any queue/subscriber after sending messages. The default is 180.
If no reply arrives from the remote hub within this time after a bulk of messages has been sent on a queue, the hub decides that the bulk did not go through and re-sends it.

postroute_passive_timeout
Determines how long, in seconds, the hub lets a queue be 'passive' before disconnecting it. The default is 60.
This setting controls how long the hub allows a queue to have no data/traffic flowing across it before it decides that the queue needs to be reset. Note: always set this higher than postroute_interval to avoid false resets!
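
The note above can be turned into a quick sanity check before editing hub.cfg. The following is a minimal sketch, not part of UIM: the function name check_postroute_timeouts and the plain-dictionary representation of the settings are assumptions made for illustration, and the fallback values are the defaults quoted in this article.

```python
def check_postroute_timeouts(settings):
    """Return warnings if postroute_passive_timeout is not above postroute_interval.

    `settings` is a hypothetical dict of hub.cfg keys to values; missing keys
    fall back to the defaults quoted in this article (30 and 60).
    """
    interval = int(settings.get("postroute_interval", 30))
    passive = int(settings.get("postroute_passive_timeout", 60))

    warnings = []
    if passive <= interval:
        warnings.append(
            "postroute_passive_timeout (%d) should be higher than "
            "postroute_interval (%d) to avoid false queue resets"
            % (passive, interval)
        )
    return warnings


if __name__ == "__main__":
    # Raising only the interval leaves the passive timeout at its default of 60.
    for message in check_postroute_timeouts({"postroute_interval": 120}):
        print(message)
```

In this sketch, raising postroute_interval to 120 without also raising postroute_passive_timeout produces a warning, because the passive timeout would still be at its default of 60.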

hub_request_timeout
The timeout value for hub (hub-to-hub) communication requests. The default is 60.
This setting controls how long the hub waits for other (non-bulk-post) requests, e.g. nametoip requests, from another hub before deciding that a request has timed out.

tunnel_hang_timeout
This controls how long the hub will allow a tunnel to have no data flowing across it, before it decides that the tunnel needs to be closed and reconnected. The default is 300.
The hub continuously checks if one or more of the active tunnels are hanging. No new connections can be established through tunnels that are hanging. If a tunnel is hanging, the hub attempts to restart the tunnel. If the restart fails, the hub performs a restart after the specified number of seconds. On systems with very low latency and fast response between hubs, it may be beneficial to decrease tunnel_hang_timeout to 120, or even 60.

tunnel_hang_retries
The tunnel_hang_retries setting is not present in hub.cfg by default; when it is absent, it defaults to 1. It controls how many times the hub tries to reconnect an unresponsive tunnel. If reconnection fails this number of times, the hub performs an internal restart to try to get the tunnels operational again. Setting this value higher lets the hub retry the connection a few more times internally before restarting.

reply_timeout
Specifies the reply timeout setting on a per queue basis. This value overrides the global postroute_reply_timeout for the specific queue. You can specify this setting in the <postroute>/<name_of_queue> section of the hub.cfg.
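
The override rule described above can be illustrated as follows. This is only a sketch of the precedence, assuming the hub-level and queue-level settings have already been read into plain dictionaries; the function name effective_reply_timeout and the queue values are hypothetical and are not taken from the hub itself.

```python
# Default for the global postroute_reply_timeout, as quoted in this article.
GLOBAL_REPLY_TIMEOUT_DEFAULT = 180


def effective_reply_timeout(hub_settings, queue_settings):
    """Return the reply timeout (seconds) that applies to one queue.

    A per-queue reply_timeout overrides the global postroute_reply_timeout;
    if neither is set, the default of 180 applies.
    """
    if "reply_timeout" in queue_settings:
        return int(queue_settings["reply_timeout"])
    return int(
        hub_settings.get("postroute_reply_timeout", GLOBAL_REPLY_TIMEOUT_DEFAULT)
    )


if __name__ == "__main__":
    hub = {"postroute_reply_timeout": 300}
    slow_queue = {"reply_timeout": 600}   # hypothetical queue with an override
    normal_queue = {}                     # hypothetical queue without one
    print(effective_reply_timeout(hub, slow_queue))
    print(effective_reply_timeout(hub, normal_queue))
```

A per-queue value is useful when a single slow link needs a longer timeout and the global value should stay unchanged for every other queue.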

max_heartbeat = 30
Indicates how long the tunnel server will wait for client heartbeats.

The following setting is applied in the controller 'setup' section:

reuse_async_session = 1
Resolves an issue where the probe_config_get callback fails every other time. To implement the fix, add the new key reuse_async_session = 1 to the controller section of robot.cfg. The default is 0, which is off.

Suggested settings from the Help doc for hubs version 7.x:

postroute_interval = 120
postroute_reply_timeout = 300
postroute_passive_timeout = 300
hub_request_timeout = 120
tunnel_hang_timeout = 300
tunnel_hang_retries = 3
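
To see which of the suggested values actually differ from the defaults listed earlier in this article, a small comparison can help. This is a minimal sketch, assuming both sets of values are held in plain dictionaries; the helper name changed_settings is hypothetical and not part of UIM.

```python
# Defaults as stated in this article (tunnel_hang_retries defaults to 1 when absent).
DEFAULTS = {
    "postroute_interval": 30,
    "postroute_reply_timeout": 180,
    "postroute_passive_timeout": 60,
    "hub_request_timeout": 60,
    "tunnel_hang_timeout": 300,
    "tunnel_hang_retries": 1,
}

# Suggested settings for hubs version 7.x, as listed above.
SUGGESTED = {
    "postroute_interval": 120,
    "postroute_reply_timeout": 300,
    "postroute_passive_timeout": 300,
    "hub_request_timeout": 120,
    "tunnel_hang_timeout": 300,
    "tunnel_hang_retries": 3,
}


def changed_settings(defaults, suggested):
    """Return {key: (default, suggested)} for every key whose value differs."""
    return {
        key: (defaults.get(key), value)
        for key, value in suggested.items()
        if defaults.get(key) != value
    }


if __name__ == "__main__":
    for key, (old, new) in changed_settings(DEFAULTS, SUGGESTED).items():
        print("%s: %s -> %s" % (key, old, new))
```

Only tunnel_hang_timeout keeps its default of 300; the other five suggested values are higher than the defaults.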