Adjusting Timing Variables for Distributed eHealth Clusters

Document ID : KB000018474
Last Modified Date : 14/02/2018

Description:

Customers using a distributed eHealth cluster may open cases in which reports run from a reporting front end (RFE) time out or contain errors stating that not all data could be retrieved, even though the same reports run without problems when run directly on the back-end poller (BE).

In almost every case, all that is required is raising the distributed environment variables to higher settings to prevent timeouts.

Solution:

To add or change these variables on a Windows server, customers should open the System control panel, go to the Advanced tab, and click the Environment Variables button. Use the New or Edit buttons in the lower section of the window (for System variables) to make changes. The server must be rebooted for changes to take effect.

To add or change variables on a Solaris or Linux server, customers should edit the $NH_HOME/nethealthrc.sh.usr file. If the variable is not already defined in the file, add it using the following format:

VARIABLE=value; export VARIABLE

Do not add spaces around the equals sign between the variable name and its value. Changes do not take effect until you stop and restart the eHealth services:

nhServer stop
nhServer start
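The edit described above (add the variable if it is missing, otherwise change its value) can be sketched as a small shell function. This is only an illustration: set_nh_var is a hypothetical helper name, not an eHealth command, and it assumes each variable sits on its own line in the VARIABLE=value; export VARIABLE format. Back up the file before trying anything like this on a production server.

```shell
#!/bin/sh
# Sketch only: set_nh_var is a hypothetical helper, not part of eHealth.
# Usage: set_nh_var FILE NAME VALUE
# Example: set_nh_var "$NH_HOME/nethealthrc.sh.usr" NH_DRPT_HEARTBEAT_INTVL 120
set_nh_var() {
    file=$1
    name=$2
    value=$3
    line="${name}=${value}; export ${name}"
    if grep -q "^${name}=" "$file" 2>/dev/null; then
        # Variable already defined: rewrite that line via a temp file,
        # then copy the result back so the original file keeps its permissions.
        tmp="${file}.tmp.$$"
        sed "s|^${name}=.*|${line}|" "$file" > "$tmp" &&
            cat "$tmp" > "$file" &&
            rm -f "$tmp"
    else
        # Variable not yet defined: append it in the documented format.
        printf '%s\n' "$line" >> "$file"
    fi
}
```

The eHealth services still have to be stopped and restarted afterwards for the new value to be picked up.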

Add or edit the following variables on the RFEs only:

Variable Name                   Default Value   Recommended Change
NH_DRPT_HEARTBEAT_INTVL         60              120
NH_DRPT_MISSED_MSG_LIMIT        2               3
NH_DRPT_RUN_CMD_TIMEOUT [1]     45              90
NH_DRPT_STATUS_MSG_TIMEOUT [2]  30              60
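On a Solaris or Linux RFE, the recommended values from the table above would look roughly like this in $NH_HOME/nethealthrc.sh.usr. This is a sketch that assumes none of the four variables is already defined in the file; if one is, edit the existing line rather than adding a second one.

```shell
# RFE-only entries, using the Recommended Change column of the table above.
NH_DRPT_HEARTBEAT_INTVL=120; export NH_DRPT_HEARTBEAT_INTVL
NH_DRPT_MISSED_MSG_LIMIT=3; export NH_DRPT_MISSED_MSG_LIMIT
NH_DRPT_RUN_CMD_TIMEOUT=90; export NH_DRPT_RUN_CMD_TIMEOUT
NH_DRPT_STATUS_MSG_TIMEOUT=60; export NH_DRPT_STATUS_MSG_TIMEOUT
```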

 

Add or edit the following variables on all cluster members (RFEs and BEs):

Variable Name                   Default Value   Recommended Change
NH_CLUSTER_CMD_TIMEOUT          30              300
NH_RCS_CONNECT_TIMEOUT          5               300
NH_RCS_MSG_TIMEOUT [3]          5               20
NH_RCS_RETRY_QUEUE_TIME [3]     5               5
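For the cluster-wide variables, the equivalent nethealthrc.sh.usr entries on a Solaris or Linux member would be along these lines. As before, this is a sketch that assumes the variables are not already defined in the file, and the same entries would be needed on every RFE and BE.

```shell
# Entries for all cluster members, using the Recommended Change column above.
NH_CLUSTER_CMD_TIMEOUT=300; export NH_CLUSTER_CMD_TIMEOUT
NH_RCS_CONNECT_TIMEOUT=300; export NH_RCS_CONNECT_TIMEOUT
NH_RCS_MSG_TIMEOUT=20; export NH_RCS_MSG_TIMEOUT
NH_RCS_RETRY_QUEUE_TIME=5; export NH_RCS_RETRY_QUEUE_TIME
```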

 

[1] The value of NH_DRPT_RUN_CMD_TIMEOUT should be less than NH_DRPT_HEARTBEAT_INTVL and more than NH_DRPT_STATUS_MSG_TIMEOUT.

[2] The value of NH_DRPT_STATUS_MSG_TIMEOUT should be less than NH_DRPT_HEARTBEAT_INTVL.

[3] The sum of NH_RCS_MSG_TIMEOUT and NH_RCS_RETRY_QUEUE_TIME, multiplied by two, should be less than the value of every other variable listed except NH_DRPT_MISSED_MSG_LIMIT. With the recommended changes, for example, (20 + 5) * 2 = 50, so every other listed variable should be no lower than 51.
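When the values are raised further, the three notes above are easy to violate by accident. The following sketch checks them against the current environment. check_timing is a hypothetical helper, not part of eHealth. It assumes it is run on an RFE where all seven variables are defined as whole numbers, and it reads "all the other variables listed" in note 3 as covering both tables.

```shell
#!/bin/sh
# Sketch only: check_timing is a hypothetical helper, not part of eHealth.
# Returns 0 if notes 1-3 hold, 1 if any is violated, 2 if a value is unusable.
check_timing() {
    for v in NH_DRPT_HEARTBEAT_INTVL NH_DRPT_RUN_CMD_TIMEOUT \
             NH_DRPT_STATUS_MSG_TIMEOUT NH_CLUSTER_CMD_TIMEOUT \
             NH_RCS_CONNECT_TIMEOUT NH_RCS_MSG_TIMEOUT NH_RCS_RETRY_QUEUE_TIME; do
        eval "val=\${$v:-}"
        case $val in
            ''|*[!0-9]*) echo "$v is unset or not a whole number" >&2; return 2 ;;
        esac
    done
    rc=0
    # Note 1: STATUS_MSG_TIMEOUT < RUN_CMD_TIMEOUT < HEARTBEAT_INTVL
    if [ "$NH_DRPT_RUN_CMD_TIMEOUT" -ge "$NH_DRPT_HEARTBEAT_INTVL" ] ||
       [ "$NH_DRPT_RUN_CMD_TIMEOUT" -le "$NH_DRPT_STATUS_MSG_TIMEOUT" ]; then
        echo "note 1 violated" >&2; rc=1
    fi
    # Note 2: STATUS_MSG_TIMEOUT < HEARTBEAT_INTVL
    if [ "$NH_DRPT_STATUS_MSG_TIMEOUT" -ge "$NH_DRPT_HEARTBEAT_INTVL" ]; then
        echo "note 2 violated" >&2; rc=1
    fi
    # Note 3: (MSG_TIMEOUT + RETRY_QUEUE_TIME) * 2 must be below every other
    # listed variable except NH_DRPT_MISSED_MSG_LIMIT.
    floor=$(( ($NH_RCS_MSG_TIMEOUT + $NH_RCS_RETRY_QUEUE_TIME) * 2 ))
    for v in NH_DRPT_HEARTBEAT_INTVL NH_DRPT_RUN_CMD_TIMEOUT \
             NH_DRPT_STATUS_MSG_TIMEOUT NH_CLUSTER_CMD_TIMEOUT \
             NH_RCS_CONNECT_TIMEOUT; do
        eval "val=\$$v"
        if [ "$val" -le "$floor" ]; then
            echo "note 3 violated: $v=$val must be greater than $floor" >&2; rc=1
        fi
    done
    return $rc
}
```

With the recommended values from both tables in the environment, check_timing returns 0. Raising NH_RCS_MSG_TIMEOUT to 40 without touching anything else pushes the note 3 threshold to 90 and makes the check fail.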

The variables may still need to be adjusted upwards for large clusters where reports will include many elements.

For full details of each variable's purpose, default value, and maximum value, please review the eHealth Command and Environment Variables Reference Guide.

ADDENDUM: NH_DRPT_COMPRESS_RDI deprecated in 6.3.0, replaced with parameter reporting.distributed.enableRdiCompression

When dealing with eHealth distributed clusters, some customers have experienced report timeouts and/or failures when the reporting front end console (RFE) cannot collect data from one or more back-end pollers (BEs) fast enough. The solution to this problem is to adjust the environment variables related to distributed reports so that more time is allowed before the report errors out. Specifically, the key variable has been NH_DRPT_COMPRESS_RDI, which needs to be set to YES on every member of the cluster (both RFEs and BEs).

While the NH_DRPT_COMPRESS_RDI variable is still completely functional and will work for anyone using it now, there is a newer, preferred method for averting potential report timeouts.

The new preferred method to enable RDI compression is to use the following command:

nhParameter -set reporting.distributed.enableRdiCompression "yes"

This has a number of advantages over the environment variable.

  • The new method is compatible with the old in that both can be enabled at once, so there is no pressing need to remove NH_DRPT_COMPRESS_RDI if you've already set it.
  • You only need to enable RDI compression on the RFE consoles; you don't need to enable it on the BE pollers.
  • After changing the nhParameter value, there is no need to restart the eHealth servers or reboot the machine.

The new method also avoids the issue where you bring a new machine into the cluster but forget to set the environment variable on the new machine, which will cause distributed reports to fail.