How does eHealth calculate availability

Document ID : KB000023052
Last Modified Date : 14/02/2018

Question:

How does eHealth calculate availability?

Answer:

Availability measures the percentage of time that an element is active and running. eHealth calculates availability for each element type except modem pools.

Device Availability
For devices such as routers, servers, and remote access servers, eHealth measures availability using the sysUpTime variable or the equivalent (verify with an element variable report). The sysUpTime variable is the amount of time that the device has been running since the last time it rebooted. At each poll, eHealth checks the sysUpTime value to determine if it is greater than or less than the time between successful polls. If sysUpTime is greater, the device has been available for the entire time between polls.

If sysUpTime is less than the time between successful polls, the device rebooted between polls. The device might have rebooted several times since the last good poll; however, sysUpTime only indicates the time since the last reboot.

If eHealth attempts to poll the device and does not receive a response, it records the time of the missed poll. The next time that eHealth successfully polls the device, it compares the sysUpTime value to the time during which missed polls occurred. If sysUpTime is greater, the device was available for the entire time since the last good poll. If sysUpTime is less, the device was unavailable from the last good poll until the reboot time implied by sysUpTime (the current poll time minus sysUpTime), and available from that reboot time until the current poll.
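
The device rule described above can be sketched in Python. This is an illustration only, not eHealth's implementation: the names device_availability and Split are hypothetical, poll times are assumed to be in seconds on a common clock, and treating a sysUpTime exactly equal to the elapsed time as "fully available" is an assumption. The sketch relies on the SNMP convention that sysUpTime is reported in hundredths of a second.

```python
from typing import NamedTuple

# SNMP reports sysUpTime in TimeTicks (hundredths of a second).
TICKS_PER_SECOND = 100


class Split(NamedTuple):
    """Hypothetical result type: seconds available and unavailable."""
    available_s: float
    unavailable_s: float


def device_availability(last_good_poll_s: float,
                        current_poll_s: float,
                        sys_uptime_ticks: int) -> Split:
    """Split the time since the last good poll into up and down time."""
    elapsed = current_poll_s - last_good_poll_s
    if elapsed <= 0:
        raise ValueError("current poll must be later than the last good poll")

    uptime_s = sys_uptime_ticks / TICKS_PER_SECOND

    # sysUpTime covers the whole interval: no reboot since the last good poll.
    # (Assumption: equality is treated the same as "greater".)
    if uptime_s >= elapsed:
        return Split(elapsed, 0.0)

    # The device rebooted: it is counted as down from the last good poll
    # until the reboot, and up for the sysUpTime duration after it.
    return Split(uptime_s, elapsed - uptime_s)
```

For instance, with polls 900 seconds apart and a sysUpTime of 30000 ticks (300 seconds), the sketch reports 300 seconds available and 600 seconds unavailable. As the text notes, earlier reboots inside the interval cannot be detected this way.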

Interface Availability
For interface elements such as local area network (LAN)/wide area network (WAN), Asynchronous Transfer Mode (ATM), and Frame Relay, eHealth measures availability using the ifLastChange, ifOperStatus, and sysUpTime variables or the equivalent (verify by running an element variable report for the element in question). The ifLastChange variable is the time when the element last changed state. The ifOperStatus variable is the state of the element. (Frame Relay elements, for instance, do not have the ifOperStatus and ifLastChange variables, but they do have states that eHealth maps to available or unavailable states.)

The element states vary for each type of element, but the states indicate whether the element is operational (available) or non-operational (unavailable). Sometimes an element can change state without affecting availability. For example, if an element changes from state A to state B, and both states are operational states, the element remains available during that time. When eHealth polls the device where the interface resides, the agent at the device reports the state of the interface and the value of the ifLastChange variable.

If the ifLastChange time is earlier than the time of the last good poll, the element state has not changed since the last good poll. If the element is available at the current poll, it has been available for the time since the previous poll. If the element is unavailable at the current poll, it has been unavailable for the time since the previous poll.

When ifLastChange indicates that the element state changed between polls, eHealth assigns the available and unavailable time using the following guidelines:
1) If the element was unavailable at the last good poll and available at the current poll, it was unavailable for the time between the last good poll and ifLastChange. The element was available for the time between ifLastChange and the current poll.

2) If the element was available at the last good poll and unavailable at the current poll, it was available for the time between the last good poll and ifLastChange. The element was unavailable for the time between ifLastChange and the current poll.

3) If the element state is the same at the current poll as it was at the last good poll, eHealth uses the value of the NH_UNKNOWN_AVAIL_PCT variable to assess the unknown time (that is, the time between the last good poll and ifLastChange). By default, the value of NH_UNKNOWN_AVAIL_PCT is 0%, which means that eHealth assumes that the element was in the same state (the state observed at both polls) for all of the unknown time. If the variable is set to 50%, eHealth assumes that the element was in the opposite state for half of the unknown time. If the variable is set to 100%, eHealth assumes that the element was in the opposite state for all of the unknown time.

 

For example:

If an element was in the 'up' state at the last good poll and is in the same 'up' state at the current good poll, and NH_UNKNOWN_AVAIL_PCT is set to 0%, eHealth assumes that the element was up for all of the unknown time.

If, in the same scenario, NH_UNKNOWN_AVAIL_PCT is set to 100%, eHealth assumes that the element was in the opposite state (in this example, 'down') for all of the unknown time.
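
The interface guidelines above, including the NH_UNKNOWN_AVAIL_PCT handling, can be sketched in Python. This is a minimal illustration under stated assumptions, not eHealth's code: the function name interface_availability and its parameters are hypothetical, the element state is reduced to a boolean (up or down), and last_change_s is assumed to be ifLastChange already converted to the same clock, in seconds, as the poll times.

```python
from typing import Tuple


def interface_availability(last_good_poll_s: float,
                           current_poll_s: float,
                           last_change_s: float,
                           was_up: bool,
                           is_up: bool,
                           unknown_avail_pct: float = 0.0) -> Tuple[float, float]:
    """Return (available_seconds, unavailable_seconds) for one poll interval."""
    elapsed = current_poll_s - last_good_poll_s
    if elapsed <= 0:
        raise ValueError("current poll must be later than the last good poll")
    if not 0.0 <= unknown_avail_pct <= 100.0:
        raise ValueError("unknown_avail_pct must be between 0 and 100")

    # No state change since the last good poll: the whole interval
    # takes the state observed at the current poll.
    if last_change_s <= last_good_poll_s:
        return (elapsed, 0.0) if is_up else (0.0, elapsed)

    before = last_change_s - last_good_poll_s   # time before the last change
    after = current_poll_s - last_change_s      # time after the last change

    # Guidelines 1 and 2: the state differs between the two polls, so the
    # old state applies before ifLastChange and the new state after it.
    if was_up != is_up:
        return (after, before) if is_up else (before, after)

    # Guideline 3: same state at both polls. The time before ifLastChange
    # is unknown; the configured percentage of it goes to the opposite state.
    opposite = before * unknown_avail_pct / 100.0
    observed = (before - opposite) + after
    return (observed, opposite) if is_up else (opposite, observed)
```

With polls at 0 and 300 seconds, a last change at 200 seconds, and the element up at both polls, a setting of 0% yields 300 seconds available, while 100% yields 100 seconds available and 200 seconds unavailable.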

 

Server Process and Process Set Availability
A server process is available when it is in one of the following states: sleeping, waiting, running, or idle. A server process is unavailable when it is in the zombie or stopped state.

A process set is available when all of the critical processes in the set are available. A process set is unavailable when at least one of its critical processes is unavailable. (You specify which processes in the set are critical when you create the process set.)
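
The process and process set rules above can be expressed as a short Python sketch. The function names and the dictionary-based input are hypothetical choices for illustration; only the state names come from the text. Raising an error for any other state is an assumption, since the text does not say how other states are treated.

```python
from typing import Iterable, Mapping

# States taken from the description above.
AVAILABLE_STATES = frozenset({"sleeping", "waiting", "running", "idle"})
UNAVAILABLE_STATES = frozenset({"zombie", "stopped"})


def process_available(state: str) -> bool:
    """Map a server process state to available (True) or unavailable (False)."""
    normalized = state.lower()
    if normalized in AVAILABLE_STATES:
        return True
    if normalized in UNAVAILABLE_STATES:
        return False
    # Assumption: states not listed in the text are rejected, not guessed.
    raise ValueError(f"unrecognized process state: {state!r}")


def process_set_available(states: Mapping[str, str],
                          critical: Iterable[str]) -> bool:
    """A set is available only if every critical process is available."""
    return all(process_available(states[name]) for name in critical)
```

For example, a set containing a running "httpd" and a zombie "cron" is available if only "httpd" is marked critical, and unavailable if both are critical.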

Response Path Availability
eHealth measures the availability of response path elements based on successful attempts. During a polling interval, if at least one attempt is successful, eHealth assumes that the path element was available for the polling interval. If all attempts failed, it assumes that the path was unavailable for the polling interval. If there were no attempts during the polling interval, or if polling is stopped, eHealth cannot determine the availability of the path, and availability charts show a gap for that time.
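
The three outcomes for a response path interval can be sketched as follows. The function name response_path_available and the representation of attempts as a list of booleans are assumptions made for illustration; None stands for the undetermined case that appears as a gap in the charts.

```python
from typing import Optional, Sequence


def response_path_available(attempts: Sequence[bool]) -> Optional[bool]:
    """Classify one polling interval from its attempt results.

    Each entry is True for a successful attempt and False for a failed one.
    """
    if not attempts:
        # No attempts (or polling stopped): availability cannot be determined.
        return None
    # One success is enough to count the whole interval as available.
    return any(attempts)
```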

Availability When Polling Is Stopped
In previous releases, eHealth availability charts showed gaps for the time when polling was stopped. (Polling stops when the eHealth administrator stops the poller or the eHealth server.) As of Release 4.5, eHealth "backfills" availability information for device and interface elements for the time between the last good poll and the time that polling resumes. eHealth backfills up to 72 hours of availability data. Refer to the Device Availability and Interface Availability sections above for a description of how eHealth calculates availability for these elements.