How does eHealth handle a missed poll? (Legacy KB ID CNC TS19165)

Document ID : KB000052040
Last Modified Date : 14/02/2018
eHealth polls two types of OIDs:
    Counters: continuously incrementing values; the delta between two polls is used as the poll value (for example, Bits In)
    Gauges: values within a range, often a percentage of a total, so a range of 0-100% (for example, CPU utilization)


Counters use a delta, so after one or more missed polls, the next successful poll computes its delta from the last successful poll.  A Trend report will show a spike, because that one poll carries the accumulated value of the missed cycles (close to double the "normal" value after a single missed poll).  Longer-term reports and rollups are normalized and the data is correct, because no part of the delta was lost overall.
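
As a rough illustration of the counter behavior described above, the following Python sketch computes deltas between successful polls only. The function name, the sample data, and the 300-second interval are hypothetical and not taken from eHealth; counter wrap is ignored for simplicity.

```python
def counter_deltas(samples):
    """Compute deltas between consecutive successful polls.

    samples: list of (timestamp_s, counter_value); counter_value is None
    for a missed poll. Counter wrap is ignored in this sketch.
    """
    deltas = []
    last_good = None
    for ts, value in samples:
        if value is None:
            continue  # missed poll: nothing to record
        if last_good is not None:
            # Delta always goes back to the last successful poll.
            deltas.append((last_good[0], ts, value - last_good[1]))
        last_good = (ts, value)
    return deltas


# Hypothetical data: a steady 1000 units per 300 s, with the poll at t=600 missed.
samples = [(0, 1000), (300, 2000), (600, None), (900, 4000), (1200, 5000)]
deltas = counter_deltas(samples)
total = sum(delta for _, _, delta in deltas)
print(deltas)
print(total)
```

In this example the poll after the miss carries a delta of 2000, double the usual 1000 (the spike on a Trend report), while the total of 4000 equals the difference between the last and first counter values, so nothing is lost over a longer period.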


Gauges use an instantaneous reading, weighted by time. The value recorded is (gauge reading) * (delta time).  After one or more missed polls, the reading is therefore applied to the whole interval since the last successful poll. For example, if the reading is 60%, that 60% is also applied to the missed poll cycles.  It is not exact, but there is no missing data, and once normalized the impact is trivial.
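
The weighting can be sketched in a few lines of Python. This is only an illustration of the (gauge reading) * (delta time) formula above: the function name and the use of seconds as the time unit are assumptions, not eHealth internals.

```python
def weighted_gauge(reading, last_good_ts, poll_ts):
    """Return (weighted_value, delta_time_s) for one successful gauge poll.

    The reading is weighted by the time elapsed since the last
    successful poll, so it also covers any missed poll cycles.
    """
    delta_time = poll_ts - last_good_ts
    return reading * delta_time, delta_time


# Hypothetical case: last good poll at t=300 s, one poll missed at t=600 s,
# next good poll at t=900 s reads 60%.
weighted, delta_time = weighted_gauge(60.0, 300, 900)
normalized = weighted / delta_time  # value per unit of time after normalization
print(weighted, delta_time, normalized)
```

Here the 60% reading is weighted over 600 seconds instead of 300, which is how the missed cycle is covered by the next good reading.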

This assumes a single missed poll among many good polls, and not the first or last poll in a series.  How accurate the result is depends on what actually happened on the device during the missed polls.



Related Issues/Questions:
What happens on reports for a missed poll
How do rollups handle missed polls


Missing poll data on reports


Problem Environment:

eHealth


Additional Information:
Rollups of counters are not affected, because the missed interval simply becomes part of a larger delta over the rollup period.


Rollups of gauges will be an average of all the individual gauge values.  Any missed polls will be backfilled by the next good poll with its current gauge value, and that value will be used to cover as many gaps as necessary, per the weighted value method explained above.
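
A minimal sketch of the gauge backfill and rollup average, assuming one reading per poll cycle and equal-length cycles. The function names and sample readings are hypothetical; they only illustrate the behavior described above.

```python
def backfill_gauge(readings):
    """Fill missed polls (None) with the reading of the next good poll."""
    filled = list(readings)
    next_good = None
    # Walk backwards so each gap picks up the next successful reading.
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_good  # stays None if no later good poll exists
        else:
            next_good = filled[i]
    return filled


def rollup_average(readings):
    """Average all cycles after backfilling; None if there is no data."""
    values = [v for v in backfill_gauge(readings) if v is not None]
    return sum(values) / len(values) if values else None


# Hypothetical CPU utilization readings (%), with the second poll missed.
readings = [40.0, None, 60.0, 50.0]
filled = backfill_gauge(readings)
average = rollup_average(readings)
print(filled)
print(average)
```

With these readings the missed cycle takes the 60% value of the next good poll, so the rollup averages four cycles instead of three.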


A more detailed breakdown of possible scenarios:


1) Missed polls and then good poll.


  Poll 1 - Good
  Poll 2 - Missed
  Poll 3 - Missed
  Poll 4 - Good


Assume poll 1 is the last time eHealth wrote to the database, and that the polling interval is 5 minutes.


At poll 2, eHealth cannot poll the agent, so nothing is written. The counters in the agent keep increasing and the gauges hold whatever value they have at that moment, but eHealth cannot read them.


At poll 3, same thing.


At poll 4, the poll finally succeeds. The delta time is 15 minutes (three 5-minute polls). The counters (bytes, etc.) use the delta between the last known counter value at poll 1 and the current poll. The gauges are whatever the agent reports at the moment of poll 4, which probably does not represent the value over the last 15 minutes. Gauges have some sort of defined behavior in the agent (either instantaneous at poll time or a rolling average over the past 1, 5, or 10 minutes), but eHealth has no way of knowing what the gauge values were at poll 2 and poll 3.


The trend line will show a single 15-minute segment, with no breaks in the data.


**************************************


2) Missed polls, reboot, good poll


  Poll 1 - Good
  Poll 2 - Missed
  Reboot happens
  Poll 3 - Missed
  Poll 4 - Good
  Poll 5 - Good


In this scenario, the eHealth poller behaves the same way at polls 2 and 3. It won't detect the reboot of the agent until poll 4. At that point, though, the RMON standard says that the state of the agent is unknown, so eHealth can't delta back to poll 1. Instead, eHealth discards the data collected and sets a new baseline at poll 4. It records that a reboot occurred before poll 4 and that two polls were missed.


At poll 5, we can delta, but only back to poll 4. The delta time is 5 minutes. The counter value is the difference from poll 4 to poll 5. Gauges will be whatever value they were at poll 5.


The trend line will be broken from poll 1 to poll 4, and a single 5-minute segment will run from poll 4 to poll 5.
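
Both scenarios can be sketched as a small loop that tracks the last usable baseline. This is an illustration only: the Poll class, the function name, and the sample values are hypothetical, and detecting the reboot by an uptime value that goes backwards (similar in spirit to SNMP sysUpTime) is an assumption, since the article does not say how eHealth detects it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Poll:
    minute: int             # time of the poll attempt
    counter: Optional[int]  # None means the poll was missed
    uptime: Optional[int]   # agent uptime in minutes (assumed reboot signal)


def trend_segments(polls):
    """Return (start_minute, end_minute, counter_delta) per recorded interval."""
    segments = []
    baseline = None  # last good poll that can be used for a delta
    for poll in polls:
        if poll.counter is None:
            continue  # missed poll: nothing is written
        if baseline is not None:
            rebooted = poll.uptime < baseline.uptime  # uptime went backwards
            if not rebooted:
                segments.append(
                    (baseline.minute, poll.minute, poll.counter - baseline.counter)
                )
            # After a reboot no segment is recorded: the trend line is broken.
        baseline = poll  # this good poll becomes the new baseline either way
    return segments


# Scenario 1: good, missed, missed, good.
scenario_1 = [Poll(0, 1000, 100), Poll(5, None, None),
              Poll(10, None, None), Poll(15, 4000, 115)]
# Scenario 2: good, missed, (reboot), missed, good, good.
scenario_2 = [Poll(0, 1000, 100), Poll(5, None, None), Poll(10, None, None),
              Poll(15, 200, 7), Poll(20, 1200, 12)]

print(trend_segments(scenario_1))
print(trend_segments(scenario_2))
```

Scenario 1 yields one unbroken 15-minute segment from poll 1 to poll 4. Scenario 2 yields no segment between poll 1 and poll 4 and a single 5-minute segment from poll 4 to poll 5.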
