How the eHealth poller detects and treats a 32-bit counter wrapping or decrementing

Document ID : KB000023676
Last Modified Date : 14/02/2018

Question:

How does the eHealth poller detect and treat a 32-bit counter that wraps or decrements?

 

Answer:

Think of the 32-bit counter as a circle. Put 0 and 4294967295 next to each other at the top of the circle, then walk your way around the circle clockwise, taking a delta after every read. The poller reads a value from the counter and stores it in memory. 5 minutes later it reads a second value, subtracts the first value from it, and writes the delta.
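The circle picture amounts to subtraction modulo 2^32. Below is a minimal Python sketch, using a hypothetical helper name `clockwise_delta` (an illustration, not eHealth's code), and assuming the counter only counts up and wrapped at most once between reads:

```python
# A 32-bit counter lives in the range 0 .. 2**32 - 1, so 0 and 4294967295
# sit side by side on the circle.
COUNTER_MODULUS = 2 ** 32  # 4294967296

def clockwise_delta(previous: int, current: int) -> int:
    """Return how far the counter advanced clockwise between two reads.

    Assumes the counter only counts up and wrapped at most once.
    Python's % with a positive modulus always returns a non-negative result.
    """
    return (current - previous) % COUNTER_MODULUS

# Normal case: no wrap between reads.
print(clockwise_delta(1_000, 5_000))        # 4000
# Wrap case: the counter passed 4294967295 and restarted at 0.
print(clockwise_delta(4_294_967_290, 10))   # 16
```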

 

Only the delta is stored in the database. Some counters handled this way are ifInOctets, ifOutOctets, ifInUcastPkts, and ifInNUcastPkts.

 

If the poller observes that the counter's new value is lower than the previous value, it adds 2^32 (about 4.29 billion) to the negative difference. It then checks whether the resulting delta is less than 2^31 (about 2.15 billion). If it is, the poller uses the value. If it is not, the poller logs a large delta error in messages.stats.log, showing both the old value and the current value, and drops the poll.
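The rule above can be sketched in Python as follows. The function name `compute_delta`, the constants, and the log wording are hypothetical stand-ins, not the poller's actual implementation. As in the description, the 2^31 check is applied only when the new value is lower than the old one.

```python
import logging
from typing import Optional

COUNTER_MODULUS = 2 ** 32   # added when the new value is lower than the old one
MAX_VALID_DELTA = 2 ** 31   # a wrapped delta must be below this to be accepted

log = logging.getLogger("poller_sketch")  # hypothetical logger name

def compute_delta(old: int, current: int) -> Optional[int]:
    """Return the delta to store, or None when the poll should be dropped."""
    delta = current - old
    if delta < 0:
        # The value went down: assume the counter wrapped and add 2**32.
        delta += COUNTER_MODULUS
        if delta >= MAX_VALID_DELTA:
            # Too large to be a plausible wrap: report it and drop the poll.
            log.warning("large delta %d (old value %d, current value %d); "
                        "dropping poll", delta, old, current)
            return None
    return delta

print(compute_delta(4_294_967_200, 100))      # 196 (a genuine wrap)
print(compute_delta(33_830_740, 33_830_719))  # None (logged and dropped)
```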

 

In most cases, if you see that error, you will notice that the new value is only a few counts (or just one) smaller than the old value. If that is the case, the counter really is decrementing, and the problem is for the device vendor to fix.
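Because the logged delta is the current value minus the old value plus 2^32, subtracting it from 2^32 recovers how far the counter went backwards. A small sketch with a hypothetical helper, checked against the Delta values in the example log entries in this article:

```python
COUNTER_MODULUS = 2 ** 32

def decrement_from_logged_delta(delta: int) -> int:
    """Given the 'Delta is ...' value from a large delta error, return how
    many counts the counter went backwards (old value - current value).

    Works because the logged delta was computed as current - old + 2**32.
    """
    return COUNTER_MODULUS - delta

print(decrement_from_logged_delta(4_294_967_275))  # 21
print(decrement_from_logged_delta(4_294_967_293))  # 3
```

A small result points to a decrementing counter rather than a wrap.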

 

If the delta is somewhere between 2 billion and 3 billion, it is possible that the port simply requires fast polling to keep up (this is true only on 100 Mbps and 1 Gbps links). Fast polling should not be required on ports running at 10 Mbps or less.
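A rough calculation shows why link speed matters. This sketch assumes an octet counter (such as ifInOctets) on a link running at full line rate for the whole 5-minute poll. The helper names are hypothetical.

```python
HALF_RANGE = 2 ** 31     # a wrapped delta must stay below this
FULL_RANGE = 2 ** 32     # one full trip around the 32-bit counter
POLL_INTERVAL_S = 300    # the 5-minute poll

def max_octets_per_poll(link_bps: int, interval_s: int = POLL_INTERVAL_S) -> int:
    """Octets an interface can count in one poll interval at full line rate."""
    return link_bps // 8 * interval_s

def seconds_to_half_range(link_bps: int) -> float:
    """Seconds at full line rate before an octet delta reaches 2**31."""
    return HALF_RANGE / (link_bps / 8)

for name, bps in [("10 Mbps", 10_000_000),
                  ("100 Mbps", 100_000_000),
                  ("1 Gbps", 1_000_000_000)]:
    octets = max_octets_per_poll(bps)
    print(f"{name}: up to {octets:,} octets per poll; "
          f">= 2**31: {octets >= HALF_RANGE}; >= 2**32: {octets >= FULL_RANGE}; "
          f"2**31 reached after {seconds_to_half_range(bps):.0f} s")
```

At 10 Mbps a 5-minute poll counts at most 375,000,000 octets, well under 2^31. At 100 Mbps it can reach 3,750,000,000, above 2^31 but below 2^32. At 1 Gbps it can reach 37,500,000,000, more than a full trip around the counter, so the counter can wrap several times between 5-minute polls.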

 

Example showing a decrementing counter:

(This one is the fault of the MIB, no doubt.)

Pgm nhiPoller[Net]: Received large delta from 'BR030-Thewet-Ethernet0/0'. Poll is dropped (OID in error is ifOutUcastPkts. Delta is 4294967275. Old value is 33830740. Current value is 33830719.). <<<< Counter decremented by 21 !!!

Pgm nhiPoller[Net]: Received large delta from 'BR030-Thewet-Ethernet0/0'. Poll is dropped (OID in error is ifOutUcastPkts. Delta is 4294967293. Old value is 33830719. Current value is 33830716.). <<<< Counter decremented by 3 !!!!

 

Note the "Old value" and "Current value" entries. These are the actual counter values read from the MIB object for that variable, not deltas. To confirm the problem, a customer can manually walk the MIB and verify that the counter is decrementing.
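Before walking the MIB, one way to screen messages.stats.log for this pattern is to pull the old and current values out of each large delta message. This Python sketch assumes the message wording matches the examples above exactly. The regular expression and the `find_decrements` helper are hypothetical and may need adjusting for other eHealth versions.

```python
import re

# Matches the tail of the large delta messages shown in the examples above.
LARGE_DELTA_RE = re.compile(
    r"OID in error is (?P<oid>\w+)\. Delta is (?P<delta>\d+)\. "
    r"Old value is (?P<old>\d+)\. Current value is (?P<current>\d+)\."
)

def find_decrements(lines):
    """Yield (oid, old, current, decrement) for each large delta message
    whose current value is below its old value."""
    for line in lines:
        m = LARGE_DELTA_RE.search(line)
        if not m:
            continue
        old, current = int(m["old"]), int(m["current"])
        if current < old:
            yield m["oid"], old, current, old - current

sample = [
    "Pgm nhiPoller[Net]: Received large delta from 'BR030-Thewet-Ethernet0/0'. "
    "Poll is dropped (OID in error is ifOutUcastPkts. Delta is 4294967275. "
    "Old value is 33830740. Current value is 33830719.).",
]
for oid, old, current, dec in find_decrements(sample):
    print(f"{oid}: {old} -> {current} (decremented by {dec})")
```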

 

The oversimplified example below uses a counter that only counts from 0 to 39 (you can scale it up to 0 through 4294967295 if you like).

The counter is reading 20.

eHealth initializes and starts polling.....

Poll1 - We read the counter (let's say it's 20) and store 20 in memory. No delta is written to the DB; there is not enough data yet.

Poll2 - We read the counter (let's say it's 30), store 30 in memory, and do the math: 30 - 20 = 10. Store 10 in the DB.

Poll3 - We read the counter (this time it is 1; the counter has wrapped past 0) and store 1 in memory. Because the current value is smaller than the previous value, we add the counter range, 40 (one more than the maximum value of 39): 1 - 30 + 40 = 11. Check that the delta is less than half the counter range (20); 11 passes, so store 11 in the DB.

Poll4 - We read the counter (now it is 15), store 15 in memory, and do the math: 15 - 1 = 14. Store 14 in the DB.
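The Poll1 to Poll4 walkthrough can be replayed with a short Python sketch. The toy counter range of 40 and the `poll_sequence` helper are illustrative assumptions, not eHealth code. Swapping in 2**32 and 2**31 gives the real 32-bit constants.

```python
COUNTER_MODULUS = 40                # toy counter counts 0..39, then wraps to 0
HALF_RANGE = COUNTER_MODULUS // 2   # 20, the "half the counter" check

def poll_sequence(readings):
    """Return the delta stored for each poll, or None when nothing is stored."""
    stored = []
    previous = None
    for current in readings:
        if previous is None:
            stored.append(None)             # first poll: not enough data yet
        else:
            delta = current - previous
            if delta < 0:
                delta += COUNTER_MODULUS    # counter wrapped past 0
                if delta >= HALF_RANGE:
                    delta = None            # would be logged and discarded
            stored.append(delta)
        previous = current                  # the new reading is kept in memory
    return stored

print(poll_sequence([20, 30, 1, 15]))   # [None, 10, 11, 14]
```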

 

Decrementing:

Now suppose Poll3 had actually returned 29, as would happen if the MIB had a bug and the counter was decrementing. It would look like this:

Poll3 - We read the counter (it's 29) and store 29 in memory. Because the current value is smaller than the previous value (30), we add the counter range, 40: 29 - 30 + 40 = 39. Check that the delta is less than half the counter range: 39 is greater than 20, so log an error to messages.stats.log and discard the delta.
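The same toy arithmetic, applied to a single poll, separates the genuine wrap from the decrement. This is again a hypothetical illustration: `check_poll` returns the computed delta and whether it passes the half-range check.

```python
COUNTER_MODULUS = 40   # toy counter: 0..39
HALF_RANGE = 20        # "half the counter"

def check_poll(previous: int, current: int):
    """Return (delta, accepted) for one poll of the toy counter."""
    delta = current - previous
    if delta >= 0:
        return delta, True              # counter moved forward normally
    delta += COUNTER_MODULUS            # value went down: treat as a wrap
    return delta, delta < HALF_RANGE    # reject implausibly large wraps

print(check_poll(30, 1))    # (11, True): genuine wrap, 11 is stored
print(check_poll(30, 29))   # (39, False): decrement, logged and discarded
```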

 

If a poll is missed and the next poll (at 600 seconds) sees that sysUpTime has not reset, the poller applies the same math to the new counter values and generates a single delta covering the 600 seconds.
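A sketch of the missed-poll case follows. It assumes the poller has the sysUpTime value from each poll (sysUpTime is reported in hundredths of a second). The function name and the choice to discard the delta on an uptime reset are hypothetical; the article does not say what the poller does in that case.

```python
from typing import Optional, Tuple

COUNTER_MODULUS = 2 ** 32
HALF_RANGE = 2 ** 31

def delta_across_gap(old_uptime: int, new_uptime: int,
                     old_value: int, new_value: int) -> Optional[Tuple[int, float]]:
    """Return (delta, seconds covered) for two polls with a missed poll between
    them, or None if sysUpTime did not advance (e.g. the device restarted) or
    the delta fails the 2**31 check. Ignores sysUpTime's own 32-bit wrap."""
    if new_uptime <= old_uptime:
        return None                       # hypothetical: discard on uptime reset
    seconds = (new_uptime - old_uptime) / 100
    delta = new_value - old_value
    if delta < 0:
        delta += COUNTER_MODULUS          # same wrap math as a normal poll
        if delta >= HALF_RANGE:
            return None
    return delta, seconds

# Two polls 600 s apart (the 300 s poll in between was missed).
print(delta_across_gap(1_000_000, 1_060_000, 5_000, 905_000))  # (900000, 600.0)
```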

 

Additional Information:

Related topic: 64-bit counters effectively never wrap at real-world link speeds.
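To put the 64-bit statement in perspective, here is a rough calculation, assuming an octet counter on a link running at a sustained 10 Gbps. `seconds_to_wrap` is a hypothetical helper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def seconds_to_wrap(counter_bits: int, link_bps: float) -> float:
    """Seconds an octet counter of the given width needs to make one full
    trip around its range at a sustained line rate."""
    return 2 ** counter_bits / (link_bps / 8)

for bits in (32, 64):
    secs = seconds_to_wrap(bits, 10e9)  # 10 Gbps link
    print(f"{bits}-bit: {secs:,.1f} s ({secs / SECONDS_PER_YEAR:.2e} years)")
```

A 32-bit octet counter wraps in under 4 seconds at that rate, while a 64-bit counter needs several hundred years.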