How does Statistics Rollup work in eHealth?

Document ID : KB000023397
Last Modified Date : 14/02/2018


As eHealth collects and stores data, the size of the database increases. To minimize the amount of disk space that the database uses, eHealth summarizes the data by rolling it up into hourly, daily, and weekly samples, each kept for a specified number of days or weeks. By default, eHealth keeps "As Polled" data for 2 days, "1 Hour Samples" for 6 weeks, and "1 Day Samples" for 70 weeks. These default values can be modified via the Console --> Scheduled Jobs --> Statistics Rollup --> Modify. When data ages beyond the retention period of the last sample type, eHealth removes it from the database.

The statistics rollup runs at 8:00 P.M. by default. When the rollup runs, eHealth retains the previous two days of as-polled data in addition to the current day. You can change the time of day at which the rollup runs, the number of days to retain as-polled data, and the number of days or weeks to retain data summarized hourly and daily. The Statistics Rollup scheduled job logs its information in the Statistics_rollups.<jobId>.log file, located in the $NH_HOME/log directory of your eHealth installation.

Stats tables

nh_stats0 = as-polled data in hourly tables: 24 tables per day, each containing one row per element for every poll (every 5 minutes).
nh_stats1 = hourly samples in daily tables: the contents of the 24 nh_stats0 tables are rolled up into one nh_stats1 table, with one row per element for each nh_stats0 table (that is, per hour).
nh_stats2 = daily samples in weekly tables: the contents of 7 nh_stats1 tables are rolled up into one nh_stats2 table, with one row per element for each nh_stats1 table (that is, per day).

Basic algorithm

Hourly rollups:
For every element, find all rows whose sample times fall between the beginning and end of the hour.
Add the rows together by summing each column. Note that even gauge columns are summed. This is acceptable because the poller multiplies each gauge value by deltaTime when it writes the data to the database, and reports divide gauges by deltaTime when they read the row.
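
The hourly rollup steps above can be sketched in Python as follows. This is a minimal illustration, not eHealth's actual implementation: the row layout, the hourly_rollup, read_gauge, and make_row helpers, and the column names (delta_time, cpu_x_dt) are all hypothetical, and treating a sample that lands exactly on the hour as part of the new hour is an assumption.

```python
from collections import defaultdict
from datetime import datetime


def hourly_rollup(rows):
    """Sum as-polled rows into one row per (element, hour).

    Each row is a dict with hypothetical keys: "element_id", "sample_time"
    (a datetime), and "values" (column name -> number). Gauge columns are
    assumed to be stored already multiplied by delta_time.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        # Truncate the sample time to the start of its hour.
        hour_start = row["sample_time"].replace(minute=0, second=0, microsecond=0)
        bucket = totals[(row["element_id"], hour_start)]
        # Sum every column, gauges included.
        for column, value in row["values"].items():
            bucket[column] += value
    return {key: dict(columns) for key, columns in totals.items()}


def read_gauge(rolled_row, column):
    """Recover a gauge the way a report would: divide by delta_time."""
    return rolled_row[column] / rolled_row["delta_time"]


def make_row(element_id, sample_time, gauge_percent, delta_time=300):
    """Build an as-polled row with the gauge pre-multiplied by delta_time."""
    return {
        "element_id": element_id,
        "sample_time": sample_time,
        "values": {"delta_time": delta_time, "cpu_x_dt": gauge_percent * delta_time},
    }


rows = [
    make_row("router-1", datetime(2018, 2, 14, 10, 5), 50),
    make_row("router-1", datetime(2018, 2, 14, 10, 10), 70),
    make_row("router-1", datetime(2018, 2, 14, 10, 15), 60),
    make_row("router-1", datetime(2018, 2, 14, 11, 5), 90),
]
rolled = hourly_rollup(rows)
ten_oclock = rolled[("router-1", datetime(2018, 2, 14, 10, 0))]
print(read_gauge(ten_oclock, "cpu_x_dt"))  # 60.0
```

Because each gauge is stored as value times deltaTime, dividing the summed column by the summed deltaTime yields a time-weighted average, so polls with unequal intervals are weighted correctly.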

Daily rollups:
Do the same as for hourly rollups, but look for data between the beginning and the end of the day.

The only artificiality of rollups is that the data is smoothed. If you see 20 minutes of polls at 80% utilization and the remaining 40 minutes at 20%, the rolled-up hourly average is 40% utilization ((20 x 80 + 40 x 20) / 60).

Here is where you could potentially see "lost data". If you are reporting on data that has been rolled up into daily samples and have requested the report for 9-5, we can no longer tell which data belonged to those hours. All we have is a single data point, a daily average, so we prorate the data (for an 8-hour window, take 1/3 of the daily value).
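
The arithmetic behind the smoothing example and the prorating step can be checked with a short Python sketch. The function names are hypothetical, and the proration simply scales a daily value by the fraction of the day requested, which is an assumption about how the 1/3 factor is applied rather than a description of eHealth internals.

```python
def time_weighted_average(segments):
    """Average utilization over (minutes, percent) segments, weighted by time."""
    total_minutes = sum(minutes for minutes, _ in segments)
    weighted_sum = sum(minutes * percent for minutes, percent in segments)
    return weighted_sum / total_minutes


def prorate_daily(daily_value, window_hours):
    """Scale a daily rolled-up value to a window of the given length in hours."""
    if not 0 <= window_hours <= 24:
        raise ValueError("window_hours must be between 0 and 24")
    return daily_value * window_hours / 24


# 20 minutes at 80% and 40 minutes at 20% smooth to a single hourly value.
print(time_weighted_average([(20, 80.0), (40, 20.0)]))  # 40.0

# A 9-to-5 report covers 8 of 24 hours, so it receives 1/3 of the daily value.
print(prorate_daily(2400.0, 8))  # 800.0
```

Because the proration depends only on the length of the window, a 9-to-5 window and a midnight-to-8 window receive the same estimate, which is why the hour-by-hour detail counts as lost once data has been rolled up to daily samples.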