After a great deal of group deletion and recreation, we are now observing scheduled Health report failures.
The failures reference errors like:
Warning: Job 1360612939@1000038 'Health' is incorrectly defined (invalid group '1427721803@1000156').
Warning: Job 1360612939@1000039 'Health' is incorrectly defined (invalid group '1427721803@1000157').
Warning: Job 1360612939@1000040 'Health' is incorrectly defined (invalid group '1427721803@1000155').
1360612939 is the RFE machine ID.
1000038 is the RFE-based Health report (job) ID of one of the failing reports.
1427721803 is the machine ID of the BE Poller that hosts the group.
1000156 is the group ID from the BE Poller.
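
The ID breakdown above can be sketched as a small parser, which may help when many such warnings have to be sorted through. This is only an illustration: the pattern is inferred from the three sample messages shown here, and the names InvalidGroupWarning and parse_invalid_group_warning are hypothetical, not part of eHealth.

```python
import re
from typing import NamedTuple, Optional


class InvalidGroupWarning(NamedTuple):
    """Fields of one 'invalid group' warning (hypothetical helper type)."""
    job_machine_id: int    # e.g. the RFE machine ID
    job_id: int            # ID of the scheduled Health report job
    job_name: str          # e.g. 'Health'
    group_machine_id: int  # machine ID of the system hosting the group
    group_id: int          # group ID on that system


# Pattern inferred from the sample messages only; other message
# variants may exist and would not match.
_PATTERN = re.compile(
    r"Job (\d+)@(\d+) '([^']*)' is incorrectly defined "
    r"\(invalid group '(\d+)@(\d+)'\)"
)


def parse_invalid_group_warning(line: str) -> Optional[InvalidGroupWarning]:
    """Return the parsed IDs, or None if the line is not such a warning."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    return InvalidGroupWarning(
        job_machine_id=int(match.group(1)),
        job_id=int(match.group(2)),
        job_name=match.group(3),
        group_machine_id=int(match.group(4)),
        group_id=int(match.group(5)),
    )
```

Passing the first warning above to parse_invalid_group_warning would yield job_machine_id 1360612939, job_id 1000038, group_machine_id 1427721803 and group_id 1000156.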
Normally these errors occur because the group ID changed when the group was deleted and then recreated. In this case, however, the group ID is missing from the nh_group tables on both the RFE and the BE Poller. Why?
It is because the job ID belongs to the deleted Health report, not the newly recreated one. Why, then, do we not see the job ID for those reports in the nh_schedule DB table on the RFE or BE Poller, yet still see errors related to it in the eHealth system messages log?
The problem is that the deleted job is still known to the Data Analysis job. The Data Analysis job learns of report jobs from the DB table that lists its queue of report jobs requiring analysis, and the deleted job's ID is still present there.
If we examine the nh_da_queue DB table on the RFE and BE Poller(s) involved, we will see the old scheduled report's ID still listed for each deleted scheduled report. It should have been removed from the table when the job was deleted.
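
The check described above amounts to finding job IDs that are still in nh_da_queue but no longer in nh_schedule. A minimal sketch follows, assuming the job IDs from each table have already been retrieved by whatever query method is in use. The column names of those tables are not given here, so no SQL is shown, and the function name find_orphaned_queue_jobs is hypothetical.

```python
from typing import Iterable, List


def find_orphaned_queue_jobs(
    scheduled_job_ids: Iterable[int],
    queued_job_ids: Iterable[int],
) -> List[int]:
    """Return job IDs that are queued for Data Analysis but no longer scheduled.

    scheduled_job_ids: job IDs read from nh_schedule (retrieval not shown).
    queued_job_ids:    job IDs read from nh_da_queue (retrieval not shown).
    """
    scheduled = set(scheduled_job_ids)
    # Deduplicate and sort so the result is stable and easy to review.
    return sorted({job_id for job_id in queued_job_ids if job_id not in scheduled})
```

For example, if nh_da_queue still held 1000038, 1000039 and 1000040 while nh_schedule held only the IDs of the recreated reports, all three old IDs would be returned as orphaned entries.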