Invalid group ID error thrown by Health report ID that does not exist

Document ID : KB000005347
Last Modified Date : 14/02/2018
Show Technical Document Details
Issue:

After recreating some groups they now have new IDs in the database. As a result any scheduled reports that use that group need to be fixed. Normally to do so we would either set the existing report to use the new group, or delete and recreate the report against the new group.

Environment:
eHealth 6.3.2.x
Cause:

After a great deal of group deletion/recreation we now are observing scheduled Health report failures.

 

The failures reference errors like: 

Warning: Job 1360612939@1000038 'Health' is incorrectly defined (invalid group '1427721803@1000156'). 

Warning: Job 1360612939@1000039 'Health' is incorrectly defined (invalid group '1427721803@1000157'). 

Warning: Job 1360612939@1000040 'Health' is incorrectly defined (invalid group '1427721803@1000155'). 

 

1360612939 is the RFE machine ID. 

1000038 is the RFE based Health report ID for one report among others failing. 

1427721803 is a BE Poller machine ID that hosts the group. 

1000156 is the group ID from the BE Poller. 

 

Normally these errors are due to the group ID changing due to the group being deleted and then recreated. In this case the group ID is missing from the nh_group tables on RFE and BE Poller. Why? 

It is because the Job ID is for the deleted Health report not the new recreated one. Why do we not see the Job ID for those reports in the nh_schedule DB table on RFE or BE Poller, and still see errors related to it in the eHealth system messages log? 

The problem is the job being known to the Data Analysis job. This is based on the ID of the job being present in the DB table that lists the Data Analysis queue of report jobs requiring analysis.

If we examine, in the RFE and BE Poller(s) involved, the nh_da_queue DB table we will see the old scheduled reports ID value listed for the deleted scheduled reports. It should have been removed from the table when the job was deleted.

Resolution:

To fix this we must manually delete the old scheduled report job ID value from the nh_da_queue DB table on both RFE and BE Poller servers. Deleting it from only one server or the other will not fully resolve this. It must be removed from the nh_da_queue table on all servers involved. To resolve this:

 

1 - Back up the nh_da_queue table as follows: 

create table nh_da_queue_bak as select * from nh_da_queue; 

2 - Remove the errant old jobs based on job_id using: 

delete from nh_da_queue WHERE JOB_ID = <JobID>; 

 

After removing the bad entries from the database, the next time the Data Analysis job is run the these errors should be resolved.

Additional Information:
  • If there are multiple JOB_ID values to remove, they can be done together using the following:
    • delete from nh_da_queue WHERE JOB_ID = <JobID> or <JobID> or <JobID>;
    • For each instance of <JobID> replace it with the real ID to be deleted

 

Here is an example set of steps to restore the data from the backup table created if necessary.

1: First confirm what is in the table using: select * from nh_service_profile_backup where SERVICE_PROFILE_ID = 10000; 

2: The formatting is possibly off, but we should see something like this:

MACHINE_ID SERVICE_PROFILE_ID 

---------- ------------------ 

FILENAME 

-------------------------------------------------------------------------------- 

NAME 

-------------------------------------------------------------------------------- 

CREATE_TIME MODIFY_TIME EXPIRE_TIME 

----------- ----------- ----------- 

1 10000 

standard.1.10000.dac 

standard 

1468538465 1468538465 2147483647 

3: Assuming the info returned is correct we'd run this to insert it back into the normal DB tables: 

insert into nh_service_profile values(1,10000,'standard.1.10000.dac','standard',1468538465,1468538465,2147483647);