Workload Manager (WLM) records the date and time at which it last received checkpoint information (e.g. jobinit/jobterm) from each Workload Agent (WLA) machine. To display this information, run the following command on the WLM:
cautil status remote
If the checkpoint date and time for the offending WLA machine are ahead of the local (WLM) system date and time, WLM will not process any further checkpoint requests from that WLA until the local (WLM) date and time reach the date and time displayed for that WLA.
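The comparison described above can be sketched in Python. This is only an illustration and not part of the product: the function names `checkpoint_lag` and `is_blocked` and the timestamp format are assumptions, and the checkpoint time has to be copied by hand from the `cautil status remote` output, whose exact layout is not reproduced here.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout; adjust it to match what your WLM actually displays.
ASSUMED_FORMAT = "%Y-%m-%d %H:%M:%S"


def checkpoint_lag(checkpoint: str, local_now: str, fmt: str = ASSUMED_FORMAT) -> timedelta:
    """Return how far the WLA checkpoint time is ahead of the WLM clock.

    A positive result means the checkpoint lies in the future relative to WLM.
    """
    return datetime.strptime(checkpoint, fmt) - datetime.strptime(local_now, fmt)


def is_blocked(checkpoint: str, local_now: str, fmt: str = ASSUMED_FORMAT) -> bool:
    """True when the checkpoint time is ahead of the local WLM time."""
    return checkpoint_lag(checkpoint, local_now, fmt) > timedelta(0)


if __name__ == "__main__":
    # Hypothetical values: checkpoint stamped one day ahead of the WLM clock.
    print(is_blocked("2024-03-02 10:00:00", "2024-03-01 10:00:00"))
    print(checkpoint_lag("2024-03-02 10:00:00", "2024-03-01 10:00:00"))
```

A positive lag is roughly how long WLM would keep ignoring checkpoint requests from that WLA if nothing were done, which is why the procedure below clears the stored checkpoint rather than waiting.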
To correct the problem, please perform the following procedure:
- On WLA, stop Unicenter:
unicntrl stop all
- On WLA, remove or rename the checkpoint file:
- On WLM:
cautil delete remote id=WLA_hostname
cautil status remote
(make sure that the WLA is no longer listed in the output)
- On WLA, restart Unicenter:
unicntrl start all
- Submit a job from WLM to that WLA to verify that the job status is updated successfully on WLM.
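The check in the WLM step (confirming that the WLA no longer appears in the `cautil status remote` output) can be done by eye, or with a small helper such as the sketch below. It assumes the output has been saved as text and that the WLA hostname appears as its own whitespace-separated token; both are assumptions, and `wla_still_listed` is a hypothetical name, not a Unicenter utility.

```python
def wla_still_listed(status_output: str, wla_hostname: str) -> bool:
    """Return True if the hostname appears as a whitespace-separated token.

    The comparison is case-insensitive and matches whole tokens only, so
    "wla0" does not match a line that mentions "wla01".
    """
    target = wla_hostname.lower()
    for line in status_output.splitlines():
        if target in (token.lower() for token in line.split()):
            return True
    return False


if __name__ == "__main__":
    # Placeholder text for illustration only; this is NOT real cautil output.
    sample = "wla01 some-status-text\nwla02 some-status-text"
    print(wla_still_listed(sample, "wla02"))
    print(wla_still_listed(sample, "wla03"))
```

If the helper still reports the WLA as listed after the delete, repeat the WLM step before restarting Unicenter on the WLA.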