All Agent Jobs abort with Launch Error status

Document ID : KB000084864
Last Modified Date : 28/08/2018
Issue:
Error Message :
N/A

Launch Errors that occur for all Jobs running on a specific Agent will generally produce an error in that Agent's AgentService<timestamp>.logs.
If Launch Errors occur without any errors being logged, this could indicate an issue at the OS level.

In addition to investigating the possible causes listed under Resolution, if an Automic Support case is opened, the following information is required for further review:
  • Debugged RmiServer<timestamp>.logs covering a period of an hour before to an hour after the first Launch Error.
  • Debugged AgentService<timestamp>.logs covering a period of an hour before to an hour after the first Launch Error. This should be from the specific Agent where Launch Errors are occurring.
  • A screenshot of the Backlog/History showing the first Jobs to go to Launch Error. The full Job Name, timestamps, and Job ID should be visible.
Information on how to enable debug on the RMI (master) and Agent can be found at the link below:

Applications Manager Debug Matrix
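
The collection window described above (an hour before to an hour after the first Launch Error) can be worked out with a short script. This is a minimal sketch, not part of Applications Manager: the function names are hypothetical, and the start and end times of each log file are assumed to be known already, for example from the first and last entries in the file.

```python
from datetime import datetime, timedelta
from typing import Tuple


def collection_window(
    first_launch_error: datetime,
    margin: timedelta = timedelta(hours=1),
) -> Tuple[datetime, datetime]:
    """Return (start, end) of the period the logs should cover."""
    return first_launch_error - margin, first_launch_error + margin


def log_overlaps_window(
    log_start: datetime,
    log_end: datetime,
    window: Tuple[datetime, datetime],
) -> bool:
    """True if a log covering [log_start, log_end] overlaps the window."""
    window_start, window_end = window
    return log_start <= window_end and log_end >= window_start


# Example (hypothetical timestamp): first Launch Error seen at 10:30.
# window = collection_window(datetime(2018, 8, 28, 10, 30))
# Any RmiServer or AgentService log overlapping this window should be sent.
```

Any log file for which log_overlaps_window returns True falls inside the requested period.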

Resolution:
Check the following:
  • Whether the Agent server has sufficient disk space.
  • Whether any recent patch or security change is preventing the Agent from successfully running Jobs.
  • Whether any Java processes owned by the AM OS user remain after the Agent service is stopped. Stopping the Agent service should stop all of them, so a leftover Java process could indicate a stuck or hung process. Kill these processes before starting the Agent service.
  • Whether any of the Agent's files have been deleted (such as files in its exec directory).
  • Whether the Backlog has Jobs that are 4 or more virtual days old (these should be cleaned up).
  • If Jobs immediately go to Launch Error, check whether the RMI is completely up and running:
    • Check the RMI for a session ID.
    • Check that the RMI connection shows an OK status, and uncheck any row that is not an eligible RMI server.
    • If there is a failover RMI, check whether it is the active RMI.
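
Two of the checks above, free disk space and leftover Java processes, can be scripted on the Agent server. The following is a minimal sketch for a Unix-like host, not an Applications Manager tool: the function names are hypothetical, the AM OS user name must be supplied by the caller, and the process listing is assumed to come from `ps -eo user,pid,comm`.

```python
import shutil
import subprocess
from typing import List


def free_disk_gib(path: str) -> float:
    """Return free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 2**30


def list_processes() -> str:
    """Return `ps` output with one 'user pid command' row per process."""
    result = subprocess.run(
        ["ps", "-eo", "user,pid,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def leftover_java_pids(ps_output: str, os_user: str) -> List[int]:
    """Return PIDs of java processes owned by `os_user`.

    Note: some `ps` implementations truncate long user names or show a
    numeric UID instead, so `os_user` may need adjusting to match.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        user, pid, command = parts
        # The header row is skipped because its PID column is not numeric.
        if not pid.isdigit() or user != os_user:
            continue
        if command.strip().rsplit("/", 1)[-1] == "java":
            pids.append(int(pid))
    return pids


# Example usage after stopping the Agent service ("amuser" and the path
# are placeholders, not values from this article):
# print(free_disk_gib("/"))
# print(leftover_java_pids(list_processes(), "amuser"))
```

A non-empty list returned after the Agent service has been stopped points to Java processes that should be reviewed and killed before the service is started again.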