Autosys jobs are running in the background but their status remains "ST" (Starting)

Document ID : KB000113213
Last Modified Date : 12/09/2018
Issue:
All of the affected jobs run on the same System Agent.
The jobs start normally but never move to "Running" status.
Each job shows a status of "ST" (Starting), yet the application teams confirm that the jobs are running in the background.

 
Environment:

System Agent release 11.3 SP2 on Linux, although the issue may occur with other releases.
 
Cause:

Autosys is configured as a full High Availability system.

The primary scheduler failed over and the shadow scheduler took over.

During failover, the agentparm.txt file of the System Agent is updated automatically: the communication address of the primary scheduler is replaced by that of the shadow scheduler.

When the primary scheduler comes back up, the same process runs in reverse and the communication address of the shadow scheduler is replaced by that of the primary scheduler.

On this System Agent, someone had manually edited the agentparm.txt file. As a result, the first communication address still belonged to the shadow scheduler, and a second entry existed for the primary scheduler.

When the System Agent tried to send the job status to the scheduler, it attempted to communicate with the shadow scheduler, because that address was first in the list.

The shadow scheduler rejected the communication attempt with a NAK signal, so the job status was never updated and the job remained stuck in Starting status.
Resolution:

Clean up the communication addresses in the agentparm.txt file of the System Agent so that only the address of the primary scheduler, which is the active scheduler, remains.
The parameters to remove are:

communication.manageraddress_2
communication.managerhealthmon_2
communication.managerid_2
communication.managerport_2
communication.socket_2

Keep only the parameters for the active scheduler:

communication.manageraddress_1
communication.managerhealthmon_1
communication.managerid_1
communication.managerport_1
communication.socket_1
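
Before editing the file by hand, it can help to list which scheduler entries are currently defined. The following Python sketch is an illustration only, not a vendor tool. It assumes that agentparm.txt consists of key=value lines and that lines beginning with # are comments. The function names and the sample host names, manager ID, and port are hypothetical. It groups the communication parameters named above by their numeric suffix and reports any suffix other than 1.

```python
import re
from collections import defaultdict

# Matches the parameter names listed in this article, e.g.
# communication.manageraddress_2 -> base name plus numeric suffix.
_KEY = re.compile(
    r"^(communication\."
    r"(?:manageraddress|managerhealthmon|managerid|managerport|socket))"
    r"_(\d+)$"
)


def group_manager_entries(text):
    """Group scheduler communication parameters by numeric suffix.

    Assumes key=value lines; blank lines and '#' comment lines are skipped.
    """
    groups = defaultdict(dict)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        match = _KEY.match(key.strip())
        if match:
            groups[int(match.group(2))][match.group(1)] = value.strip()
    return dict(groups)


def extra_manager_indexes(text):
    """Return the suffixes other than 1, i.e. candidate entries to review."""
    return sorted(i for i in group_manager_entries(text) if i != 1)


# Hypothetical file content, for illustration only.
SAMPLE = """\
# sample values, not taken from a real system
communication.manageraddress_1=shadow.example.com
communication.managerid_1=SAMPLE_SCH
communication.managerport_1=7507
communication.manageraddress_2=primary.example.com
communication.managerid_2=SAMPLE_SCH
communication.managerport_2=7507
"""

entries = group_manager_entries(SAMPLE)
print("First address:", entries[1]["communication.manageraddress"])
print("Extra entries:", extra_manager_indexes(SAMPLE))
```

With this sample, the first address points at the shadow host and an extra entry with suffix 2 is reported, which mirrors the situation described in the Cause section. The script only reads the text; any change to agentparm.txt itself is still made manually, as described above.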