System Agent 11.3 build 765 is placed in manual offline for no apparent reason

Document ID : KB000047699
Last Modified Date : 14/02/2018

Issue: 

Because disk space fell below the critical threshold, the agent shut down and was placed in manual offline mode.

Later, once disk space is available again, the agent can start, but the scheduler does not bring it back online automatically.

A "sendevent -E MACH_ONLINE -N <node>" command has to be run to bring the machine back online.
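
As an illustration, the manual recovery step can be wrapped in a small shell function. This is only a sketch: the function name bring_online and the DRY_RUN switch are hypothetical and not part of the product. Only the sendevent command line itself comes from this article.

```shell
#!/bin/sh
# Sketch: send a MACH_ONLINE event for one machine.
# bring_online and DRY_RUN are hypothetical names used for illustration.

bring_online() {
    node="$1"
    if [ -z "$node" ]; then
        echo "usage: bring_online <node>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        printf '%s\n' "sendevent -E MACH_ONLINE -N $node"
        return 0
    fi
    # Same command as quoted in this article.
    sendevent -E MACH_ONLINE -N "$node"
}
```

Running it with DRY_RUN=1 first shows the exact command without sending an event.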

Environment: 

System Agent 11.3 SP5 build 765 on Unix

CA WAAE 11.3.5

Cause:

When the disk threshold was reached, the agent answered the scheduler with a NAK status.

The receiver.log file on the agent shows the following message:
Can't parse the message: cybermation.library.communications.CybConversationWrongMessageException: Rejected due to resource shortage .NAK sent 

This explicit rejection is why the scheduler placed the agent in manual offline state.
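
To confirm this cause on a given agent, the log can be searched for the rejection text. A minimal sketch follows. The function name count_reject_naks is hypothetical, and the path to receiver.log depends on the agent installation, so it is passed in as an argument.

```shell
#!/bin/sh
# Sketch: count lines in an agent log that contain the rejection message.
# count_reject_naks is a hypothetical helper name.

count_reject_naks() {
    log_file="$1"
    # -F matches the text literally; -c prints the number of matching lines.
    # grep exits with status 1 when the count is 0, but still prints 0.
    grep -Fc 'Rejected due to resource shortage' "$log_file"
}
```

A non-zero count indicates that the agent rejected requests because of a resource shortage.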

Solution:

CA WAAE 11.3.5 version "610 xxx yyy 11.3.5 INCR4_CUM_SEP2016" introduces a new environment variable that can be exported:

DISABLE_AUTO_OFFLINE_ON_EXPLICIT_REJECT 

A machine goes into manual offline status when its agent explicitly rejects incoming requests because of a disk space problem. Customers asked for a way to disable the manual offline, because the scheduler does not bring the machine back online even after disk space has been added and the agent is running again. A MACH_ONLINE event must be issued for the machine to come online.

A new environment variable, DISABLE_AUTO_OFFLINE_ON_EXPLICIT_REJECT, was introduced to control this scheduler behavior. When this variable is set, the machine is placed in automatic offline status (not manual offline status).

This environment variable can be added to the $AUTOUSER/autosys.sh.<srv_name> file.
The syntax is:

DISABLE_AUTO_OFFLINE_ON_EXPLICIT_REJECT=1
export DISABLE_AUTO_OFFLINE_ON_EXPLICIT_REJECT
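
The two lines above can also be appended by a script that does nothing when they are already present. The sketch below assumes the profile file is passed as an argument. The function name enable_auto_offline_var is hypothetical. Because environment variables are read when a process starts, the scheduler presumably has to be restarted from an environment that sourced the updated file. That last point is an assumption, not something this article states.

```shell
#!/bin/sh
# Sketch: add the variable to a profile file only if it is not already set.
# enable_auto_offline_var is a hypothetical helper name.

enable_auto_offline_var() {
    profile="$1"
    var="DISABLE_AUTO_OFFLINE_ON_EXPLICIT_REJECT"
    if [ ! -f "$profile" ]; then
        echo "profile not found: $profile" >&2
        return 1
    fi
    # Look for an existing assignment, with or without a leading "export".
    # Note: a commented-out assignment would also match.
    if grep -Eq "(^|[[:space:]])${var}=" "$profile"; then
        echo "already set in $profile"
        return 0
    fi
    # Append the two lines shown in this article.
    printf '%s\n' "${var}=1" "export ${var}" >> "$profile"
    echo "added to $profile"
}
```

Calling the function a second time on the same file leaves it unchanged.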