Agent on machine has not responded in a timely fashion

Document ID : KB000129209
Last Modified Date : 13/03/2019
Issue:
We are getting the error "Agent on [xxxxxxxxx] has not responded in a timely fashion."
We have also reinstalled the agent, but the issue persists.


Autoping output:
CAUAJM_I_50023 AutoPinging Machine [xxxxxxxxx]
CAUAJM_W_10496 Agent on [xxxxxxxxx] has not responded in a timely fashion. Try again later. [CA WAAE Autoping]
CAUAJM_E_50281 AutoPing from the Scheduler WAS NOT SUCCESSFUL.
CAUAJM_W_10496 Agent on [xxxxxxxxx] has not responded in a timely fashion. Try again later. [CA WAAE Autoping]
CAUAJM_E_50283 AutoPing from the Application Server WAS NOT SUCCESSFUL.
CAUAJM_E_50026 ERROR: AutoPing WAS NOT SUCCESSFUL.



Machine definition:
 /* ----------------- xxxxxxxxx ----------------- */
insert_machine: xxxxxxxxx
type: a
max_load: 100
factor: 1.00
description: OBDX
port: 7520
node_name: xxxxxxxxx
agent_name: xxxxxxxxx_xxx
/* key_to_agent: *** masked value ***/
encryption_type: AES
opsys: linux
character_code: ASCII

 
Environment:
CA Workload Automation Agent 11.3, Build 979, Service Pack 7
Redhat Linux 7.x
Cause:
The agent's transmitter.log shows many "Address already in use (Bind failed)" errors, which prevent the agent from opening a connection to the scheduler.

03/11/2019 16:21:55.946 IST+0530 1 TCP/IP Controller Plugin.Transmitter pool thread <Regular:1>.CybTargetHandlerChannel.constructConversation[:1198] - Error connecting to XXX_SCH:
cybermation.library.communications.CybConversationConnectBindException: Address already in use (Bind failed)

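This usually occurs when all the ephemeral ports are in use or are held by sockets in a lingering state (CLOSE_WAIT, etc.). A socket in CLOSE_WAIT has been closed by the remote peer but not yet by the local application, so it keeps holding its local port.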
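
To judge how close a host is to exhaustion, it can help to know how many ephemeral ports it has in the first place. The following is a minimal sketch, assuming a Linux host where the range is exposed in /proc/sys/net/ipv4/ip_local_port_range. The function name ephemeral_port_count is hypothetical and is not part of the agent.

```shell
# Hypothetical helper: print the size of the ephemeral port range.
# Expects one line "LOW HIGH" on stdin (whitespace or tab separated).
ephemeral_port_count() {
  awk '{ print $2 - $1 + 1 }'
}

# On Linux the configured range can be fed in directly:
#   ephemeral_port_count < /proc/sys/net/ipv4/ip_local_port_range
```

For a range of 32768 to 60999 (used here only as an example value), this prints 28232.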
 
Resolution:
Stopped the agent:
./cybAgent -s

The output of netstat -anp showed about 28k connections in CLOSE_WAIT to port 35004, all held by PID 9667 (java). This was probably preventing the agent from obtaining any more ephemeral ports.

tcp6       0      0 x.x.x.x:59727    x.x.x.x:35004    CLOSE_WAIT  9667/java           
tcp6       0      0 x.x.x.x:53710    x.x.x.x:35004    CLOSE_WAIT  9667/java           
tcp6       0      0 x.x.x.x:46768    x.x.x.x:35004    CLOSE_WAIT  9667/java           
tcp6       0      0 x.x.x.x:47232    x.x.x.x:35004    CLOSE_WAIT  9667/java           
tcp6       0      0 x.x.x.x:60299    x.x.x.x:35004    CLOSE_WAIT  9667/java          
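
Counting the CLOSE_WAIT sockets per process makes the owner easier to spot than scrolling through the raw output. This is a sketch, assuming the Linux netstat column layout shown above (state in column 6, PID/program in column 7). The helper name count_close_wait is hypothetical.

```shell
# Hypothetical helper: count CLOSE_WAIT sockets per owning process.
# Reads `netstat -anp` style output on stdin and prints "COUNT PID/program",
# highest count first.
count_close_wait() {
  awk '$6 == "CLOSE_WAIT" { n[$7]++ } END { for (p in n) print n[p], p }' | sort -rn
}

# Typical use on the agent host (run as root so the PID column is populated):
#   netstat -anp | count_close_wait | head
```

A single process with a count close to the size of the ephemeral port range is a strong hint that it is the one leaking connections.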


Killed process ID 9667:

kill -9 9667

Restarted the agent:

./cybAgent -a

Autoping now works fine:

CAUAJM_I_50023 AutoPinging Machine [xxxxxxxxx]
CAUAJM_I_50025 AutoPing WAS SUCCESSFUL.