Collectors Dropping All Agents Irregularly

Document ID : KB000105772
Last Modified Date : 11/07/2018
Issue:
Irregularly, but roughly once a week, one of the Collectors loses all of its agents. It is not the same Collector that drops agents every time.
Environment:
CA APM 10.5.2
Cause:
Log analysis shows errors related to 'nio' in the stack traces of the outgoing delivery threads.

6/20/18 08:41:27.899 PM MDT [DEBUG] [Outgoing Delivery 1] [Manager.OutgoingMessageDeliveryNioTask] Caught exception writing to connected hub
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at com.wily.isengard.postofficehub.link.v1.server.ByteBufferOutputStream.writeToChannel(ByteBufferOutputStream.java:229)
at com.wily.isengard.postofficehub.link.v1.server.ByteBufferOutputStream.writeTo(ByteBufferOutputStream.java:158)
at com.wily.isengard.postofficehub.link.v1.IsengardObjectOutputStream.writeToDataOutput(IsengardObjectOutputStream.java:532)
at com.wily.isengard.postofficehub.link.v1.server.OutgoingMessageDeliveryNioTask.writeToDataOutput(OutgoingMessageDeliveryNioTask.java:255)
at com.wily.isengard.postofficehub.link.v1.server.OutgoingMessageDeliveryNioTask.deliverNextMessageInternal(OutgoingMessageDeliveryNioTask.java:169)
at com.wily.isengard.postofficehub.link.v1.server.OutgoingMessageDeliveryNioTask.deliverNextMessage(OutgoingMessageDeliveryNioTask.java:119)
at com.wily.isengard.postofficehub.link.v1.server.OutgoingMessageDelivererNio.run(OutgoingMessageDelivererNio.java:138)
at com.wily.util.concurrent.SetExecutor.doWork(SetExecutor.java:224)
at com.wily.util.concurrent.SetExecutor.access$0(SetExecutor.java:178)
at com.wily.util.concurrent.SetExecutor$WorkerRequest.run(SetExecutor.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
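
For context, the "Broken pipe" IOException is the generic symptom of writing through an NIO SocketChannel to a peer that has already dropped the connection. The following standalone Java sketch (illustrative only, not Enterprise Manager code) reproduces the same failure mode: one side closes its socket while the other keeps writing, and the writer eventually receives the exception seen in the stack trace above.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class BrokenPipeDemo {
    public static void main(String[] args) throws Exception {
        // "Server" side: accept one connection, then keep writing to it.
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

        // "Client" side: connect, then close immediately
        // (simulates the remote end dropping its connection).
        SocketChannel client = SocketChannel.open(new InetSocketAddress("127.0.0.1", port));
        SocketChannel accepted = server.accept();
        client.close();

        // Repeated writes to a channel whose peer has closed eventually fail
        // with java.io.IOException (typically "Broken pipe" or
        // "Connection reset by peer"), matching the stack trace above.
        ByteBuffer buf = ByteBuffer.wrap(new byte[8192]);
        try {
            while (true) {
                buf.rewind();
                accepted.write(buf);
            }
        } catch (IOException e) {
            System.out.println("Write failed: " + e);
        } finally {
            accepted.close();
            server.close();
        }
    }
}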
Resolution:
Please try disabling NIO on the MOM and all Collectors, then restart the cluster.
To disable NIO, add the following property to IntroscopeEnterpriseManager.properties on each Enterprise Manager:

transport.enable.nio=false

Disabling NIO switches the Enterprise Managers back to the classic (blocking) socket operations; there is no loss of functionality.
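
For reference, the classic transport uses blocking java.net.Socket streams rather than NIO channels. The snippet below is a hypothetical illustration of that blocking write style only; the class and method names are assumptions for the example and are not CA APM internals.

import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

public class BlockingSocketWriteSketch {
    // Classic blocking I/O: write() blocks until the bytes are handed
    // to the operating system, instead of using a SocketChannel.
    static void send(Socket socket, byte[] payload) throws IOException {
        OutputStream out = socket.getOutputStream();
        out.write(payload);
        out.flush();
    }
}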

A similar issue was fixed in 10.5.2 HF#9 (DE248777).