MOM keeps losing metrics and connection with collectors

Document ID : KB000095765
Last Modified Date : 10/06/2018
Issue:
The MOM keeps losing metrics and its connection with the Collectors. The GC configuration has already been changed, but the error persists.

Symptoms include MOM-Collector disconnections, slowness, data gaps, and ping-time spikes as high as 45 seconds for some Collectors.
Environment:
APM 10.5.1 HF57
Cause:
MOM-Collector communication issue: CollectorEventsReceiverImpl blocks the Isengard processing thread.
Resolution:
Apply APM 10.5.1 HF64 (DE358338).
Additional Information:

What makes this issue unusual is that the MOM and Collectors show no harvest-duration or GC spikes and no other obvious performance problem. The cluster simply starts misbehaving without warning. Restarting the MOM may help immediately, but the issue can return very quickly. In most cases seen so far, the environment is an ETC setup with multiple MOMs, and one or more of those MOMs can be affected.

Thread dumps are the only way to confirm this issue: look for CollectorEventsReceiverImpl blocking the Isengard processing thread on the affected MOM.