The MOM keeps losing metrics and its connection to the Collectors. The GC configuration has already been changed, but the error persists.
Symptoms include MOM-Collector disconnections, slowness, data gaps, and ping times spiking as high as 45 seconds for some Collectors.
MOM-Collector communication issue (CollectorEventsReceiverImpl is blocking the Isengard processing thread).
Apply APM 10.5.1 HF64 from DE358338.
What sets this issue apart is that there are no harvest duration spikes, GC spikes, or other obvious performance problems on the MOM or the Collectors. The cluster suddenly starts to misbehave. Restarting the MOM may help immediately, but the issue can come back very quickly. It seems to occur mostly in ETC setups with multiple MOMs, where one or more of the MOMs can be affected.
Taking thread dumps is the only way to tell if this is the case.