We started seeing the following message in the MOM log for one of the Collectors:
[INFO] [PO:main Mailman 5] [Manager] Ingore collector went away: test-collector-usaca.com@5001
In spite of the above message:
- The Collector is still connected to the MOM
- Nothing in the Collector logs at that time indicates a disconnection from the MOM
- Nothing points to a root cause
Why do we continue to get this particular [INFO] message?
Our Engineering team states that this message is harmless as long as the end-user experience is acceptable. They describe the message as poorly worded, too generic, and misleading.
The technical explanation, as we understand it, is that the Event query the MOM sent to the Collector failed because of a communication error. Some communication errors are expected: a one-sided termination of the connection (closing the Workstation, restarting the WebView server) results in a message delivery error. Others result from physical problems with the connection. The remaining kind results from message queues overflowing because of high load or an internal error in the application.
So, when a user is querying data from the Workstation (e.g. trace data) and the connection between the MOM and a Collector drops abruptly after the MOM has issued the query to the Collectors but before that query has completed, this message is logged.
It can also be caused by Collector EM performance problems. In that case you should see [WARN] or [ERROR] messages followed by this message.
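To check whether [WARN] or [ERROR] entries do precede the message, a small script can scan a saved copy of the log. The following is only a minimal sketch in Python, not a product tool. It assumes the level tag appears in square brackets as in the sample line above, it matches on the substring "collector went away", and the function name, the 20-line window and the file name in the usage comment are arbitrary choices made for illustration.

```python
import re
from collections import deque

# Assumption: levels appear as "[WARN]" or "[ERROR]", as in the sample line.
PROBLEM_RE = re.compile(r"\[(WARN|ERROR)\]")
# Substring of the message quoted in this article.
MARKER = "collector went away"


def find_preceding_problems(lines, window=20):
    """For each 'collector went away' line, collect the WARN/ERROR lines
    found in the `window` lines immediately before it."""
    recent = deque(maxlen=window)  # sliding window of earlier lines
    results = []
    for raw in lines:
        line = raw.rstrip("\n")
        if MARKER in line:
            problems = [prev for prev in recent if PROBLEM_RE.search(prev)]
            results.append((line, problems))
        recent.append(line)
    return results


if __name__ == "__main__":
    import sys

    # Usage (file name is a placeholder): python scan_log.py mom.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for message, problems in find_preceding_problems(fh):
            print(message)
            for p in problems:
                print("    preceded by:", p)
```

If the script reports no preceding [WARN] or [ERROR] lines, that is consistent with the one-sided connection termination case described above rather than a Collector performance problem, although it does not prove it.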