Problem Description:
Customers occasionally notice the following warning message in the Apache Karaf log (karaf.log) of the Data Collector (DC):
015-03-10 09:17:27,429 | WARN | Timer-2 | SnmpSession |m.ca.im.dm.snmp.snmp4j.SnmpProxy 188 | 197 -com.ca.im.data-collection-manager.snmp - 2.4.1.SNAPSHOT | | Cannot find proxy listener for Response listener:com.ca.im.dm.snmp.collector.TooBigListener@e437f22 for Request: GET[reqestID=26794671, errorStatus=0, errorIndex=50, VBS[22.214.171.124.126.96.36.199.188.8.131.52.1.3 = Null; 184.108.40.206.220.127.116.11.18.104.22.168.1.6 = Null; 22.214.171.124.126.96.36.199.188.8.131.52.1.9 = Null; 184.108.40.206.220.127.116.11.18.104.22.168.1.12 = Null; 22.214.171.124.126.96.36.199.188.8.131.52.1.3 = Null; 184.108.40.206.220.127.116.11.18.104.22.168.1.12 = Null; 22.214.171.124.126.96.36.199.188.8.131.52.1.9 = Null; 184.108.40.206.220.127.116.11.18.104.22.168.1.6 = Null; 22.214.171.124.126.96.36.199.188.8.131.52.184.108.40.20672706 = Null; 220.127.116.11.18.104.22.168.22.214.171.124.126.96.36.19972706 = Null; 188.8.131.52.184.108.40.206.220.127.116.11.18.104.22.16872706 = Null; 22.214.171.124.126.96.36.199.188.8.131.52 = Null; 184.108.40.206.220.127.116.11.1.5.10 = Null; 18.104.22.168.22.214.171.124.0 = Null]]
This is a warning message. It can be seen under two different conditions:
1) It is typically seen when a response arrives from a device after the timeout interval has passed and the DC has already timed out the request. With the default SNMP profile settings of 2 retries and a 3-second timeout, the DC makes three attempts in total, so this indicates that the device is responding more than 9 seconds after the original request, or that the DC is not processing the response until more than 9 seconds have passed.
2) It can also be seen when the DC receives the response to a second request (a retry with the same request ID) after it has already processed the response to the first request. The message is logged whenever an SNMP response is received and the listener for the originating request has already been notified of an earlier response and has therefore been removed. By default, SNMP requests are issued with a retry value of 2 (three attempts in total for a request that keeps timing out) and a timeout of 3 seconds. Retries re-use the request ID of the original request. If the original request goes out and the response does not come back within 3 seconds (or comes back on the wire but is not processed by the DC application within 3 seconds because of some delay), the DC assumes the request timed out and re-sends it with the same request ID.
If, after that 3-second timeout, the DC eventually begins processing the first response, the asynchronous listener is notified and removed from a map. When the second response (the response to the retry) then arrives and is processed by the DC, no listener remains for that request ID, so the DC logs that it cannot find a listener for the response. Ultimately, the data has been processed by the DC; it has simply issued more than one request. Possible causes are network latency for a given device, or processing latency in the DC application, which can itself be caused by a large number of polled items.
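The two conditions above can be illustrated with a small model. This is only a sketch in Python, not the DC's or SNMP4J's actual code: the class PendingRequests, its methods, and the function total_wait_seconds are hypothetical names chosen for illustration. It assumes, as described above, that listeners are kept in a map keyed by request ID and that retries re-use the ID of the original request.

```python
# Illustrative model only; all names here are hypothetical.
RETRIES = 2      # default SNMP profile retry count (from the text above)
TIMEOUT_S = 3    # default SNMP profile timeout in seconds


def total_wait_seconds(retries=RETRIES, timeout_s=TIMEOUT_S):
    """One initial attempt plus the retries, each waiting one timeout."""
    return (retries + 1) * timeout_s


class PendingRequests:
    """Map of request ID -> listener, as a stand-in for the DC's listener map."""

    def __init__(self):
        self._listeners = {}
        self.warnings = []

    def send(self, request_id, listener):
        # A retry re-uses the request ID, so it does not add a second entry.
        self._listeners[request_id] = listener

    def time_out(self, request_id):
        # Final timeout: the request is given up and its listener removed.
        self._listeners.pop(request_id, None)

    def on_response(self, request_id, payload):
        listener = self._listeners.pop(request_id, None)
        if listener is None:
            # Corresponds to the "Cannot find proxy listener" warning.
            self.warnings.append(f"no listener for request ID {request_id}")
            return False
        listener(payload)
        return True


# Condition 2: original and retry are both answered.
received = []
pending = PendingRequests()
pending.send(26794671, received.append)                 # original request
pending.send(26794671, received.append)                 # retry, same request ID
pending.on_response(26794671, "response to original")   # listener notified, removed
pending.on_response(26794671, "response to retry")      # no listener left: warning

# Condition 1: the response arrives after the request has fully timed out.
late = PendingRequests()
late.send(1, received.append)
late.time_out(1)
late.on_response(1, "late response")                    # warning, data not delivered
```

In the second-condition run the listener receives the data exactly once and a single warning is recorded, which matches the observation that the data has been processed and the warning only reflects a duplicate request.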