Best practice for resolving missing plot data in CA Performance Center graph charts

Document ID : KB000010743
Last Modified Date : 14/02/2018
Introduction:

CA Performance Center metric charts sometimes show gaps where plot data is interrupted or missing.

  

Background:

The following are the main causes of missing data in CA Performance Center (CAPC) graphs:

  • A: SNMP timeout (the device does not respond to polling, or responds too slowly)
  • B: The device supports only 32-bit SNMP counters.
  • C: CA Data Aggregator is slow under high load.
     

To determine which cause applies to your case, work through the following checks.

1. Check whether any errors were logged at the time of the problem in the following Data Aggregator and Data Collector logs:

  • /opt/IMDataAggregator/apache-karaf-*/data/log/*
  • /opt/IMDataCollector/apache-karaf-*/data/log/*

2. Check the "Number of Event Rules Evaluated" and "Percentage of Poll Cycle of Complete Event Processing" charts on the Data Aggregator pages, and the Data Aggregator health charts on the CA Performance Center System Health tab.

3. Run DcDebug (see Additional Information) for the affected device.

4. Confirm whether the monitored device has physically changed:

  • Check that the device Status is not set to "Management Lost" (CA Performance Center Administration menu > Monitored Devices > select the device > Details tab).
  • Check that the SNMP Poll Rate is not set to "true-null" (CA Performance Center Administration menu > Monitored Devices > select the device > Polled Metric Families tab > select the Interface Metric Family row > Components list in the pane below).
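As a rough aid for step 1, the following Python sketch scans the two Karaf log directories listed above for the message fragments quoted under Types A, B, and C in the Instructions. The glob patterns come from this article; the function names are hypothetical, and matching on fixed substrings assumes the message wording is stable across releases. The Type A message is reported by DcDebug rather than by the Karaf logs, so classify_line is written to work on any saved text, including DcDebug output.

```python
import glob
import re
from typing import Iterable, Iterator, Optional, Tuple

# Karaf log locations from step 1 (apache-karaf-* expands to the installed version).
LOG_GLOBS = [
    "/opt/IMDataAggregator/apache-karaf-*/data/log/*",
    "/opt/IMDataCollector/apache-karaf-*/data/log/*",
]

# Message fragments quoted in this article, mapped to the cause type they indicate.
SIGNATURES = [
    ("A", re.compile(r"REQUEST_TIMED_OUT")),
    ("B", re.compile(r"Counter value rolled over")),
    ("C", re.compile(r"Threshold Monitoring processing took too long")),
]


def classify_line(line: str) -> Optional[str]:
    """Return 'A', 'B' or 'C' if the line contains a known signature, else None."""
    for cause, pattern in SIGNATURES:
        if pattern.search(line):
            return cause
    return None


def scan_logs(globs: Iterable[str] = LOG_GLOBS) -> Iterator[Tuple[str, str, str]]:
    """Yield (cause, path, line) for every matching line in the files matched by globs."""
    for pattern in globs:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path, errors="replace") as handle:
                    for line in handle:
                        cause = classify_line(line)
                        if cause is not None:
                            yield cause, path, line.rstrip("\n")
            except OSError:
                # Skip directories and unreadable files matched by the glob.
                continue


if __name__ == "__main__":
    for cause, path, line in scan_logs():
        print(f"Type {cause}: {path}: {line}")
```

Run it on the Data Aggregator or Data Collector host and compare the timestamps of the reported lines with the gaps in the chart.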

 

Environment:
OS: RHEL 6.x
Instructions:

Type A: SNMP timeout (the device does not respond to polling, or responds too slowly)

If the following error appears in the "Poll Errors by IP" log in DcDebug, the cause is Type A:

POLLING_ERROR: errors for cycle 1491007500000[REQUEST_TIMED_OUT]
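To line this error up with the gap in the chart, it helps to turn the cycle value into a clock time. The sketch below assumes, without confirmation from this article, that the number after "cycle" is the poll-cycle start time in epoch milliseconds; 1491007500000 falls exactly on a 5-minute boundary, which fits that reading. The regex and function name are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumption: "cycle <n>" is the poll-cycle start in epoch milliseconds,
# followed by the error code in square brackets.
CYCLE_RE = re.compile(r"errors for cycle (\d+)\[([A-Z_]+)\]")


def parse_polling_error(line: str):
    """Return (cycle start as a UTC datetime, error code), or None if the line does not match."""
    match = CYCLE_RE.search(line)
    if match is None:
        return None
    cycle_ms = int(match.group(1))
    start = datetime.fromtimestamp(cycle_ms / 1000, tz=timezone.utc)
    return start, match.group(2)


start, code = parse_polling_error(
    "POLLING_ERROR: errors for cycle 1491007500000[REQUEST_TIMED_OUT]"
)
print(start.isoformat(), code)  # 2017-04-01T00:45:00+00:00 REQUEST_TIMED_OUT
```

The result is in UTC; convert it to the time zone your CA Performance Center charts display before comparing.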

 

By default, the maximum response time is 9 seconds.

https://docops.ca.com/ca-performance-management/3-2/en/building/snmp-profiles#SNMPProfiles-ModifytheTimeoutandRetriesParameters

In addition, frequent SNMP timeouts cause CA Data Aggregator to generate a polling-stopped event.

https://docops.ca.com/ca-performance-management/3-2/en/troubleshooting/polling-stopped-event-message/ 

Consider increasing the Timeout and/or Retries parameters in the CA Performance Center SNMP Profile.
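The effect of tuning these two parameters can be sketched as follows, assuming (this is not stated in the linked page) that a request is sent once plus once per retry and each attempt waits up to Timeout seconds. Under that model, Timeout=3 and Retries=2 would give the 9-second figure above, but that breakdown is an assumption; check the actual values in your SNMP profile. The function name is hypothetical.

```python
def worst_case_wait(timeout_s: float, retries: int) -> float:
    """Longest wait for one unanswered request under the assumed model:
    the first attempt plus each retry, each waiting up to timeout_s seconds."""
    if timeout_s <= 0 or retries < 0:
        raise ValueError("timeout must be positive and retries non-negative")
    return timeout_s * (retries + 1)


# Hypothetical profile values, for illustration only.
for timeout_s, retries in [(3, 2), (5, 2), (3, 4)]:
    print(f"Timeout={timeout_s}s Retries={retries} -> up to {worst_case_wait(timeout_s, retries)}s")
```

Raising either value gives a slow device more time to answer before the request is counted as timed out.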

  

Type B: The device supports only 32-bit SNMP counters

If the following WARN message appears in the Data Collector Karaf log, the cause is Type B:

com.ca.im.data-collection-manager.core.interfaces - | | Counter value rolled over, dropping response: previous=4285888934 / current=4049163 for IP IP address, OID polling OID, item ID id, in poll group gid
Further counter rollover messages for this IP will be suppressed unless DEBUG is enabled or the DC is restarted.


When an SNMP counter rolls over within one polling cycle, the Data Collector drops that poll response, so the data for that cycle is missing.

https://docops.ca.com/ca-performance-management/3-2/en/building/manage-interfaces/configure-counter-behavior/
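A single rollover can, in principle, be reconciled with modular arithmetic, but only if you assume the counter wrapped exactly once between polls and was not reset. The Python sketch below (function name hypothetical) applies that arithmetic to the previous/current values in the WARN message above. It illustrates the ambiguity, not what the Data Collector does; according to the log message, the response is dropped.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps from 4294967295 back to 0


def counter32_delta(previous: int, current: int) -> int:
    """Increase of a 32-bit counter between two polls, ASSUMING at most one wrap
    and no counter reset (neither can be verified from the two samples alone)."""
    if current >= previous:
        return current - previous
    # current < previous: count up to the wrap point, then from 0 to current.
    return (COUNTER32_MODULUS - previous) + current


# Values taken from the WARN message above.
print(counter32_delta(4285888934, 4049163))  # 13127525
```

If the counter wrapped twice, or the device rebooted and reset it, the same two samples would produce the same number, which is why a wrapped sample cannot be trusted without extra assumptions.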

This can happen when the monitored device supports only 32-bit counters or only SNMPv1.

For such devices, the following error may appear in the "Discover Logging by IP" log in DcDebug when the 64-bit counter MIB objects are queried:

Finished on demand read. Response = SnmpResponse [error=SNMP_PARTIAL_FAILURE, errorIndex=-1, queriedIP=Device IP]
  SnmpResponseVariable [oid=Polling OID, type=NULL, value={}, isDelta=false, isList=true, error=NO_SUCH_NAME, isDynamicIndex=false, indexList=[]]
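When reviewing DcDebug output for many devices, the NO_SUCH_NAME responses can be extracted mechanically. The sketch below assumes the SnmpResponseVariable layout shown above stays the same; the regex, the function names, and the counter-width rule of thumb are hypothetical. The one firm fact it relies on is that SNMPv1 does not define the Counter64 type.

```python
import re
from typing import List

# Captures the OID and error code of each SnmpResponseVariable record;
# [^\]]*? keeps the match inside a single "[...]" record.
VAR_RE = re.compile(r"SnmpResponseVariable \[oid=([^,]+),[^\]]*?error=([A-Z_]+)")


def unsupported_oids(log_text: str) -> List[str]:
    """OIDs that the device answered with error=NO_SUCH_NAME."""
    return [oid.strip() for oid, err in VAR_RE.findall(log_text) if err == "NO_SUCH_NAME"]


def likely_counter_width(snmp_version: str, high_capacity_missing: bool) -> int:
    """Hypothetical rule of thumb: SNMPv1 has no Counter64 type, and NO_SUCH_NAME
    on a 64-bit counter OID suggests the agent does not expose it."""
    if snmp_version.lower() in ("1", "v1", "snmpv1") or high_capacity_missing:
        return 32
    return 64


# Example record using ifHCInOctets for interface 1 (illustrative OID).
sample = (
    "SnmpResponseVariable [oid=1.3.6.1.2.1.31.1.1.1.6.1, type=NULL, value={}, "
    "isDelta=false, isList=true, error=NO_SUCH_NAME, isDynamicIndex=false, indexList=[]]"
)
missing = unsupported_oids(sample)
print(missing, likely_counter_width("v2c", bool(missing)))  # ['1.3.6.1.2.1.31.1.1.1.6.1'] 32
```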

 

One workaround is to shorten the poll interval for the device from 5 minutes to 1 minute.

https://docops.ca.com/ca-performance-management/3-2/en/building/manage-interfaces/poll-critical-interfaces-faster-than-non-critical-interfaces/ 
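To judge whether a shorter interval will help a given interface, compare the interval with how quickly a 32-bit octet counter covers its whole range. This sketch assumes octet (byte) counters and a steady traffic rate; the link speeds and the 50% load are illustrative values, not figures from this article, and the function names are hypothetical.

```python
COUNTER32_RANGE = 2 ** 32  # distinct values of a 32-bit counter


def seconds_to_wrap(speed_bps: float, utilisation: float = 1.0) -> float:
    """Time for a 32-bit octet counter to cover its full range at a sustained rate."""
    if speed_bps <= 0 or not 0 < utilisation <= 1:
        raise ValueError("speed must be positive and utilisation in (0, 1]")
    octets_per_second = speed_bps * utilisation / 8
    return COUNTER32_RANGE / octets_per_second


def wraps_per_interval(speed_bps: float, poll_interval_s: float, utilisation: float = 1.0) -> float:
    """Expected number of wraps between two polls at a steady rate."""
    return poll_interval_s / seconds_to_wrap(speed_bps, utilisation)


for label, speed in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    for interval in (300, 60):
        n = wraps_per_interval(speed, interval, utilisation=0.5)
        print(f"{label} at 50% load, {interval}s polls: {n:.2f} wraps per interval")
```

At 1 Gbit/s and 50% load, a 32-bit octet counter covers its range in about 69 seconds, so a 5-minute interval spans roughly 4.4 wraps while a 1-minute interval spans about 0.9. The shorter interval therefore helps most on slower or lightly loaded interfaces, and any sample that is still dropped covers 1 minute instead of 5.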

 

 

Type C: CA Data Aggregator is slow under high load

If the following WARN message appears in the Data Aggregator Karaf log, the cause is Type C:

WARN | tory-thread-id | date time | onitoringProcessLimitManagerImpl | onitoringProcessLimitManagerImpl 98 | .ca.im.aggregator.loader | | Threshold Monitoring processing took too long. The system will shut that feature down in 15 minutes if the threshold monitoring continues to exceed capcacity

 

At the same time, the following event appears in the CA Performance Center Event List:

The Threshold Monitoring Engine has transitioned to a degraded state.

The following charts will also show a peak at the time of the event:

  • The "Number of Event Rules Evaluated" and "Percentage of Poll Cycle of Complete Event Processing" charts on the Data Aggregator pages

Consider increasing the PercentOfPollCycleThreshold value.

https://docops.ca.com/ca-performance-management/3-2/en/using/events/threshold-monitoring-and-threshold-limiter-behavior/ 
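To see how close a Data Aggregator is to the limit, the "Percentage of Poll Cycle of Complete Event Processing" value can be reasoned about as event-processing time divided by poll-cycle length, which is how the chart name reads; that formula is an assumption. The sketch below also reports the longest stretch above a threshold, which can be compared with the 15-minute window in the WARN message. The 5-minute cycle, the threshold of 80, the sample timings, and the function names are hypothetical, not the PercentOfPollCycleThreshold default or product behavior.

```python
from typing import List

POLL_CYCLE_S = 300  # assumed 5-minute poll cycle


def percent_of_poll_cycle(processing_s: float, poll_cycle_s: float = POLL_CYCLE_S) -> float:
    """Event-processing time as a percentage of one poll cycle (assumed formula)."""
    return 100.0 * processing_s / poll_cycle_s


def longest_exceedance_minutes(processing_times_s: List[float], threshold_pct: float,
                               poll_cycle_s: float = POLL_CYCLE_S) -> float:
    """Longest run of consecutive cycles above threshold_pct, in minutes."""
    longest = run = 0
    for t in processing_times_s:
        if percent_of_poll_cycle(t, poll_cycle_s) > threshold_pct:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest * poll_cycle_s / 60


# Hypothetical per-cycle event-processing times (seconds) and threshold (%).
timings = [120, 250, 260, 270, 280, 100]
print(longest_exceedance_minutes(timings, threshold_pct=80))  # 20.0
```

With these sample timings, four consecutive cycles exceed 80% of the cycle, a 20-minute stretch that is longer than the 15-minute window quoted in the WARN message.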

  

Additional Information:

DcDebug is the built-in discovery and polling debug tool. See here for how to access its URL.