Comprehensive list of common reasons for missing defects in APM CE

Document ID : KB000019253
Last Modified Date : 14/02/2018

Description:

This article lists common reasons for missing defects in APM CE and the next steps for each. It is a supplement to TEC596035.

Solution:

  • Did the problem begin after an event such as:

    • install/upgrade

    • network/database outage

    • new application release

    • change in APM CE definitions

    • Next Steps:

      • Review the install logs and schematools.log for any issues that need to be addressed.
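
The log review above can be partly automated. The following is a minimal Python sketch, not a vendor-supplied tool. It assumes the logs are plain text and that problem lines contain markers such as ERROR, FATAL, or Exception. The function name and the marker list are illustrative assumptions and should be adjusted to what the logs actually contain.

```python
from pathlib import Path

# Assumed markers of a problem line; adjust to match the actual log format.
DEFAULT_MARKERS = ("ERROR", "FATAL", "Exception")


def find_problem_lines(log_path, markers=DEFAULT_MARKERS):
    """Return (line_number, text) pairs for lines containing any marker."""
    hits = []
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with Path(log_path).open(encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if any(marker in line for marker in markers):
                hits.append((number, line.rstrip("\n")))
    return hits
```

Calling find_problem_lines("schematools.log") from the directory that holds the log returns the matching lines with their line numbers, which gives a starting point for the manual review.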

  • Examine Monitored Transactions

    • Are too many transactions being monitored?

    • Are defect thresholds too low?

    • Are Business Transactions and Defect Types enabled?

    • Next Steps:

      • Review the definitions and their thresholds. Raise thresholds and make definitions more specific as needed. Remove any unneeded definitions.

  • TIM

    • Is TIM seeing any network traffic?

      • Is the network traffic correct?

      • Is the network traffic two way?

      • Is the network traffic for the correct servers?

      • Ensure there is no unneeded traffic, such as router requests.

    • Is TIM able to successfully decode SSL traffic?

    • Is data backing up on TIM in /etc/wily/cem/tim/data/out/defects? Typically there should be only a few files located here.

    • Do timfiles show defect files being created and deleted?

    • Next Steps:

      • Check private keys and passphrase. Reload as needed.

      • Work with the network team to resolve TIM-Switch span/tap/aggregator issues.

      • Restart the TIM Collector if defects are backed up on TIM.

      • Do the steps under "Examine Monitored Transactions" (above).
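
The backlog check on the TIM defects directory can be scripted. The sketch below is a hedged illustration in Python: the directory path comes from the checklist above, while the threshold of 10 files is only an assumed reading of "a few files" and the function names are hypothetical. It counts files and does nothing else; it does not restart any service.

```python
from pathlib import Path

# Directory named in the checklist above.
DEFECTS_DIR = "/etc/wily/cem/tim/data/out/defects"
# Assumed cut-off for "only a few files"; tune it for your environment.
BACKLOG_THRESHOLD = 10


def count_defect_files(directory=DEFECTS_DIR):
    """Count regular files directly inside the TIM defects output directory."""
    return sum(1 for entry in Path(directory).iterdir() if entry.is_file())


def is_backed_up(directory=DEFECTS_DIR, threshold=BACKLOG_THRESHOLD):
    """Return True when more files are waiting than the assumed threshold."""
    return count_defect_files(directory) > threshold
```

Running is_backed_up() on the TIM host a few minutes apart shows whether the file count is stable or growing. A steadily growing count suggests the defects are not being collected.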

  • Database/EM

    • Are there error messages in the database or EM logs for:

      • Bad Rows

      • ts_defects table does not exist

      • issues during re-aggregation

    • Next Steps:

      • Bad Rows: follow the steps in TEC596330.

      • If a row is causing issues during re-aggregation, delete the affected row from ts_defects_reaggr_interval.

      • If the ts_defects table is completely empty, no defects have been generated. Possible reasons include a defect threshold that is too high, or TIM issues.

      • Check the last aggregated interval field in the ts_defects_interval table. If it is null, no aggregation has happened to date. To aggregate all defects in the ts_defects table for all intervals, add an entry in ts_defects_interval that specifies the interval to aggregate.

        • If new defects were added to the ts_defects table after you did this, the table now contains a mixture of defects that were already aggregated and new ones that are yet to be aggregated. In that case, re-check the ts_defects_interval table to verify the last aggregated row, and start processing defects from the ts_defects table after this interval. If needed, update the last aggregated row to the appropriate date.
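
The selection rule described above (aggregate everything when no interval has been recorded, otherwise only defects after the last aggregated interval) can be sketched as follows. This illustrates the logic only and is not the Enterprise Manager's implementation. The function name and the representation of defects as plain timestamps are assumptions, as is treating "after" as strictly later than the last aggregated time. The code does not read or modify the APM database.

```python
from datetime import datetime
from typing import Iterable, List, Optional


def defects_pending_aggregation(
    last_aggregated: Optional[datetime],
    defect_times: Iterable[datetime],
) -> List[datetime]:
    """Return defect timestamps that still need aggregation, oldest first.

    last_aggregated is None when the last aggregated interval field is null,
    meaning that nothing has been aggregated yet.
    """
    if last_aggregated is None:
        # No aggregation to date: every defect is pending.
        return sorted(defect_times)
    # Otherwise only defects newer than the last aggregated interval remain.
    return sorted(t for t in defect_times if t > last_aggregated)
```

Comparing the size of the returned list with the total number of defects gives a quick sanity check on whether the last aggregated row points at a plausible date before it is updated.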

All steps will require at least one restart of the APM Enterprise Manager running Defects Aggregation.
