About Introscope Agent Overhead

Document ID : KB000020331
Last Modified Date : 14/02/2018


  1. How can the Agent overhead be observed?
  2. What is the methodology for measuring the Agent Overhead?
  3. Where does this overhead come from?
  4. How can one control this overhead?


  1. How can the Agent overhead be observed?

    Our overhead can be observed in several ways:

    • CPU:
      CPU overhead on the box (JVM and CLR CPU overhead + any additional "system" overhead + EPAgents overhead).
      The CPU overhead tells you whether you need additional servers to run the workload or can manage with the current servers. It can be measured with a system performance monitoring tool: measure the CPU usage of the system WITH and WITHOUT the Wily Agent.

    • Memory:
      Memory overhead mostly impacts the JVM and CLR heaps, and the server itself if extra EPAgent processes run on it. The best way to measure it is through heap dump analysis.

    • Response time:
      This is probably the most important one, since it is a direct increase in end-user transaction response times.

  2. What's the methodology for measuring it?

    • Run a load test (preferably 3 identical ones in a row) for about 20 minutes without the Agent in place. Measure CPU usage on the box and transaction response times. At the end of each run, you get an "average CPU usage % without Agent" and an "average R/T without Agent", which you can average across the runs.
    • Run the identical load test (again 3 times if possible) for the same amount of time with the Agent in place. Measure the same metrics. You get an "average CPU usage % with Agent" and an "average R/T with Agent".
    • The CPU overhead is ("average CPU usage % with Agent" - "average CPU usage % without Agent") / "average CPU usage % without Agent" * 100.
    • The response time overhead is ("average R/T with Agent" - "average R/T without Agent") / "average R/T without Agent" * 100.


    Example:

      "average CPU usage % without Agent": 50%
      "average CPU usage % with Agent": 55%
      "average R/T without Agent": 1200 ms
      "average R/T with Agent": 1400 ms

    • CPU overhead = (55 - 50) / 50 * 100 = 10%
    • R/T overhead = (1400 - 1200) / 1200 * 100 = 16.7%
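
    The calculation above can be sketched in a few lines of Python. This is only an illustration, not part of any Introscope tooling: the function name percent_overhead is made up here, and each list holds the per-run averages you collected (one value per load test run).

```python
from statistics import mean


def percent_overhead(without_agent, with_agent):
    """Return the relative overhead, in percent, between two sets of runs.

    Each argument is a list of per-run averages (CPU % or response time).
    """
    baseline = mean(without_agent)
    instrumented = mean(with_agent)
    if baseline == 0:
        raise ValueError("baseline average must be non-zero")
    return (instrumented - baseline) / baseline * 100


# Single-run averages taken from the example above.
cpu_overhead = percent_overhead([50], [55])
rt_overhead = percent_overhead([1200], [1400])

print(f"CPU overhead: {cpu_overhead:.1f}%")  # CPU overhead: 10.0%
print(f"R/T overhead: {rt_overhead:.1f}%")   # R/T overhead: 16.7%
```

    With three runs per configuration, pass all three averages in each list; the function averages them before comparing.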

  3. Where does this overhead come from?

    Two forms of overhead:

    1. Static (just the footprint of the Agent being here)
    2. Dynamic (Agent probes being exercised)

    Static:
    Static overhead is mostly negligible, except in rare cases where PMI/JMX metrics data collection or Platform Monitors have shown high levels of CPU usage. Also, if you have a LOT of metrics, you might be consuming a lot of memory, which in turn triggers high GC activity and increases CPU utilization.

    Dynamic:
    In 90% of cases, your overhead will come from the probes executed at runtime during transactional activity.

  4. How can one control this overhead?

    First you have to determine where it comes from. The most likely source of overhead is probes that are executed too often.

    Go to the Investigator on your Agent, use the search tab to search for "Responses Per Interval", sort by highest invocation count, and start trimming down your instrumentation by commenting out the tracers responsible for those metrics.
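
    If you copy the "Responses Per Interval" values out of the Investigator, the same ranking can be done offline. The sketch below assumes you have the values as a simple name-to-count mapping; the metric names and counts are invented for illustration and do not reflect a real Introscope export format.

```python
def busiest_metrics(samples, top_n=5):
    """Return the top_n (metric, count) pairs with the highest counts."""
    ranked = sorted(samples.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical metric names and "Responses Per Interval" counts.
samples = {
    "com.example.web.ShopServlet.service": 420,
    "com.example.db.OrderDao.findById": 1310,
    "com.example.util.StringHelper.format": 98500,
    "com.example.cart.CartService.addItem": 2750,
}

# The metrics at the top of this list point to the tracers to review first.
for metric, count in busiest_metrics(samples, top_n=2):
    print(f"{count:>8}  {metric}")
```

    A method invoked tens of thousands of times per interval, like the made-up string helper here, is the kind of candidate whose tracer you would comment out first.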

    Run a new load test, then repeat the process until you are below the desired overhead level.

    If you've removed a good number of tracers and the overhead is still too high, it might be coming from Platform Monitors, PMI, JMX, or another problem in the Agent (too many metrics, for example). Disable these one by one, running a load test after each change, until you find your culprit.
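
    One way to keep track of this elimination process is sketched below. It assumes you disable a single feature per load test and record the resulting overhead. The function name, the run_load_test callback, and the overhead figures are all hypothetical: in practice each call stands for a manual configuration change followed by a real load test.

```python
def overhead_with_each_disabled(features, run_load_test):
    """Return {feature: overhead %} measured with only that feature disabled.

    run_load_test is supplied by the caller. It should run one load test with
    the named feature turned off and return the measured overhead in percent.
    """
    return {feature: run_load_test(feature) for feature in features}


# Stand-in for real load tests: made-up overhead figures, illustration only.
fake_results = {"Platform Monitors": 14.0, "PMI": 15.5, "JMX": 6.0}

results = overhead_with_each_disabled(fake_results, fake_results.get)

# The feature whose removal gives the lowest overhead is the likely culprit.
culprit = min(results, key=results.get)
print(f"Lowest overhead when disabling: {culprit}")
```

    Testing each feature in isolation keeps the comparison fair, at the cost of one full load test per candidate.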