This document explains what the Component Timing Diagram reports, with particular attention to a common question: why there are gaps between the bars.
CEM Transaction Time Breakdown
The defect information requested most often is a breakdown of the transaction processing time by infrastructure component. That is, how much time was spent in the customer infrastructure (web server, application server, other backend components) versus how much time was spent outside it (in the network or on the client)?
The most common way this requirement is communicated is by asking for a breakdown of the Response Time into Server Time and Network Time. If the majority of the transaction time is spent in the server, then the problem is to be handed over to the server team. If the majority of the transaction time is spent in the network, the problem will be handed over to the network team.
While the motivation behind the requirement seems simple enough, providing these metrics in a meaningful manner is complicated by the nature of web transactions: a web transaction involves many parallel activities happening on the server, the network and the client. Because these activities overlap, the times spent in the server, network and client do not add up to the total response time. The response time of the transaction therefore cannot simply be broken down into network time and server time.
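The effect of parallelism can be illustrated with a minimal sketch. The interval values below are made up for illustration and are not taken from any CEM report; each tuple is an assumed (start, end) offset in seconds for one component.

```python
from typing import List, Tuple

# Each component is a (start, end) pair in seconds from the start of the
# transaction. These values are hypothetical, chosen only to show overlap.
Interval = Tuple[float, float]


def summed_durations(intervals: List[Interval]) -> float:
    """Add up the individual durations of all components."""
    return sum(end - start for start, end in intervals)


def response_time(intervals: List[Interval]) -> float:
    """Wall-clock time from the earliest start to the latest end."""
    earliest = min(start for start, _ in intervals)
    latest = max(end for _, end in intervals)
    return latest - earliest


components = [(0.0, 2.0), (1.0, 3.0), (1.5, 2.5)]
total_of_parts = summed_durations(components)   # 5.0 seconds
overall = response_time(components)             # 3.0 seconds
print(total_of_parts, overall)
```

Because the three components overlap, their durations sum to more than the response time the user actually experienced, which is why a simple additive split is not meaningful.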
While APM CE (CEM) cannot pinpoint the cause of the defective transaction as it relates to the network or the server infrastructure in all cases, as explained above, it can provide deep insight into the performance of the components that form the user transaction. This insight should be sufficient for the Application Support personnel to triage the performance problem to the various teams. At the same time, APM CE does not make any assumptions or define terms such as Server Time or Network Time. The insight that APM CE provides is based on the measurements that CEM currently makes.
Figure 1: Simple Business Transaction with 1 Transaction and multiple components
Use Case: Triage Slow Transaction Defect
Actor: Application Support Triager
Preconditions: A Slow Time defect has occurred and possibly created an Incident
- Go to the Incidents page and click the Defects link for an Incident.
- Click on a specific defect to see the details.
- The "Component Timing Information/Transaction Characteristic Details" is displayed as shown in Figure 1.
- Consider which part of the transaction accounts for most of the response time:
  - If it is inside the "Server Time" of the main component or other dynamic content, triage the problem to the Application Server team.
  - If it is between components, it could be the result of slowness in the network and/or the client. Investigate further by looking at other defects for the same incident.
  - If it is in the "Network time" portion of components, it could also be the result of slowness in the network and/or the client. Investigate further by looking at other defects for the same incident.
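The triage steps above can be sketched as a rough first-pass heuristic. This is only an illustration: the Component fields, the way the server and network portions are derived from them, and the returned labels are all assumptions made for this sketch, not definitions used by APM CE.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    # Hypothetical record; all times are seconds from the transaction start.
    name: str
    request_start: float
    response_start: float
    response_end: float


def gap_time(components: List[Component]) -> float:
    """Time within the transaction span when no component is in flight."""
    covered = 0.0
    cur_start = cur_end = None
    for c in sorted(components, key=lambda c: c.request_start):
        if cur_end is None or c.request_start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = c.request_start, c.response_end
        else:
            cur_end = max(cur_end, c.response_end)
    if cur_end is not None:
        covered += cur_end - cur_start
    span = (max(c.response_end for c in components)
            - min(c.request_start for c in components))
    return span - covered


def triage(components: List[Component]) -> str:
    """Return the area that accounts for the largest share of time."""
    # Assumption: request to first response byte approximates the server
    # portion, first to last response byte approximates the network portion.
    server = sum(c.response_start - c.request_start for c in components)
    network = sum(c.response_end - c.response_start for c in components)
    portions = {
        "application server": server,
        "network or client (gaps between components)": gap_time(components),
        "network or client (transfer time)": network,
    }
    return max(portions, key=portions.get)


redirect_case = [
    Component("LoginServlet", 0.0, 0.25, 0.5),
    Component("Homepage.jsp", 3.5, 3.75, 4.0),
]
print(triage(redirect_case))
```

A result pointing at the network or client is only a starting point; as the steps above note, other defects in the same incident should be checked before handing the problem over.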
Here are some illustrations of the above use case:
Illustration 1: The web application performs an HTTP redirect, which causes an extra HTTP request and a time delay.
Figure 2: Simple Business Transaction with 2 Transactions and an HTTP redirect
As shown in Figure 2, the user authenticates to LoginServlet, which redirects the user to Homepage.jsp. The CEM TIM records the request for Homepage.jsp arriving several seconds after the response for LoginServlet was sent to the client. With this figure, application support personnel can point to the network or the client as contributing to the transaction response time. Additionally, a recommendation can be made to change the HTTP redirect to a "forward request" in the application itself.
Illustration 2: The Business Transaction has many images that are causing a slowdown.
As shown in Figure 3, the response time for ViewAccount.jsp is a small portion of the complete business transaction response time. Most of the time is spent sending the static content of the transaction. In this case, Application Support should triage the performance issue to the Web Designer team so that the layout of the web transaction can be optimized.
Illustration 3: The time between the start of the response and the end of the response is large for all components in the business transaction.
Figure 4 shows that the server starts responding to the requests for the transaction's components fairly quickly, but that it takes a long time to transfer the component data. This indicates that there could be slowness in the network or the client. It could also mean that the components are large and therefore take longer to transfer to the client. The Application Support person can look at several such defects within the same business incident and compare their characteristics. If the components are small, the incident should be triaged to the network team for a deeper look.
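The size-versus-transfer-time reasoning can be sketched as follows. The function names and both thresholds are hypothetical values picked for illustration; suitable cutoffs depend on the application and are not specified by APM CE.

```python
def transfer_rate_kbps(size_bytes: int, response_start: float,
                       response_end: float) -> float:
    """Effective transfer rate in kilobits per second for one component."""
    duration = response_end - response_start
    if duration <= 0:
        raise ValueError("response_end must be later than response_start")
    return (size_bytes * 8 / 1000) / duration


def likely_cause(size_bytes: int, response_start: float, response_end: float,
                 small_size_bytes: int = 50_000,     # assumed cutoff
                 slow_transfer_s: float = 2.0) -> str:  # assumed cutoff
    """Classify a long response transfer by component size."""
    transfer = response_end - response_start
    if transfer < slow_transfer_s:
        return "transfer not slow"
    if size_bytes <= small_size_bytes:
        # Small payload but long transfer: suspect the network or client.
        return "network or client"
    # Large payload: the transfer time may simply reflect the size.
    return "large component"


print(likely_cause(10_000, 0.5, 4.5))
print(transfer_rate_kbps(10_000, 0.5, 4.5))
```

A small component that still takes seconds to transfer points toward the network or the client, while a large one may only need to be reduced in size.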
Illustration: Summary of component times in Incident Management Screens
Figure 5 shows a summary of the transaction component times, which can be displayed in the Incident Detail screen. In this example, each bar is the average of the individual component times from the defective transactions that contribute to the incident.
Figure 5: Average Component Times shown in Incident Detail. All the times shown here are averages of the individual response times of defective transactions.
The average response time for Homepage.jsp is 5.2 seconds. In Figure 5, the transaction time starts at 0 seconds. The start time shown for checkUser.js is the average of its start times relative to time 0 (the start of Homepage.jsp). The same is true for all components.
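The averaging described above can be sketched as follows, assuming each defect is available as a mapping from component name to (start, end) timestamps in seconds. The data layout, the function name, and the sample values are hypothetical and are not how CEM stores or exposes defects.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical layout: one dict per defective transaction,
# mapping component name to (start, end) timestamps in seconds.
Defect = Dict[str, Tuple[float, float]]


def average_component_times(defects: List[Defect],
                            main: str) -> Dict[str, Tuple[float, float]]:
    """Return {component: (average start offset, average duration)}.

    Start offsets are measured from the start of the main component,
    which is treated as time 0 in every defect.
    """
    starts = defaultdict(list)
    durations = defaultdict(list)
    for components in defects:
        time_zero = components[main][0]
        for name, (start, end) in components.items():
            starts[name].append(start - time_zero)
            durations[name].append(end - start)
    return {name: (mean(starts[name]), mean(durations[name]))
            for name in starts}


defects = [
    {"Homepage.jsp": (100.0, 105.0), "checkUser.js": (101.0, 101.5)},
    {"Homepage.jsp": (200.0, 205.5), "checkUser.js": (202.0, 202.5)},
]
averages = average_component_times(defects, main="Homepage.jsp")
print(averages)
```

With these sample values, Homepage.jsp always starts at offset 0, while the bar for checkUser.js is placed at the mean of its offsets across the two defects.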