This article is based on the late Andrew Fluegelman's talk, "The Seven Mysteries of Telecommunications." It takes those original mysteries and adapts them to TIM communication. Working through these questions helps in debugging problems, and applies to both MTP and TIM.
Below are the seven mysteries:
- Have I made a connection?
To "make a connection", the TIM must be powered on and cabled to a switch or similar device so that it can see the HTTP/HTTPS/FLEX traffic.
If the TIM or web console is accessible, check if the TIM is receiving packets.
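One way to check whether packets are arriving is to compare the receive counters of the monitoring interface at two points in time. The sketch below assumes the TIM runs on a Linux host that exposes `/proc/net/dev`. The interface name `eth1` and the function names are illustrative only and are not part of the product.

```python
def parse_net_dev(text):
    """Return {interface: (rx_packets, rx_dropped)} from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Receive columns: bytes, packets, errs, drop, ...
        counters[name.strip()] = (int(fields[1]), int(fields[3]))
    return counters


def packets_received(before, after, iface):
    """Number of packets received on iface between two snapshots."""
    return after[iface][0] - before[iface][0]


def read_counters(path="/proc/net/dev"):
    """Take one snapshot of the live counters (Linux only)."""
    with open(path) as handle:
        return parse_net_dev(handle.read())
```

Taking two snapshots with `read_counters()` a few seconds apart and passing them to `packets_received(before, after, "eth1")` should give a positive number if the interface is seeing traffic. A rising drop counter in the same snapshots points to an overloaded link.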
- Are you speaking the same language?
The TIM and other devices need to speak "the same language." This can mean:
- Using the same speed and duplex
- Using the same types and levels of VLAN tagging
- Storing the same private keys and passphrases as used by web servers, load balancers, and firewalls.
- Supporting the same SSL cipher suites, TLS versions, and SSL compression.
- Supporting the same optional HTTP features.
- Using the same character encoding, such as ISO-8859-1 or UTF-8.
- Using the same protocols such as TCP, HTTP, HTTPS, and SOAP/Web Services.
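The encoding item in the list above is easy to demonstrate. The following minimal sketch shows what happens when bytes written in one encoding are read in another. The helper name `try_decode` is hypothetical, and the sample string is only an illustration.

```python
def try_decode(raw, encoding):
    """Decode raw bytes, returning None when they are invalid for that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


utf8_bytes = "café".encode("utf-8")
latin1_bytes = "café".encode("iso-8859-1")

# UTF-8 bytes read as ISO-8859-1 decode without error but produce garbled text.
garbled = try_decode(utf8_bytes, "iso-8859-1")

# ISO-8859-1 bytes read as UTF-8 fail outright for non-ASCII characters.
failed = try_decode(latin1_bytes, "utf-8")
```

Here `garbled` is `cafÃ©` and `failed` is `None`, which are the two typical symptoms of an encoding mismatch: silently corrupted text, or content that cannot be read at all.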
- Who are you talking to?
Is the TIM talking to a switch, firewall, or load balancer? Knowing this helps in answering the questions below.
- Does your TIM/server understand?
What TIM "understands" is a topic in itself.
For example, does the TIM "understand" HTTPS traffic because it has the correct private key? An incorrect or missing key means the TIM cannot record the traffic, and nothing appears in the TIM logs.
A related issue is a server using an SSL cipher suite or SSL feature that the TIM cannot decode.
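As a general rule for passive monitoring, traffic can be decrypted with the server's private key only when the cipher suite uses RSA key exchange. Suites with ephemeral Diffie-Hellman key exchange (DHE, ECDHE, and all TLS 1.3 suites) cannot be decoded this way. The sketch below classifies a suite name on that basis. The function name is hypothetical, and the exact set of suites the TIM supports should be confirmed in the product documentation.

```python
def uses_ephemeral_key_exchange(suite):
    """Return True if the suite name indicates ephemeral (forward-secret) key exchange.

    Accepts OpenSSL-style names (ECDHE-RSA-AES128-GCM-SHA256) and
    IANA-style names (TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256).
    """
    name = suite.upper().replace("-", "_")
    tokens = name.split("_")
    if "DHE" in tokens or "ECDHE" in tokens or "EDH" in tokens:
        return True
    # TLS 1.3 suites such as TLS_AES_128_GCM_SHA256 carry no "WITH" token
    # and always use ephemeral key exchange.
    if name.startswith("TLS_") and "WITH" not in tokens:
        return True
    return False


def passively_decodable(suite):
    """Rough check: can a passive monitor holding the server key decode this suite?"""
    return not uses_ephemeral_key_exchange(suite)
```

For example, `passively_decodable("AES256-SHA")` is true, while `passively_decodable("ECDHE-RSA-AES128-GCM-SHA256")` is false. This is a name-based heuristic only and does not cover every case, such as anonymous Diffie-Hellman suites.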
- What is the weakest link?
The "weakest link" is always changing in a monitoring environment. This can include:
- Too much traffic over the network connection resulting in dropped packets.
- The TIM being over capacity on memory or CPU due to the number of transactions being monitored, too many defects, SSL traffic, etc.
- The database being unable to keep up with open connections.
- The MOM or TIM Collector not having enough heap.
- Issues with spans/taps/load balancer/firewalls/servers.
- Is it the fault of something else?
This can include:
- Firewalls, load balancers, etc. filtering out desired traffic or allowing only one-way traffic. These devices could also be sending TCP packets with various integrity issues, so the HTTP content is unreliable or unreadable.
- No path affinity, so a request goes through one load balancer/firewall/web server and the response returns by a different route.
- Needed ports not being open.
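Whether a needed port is open can be checked with a simple TCP connection attempt from the machine that has to reach it. This is a minimal sketch using only the Python standard library. The function name is hypothetical, and the host and port to test depend on the environment.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A `False` result does not say where the block is. It only shows that something between the two endpoints (a firewall, a routing rule, or a service that is not listening) prevents the connection.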
- Can you make sense of it all?
Making sense of it all is a demanding effort that requires answering the following:
- Are we seeing the following in the logs?
- Connections opening and closing from users?
- HTTP logins and sessions?
- SSL traffic decoded?
- Private keys loaded?
- HTTP requests and responses (two-way traffic) between the same client and server IP?
- Is the complete HTTP traffic in the logs or is it being truncated?
- Are we seeing the same traffic as in an HTTP capture?
- Are any headers being stripped?
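The last three questions can be partly automated by comparing what an independent HTTP capture shows with what appears in the logs. The sketch below assumes the headers and body of one request or response have already been extracted from each source into Python objects. The function names and the sample data are hypothetical.

```python
def stripped_headers(capture_headers, logged_headers):
    """Return header names present in the capture but missing from the logs.

    HTTP header names are case-insensitive, so names are compared in lower case.
    """
    seen = {name.lower() for name in logged_headers}
    return sorted(name for name in capture_headers if name.lower() not in seen)


def body_is_truncated(content_length, logged_body):
    """Return True if the logged body is shorter than the declared Content-Length."""
    return len(logged_body) < content_length


capture = {"Host": "shop.example.com", "Cookie": "id=1", "X-Forwarded-For": "10.0.0.5"}
logged = {"host": "shop.example.com", "cookie": "id=1"}
missing = stripped_headers(capture, logged)
```

In this made-up example `missing` contains only `X-Forwarded-For`, which would suggest that a device in the path, or the logging itself, drops that header.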