How do I troubleshoot .NET agent deadlocks?

Document ID : KB000109811
Last Modified Date : 06/08/2018
Introduction:
This document explains in more detail how to troubleshoot deadlocks in the .NET Agent.
Environment:
.NET Agent from 9.7 onwards.
Instructions:
When you encounter re-entrancy issues, debug them as follows. The process is slightly more involved than for the Java agent because a deadlock can also happen in the .NET native layer. The goal is to narrow down which classes cause the deadlock so that its nature can be better understood. This also identifies the classes to skip if there is no other way around the deadlock.

The instrumentation process is as follows:
  1. The CLR Profiler (wily.AutoprobeConnector.dll) gets the callback for a method being compiled by the Just In Time (JIT) compiler.
  2. The Profiler decides if the method needs to be skipped or passed onto the managed layer to be matched for instrumentation.
  3. The Profiler sends it to the managed layer (wily.Autoprobe.dll/wily.Agent.dll/wily.ProbeBuilder.ext) to make a decision on whether to instrument it.
  4. The managed layer makes callbacks on the native layer to get the metadata it needs to perform the matching against the directives.
  5. Once there is a match, the managed layer modifies the method buffer and sets the modified method buffer to be fed to the CLR.
The deadlock can occur either in steps 1 through 4, where no instrumentation happens, or as a result of the instrumentation itself. To tell the two apart, turn off AutoProbe in the agent profile and check whether the deadlock occurs again. If it still occurs with AutoProbe off, the instrumentation layer is not the cause and the problem may be in the native layer. If the deadlock disappears, instrumentation is causing it; add skips to the PBDs to narrow down the classes involved.
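
The AutoProbe on/off experiment described above can be written down as a small decision helper. This is only an illustrative sketch: the names Diagnosis and diagnose are hypothetical and are not part of the agent, and the next-step texts simply restate the advice in this article.

```python
from typing import NamedTuple


class Diagnosis(NamedTuple):
    layer: str      # "native" or "instrumentation"
    next_step: str  # what this article suggests trying next


def diagnose(deadlock_with_autoprobe_off: bool) -> Diagnosis:
    """Map the result of the AutoProbe-off test run to the likely layer.

    Hypothetical helper: it only encodes the rule of thumb from the text.
    """
    if deadlock_with_autoprobe_off:
        # Instrumentation was disabled and the deadlock still happened.
        return Diagnosis("native", "add skips to the native-layer skip file")
    # The deadlock went away once instrumentation was disabled.
    return Diagnosis(
        "instrumentation",
        "add skips to the PBDs and narrow down the classes",
    )


if __name__ == "__main__":
    print(diagnose(deadlock_with_autoprobe_off=True))
```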

When the native layer processes methods, it uses a different set of skips. Processing some classes at the native layer, before they ever reach the managed instrumentation layer, can cause deadlocks. For example, mscorlib.dll and System.dll are skipped at the native level. These skips are built into the binary, and there is no list of them. You can, however, add skips for the native layer externally: a system environment variable points to a file containing the skips. Before debugging, make sure you have native logging turned on.

When you first start debugging, try skipping whole assemblies and namespaces to get results faster. The following skip types are valid in this file:
  • SkipAssembly
  • SkipAssemblyPrefix
  • SkipNamespace
  • SkipNamespacePrefix
  • SkipClass
For example: SkipClass: System.Messaging.Res
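
Before restarting the application, it can help to check the skip file for typos, since a misspelled skip type may silently have no effect. The sketch below assumes the file holds one "SkipType: value" entry per line, as in the example above, and that skip type names are case-sensitive; both points are assumptions, and check_skip_lines is a hypothetical helper, not an agent tool.

```python
# Skip types listed in this article.
VALID_SKIP_TYPES = {
    "SkipAssembly",
    "SkipAssemblyPrefix",
    "SkipNamespace",
    "SkipNamespacePrefix",
    "SkipClass",
}


def check_skip_lines(lines):
    """Return (entries, problems) for the given skip file lines.

    entries  -> list of (skip_type, value) tuples that look valid
    problems -> list of (line_number, description) tuples
    Assumes one "SkipType: value" entry per line (an assumption).
    """
    entries, problems = [], []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        skip_type, separator, value = line.partition(":")
        skip_type, value = skip_type.strip(), value.strip()
        if not separator:
            problems.append((number, "missing ':' separator"))
        elif skip_type not in VALID_SKIP_TYPES:
            problems.append((number, f"unknown skip type {skip_type!r}"))
        elif not value:
            problems.append((number, "missing value"))
        else:
            entries.append((skip_type, value))
    return entries, problems


if __name__ == "__main__":
    sample = ["SkipClass: System.Messaging.Res", "SkipKlass: Foo"]
    print(check_skip_lines(sample))
```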

The aim is to arrive at a finite set of SkipClass entries that stop the deadlock from occurring. Make sure the native log says that it has loaded your skip file, and look for the messages in which the native log says it has skipped your class or assembly. Another way to confirm that the skips are in effect is to look at the AutoProbe log file. You should never see the message "Processing:System.Messaging.Res...", because the skip is applied at the native level and the class should never reach the managed layer.
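
The AutoProbe log check can be automated along these lines. This sketch assumes only that a processed class shows up as "Processing:" followed by the class name, as in the message quoted above; the rest of the log line format is unknown here, and leaked_classes is a hypothetical helper name.

```python
import re


def leaked_classes(autoprobe_log_lines, skipped_classes):
    """Return the skipped class names that still appear in the log.

    A non-empty result suggests the native-layer skip is not in effect
    for those classes.
    """
    leaked = set()
    for name in skipped_classes:
        # Match the exact class name: it must not be followed by more
        # identifier characters or by ".Something" (a longer name).
        pattern = re.compile(
            r"Processing:\s*" + re.escape(name) + r"(?!\w)(?!\.\w)"
        )
        if any(pattern.search(line) for line in autoprobe_log_lines):
            leaked.add(name)
    return sorted(leaked)


if __name__ == "__main__":
    log = ["Processing:System.Messaging.Res..."]
    print(leaked_classes(log, ["System.Messaging.Res"]))
```

Read the AutoProbe log into a list of lines and pass in the class names from your SkipClass entries; an empty result is what you want to see.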