Application hangs at startup due to deadlock in APM Java Agent.

Document ID : KB000006199
Last Modified Date : 14/02/2018
Issue:

A WebLogic instance could not start because threads from the APM Java agent hung. JVM thread dumps indicate that a deadlock occurred. For example:

A.  Circular (deadlocked) lock chains
Chain 2:
"Main Thread" id=1 idx=0x4 tid=21556 waiting for com/wily/util/extension/EagerAllPermissionsClassLoader@0x10ac94178 held by:
"Agent Execution" id=16 idx=0x68 tid=21662 waiting for sun/misc/Launcher$AppClassLoader@0x10aca3f28 held by:
"Main Thread" id=1 idx=0x4 tid=21556
 
B. Blocked lock chains
Chain 3:
"Agent Heartbeat" id=10 idx=0x60 tid=21660 waiting for com/wily/util/extension/JarExtension$AllPermissionsClassLoader@0x10ab47e90 held by:
"Main Thread" id=1 idx=0x4 tid=21556 in chain 2

Chain 4:
"Command Queue Heartbeat" id=17 idx=0x6c tid=21663 waiting for sun/misc/Launcher$AppClassLoader@0x10aca3f28 held by:
"Main Thread" id=1 idx=0x4 tid=21556 in chain 2

Environment:
Agent Release 9.6.0.0 (Build 350228)
WebLogic 10.3
Oracle JRockit 1.6.0_105-b15 R28
Cause:

This is a classic lock-ordering deadlock, which can happen in any application that loads classes in parallel through custom class loaders. The custom class loaders deadlock intermittently because of JVM locking in the system class loader.
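As an illustration of the lock ordering seen in chain 2, here is a minimal Java sketch. It does not use the agent's class loaders: two plain objects stand in for the two class loader monitors, and the class, method, and thread names are hypothetical. Two daemon threads take the locks in opposite order, and the standard ThreadMXBean API is then used to confirm that the JVM reports a monitor deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlockDemo {

    /** Starts a daemon thread that takes 'first', waits until its peer holds a lock too, then requests 'second'. */
    static Thread lockInOrder(String name, final Object first, final Object second,
                              final CountDownLatch bothHoldFirst) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                synchronized (first) {
                    bothHoldFirst.countDown();
                    try {
                        bothHoldFirst.await();
                    } catch (InterruptedException e) {
                        return;
                    }
                    synchronized (second) {
                        // Not reached: the peer thread holds 'second' and is waiting for 'first'.
                    }
                }
            }
        }, name);
        t.setDaemon(true); // daemon threads do not keep the JVM alive after main returns
        t.start();
        return t;
    }

    /** Creates the deadlock and returns how many threads the JVM reports as monitor-deadlocked. */
    static int createAndDetectDeadlock() throws InterruptedException {
        // Hypothetical stand-ins for the agent class loader and the application class loader.
        Object agentLoaderLock = new Object();
        Object appLoaderLock = new Object();
        CountDownLatch latch = new CountDownLatch(2);
        lockInOrder("demo-main", appLoaderLock, agentLoaderLock, latch);
        lockInOrder("demo-agent", agentLoaderLock, appLoaderLock, latch);

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 50; attempt++) {
            long[] ids = threads.findMonitorDeadlockedThreads(); // null when no deadlock is found
            if (ids != null) {
                for (ThreadInfo info : threads.getThreadInfo(ids)) {
                    if (info != null) {
                        System.out.println("\"" + info.getThreadName() + "\" waiting for "
                                + info.getLockName() + " held by \"" + info.getLockOwnerName() + "\"");
                    }
                }
                return ids.length;
            }
            Thread.sleep(100); // give both threads time to block
        }
        return 0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + createAndDetectDeadlock());
    }
}
```

The printed lines have the same "waiting for ... held by ..." shape as the circular chain in the thread dump, which is what identifies the cycle.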

Resolution:

Because this is a Java (JVM) issue, it needs to be addressed on the Java side. Java 7 has implemented a solution for this issue. However, since most of our existing user base still requires Java 1.5 support for their agents and monitored applications, we cannot adopt this solution at present. We have asked our development team to consider the solution for future releases, and they are evaluating it.
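The article does not name the Java 7 mechanism. Assuming it refers to parallel-capable class loaders, which Java 7 added to java.lang.ClassLoader, the following sketch shows how a custom loader opts in. The class name and the helper method are hypothetical, and this is not the agent's code; it requires Java 7 or later.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ParallelCapableExtensionLoader extends URLClassLoader {

    static {
        // Java 7+: opt in to locking per class name instead of locking the whole loader.
        // Must be called from the loader class's own static initializer.
        ClassLoader.registerAsParallelCapable();
    }

    public ParallelCapableExtensionLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    /**
     * Hypothetical helper for illustration: reports whether this loader hands out
     * a distinct lock object per class name rather than using itself as the lock.
     */
    boolean locksPerClassName() {
        Object lockA = getClassLoadingLock("com.example.A");
        Object lockB = getClassLoadingLock("com.example.B");
        return lockA != this && lockB != this && lockA != lockB;
    }

    public static void main(String[] args) {
        ParallelCapableExtensionLoader loader = new ParallelCapableExtensionLoader(
                new URL[0], ClassLoader.getSystemClassLoader());
        System.out.println("per-class locks: " + loader.locksPerClassName());
    }
}
```

With per-class locks, two threads loading different classes no longer contend for a single loader-wide monitor, which removes the coarse lock that takes part in cycles like the one shown in the thread dump.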

In the meantime, we have implemented a workaround in the Introscope agent called classload caching. It is a one-time activity: generate the list of .cls files in the monitored environment where the deadlock occurs, then cache the listed classes for future class loading so that the deadlock is avoided. The content of a .cls file may differ from application to application and also depends on the agent extensions.
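The general idea of eager loading plus caching can be sketched as follows. This is only an illustration of the technique, not the agent's implementation: the class and method names are hypothetical, and the list of class names is passed in directly because the .cls file format is not described here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class EagerClassCache {

    private final Map<String, Class<?>> cache = new ConcurrentHashMap<String, Class<?>>();
    private final ClassLoader loader;

    public EagerClassCache(ClassLoader loader) {
        this.loader = loader;
    }

    /** Loads every listed class up front on the calling thread; returns the number of cached classes. */
    public int preload(List<String> classNames) {
        for (String name : classNames) {
            try {
                // 'false' means: load the class but do not run its static initializers yet.
                cache.put(name, Class.forName(name, false, loader));
            } catch (ClassNotFoundException e) {
                // A stale entry in the list is skipped rather than treated as fatal.
            }
        }
        return cache.size();
    }

    /** Returns the cached class when present, so no class loader lock is needed later. */
    public Class<?> lookup(String name) throws ClassNotFoundException {
        Class<?> cached = cache.get(name);
        return cached != null ? cached : Class.forName(name, false, loader);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        EagerClassCache classes = new EagerClassCache(ClassLoader.getSystemClassLoader());
        int count = classes.preload(Arrays.asList("java.lang.String", "java.util.ArrayList"));
        System.out.println("preloaded " + count + " classes");
        System.out.println(classes.lookup("java.lang.String"));
    }
}
```

Because the listed classes are loaded once, from a single thread, before other threads start requesting them, later lookups are served from the map and do not compete for class loader locks.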

Below are the steps to implement the workaround:

1. In the IntroscopeAgent.profile, set introscope.agent.extensions.eagerloader=regenerate

- Setting this property to regenerate regenerates, for each extension JAR, the list of classes that need to be loaded by that extension's class loader. The agent writes the lists as files with the extension “.cls” in the “AGENT_HOME/core/ext” folder. Each .cls file contains the set of classes that can be preloaded and cached, and a separate .cls file is created for each extension. Each file needs to be updated in the corresponding JAR by replacing the existing .cls file in that JAR.

2. Restart the agent to allow the list of class files to be regenerated (.cls file updated with the latest list of classes).

3. Once the list is updated, go to IntroscopeAgent.profile and set introscope.agent.extensions.eagerloader=cached

- When the property is set to cached, all the classes listed in the .cls file are loaded during the initial agent startup and are also cached, so they do not have to be loaded again if they are referenced later. This in turn prevents the deadlocks from happening.

4. Restart the agent to put the cached setting into effect.
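Between steps 2 and 3, the regenerated .cls file has to replace the existing one inside the extension JAR. One possible way to do that is sketched below with the zip file system provider available in Java 7 and later, so it is meant for an administrator's workstation rather than the Java 1.5/1.6 agent JVM. The class and method names are hypothetical. Because the location of the .cls entry inside the JAR is not stated here, the sketch searches the JAR for an entry with the same file name. Work on a backup copy of the JAR.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class ClsFileUpdater {

    /**
     * Replaces every entry in the JAR whose file name equals that of the regenerated
     * .cls file. Returns the number of entries replaced (0 means no matching entry).
     */
    static int replaceClsEntry(Path extensionJar, Path regeneratedCls) throws IOException {
        final String clsName = regeneratedCls.getFileName().toString();
        final List<Path> matches = new ArrayList<Path>();
        // Open the JAR as a file system; changes are written back when it is closed.
        try (FileSystem jar = FileSystems.newFileSystem(extensionJar, (ClassLoader) null)) {
            for (Path root : jar.getRootDirectories()) {
                Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path entry, BasicFileAttributes attrs) {
                        Path fileName = entry.getFileName();
                        if (fileName != null && fileName.toString().equals(clsName)) {
                            matches.add(entry);
                        }
                        return FileVisitResult.CONTINUE;
                    }
                });
            }
            // Copy after the walk so the tree is not modified while it is being traversed.
            for (Path entry : matches) {
                Files.copy(regeneratedCls, entry, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return matches.size();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the extension JAR, args[1]: path to the regenerated .cls file
        int replaced = replaceClsEntry(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("entries replaced: " + replaced);
    }
}
```

A return value of 0 indicates that the JAR had no .cls entry with that name, in which case the file name and the target JAR should be checked before continuing with step 3.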