Why are WebSphere nodes in a cluster failing to start, failing to instrument the application, or failing to connect to the Enterprise Manager?

Document ID : KB000093003
Last Modified Date : 25/04/2018
Introduction:
Symptoms:
-One instrumented WebSphere server does not start, even after uninstalling the Introscope Agent.
-A different node in the cluster on the same machine does not start, even though it is not instrumented.
-The Agent fails to connect to the EM after re-configuring the WebSphere node to use AgentNoRedefNoRetrans.jar.
-Below are examples of common messages reported in the Agent or application server logs:

Example 1:
SRVE0232E: Internal Server Error. 
Exception Message: [com.wily.introscope.agent.trace.IMethodTracer]

Example 2:
java.lang.NoClassDefFoundError: com.wily.introscope.agent.blame.VirtualStack$TransactionCache (initialization failure)
at java.lang.J9VMInternals.initialize(J9VMInternals.java:140)
at com.wily.introscope.agent.blame.VirtualStack$VirtualStackCursor.<init>(VirtualStack.java:694)
at com.wily.introscope.agent.blame.VirtualStack$1.initialValue(VirtualStack.java:723)
at com.wily.introscope.agent.blame.VirtualStack$1.initialValue(VirtualStack.java:721)

Example 3:
SystemErr R Exception in thread "Thread-47" java.lang.NoClassDefFoundError: com.wily.introscope.agent.blame.VirtualStack$VirtualStackCursor (initialization failure)
SystemErr R at java.lang.J9VMInternals.initializationAlreadyFailed(J9VMInternals.java:140)
SystemErr R at com.wily.introscope.agent.blame.VirtualStack$1.initialValue(VirtualStack.java:806)
OR
Request processing failed; nested exception is java.lang.NoClassDefFoundError: com/wily/introscope/agent/AgentShim

Example 4:
java.lang.NoSuchMethodError: com/wily/introscope/agent/connection/IsengardServerConnectionManager$LoadBalancerNotificationListener.<init>(Lcom/wily/introscope/agent/connection/IsengardServerConnectionManager;Lcom/wily/introscope/agent/connection/IsengardServerConnectionManager$LoadBalancerNotificationListener;)V
Question:
Why are WebSphere nodes in a cluster failing to start, failing to instrument the application, or failing to connect to the Enterprise Manager?
Environment:
WebSphere Application Server
Answer:
Root cause:

By default the IBM JVM creates a shared class cache persisted to disk in a /tmp directory named "javasharedresources". This directory typically contains all the core Java API classes, as well as the core WAS classes. The cache is reused by other JVMs started on the same machine.

This is an enhancement that reduces virtual memory usage when multiple WAS servers run on the same box, and it also reduces startup time for subsequently created JVMs. However, since Wily inserts probes into the native/core Java classes (for example, threads and sockets), as well as the core WAS classes, this can cause the issues above.

For example, the java.lang.Thread class is substituted with a call to the ManagedThread class from the Wily code (which is how all the thread-level information is obtained). If this modified class is persisted to the cache, any other JVM starting on the same machine that tries to use the same Thread class from the shared cache will fail, because the core class now has references to the Wily codebase.

JVMs make use of this shared cache feature through Inter-Process Communication (IPC) and shared memory segments (see the commands below for a quick way to inspect the caches on the machine).
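
To confirm which caches exist on the box, inspect them with the same JVM level that WAS uses. The install path below is only an example; substitute the Java bundled with your WAS installation:

ls /tmp/javasharedresources      (persistent cache files; on newer JVM levels the directory name may also include the user name)
ipcs -m      (shared memory segments backing non-persistent caches)
/opt/IBM/WebSphere/AppServer/java/bin/java -Xshareclasses:listAllCaches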


Solution:

The agent uses bytecode manipulation to modify WAS and JVM classes so that they reference Wily classes. WAS uses class sharing to cache the contents of class files, which provides time and memory benefits. However, if the shared class cache is populated with bytecode from classes that have been modified by the agent, then all server JVMs reading from that cache will attempt to load the Wily classes. If the agent is not active on a particular server, failures such as the ones above can occur. Use one of the following options to resolve this issue:

1) (Recommended) Disable class sharing by adding -Xshareclasses:none to the JVM startup arguments.
Add -Xshareclasses:none to all servers that use the Java Agent so that classes modified by the agent are not stored in the shared class cache (see the wsadmin sketch below).
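
The argument is typically added in the administrative console under the server's Java and Process Management > Process definition > Java Virtual Machine > Generic JVM arguments field, or scripted with wsadmin. Below is a minimal wsadmin (Jython) sketch; the cell, node and server names are placeholders that must be adjusted, and the existing arguments are preserved:

# Run with: <WAS_HOME>/bin/wsadmin.sh -lang jython -f add_noshareclasses.py
serverId = AdminConfig.getid('/Cell:myCell/Node:myNode/Server:server1/')
jvm = AdminConfig.list('JavaVirtualMachine', serverId)
current = AdminConfig.showAttribute(jvm, 'genericJvmArguments')
AdminConfig.modify(jvm, [['genericJvmArguments', current + ' -Xshareclasses:none']])
AdminConfig.save()
# Repeat for each server that runs the Java Agent, then restart the servers.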

2) Use a named cache for the instrumented JVMs so that other, unsuspecting JVMs do not try to reuse it.
Add -Xshareclasses:name=wily to all servers that use the Java Agent so that classes modified by the agent are stored in their own shared class cache (see the verification commands below).
The shared class feature is controlled through the JVM argument -Xshareclasses[:name=<cachename>] passed on the command line to the JVM.
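
To verify that the instrumented servers created their own cache after the restart, examine it with the same JVM level; <WAS_HOME> is a placeholder for the WebSphere install root and "wily" matches the cache name used above:

<WAS_HOME>/java/bin/java -Xshareclasses:listAllCaches
<WAS_HOME>/java/bin/java -Xshareclasses:name=wily,printStats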

3) Clear out any existing caches on the box:
Find any existing caches on the box using "java -Xshareclasses:listAllCaches".
Destroy them using "java -Xshareclasses:name=<cachename>,destroy" or "java -Xshareclasses:destroyAll", as shown in the example below.
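
For example, after stopping all WebSphere JVMs on the box (a cache that is still in use cannot be destroyed), the cleanup could look like the following; <WAS_HOME> is a placeholder for the WebSphere install root:

<WAS_HOME>/java/bin/java -Xshareclasses:listAllCaches
<WAS_HOME>/java/bin/java -Xshareclasses:name=<cachename>,destroy
<WAS_HOME>/java/bin/java -Xshareclasses:destroyAll

Restart the servers afterwards; if class sharing is still enabled, the caches are recreated automatically on the next start.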