Some scenarios that require clearing the ActiveMQ persistence store.

Document ID : KB000012112
Last Modified Date : 02/03/2018
Introduction:

From time to time we run into unexpected scenarios in our environment. When they occur, they can often lead to a connectivity problem between the NAC (aka management server) and NES (aka execution server). Good examples of these unexpected scenarios include:

  • exhaustion of disk space where one of our Release Automation server components is running.
  • unexpected network outages.
  • server patching and reboots.
  • various environment related problems.
Here, we will discuss some easy methods to discern whether any of the scenarios just mentioned, or others, may have caused corruption within the vital framework used for JMS (messaging) between the NAC and NES components.

Question:

So, what is this framework, and how exactly do we determine whether a problem resides within this subsystem of Release Automation?

Environment:
Operating System: N/A
Database: N/A
Release Automation Version: 5.0.X -> 6.2.X
Answer:

The framework we are referring to is ActiveMQ (AMQ), a high-performance message broker used only by the NAC and NES (management/execution) server components, for all communication between them. ActiveMQ keeps a persistence store on disk with pre-defined parameters. From time to time, for example in the case of a disk outage, AMQ is unable to write messages to the store, which causes corruption that can show up as a variety of behaviors.

The most common behavior is the inability of the execution server to connect to the management server, even though the execution server context appears to have started. Almost every time, you can look to two log files in particular:

NAC: active_mq_nac.log (found in %installdir%\logs)

NES: active_mq_nes.log (found in %installdir%\logs)

Look these over carefully for WARN and ERROR priority log entries, and specifically for entries concerning a missing or corrupt index or IO exceptions, such as this example:

2016-10-31 11:20:11,167 [LevelDB IOException handler.] INFO  (org.apache.activemq.util.DefaultIOExceptionHandler:155) - Stopping BrokerService[brokerNacServer] due to exception, java.io.IOException: Short write
java.io.IOException: Short write


The above is typical when disk space has been exhausted, as is the following:

2016-10-31 11:02:29,541 [ActiveApplicationContextManager-1] ERROR (org.apache.activemq.broker.BrokerService:1985) - Temporary Store limit is 500 mb, whilst the temporary data directory: /opt/ca_lisa/LISAReleaseAutomationServer/activemq-data/brokerNacServer/tmp_storage only has 0 mb of usable space - resetting to maximum available 0 mb.
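One way to confirm this condition is to compare the free space on the affected volume with the temporary store limit. The following is a minimal Python sketch and not part of the product: the function name has_room_for_temp_store is hypothetical, the 500 MB default is simply the limit quoted in the log entry above, and the path you pass should be the tmp_storage directory (or any directory on the same volume) of your own installation.

```python
import shutil

MB = 1024 * 1024

def has_room_for_temp_store(path, limit_mb=500):
    """Return (ok, free_mb) for the volume that holds `path`.

    ok is True when at least `limit_mb` megabytes are free.
    The 500 MB default mirrors the limit shown in the log entry;
    adjust it if your broker is configured differently.
    """
    free_mb = shutil.disk_usage(path).free // MB
    return free_mb >= limit_mb, free_mb
```

A result of (False, 0) for the activemq-data location would match the "only has 0 mb of usable space" message shown above.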


Interrupted connections between the NAC and NES (for example, a forced reboot after a patch) can make it difficult to re-establish a healthy connection after the NES has been rebooted. Log entries like this one might be seen at such times:

2018-02-18 10:04:58,875 [ActiveMQ Transport: ssl:///x.x.x.x:61616] WARN  (org.apache.activemq.network.DemandForwardingBridgeSupport:579) - Network connection between vm://brokerNacServer#8 and ssl:///x.x.x.x:61616 shutdown due to a remote error: java.io.IOException: ShutdownInfo {commandId = 63670, responseRequired = false}

This is followed by a number of attempts to establish a connection, each of which is refused, like so:

2018-02-27 11:35:51,275 [ActiveMQ Task-2] INFO  (org.apache.activemq.network.DiscoveryNetworkConnector:120) - Establishing network connection from vm://brokerNacServer?network=true to ssl://x.x.x.x:61616
2018-02-27 11:35:52,310 [ActiveMQ Task-2] WARN  (org.apache.activemq.network.DiscoveryNetworkConnector:156) - Could not start network bridge between: vm://brokerNacServer?network=true and: ssl://x.x.x.x:61616 due to: Connection refused: connect
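When "Connection refused" entries like these appear, it can help to check from the NAC host whether anything is listening on the NES broker port at all. The sketch below is an illustration only: the function name port_is_reachable is hypothetical, port 61616 is taken from the log entries above, and the probe opens a plain TCP connection without performing the SSL handshake or speaking the ActiveMQ protocol.

```python
import socket

def port_is_reachable(host, port=61616, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    This only shows that something is listening on the port; it says
    nothing about the health of the broker behind it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

A False result for the NES address suggests the NES broker did not start or is blocked on the network, which is consistent with the refused connections logged on the NAC side.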

The above examples can cause improper startup or subsequent NAC/NES connection problems.
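Reviewing a large log by eye is error prone, so the check described above can be roughly automated. The following Python sketch is an assumption-laden illustration, not a supported tool: the names ENTRY, SIGNATURES and suspicious_entries are hypothetical, the regular expression is inferred only from the sample entries quoted in this article, and the signature strings are copied from those same samples.

```python
import re

# Layout inferred from the entries quoted in this article:
#   2016-10-31 11:20:11,167 [thread] LEVEL  (logger:line) - message
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>.*?)\]\s+(?P<level>[A-Z]+)\s+"
    r"\((?P<logger>[^)]*)\) - (?P<msg>.*)$"
)

# Substrings copied from the sample entries; extend as needed.
SIGNATURES = (
    "java.io.IOException",
    "Temporary Store limit",
    "Could not start network bridge",
    "shutdown due to a remote error",
)

def suspicious_entries(lines):
    """Yield (timestamp, level, message) for entries worth a closer look.

    An entry qualifies if its level is WARN or ERROR, or if its message
    contains one of the SIGNATURES (the "Short write" sample is INFO).
    """
    for line in lines:
        m = ENTRY.match(line.rstrip("\n"))
        if m is None:
            continue  # stack-trace or continuation line
        level, msg = m.group("level"), m.group("msg")
        if level in ("WARN", "ERROR") or any(s in msg for s in SIGNATURES):
            yield m.group("ts"), level, msg
```

The function accepts any iterable of lines, so an open file object for active_mq_nac.log or active_mq_nes.log can be passed to it directly.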


To clear out the corruption in this scenario, or in any other that hints at persistence store corruption, take the following steps:

  1. Stop all servers that are experiencing the corruption/connectivity issues.
  2. For NES servers, remove the directory (and all contents): %nolio_home%\activemq-data\nes
  3. For NAC servers, remove the directory (and all contents): %nolio_home%\activemq-data\nac
  4. After removing the directory, start the component(s) back up normally; the store will be rebuilt on startup.
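The removal in steps 2 and 3 can be sketched in Python as follows. This is a hedged illustration rather than a supplied utility: the function name clear_persistence_store and the dry_run flag are hypothetical, the sub-directory names are those given in the steps above, and the script assumes the server has already been stopped as described in step 1.

```python
import shutil
from pathlib import Path

# Sub-directory names as given in steps 2 and 3.
STORE_DIRS = {"NES": "nes", "NAC": "nac"}

def clear_persistence_store(nolio_home, component, dry_run=True):
    """Remove activemq-data/<nes|nac> under nolio_home and return its path.

    The server must already be stopped (step 1). With dry_run=True the
    path is only computed and returned; nothing is deleted.
    """
    store = Path(nolio_home) / "activemq-data" / STORE_DIRS[component.upper()]
    if store.is_dir() and not dry_run:
        shutil.rmtree(store)
    return store
```

Calling it first with the default dry_run=True shows which directory would be removed, which is a cheap safeguard before deleting anything.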

 

Assuming this was the issue, it should now (hopefully) be resolved.

 

Additional Information:

Please contact support, available 24/7, if you require any assistance.