Cannot start nodes in a USS cluster

Document ID : KB000093495
Last Modified Date : 02/05/2018
Issue:
Installing CA Unified Self Service (USS) 17.1, 17.0, or 14.1 with two or more clustered nodes results in USS failing to start on the nodes.

When you start the USS service, it fails to start, and the Liferay logs contain an exception like this:

21:32:37,247 ERROR [MainServlet:217] net.sf.ehcache.CacheException: java.io.EOFException 
net.sf.ehcache.CacheException: java.io.EOFException 
at com.liferay.portal.cache.ehcache.EhcacheStreamBootstrapCacheLoader.start(EhcacheStreamBootstrapCacheLoader.java:58) 
at com.liferay.portal.events.StartupAction.doRun(StartupAction.java:145) 
at com.liferay.portal.events.StartupAction.run(StartupAction.java:50) 
at com.liferay.portal.servlet.MainServlet.processStartupEvents(MainServlet.java:1300) 
at com.liferay.portal.servlet.MainServlet.init(MainServlet.java:214) 
at javax.servlet.GenericServlet.init(GenericServlet.java:160) 
at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1280) 
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1193) 
at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:1088)
Resolution:

1) Make the following property changes on both nodes (in Liferay's portal-ext.properties): 

#cluster 
cluster.link.enabled=true 
ehcache.cluster.link.replication.enabled=false 
lucene.replicate.write=false 
net.sf.ehcache.configurationResourceName=/custom-ehcache/hibernate-clustered.xml 
ehcache.multi.vm.config.location=/custom-ehcache/liferay-multi-vm-clustered.xml 
dl.store.file.system.root.dir=\\\\YourNetworkShare\\Share_For_USS_Attachments 
# change YourNetworkShare and Share_For_USS_Attachments to the appropriate host and share

#Cluster-direct-server-unicast 
multicast.group.address["cluster-link-control"]=239.0.0.12 
multicast.group.port["cluster-link-control"]=23301 
multicast.group.address["cluster-link-udp"]=239.0.0.13 
multicast.group.port["cluster-link-udp"]=23302 
multicast.group.address["cluster-link-mping"]=239.0.0.14 
multicast.group.port["cluster-link-mping"]=23303 
multicast.group.address["hibernate"]=239.0.0.15 
multicast.group.port["hibernate"]=23304 
multicast.group.address["multi-vm"]=239.0.0.16 
multicast.group.port["multi-vm"]=23305 

# below is your database server/port 
cluster.link.autodetect.address=YourDBServerIPAddress:PortNumber
# change YourDBServerIPAddress:PortNumber to your database server's IP address and port


A restart of USS on each node is needed for the above changes to take effect.
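Before restarting, the fragment above can be sanity-checked. The sketch below is illustrative only (the parsing helper is hypothetical, not part of USS or Liferay); it confirms that cluster link is enabled and that each channel in the fragment uses a distinct multicast address/port pair, since a copy-paste collision between channels is a common cause of cluster startup trouble:

```python
# Sketch: validate the cluster properties fragment before restarting USS.
# The parser and checker below are illustrative; they are not part of USS.

def parse_properties(text):
    """Parse 'key=value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def check_cluster_props(props):
    """Return a list of problems found in the cluster configuration."""
    problems = []
    if props.get("cluster.link.enabled") != "true":
        problems.append("cluster.link.enabled must be true")
    # Each channel needs its own multicast address/port pair; reusing a
    # pair makes the channels interfere with one another.
    channels = ["cluster-link-control", "cluster-link-udp",
                "cluster-link-mping", "hibernate", "multi-vm"]
    pairs = set()
    for ch in channels:
        addr = props.get(f'multicast.group.address["{ch}"]')
        port = props.get(f'multicast.group.port["{ch}"]')
        if not addr or not port:
            problems.append(f"missing multicast address/port for {ch}")
        elif (addr, port) in pairs:
            problems.append(f"duplicate multicast pair for {ch}")
        else:
            pairs.add((addr, port))
    return problems

fragment = """
cluster.link.enabled=true
multicast.group.address["cluster-link-control"]=239.0.0.12
multicast.group.port["cluster-link-control"]=23301
multicast.group.address["cluster-link-udp"]=239.0.0.13
multicast.group.port["cluster-link-udp"]=23302
multicast.group.address["cluster-link-mping"]=239.0.0.14
multicast.group.port["cluster-link-mping"]=23303
multicast.group.address["hibernate"]=239.0.0.15
multicast.group.port["hibernate"]=23304
multicast.group.address["multi-vm"]=239.0.0.16
multicast.group.port["multi-vm"]=23305
"""

print(check_cluster_props(parse_properties(fragment)))  # [] when valid
```

An empty list means the fragment is internally consistent; run the same check against the copy on each node, since both nodes must use identical values.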

2) To prevent further EOFExceptions caused by a corrupt cache on any node in the cluster, clear the cluster cache from a surviving node before restarting the other. For example, if you need to restart node1 while node2 is available: 
1. Stop node1 
2. Go to the control panel of NODE2 (http://node2:8686/group/control_panel) 
3. Navigate to Server Administration and execute “Clear content cached across the cluster.” 
4. Start node1 

If you need to restart node2 and node1 is available: 
1. Stop node2 
2. Go to the control panel of NODE1 (http://node1:8686/group/control_panel) 
3. Navigate to Server Administration and execute “Clear content cached across the cluster.” 
4. Start node2 
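The ordering rule behind both procedures is the same: stop the node, clear the cluster cache from any remaining live node, then start the node. A minimal sketch (the node names and helper function are hypothetical, purely for illustration) that encodes this rule:

```python
# Sketch: encode the safe rolling-restart order described above.
# Node names and this helper are illustrative; they are not part of USS.

def rolling_restart_plan(node_to_restart, cluster_nodes):
    """Return the ordered steps for restarting one USS cluster node."""
    others = [n for n in cluster_nodes if n != node_to_restart]
    if not others:
        raise ValueError("need at least one other live node to clear the cache")
    helper = others[0]  # any remaining live node can clear the cluster cache
    return [
        f"stop {node_to_restart}",
        f"on {helper}: Control Panel -> Server Administration -> "
        "'Clear content cached across the cluster.'",
        f"start {node_to_restart}",
    ]

for step in rolling_restart_plan("node1", ["node1", "node2"]):
    print(step)
```

The same plan works for clusters of more than two nodes: the cache clear simply has to run on a node that stays up while the target node is down.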


Unfortunately, this procedure is required by the architecture of Liferay's clustering implementation; there is no alternative at this time.