Several microservices do not reach a healthy state
When trying to reconnect the Message Server on the MSC demo z/OS system MVSDE25, the receiving MOI system MUEWO01-MOI was not accessible, and vSphere displayed a message reporting an issue with a full disk volume. Our colleagues who run this VMware system fixed the disk issue, but many microservices now remain in status unhealthy.
Cassandra commit log corruption is the cause of the unhealthy state of the MUEWO01-MOI (18.104.22.168) appliance. Cassandra cannot start, so every microservice that depends on Cassandra will show a status of unhealthy until Cassandra is up and healthy. Using the Cassandra node logs, we identified the corrupt commit log files. These files need to be deleted, and then Cassandra needs to be brought up again. Startup may well fail again because of further commit log corruption. In that case we will have to remove more commit log files and retry bringing Cassandra up. We do not have a root cause for the corruption that is occurring in the commit logs. Please note that none of the Cassandra nodetool commands can be used until Cassandra is up and healthy.
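The identify-and-remove step described above could be scripted roughly as follows. This is only a minimal sketch under stated assumptions, not the procedure that was actually run on the appliance: it assumes commit log segments follow the usual `CommitLog-<version>-<id>.log` naming, that the node log names the affected file on the same line as the error, and that the marker strings below match what the log contains. The function names and the directories passed in are hypothetical. The sketch moves the files into a quarantine directory instead of deleting them, so they can still be inspected or restored later.

```python
import re
import shutil
from pathlib import Path

# Assumption: commit log segments are named CommitLog-<version>-<id>.log.
SEGMENT_RE = re.compile(r"CommitLog-\d+-\d+\.log")
# Assumption: log lines reporting replay problems contain one of these markers.
ERROR_MARKERS = ("commitlogreplayexception", "checksum", "corrupt")


def find_corrupt_segments(log_text):
    """Return commit log file names mentioned on error lines, in first-seen order."""
    found = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if not any(marker in lowered for marker in ERROR_MARKERS):
            continue
        for name in SEGMENT_RE.findall(line):
            if name not in found:
                found.append(name)
    return found


def quarantine_segments(names, commitlog_dir, quarantine_dir):
    """Move the named segments out of commitlog_dir; return the names actually moved."""
    src = Path(commitlog_dir)
    dst = Path(quarantine_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in names:
        candidate = src / name
        # Skip names that are not present, e.g. already removed in an earlier pass.
        if candidate.is_file():
            shutil.move(str(candidate), str(dst / name))
            moved.append(name)
    return moved
```

Because Cassandra may fail again on a different segment, the two functions would be run once per failed start: read the node log, quarantine the segments it names, start Cassandra, and repeat if startup fails again.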
There were four corrupted Cassandra commit logs. I removed all four files, and all containers now show as healthy. Cassandra has been healthy for the last 30+ minutes.