CA API Gateway Appliance did not respond due to vMotion

Document ID : KB000110898
Last Modified Date : 28/08/2018
Introduction:
Case description:
The customer tested the API at around 8 AM and it responded to the web service call. A second test at around 9:30 AM returned no response: the customer was unable to connect to the CA API Gateway and the connection was refused. When the customer checked the status of the services using the ssgconfig account, the status was not NOT_MODIFY, and the services could not be stopped or started. The customer restored the services by rebooting the server. After the services were back up, the logs for 5 AM were found to be empty. How can we determine what caused the system to become unresponsive?

Troubleshooting:
1. According to the process controller log (/opt/SecureSpan/Controller/var/logs/sspc_*.log), the gateway kept crashing. The first message was:
2018-07-20T08:14:42.699+0800 WARNING 1 com.l7tech.server.processcontroller.ProcessController: default crashed on startup with exit code 0 

2. From the mysqld log (/var/log/mysqld.log):
180720 8:14:28 InnoDB: Database was not shut down normally! 
InnoDB: Starting crash recovery. 

3. The ssg log should contain similar information at the same timestamp, but it had been rotated and overwritten.

4. In the messages log (/var/log/messages*), a large number of kernel messages, typical of a system boot, start at the same timestamp:
Jul 20 08:14:24 caapigw kernel: imklog 5.8.10, log source = /proc/kmsg started.
Jul 20 08:14:24 caapigw rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="1642" x-info="http://www.rsyslog.com"] start
Jul 20 08:14:24 caapigw kernel: Initializing cgroup subsys cpuset
Jul 20 08:14:24 caapigw kernel: Initializing cgroup subsys cpu
Jul 20 08:14:24 caapigw kernel: Linux version 2.6.32-504.8.1.el6.x86_64 (mockbuild@x86-002.build.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-9) (GCC) ) #1 SMP Fri Dec 19 12:09:25 EST 2014
Jul 20 08:14:24 caapigw kernel: Command line: ro root=/dev/mapper/vg00-lv_root rd_LVM_LV=vg00/lv_swap rd_NO_LUKS rd_LVM_LV=vg00/lv_root rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM LANG=en_US.UTF-8 rhgb quiet audit=1
Jul 20 08:14:24 caapigw kernel: KERNEL supported cpus:
Jul 20 08:14:24 caapigw kernel:  Intel GenuineIntel
Jul 20 08:14:24 caapigw kernel:  AMD AuthenticAMD
Jul 20 08:14:24 caapigw kernel:  Centaur CentaurHauls
Jul 20 08:14:24 caapigw kernel: Disabled fast string operations
Jul 20 08:14:24 caapigw kernel: BIOS-provided physical RAM map:
Jul 20 08:14:24 caapigw kernel: BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
Jul 20 08:14:24 caapigw kernel: BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
Jul 20 08:14:24 caapigw kernel: BIOS-e820: 00000000000dc000 - 0000000000100000 (reserved)
Jul 20 08:14:24 caapigw kernel: BIOS-e820: 0000000000100000 - 00000000bfef0000 (usable)
Jul 20 08:14:24 caapigw kernel: BIOS-e820: 00000000bfef0000 - 00000000bfeff000 (ACPI data)
Jul 20 08:14:24 caapigw kernel: BIOS-e820: 00000000bfeff000 - 00000000bff00000 (ACPI NVS)
Jul 20 08:14:24 caapigw kernel: BIOS-e820: 00000000bff00000 - 00000000c0000000 (usable)
Jul 20 08:14:24 caapigw kernel: BIOS-e820: 00000000e8000000 - 00000000f0000000 (reserved)
Jul 20 08:14:24 caapigw kernel: BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
Jul 20 08:14:24 caapigw kernel: BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
Jul 20 08:14:24 caapigw kernel: BIOS-e820: 00000000fffe0000 - 0000000100000000 (reserved)
Jul 20 08:14:24 caapigw kernel: BIOS-e820: 0000000100000000 - 0000000440000000 (usable)
Jul 20 08:14:24 caapigw kernel: DMI present.
Jul 20 08:14:24 caapigw kernel: SMBIOS version 2.4 @ 0xF69C0
Jul 20 08:14:24 caapigw kernel: Phoenix BIOS detected: BIOS may corrupt low RAM, working around it.
Jul 20 08:14:24 caapigw kernel: Hypervisor detected: VMware
...
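
The checks in steps 1, 2 and 4 can be repeated on another appliance with a small script. The following is a minimal sketch in Python, not a CA-supplied tool: the patterns are taken only from the log excerpts above, and the labels and the function name scan_lines are invented for illustration.

```python
import re

# Patterns copied from the log excerpts in this article.
# The labels on the left are hypothetical names used only in this sketch.
SIGNATURES = {
    "gateway_crash": re.compile(r"ProcessController: \S+ crashed on startup"),
    "mysql_recovery": re.compile(r"InnoDB: Database was not shut down normally"),
    "kernel_boot": re.compile(r"kernel: Linux version \S+"),
}


def scan_lines(lines):
    """Return (label, line) pairs for every line that matches a known signature."""
    hits = []
    for line in lines:
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((label, line.rstrip("\n")))
                break  # one label per line is enough
    return hits
```

To use it, read the lines of each log file listed above (for example with open(path, errors="replace")) and pass them to scan_lines. A "kernel_boot" hit that was not preceded by a planned reboot suggests the guest operating system restarted on its own.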

A VMware community thread, https://communities.vmware.com/thread/252014 ("VM seems to restart after DRS vmotion to another host"), shows almost the same kernel activity.
Therefore, the gateway crash was most likely caused by vMotion.
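
The conclusion rests on the three logs showing events at nearly the same time. Because each log uses a different timestamp format, a short sketch like the one below can make the comparison explicit. It assumes all three logs were written on the same host in the same local time zone, and that the year for the syslog line is supplied by hand, since /var/log/messages omits it. The function names are illustrative only.

```python
from datetime import datetime


def parse_sspc(line):
    # Process controller format, e.g. 2018-07-20T08:14:42.699+0800 WARNING ...
    stamp = line.split()[0]
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f%z")
    # Assumption: drop the offset so it compares with the other (local-time) logs.
    return parsed.replace(tzinfo=None)


def parse_mysqld(line):
    # mysqld format, e.g. 180720 8:14:28 InnoDB: ...
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%y%m%d %H:%M:%S")


def parse_syslog(line, year):
    # syslog format, e.g. Jul 20 08:14:24 caapigw kernel: ... (no year in the log)
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def spread_seconds(stamps):
    """Seconds between the earliest and latest timestamp."""
    return (max(stamps) - min(stamps)).total_seconds()
```

Applied to the three excerpts above, the kernel boot messages (08:14:24) come first, followed by the InnoDB recovery message (08:14:28) and the process controller warning (08:14:42.699), a spread of under 19 seconds. That ordering is consistent with the guest restarting first and the database and gateway reacting to it.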
Instructions:
1. Rebooting the gateway resolved the problem.
2. Isolate the gateway server from vMotion.
The following VMware community discussion may help:
https://communities.vmware.com/thread/476929