A process left over after the upgrade is impacting node 1 of a 3-node cluster. It is consuming 98% of the CPU resources on that Vertica node. (It is loosely called a zombie here, although it is still running rather than defunct.)
As a result, the database cannot be started or used.
The details of the offending process are:
- Owned by dradmin user
- Running for over 10 days, since 10/23/16
- Parent PID is 1, indicating it is no longer attached to a login shell or adminTools session (a process is typically re-parented to PID 1 when its original parent exits)
- Details of the errant process, which is still running, per the screenshots provided by the reporting customer:
- dradmin 17995 1 92 Oct23 ? 10-05:19:27 /opt/vertica/bin/dialog --backtitle Vertica Analytic Database 7.0.2-5 Administration Tools --aspect 15 --help-button --menu Main Menu 16 60 9 1 View Database Cluster State 2 Connect to Database 3 Start Database 4 Stop Database 5 Restart Vertica on Host 6 Configuration Menu 7 Advanced Menu 8 Help Using the Administration Tools E Exit
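The checks listed above (parent PID of 1, the dialog binary, and a large accumulated CPU time) can be sketched in Python. This is a minimal illustration, not a supported tool. It assumes the input is a raw line in the eight-column layout shown above (UID, PID, PPID, C, STIME, TTY, TIME, CMD) without the leading "- " bullet. The names PsEntry, parse_ps_line, is_orphaned_dialog and the one-hour threshold are hypothetical.

```python
from typing import NamedTuple


class PsEntry(NamedTuple):
    user: str
    pid: int
    ppid: int
    cpu_seconds: int
    cmd: str


def parse_cpu_time(value: str) -> int:
    """Convert a ps TIME value ([DD-]HH:MM:SS) to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_ps_line(line: str) -> PsEntry:
    """Parse one line laid out as: UID PID PPID C STIME TTY TIME CMD."""
    fields = line.split(None, 7)
    return PsEntry(
        user=fields[0],
        pid=int(fields[1]),
        ppid=int(fields[2]),
        cpu_seconds=parse_cpu_time(fields[6]),
        cmd=fields[7],
    )


def is_orphaned_dialog(entry: PsEntry, min_cpu_seconds: int = 3600) -> bool:
    """Flag a dialog process owned by PID 1 that has burned a lot of CPU.

    The one-hour threshold is an arbitrary assumption for illustration.
    """
    return (
        entry.ppid == 1
        and entry.cmd.startswith("/opt/vertica/bin/dialog")
        and entry.cpu_seconds >= min_cpu_seconds
    )
```

Applied to the errant entry above, parse_ps_line yields PID 17995 with parent PID 1, and is_orphaned_dialog flags it. A dialog process with a live adminTools parent and negligible CPU time is not flagged.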
Note that this system was upgraded 10 days earlier from the older CAPM release 2.4.1 to the latest 2.8 release.
The /opt/vertica/bin/dialog command and its process are started when the /opt/vertica/bin/adminTools UI is launched by the dradmin user. Under normal circumstances, we should see something like the following running when adminTools has been started properly.
- The root user's CLI session (PID 23644) launches PID 24571 to switch to the dradmin user:
- root 24571 23644 0 09:19 pts/2 00:00:00 su dradmin
- The su to dradmin (PID 24571) starts a bash shell as PID 24580:
- dradmin 24580 24571 0 09:19 pts/2 00:00:00 bash
- The dradmin bash shell (PID 24580) then launches adminTools as PID 24659:
- dradmin 24659 24580 0 09:19 pts/2 00:00:00 /opt/vertica/oss/python/bin/python ./adminTools
- Once the adminTools UI appears in the CLI, adminTools (PID 24659) launches the dialog command as PID 25807:
- dradmin 25807 24659 0 09:20 pts/2 00:00:00 /opt/vertica/bin/dialog --backtitle Vertica Analytic Database 7.1.2-6 Administration Tools --aspect 15 --help-button --menu Main Menu 16 60 9 1 View Database Cluster State 2 Connect to Database 3 Start Database 4 Stop Database 5 Restart Vertica on Host 6 Configuration Menu 7 Advanced Menu 8 Help Using the Administration Tools E Exit
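The parent chain walked through above can be reconstructed from the same kind of listing. The following Python sketch assumes raw lines in the eight-column layout shown above, without the leading "- " bullet. The function name ancestry is hypothetical. It follows each PPID upward until it reaches a process that is not in the supplied listing.

```python
from typing import Dict, Iterable, List, Tuple


def ancestry(ps_lines: Iterable[str], pid: int) -> List[Tuple[int, str]]:
    """Return [(pid, command), ...] from the given PID up through its parents.

    The walk stops at the first parent that does not appear in ps_lines.
    """
    table: Dict[int, Tuple[int, str]] = {}
    for line in ps_lines:
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # skip lines that do not match the expected layout
        table[int(fields[1])] = (int(fields[2]), fields[7])

    chain: List[Tuple[int, str]] = []
    seen = set()
    while pid in table and pid not in seen:
        seen.add(pid)  # guard against a malformed listing that loops
        ppid, cmd = table[pid]
        chain.append((pid, cmd))
        pid = ppid
    return chain
```

For the healthy session above, starting from PID 25807 gives the chain dialog, adminTools, bash, su. For the errant PID 17995, the chain contains only the dialog process itself, because its parent is PID 1 rather than an adminTools process.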
Another key clue is that the errant process showed an older release of Vertica than was actually installed. Its backtitle still showed release 7.0.2-5 when it should have been 7.1.2-6.
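The version mismatch can also be checked mechanically. The Python sketch below pulls the release string out of the dialog command line and compares it with the installed release. It assumes the backtitle always has the form "Vertica Analytic Database <version> Administration Tools" as in the two entries above. The function names are hypothetical, and the installed release is passed in by the caller rather than detected.

```python
import re
from typing import Optional

# Matches release strings such as 7.0.2-5 or 7.1.2-6 in the dialog backtitle.
_BACKTITLE_VERSION = re.compile(r"Vertica Analytic Database (\d+(?:\.\d+)*-\d+)")


def backtitle_version(cmd: str) -> Optional[str]:
    """Return the Vertica release shown in a dialog command line, if any."""
    match = _BACKTITLE_VERSION.search(cmd)
    return match.group(1) if match else None


def is_stale_dialog(cmd: str, installed_version: str) -> bool:
    """True when the dialog reports a release other than the installed one."""
    version = backtitle_version(cmd)
    return version is not None and version != installed_version
```

With the installed release given as 7.1.2-6, the errant command line (7.0.2-5) is reported as stale and the healthy one is not.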