CA PM 3.5: Single Vertica Node Down

Document ID : KB000071705
Last Modified Date : 23/02/2018
Introduction:
For an unknown reason, the Vertica node was down and stayed down even after the Data Repository (DR), Data Collector (DC) and Data Aggregator (DA) were restarted.
The following troubleshooting steps were carried out.


1. Checked the firewall on the Vertica server and on the network; both were ruled out as the cause.

The error message in DB.log:

01/12/18 16:05:46 SP_connect: unable to connect via UNIX socket to /opt/vertica/spread/tmp/4803 (pid=7165): Error: No such file or directory

For example, the following socket file was missing from the folder /opt/vertica/spread/tmp:
srw-rw-rw- 1 dradmin verticadba 0 Jan 25 15:37 4803
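
Whether the socket file named in the error is present can be confirmed with a short shell test. The following is a minimal sketch, not a supported tool: the helper name is_socket_file is hypothetical, and the path is the one taken from the DB.log error above.

```shell
#!/bin/sh
# is_socket_file (hypothetical helper): succeeds only if the path exists
# and is a UNIX domain socket (file type 's' in `ls -l` output).
is_socket_file() {
    [ -S "$1" ]
}

# Path taken from the DB.log error; adjust it if your error names another file.
SPREAD_SOCKET=/opt/vertica/spread/tmp/4803

if is_socket_file "$SPREAD_SOCKET"; then
    echo "socket present: $SPREAD_SOCKET"
else
    echo "socket missing or not a socket: $SPREAD_SOCKET"
fi
```

A regular file with the same name does not pass this test, which is the point: only a socket created by a running process counts.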

2. Checked whether the following Vertica ports were in use; none of them was.
$ netstat -nap | grep 5433
$ netstat -nap | grep 5434
$ netstat -nap | grep 4803

3. Checked the process table for Vertica processes, e.g.:
# ps -ef | grep -i vertica
No Vertica processes were running.
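
The port and process checks in steps 2 and 3 can be wrapped in a small script. This is a sketch under assumptions: the helper names port_listed and vertica_procs are hypothetical, and the port list is simply the three ports checked above. Unlike a plain grep for the number, port_listed only matches the port when it follows ':' or '.' in an address column, so a PID such as 4803/sshd is not mistaken for port 4803.

```shell
#!/bin/sh
# port_listed (hypothetical helper): reads `netstat -nap` style output on
# stdin and succeeds if an address column ends in the given port number.
port_listed() {
    grep -Eq "[.:]$1([[:space:]]|\$)"
}

# vertica_procs (hypothetical helper): reads `ps -ef` style output on stdin
# and prints lines mentioning vertica, dropping the grep command itself.
vertica_procs() {
    grep -i 'vertica' | grep -v 'grep'
}

# Ports checked in step 2 of this article.
if command -v netstat >/dev/null 2>&1; then
    for port in 5433 5434 4803; do
        if netstat -nap 2>/dev/null | port_listed "$port"; then
            echo "port $port: in use"
        else
            echo "port $port: not in use"
        fi
    done
fi

ps -ef | vertica_procs || echo "no vertica processes found"
```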

4. At this stage, the suggestion was to perform a cold reboot of the Vertica server. The reasoning behind this suggestion is as follows:

The 's' at the start of the permission string indicates a UNIX/Linux file of type 'socket'. A socket file is not a regular file; it is more like an address, comparable to an IP address and port. The system creates it when a program binds a UNIX domain socket to a path (by calling bind() on a socket of family AF_UNIX, not a TCP socket).
This type of socket is internal to one computer and is used primarily for inter-process communication. The system then associates this special file with the socket file descriptor that the program bound, or more specifically, with the "inode" to which that file descriptor refers.

An inode is a data structure that describes objects (such as files or directories) in Unix-type filesystems. After its creation, the program that created the socket 'file' does not interact with the socket via the filename. Instead, it communicates via the inode that the filename references.
So you cannot create the socket file manually: a file created by hand is a different inode with no listening program behind it. Changing the file after the system has created it achieves nothing either, since you would only be changing the set of names that point at that inode. Clients can reach the listening program only through a name that still points at the original inode, never through a manually created file.
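
The relationship between names and inodes can be seen with ordinary files. The sketch below uses regular files as a stand-in for the socket file, since creating a real socket requires a listening program; the helper name inode_of and the file names are hypothetical. An extra hard link shares the inode of the original, while a file created by hand gets an unrelated inode.

```shell
#!/bin/sh
# inode_of (hypothetical helper): print the inode number a name points at.
inode_of() {
    ls -di "$1" | awk '{print $1}'
}

demo_dir=$(mktemp -d)
touch "$demo_dir/original"                       # one inode, one name
ln "$demo_dir/original" "$demo_dir/second_name"  # same inode, extra name
touch "$demo_dir/handmade"                       # a new, unrelated inode

echo "original:    $(inode_of "$demo_dir/original")"
echo "second_name: $(inode_of "$demo_dir/second_name")"
echo "handmade:    $(inode_of "$demo_dir/handmade")"

rm -rf "$demo_dir"
```

The first two lines print the same number and the third prints a different one, which is why a hand-made 4803 file cannot replace the socket that the daemon would create.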

The error in DB.log:

01/12/18 16:05:46 SP_connect: unable to connect via UNIX socket to /opt/vertica/spread/tmp/4803 (pid=7165): Error: No such file or directory

shows that, when Vertica attempts to start, the OS does not allow it to create the 4803 socket file, because run-time data from a previously running session still holds a lock on it and has not been cleared.

The /var/run directory is where run-time variable data is located.
This should be cleared out at each boot of the system. So the quickest way to clear this would be to reboot the OS on which the Vertica node is running.


5. After the cold reboot of the Vertica server, the following steps are recommended:
  • A. Bring the database up manually using adminTools.
  • B. Re-establish the DA and DC connections.
  • C. If the node is still not UP, execute the following command to check its state:

E.g.: [dradmin_drdata_node0001_catalog]$ /opt/vertica/bin/adminTools -t list_allnodes
 Node             | Host            | State | Version         | DB
------------------+-----------------+-------+-----------------+--------
 _drdata_node0001 | XXX.XXX.XXX.XXX | UP    | vertica-8.1.0.4 | drdata
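
The State column of this output can also be checked by a script. The following is a minimal sketch that assumes the pipe-separated layout shown above; the helper name nodes_not_up is hypothetical. It prints the node name and state for every node that is not UP, and prints nothing when all nodes are UP.

```shell
#!/bin/sh
# nodes_not_up (hypothetical helper): reads `adminTools -t list_allnodes`
# output on stdin and prints "<node> <state>" for every node not UP.
nodes_not_up() {
    awk -F'|' '
        NF >= 5 {
            node = $1; state = $3
            gsub(/^[ \t]+|[ \t]+$/, "", node)    # trim column padding
            gsub(/^[ \t]+|[ \t]+$/, "", state)
            # Skip the header row; report anything that is not UP.
            if (state != "State" && state != "UP") print node, state
        }'
}

# Run only where adminTools is installed at the path used in this article.
ADMINTOOLS=/opt/vertica/bin/adminTools
if [ -x "$ADMINTOOLS" ]; then
    "$ADMINTOOLS" -t list_allnodes | nodes_not_up
fi
```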
Background:

 
Environment:
CAPM 3.5
RedHat 7.X
Instructions:
This case involved a single Vertica node. In a multi-node cluster, different actions may be required.