MTP Vertica database stuck in the INITIALIZING state

Document ID : KB000048678
Last Modified Date : 14/02/2018

Description:

A Multi-Port Monitor 9.2.0.0038 system fails to start the Vertica database.
The database remains stuck in the INITIALIZING state.

/nqxfs/vertica/capture/v_capture_node0001_catalog/vertica.log shows the following lines:

2013-03-12 17:18:52.811 Init Session:0x2aaab8008460 <FATAL>
@v_capture_node0001: {SessionRun} 57V03: Node startup/recovery in progress.
Not yet ready to accept connections
LOCATION: initSession,
/scratch_a/release/vbuild/vertica/Session/ClientSession.cpp:305
2013-03-12 17:18:53.002 Cluster Inviter:0x2aaab8003f10 [Comms] <WARNING> All
nodes are present, but nobody seems to have a good catalog
2013-03-12 17:18:53.404 Spread Client:0x1b0b90b0 [Comms] <INFO> Saw
membership message 4352 on Vertica:all
2013-03-12 17:18:53.456 Spread Client:0x1b0b90b0 [Comms] <INFO> Saw
membership message 5120 on Vertica:all
2013-03-12 17:18:55.002 Cluster Inviter:0x2aaab8005960 [Comms] <WARNING> All
nodes are present, but nobody seems to have a good catalog
2013-03-12 17:18:57.002 Cluster Inviter:0x2aaab8008460 [Comms] <WARNING> All
nodes are present, but nobody seems to have a good catalog
2013-03-12 17:18:59.002 Cluster Inviter:0x2aaab8005960 [Comms] <WARNING> All
nodes are present, but nobody seems to have a good catalog
2013-03-12 17:19:00.704 Spread Client:0x1b0b90b0 [Comms] <INFO> Saw
membership message 4352 on Vertica:all
2013-03-12 17:19:00.756 Spread Client:0x1b0b90b0 [Comms] <INFO> Saw
membership message 5120 on Vertica:all
2013-03-12 17:19:01.001 Cluster Inviter:0x2aaab8004250 [Comms] <WARNING> All
nodes are present, but nobody seems to have a good catalog
2013-03-12 17:19:03.002 Cluster Inviter:0x2aaab8005960 [Comms] <WARNING> All
nodes are present, but nobody seems to have a good catalog
2013-03-12 17:19:05.002 Cluster Inviter:0x2aaab8008460 [Comms] <WARNING> All
nodes are present, but nobody seems to have a good catalog
2013-03-12 17:19:07.002 Cluster Inviter:0x2aaab8008460 [Comms] <WARNING> All
nodes are present, but nobody seems to have a good catalog
2013-03-12 17:19:09.002 Cluster Inviter:0x2aaab8004250 [Comms] <WARNING> All
nodes are present, but nobody seems to have a good catalog
2013-03-12 17:19:10.001 Timer Service:0x1afd2ea0 [Util] <INFO> Task
'ManageEpochs' enabled
2013-03-12 17:19:10.001 Timer Service:0x1afd2ea0 [Util] <INFO> Task
'AnalyzeRowCount' enabled
2013-03-12 17:19:10.001 Timer Service:0x1afd2ea0 [Util] <INFO> Task 'TM
Mergeout(00)' enabled
2013-03-12 17:19:10.001 Timer Service:0x1afd2ea0 [Util] <INFO> Task 'TM
Mergeout(01)' enabled
2013-03-12 17:19:10.001 Timer Service:0x1afd2ea0 [Util] <INFO> Task
'LicenseSizeAuditor' enabled
2013-03-12 17:19:11.001 Cluster Inviter:0x2aaab80039b0 [Comms] <WARNING> All
nodes are present, but nobody seems to have a good catalog
2013-03-12 17:19:11.173 Spread Client:0x1b0b90b0 [Comms] <INFO> Saw
membership message 4352 on Vertica:all
2013-03-12 17:19:11.225 Spread Client:0x1b0b90b0 [Comms] <INFO> Saw
membership message 5120 on Vertica:all
2013-03-12 17:19:13.002 Cluster Inviter:0x2aaab80039b0 [Comms] <WARNING> All
nodes are present, but nobody seems to have a good catalog
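
To check quickly whether a system shows this symptom, the repeated warning can be counted in the log. The following is a minimal sketch, not part of the product: the function name count_bad_catalog is hypothetical, and it relies only on the phrase "nobody seems to have a good catalog" appearing within one line of the log, as it does in the excerpt above.

```shell
#!/bin/sh
# Count lines containing the "no good catalog" warning in a Vertica log.
# count_bad_catalog is a hypothetical helper name used for illustration.
count_bad_catalog() {
    # $1 = path to vertica.log
    # grep -c prints the number of matching lines. It exits non-zero when
    # the count is 0, so "|| true" keeps the function usable under "set -e".
    grep -c 'nobody seems to have a good catalog' "$1" || true
}
```

For example, count_bad_catalog /nqxfs/vertica/capture/v_capture_node0001_catalog/vertica.log prints the number of matching lines; a count that keeps growing while the database stays in INITIALIZING matches the condition described here.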

Solution:

This error indicates that something in the Vertica database has become corrupt, typically because the system was powered down or restarted without first stopping the database.

To resolve this, recreate the Vertica database using the steps below.

  1. SSH to the MTP

  2. Log in to the MTP Linux command line using the netqos account

  3. Run the following commands:

    sudo /opt/NetQoS/scripts/stopprocs.sh    # stops all the daemon processes
    sudo /opt/NetQoS/install/setupVertica.sh --new
    sudo /opt/NetQoS/install/setupReplicateMySqlToVertica.sh
    sudo /opt/NetQoS/scripts/startprocs.sh

    In addition, check the following directory and make sure its permissions are set to 777:

    /nqxfs/SuperAgent/localfiles
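
The permission check on that directory can be scripted. This is a minimal sketch that assumes GNU stat (standard on Linux); the helper names mode_of and is_mode_777_dir are hypothetical.

```shell
#!/bin/sh
# Print the octal permission bits of a path (GNU stat syntax).
mode_of() {
    stat -c '%a' "$1"
}

# Succeed only if the path is a directory whose mode is exactly 777.
# A directory with extra bits set (for example 1777) is reported as
# not matching, so inspect such cases by hand.
is_mode_777_dir() {
    [ -d "$1" ] && [ "$(mode_of "$1")" = "777" ]
}
```

A possible use is: is_mode_777_dir /nqxfs/SuperAgent/localfiles || sudo chmod 777 /nqxfs/SuperAgent/localfiles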

The following section gives additional troubleshooting information for cases where the steps to recreate the Vertica database are not successful.

 

  1. Even though Vertica runs on localhost, we have sometimes seen problems when there are oddities/inconsistencies with DNS resolution of the MTP host name. The best way we've found to resolve this is to add an entry in the /etc/hosts file to make sure the MTP host name is properly resolved.

    1. On the MTP, add an entry to the /etc/hosts file for the MTP's IP address and hostname. Note: you must have root privileges to edit this file; when logged in using the netqos account, you will need to prefix the edit command with sudo: sudo vi /etc/hosts

      An example of the entry that gives the IP followed by the fully qualified domain name followed by the short name would be:

      10.10.0.12 mtp1.netqos.local mtp1

    2. Confirm that the MTP's IP address and hostname can now be resolved consistently by performing the following commands:

      hostname
      nslookup <ip address>
      nslookup <hostname>
      ping -a <hostname>
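
Whether the /etc/hosts entry from the previous sub-step is actually in place can also be verified with a small script. This is a sketch under stated assumptions: hosts_has_entry is a hypothetical helper name, and the IP address and names in the usage line are the example values from this article, not values from a real system.

```shell
#!/bin/sh
# Succeed if a hosts-format file maps the given IP address to the given name.
# hosts_has_entry is a hypothetical helper name used for illustration.
hosts_has_entry() {
    # $1 = hosts file, $2 = IP address, $3 = host name (FQDN or short name)
    awk -v ip="$2" -v name="$3" '
        { sub(/#.*/, "") }              # strip comments before matching
        $1 == ip {
            for (i = 2; i <= NF; i++)   # names follow the address
                if ($i == name) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}
```

For example: hosts_has_entry /etc/hosts 10.10.0.12 mtp1 && echo "entry present". On Linux, getent hosts "$(hostname)" additionally shows what the system resolver returns for the local host name.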

  2. Stop the processes that directly access the Vertica database.

    sudo /sbin/service nqwatchdog stop
    sudo /sbin/service nqinspectoragentd stop

  3. Manually drop the database (including making sure that the /nqxfs/vertica/capture folder is removed).

    su - dbadmin -c "/opt/vertica/bin/adminTools -t drop_db -d capture"

    When prompted for the password, enter 'dbadmin'. Note that the syntax of this command is different from the others: it uses 'su -' (su followed by a dash) rather than sudo.

    sudo rm -r /nqxfs/vertica/capture

  4. Confirm that there is no Vertica database process running. To find whether a Vertica process is running, use the following command:

    ps -ef | grep vertica

    If the Vertica process is running, it will display a line similar to the following:

    dbadmin 9047 1 2 Jul02 ? 04:25:30 /opt/vertica/bin/vertica -C capture -D /nqxfs/vertica/capture/v_capture_node0001_catalog -h 127.0.0.1 -p 5433

    If there is a process running, kill it using the Linux kill <pid> command (e.g. kill 9047 would kill the above process).

    Note: If there was a Vertica process running that you had to kill, repeat Step 3 to ensure that the database has been dropped.
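
Because ps -ef | grep vertica also lists the grep process itself, it can help to filter on the command column instead. The following sketch assumes the standard ps -ef column layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD) and the binary path shown in the sample line above; vertica_pids is a hypothetical helper name.

```shell
#!/bin/sh
# Read "ps -ef" output on stdin and print the PID (column 2) of every
# process whose command (column 8) is the Vertica server binary.
# Matching on the command column skips the "grep vertica" process.
vertica_pids() {
    awk '$8 == "/opt/vertica/bin/vertica" { print $2 }'
}
```

Run it as ps -ef | vertica_pids. Empty output means no Vertica server process was found; any PID printed can be passed to kill as described in this step.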

  5. Restart the Vertica spreadd daemon.

    sudo /sbin/service spreadd stop
    sudo /sbin/service spreadd start

  6. Recreate the database.

    sudo /opt/NetQoS/install/setupVertica.sh --new

  7. Restart nqinspectoragentd and nqwatchdog.

    sudo /sbin/service nqinspectoragentd start
    sudo /sbin/service nqwatchdog start
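
For reference, the commands from steps 2, 3, 5, 6 and 7 can be collected into one script. This is only a sketch: the DRY_RUN variable and the run and recreate_vertica functions are hypothetical conveniences, the script prints the commands by default instead of executing them, it does not stop when a command fails, and the process check in step 4 still has to be done by hand.

```shell
#!/bin/sh
# Sketch of troubleshooting steps 2, 3, 5, 6 and 7 as a single function.
# DRY_RUN, run and recreate_vertica are hypothetical names; the commands
# themselves are the ones listed in the steps above.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

recreate_vertica() {
    # Step 2: stop the processes that access Vertica directly.
    run sudo /sbin/service nqwatchdog stop
    run sudo /sbin/service nqinspectoragentd stop
    # Step 3: drop the database and remove its folder.
    run su - dbadmin -c "/opt/vertica/bin/adminTools -t drop_db -d capture"
    run sudo rm -r /nqxfs/vertica/capture
    # Step 5: restart the spreadd daemon.
    run sudo /sbin/service spreadd stop
    run sudo /sbin/service spreadd start
    # Step 6: recreate the database.
    run sudo /opt/NetQoS/install/setupVertica.sh --new
    # Step 7: start the stopped processes again.
    run sudo /sbin/service nqinspectoragentd start
    run sudo /sbin/service nqwatchdog start
}
```

Calling recreate_vertica with the default DRY_RUN=1 lists the nine commands in order, which is a convenient way to review the sequence before running the steps manually.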