Troubleshooting time synchronization problems

Document ID : KB000049511
Last Modified Date : 14/02/2018

Description:

Message:

The following servers are no longer synchronized to a time server: srv1, srv2, srv3, srv4, srv5, srv6. Appliances running on these servers may report the incorrect time.

appears in the dashboard of the controller.

This document describes troubleshooting steps that can help determine the cause of this message.

Solution:

In Applogic 3.1, the nodes synchronize their time with the BFC. The BFC in turn synchronizes with an external NTP server, which may be a primary NTP server (a stratum 1 server) or a server that is itself subordinate to another time server on the network (stratum 2 and higher).

The message above implies that the listed nodes have been unable to synchronize with the BFC time server, either because they cannot reach it or because the BFC is unable to synchronize with its own time server.

There are several steps that may be undertaken to check the synchronization status:

  • Check that the ntp service is running on both the nodes and the BFC:

    [root@bfc1 ~]# service ntpd status
    ntpd (pid 17126) is running...
    [root@bfc1 ~]# ps -ef | grep ntpd
    root      7563 18605  0 11:51 pts/3    00:00:00 grep ntpd
    ntp      17126     1  0 Apr12 ?        00:00:02 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g

  • Verify that ntp.conf is correct. On the nodes it should reference the BFC as the time server. That is, if the BFC is 192.168.100.2, the dom0 computers should have a line like:

    # --- OUR TIMESERVERS -----
    server 192.168.100.2

    and the BFC's ntp.conf should contain entries like the following, corresponding to the external servers:

    # Servers maintained by the BFC
    server time.pool.ntp.org # BFC server 1
    server time2.pool.ntp.org # BFC server 2

  • Make sure that the ntp.conf on the BFC contains a rule to allow the servers on the backbone to query it for time. The rule should look like:

    restrict 192.168.100.0 mask 255.255.255.0 nomodify notrap

    This indicates that the servers on that network may query the time on the BFC but not modify it.

  • NTP uses UDP port 123 for both incoming and outgoing traffic. Make sure that no additional security on the backbone network blocks communication on that port. To check, run

    iptables -L

    on the BFC and verify that no rule restricts that port.

  • If all the above has been verified, try running the following on the BFC:

    ntpq -pn
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     10.20.10.1      .LOCL.           1 u    6   64    1    1.680  1372.10   0.001
    ntpq -n
    ntpq> as
    ind assID status  conf reach auth condition  last_event cnt
    ===========================================================
      1 30381  9014   yes   yes  none    reject   reachable  1
    ntpq> exit

    This indicates that server 10.20.10.1 is the configured NTP server and reports stratum 1 (so it is not a client of any other server). It also shows that this server is reachable but has been rejected as a synchronization source.

    If the same operations are repeated on one of the dom0 nodes, the results look like the following:

    ntpq -pn
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     192.168.0.2     .INIT.          16 u    4   64    0    0.000    0.000   0.000
    ntpq -n
    ntpq> as
    ind assID status  conf reach auth condition  last_event cnt
    ===========================================================
      1 26109  8000   yes   yes  none    reject
    ntpq> exit

    A stratum of 16 means that the time source is unsynchronized, so the node never synchronizes to it. Synchronization is rejected because the BFC itself could not synchronize to its own time server.

  • To find out why connections are rejected the following commands may be issued to test different aspects of the synchronization to the remote server:

    ntpq -pn 10.20.10.1
    10.20.10.1: timed out, nothing received
    ntpq -crv 10.20.10.1
    10.20.10.1: timed out, nothing received
    ntpq -c associations 10.20.10.2
    10.20.20.1: timed out, nothing received

  • If these queries fail, the remote server can be tested with ntpdate, which manually sets the date of the current server to that of the remote one. Unless it is run with the -u option (use an unprivileged port), ntpdate needs the same port as ntpd, so the ntpd service must be stopped first:

    service ntpd stop
    Shutting down ntpd: [ OK ]
    ntpdate 10.20.10.1
    27 Jun 16:27:50 ntpdate[4309]: step time server 10.20.10.1 offset 20.152613 sec
    ntpdate 10.20.10.1
    27 Jun 16:41:01 ntpdate[7008]: adjust time server 10.20.10.1 offset 0.081244 sec

    If this works but ntpq does not, the problem is probably caused by a setting on the NTP server that rejects connections from the client. If even this command fails, a network security issue is probably preventing synchronization to the time server.
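
When several nodes have to be checked, reading the ntpq -pn table by eye gets tedious. The following is a minimal sketch of a helper that summarizes that table, assuming the standard ten-column peer listing shown in the samples above. The function name check_peers and the three state labels are hypothetical and not part of Applogic or the ntp package. It only interprets the stratum and reach columns, so it does not replace the association check (as) described above.

```shell
#!/bin/sh
# check_peers (hypothetical helper): read `ntpq -pn` output on stdin and
# print one summary line per peer.
#   unsynchronized : the peer reports stratum 16
#   unreachable    : reach is 0, no reply to any recent poll
#   replying       : at least one recent poll was answered
check_peers() {
    awk '
        # Skip the column header and the ===== separator line.
        $1 == "remote" || /^=+$/ { next }
        # Ignore anything that is not a full ten-column peer line.
        NF < 10 { next }
        {
            peer = $1
            # Drop the one-character tally code (*, +, -, x, ...) if present.
            sub(/^[*#o+x.-]/, "", peer)
            if ($3 == 16)      state = "unsynchronized"
            else if ($7 == 0)  state = "unreachable"
            else               state = "replying"
            printf "%s stratum=%s reach=%s %s\n", peer, $3, $7, state
        }
    '
}

# Usage on the BFC or on a dom0 node:
#   ntpq -pn | check_peers
```

The reach column is an octal bitmask covering the last eight polls, so 377 means that all eight were answered and 1 means that only the most recent one was. A peer reported as "replying" with a low reach value may simply need a few more poll intervals before ntpd accepts it.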