Basic troubleshooting procedure for AppLogic time synchronization issues

Document ID : KB000049115
Last Modified Date : 14/02/2018

Description:

In some cases, the AppLogic BFC, controller, appliances, and physical nodes (dom0) may show an incorrect time. This document describes a basic troubleshooting procedure for this kind of problem.

Solution:

Background knowledge

In AppLogic 2.x and 3.0, the controller is the sole NTP server in the grid, and time flows as shown below. NTP handles synchronization between the controller and the physical nodes, and the hypervisor passes the time from each physical node to its appliance VMs.

    controller <= physical node <= appliance VM

From 3.1 onward, the BFC takes over this role from the controller and becomes the NTP server for all grids it manages, so the time sync flow changes as shown below. NTP handles synchronization between the BFC and the physical nodes, and the hypervisor passes the time from each physical node to the controller and appliance VMs.

    BFC <= physical node <= controller and appliance VM

If an external NTP server is configured in the BFC GUI, the time sync flow looks like this:

    external ntp server <= BFC <= physical node <= controller and appliance VM

Another major change in 3.1 is that all physical node clocks (both the system clock and the hardware clock) use UTC+0 rather than local time. Time synchronization from the BFC to the physical nodes is also based on UTC.

For more details on NTP, refer to the following link:

http://en.wikipedia.org/wiki/Network_Time_Protocol

Troubleshooting procedure

A time synchronization problem at any link of the chain may leave the BFC, controller, appliance VMs, or physical nodes further down the chain with an incorrect time.

Use the checklists below to locate the part of the chain that has the time synchronization problem.

Checklist for AppLogic 2.x and 3.0

  1. Is the controller time correctly synchronized?

  2. Is the physical node system time correctly synchronized?

  3. Is the appliance VM time zone correctly configured?

Checklist for AppLogic 3.1 and newer releases

  1. If an external NTP server is configured in the BFC GUI, is the BFC time correct?

  2. Is the physical node system time correctly synchronized?

  3. Is the physical node hardware clock (hwclock) correctly configured?

  4. Is the time zone of the affected appliance VM correctly configured?

  5. Is the affected appliance a Windows or Linux box, and is it running in HVM or PV mode?

Note: There is a known time sync issue in 3.1 caused by a Xen time drift bug, in which the physical node (dom0) has trouble passing time drift corrections to the hypervisor. As a result, the controller and appliance VMs have an incorrect time. The solution is to set an independent wall clock in the appliance VM and, additionally, to install and configure NTP in it to sync time from either the BFC or an external NTP server.

How to verify that time sync with the external NTP server works properly

This section applies to the controller in AppLogic 3.0 and earlier releases, and to the BFC in 3.1 and newer releases if an external NTP server is configured.

Note: When configuring the NTP server in the BFC GUI in 3.1 and newer releases, you may enter any valid and reachable external NTP server, but not the BFC's own name or IP address.

  1. In /etc/ntp.conf, each entry with the keyword "server" gives the name or IP of an external NTP server.

    server <external ntp server name/ip>

  2. Run "ntpq -p" to show the NTP peers of the local server. In the following sample, the external NTP servers are ntpsrv1 and ntpsrv2.
    remote          refid            st  t         when        poll      reach     delay   offset   jitter
    ==============================================================================
    *ntpsrv1      141.202.0.2        4  u          995         1024       377      0.411    0.019    0.031
     ntpsrv2      141.202.0.25       5  u          708         1024       377     46.798   -0.100    0.051

    If there are multiple entries in the output of "ntpq -p", the entry that starts with * (asterisk) is the current (preferred) NTP source.

    Note: Refer to the following document for more details on how to use ntpq to diagnose connection issues with the NTP source.

    https://support.ca.com/irj/portal/anonymous/redirArticles?reqPage=search&searchID=TEC573076

  3. Run "ntpq -c readvar" to display the current NTP source and sync status of the local server. Here is a sample of the output:

    assID=0 status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg,
    version="ntpd 4.2.2p1@1.1570-o Fri Nov 18 13:21:16 UTC 2011 (1)",
    processor="i686", system="Linux/2.6.18-238.el5PAE", leap=00, stratum=5,
    precision=-20, rootdelay=17.898, rootdispersion=75.212, peer=26686,
    refid=141.202.0.25,
    reftime=d3965450.304f487c Wed, Jun 27 2012 23:56:00.188, poll=10,
    clock=d3965990.187e30f8 Thu, Jun 28 2012 0:18:24.095, state=4,
    offset=0.019, frequency=115.304, jitter=0.108, noise=0.634,
    stability=0.003, tai=0

  4. If the configuration is correct, the next step is to verify that time sync from the external server works properly. The recommended procedure consists of the following steps:

    1. service ntpd stop

    2. ntpdate -d <ntp server name/ip>    (<ntp server name/ip> can be found in the "ntpq -p" output)

    3. service ntpd start

    4. date

      If step 2 or 3 reports an error, check that the external NTP server name/IP is valid and reachable. If you would like to use a public NTP server, refer to the following link:

      http://support.ntp.org/bin/view/Servers/WebHome

      Note: After the ntpd service starts, it may take a while, usually less than 5 minutes, for ntpd to connect to the primary external NTP server and mark it with * (asterisk) in the output of "ntpq -p".
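
The asterisk check described above can be scripted. The following is a minimal sketch, not part of the product: the function name selected_peer is made up for illustration, and it assumes the standard ten-column "ntpq -p" layout shown in the sample, where column 7 is the reach register.

```shell
#!/bin/sh
# Hypothetical helper: read "ntpq -p" output on stdin and print the name and
# reach value of the peer that ntpd has currently selected (the "*" line).
# Prints nothing if no peer has been selected yet.
selected_peer() {
    # awk splits on whitespace, so leading indentation does not matter.
    # $1 is the tally character plus the peer name, $7 is the reach register.
    awk '$1 ~ /^\*/ { print substr($1, 2), $7; exit }'
}
```

A typical call would be "ntpq -p | selected_peer". A reach value of 377 (octal) means the last eight polls were all answered. Empty output shortly after ntpd starts is expected, because the asterisk can take a few minutes to appear.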

How to verify that time sync to the physical node works properly

If the physical node time differs from that of the sole NTP server in the grid (the controller in 3.0 and earlier releases, the BFC in 3.1 and newer releases), the same approaches can be used for verification and troubleshooting.

  1. Check /etc/ntp.conf. In 3.0 and earlier releases, the NTP server should point to the controller; in 3.1 and newer releases, it should point to the BFC. For instance, the entry below in ntp.conf means that the NTP server is the controller (private IP).

    server <controller private ip>

  2. Run "ntpq -p" and "ntpq -c readvar" to verify the NTP configuration. Below is a sample of "ntpq -p" output, in which 192.168.6.254 is the controller IP. If there are multiple entries, the entry that starts with * (asterisk) is the current (preferred) NTP source. Make sure it is the controller private IP in 3.0 and older releases, or the BFC private IP in 3.1 and newer releases.

    remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    *192.168.6.254   LOCAL(0)   11 u   91 1024  377    0.204  467.992   0.764
     LOCAL(0)        .LOCL.     10 l   29   64  377    0.000    0.000   0.001

  3. Sync the node system clock as follows:

    1. service ntpd stop

    2. ntpdate -d <ntp server name/ip>    (-d is debug mode and only reports the offset; run ntpdate without -d to actually step the clock)

    3. service ntpd start

    4. date

  4. Sync the node hardware clock by running "hwclock --systohc". Running "hwclock" without parameters displays the current hardware clock time.
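
The check in step 2 can be sketched as a small script. This is an illustration only: the function name check_node_sync and the offset threshold are assumptions, since this document does not state an acceptable offset. The offset column of "ntpq -p" is in milliseconds.

```shell
#!/bin/sh
# Hypothetical helper: check_node_sync EXPECTED_SERVER MAX_OFFSET_MS
# Reads "ntpq -p" output on stdin and reports whether the selected peer (the
# "*" line) is the expected server and whether its offset is within the limit.
check_node_sync() {
    awk -v want="$1" -v max="$2" '
        $1 ~ /^\*/ {
            found = 1
            peer = substr($1, 2)        # strip the leading "*"
            off = $9 + 0                # column 9 is the offset in ms
            if (off < 0) off = -off     # compare the absolute value
            if (peer != want)  { print "WRONG_SOURCE " peer; exit }
            if (off > max)     { print "OFFSET_TOO_LARGE " $9; exit }
            print "OK " peer " " $9
            exit
        }
        END { if (!found) print "NO_SELECTED_PEER" }
    '
}
```

For example, "ntpq -pn | check_node_sync 192.168.6.254 100" would flag the sample output above, because its offset of 467.992 ms exceeds the assumed 100 ms limit. The -n option makes ntpq print IP addresses instead of host names, which keeps the comparison with the expected private IP simple.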

Note: From 3.1 onward, both the system time and the hardware clock time of a physical node should be in UTC+0, and there should be no significant gap between them. The BFC still uses local time. For instance, if the current time on the BFC is 20:00 (UTC+10 time zone) and the node time is 10:00 (UTC+0), the two clocks are consistent.
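
To compare a BFC local time with a node's UTC time without doing the arithmetic by hand, the local time can be converted to UTC. This is a minimal sketch that assumes GNU date; the function name to_utc and the sample timestamp are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a timestamp that carries an explicit UTC offset
# into UTC. Relies on the -d option of GNU date.
to_utc() {
    date -u -d "$1" '+%Y-%m-%d %H:%M'
}

# BFC wall clock of 20:00 in a UTC+10 zone, as in the note above.
to_utc '2012-06-28 20:00 +1000'   # prints 2012-06-28 10:00
```

If the converted BFC time and the output of "date" on the node agree to within a small gap, the two clocks are consistent.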

[root@ srv1 ~]# date
Thu Jun 28 06:19:37 UTC 2012 -> OS system time
[root@ srv1 ~]# hwclock
Thu 28 Jun 2012 06:19:38 AM UTC -0.549140 seconds -> hardware clock

The time zone of the physical node system is stored in /etc/localtime. It should either contain the UTC zone data, as shown below, or be a link to /usr/share/zoneinfo/UTC.

[root@ srv1 ~]# cat /etc/localtime
TZif2UTCTZif2UTC
UTC0

The time zone of hardware clock is stored in /etc/sysconfig/clock as below.

[root@ srv1 ~]# cat /etc/sysconfig/clock
ZONE="UTC"
UTC=true
ARC=false
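
The two time zone settings above can be checked with a short script. This is a sketch under assumptions: the function names are made up for illustration, and the patterns expect the settings to look like the samples in this document.

```shell
#!/bin/sh
# Hypothetical helper: localtime_is_utc [LOCALTIME_FILE] [REFERENCE_FILE]
# Succeeds if the local time zone file has the same content as the UTC zone
# file. cmp reads through symbolic links, so this covers both a copy of the
# UTC zone file and a link to it.
localtime_is_utc() {
    cmp -s "${1:-/etc/localtime}" "${2:-/usr/share/zoneinfo/UTC}"
}

# Hypothetical helper: clock_is_utc [CLOCK_FILE]
# Succeeds only if the file sets both ZONE to UTC and UTC=true.
clock_is_utc() {
    f="${1:-/etc/sysconfig/clock}"
    grep -Eq '^ZONE="?UTC"?[[:space:]]*$' "$f" &&
        grep -Eq '^UTC=true[[:space:]]*$' "$f"
}
```

On a physical node, "localtime_is_utc && clock_is_utc && echo UTC settings look correct" runs both checks against the default paths.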

How to verify whether the appliance VM time is correct

In general, if the physical node time is correct, the appliance VM time should also be correct, as long as its time zone is set to the correct local time zone. If the appliance VM time is incorrect, the following information may help to resolve the problem.

  1. PV appliance VM has incorrect time in 3.1 (and ONLY in 3.1)

    An appliance VM in AppLogic 3.1 may not obtain the correct time because of the Xen time drift bug, even though the BFC and physical node have the correct time. This bug affects only PV appliances, not HVM appliances (Windows appliances/VDS always run in HVM mode).

    If the system is affected by this bug, the recommended workaround is to set an independent wall clock in the PV appliance VM, as described in the document below.

    http://docs.vmd.citrix.com/XenServer/4.0.1/guest/ch04s06.html

    In addition, in this scenario it is strongly recommended to install the ntp package in the PV appliance VM and configure either the BFC or an external NTP server as the time source.

    Note: Running "hwclock --systohc" to correct the physical node hardware clock and then rebooting the physical node passes the correct time to the appliance only temporarily, for a matter of hours. It is not a final solution.

  2. Windows appliance has incorrect time in 3.1 and newer releases

    From 3.1 onward, the physical node system time and hardware clock are set to UTC+0 during installation. By default, Microsoft Windows expects the real-time clock to be set to local time rather than UTC. As a result, the date and time are not calculated correctly inside the Windows appliance for the various time zones.

    To fix this, a registry edit is needed to tell the system that the real-time clock is set to UTC, as follows. After a reboot, the date and time are adjusted correctly for the configured time zone.

    Navigate to HKLM\SYSTEM\CurrentControlSet\Control\TimeZoneInformation\ and create or set the RealTimeIsUniversal value to a DWORD value of 1.

  3. Windows appliance in a domain has incorrect time

    If a Windows box has joined a domain, by default it syncs time from the domain controller. Make sure that the domain controller time is correct.

    Refer to the following documents for details of how time is synchronized in a domain.

    http://blogs.msdn.com/b/w32time/archive/2007/07/07/welcome.aspx
    http://blogs.msdn.com/b/w32time/archive/2007/09/04/keeping-the-domain-on-time.aspx

    "w32tm /query [/peers | /status | /configuration]" displays the time sync source and status on a Windows box. In the following sample, the time sync source is AUSYDC02.ca.com.

    C:\>w32tm /query /peers
    #Peers: 1
    Peer: AUSYDC02.ca.com
    State: Active
    Time Remaining: 282.9147723s
    Mode: 3 (Client)
    Stratum: 6 (secondary reference - syncd by (S)NTP)
    PeerPoll Interval: 13 (8192s)
    HostPoll Interval: 13 (8192s)