(Yea! You know me!)
NTP. How can I explain it? I’ll take you frame by frame it.
I’m sure that no readers of Virtual Insanity would ever neglect to set up NTP properly on every single ESXi host. But occasionally, our NTP source hiccups, or something happens to skew the time. Recently I found a host with the NTP service stopped.
Why? No idea really. Maybe it was stopped while someone was troubleshooting. Maybe it just crashed. But it will cause issues with backups, and with applications running during backups or vMotions.
When a snapshot is taken, or a VM is vMotioned, the time is sync’d inside the guest by default. This can be a problem if your host time is off. My Windows guests get their time from Active Directory, and the Linux guests use an AD domain controller for NTP, so I do not rely on guest time syncing up to my ESXi hosts. Or so I thought. . .
Even if you have your guests configured NOT to do periodic time syncs with VMware Tools, Tools will still force the guest clock to sync to the host on snapshot operations, suspend/resume, or vMotion. There is a way to prevent VMware Tools from syncing the time for these events, but it’s better just to make sure NTP is up and running on the host, and getting the correct time. There is a clear reason VMware insists on doing these syncs at times when I/O is quiesced, or the VM is transferred to another host: timekeeping in a hypervisor environment when you’re sharing CPU cycles is no trivial task.
If you use a backup solution that snapshots the VM, VSS quiesces the I/O inside that guest. When it does, there’s a VSS timeout for the snapshot to complete. If the snapshot takes longer than that, VSS will time out, and your job will fail with error code 4, quiesce aborted.
By default, this timeout is set to 10 minutes on Windows guests. Of course, the time on my ESXi host was off by 12 minutes, so when the backup job started, VSS kicked off, and then VMware Tools sync’d the guest time 12 minutes forward. As far as VSS could tell, more than 10 minutes had passed, so it timed out instantly. If you see this error code on your backups, an easy thing to check first is NTP.
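To make the arithmetic concrete, here is a rough Python sketch of that failure. It is only a toy model of the explanation above, not how VSS is actually implemented: the function names are made up for illustration, and the 5-second snapshot duration is an assumed value. The 10-minute timeout and the 12-minute skew are the numbers from my environment.

```python
from datetime import timedelta

# From the post: the default VSS timeout on Windows guests.
VSS_TIMEOUT = timedelta(minutes=10)


def apparent_snapshot_duration(real_duration: timedelta,
                               clock_step: timedelta) -> timedelta:
    """Duration the guest clock reports if it is stepped forward mid-snapshot.

    Hypothetical helper: real elapsed time plus the jump applied by the
    time sync.
    """
    return real_duration + clock_step


def snapshot_times_out(real_duration: timedelta,
                       clock_step: timedelta,
                       timeout: timedelta = VSS_TIMEOUT) -> bool:
    """True if the apparent duration exceeds the timeout (toy model only)."""
    return apparent_snapshot_duration(real_duration, clock_step) > timeout


if __name__ == "__main__":
    real = timedelta(seconds=5)    # assumption: the snapshot itself is quick
    skew = timedelta(minutes=12)   # host clock ahead of the guest, as in the post

    print("apparent duration:", apparent_snapshot_duration(real, skew))
    print("times out with skew:", snapshot_times_out(real, skew))
    print("times out without skew:", snapshot_times_out(real, timedelta(0)))
```

The point is that the snapshot does not have to be slow at all. Any forward jump bigger than the timeout is enough to fail the job on its own.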
I recommend setting the NTP service to start and stop automatically.
Previously, I had set this to start and stop with the host. But with that setting, if the service crashes or gets stopped for some reason, it will not restart until the host restarts.
So who’s down with NTP?
Hopefully all the homies. . .