KVM Archive mirror
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: David Woodhouse <dwmw2@infradead.org>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Sean Christopherson <seanjc@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Paul Durrant <paul@xen.org>, Shuah Khan <shuah@kernel.org>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org,
	Oliver Upton <oliver.upton@linux.dev>,
	jalliste@amazon.co.uk, sveith@amazon.de, zide.chen@intel.com,
	Dongli Zhang <dongli.zhang@oracle.com>
Subject: Re: [PATCH v2 01/15] KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()
Date: Sat, 4 May 2024 04:42:47 -0300	[thread overview]
Message-ID: <ZjXm9w/y3/NLCxLQ@tpad> (raw)
In-Reply-To: <20240427111929.9600-2-dwmw2@infradead.org>

On Sat, Apr 27, 2024 at 12:04:58PM +0100, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> The KVM clock is an interesting thing. It is defined as "nanoseconds
> since the guest was created", but in practice it runs at two *different*
> rates — or three different rates, if you count implementation bugs.
> 
> Definition A is that it runs synchronously with the CLOCK_MONOTONIC_RAW
> of the host, with a delta of kvm->arch.kvmclock_offset.
> 
> But that version doesn't actually get used in the common case, where the
> host has a reliable TSC and the guest TSCs are all running at the same
> rate and in sync with each other, and kvm->arch.use_master_clock is set.
> 
> In that common case, definition B is used: There is a reference point in
> time at kvm->arch.master_kernel_ns (again a CLOCK_MONOTONIC_RAW time),
> and a corresponding host TSC value kvm->arch.master_cycle_now. This
> fixed point in time is converted to guest units (the time offset by
> kvmclock_offset and the TSC Value scaled and offset to be a guest TSC
> value) and advertised to the guest in the pvclock structure. While in
> this 'use_master_clock' mode, the fixed point in time never needs to be
> changed, and the clock runs precisely in time with the guest TSC, at the
> rate advertised in the pvclock structure.
> 
> The third definition C is implemented in kvm_get_wall_clock_epoch() and
> __get_kvmclock(), using the master_cycle_now and master_kernel_ns fields
> but converting the *host* TSC cycles directly to a value in nanoseconds
> instead of scaling via the guest TSC.
> 
> One might naïvely think that all three definitions are identical, since
> CLOCK_MONOTONIC_RAW is not skewed by NTP frequency corrections; all
> three are just the result of counting the host TSC at a known frequency,
> or the scaled guest TSC at a known precise fraction of the host's
> frequency. The problem is with arithmetic precision, and the way that
> frequency scaling is done in a division-free way by multiplying by a
> scale factor, then shifting right. In practice, all three ways of
> calculating the KVM clock will suffer a systemic drift from each other.
> 
> Eventually, definition C should just be eliminated. Commit 451a707813ae
> ("KVM: x86/xen: improve accuracy of Xen timers") worked around it for
> the specific case of Xen timers, which are defined in terms of the KVM
> clock and suffered from a continually increasing error in timer expiry
> times. That commit notes that get_kvmclock_ns() is non-trivial to fix
> and says "I'll come back to that", which remains true.
> 
> Definitions A and B do need to coexist, the former to handle the case
> where the host or guest TSC is suboptimally configured. But KVM should
> be more careful about switching between them, and the discontinuity in
> guest time which could result.
> 
> In particular, KVM_REQ_MASTERCLOCK_UPDATE will take a new snapshot of
> time as the reference in master_kernel_ns and master_cycle_now, yanking
> the guest's clock back to match definition A at that moment.

KVM_REQ_MASTERCLOCK_UPDATE stops the vcpus because:

 * To avoid that problem, do not allow visibility of distinct
 * system_timestamp/tsc_timestamp values simultaneously: use a master
 * copy of host monotonic time values. Update that master copy
 * in lockstep.

> When invoked from in 'use_master_clock' mode, kvm_update_masterclock()
> should probably *adjust* kvm->arch.kvmclock_offset to account for the
> drift, instead of yanking the clock back to defintion A.

You are likely correct...

> But in the meantime there are a bunch of places where it just doesn't need to be
> invoked at all.
> 
> To start with: there is no need to do such an update when a Xen guest
> populates the shared_info page. This seems to have been a hangover from
> the very first implementation of shared_info which automatically
> populated the vcpu_info structures at their default locations, but even
> then it should just have raised KVM_REQ_CLOCK_UPDATE on each vCPU
> instead of using KVM_REQ_MASTERCLOCK_UPDATE. And now that userspace is
> expected to explicitly set the vcpu_info even in its default locations,
> there's not even any need for that either.
> 
> Fixes: 629b5348841a1 ("KVM: x86/xen: update wallclock region")
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Reviewed-by: Paul Durrant <paul@xen.org>
> ---
>  arch/x86/kvm/xen.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index f65b35a05d91..5a83a8154b79 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -98,8 +98,6 @@ static int kvm_xen_shared_info_init(struct kvm *kvm)
>  	wc->version = wc_version + 1;
>  	read_unlock_irq(&gpc->lock);
>  
> -	kvm_make_all_cpus_request(kvm, KVM_REQ_MASTERCLOCK_UPDATE);
> -
>  out:
>  	srcu_read_unlock(&kvm->srcu, idx);
>  	return ret;
> -- 
> 2.44.0

So KVM_REQ_MASTERCLOCK_UPDATE is to avoid the race above.

In what contexes is kvm_xen_shared_info_init called from again?

Not clear to me KVM_REQ_MASTERCLOCK_UPDATE is not needed (or that is
needed, for that matter...).



  reply	other threads:[~2024-05-04  7:44 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-27 11:04 [RFC PATCH v2] Cleaning up the KVM clock mess David Woodhouse
2024-04-27 11:04 ` [PATCH v2 01/15] KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init() David Woodhouse
2024-05-04  7:42   ` Marcelo Tosatti [this message]
2024-05-07 19:08     ` David Woodhouse
2024-04-27 11:04 ` [PATCH v2 02/15] KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force David Woodhouse
2024-04-27 11:05 ` [PATCH v2 03/15] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration David Woodhouse
2024-04-27 11:05 ` [PATCH v2 04/15] KVM: selftests: Add KVM/PV clock selftest to prove timer correction David Woodhouse
2024-04-27 11:05 ` [PATCH v2 05/15] KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC David Woodhouse
2024-04-27 11:05 ` [PATCH v2 06/15] KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration David Woodhouse
2024-04-27 11:05 ` [PATCH v2 07/15] KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host David Woodhouse
2024-04-27 11:05 ` [PATCH v2 08/15] KVM: x86: Fix KVM clock precision in __get_kvmclock() David Woodhouse
2024-04-27 11:05 ` [PATCH v2 09/15] KVM: x86: Fix software TSC upscaling in kvm_update_guest_time() David Woodhouse
2024-04-27 11:05 ` [PATCH v2 10/15] KVM: x86: Simplify and comment kvm_get_time_scale() David Woodhouse
2024-04-27 11:05 ` [PATCH v2 11/15] KVM: x86: Remove implicit rdtsc() from kvm_compute_l1_tsc_offset() David Woodhouse
2024-04-27 11:05 ` [PATCH v2 12/15] KVM: x86: Improve synchronization in kvm_synchronize_tsc() David Woodhouse
2024-04-27 11:05 ` [PATCH v2 13/15] KVM: x86: Kill cur_tsc_{nsec,offset,write} fields David Woodhouse
2024-05-10  9:03   ` Chenyi Qiang
2024-05-14 13:17     ` David Woodhouse
2024-04-27 11:05 ` [PATCH v2 14/15] KVM: x86: Allow KVM master clock mode when TSCs are offset from each other David Woodhouse
2024-04-27 11:05 ` [PATCH v2 15/15] KVM: x86: Factor out kvm_use_master_clock() David Woodhouse
2024-05-01 17:55   ` Chen, Zide
2024-05-01 20:45     ` David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZjXm9w/y3/NLCxLQ@tpad \
    --to=mtosatti@redhat.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=dongli.zhang@oracle.com \
    --cc=dwmw2@infradead.org \
    --cc=hpa@zytor.com \
    --cc=jalliste@amazon.co.uk \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=oliver.upton@linux.dev \
    --cc=paul@xen.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=shuah@kernel.org \
    --cc=sveith@amazon.de \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=zide.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).