Re: [PATCH v2 01/15] KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()

From: David Woodhouse <dwmw2@infradead.org>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Sean Christopherson <seanjc@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Paul Durrant <paul@xen.org>, Shuah Khan <shuah@kernel.org>,
	 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	 linux-kselftest@vger.kernel.org,
	Oliver Upton <oliver.upton@linux.dev>,
	 jalliste@amazon.co.uk, sveith@amazon.de, zide.chen@intel.com,
	Dongli Zhang <dongli.zhang@oracle.com>
Subject: Re: [PATCH v2 01/15] KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()
Date: Tue, 07 May 2024 20:08:40 +0100
Message-ID: <51dea3a632131d9a49af3991a633f26ce8592dd3.camel@infradead.org>
In-Reply-To: <ZjXm9w/y3/NLCxLQ@tpad>

On Sat, 2024-05-04 at 04:42 -0300, Marcelo Tosatti wrote:
> On Sat, Apr 27, 2024 at 12:04:58PM +0100, David Woodhouse wrote:
> > 
> > In particular, KVM_REQ_MASTERCLOCK_UPDATE will take a new snapshot of
> > time as the reference in master_kernel_ns and master_cycle_now, yanking
> > the guest's clock back to match definition A at that moment.
> 
> KVM_REQ_MASTERCLOCK_UPDATE stops the vcpus because:

Took me a while to work that one out, btw. The fact that the 
KVM_REQ_MCLOCK_INPROGRESS request is asserted but never actually
*handled*, so all it does is repeatedly kick the vCPU out and make it
spin until the request is cleared is... interesting. Likewise the way
that we set KVM_REQ_MASTERCLOCK_UPDATE on *all* vCPUs, so they *all*
call kvm_update_masterclock(), when only one of them needed to. I may
clean that up a little.
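
To illustrate what I mean by "asserted but never handled", here is a toy
model. It is not KVM code: the toy_* names and the bit layout are made up
for the example, and it only assumes that a vCPU refuses to enter the
guest while any request bit is still pending.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request bits; the real KVM_REQ_* values differ. */
enum {
	TOY_REQ_CLOCK_UPDATE      = 1u << 0,	/* has a handler below */
	TOY_REQ_MCLOCK_INPROGRESS = 1u << 1,	/* deliberately has none */
};

struct toy_vcpu {
	uint32_t requests;	/* pending request bits */
	unsigned int bounces;	/* times guest entry was refused */
};

/* Returns true if the vCPU may enter the guest on this attempt. */
bool toy_try_enter_guest(struct toy_vcpu *v)
{
	/* A handled request is consumed (cleared) by its handler. */
	if (v->requests & TOY_REQ_CLOCK_UPDATE)
		v->requests &= ~TOY_REQ_CLOCK_UPDATE;

	/*
	 * Nothing consumes TOY_REQ_MCLOCK_INPROGRESS, so while it is set
	 * every attempt bounces here and the caller just retries.
	 */
	if (v->requests) {
		v->bounces++;
		return false;
	}
	return true;
}
```

In this model the vCPU can only make progress once some other context
clears the in-progress bit, which is the implicit "stop the vCPUs"
behaviour described above.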

>  * To avoid that problem, do not allow visibility of distinct
>  * system_timestamp/tsc_timestamp values simultaneously: use a master
>  * copy of host monotonic time values. Update that master copy
>  * in lockstep.

Right. That comment is a lot longer than the part you cited here, and
starts with 'assuming a stable TSC across pCPUS, and a stable TSC
across vCPUs'. It's the "if (ka->use_master_clock)" case.

And yes, what it's basically saying is a special case of the fact that
if you let the KVM clock run at its "natural" rate based on the guest
TSC (definition B), but each vCPU runs at that rate from a *different*
point on the line that is definition A (the host CLOCK_MONOTONIC_RAW),
bad things will happen.
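
A quick sketch of the arithmetic, with made-up numbers. Assume a guest
TSC advertised at exactly 1 GHz (so definition B advances 1ns per tick)
while the host clock on line A advances 100ppm slower than that. The
struct and function names are hypothetical, not anything in KVM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-vCPU reference point taken from line A. */
struct toy_ref {
	int64_t  ns;	/* host clock at the snapshot (definition A) */
	uint64_t tsc;	/* guest TSC at the same instant */
};

/* Line A in this model: 100ppm slower than 1ns per TSC tick. */
int64_t toy_host_ns(uint64_t tsc)
{
	return (int64_t)(tsc - tsc / 10000);
}

/* Definition B: run from the reference point at the guest TSC rate. */
int64_t toy_kvmclock_read(const struct toy_ref *ref, uint64_t tsc,
			  uint64_t tsc_khz)
{
	assert(tsc_khz != 0);
	return ref->ns + (int64_t)((tsc - ref->tsc) * 1000000 / tsc_khz);
}

/* Take a snapshot on line A at the given guest TSC value. */
struct toy_ref toy_snapshot(uint64_t tsc)
{
	struct toy_ref ref = { .ns = toy_host_ns(tsc), .tsc = tsc };
	return ref;
}
```

With vCPU0 snapshotted at TSC 0 and vCPU1 snapshotted ten seconds later
at TSC 10000000000, both read at TSC 20000000000 give 20000000000ns on
vCPU0 but only 19999000000ns on vCPU1. A guest thread that reads the
clock on vCPU0 and then migrates to vCPU1 sees time go backwards by 1ms.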

I'm OK with it stopping the vCPUs (although I'd like it to do so in a
less implicitly awful way). But when we don't need to update the
reference time at all, let's not do so.

> > When invoked in 'use_master_clock' mode, kvm_update_masterclock()
> > should probably *adjust* kvm->arch.kvmclock_offset to account for the
> > drift, instead of yanking the clock back to definition A.
> 
> You are likely correct...
> 
> > But in the meantime there are a bunch of places where it just doesn't need to be
> > invoked at all.
> > 
> > To start with: there is no need to do such an update when a Xen guest
> > populates the shared_info page. This seems to have been a hangover from
> > the very first implementation of shared_info which automatically
> > populated the vcpu_info structures at their default locations, but even
> > then it should just have raised KVM_REQ_CLOCK_UPDATE on each vCPU
> > instead of using KVM_REQ_MASTERCLOCK_UPDATE. And now that userspace is
> > expected to explicitly set the vcpu_info even in its default locations,
> > there's not even any need for that either.
> > 
> > Fixes: 629b5348841a1 ("KVM: x86/xen: update wallclock region")
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> > Reviewed-by: Paul Durrant <paul@xen.org>
> > ---
> >  arch/x86/kvm/xen.c | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> > index f65b35a05d91..5a83a8154b79 100644
> > --- a/arch/x86/kvm/xen.c
> > +++ b/arch/x86/kvm/xen.c
> > @@ -98,8 +98,6 @@ static int kvm_xen_shared_info_init(struct kvm *kvm)
> >         wc->version = wc_version + 1;
> >         read_unlock_irq(&gpc->lock);
> >  
> > -       kvm_make_all_cpus_request(kvm, KVM_REQ_MASTERCLOCK_UPDATE);
> > -
> >  out:
> >         srcu_read_unlock(&kvm->srcu, idx);
> >         return ret;
> > -- 
> > 2.44.0
> 
> So KVM_REQ_MASTERCLOCK_UPDATE is to avoid the race above.
> 
> In what contexts is kvm_xen_shared_info_init called from again?
> 
> Not clear to me KVM_REQ_MASTERCLOCK_UPDATE is not needed (or that it is
> needed, for that matter...).

We call kvm_xen_shared_info_init() when the Xen "shared info" page is
set up. The only interesting part of that is the *wallclock* epoch.

The wallclock (just like MSR_KVM_WALL_CLOCK{,_NEW}) is *entirely* hosed
ever since the KVM clock stopped being based on CLOCK_MONOTONIC, since
that means that the value of "wallclock time minus KVM clock time"
actually *changes* as the KVM clock runs at a different rate to
wallclock time.
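
To put a number on that, a minimal sketch. The 50ppm rate difference is
an assumption picked for the example, and the function names are
hypothetical. The epoch is computed once at setup and never updated, so
the guest's idea of wall time inherits all of the KVM clock's drift.

```c
#include <stdint.h>

/* Epoch published once at setup: wall time minus KVM clock, in ns. */
int64_t toy_wallclock_epoch(int64_t host_wall_ns, int64_t kvmclock_ns)
{
	return host_wall_ns - kvmclock_ns;
}

/* What the guest later reconstructs from that fixed epoch. */
int64_t toy_guest_wall_ns(int64_t epoch_ns, int64_t kvmclock_ns)
{
	return epoch_ns + kvmclock_ns;
}

/*
 * Time elapsed on a clock that runs drift_ppm fast relative to wall
 * time. Dividing before multiplying avoids overflow at the cost of
 * sub-millisecond precision, which is fine for an illustration.
 */
int64_t toy_elapsed_with_drift(int64_t real_ns, int64_t drift_ppm)
{
	return real_ns + real_ns / 1000000 * drift_ppm;
}
```

After one day (86400s) at 50ppm the KVM clock has advanced 4.32s more
than wall time, so toy_guest_wall_ns() is 4.32s ahead of the host even
though the epoch was exact when it was written.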

I'm looking at a replacement for that which uses the gtod information
to give the guest a direct mapping of guest TSC to host CLOCK_TAI. And
in doing so we can *also* indicate when live migration has potentially
disrupted the guest TSC, so any further NTP/PTP refinement that the
guest may have done for itself needs to be thrown away.
