KVM Archive mirror
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Breno Leitao <leitao@debian.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	rbc@meta.com, paulmck@kernel.org,
	 "open list:KERNEL VIRTUAL MACHINE (KVM)" <kvm@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] KVM: Addressing a possible race in kvm_vcpu_on_spin:
Date: Fri, 10 May 2024 07:39:14 -0700	[thread overview]
Message-ID: <Zj4xkoMZh8zJdKyq@google.com> (raw)
In-Reply-To: <Zj3ksShWaFSWstii@gmail.com>

On Fri, May 10, 2024, Breno Leitao wrote:
> > IMO, reworking it to be like this is more straightforward:
> > 
> > 	int nr_vcpus, start, i, idx, yielded;
> > 	struct kvm *kvm = me->kvm;
> > 	struct kvm_vcpu *vcpu;
> > 	int try = 3;
> > 
> > 	nr_vcpus = atomic_read(&kvm->online_vcpus);
> > 	if (nr_vcpus < 2)
> > 		return;
> > 
> > 	/* Pairs with the smp_wmb() in kvm_vm_ioctl_create_vcpu(). */
> > 	smp_rmb();
> 
> Why do you need this now? Isn't the RCU read lock in xa_load() enough?

No.  RCU read lock doesn't suffice, because on kernels without PREEMPT_COUNT
rcu_read_lock() may be a literal nop.  There may be a _compiler_ barrier, but
smp_rmb() requires more than a compiler barrier on many architectures.

And just as importantly, KVM shouldn't be relying on the inner details of other
code without a hard guarantee of that behavior.  E.g. KVM does rely on
srcu_read_unlock() to provide a full memory barrier, but KVM does so through an
"official" API, smp_mb__after_srcu_read_unlock().

> > 	kvm_vcpu_set_in_spin_loop(me, true);
> > 
> > 	start = READ_ONCE(kvm->last_boosted_vcpu) + 1;
> > 	for (i = 0; i < nr_vcpus; i++) {
> 
> Why do you need to started at the last boosted vcpu? I.e, why not
> starting at 0 and skipping me->vcpu_idx and kvm->last_boosted_vcpu?

To provide round-robin style yielding in order to (hopefully) yield to the vCPU
that is holding a spinlock (or some other asset that is causing a vCPU to spin
in kernel mode).

E.g. if there are 4 vCPUs all running on a single CPU, vCPU3 gets preempted while
holding a spinlock, and all vCPUs are contented for said spinlock then starting
at vCPU0 every time would result in vCPU1 yielding to vCPU0, and vCPU0 yielding
back to vCPU1, indefinitely.

Starting at the last boosted vCPU instead results in vCPU0 yielding to vCPU1,
vCPU1 yielding to vCPU2, and vCPU2 yielding to vCPU3, thus getting back to the
vCPU that holds the spinlock soon-ish.

I'm sure more sophisticated/performant approaches are possible, but they would
likely be more complex, require persistent state (a.k.a. memory usage), and/or
need knowledge of the workload being run.

  reply	other threads:[~2024-05-10 14:39 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-09  9:01 [PATCH] KVM: Addressing a possible race in kvm_vcpu_on_spin: Breno Leitao
2024-05-09 16:42 ` Sean Christopherson
2024-05-10  9:11   ` Breno Leitao
2024-05-10 14:39     ` Sean Christopherson [this message]
2024-05-10 15:52       ` Breno Leitao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zj4xkoMZh8zJdKyq@google.com \
    --to=seanjc@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=leitao@debian.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=rbc@meta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).