Re: [PATCH 3/4] KVM: Register cpuhp and syscore callbacks when enabling hardware

From: "Huang, Kai" <kai.huang@intel.com>
To: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"seanjc@google.com" <seanjc@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/4] KVM: Register cpuhp and syscore callbacks when enabling hardware
Date: Thu, 9 May 2024 12:10:56 +0000
Message-ID: <d58005fc50fcc1366b40f7ab5e68c94280307c53.camel@intel.com>
In-Reply-To: <ZjpXeyzU46I1eu0A@google.com>

On Tue, 2024-05-07 at 09:31 -0700, Sean Christopherson wrote:
> On Thu, Apr 25, 2024, Sean Christopherson wrote:
> > Register KVM's cpuhp and syscore callback when enabling virtualization
> > in hardware instead of registering the callbacks during initialization,
> > and let the CPU up/down framework invoke the inner enable/disable
> > functions.  Registering the callbacks during initialization makes things
> > more complex than they need to be, as KVM needs to be very careful about
> > handling races between enabling CPUs being onlined/offlined and hardware
> > being enabled/disabled.
> > 
> > Intel TDX support will require KVM to enable virtualization during KVM
> > initialization, i.e. will add another wrinkle to things, at which point
> > sorting out the potential races with kvm_usage_count would become even
> > more complex.
> > +static int hardware_enable_all(void)
> > +{
> > +	int r;
> > +
> > +	guard(mutex)(&kvm_lock);
> > +
> > +	if (kvm_usage_count++)
> > +		return 0;
> > +
> > +	r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
> > +			      kvm_online_cpu, kvm_offline_cpu);
> > +	if (r)
> > +		return r;
> 
> There's a lock ordering issue here.  KVM currently takes kvm_lock inside
> cpu_hotplug_lock, but this code does the opposite.  I need to take a closer look
> at the locking, as I'm not entirely certain that the existing ordering is correct
> or ideal.  
> 

Do you mean currently (upstream) hardware_enable_all() takes
cpus_read_lock() first and then kvm_lock?

For this one I think cpus_read_lock() must be taken outside of kvm_lock,
because kvm_online_cpu() also takes kvm_lock.  Switching the order in
hardware_enable_all() can result in a deadlock.

For example, while CPU 0 is in hardware_enable_all(), CPU 1 tries to
bring up CPU 2 after CPU 0 has taken kvm_lock but before it takes
cpus_read_lock():

CPU 0                        CPU 1                    CPU 2

(hardware_enable_all())      (online CPU 2)           (kvm_online_cpu())

1) mutex_lock(&kvm_lock);

                             2) cpus_write_lock();
                                bringup CPU 2

                                                      4) mutex_lock(&kvm_lock);

3) cpus_read_lock();                                  ...

                                                      mutex_unlock(&kvm_lock);

                             5) cpus_write_unlock();

   ...

6) mutex_unlock(&kvm_lock);

In this case, the cpus_read_lock() in step 3) will wait for the
cpus_write_unlock() in step 5) to complete, which will wait for CPU 2 to
complete kvm_online_cpu().  But kvm_online_cpu() on CPU 2 will in turn
wait for CPU 0 to release the kvm_lock in step 6), hence the deadlock.
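
To make the cycle explicit, here is a minimal user-space sketch (not
kernel code) that models the scenario above as a wait-for graph: each
task is blocked on at most one other task, and a deadlock is a chain that
leads back to its start.  The helper name wait_graph_has_cycle() and the
NOT_WAITING marker are made up for illustration only.

```c
#include <stdbool.h>

#define NOT_WAITING (-1)

/*
 * waits_for[i] is the index of the task that task i is blocked on,
 * or NOT_WAITING if task i can make progress.
 */
static bool wait_graph_has_cycle(const int waits_for[], int nr_tasks)
{
	for (int start = 0; start < nr_tasks; start++) {
		int cur = start;

		/* Follow at most nr_tasks edges; more cannot find anything new. */
		for (int hops = 0; hops < nr_tasks; hops++) {
			cur = waits_for[cur];
			if (cur == NOT_WAITING)
				break;
			if (cur == start)
				return true;
		}
	}
	return false;
}
```

With tasks 0, 1 and 2 standing in for CPU 0, CPU 1 and CPU 2, the
scenario above is { 1, 2, 0 } (CPU 0 waits for CPU 1's write lock, CPU 1
waits for CPU 2's callback, CPU 2 waits for CPU 0's kvm_lock), which has
a cycle.  If kvm_online_cpu() no longer takes kvm_lock, the graph becomes
{ 1, 2, NOT_WAITING } and the cycle disappears.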

But with the code change in this patch, kvm_online_cpu() doesn't take
the kvm_lock anymore, so it looks to me like it's OK to take
cpus_read_lock() inside kvm_lock.

Btw, even in the current upstream code, IIUC the cpus_read_lock() isn't
absolutely necessary.  It was introduced to prevent
hardware_enable_nolock() from being run via the on_each_cpu() IPI call on
the new CPU before kvm_online_cpu() is invoked.  But because
hardware_enable_all() and kvm_online_cpu() both grab kvm_lock, the
hardware_enable_nolock() inside kvm_online_cpu() will always wait for
hardware_enable_all() to complete, so the worst case is that
hardware_enable_nolock() is called twice.  But this is fine because the
second call will basically do nothing due to the @hardware_enabled
per-cpu variable.
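
As a rough illustration of why the second call is harmless, here is a
user-space sketch of the idempotent-enable pattern.  It is not the
upstream function: a plain array stands in for the per-cpu variable, and
NR_CPUS_SKETCH, enable_count[] and hardware_enable_sketch() are names
invented for this example.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH 4

/* Stand-in for the @hardware_enabled per-cpu variable. */
static bool hardware_enabled[NR_CPUS_SKETCH];

/* Counts how often the (simulated) hardware is really touched. */
static int enable_count[NR_CPUS_SKETCH];

static int hardware_enable_sketch(int cpu)
{
	/* Already enabled on this CPU: a repeated call is a no-op. */
	if (hardware_enabled[cpu])
		return 0;

	enable_count[cpu]++;		/* the real enabling work would go here */
	hardware_enabled[cpu] = true;
	return 0;
}
```

Calling hardware_enable_sketch() twice for the same CPU only bumps
enable_count[] once, which is the "called twice, second call does
nothing" case described above.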

> E.g. cpu_hotplug_lock is taken when updating static keys, static calls,
> etc., which makes taking cpu_hotplug_lock outside kvm_lock dicey, as flows that
> take kvm_lock then need to be very careful to never trigger seemingly innocuous
> updates.
> 
> And this lockdep splat that I've now hit twice with the current implementation
> suggests that cpu_hotplug_lock => kvm_lock is already unsafe/broken (I need to
> re-decipher the splat; I _think_ mostly figured it out last week, but then forgot
> over the weekend).

I think if we remove the kvm_lock in kvm_online_cpu(), it's OK to take
cpus_read_lock() (i.e. call cpuhp_setup_state()) inside the kvm_lock.

If so, maybe we can just have a rule that cpus_read_lock() cannot be held
outside of kvm_lock.
