From: "Huang, Kai" <kai.huang@intel.com>
To: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"seanjc@google.com" <seanjc@google.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/4] KVM: Register cpuhp and syscore callbacks when enabling hardware
Date: Thu, 9 May 2024 12:10:56 +0000
Message-ID: <d58005fc50fcc1366b40f7ab5e68c94280307c53.camel@intel.com>
In-Reply-To: <ZjpXeyzU46I1eu0A@google.com>
On Tue, 2024-05-07 at 09:31 -0700, Sean Christopherson wrote:
> On Thu, Apr 25, 2024, Sean Christopherson wrote:
> > Register KVM's cpuhp and syscore callback when enabling virtualization
> > in hardware instead of registering the callbacks during initialization,
> > and let the CPU up/down framework invoke the inner enable/disable
> > functions. Registering the callbacks during initialization makes things
> > more complex than they need to be, as KVM needs to be very careful about
> > handling races between enabling CPUs being onlined/offlined and hardware
> > being enabled/disabled.
> >
> > Intel TDX support will require KVM to enable virtualization during KVM
> > initialization, i.e. will add another wrinkle to things, at which point
> > sorting out the potential races with kvm_usage_count would become even
> > more complex.
> > +static int hardware_enable_all(void)
> > +{
> > + int r;
> > +
> > + guard(mutex)(&kvm_lock);
> > +
> > + if (kvm_usage_count++)
> > + return 0;
> > +
> > + r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
> > + kvm_online_cpu, kvm_offline_cpu);
> > + if (r)
> > + return r;
>
> There's a lock ordering issue here. KVM currently takes kvm_lock inside
> cpu_hotplug_lock, but this code does the opposite. I need to take a closer look
> at the locking, as I'm not entirely certain that the existing ordering is correct
> or ideal.
>
Do you mean currently (upstream) hardware_enable_all() takes
cpus_read_lock() first and then kvm_lock?

For this one I think the cpus_read_lock() must be taken outside of
kvm_lock, because the kvm_online_cpu() also takes kvm_lock. Switching the
order in hardware_enable_all() can result in deadlock.

For example, when CPU 0 is doing hardware_enable_all(), CPU 1 tries to
bring up CPU 2 between kvm_lock and cpus_read_lock() in CPU 0:

      cpu 0                       cpu 1                  cpu 2
 (hardware_enable_all())     (online cpu 2)         (kvm_online_cpu())

 1) mutex_lock(&kvm_lock);
                             2) cpus_write_lock();
                                bringup cpu 2
                                                    4) mutex_lock(&kvm_lock);
 3) cpus_read_lock();
    ...
    mutex_unlock(&kvm_lock);
                             5) cpus_write_unlock();
                                                       ...
                                                    6) mutex_unlock(&kvm_lock);

In this case, the cpus_read_lock() in step 3) will wait for the
cpus_write_unlock() in step 5) to complete, which will wait for CPU 2 to
complete kvm_online_cpu(). But kvm_online_cpu() on CPU 2 will in turn
wait for CPU 0 to release the kvm_lock, so the result is a deadlock.

But with the code change in this patch, kvm_online_cpu() doesn't take
kvm_lock anymore, so it looks OK to me to take cpus_read_lock()
inside kvm_lock.
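
To make the two cases above concrete, here is a minimal user-space sketch
(not kernel code) that records "B is taken while A is held" edges and looks
for a cycle, roughly in the spirit of what lockdep reports. The lock IDs,
the lock_graph helpers and the two scenario functions are all hypothetical
names made up for this illustration, and the model deliberately treats the
hotplug write side on CPU 1 plus the callback on CPU 2 as a single
"cpu_hotplug_lock, then kvm_lock" dependency.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock IDs, for this illustration only. */
enum lock_id { KVM_LOCK, CPU_HOTPLUG_LOCK, NR_LOCKS };

/* edge[a][b] is true when some path acquires b while a is held. */
struct lock_graph {
	bool edge[NR_LOCKS][NR_LOCKS];
};

static void lock_graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Record that 'inner' is acquired while 'outer' is held. */
static void lock_graph_add(struct lock_graph *g, enum lock_id outer,
			   enum lock_id inner)
{
	g->edge[outer][inner] = true;
}

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
static bool reaches(const struct lock_graph *g, int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reaches(g, next, to, seen))
			return true;
	return false;
}

/* An inversion exists if some edge a->b also has a path back from b to a. */
static bool lock_graph_has_inversion(const struct lock_graph *g)
{
	for (int a = 0; a < NR_LOCKS; a++)
		for (int b = 0; b < NR_LOCKS; b++) {
			bool seen[NR_LOCKS] = { false };

			if (a != b && g->edge[a][b] && reaches(g, b, a, seen))
				return true;
		}
	return false;
}

/* kvm_lock outside cpus_read_lock(), callback still takes kvm_lock. */
static bool reordered_with_locked_callback(void)
{
	struct lock_graph g;

	lock_graph_init(&g);
	lock_graph_add(&g, KVM_LOCK, CPU_HOTPLUG_LOCK); /* CPU 0: steps 1), 3) */
	lock_graph_add(&g, CPU_HOTPLUG_LOCK, KVM_LOCK); /* CPU 1/2: steps 2), 4) */
	return lock_graph_has_inversion(&g);
}

/* Same ordering, but the online callback no longer takes kvm_lock. */
static bool reordered_with_lockless_callback(void)
{
	struct lock_graph g;

	lock_graph_init(&g);
	lock_graph_add(&g, KVM_LOCK, CPU_HOTPLUG_LOCK); /* CPU 0 only */
	return lock_graph_has_inversion(&g);
}
```

In this toy model the first scenario reports an inversion and the second
does not, which matches the reasoning above. It says nothing about other
kvm_lock users that might take cpu_hotplug_lock indirectly.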

Btw, even in the current upstream code, IIUC the cpus_read_lock() isn't
absolutely necessary. It was introduced to prevent
hardware_enable_nolock() from being run via the on_each_cpu() IPI call on
the new CPU before kvm_online_cpu() is invoked. But because
hardware_enable_all() and kvm_online_cpu() both grab kvm_lock, the
hardware_enable_nolock() inside kvm_online_cpu() will always wait for
hardware_enable_all() to complete, so the worst case is that
hardware_enable_nolock() is called twice. But this is fine because the
second call will basically do nothing due to the @hardware_enabled per-cpu
variable.
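
The "second call does nothing" behaviour can be sketched in user space as
below. This is only a model of the idea, not the upstream function:
NR_CPUS_MODEL, hardware_enabled_model[], enable_ops[] and
hardware_enable_model() are hypothetical names, a plain array stands in for
the per-cpu variable, and a counter stands in for the architecture-specific
enable step.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for this sketch */

/* Stand-in for the @hardware_enabled per-cpu flag. */
static bool hardware_enabled_model[NR_CPUS_MODEL];

/* Counts how often the (modelled) arch enable step actually ran per CPU. */
static int enable_ops[NR_CPUS_MODEL];

/* Enable virtualization on 'cpu' at most once; repeat calls return early. */
static int hardware_enable_model(int cpu)
{
	if (cpu < 0 || cpu >= NR_CPUS_MODEL)
		return -1;

	/* Already enabled on this CPU: nothing to do. */
	if (hardware_enabled_model[cpu])
		return 0;

	enable_ops[cpu]++;		/* stands in for the arch enable */
	hardware_enabled_model[cpu] = true;
	return 0;
}
```

Calling hardware_enable_model() twice for the same CPU leaves enable_ops[]
at 1 for that CPU, which is the reason a duplicate call from the IPI path
and from the online callback would be harmless in this model.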
> E.g. cpu_hotplug_lock is taken when updating static keys, static calls,
> etc., which makes taking cpu_hotplug_lock outside kvm_lock dicey, as flows that
> take kvm_lock then need to be very careful to never trigger seemingly innocuous
> updates.
>
> And this lockdep splat that I've now hit twice with the current implementation
> suggests that cpu_hotplug_lock => kvm_lock is already unsafe/broken (I need to
> re-decipher the splat; I _think_ mostly figured it out last week, but then forgot
> over the weekend).
I think if we remove the kvm_lock in kvm_online_cpu(), it's OK to hold
cpus_read_lock() (i.e. call cpuhp_setup_state()) inside the kvm_lock.

If so, maybe we can just have a rule that cpus_read_lock() cannot be held
outside of kvm_lock.