From: Sean Christopherson <seanjc@google.com>
To: David Matlack <dmatlack@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, kvm@vger.kernel.org
Subject: Re: [PATCH v2] KVM: Mark a vCPU as preempted/ready iff it's scheduled out while running
Date: Fri, 26 Apr 2024 14:01:15 -0700
Message-ID: <ZiwWG4iHQYREwFP2@google.com>
In-Reply-To: <20240307163541.92138-1-dmatlack@google.com>
On Thu, Mar 07, 2024, David Matlack wrote:
> Mark a vCPU as preempted/ready if-and-only-if it's scheduled out while
> running. i.e. Do not mark a vCPU preempted/ready if it's scheduled out
> during a non-KVM_RUN ioctl() or when userspace is doing KVM_RUN with
> immediate_exit.
>
> Commit 54aa83c90198 ("KVM: x86: do not set st->preempted when going back
> to user space") stopped marking a vCPU as preempted when returning to
> userspace, but if userspace then invokes a KVM vCPU ioctl() that gets
> preempted, the vCPU will be marked preempted/ready. This is arguably
> incorrect behavior since the vCPU was not actually preempted while the
> guest was running, it was preempted while doing something on behalf of
> userspace.
>
> This commit also avoids KVM dirtying guest memory after userspace has
> paused vCPUs, e.g. for Live Migration, which allows userspace to collect
> the final dirty bitmap before or in parallel with saving vCPU state
> without having to worry about saving vCPU state triggering writes to
> guest memory.
>
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
> v2:
> - Drop Google-specific "PRODKERNEL: " shortlog prefix
>
> v1: https://lore.kernel.org/kvm/20231218185850.1659570-1-dmatlack@google.com/
>
> include/linux/kvm_host.h | 1 +
> virt/kvm/kvm_main.c | 5 ++++-
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7e7fd25b09b3..5b2300614d22 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -378,6 +378,7 @@ struct kvm_vcpu {
>  		bool dy_eligible;
>  	} spin_loop;
>  #endif
> +	bool wants_to_run;
>  	bool preempted;
>  	bool ready;
>  	struct kvm_vcpu_arch arch;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ff588677beb7..3da1b2e3785d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4438,7 +4438,10 @@ static long kvm_vcpu_ioctl(struct file *filp,
>  			synchronize_rcu();
>  			put_pid(oldpid);
>  		}
> +		vcpu->wants_to_run = !vcpu->run->immediate_exit;
>  		r = kvm_arch_vcpu_ioctl_run(vcpu);
> +		vcpu->wants_to_run = false;
> +
>  		trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
>  		break;
>  	}
> @@ -6312,7 +6315,7 @@ static void kvm_sched_out(struct preempt_notifier *pn,
>  {
>  	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);
>
> -	if (current->on_rq) {
> +	if (current->on_rq && vcpu->wants_to_run) {
>  		WRITE_ONCE(vcpu->preempted, true);
>  		WRITE_ONCE(vcpu->ready, true);
>  	}
>
> base-commit: 687d8f4c3dea0758afd748968d91288220bbe7e3

Long story short, I was playing around with wants_to_run for a few harebrained
ideas, and realized that there's a TOCTOU bug here.  Userspace can toggle
run->immediate_exit at will, e.g. can clear it after the kernel loads it to
compute vcpu->wants_to_run.

That's not fatal for this use case, since userspace would only be shooting itself
in the foot, but it leaves a very dangerous landmine, e.g. if something else in
KVM keys off of vcpu->wants_to_run to detect that KVM is in its run loop, i.e.
relies on wants_to_run being set if KVM is in its core run loop.
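
To make the race concrete, here is a minimal userspace sketch of the double
fetch.  It is not KVM code: every demo_* name is made up for illustration, and
the racing userspace write is simulated by a plain function call between the
two loads so that the outcome is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the userspace-writable struct kvm_run page. */
struct demo_run {
	volatile uint8_t immediate_exit;
};

/* Hypothetical stand-in for kernel-private per-vCPU state. */
struct demo_vcpu {
	struct demo_run *run;
	bool wants_to_run;
};

/* Simulates userspace clearing the flag after the kernel's first load. */
static void demo_userspace_clears_flag(struct demo_vcpu *vcpu)
{
	vcpu->run->immediate_exit = 0;
}

/*
 * Racy pattern: the shared flag is loaded twice.  Returns true if the
 * second load lets the "run loop" proceed even though wants_to_run,
 * computed from the first load, says the vCPU does not want to run.
 */
static bool demo_double_fetch_disagrees(struct demo_vcpu *vcpu)
{
	assert(vcpu && vcpu->run);

	vcpu->wants_to_run = !vcpu->run->immediate_exit;	/* load #1 */
	demo_userspace_clears_flag(vcpu);			/* racing write */
	bool enters_run_loop = !vcpu->run->immediate_exit;	/* load #2 */

	return enters_run_loop && !vcpu->wants_to_run;
}
```

Starting with immediate_exit set, the first load yields wants_to_run == false,
while the second load sees the cleared flag and proceeds, so the two views of
"is KVM in its run loop" disagree.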

To address that, I think we should have all architectures check wants_to_run, not
immediate_exit.  And loading immediate_exit needs to be done with READ_ONCE().

E.g. for x86 (every other arch has similar code):

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9ef1fa4b90b..1a2f6bf14fb2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11396,7 +11396,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 	kvm_vcpu_srcu_read_lock(vcpu);
 	if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
-		if (kvm_run->immediate_exit) {
+		if (!vcpu->wants_to_run) {
 			r = -EINTR;
 			goto out;
 		}
@@ -11474,7 +11474,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		WARN_ON_ONCE(vcpu->mmio_needed);
 	}
-	if (kvm_run->immediate_exit) {
+	if (!vcpu->wants_to_run) {
 		r = -EINTR;
 		goto out;
 	}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f9b9ce0c3cd9..0c0aae224000 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1497,9 +1497,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
 					struct kvm_guest_debug *dbg);
 int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu);
-void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
-
-void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
+void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool sched_in);
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id);
 int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9501fbd5dfd2..4384bbdba65c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4410,7 +4410,7 @@ static long kvm_vcpu_ioctl(struct file *filp,
 			synchronize_rcu();
 			put_pid(oldpid);
 		}
-		vcpu->wants_to_run = !vcpu->run->immediate_exit;
+		vcpu->wants_to_run = !READ_ONCE(vcpu->run->immediate_exit);
 		r = kvm_arch_vcpu_ioctl_run(vcpu);
 		vcpu->wants_to_run = false;
---
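
Boiled down to a userspace model, the snapshot-once idea in the diff above
looks roughly like the sketch below.  All demo_* names are hypothetical,
DEMO_READ_ONCE() is only a simplified stand-in for the kernel's READ_ONCE()
(a volatile access built on the GNU __typeof__ extension), and the racing
userspace write is simulated inline so the result is deterministic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for READ_ONCE(): force exactly one load of x. */
#define DEMO_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the userspace-writable struct kvm_run page. */
struct demo_run {
	uint8_t immediate_exit;
};

/* Hypothetical stand-in for kernel-private per-vCPU state. */
struct demo_vcpu {
	struct demo_run *run;
	bool wants_to_run;
};

/* Models the arch run function: it consults only the private snapshot. */
static int demo_arch_vcpu_run(struct demo_vcpu *vcpu)
{
	/* Simulated race: userspace clears the flag mid-ioctl. */
	vcpu->run->immediate_exit = 0;

	if (!vcpu->wants_to_run)
		return -EINTR;

	return 0;
}

/* Models the KVM_RUN path: snapshot once, run, then clear the snapshot. */
static int demo_vcpu_ioctl_run(struct demo_vcpu *vcpu)
{
	int r;

	assert(vcpu && vcpu->run);

	vcpu->wants_to_run = !DEMO_READ_ONCE(vcpu->run->immediate_exit);
	r = demo_arch_vcpu_run(vcpu);
	vcpu->wants_to_run = false;

	return r;
}
```

Because the only load of the shared flag happens in demo_vcpu_ioctl_run(), a
later write by userspace cannot make the run path and wants_to_run disagree.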
Hmm, and we should probably go a step further and actively prevent using
immediate_exit from the kernel, e.g. rename it to something scary like:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..9c5fe1dae744 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -196,7 +196,11 @@ struct kvm_xen_exit {
 struct kvm_run {
 	/* in */
 	__u8 request_interrupt_window;
+#ifndef __KERNEL__
 	__u8 immediate_exit;
+#else
+	__u8 hidden_do_not_touch;
+#endif
 	__u8 padding1[6];
 	/* out */
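
As a rough illustration of why the rename is safe for the ABI, the sketch below
models the same trick in a standalone program.  DEMO_KERNEL_BUILD is a made-up
macro standing in for __KERNEL__, struct demo_kvm_run copies only the first
three fields of struct kvm_run, and demo_wants_to_run() is a hypothetical
single accessor, not an existing KVM helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __KERNEL__; remove to get the userspace view. */
#define DEMO_KERNEL_BUILD 1

struct demo_kvm_run {
	/* in */
	uint8_t request_interrupt_window;
#ifndef DEMO_KERNEL_BUILD
	uint8_t immediate_exit;
#else
	uint8_t hidden_do_not_touch;	/* same byte, scarier name */
#endif
	uint8_t padding1[6];
};

/* The rename must not move the field or change the struct size. */
static_assert(offsetof(struct demo_kvm_run, hidden_do_not_touch) == 1,
	      "flag must stay at byte offset 1");
static_assert(sizeof(struct demo_kvm_run) == 8,
	      "leading 'in' fields must still occupy 8 bytes");

/* Hypothetical sole reader: one volatile load of the shared flag. */
static bool demo_wants_to_run(const struct demo_kvm_run *run)
{
	return !*(const volatile uint8_t *)&run->hidden_do_not_touch;
}
```

With the "kernel" view selected, any other code that spells out
run->immediate_exit fails to compile, which is the point of the rename.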