KVM Archive mirror
 help / color / mirror / Atom feed
From: "Vineeth Pillai (Google)" <vineeth@bitbyteword.org>
To: Ben Segall <bsegall@google.com>, Borislav Petkov <bp@alien8.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	"H . Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Sean Christopherson <seanjc@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>
Cc: "Vineeth Pillai (Google)" <vineeth@bitbyteword.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Suleiman Souhlal <suleiman@google.com>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	himadrics@inria.fr, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, x86@kernel.org
Subject: [RFC PATCH v2 0/5] Paravirt Scheduling (Dynamic vcpu priority management)
Date: Wed,  3 Apr 2024 10:01:11 -0400	[thread overview]
Message-ID: <20240403140116.3002809-1-vineeth@bitbyteword.org> (raw)

Double scheduling is a concern with virtualization hosts where the host
schedules vcpus without knowing whats run by the vcpu and guest schedules
tasks without knowing where the vcpu is physically running. This causes
issues related to latencies, power consumption, resource utilization
etc. An ideal solution would be to have a cooperative scheduling
framework where the guest and host shares scheduling related information
and makes an educated scheduling decision to optimally handle the
workloads. As a first step, we are taking a stab at reducing latencies
for latency sensitive workloads in the guest.

v1 RFC[1] was posted in December 2023. The main disagreement was in the
implementation where the patch was making scheduling policy decisions
in kvm and kvm is not the right place to do it. The suggestion was to
move the polcy decisions outside of kvm and let kvm only handle the
notifications needed to make the policy decisions. This patch series is
an iterative step towards implementing the feature as a layered
design where the policy could be implemented outside of kvm as a
kernel built-in, a kernel module or a bpf program.

This design comprises mainly of 4 components:

- pvsched driver: Implements the scheduling policies. Register with
    host with a set of callbacks that hypervisor(kvm) can use to notify
    vcpu events that the driver is interested in. The callback will be
    passed in the address of shared memory so that the driver can get
    scheduling information shared by the guest and also update the
    scheduling policies set by the driver.
- kvm component: Selects the pvsched driver for a guest and notifies
    the driver via callbacks for events that the driver is interested
    in. Also interface with the guest in retreiving the shared memory
    region for sharing the scheduling information.
- host kernel component: Implements the APIs for:
    - pvsched driver for register/unregister to the host kernel, and
    - hypervisor for assingning/unassigning driver for guests.
- guest component: Implements a framework for sharing the scheduling
    information with the pvsched driver through kvm.

There is another component that we refer to as pvsched protocol. This
defines the details about shared memory layout, information sharing and
sheduling policy decisions. The protocol need not be part of the kernel
and can be defined separately based on the use case and requirements.
Both guest and the selected pvsched driver need to match the protocol
for the feature to work. Protocol shall be identified by a name and a
possible versioning scheme. Guest will advertise the protocol and then
the hypervisor can assign the driver implementing the protocol if it is
registered in the host kernel.

This patch series only implements the first 3 components. Guest side
implementation and the protocol framework shall come as a separate
series once we finalize rest of the design.

This series also implements a sample bpf program and a kernel-builtin
pvsched drivers. They do not do any real stuff now, but just skeletons
to demonstrate the feature.

Rebased on 6.8.2.

[1]: https://lwn.net/Articles/955145/

Vineeth Pillai (Google) (5):
  pvsched: paravirt scheduling framework
  kvm: Implement the paravirt sched framework for kvm
  kvm: interface for managing pvsched driver for guest VMs
  pvsched: bpf support for pvsched
  selftests/bpf: sample implementation of a bpf pvsched driver.

 Kconfig                                       |   2 +
 arch/x86/kvm/Kconfig                          |  13 +
 arch/x86/kvm/x86.c                            |   3 +
 include/linux/kvm_host.h                      |  32 +++
 include/linux/pvsched.h                       | 102 +++++++
 include/uapi/linux/kvm.h                      |   6 +
 kernel/bpf/bpf_struct_ops_types.h             |   4 +
 kernel/sysctl.c                               |  27 ++
 .../testing/selftests/bpf/progs/bpf_pvsched.c |  37 +++
 virt/Makefile                                 |   2 +-
 virt/kvm/kvm_main.c                           | 265 ++++++++++++++++++
 virt/pvsched/Kconfig                          |  12 +
 virt/pvsched/Makefile                         |   2 +
 virt/pvsched/pvsched.c                        | 215 ++++++++++++++
 virt/pvsched/pvsched_bpf.c                    | 141 ++++++++++
 15 files changed, 862 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/pvsched.h
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_pvsched.c
 create mode 100644 virt/pvsched/Kconfig
 create mode 100644 virt/pvsched/Makefile
 create mode 100644 virt/pvsched/pvsched.c
 create mode 100644 virt/pvsched/pvsched_bpf.c

-- 
2.40.1


             reply	other threads:[~2024-04-03 14:01 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-03 14:01 Vineeth Pillai (Google) [this message]
2024-04-03 14:01 ` [RFC PATCH v2 1/5] pvsched: paravirt scheduling framework Vineeth Pillai (Google)
2024-04-08 13:57   ` Vineeth Remanan Pillai
2024-04-03 14:01 ` [RFC PATCH v2 2/5] kvm: Implement the paravirt sched framework for kvm Vineeth Pillai (Google)
2024-04-08 13:58   ` Vineeth Remanan Pillai
2024-04-03 14:01 ` [RFC PATCH v2 3/5] kvm: interface for managing pvsched driver for guest VMs Vineeth Pillai (Google)
2024-04-08 13:59   ` Vineeth Remanan Pillai
2024-04-03 14:01 ` [RFC PATCH v2 4/5] pvsched: bpf support for pvsched Vineeth Pillai (Google)
2024-04-08 14:00   ` Vineeth Remanan Pillai
2024-04-03 14:01 ` [RFC PATCH v2 5/5] selftests/bpf: sample implementation of a bpf pvsched driver Vineeth Pillai (Google)
2024-04-08 14:01   ` Vineeth Remanan Pillai
2024-04-08 13:54 ` [RFC PATCH v2 0/5] Paravirt Scheduling (Dynamic vcpu priority management) Vineeth Remanan Pillai
2024-05-01 15:29 ` Sean Christopherson
2024-05-02 13:42   ` Vineeth Remanan Pillai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240403140116.3002809-1-vineeth@bitbyteword.org \
    --to=vineeth@bitbyteword.org \
    --cc=bp@alien8.de \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=himadrics@inria.fr \
    --cc=hpa@zytor.com \
    --cc=joel@joelfernandes.org \
    --cc=juri.lelli@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mgorman@suse.de \
    --cc=mhiramat@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=seanjc@google.com \
    --cc=suleiman@google.com \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    --cc=vkuznets@redhat.com \
    --cc=vschneid@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).