KVM Archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] KVM: arm64: Move host-specific data out of kvm_vcpu_arch
@ 2024-03-02 11:19 Marc Zyngier
  2024-03-02 11:19 ` [PATCH 1/5] KVM: arm64: Add accessor for per-CPU state Marc Zyngier
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Marc Zyngier @ 2024-03-02 11:19 UTC (permalink / raw
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	James Clark, Anshuman Khandual, Mark Brown

It appears that over the years, we have accumulated a lot of cruft in
the kvm_vcpu_arch structure. Part of the gunk is data that is strictly
host CPU specific, and this result in two main problems:

- the structure itself is stupidly large, over 8kB. With the
  arch-agnostic kvm_vcpu, we're above 10kB, which is insane. This has
  some ripple effects, as we need physically contiguous allocation to
  be able to map it at EL2 for !VHE. There is more to it though, as
  some data structures, although per-vcpu, could be allocated
  separately.

- We lose track of the life-cycle of this data, because we're
  guaranteed that it will be around forever and we start relying on
  wrong assumptions. This is becoming a maintenance burden.

This series rectifies some of these things, starting with the two main
offenders: debug and FP, a lot of which gets pushed out to the per-CPU
host structure. Indeed, their lifetime really isn't that of the vcpu,
but tied to the physical CPU the vpcu runs on.

This results in a small reduction of the vcpu size, but mainly a much
clearer understanding of the life-cycle of these structures.

Patches against v6.8-rc6.

Marc Zyngier (5):
  KVM: arm64: Add accessor for per-CPU state
  KVM: arm64: Exclude host_debug_data from vcpu_arch
  KVM: arm64: Exclude mdcr_el2_host from kvm_vcpu_arch
  KVM: arm64: Exclude host_fpsimd_state pointer from kvm_vcpu_arch
  KVM: arm64: Exclude FP ownership from kvm_vcpu_arch

 arch/arm64/include/asm/kvm_emulate.h      |  4 +-
 arch/arm64/include/asm/kvm_host.h         | 65 ++++++++++++++---------
 arch/arm64/kvm/arm.c                      |  8 +--
 arch/arm64/kvm/fpsimd.c                   | 13 +++--
 arch/arm64/kvm/hyp/include/hyp/debug-sr.h |  8 +--
 arch/arm64/kvm/hyp/include/hyp/switch.h   | 23 ++++----
 arch/arm64/kvm/hyp/nvhe/debug-sr.c        |  8 +--
 arch/arm64/kvm/hyp/nvhe/hyp-main.c        |  3 --
 arch/arm64/kvm/hyp/nvhe/psci-relay.c      |  2 +-
 arch/arm64/kvm/hyp/nvhe/setup.c           |  3 +-
 arch/arm64/kvm/hyp/nvhe/switch.c          |  6 +--
 arch/arm64/kvm/hyp/vhe/switch.c           |  6 +--
 arch/arm64/kvm/hyp/vhe/sysreg-sr.c        |  4 +-
 arch/arm64/kvm/pmu.c                      |  2 +-
 14 files changed, 79 insertions(+), 76 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/5] KVM: arm64: Add accessor for per-CPU state
  2024-03-02 11:19 [PATCH 0/5] KVM: arm64: Move host-specific data out of kvm_vcpu_arch Marc Zyngier
@ 2024-03-02 11:19 ` Marc Zyngier
  2024-03-04 12:05   ` Suzuki K Poulose
  2024-03-11  4:50   ` Dongli Zhang
  2024-03-02 11:19 ` [PATCH 2/5] KVM: arm64: Exclude host_debug_data from vcpu_arch Marc Zyngier
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 18+ messages in thread
From: Marc Zyngier @ 2024-03-02 11:19 UTC (permalink / raw
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	James Clark, Anshuman Khandual, Mark Brown

In order to facilitate the introduction of new per-CPU state,
add a new host_data_ptr() helped that hides some of the per-CPU
verbosity, and make it easier to move that state around in the
future.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h         | 13 +++++++++++++
 arch/arm64/kvm/arm.c                      |  2 +-
 arch/arm64/kvm/hyp/include/hyp/debug-sr.h |  4 ++--
 arch/arm64/kvm/hyp/include/hyp/switch.h   | 11 +++++------
 arch/arm64/kvm/hyp/nvhe/psci-relay.c      |  2 +-
 arch/arm64/kvm/hyp/nvhe/setup.c           |  3 +--
 arch/arm64/kvm/hyp/nvhe/switch.c          |  4 ++--
 arch/arm64/kvm/hyp/vhe/switch.c           |  4 ++--
 arch/arm64/kvm/hyp/vhe/sysreg-sr.c        |  4 ++--
 arch/arm64/kvm/pmu.c                      |  2 +-
 10 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 21c57b812569..3ca2a9444f21 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -492,6 +492,17 @@ struct kvm_cpu_context {
 	u64 *vncr_array;
 };
 
+/*
+ * This structure is instanciated on a per-CPU basis, and contains
+ * data that is:
+ *
+ * - tied to a single physical CPU, and
+ * - either have a lifetime that does not extend past vcpu_put()
+ * - or is an invariant for the lifetime of the system
+ *
+ * Use host_data_ptr(field) as a way to access a pointer to such a
+ * field.
+ */
 struct kvm_host_data {
 	struct kvm_cpu_context host_ctxt;
 };
@@ -1114,6 +1125,8 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
 
 DECLARE_KVM_HYP_PER_CPU(struct kvm_host_data, kvm_host_data);
 
+#define host_data_ptr(f)	(&this_cpu_ptr(&kvm_host_data)->f)
+
 static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt)
 {
 	/* The host's MPIDR is immutable, so let's set it up at boot time */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a25265aca432..d3d14cc41cb5 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1960,7 +1960,7 @@ static void cpu_set_hyp_vector(void)
 
 static void cpu_hyp_init_context(void)
 {
-	kvm_init_host_cpu_context(&this_cpu_ptr_hyp_sym(kvm_host_data)->host_ctxt);
+	kvm_init_host_cpu_context(host_data_ptr(host_ctxt));
 
 	if (!is_kernel_in_hyp_mode())
 		cpu_init_hyp_mode();
diff --git a/arch/arm64/kvm/hyp/include/hyp/debug-sr.h b/arch/arm64/kvm/hyp/include/hyp/debug-sr.h
index 961bbef104a6..eec0f8ccda56 100644
--- a/arch/arm64/kvm/hyp/include/hyp/debug-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/debug-sr.h
@@ -135,7 +135,7 @@ static inline void __debug_switch_to_guest_common(struct kvm_vcpu *vcpu)
 	if (!vcpu_get_flag(vcpu, DEBUG_DIRTY))
 		return;
 
-	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	host_ctxt = host_data_ptr(host_ctxt);
 	guest_ctxt = &vcpu->arch.ctxt;
 	host_dbg = &vcpu->arch.host_debug_state.regs;
 	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
@@ -154,7 +154,7 @@ static inline void __debug_switch_to_host_common(struct kvm_vcpu *vcpu)
 	if (!vcpu_get_flag(vcpu, DEBUG_DIRTY))
 		return;
 
-	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	host_ctxt = host_data_ptr(host_ctxt);
 	guest_ctxt = &vcpu->arch.ctxt;
 	host_dbg = &vcpu->arch.host_debug_state.regs;
 	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index a038320cdb08..9405a0c9b4c3 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -81,8 +81,7 @@ static inline void __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
 
 #define update_fgt_traps_cs(vcpu, reg, clr, set)			\
 	do {								\
-		struct kvm_cpu_context *hctxt =				\
-			&this_cpu_ptr(&kvm_host_data)->host_ctxt;	\
+		struct kvm_cpu_context *hctxt =	host_data_ptr(host_ctxt);\
 		u64 c = 0, s = 0;					\
 									\
 		ctxt_sys_reg(hctxt, reg) = read_sysreg_s(SYS_ ## reg);	\
@@ -121,7 +120,7 @@ static inline bool cpu_has_amu(void)
 
 static inline void __activate_traps_hfgxtr(struct kvm_vcpu *vcpu)
 {
-	struct kvm_cpu_context *hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt);
 	u64 r_clr = 0, w_clr = 0, r_set = 0, w_set = 0, tmp;
 	u64 r_val, w_val;
 
@@ -185,7 +184,7 @@ static inline void __activate_traps_hfgxtr(struct kvm_vcpu *vcpu)
 
 static inline void __deactivate_traps_hfgxtr(struct kvm_vcpu *vcpu)
 {
-	struct kvm_cpu_context *hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt);
 
 	if (!cpus_have_final_cap(ARM64_HAS_FGT))
 		return;
@@ -220,7 +219,7 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
 
 		write_sysreg(0, pmselr_el0);
 
-		hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+		hctxt = host_data_ptr(host_ctxt);
 		ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0);
 		write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
 		vcpu_set_flag(vcpu, PMUSERENR_ON_CPU);
@@ -254,7 +253,7 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
 	if (kvm_arm_support_pmu_v3()) {
 		struct kvm_cpu_context *hctxt;
 
-		hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+		hctxt = host_data_ptr(host_ctxt);
 		write_sysreg(ctxt_sys_reg(hctxt, PMUSERENR_EL0), pmuserenr_el0);
 		vcpu_clear_flag(vcpu, PMUSERENR_ON_CPU);
 	}
diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
index d57bcb6ab94d..dfe8fe0f7eaf 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
@@ -205,7 +205,7 @@ asmlinkage void __noreturn __kvm_host_psci_cpu_entry(bool is_cpu_on)
 	struct psci_boot_args *boot_args;
 	struct kvm_cpu_context *host_ctxt;
 
-	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	host_ctxt = host_data_ptr(host_ctxt);
 
 	if (is_cpu_on)
 		boot_args = this_cpu_ptr(&cpu_on_args);
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index bc58d1b515af..ae00dfa80801 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -257,8 +257,7 @@ static int fix_hyp_pgtable_refcnt(void)
 
 void __noreturn __pkvm_init_finalise(void)
 {
-	struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
-	struct kvm_cpu_context *host_ctxt = &host_data->host_ctxt;
+	struct kvm_cpu_context *host_ctxt = host_data_ptr(host_ctxt);
 	unsigned long nr_pages, reserved_pages, pfn;
 	int ret;
 
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index c50f8459e4fc..544a419b9a39 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -264,7 +264,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 		pmr_sync();
 	}
 
-	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	host_ctxt = host_data_ptr(host_ctxt);
 	host_ctxt->__hyp_running_vcpu = vcpu;
 	guest_ctxt = &vcpu->arch.ctxt;
 
@@ -367,7 +367,7 @@ asmlinkage void __noreturn hyp_panic(void)
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_vcpu *vcpu;
 
-	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	host_ctxt = host_data_ptr(host_ctxt);
 	vcpu = host_ctxt->__hyp_running_vcpu;
 
 	if (vcpu) {
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 1581df6aec87..14b7a6bc5909 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -221,7 +221,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 	struct kvm_cpu_context *guest_ctxt;
 	u64 exit_code;
 
-	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	host_ctxt = host_data_ptr(host_ctxt);
 	host_ctxt->__hyp_running_vcpu = vcpu;
 	guest_ctxt = &vcpu->arch.ctxt;
 
@@ -306,7 +306,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, u64 par)
 	struct kvm_cpu_context *host_ctxt;
 	struct kvm_vcpu *vcpu;
 
-	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	host_ctxt = host_data_ptr(host_ctxt);
 	vcpu = host_ctxt->__hyp_running_vcpu;
 
 	__deactivate_traps(vcpu);
diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
index 8e1e0d5033b6..ae763061909a 100644
--- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
+++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
@@ -67,7 +67,7 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
 	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
 	struct kvm_cpu_context *host_ctxt;
 
-	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	host_ctxt = host_data_ptr(host_ctxt);
 	__sysreg_save_user_state(host_ctxt);
 
 	/*
@@ -110,7 +110,7 @@ void __vcpu_put_switch_sysregs(struct kvm_vcpu *vcpu)
 	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
 	struct kvm_cpu_context *host_ctxt;
 
-	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	host_ctxt = host_data_ptr(host_ctxt);
 
 	__sysreg_save_el1_state(guest_ctxt);
 	__sysreg_save_user_state(guest_ctxt);
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index a243934c5568..329819806096 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -232,7 +232,7 @@ bool kvm_set_pmuserenr(u64 val)
 	if (!vcpu || !vcpu_get_flag(vcpu, PMUSERENR_ON_CPU))
 		return false;
 
-	hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	hctxt = host_data_ptr(host_ctxt);
 	ctxt_sys_reg(hctxt, PMUSERENR_EL0) = val;
 	return true;
 }
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/5] KVM: arm64: Exclude host_debug_data from vcpu_arch
  2024-03-02 11:19 [PATCH 0/5] KVM: arm64: Move host-specific data out of kvm_vcpu_arch Marc Zyngier
  2024-03-02 11:19 ` [PATCH 1/5] KVM: arm64: Add accessor for per-CPU state Marc Zyngier
@ 2024-03-02 11:19 ` Marc Zyngier
  2024-03-02 11:19 ` [PATCH 3/5] KVM: arm64: Exclude mdcr_el2_host from kvm_vcpu_arch Marc Zyngier
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Marc Zyngier @ 2024-03-02 11:19 UTC (permalink / raw
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	James Clark, Anshuman Khandual, Mark Brown

Keeping host_debug_state on a per-vcpu basis is completely
pointless. The lifetime of this data is only that of the inner
run-loop, which means it is never accessed outside of the core
EL2 code.

Move the structure into kvm_host_data, and save over 500 bytes
per vcpu.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h         | 31 +++++++++++++----------
 arch/arm64/kvm/hyp/include/hyp/debug-sr.h |  4 +--
 arch/arm64/kvm/hyp/nvhe/debug-sr.c        |  8 +++---
 3 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3ca2a9444f21..90ea5524c545 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -505,6 +505,19 @@ struct kvm_cpu_context {
  */
 struct kvm_host_data {
 	struct kvm_cpu_context host_ctxt;
+
+	/*
+	 * host_debug_state contains the host registers which are
+	 * saved and restored during world switches.
+	 */
+	 struct {
+		/* {Break,watch}point registers */
+		struct kvm_guest_debug_arch regs;
+		/* Statistical profiling extension */
+		u64 pmscr_el1;
+		/* Self-hosted trace */
+		u64 trfcr_el1;
+	} host_debug_state;
 };
 
 struct kvm_host_psci_config {
@@ -598,11 +611,10 @@ struct kvm_vcpu_arch {
 	 * We maintain more than a single set of debug registers to support
 	 * debugging the guest from the host and to maintain separate host and
 	 * guest state during world switches. vcpu_debug_state are the debug
-	 * registers of the vcpu as the guest sees them.  host_debug_state are
-	 * the host registers which are saved and restored during
-	 * world switches. external_debug_state contains the debug
-	 * values we want to debug the guest. This is set via the
-	 * KVM_SET_GUEST_DEBUG ioctl.
+	 * registers of the vcpu as the guest sees them.
+	 *
+	 * external_debug_state contains the debug values we want to debug the
+	 * guest. This is set via the KVM_SET_GUEST_DEBUG ioctl.
 	 *
 	 * debug_ptr points to the set of debug registers that should be loaded
 	 * onto the hardware when running the guest.
@@ -614,15 +626,6 @@ struct kvm_vcpu_arch {
 	struct user_fpsimd_state *host_fpsimd_state;	/* hyp VA */
 	struct task_struct *parent_task;
 
-	struct {
-		/* {Break,watch}point registers */
-		struct kvm_guest_debug_arch regs;
-		/* Statistical profiling extension */
-		u64 pmscr_el1;
-		/* Self-hosted trace */
-		u64 trfcr_el1;
-	} host_debug_state;
-
 	/* VGIC state */
 	struct vgic_cpu vgic_cpu;
 	struct arch_timer_cpu timer_cpu;
diff --git a/arch/arm64/kvm/hyp/include/hyp/debug-sr.h b/arch/arm64/kvm/hyp/include/hyp/debug-sr.h
index eec0f8ccda56..d00093699aaf 100644
--- a/arch/arm64/kvm/hyp/include/hyp/debug-sr.h
+++ b/arch/arm64/kvm/hyp/include/hyp/debug-sr.h
@@ -137,7 +137,7 @@ static inline void __debug_switch_to_guest_common(struct kvm_vcpu *vcpu)
 
 	host_ctxt = host_data_ptr(host_ctxt);
 	guest_ctxt = &vcpu->arch.ctxt;
-	host_dbg = &vcpu->arch.host_debug_state.regs;
+	host_dbg = host_data_ptr(host_debug_state.regs);
 	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
 
 	__debug_save_state(host_dbg, host_ctxt);
@@ -156,7 +156,7 @@ static inline void __debug_switch_to_host_common(struct kvm_vcpu *vcpu)
 
 	host_ctxt = host_data_ptr(host_ctxt);
 	guest_ctxt = &vcpu->arch.ctxt;
-	host_dbg = &vcpu->arch.host_debug_state.regs;
+	host_dbg = host_data_ptr(host_debug_state.regs);
 	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
 
 	__debug_save_state(guest_dbg, guest_ctxt);
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 4558c02eb352..d89eaeae1c2a 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -83,10 +83,10 @@ void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
 {
 	/* Disable and flush SPE data generation */
 	if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_SPE))
-		__debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
+		__debug_save_spe(host_data_ptr(host_debug_state.pmscr_el1));
 	/* Disable and flush Self-Hosted Trace generation */
 	if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
-		__debug_save_trace(&vcpu->arch.host_debug_state.trfcr_el1);
+		__debug_save_trace(host_data_ptr(host_debug_state.trfcr_el1));
 }
 
 void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
@@ -97,9 +97,9 @@ void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
 void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
 {
 	if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_SPE))
-		__debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
+		__debug_restore_spe(*host_data_ptr(host_debug_state.pmscr_el1));
 	if (vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRBE))
-		__debug_restore_trace(vcpu->arch.host_debug_state.trfcr_el1);
+		__debug_restore_trace(*host_data_ptr(host_debug_state.trfcr_el1));
 }
 
 void __debug_switch_to_host(struct kvm_vcpu *vcpu)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/5] KVM: arm64: Exclude mdcr_el2_host from kvm_vcpu_arch
  2024-03-02 11:19 [PATCH 0/5] KVM: arm64: Move host-specific data out of kvm_vcpu_arch Marc Zyngier
  2024-03-02 11:19 ` [PATCH 1/5] KVM: arm64: Add accessor for per-CPU state Marc Zyngier
  2024-03-02 11:19 ` [PATCH 2/5] KVM: arm64: Exclude host_debug_data from vcpu_arch Marc Zyngier
@ 2024-03-02 11:19 ` Marc Zyngier
  2024-03-02 11:19 ` [PATCH 4/5] KVM: arm64: Exclude host_fpsimd_state pointer " Marc Zyngier
  2024-03-02 11:19 ` [PATCH 5/5] KVM: arm64: Exclude FP ownership " Marc Zyngier
  4 siblings, 0 replies; 18+ messages in thread
From: Marc Zyngier @ 2024-03-02 11:19 UTC (permalink / raw
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	James Clark, Anshuman Khandual, Mark Brown

As for the rest of the host debug state, the host copy of mdcr_el2
has little to do in the vcpu, and is better placed in the host_data
structure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h       | 5 ++---
 arch/arm64/kvm/hyp/include/hyp/switch.h | 4 ++--
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 90ea5524c545..a3718f441e12 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -517,6 +517,8 @@ struct kvm_host_data {
 		u64 pmscr_el1;
 		/* Self-hosted trace */
 		u64 trfcr_el1;
+		/* Values of trap registers for the host before guest entry. */
+		u64 mdcr_el2;
 	} host_debug_state;
 };
 
@@ -576,9 +578,6 @@ struct kvm_vcpu_arch {
 	u64 mdcr_el2;
 	u64 cptr_el2;
 
-	/* Values of trap registers for the host before guest entry. */
-	u64 mdcr_el2_host;
-
 	/* Exception Information */
 	struct kvm_vcpu_fault_info fault;
 
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 9405a0c9b4c3..8ae81301083f 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -225,7 +225,7 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
 		vcpu_set_flag(vcpu, PMUSERENR_ON_CPU);
 	}
 
-	vcpu->arch.mdcr_el2_host = read_sysreg(mdcr_el2);
+	*host_data_ptr(host_debug_state.mdcr_el2) = read_sysreg(mdcr_el2);
 	write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
 
 	if (cpus_have_final_cap(ARM64_HAS_HCX)) {
@@ -247,7 +247,7 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
 
 static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
 {
-	write_sysreg(vcpu->arch.mdcr_el2_host, mdcr_el2);
+	write_sysreg(*host_data_ptr(host_debug_state.mdcr_el2), mdcr_el2);
 
 	write_sysreg(0, hstr_el2);
 	if (kvm_arm_support_pmu_v3()) {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/5] KVM: arm64: Exclude host_fpsimd_state pointer from kvm_vcpu_arch
  2024-03-02 11:19 [PATCH 0/5] KVM: arm64: Move host-specific data out of kvm_vcpu_arch Marc Zyngier
                   ` (2 preceding siblings ...)
  2024-03-02 11:19 ` [PATCH 3/5] KVM: arm64: Exclude mdcr_el2_host from kvm_vcpu_arch Marc Zyngier
@ 2024-03-02 11:19 ` Marc Zyngier
  2024-03-04 20:45   ` Mark Brown
  2024-03-02 11:19 ` [PATCH 5/5] KVM: arm64: Exclude FP ownership " Marc Zyngier
  4 siblings, 1 reply; 18+ messages in thread
From: Marc Zyngier @ 2024-03-02 11:19 UTC (permalink / raw
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	James Clark, Anshuman Khandual, Mark Brown

As the name of the field indicates, host_fpsimd_state is strictly
a host piece of data, and we reset this pointer on each PID change.

So let's move it where it belongs, and set it at load-time. Although
this is slightly more often, it is a well defined life-cycle which
matches other pieces of data.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h       | 2 +-
 arch/arm64/kvm/fpsimd.c                 | 3 +--
 arch/arm64/kvm/hyp/include/hyp/switch.h | 2 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c      | 1 -
 4 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a3718f441e12..39b39da3e61b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -505,6 +505,7 @@ struct kvm_cpu_context {
  */
 struct kvm_host_data {
 	struct kvm_cpu_context host_ctxt;
+	struct user_fpsimd_state *fpsimd_state;	/* hyp VA */
 
 	/*
 	 * host_debug_state contains the host registers which are
@@ -622,7 +623,6 @@ struct kvm_vcpu_arch {
 	struct kvm_guest_debug_arch vcpu_debug_state;
 	struct kvm_guest_debug_arch external_debug_state;
 
-	struct user_fpsimd_state *host_fpsimd_state;	/* hyp VA */
 	struct task_struct *parent_task;
 
 	/* VGIC state */
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 8c1d0d4853df..f650e46d4bea 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -49,8 +49,6 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
 	if (ret)
 		return ret;
 
-	vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
-
 	/*
 	 * We need to keep current's task_struct pinned until its data has been
 	 * unshared with the hypervisor to make sure it is not re-used by the
@@ -87,6 +85,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 	 * FP_STATE_FREE if the flag set.
 	 */
 	vcpu->arch.fp_state = FP_STATE_HOST_OWNED;
+	*host_data_ptr(fpsimd_state) = kern_hyp_va(&current->thread.uw.fpsimd_state);
 
 	vcpu_clear_flag(vcpu, HOST_SVE_ENABLED);
 	if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN)
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 8ae81301083f..f67d8eafc245 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -370,7 +370,7 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
 
 	/* Write out the host state if it's in the registers */
 	if (vcpu->arch.fp_state == FP_STATE_HOST_OWNED)
-		__fpsimd_save_state(vcpu->arch.host_fpsimd_state);
+		__fpsimd_save_state(*host_data_ptr(fpsimd_state));
 
 	/* Restore the guest state */
 	if (sve_guest)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 2385fd03ed87..c5f625dc1f07 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -42,7 +42,6 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 	hyp_vcpu->vcpu.arch.fp_state	= host_vcpu->arch.fp_state;
 
 	hyp_vcpu->vcpu.arch.debug_ptr	= kern_hyp_va(host_vcpu->arch.debug_ptr);
-	hyp_vcpu->vcpu.arch.host_fpsimd_state = host_vcpu->arch.host_fpsimd_state;
 
 	hyp_vcpu->vcpu.arch.vsesr_el2	= host_vcpu->arch.vsesr_el2;
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/5] KVM: arm64: Exclude FP ownership from kvm_vcpu_arch
  2024-03-02 11:19 [PATCH 0/5] KVM: arm64: Move host-specific data out of kvm_vcpu_arch Marc Zyngier
                   ` (3 preceding siblings ...)
  2024-03-02 11:19 ` [PATCH 4/5] KVM: arm64: Exclude host_fpsimd_state pointer " Marc Zyngier
@ 2024-03-02 11:19 ` Marc Zyngier
  2024-03-04 19:10   ` Mark Brown
  4 siblings, 1 reply; 18+ messages in thread
From: Marc Zyngier @ 2024-03-02 11:19 UTC (permalink / raw
  To: kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	James Clark, Anshuman Khandual, Mark Brown

In retrospect, it is fairly obvious that the FP state ownership
is only meaningful for a given CPU, and that locating this
information in the vcpu was just a mistake.

Move the ownership tracking into the host data structure, and
rename it from fp_state to fp_owner, which is a better description
(name suggested by Mark Brown).

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h    |  4 ++--
 arch/arm64/include/asm/kvm_host.h       | 14 +++++++-------
 arch/arm64/kvm/arm.c                    |  6 ------
 arch/arm64/kvm/fpsimd.c                 | 10 +++++-----
 arch/arm64/kvm/hyp/include/hyp/switch.h |  6 +++---
 arch/arm64/kvm/hyp/nvhe/hyp-main.c      |  2 --
 arch/arm64/kvm/hyp/nvhe/switch.c        |  2 +-
 arch/arm64/kvm/hyp/vhe/switch.c         |  2 +-
 8 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index b804fe832184..9cd418d2bee0 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -593,7 +593,7 @@ static __always_inline u64 kvm_get_reset_cptr_el2(struct kvm_vcpu *vcpu)
 		val = (CPACR_EL1_FPEN_EL0EN | CPACR_EL1_FPEN_EL1EN);
 
 		if (!vcpu_has_sve(vcpu) ||
-		    (vcpu->arch.fp_state != FP_STATE_GUEST_OWNED))
+		    (*host_data_ptr(fp_owner) != FP_STATE_GUEST_OWNED))
 			val |= CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN;
 		if (cpus_have_final_cap(ARM64_SME))
 			val |= CPACR_EL1_SMEN_EL1EN | CPACR_EL1_SMEN_EL0EN;
@@ -601,7 +601,7 @@ static __always_inline u64 kvm_get_reset_cptr_el2(struct kvm_vcpu *vcpu)
 		val = CPTR_NVHE_EL2_RES1;
 
 		if (vcpu_has_sve(vcpu) &&
-		    (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED))
+		    (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED))
 			val |= CPTR_EL2_TZ;
 		if (cpus_have_final_cap(ARM64_SME))
 			val &= ~CPTR_EL2_TSM;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 39b39da3e61b..e72e25702b9a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -507,6 +507,13 @@ struct kvm_host_data {
 	struct kvm_cpu_context host_ctxt;
 	struct user_fpsimd_state *fpsimd_state;	/* hyp VA */
 
+	/* Ownership of the FP regs */
+	enum {
+		FP_STATE_FREE,
+		FP_STATE_HOST_OWNED,
+		FP_STATE_GUEST_OWNED,
+	} fp_owner;
+
 	/*
 	 * host_debug_state contains the host registers which are
 	 * saved and restored during world switches.
@@ -582,13 +589,6 @@ struct kvm_vcpu_arch {
 	/* Exception Information */
 	struct kvm_vcpu_fault_info fault;
 
-	/* Ownership of the FP regs */
-	enum {
-		FP_STATE_FREE,
-		FP_STATE_HOST_OWNED,
-		FP_STATE_GUEST_OWNED,
-	} fp_state;
-
 	/* Configuration flags, set once and for all before the vcpu can run */
 	u8 cflags;
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index d3d14cc41cb5..0c9720c7462e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -373,12 +373,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
 
-	/*
-	 * Default value for the FP state, will be overloaded at load
-	 * time if we support FP (pretty likely)
-	 */
-	vcpu->arch.fp_state = FP_STATE_FREE;
-
 	/* Set up the timer */
 	kvm_timer_vcpu_init(vcpu);
 
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index f650e46d4bea..4f51293db173 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -84,7 +84,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 	 * guest in kvm_arch_vcpu_ctxflush_fp() and override this to
 	 * FP_STATE_FREE if the flag set.
 	 */
-	vcpu->arch.fp_state = FP_STATE_HOST_OWNED;
+	*host_data_ptr(fp_owner) = FP_STATE_HOST_OWNED;
 	*host_data_ptr(fpsimd_state) = kern_hyp_va(&current->thread.uw.fpsimd_state);
 
 	vcpu_clear_flag(vcpu, HOST_SVE_ENABLED);
@@ -109,7 +109,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 		 * been saved, this is very unlikely to happen.
 		 */
 		if (read_sysreg_s(SYS_SVCR) & (SVCR_SM_MASK | SVCR_ZA_MASK)) {
-			vcpu->arch.fp_state = FP_STATE_FREE;
+			*host_data_ptr(fp_owner) = FP_STATE_FREE;
 			fpsimd_save_and_flush_cpu_state();
 		}
 	}
@@ -125,7 +125,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
 void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu)
 {
 	if (test_thread_flag(TIF_FOREIGN_FPSTATE))
-		vcpu->arch.fp_state = FP_STATE_FREE;
+		*host_data_ptr(fp_owner) = FP_STATE_FREE;
 }
 
 /*
@@ -141,7 +141,7 @@ void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu)
 
 	WARN_ON_ONCE(!irqs_disabled());
 
-	if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) {
+	if (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED) {
 
 		/*
 		 * Currently we do not support SME guests so SVCR is
@@ -194,7 +194,7 @@ void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu)
 		isb();
 	}
 
-	if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED) {
+	if (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED) {
 		if (vcpu_has_sve(vcpu)) {
 			__vcpu_sys_reg(vcpu, ZCR_EL1) = read_sysreg_el1(SYS_ZCR);
 
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index f67d8eafc245..e124c4686376 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -42,7 +42,7 @@ extern struct kvm_exception_table_entry __stop___kvm_ex_table;
 /* Check whether the FP regs are owned by the guest */
 static inline bool guest_owns_fp_regs(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.fp_state == FP_STATE_GUEST_OWNED;
+	return *host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED;
 }
 
 /* Save the 32-bit only FPSIMD system register state */
@@ -369,7 +369,7 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
 	isb();
 
 	/* Write out the host state if it's in the registers */
-	if (vcpu->arch.fp_state == FP_STATE_HOST_OWNED)
+	if (*host_data_ptr(fp_owner) == FP_STATE_HOST_OWNED)
 		__fpsimd_save_state(*host_data_ptr(fpsimd_state));
 
 	/* Restore the guest state */
@@ -382,7 +382,7 @@ static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
 	if (!(read_sysreg(hcr_el2) & HCR_RW))
 		write_sysreg(__vcpu_sys_reg(vcpu, FPEXC32_EL2), fpexc32_el2);
 
-	vcpu->arch.fp_state = FP_STATE_GUEST_OWNED;
+	*host_data_ptr(fp_owner) = FP_STATE_GUEST_OWNED;
 
 	return true;
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index c5f625dc1f07..26561c562f7a 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -39,7 +39,6 @@ static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 	hyp_vcpu->vcpu.arch.cptr_el2	= host_vcpu->arch.cptr_el2;
 
 	hyp_vcpu->vcpu.arch.iflags	= host_vcpu->arch.iflags;
-	hyp_vcpu->vcpu.arch.fp_state	= host_vcpu->arch.fp_state;
 
 	hyp_vcpu->vcpu.arch.debug_ptr	= kern_hyp_va(host_vcpu->arch.debug_ptr);
 
@@ -63,7 +62,6 @@ static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
 	host_vcpu->arch.fault		= hyp_vcpu->vcpu.arch.fault;
 
 	host_vcpu->arch.iflags		= hyp_vcpu->vcpu.arch.iflags;
-	host_vcpu->arch.fp_state	= hyp_vcpu->vcpu.arch.fp_state;
 
 	host_cpu_if->vgic_hcr		= hyp_cpu_if->vgic_hcr;
 	for (i = 0; i < hyp_cpu_if->used_lrs; ++i)
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 544a419b9a39..1f82d531a494 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -337,7 +337,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	__sysreg_restore_state_nvhe(host_ctxt);
 
-	if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED)
+	if (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED)
 		__fpsimd_save_fpexc32(vcpu);
 
 	__debug_switch_to_host(vcpu);
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 14b7a6bc5909..b92f9fe2d50e 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -258,7 +258,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 
 	sysreg_restore_host_state_vhe(host_ctxt);
 
-	if (vcpu->arch.fp_state == FP_STATE_GUEST_OWNED)
+	if (*host_data_ptr(fp_owner) == FP_STATE_GUEST_OWNED)
 		__fpsimd_save_fpexc32(vcpu);
 
 	__debug_switch_to_host(vcpu);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] KVM: arm64: Add accessor for per-CPU state
  2024-03-02 11:19 ` [PATCH 1/5] KVM: arm64: Add accessor for per-CPU state Marc Zyngier
@ 2024-03-04 12:05   ` Suzuki K Poulose
  2024-03-09 13:00     ` Marc Zyngier
  2024-03-11  4:50   ` Dongli Zhang
  1 sibling, 1 reply; 18+ messages in thread
From: Suzuki K Poulose @ 2024-03-04 12:05 UTC (permalink / raw
  To: Marc Zyngier, kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Oliver Upton, Zenghui Yu, James Clark,
	Anshuman Khandual, Mark Brown

Hi Marc

On 02/03/2024 11:19, Marc Zyngier wrote:
> In order to facilitate the introduction of new per-CPU state,
> add a new host_data_ptr() helped that hides some of the per-CPU
> verbosity, and make it easier to move that state around in the
> future.
> 

This series looks like a good cleanup to make the whole host
data handling cleaner. One comment below.

> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>   arch/arm64/include/asm/kvm_host.h         | 13 +++++++++++++
>   arch/arm64/kvm/arm.c                      |  2 +-
>   arch/arm64/kvm/hyp/include/hyp/debug-sr.h |  4 ++--
>   arch/arm64/kvm/hyp/include/hyp/switch.h   | 11 +++++------
>   arch/arm64/kvm/hyp/nvhe/psci-relay.c      |  2 +-
>   arch/arm64/kvm/hyp/nvhe/setup.c           |  3 +--
>   arch/arm64/kvm/hyp/nvhe/switch.c          |  4 ++--
>   arch/arm64/kvm/hyp/vhe/switch.c           |  4 ++--
>   arch/arm64/kvm/hyp/vhe/sysreg-sr.c        |  4 ++--
>   arch/arm64/kvm/pmu.c                      |  2 +-
>   10 files changed, 30 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 21c57b812569..3ca2a9444f21 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -492,6 +492,17 @@ struct kvm_cpu_context {
>   	u64 *vncr_array;
>   };
>   
> +/*
> + * This structure is instanciated on a per-CPU basis, and contains
> + * data that is:
> + *
> + * - tied to a single physical CPU, and
> + * - either have a lifetime that does not extend past vcpu_put()
> + * - or is an invariant for the lifetime of the system
> + *
> + * Use host_data_ptr(field) as a way to access a pointer to such a
> + * field.
> + */
>   struct kvm_host_data {
>   	struct kvm_cpu_context host_ctxt;
>   };
> @@ -1114,6 +1125,8 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>   
>   DECLARE_KVM_HYP_PER_CPU(struct kvm_host_data, kvm_host_data);
>   
> +#define host_data_ptr(f)	(&this_cpu_ptr(&kvm_host_data)->f)
> +
>   static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt)
>   {
>   	/* The host's MPIDR is immutable, so let's set it up at boot time */
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index a25265aca432..d3d14cc41cb5 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1960,7 +1960,7 @@ static void cpu_set_hyp_vector(void)
>   
>   static void cpu_hyp_init_context(void)
>   {
> -	kvm_init_host_cpu_context(&this_cpu_ptr_hyp_sym(kvm_host_data)->host_ctxt);
> +	kvm_init_host_cpu_context(host_data_ptr(host_ctxt));

This silently changes the "this_cpu_ptr_hyp_sym" to "this_cpu_ptr()" and 
thus we could be using the VHE host_data even in nVHE ?
Rest looks fine to me.

Suzuki


>   
>   	if (!is_kernel_in_hyp_mode())
>   		cpu_init_hyp_mode();
> diff --git a/arch/arm64/kvm/hyp/include/hyp/debug-sr.h b/arch/arm64/kvm/hyp/include/hyp/debug-sr.h
> index 961bbef104a6..eec0f8ccda56 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/debug-sr.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/debug-sr.h
> @@ -135,7 +135,7 @@ static inline void __debug_switch_to_guest_common(struct kvm_vcpu *vcpu)
>   	if (!vcpu_get_flag(vcpu, DEBUG_DIRTY))
>   		return;
>   
> -	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +	host_ctxt = host_data_ptr(host_ctxt);
>   	guest_ctxt = &vcpu->arch.ctxt;
>   	host_dbg = &vcpu->arch.host_debug_state.regs;
>   	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> @@ -154,7 +154,7 @@ static inline void __debug_switch_to_host_common(struct kvm_vcpu *vcpu)
>   	if (!vcpu_get_flag(vcpu, DEBUG_DIRTY))
>   		return;
>   
> -	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +	host_ctxt = host_data_ptr(host_ctxt);
>   	guest_ctxt = &vcpu->arch.ctxt;
>   	host_dbg = &vcpu->arch.host_debug_state.regs;
>   	guest_dbg = kern_hyp_va(vcpu->arch.debug_ptr);
> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index a038320cdb08..9405a0c9b4c3 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -81,8 +81,7 @@ static inline void __activate_traps_fpsimd32(struct kvm_vcpu *vcpu)
>   
>   #define update_fgt_traps_cs(vcpu, reg, clr, set)			\
>   	do {								\
> -		struct kvm_cpu_context *hctxt =				\
> -			&this_cpu_ptr(&kvm_host_data)->host_ctxt;	\
> +		struct kvm_cpu_context *hctxt =	host_data_ptr(host_ctxt);\
>   		u64 c = 0, s = 0;					\
>   									\
>   		ctxt_sys_reg(hctxt, reg) = read_sysreg_s(SYS_ ## reg);	\
> @@ -121,7 +120,7 @@ static inline bool cpu_has_amu(void)
>   
>   static inline void __activate_traps_hfgxtr(struct kvm_vcpu *vcpu)
>   {
> -	struct kvm_cpu_context *hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +	struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt);
>   	u64 r_clr = 0, w_clr = 0, r_set = 0, w_set = 0, tmp;
>   	u64 r_val, w_val;
>   
> @@ -185,7 +184,7 @@ static inline void __activate_traps_hfgxtr(struct kvm_vcpu *vcpu)
>   
>   static inline void __deactivate_traps_hfgxtr(struct kvm_vcpu *vcpu)
>   {
> -	struct kvm_cpu_context *hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +	struct kvm_cpu_context *hctxt = host_data_ptr(host_ctxt);
>   
>   	if (!cpus_have_final_cap(ARM64_HAS_FGT))
>   		return;
> @@ -220,7 +219,7 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
>   
>   		write_sysreg(0, pmselr_el0);
>   
> -		hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +		hctxt = host_data_ptr(host_ctxt);
>   		ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0);
>   		write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
>   		vcpu_set_flag(vcpu, PMUSERENR_ON_CPU);
> @@ -254,7 +253,7 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
>   	if (kvm_arm_support_pmu_v3()) {
>   		struct kvm_cpu_context *hctxt;
>   
> -		hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +		hctxt = host_data_ptr(host_ctxt);
>   		write_sysreg(ctxt_sys_reg(hctxt, PMUSERENR_EL0), pmuserenr_el0);
>   		vcpu_clear_flag(vcpu, PMUSERENR_ON_CPU);
>   	}
> diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> index d57bcb6ab94d..dfe8fe0f7eaf 100644
> --- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> +++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> @@ -205,7 +205,7 @@ asmlinkage void __noreturn __kvm_host_psci_cpu_entry(bool is_cpu_on)
>   	struct psci_boot_args *boot_args;
>   	struct kvm_cpu_context *host_ctxt;
>   
> -	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +	host_ctxt = host_data_ptr(host_ctxt);
>   
>   	if (is_cpu_on)
>   		boot_args = this_cpu_ptr(&cpu_on_args);
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index bc58d1b515af..ae00dfa80801 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -257,8 +257,7 @@ static int fix_hyp_pgtable_refcnt(void)
>   
>   void __noreturn __pkvm_init_finalise(void)
>   {
> -	struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
> -	struct kvm_cpu_context *host_ctxt = &host_data->host_ctxt;
> +	struct kvm_cpu_context *host_ctxt = host_data_ptr(host_ctxt);
>   	unsigned long nr_pages, reserved_pages, pfn;
>   	int ret;
>   
> diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
> index c50f8459e4fc..544a419b9a39 100644
> --- a/arch/arm64/kvm/hyp/nvhe/switch.c
> +++ b/arch/arm64/kvm/hyp/nvhe/switch.c
> @@ -264,7 +264,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
>   		pmr_sync();
>   	}
>   
> -	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +	host_ctxt = host_data_ptr(host_ctxt);
>   	host_ctxt->__hyp_running_vcpu = vcpu;
>   	guest_ctxt = &vcpu->arch.ctxt;
>   
> @@ -367,7 +367,7 @@ asmlinkage void __noreturn hyp_panic(void)
>   	struct kvm_cpu_context *host_ctxt;
>   	struct kvm_vcpu *vcpu;
>   
> -	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +	host_ctxt = host_data_ptr(host_ctxt);
>   	vcpu = host_ctxt->__hyp_running_vcpu;
>   
>   	if (vcpu) {
> diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> index 1581df6aec87..14b7a6bc5909 100644
> --- a/arch/arm64/kvm/hyp/vhe/switch.c
> +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> @@ -221,7 +221,7 @@ static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
>   	struct kvm_cpu_context *guest_ctxt;
>   	u64 exit_code;
>   
> -	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +	host_ctxt = host_data_ptr(host_ctxt);
>   	host_ctxt->__hyp_running_vcpu = vcpu;
>   	guest_ctxt = &vcpu->arch.ctxt;
>   
> @@ -306,7 +306,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, u64 par)
>   	struct kvm_cpu_context *host_ctxt;
>   	struct kvm_vcpu *vcpu;
>   
> -	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +	host_ctxt = host_data_ptr(host_ctxt);
>   	vcpu = host_ctxt->__hyp_running_vcpu;
>   
>   	__deactivate_traps(vcpu);
> diff --git a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> index 8e1e0d5033b6..ae763061909a 100644
> --- a/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> +++ b/arch/arm64/kvm/hyp/vhe/sysreg-sr.c
> @@ -67,7 +67,7 @@ void __vcpu_load_switch_sysregs(struct kvm_vcpu *vcpu)
>   	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
>   	struct kvm_cpu_context *host_ctxt;
>   
> -	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +	host_ctxt = host_data_ptr(host_ctxt);
>   	__sysreg_save_user_state(host_ctxt);
>   
>   	/*
> @@ -110,7 +110,7 @@ void __vcpu_put_switch_sysregs(struct kvm_vcpu *vcpu)
>   	struct kvm_cpu_context *guest_ctxt = &vcpu->arch.ctxt;
>   	struct kvm_cpu_context *host_ctxt;
>   
> -	host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +	host_ctxt = host_data_ptr(host_ctxt);
>   
>   	__sysreg_save_el1_state(guest_ctxt);
>   	__sysreg_save_user_state(guest_ctxt);
> diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
> index a243934c5568..329819806096 100644
> --- a/arch/arm64/kvm/pmu.c
> +++ b/arch/arm64/kvm/pmu.c
> @@ -232,7 +232,7 @@ bool kvm_set_pmuserenr(u64 val)
>   	if (!vcpu || !vcpu_get_flag(vcpu, PMUSERENR_ON_CPU))
>   		return false;
>   
> -	hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
> +	hctxt = host_data_ptr(host_ctxt);
>   	ctxt_sys_reg(hctxt, PMUSERENR_EL0) = val;
>   	return true;
>   }


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] KVM: arm64: Exclude FP ownership from kvm_vcpu_arch
  2024-03-02 11:19 ` [PATCH 5/5] KVM: arm64: Exclude FP ownership " Marc Zyngier
@ 2024-03-04 19:10   ` Mark Brown
  2024-03-06  9:43     ` Marc Zyngier
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Brown @ 2024-03-04 19:10 UTC (permalink / raw
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, James Clark, Anshuman Khandual

[-- Attachment #1: Type: text/plain, Size: 1495 bytes --]

On Sat, Mar 02, 2024 at 11:19:35AM +0000, Marc Zyngier wrote:
> In retrospect, it is fairly obvious that the FP state ownership
> is only meaningful for a given CPU, and that locating this
> information in the vcpu was just a mistake.
> 
> Move the ownership tracking into the host data structure, and
> rename it from fp_state to fp_owner, which is a better description
> (name suggested by Mark Brown).

The SME patch series proposes adding an additional state to this
enumeration which would say if the registers are stored in a format
suitable for exchange with userspace, that would make this state part of
the vCPU state.  With the addition of SME we can have two vector lengths
in play so the series proposes picking the larger to be the format for
userspace registers.  

We could store this separately to fp_state/owner but it'd still be a
value stored in the vCPU.  Storing in a format suitable for userspace
usage all the time when we've got SME would most likely result in
performance overhead if nothing else and feels more complicated than
rewriting the data in the relatively unusual case where userspace looks
at it.  Trying to convert userspace writes into the current layout would
have issues if the current layout uses the smaller vector length and
create fragility with ordering issues when loading the guest state.

The proposal is not the most lovely idea ever but given the architecture
I think some degree of clunkiness would be unavoidable.  

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/5] KVM: arm64: Exclude host_fpsimd_state pointer from kvm_vcpu_arch
  2024-03-02 11:19 ` [PATCH 4/5] KVM: arm64: Exclude host_fpsimd_state pointer " Marc Zyngier
@ 2024-03-04 20:45   ` Mark Brown
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Brown @ 2024-03-04 20:45 UTC (permalink / raw
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, James Clark, Anshuman Khandual

[-- Attachment #1: Type: text/plain, Size: 430 bytes --]

On Sat, Mar 02, 2024 at 11:19:34AM +0000, Marc Zyngier wrote:
> As the name of the field indicates, host_fpsimd_state is strictly
> a host piece of data, and we reset this pointer on each PID change.
> 
> So let's move it where it belongs, and set it at load-time. Although
> this is slightly more often, it is a well defined life-cycle which
> matches other pieces of data.

Reviewed-by: Mark Brown <broonie@kernel.org>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] KVM: arm64: Exclude FP ownership from kvm_vcpu_arch
  2024-03-04 19:10   ` Mark Brown
@ 2024-03-06  9:43     ` Marc Zyngier
  2024-03-06 22:19       ` Mark Brown
  0 siblings, 1 reply; 18+ messages in thread
From: Marc Zyngier @ 2024-03-06  9:43 UTC (permalink / raw
  To: Mark Brown
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, James Clark, Anshuman Khandual

On Mon, 04 Mar 2024 19:10:08 +0000,
Mark Brown <broonie@kernel.org> wrote:
> 
> [1  <text/plain; us-ascii (quoted-printable)>]
> On Sat, Mar 02, 2024 at 11:19:35AM +0000, Marc Zyngier wrote:
> > In retrospect, it is fairly obvious that the FP state ownership
> > is only meaningful for a given CPU, and that locating this
> > information in the vcpu was just a mistake.
> > 
> > Move the ownership tracking into the host data structure, and
> > rename it from fp_state to fp_owner, which is a better description
> > (name suggested by Mark Brown).
> 
> The SME patch series proposes adding an additional state to this
> enumeration which would say if the registers are stored in a format
> suitable for exchange with userspace, that would make this state part of
> the vCPU state.  With the addition of SME we can have two vector lengths
> in play so the series proposes picking the larger to be the format for
> userspace registers.

What does this addition have anything to do with the ownership of the
physical register file? Not a lot, it seems.

Specially as there better be no state resident on the CPU when
userspace messes up with it.

> 
> We could store this separately to fp_state/owner but it'd still be a
> value stored in the vCPU.

I totally disagree.

> Storing in a format suitable for userspace
> usage all the time when we've got SME would most likely result in
> performance overhead

What performance overhead? Why should we care?

> if nothing else and feels more complicated than
> rewriting the data in the relatively unusual case where userspace looks
> at it.  Trying to convert userspace writes into the current layout would
> have issues if the current layout uses the smaller vector length and
> create fragility with ordering issues when loading the guest state.

What ordering issues? If userspace manipulates the guest state, the
guest isn't running. If it is, all bets are off.

> 
> The proposal is not the most lovely idea ever but given the architecture
> I think some degree of clunkiness would be unavoidable.

It is only unavoidable if we decide to make a bad job of it.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] KVM: arm64: Exclude FP ownership from kvm_vcpu_arch
  2024-03-06  9:43     ` Marc Zyngier
@ 2024-03-06 22:19       ` Mark Brown
  2024-03-07 11:10         ` Marc Zyngier
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Brown @ 2024-03-06 22:19 UTC (permalink / raw
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, James Clark, Anshuman Khandual

[-- Attachment #1: Type: text/plain, Size: 4851 bytes --]

On Wed, Mar 06, 2024 at 09:43:13AM +0000, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:
> > On Sat, Mar 02, 2024 at 11:19:35AM +0000, Marc Zyngier wrote:

> > > Move the ownership tracking into the host data structure, and
> > > rename it from fp_state to fp_owner, which is a better description
> > > (name suggested by Mark Brown).

> > The SME patch series proposes adding an additional state to this
> > enumeration which would say if the registers are stored in a format
> > suitable for exchange with userspace, that would make this state part of
> > the vCPU state.  With the addition of SME we can have two vector lengths
> > in play so the series proposes picking the larger to be the format for
> > userspace registers.

> What does this addition have anything to do with the ownership of the
> physical register file? Not a lot, it seems.

> Specially as there better be no state resident on the CPU when
> userspace messes up with it.

If we have a situation where the state might be stored in memory in
multiple formats it seems reasonable to consider the metadata which
indicates which format is currently in use as part of the state.

> > We could store this separately to fp_state/owner but it'd still be a
> > value stored in the vCPU.

> I totally disagree.

Where would you expect to see the state stored?

> > Storing in a format suitable for userspace
> > usage all the time when we've got SME would most likely result in
> > performance overhead

> What performance overhead? Why should we care?

Since in situations where we're not using the larger VL we would need to
load and store the registers using a vector length other than the
currently configured vector length we would not be able to use the
ability to load and store to a location based on a multiple of the
vector length that the architecture has:

   LDR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
   LDR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]
   
   STR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
   STR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]

and would instead need to manually compute the memory locations where
values are stored.  As well as the extra instructions when using the
smaller vector length we'd also be working with sparser data likely over
more cache lines.

We would also need to consider if we need to zero the holes in the data
when saving, we'd only potentially be leaking information from the guest
but it might cause nasty surprises given that transitioning to/from
streaming mode is expected to zero values.  If we do need to zero then
that would be additional work that would need doing.

Exactly what the performance hit would be will be system and use case
dependent.  *Hopefully* we aren't needing to save and load the guest
state too often but I would be very surprised if we didn't have people
considering any cost in the guest context switch path worth paying
attention to.

As well as the performance overhead there would be some code complexity
cost, if nothing else we'd not be using the same format as fpsimd_save()
and would need to rearrange how we handle saving the register state.

Spending more effort to implement something which also has more runtime
performance overhead for the case of saving and restoring guest state
which I expect to be vastly more common than the VMM accessing the guest
registers just doesn't seem like an appealing choice.

> > if nothing else and feels more complicated than
> > rewriting the data in the relatively unusual case where userspace looks
> > at it.  Trying to convert userspace writes into the current layout would
> > have issues if the current layout uses the smaller vector length and
> > create fragility with ordering issues when loading the guest state.

> What ordering issues? If userspace manipulates the guest state, the
> guest isn't running. If it is, all bets are off.

If we were storing the data in the native format for the guest then that
format will change if streaming mode is changed via a write to SVCR.
This would mean that the host would need to understand that when writing
values SVCR needs to be written before the Z and P registers.  To be
clear I don't think this is a good idea.

> > The proposal is not the most lovely idea ever but given the architecture
> > I think some degree of clunkiness would be unavoidable.

> It is only unavoidable if we decide to make a bad job of it.

I don't think the handling of the vector registers for KVM with SME is
something where there is a clear good and bad job we can do - I don't
see how we can reasonably avoid at some point needing to translate
vector lengths or to/from FPSIMD format (in the case of a system with
SME but not SVE) which is just inherently a sharp edge.  It's just a
question of when and how we do that.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] KVM: arm64: Exclude FP ownership from kvm_vcpu_arch
  2024-03-06 22:19       ` Mark Brown
@ 2024-03-07 11:10         ` Marc Zyngier
  2024-03-07 14:26           ` Mark Brown
  0 siblings, 1 reply; 18+ messages in thread
From: Marc Zyngier @ 2024-03-07 11:10 UTC (permalink / raw
  To: Mark Brown
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, James Clark, Anshuman Khandual

On Wed, 06 Mar 2024 22:19:03 +0000,
Mark Brown <broonie@kernel.org> wrote:
> 
> [1  <text/plain; us-ascii (quoted-printable)>]
> On Wed, Mar 06, 2024 at 09:43:13AM +0000, Marc Zyngier wrote:
> > Mark Brown <broonie@kernel.org> wrote:
> > > On Sat, Mar 02, 2024 at 11:19:35AM +0000, Marc Zyngier wrote:
> 
> > > > Move the ownership tracking into the host data structure, and
> > > > rename it from fp_state to fp_owner, which is a better description
> > > > (name suggested by Mark Brown).
> 
> > > The SME patch series proposes adding an additional state to this
> > > enumeration which would say if the registers are stored in a format
> > > suitable for exchange with userspace, that would make this state part of
> > > the vCPU state.  With the addition of SME we can have two vector lengths
> > > in play so the series proposes picking the larger to be the format for
> > > userspace registers.
> 
> > What does this addition have anything to do with the ownership of the
> > physical register file? Not a lot, it seems.
> 
> > Specially as there better be no state resident on the CPU when
> > userspace messes up with it.
> 
> If we have a situation where the state might be stored in memory in
> multiple formats it seems reasonable to consider the metadata which
> indicates which format is currently in use as part of the state.

There is no reason why the state should be in multiple formats
*simultaneously*. All the FP/SIMD/SVE/SME state is largely
overlapping, and we only need to correctly invalidate the state that
isn't relevant to writes from userspace.

> 
> > > We could store this separately to fp_state/owner but it'd still be a
> > > value stored in the vCPU.
> 
> > I totally disagree.
> 
> Where would you expect to see the state stored?

Sorry, that came out wrong. I expect *some* vcpu state to describe the
current use of the FP/vector registers, and that's about it. Not the
ownership information.

> 
> > > Storing in a format suitable for userspace
> > > usage all the time when we've got SME would most likely result in
> > > performance overhead
> 
> > What performance overhead? Why should we care?
> 
> Since in situations where we're not using the larger VL we would need to
> load and store the registers using a vector length other than the
> currently configured vector length we would not be able to use the
> ability to load and store to a location based on a multiple of the
> vector length that the architecture has:
> 
>    LDR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
>    LDR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]
>    
>    STR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]
>    STR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]
> 
> and would instead need to manually compute the memory locations where
> values are stored.  As well as the extra instructions when using the
> smaller vector length we'd also be working with sparser data likely over
> more cache lines.

Are you talking about a context switch? or userspace accesses? I don't
give a damn about the latter, as it statistically never happens. The
former is of course of interest, but you still don't explain why the
above is a problem.

Nothing prevent you from storing the registers using the *current* VL,
since there is no data sharing between the SVE registers and the
streaming-SVE ones. All you need to do is to make sure you don't mix
the two.

> We would also need to consider if we need to zero the holes in the data
> when saving, we'd only potentially be leaking information from the guest
> but it might cause nasty surprises given that transitioning to/from
> streaming mode is expected to zero values.  If we do need to zero then
> that would be additional work that would need doing.

The zeroing is mandated by the architecture, AFAIU. That's not optional.

> 
> Exactly what the performance hit would be will be system and use case
> dependent.  *Hopefully* we aren't needing to save and load the guest
> state too often but I would be very surprised if we didn't have people
> considering any cost in the guest context switch path worth paying
> attention to.

These people can come out of the wood with numbers and reproducible
workloads. Until they do, their concerns do not really exist.

> As well as the performance overhead there would be some code complexity
> cost, if nothing else we'd not be using the same format as fpsimd_save()
> and would need to rearrange how we handle saving the register state.

And I'm fine with that. The way we store things is nobody's business
but ours, and I'm not sentimentally attached to 15 year old code.

> Spending more effort to implement something which also has more runtime
> performance overhead for the case of saving and restoring guest state
> which I expect to be vastly more common than the VMM accessing the guest
> registers just doesn't seem like an appealing choice.

I don't buy the runtime performance aspect at all. As long as you have
the space to dump the largest possible VL, you can always dump it in
the native format.

> > > if nothing else and feels more complicated than
> > > rewriting the data in the relatively unusual case where userspace looks
> > > at it.  Trying to convert userspace writes into the current layout would
> > > have issues if the current layout uses the smaller vector length and
> > > create fragility with ordering issues when loading the guest state.
> 
> > What ordering issues? If userspace manipulates the guest state, the
> > guest isn't running. If it is, all bets are off.
> 
> If we were storing the data in the native format for the guest then that
> format will change if streaming mode is changed via a write to SVCR.
> This would mean that the host would need to understand that when writing
> values SVCR needs to be written before the Z and P registers.  To be
> clear I don't think this is a good idea.

The architecture is crystal clear: you flip SVCR.SM, you loose all
data in both Z and P regs. If userspace doesn't understand the
architecture, that's their problem. The only thing we need to provide
is a faithful emulation of the architecture.

> 
> > > The proposal is not the most lovely idea ever but given the architecture
> > > I think some degree of clunkiness would be unavoidable.
> 
> > It is only unavoidable if we decide to make a bad job of it.
> 
> I don't think the handling of the vector registers for KVM with SME is
> something where there is a clear good and bad job we can do - I don't
> see how we can reasonably avoid at some point needing to translate
> vector lengths or to/from FPSIMD format (in the case of a system with
> SME but not SVE) which is just inherently a sharp edge.  It's just a
> question of when and how we do that.

My point is that there is no reason to translate the vector registers.
As long as your vcpu is in a given mode, all storage is done in that
mode. You switch mode, you lose data, as per the architecture. And
yes, there is some zeroing and invalidation to perform if the vcpu has
switched mode behind your back.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] KVM: arm64: Exclude FP ownership from kvm_vcpu_arch
  2024-03-07 11:10         ` Marc Zyngier
@ 2024-03-07 14:26           ` Mark Brown
  2024-03-09 11:01             ` Marc Zyngier
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Brown @ 2024-03-07 14:26 UTC (permalink / raw
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, James Clark, Anshuman Khandual

[-- Attachment #1: Type: text/plain, Size: 6951 bytes --]

On Thu, Mar 07, 2024 at 11:10:40AM +0000, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:
> > On Wed, Mar 06, 2024 at 09:43:13AM +0000, Marc Zyngier wrote:
> > > Mark Brown <broonie@kernel.org> wrote:
> > > > On Sat, Mar 02, 2024 at 11:19:35AM +0000, Marc Zyngier wrote:

> > > > The SME patch series proposes adding an additional state to this
> > > > enumeration which would say if the registers are stored in a format
> > > > suitable for exchange with userspace, that would make this state part of
> > > > the vCPU state.  With the addition of SME we can have two vector lengths
> > > > in play so the series proposes picking the larger to be the format for
> > > > userspace registers.

> > > What does this addition have anything to do with the ownership of the
> > > physical register file? Not a lot, it seems.

> > > Specially as there better be no state resident on the CPU when
> > > userspace messes up with it.

> > If we have a situation where the state might be stored in memory in
> > multiple formats it seems reasonable to consider the metadata which
> > indicates which format is currently in use as part of the state.

> There is no reason why the state should be in multiple formats
> *simultaneously*. All the FP/SIMD/SVE/SME state is largely
> overlapping, and we only need to correctly invalidate the state that
> isn't relevant to writes from userspace.

I agree that we don't want to store multiple copies, the proposed
implementation for SME does that - when converting between guest native
and userspace formats we discard the original format after conversion
and updates the metadata which says which format is stored.  That
metadata is modeled as adding a new owner.

> > > > We could store this separately to fp_state/owner but it'd still be a
> > > > value stored in the vCPU.

> > > I totally disagree.

> > Where would you expect to see the state stored?

> Sorry, that came out wrong. I expect *some* vcpu state to describe the
> current use of the FP/vector registers, and that's about it. Not the
> ownership information.

Ah, I think the distinction here is that you're modeling the state
tracking updated by this patch as purely tracking the registers in which
case yes, we should just add a separate variable in the vCPU for
tracking the format of the data.  I was modeling the state tracking as
covering both the state in the registers and the state of the storage
backing them (since the guest state being in the registers means that
the state in memory is invalid).

> > > > Storing in a format suitable for userspace
> > > > usage all the time when we've got SME would most likely result in
> > > > performance overhead

> > > What performance overhead? Why should we care?
> > 
> > Since in situations where we're not using the larger VL we would need to
> > load and store the registers using a vector length other than the

...

> > and would instead need to manually compute the memory locations where
> > values are stored.  As well as the extra instructions when using the
> > smaller vector length we'd also be working with sparser data likely over
> > more cache lines.

> Are you talking about a context switch? or userspace accesses? I don't
> give a damn about the latter, as it statistically never happens. The
> former is of course of interest, but you still don't explain why the
> above is a problem.

Well, there will be a cost in any rewriting so I'm trying to shift it
away from context switch to userspace access since as you say they are
negligably frequent.

> Nothing prevent you from storing the registers using the *current* VL,
> since there is no data sharing between the SVE registers and the
> streaming-SVE ones. All you need to do is to make sure you don't mix
> the two.

You seem to be suggesting modeling the streaming mode registers as a
completely different set of registers in the KVM ABI.  That would
certainly be another option, though it's a bit unclear from an
architecture point of view and it didn't seem to entirely fit with the
handling of the FPSIMD registers when SVE is in use.  From an
architecture point of view the streaming mode registers aren't really a
separate set of registers.  When in streaming mode it is not possible to
observe the non-streaming register state and vice versa, and entering or
exiting streaming mode zeros the register state so functionally you just
have floating point registers.

You'd need to handle what happens with access to the inactive register
set from userspace, the simplest thing to implement would be to read the
logical value of zero and either discard or return an error on writes.

> > We would also need to consider if we need to zero the holes in the data
> > when saving, we'd only potentially be leaking information from the guest
> > but it might cause nasty surprises given that transitioning to/from
> > streaming mode is expected to zero values.  If we do need to zero then
> > that would be additional work that would need doing.

> The zeroing is mandated by the architecture, AFAIU. That's not optional.

Yes, we need to zero at some point - we could just do it on userspace
read though I think rather than having to do it when saving.

> > Spending more effort to implement something which also has more runtime
> > performance overhead for the case of saving and restoring guest state
> > which I expect to be vastly more common than the VMM accessing the guest
> > registers just doesn't seem like an appealing choice.

> I don't buy the runtime performance aspect at all. As long as you have
> the space to dump the largest possible VL, you can always dump it in
> the native format.

Sure, my point was that you appeared to be asking to dump in a
non-native format.  

> > If we were storing the data in the native format for the guest then that
> > format will change if streaming mode is changed via a write to SVCR.
> > This would mean that the host would need to understand that when writing
> > values SVCR needs to be written before the Z and P registers.  To be
> > clear I don't think this is a good idea.

> The architecture is crystal clear: you flip SVCR.SM, you loose all
> data in both Z and P regs. If userspace doesn't understand the
> architecture, that's their problem. The only thing we need to provide
> is a faithful emulation of the architecture.

It is not clear to me that the intention behind the KVM register ABI is
that it should have all the ordering requirements that the architecture
has, and decisions like hiding the V registers when SVE is active do
take us away from just a natural architectural point of view.  My
understanding was that it was intended that userspace should be able to
do something like just enumerate all the registers, save them and then
later on restore them without really understanding them.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] KVM: arm64: Exclude FP ownership from kvm_vcpu_arch
  2024-03-07 14:26           ` Mark Brown
@ 2024-03-09 11:01             ` Marc Zyngier
  2024-03-11 18:42               ` Mark Brown
  0 siblings, 1 reply; 18+ messages in thread
From: Marc Zyngier @ 2024-03-09 11:01 UTC (permalink / raw
  To: Mark Brown
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, James Clark, Anshuman Khandual

On Thu, 07 Mar 2024 14:26:30 +0000,
Mark Brown <broonie@kernel.org> wrote:
> 
> On Thu, Mar 07, 2024 at 11:10:40AM +0000, Marc Zyngier wrote:
> > Mark Brown <broonie@kernel.org> wrote:
> > > On Wed, Mar 06, 2024 at 09:43:13AM +0000, Marc Zyngier wrote:
> > > > Mark Brown <broonie@kernel.org> wrote:
> > > > > On Sat, Mar 02, 2024 at 11:19:35AM +0000, Marc Zyngier wrote:
> 
> > > > > The SME patch series proposes adding an additional state to this
> > > > > enumeration which would say if the registers are stored in a format
> > > > > suitable for exchange with userspace, that would make this state part of
> > > > > the vCPU state.  With the addition of SME we can have two vector lengths
> > > > > in play so the series proposes picking the larger to be the format for
> > > > > userspace registers.
> 
> > > > What does this addition have anything to do with the ownership of the
> > > > physical register file? Not a lot, it seems.
> 
> > > > Specially as there better be no state resident on the CPU when
> > > > userspace messes up with it.
> 
> > > If we have a situation where the state might be stored in memory in
> > > multiple formats it seems reasonable to consider the metadata which
> > > indicates which format is currently in use as part of the state.
> 
> > There is no reason why the state should be in multiple formats
> > *simultaneously*. All the FP/SIMD/SVE/SME state is largely
> > overlapping, and we only need to correctly invalidate the state that
> > isn't relevant to writes from userspace.
> 
> I agree that we don't want to store multiple copies, the proposed
> implementation for SME does that - when converting between guest native
> and userspace formats we discard the original format after conversion
> and updates the metadata which says which format is stored.  That
> metadata is modeled as adding a new owner.

What conversion? If userspace writes to FP, the upper bits of the SVE
registers should get zeroed. If it writes to SVE, it writes using the
current VL for the current streaming/non-streaming mode.

If the current userspace API doesn't give us the tools to do so, we
rev it.

>
> > > > > We could store this separately to fp_state/owner but it'd still be a
> > > > > value stored in the vCPU.
> 
> > > > I totally disagree.
> 
> > > Where would you expect to see the state stored?
> 
> > Sorry, that came out wrong. I expect *some* vcpu state to describe the
> > current use of the FP/vector registers, and that's about it. Not the
> > ownership information.
> 
> Ah, I think the distinction here is that you're modeling the state
> tracking updated by this patch as purely tracking the registers in which
> case yes, we should just add a separate variable in the vCPU for
> tracking the format of the data.  I was modeling the state tracking as
> covering both the state in the registers and the state of the storage
> backing them (since the guest state being in the registers means that
> the state in memory is invalid).
> 
> > > > > Storing in a format suitable for userspace
> > > > > usage all the time when we've got SME would most likely result in
> > > > > performance overhead
> 
> > > > What performance overhead? Why should we care?
> > > 
> > > Since in situations where we're not using the larger VL we would need to
> > > load and store the registers using a vector length other than the
> 
> ...
> 
> > > and would instead need to manually compute the memory locations where
> > > values are stored.  As well as the extra instructions when using the
> > > smaller vector length we'd also be working with sparser data likely over
> > > more cache lines.
> 
> > Are you talking about a context switch? or userspace accesses? I don't
> > give a damn about the latter, as it statistically never happens. The
> > former is of course of interest, but you still don't explain why the
> > above is a problem.
> 
> Well, there will be a cost in any rewriting so I'm trying to shift it
> away from context switch to userspace access since as you say they are
> negligably frequent.

So we're in violent agreement.

> > Nothing prevent you from storing the registers using the *current* VL,
> > since there is no data sharing between the SVE registers and the
> > streaming-SVE ones. All you need to do is to make sure you don't mix
> > the two.
> 
> You seem to be suggesting modeling the streaming mode registers as a
> completely different set of registers in the KVM ABI.

Where are you reading this? I *never* said anything of the sort. There
is only one set of SVE registers. The fact that they change size and
get randomly zeroed when userspace flips bits is a property of the
architecture. The HW *can* implements them as two sets of registers if
SME is a separate accelerator, but that's not architectural.

All I'm saying is that you can have a *single* register file, spanning
both FPSIMD and SVE, using the maximum VL achievable in the guest, and
be done with it. Yes, you'll have to refactor the code so that FPSIMD
lands at the right spot in the SVE register file. Big deal.

> That would
> certainly be another option, though it's a bit unclear from an
> architecture point of view and it didn't seem to entirely fit with the
> handling of the FPSIMD registers when SVE is in use.  From an
> architecture point of view the streaming mode registers aren't really a
> separate set of registers.  When in streaming mode it is not possible to
> observe the non-streaming register state and vice versa, and entering or
> exiting streaming mode zeros the register state so functionally you just
> have floating point registers.
> 
> You'd need to handle what happens with access to the inactive register
> set from userspace, the simplest thing to implement would be to read the
> logical value of zero and either discard or return an error on writes.
> 
> > > We would also need to consider if we need to zero the holes in the data
> > > when saving, we'd only potentially be leaking information from the guest
> > > but it might cause nasty surprises given that transitioning to/from
> > > streaming mode is expected to zero values.  If we do need to zero then
> > > that would be additional work that would need doing.
> 
> > The zeroing is mandated by the architecture, AFAIU. That's not optional.
> 
> Yes, we need to zero at some point - we could just do it on userspace
> read though I think rather than having to do it when saving.

As long as the guest only observes an architecturally compliant state,
I'm find with that.

> 
> > > Spending more effort to implement something which also has more runtime
> > > performance overhead for the case of saving and restoring guest state
> > > which I expect to be vastly more common than the VMM accessing the guest
> > > registers just doesn't seem like an appealing choice.
> 
> > I don't buy the runtime performance aspect at all. As long as you have
> > the space to dump the largest possible VL, you can always dump it in
> > the native format.
> 
> Sure, my point was that you appeared to be asking to dump in a
> non-native format.  

Again, what makes you think that? You keep putting words in my mouth.

>
> > > If we were storing the data in the native format for the guest then that
> > > format will change if streaming mode is changed via a write to SVCR.
> > > This would mean that the host would need to understand that when writing
> > > values SVCR needs to be written before the Z and P registers.  To be
> > > clear I don't think this is a good idea.
> 
> > The architecture is crystal clear: you flip SVCR.SM, you loose all
> > data in both Z and P regs. If userspace doesn't understand the
> > architecture, that's their problem. The only thing we need to provide
> > is a faithful emulation of the architecture.
> 
> It is not clear to me that the intention behind the KVM register ABI is
> that it should have all the ordering requirements that the architecture
> has, and decisions like hiding the V registers when SVE is active do
> take us away from just a natural architectural point of view.

My intention is that userspace accesses should be as close as possible
as that of a guest. If the current ABI doesn't allow this, I'm all for
changing it. Won't be the first time, won't be the last.

> My
> understanding was that it was intended that userspace should be able to
> do something like just enumerate all the registers, save them and then
> later on restore them without really understanding them.

And this statement doesn't get in the way of anything above. We own
the ABI, and can change it as we see fit when SME is enabled.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] KVM: arm64: Add accessor for per-CPU state
  2024-03-04 12:05   ` Suzuki K Poulose
@ 2024-03-09 13:00     ` Marc Zyngier
  0 siblings, 0 replies; 18+ messages in thread
From: Marc Zyngier @ 2024-03-09 13:00 UTC (permalink / raw
  To: Suzuki K Poulose
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Oliver Upton,
	Zenghui Yu, James Clark, Anshuman Khandual, Mark Brown

On Mon, 04 Mar 2024 12:05:20 +0000,
Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> 
> Hi Marc
> 
> On 02/03/2024 11:19, Marc Zyngier wrote:
> > In order to facilitate the introduction of new per-CPU state,
> > add a new host_data_ptr() helped that hides some of the per-CPU
> > verbosity, and make it easier to move that state around in the
> > future.
> > 
> 
> This series looks like a good cleanup to make the whole host
> data handling cleaner. One comment below.
> 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >   arch/arm64/include/asm/kvm_host.h         | 13 +++++++++++++
> >   arch/arm64/kvm/arm.c                      |  2 +-
> >   arch/arm64/kvm/hyp/include/hyp/debug-sr.h |  4 ++--
> >   arch/arm64/kvm/hyp/include/hyp/switch.h   | 11 +++++------
> >   arch/arm64/kvm/hyp/nvhe/psci-relay.c      |  2 +-
> >   arch/arm64/kvm/hyp/nvhe/setup.c           |  3 +--
> >   arch/arm64/kvm/hyp/nvhe/switch.c          |  4 ++--
> >   arch/arm64/kvm/hyp/vhe/switch.c           |  4 ++--
> >   arch/arm64/kvm/hyp/vhe/sysreg-sr.c        |  4 ++--
> >   arch/arm64/kvm/pmu.c                      |  2 +-
> >   10 files changed, 30 insertions(+), 19 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 21c57b812569..3ca2a9444f21 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -492,6 +492,17 @@ struct kvm_cpu_context {
> >   	u64 *vncr_array;
> >   };
> >   +/*
> > + * This structure is instanciated on a per-CPU basis, and contains
> > + * data that is:
> > + *
> > + * - tied to a single physical CPU, and
> > + * - either have a lifetime that does not extend past vcpu_put()
> > + * - or is an invariant for the lifetime of the system
> > + *
> > + * Use host_data_ptr(field) as a way to access a pointer to such a
> > + * field.
> > + */
> >   struct kvm_host_data {
> >   	struct kvm_cpu_context host_ctxt;
> >   };
> > @@ -1114,6 +1125,8 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> >     DECLARE_KVM_HYP_PER_CPU(struct kvm_host_data, kvm_host_data);
> >   +#define host_data_ptr(f)	(&this_cpu_ptr(&kvm_host_data)->f)
> > +
> >   static inline void kvm_init_host_cpu_context(struct kvm_cpu_context *cpu_ctxt)
> >   {
> >   	/* The host's MPIDR is immutable, so let's set it up at boot time */
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index a25265aca432..d3d14cc41cb5 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -1960,7 +1960,7 @@ static void cpu_set_hyp_vector(void)
> >     static void cpu_hyp_init_context(void)
> >   {
> > -	kvm_init_host_cpu_context(&this_cpu_ptr_hyp_sym(kvm_host_data)->host_ctxt);
> > +	kvm_init_host_cpu_context(host_data_ptr(host_ctxt));
> 
> This silently changes the "this_cpu_ptr_hyp_sym" to "this_cpu_ptr()"
> and thus we could be using the VHE host_data even in nVHE ?
> Rest looks fine to me.

Huh, well spotted. I'll switch the definition of host_data_ptr()
outside of the hypervisor code to use this_cpu_ptr_hyp_sym() instead,
like we have for some of the other helpers.

Thanks again,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] KVM: arm64: Add accessor for per-CPU state
  2024-03-02 11:19 ` [PATCH 1/5] KVM: arm64: Add accessor for per-CPU state Marc Zyngier
  2024-03-04 12:05   ` Suzuki K Poulose
@ 2024-03-11  4:50   ` Dongli Zhang
  2024-03-11 17:13     ` Marc Zyngier
  1 sibling, 1 reply; 18+ messages in thread
From: Dongli Zhang @ 2024-03-11  4:50 UTC (permalink / raw
  To: Marc Zyngier, kvmarm, linux-arm-kernel, kvm
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	James Clark, Anshuman Khandual, Mark Brown



On 3/2/24 03:19, Marc Zyngier wrote:
> In order to facilitate the introduction of new per-CPU state,
> add a new host_data_ptr() helped that hides some of the per-CPU
> verbosity, and make it easier to move that state around in the
> future.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h         | 13 +++++++++++++
>  arch/arm64/kvm/arm.c                      |  2 +-
>  arch/arm64/kvm/hyp/include/hyp/debug-sr.h |  4 ++--
>  arch/arm64/kvm/hyp/include/hyp/switch.h   | 11 +++++------
>  arch/arm64/kvm/hyp/nvhe/psci-relay.c      |  2 +-
>  arch/arm64/kvm/hyp/nvhe/setup.c           |  3 +--
>  arch/arm64/kvm/hyp/nvhe/switch.c          |  4 ++--
>  arch/arm64/kvm/hyp/vhe/switch.c           |  4 ++--
>  arch/arm64/kvm/hyp/vhe/sysreg-sr.c        |  4 ++--
>  arch/arm64/kvm/pmu.c                      |  2 +-
>  10 files changed, 30 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 21c57b812569..3ca2a9444f21 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -492,6 +492,17 @@ struct kvm_cpu_context {
>  	u64 *vncr_array;
>  };
>  
> +/*
> + * This structure is instanciated on a per-CPU basis, and contains

instantiated?


May this patchset (at least the first, second, third and fifth) be qualified as
"non functional change" in the commit message?

That provides some hints when backporting this patchset to some old kernel in
the future. Thank you very much!

Dongli Zhang

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] KVM: arm64: Add accessor for per-CPU state
  2024-03-11  4:50   ` Dongli Zhang
@ 2024-03-11 17:13     ` Marc Zyngier
  0 siblings, 0 replies; 18+ messages in thread
From: Marc Zyngier @ 2024-03-11 17:13 UTC (permalink / raw
  To: Dongli Zhang
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, James Clark, Anshuman Khandual,
	Mark Brown

On Mon, 11 Mar 2024 04:50:23 +0000,
Dongli Zhang <dongli.zhang@oracle.com> wrote:
> 
> 
> 
> On 3/2/24 03:19, Marc Zyngier wrote:
> > In order to facilitate the introduction of new per-CPU state,
> > add a new host_data_ptr() helped that hides some of the per-CPU
> > verbosity, and make it easier to move that state around in the
> > future.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_host.h         | 13 +++++++++++++
> >  arch/arm64/kvm/arm.c                      |  2 +-
> >  arch/arm64/kvm/hyp/include/hyp/debug-sr.h |  4 ++--
> >  arch/arm64/kvm/hyp/include/hyp/switch.h   | 11 +++++------
> >  arch/arm64/kvm/hyp/nvhe/psci-relay.c      |  2 +-
> >  arch/arm64/kvm/hyp/nvhe/setup.c           |  3 +--
> >  arch/arm64/kvm/hyp/nvhe/switch.c          |  4 ++--
> >  arch/arm64/kvm/hyp/vhe/switch.c           |  4 ++--
> >  arch/arm64/kvm/hyp/vhe/sysreg-sr.c        |  4 ++--
> >  arch/arm64/kvm/pmu.c                      |  2 +-
> >  10 files changed, 30 insertions(+), 19 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 21c57b812569..3ca2a9444f21 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -492,6 +492,17 @@ struct kvm_cpu_context {
> >  	u64 *vncr_array;
> >  };
> >  
> > +/*
> > + * This structure is instanciated on a per-CPU basis, and contains
> 
> instantiated?

Yup, thanks.

> 
> 
> May this patchset (at least the first, second, third and fifth) be qualified as
> "non functional change" in the commit message?

I disagree. No change is without any functional change unless it is
only changing comments, and this one definitely introduces a bug. So I
will not make any such statement.

> That provides some hints when backporting this patchset to some old kernel in
> the future. Thank you very much!

two things:

- you should not be backporting these patches. Full stop. They don't
  fix anything, they do not enable anything. They are just preliminary
  work for future things.

- you really should perform a full review of what you are backporting
  in the light of the *target* kernel. I am never going to do this job
  (I only care about the current state), and that makes your earlier
  request even less realistic.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 5/5] KVM: arm64: Exclude FP ownership from kvm_vcpu_arch
  2024-03-09 11:01             ` Marc Zyngier
@ 2024-03-11 18:42               ` Mark Brown
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Brown @ 2024-03-11 18:42 UTC (permalink / raw
  To: Marc Zyngier
  Cc: kvmarm, linux-arm-kernel, kvm, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, James Clark, Anshuman Khandual

[-- Attachment #1: Type: text/plain, Size: 5872 bytes --]

On Sat, Mar 09, 2024 at 11:01:56AM +0000, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > I agree that we don't want to store multiple copies, the proposed
> > implementation for SME does that - when converting between guest native
> > and userspace formats we discard the original format after conversion
> > and updates the metadata which says which format is stored.  That
> > metadata is modeled as adding a new owner.

> What conversion? If userspace writes to FP, the upper bits of the SVE
> registers should get zeroed. If it writes to SVE, it writes using the
> current VL for the current streaming/non-streaming mode.

> If the current userspace API doesn't give us the tools to do so, we
> rev it.

This is the conversion of vector lengths rather than something resulting
from acccessing from accessing the V registers via the sysreg interface
(which is not supported by either upstream or the currently posted SME
patches once either vector extension is enabled).  As previously
mentioned the current implementation of SME support for KVM always
presents the vector length sized registers to userspace with the maximum
vector length, not the vector length currently selected by SVCR.

> > > Nothing prevent you from storing the registers using the *current* VL,
> > > since there is no data sharing between the SVE registers and the
> > > streaming-SVE ones. All you need to do is to make sure you don't mix
> > > the two.

> > You seem to be suggesting modeling the streaming mode registers as a
> > completely different set of registers in the KVM ABI.

> Where are you reading this? I *never* said anything of the sort. There
> is only one set of SVE registers. The fact that they change size and
> get randomly zeroed when userspace flips bits is a property of the
> architecture. The HW *can* implements them as two sets of registers if
> SME is a separate accelerator, but that's not architectural.

When you talk about there being "no data sharing" and "mixing the two"
that sounds a lot like there might be two separate sets of data.  I've
been inferring a lot of what you are looking for from statements that
you make about other ideas and from the existing code and documentation
so things are less clear than they might be and it's likely there's been
some assumptions missed in places.  In order to try to clarify things
I've written down a summary of what I currently understand you're
looking for below.

> All I'm saying is that you can have a *single* register file, spanning
> both FPSIMD and SVE, using the maximum VL achievable in the guest, and

When you say "using" there I take it that you mean "sized to store"
rather than "written/accessed in the format for" since elsewhere you've
been taking about storing the current native format and changing the VL
in the ABI based on SVCR.SM?  I would have read "using the maximum VL
achievable in the guest" as meaning we store in a format reflecting the
larger VL but I'm pretty sure it's just the allocation you're
referencing there.

To be clear what I'm currently understanding for the userspace ABI is:

 - Create a requirement for userspace to set SVCR prior to setting any
   vector impacted register to ensure the correct format and that data
   isn't zeroed when SVCR is set.
 - Use the value of SVCR.SM and the guest maximum SVE and SME VLs to
   select the currently visible vector length for the Z, P and FFR
   registers.
 - This also implies discarding or failing all writes to ZA and ZT0
   unless SVCR.ZA is set for consistency, though that's much less
   impactful in terms of the implementation.
 - Add support for the V registers in the sysreg interface when SVE is
   enabled.

then the implementation can do what it likes to achieve that, the most
obvious thing being to store in native format for the current hardware
mode based on SVCR.{SM,ZA}.  Does that sound about right?

If that's all good the main thing I'm unclear on your expectations for
is what should happen with registers that are inaccessible in the
current mode (Z, P and FFR when in non-streaming mode without SVE, FFR
when in streaming mode without FA64, and ZA and ZT0 when SVCR.ZA is 0).
Having these registers cause errors on access as they would in the
architecture but I worry about complicating and potentially breaking the
ABI by having enumerable but inaccessible registers, though I think
that's more in line with your general thinking.  The main alternative
would be RAZ/WI style behavior which isn't quite what the architecture
does but does give what the guest would observe when it changes modes
and looks at the contents of those registers.

> be done with it. Yes, you'll have to refactor the code so that FPSIMD
> lands at the right spot in the SVE register file. Big deal.

At present the ABI just disables the V registers when SVE is enabled, I
was slightly surprised at that implementation and was concerned there
might be something that was considered desirable about the removal of
ambiguity (the changelog for the relevant commit did mention that it was
convenient to implement).

> > My
> > understanding was that it was intended that userspace should be able to
> > do something like just enumerate all the registers, save them and then
> > later on restore them without really understanding them.

> And this statement doesn't get in the way of anything above. We own
> the ABI, and can change it as we see fit when SME is enabled.

If you're willing to require specific ordering knowledge to do guest
state load then that makes life a bit easier for the implementation.  It
had appeared that VMMs could have existing ABI expectations about how
they can enumerate, store and load the state the guest implements which
we would cause problems by breaking.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-03-11 18:42 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-02 11:19 [PATCH 0/5] KVM: arm64: Move host-specific data out of kvm_vcpu_arch Marc Zyngier
2024-03-02 11:19 ` [PATCH 1/5] KVM: arm64: Add accessor for per-CPU state Marc Zyngier
2024-03-04 12:05   ` Suzuki K Poulose
2024-03-09 13:00     ` Marc Zyngier
2024-03-11  4:50   ` Dongli Zhang
2024-03-11 17:13     ` Marc Zyngier
2024-03-02 11:19 ` [PATCH 2/5] KVM: arm64: Exclude host_debug_data from vcpu_arch Marc Zyngier
2024-03-02 11:19 ` [PATCH 3/5] KVM: arm64: Exclude mdcr_el2_host from kvm_vcpu_arch Marc Zyngier
2024-03-02 11:19 ` [PATCH 4/5] KVM: arm64: Exclude host_fpsimd_state pointer " Marc Zyngier
2024-03-04 20:45   ` Mark Brown
2024-03-02 11:19 ` [PATCH 5/5] KVM: arm64: Exclude FP ownership " Marc Zyngier
2024-03-04 19:10   ` Mark Brown
2024-03-06  9:43     ` Marc Zyngier
2024-03-06 22:19       ` Mark Brown
2024-03-07 11:10         ` Marc Zyngier
2024-03-07 14:26           ` Mark Brown
2024-03-09 11:01             ` Marc Zyngier
2024-03-11 18:42               ` Mark Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).