LinuxPPC-Dev Archive mirror
* [PATCH 0/7] arch/mm/fault: accelerate pagefault when badaccess
@ 2024-04-02  7:51 Kefeng Wang
  2024-04-02  7:51 ` [PATCH 1/7] arm64: mm: cleanup __do_page_fault() Kefeng Wang
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Kefeng Wang @ 2024-04-02  7:51 UTC (permalink / raw
  To: akpm
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Kefeng Wang, Paul Walmsley, Russell King, surenb,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Nicholas Piggin,
	Andy Lutomirski, Alexander Gordeev, Will Deacon, Gerald Schaefer,
	linux-arm-kernel

With VMA lock-based page fault handling enabled, a bad access detected
under the per-VMA lock currently falls back to mmap_lock-based handling,
which leads to an unnecessary mmap_lock acquisition and a second VMA
lookup. An lmbench test shows a 34% latency improvement from these
changes on arm64:

  lat_sig -P 1 prot lat_sig 0.29194 -> 0.19198

Architectures other than arm64 are only build tested.
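
For reference, the change has the same shape on every architecture; the
following is a condensed before/after sketch based on the arm64 hunk in
patch 2 (comments added for illustration, they are not in the patches):

        /* Before: a bad access found under the per-VMA lock falls back
         * to the mmap_lock path, which re-finds the VMA and re-checks
         * vm_flags only to reach the same conclusion. */
        if (!(vma->vm_flags & vm_flags)) {
                vma_end_read(vma);
                goto lock_mmap;
        }

        /* After: vm_flags is stable under the per-VMA lock, so report
         * the bad access immediately and skip the mmap_lock retry. */
        if (!(vma->vm_flags & vm_flags)) {
                vma_end_read(vma);
                fault = VM_FAULT_BADACCESS;
                count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
                goto done;
        }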

Kefeng Wang (7):
  arm64: mm: cleanup __do_page_fault()
  arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS
  arm: mm: accelerate pagefault when VM_FAULT_BADACCESS
  powerpc: mm: accelerate pagefault when badaccess
  riscv: mm: accelerate pagefault when badaccess
  s390: mm: accelerate pagefault when badaccess
  x86: mm: accelerate pagefault when badaccess

 arch/arm/mm/fault.c     |  4 +++-
 arch/arm64/mm/fault.c   | 31 ++++++++++---------------------
 arch/powerpc/mm/fault.c | 33 ++++++++++++++++++++-------------
 arch/riscv/mm/fault.c   |  5 ++++-
 arch/s390/mm/fault.c    |  3 ++-
 arch/x86/mm/fault.c     | 23 ++++++++++++++---------
 6 files changed, 53 insertions(+), 46 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/7] arm64: mm: cleanup __do_page_fault()
  2024-04-02  7:51 [PATCH 0/7] arch/mm/fault: accelerate pagefault when badaccess Kefeng Wang
@ 2024-04-02  7:51 ` Kefeng Wang
  2024-04-03  5:11   ` Suren Baghdasaryan
  2024-04-03 18:24   ` Catalin Marinas
  2024-04-02  7:51 ` [PATCH 2/7] arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS Kefeng Wang
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: Kefeng Wang @ 2024-04-02  7:51 UTC (permalink / raw
  To: akpm
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Kefeng Wang, Paul Walmsley, Russell King, surenb,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Nicholas Piggin,
	Andy Lutomirski, Alexander Gordeev, Will Deacon, Gerald Schaefer,
	linux-arm-kernel

__do_page_fault() only checks vma->vm_flags and calls handle_mm_fault(),
and it is only called by do_page_fault(), so let's squash it into
do_page_fault() to clean up the code.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/arm64/mm/fault.c | 27 +++++++--------------------
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 8251e2fea9c7..9bb9f395351a 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -486,25 +486,6 @@ static void do_bad_area(unsigned long far, unsigned long esr,
 	}
 }
 
-#define VM_FAULT_BADMAP		((__force vm_fault_t)0x010000)
-#define VM_FAULT_BADACCESS	((__force vm_fault_t)0x020000)
-
-static vm_fault_t __do_page_fault(struct mm_struct *mm,
-				  struct vm_area_struct *vma, unsigned long addr,
-				  unsigned int mm_flags, unsigned long vm_flags,
-				  struct pt_regs *regs)
-{
-	/*
-	 * Ok, we have a good vm_area for this memory access, so we can handle
-	 * it.
-	 * Check that the permissions on the VMA allow for the fault which
-	 * occurred.
-	 */
-	if (!(vma->vm_flags & vm_flags))
-		return VM_FAULT_BADACCESS;
-	return handle_mm_fault(vma, addr, mm_flags, regs);
-}
-
 static bool is_el0_instruction_abort(unsigned long esr)
 {
 	return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW;
@@ -519,6 +500,9 @@ static bool is_write_abort(unsigned long esr)
 	return (esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM);
 }
 
+#define VM_FAULT_BADMAP		((__force vm_fault_t)0x010000)
+#define VM_FAULT_BADACCESS	((__force vm_fault_t)0x020000)
+
 static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 				   struct pt_regs *regs)
 {
@@ -617,7 +601,10 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 		goto done;
 	}
 
-	fault = __do_page_fault(mm, vma, addr, mm_flags, vm_flags, regs);
+	if (!(vma->vm_flags & vm_flags))
+		fault = VM_FAULT_BADACCESS;
+	else
+		fault = handle_mm_fault(vma, addr, mm_flags, regs);
 
 	/* Quick path to respond to signals */
 	if (fault_signal_pending(fault, regs)) {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/7] arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS
  2024-04-02  7:51 [PATCH 0/7] arch/mm/fault: accelerate pagefault when badaccess Kefeng Wang
  2024-04-02  7:51 ` [PATCH 1/7] arm64: mm: cleanup __do_page_fault() Kefeng Wang
@ 2024-04-02  7:51 ` Kefeng Wang
  2024-04-03  5:19   ` Suren Baghdasaryan
  2024-04-03 18:32   ` Catalin Marinas
  2024-04-02  7:51 ` [PATCH 3/7] arm: " Kefeng Wang
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 20+ messages in thread
From: Kefeng Wang @ 2024-04-02  7:51 UTC (permalink / raw
  To: akpm
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Kefeng Wang, Paul Walmsley, Russell King, surenb,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Nicholas Piggin,
	Andy Lutomirski, Alexander Gordeev, Will Deacon, Gerald Schaefer,
	linux-arm-kernel

The vma->vm_flags have already been checked under the per-VMA lock; if
it is a bad access, directly set fault to VM_FAULT_BADACCESS and handle
the error. There is no need to call lock_mm_and_find_vma() and check
vm_flags again, and the latency of lmbench 'lat_sig -P 1 prot lat_sig'
drops by 34%.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/arm64/mm/fault.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 9bb9f395351a..405f9aa831bd 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -572,7 +572,9 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 
 	if (!(vma->vm_flags & vm_flags)) {
 		vma_end_read(vma);
-		goto lock_mmap;
+		fault = VM_FAULT_BADACCESS;
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		goto done;
 	}
 	fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, regs);
 	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/7] arm: mm: accelerate pagefault when VM_FAULT_BADACCESS
  2024-04-02  7:51 [PATCH 0/7] arch/mm/fault: accelerate pagefault when badaccess Kefeng Wang
  2024-04-02  7:51 ` [PATCH 1/7] arm64: mm: cleanup __do_page_fault() Kefeng Wang
  2024-04-02  7:51 ` [PATCH 2/7] arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS Kefeng Wang
@ 2024-04-02  7:51 ` Kefeng Wang
  2024-04-03  5:30   ` Suren Baghdasaryan
  2024-04-02  7:51 ` [PATCH 4/7] powerpc: mm: accelerate pagefault when badaccess Kefeng Wang
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Kefeng Wang @ 2024-04-02  7:51 UTC (permalink / raw
  To: akpm
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Kefeng Wang, Paul Walmsley, Russell King, surenb,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Nicholas Piggin,
	Andy Lutomirski, Alexander Gordeev, Will Deacon, Gerald Schaefer,
	linux-arm-kernel

The vma->vm_flags have already been checked under the per-VMA lock; if
it is a bad access, directly set fault to VM_FAULT_BADACCESS and handle
the error, so there is no need to call lock_mm_and_find_vma() and check
vm_flags again.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/arm/mm/fault.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 439dc6a26bb9..5c4b417e24f9 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -294,7 +294,9 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 
 	if (!(vma->vm_flags & vm_flags)) {
 		vma_end_read(vma);
-		goto lock_mmap;
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		fault = VM_FAULT_BADACCESS;
+		goto bad_area;
 	}
 	fault = handle_mm_fault(vma, addr, flags | FAULT_FLAG_VMA_LOCK, regs);
 	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/7] powerpc: mm: accelerate pagefault when badaccess
  2024-04-02  7:51 [PATCH 0/7] arch/mm/fault: accelerate pagefault when badaccess Kefeng Wang
                   ` (2 preceding siblings ...)
  2024-04-02  7:51 ` [PATCH 3/7] arm: " Kefeng Wang
@ 2024-04-02  7:51 ` Kefeng Wang
  2024-04-03  5:34   ` Suren Baghdasaryan
  2024-04-02  7:51 ` [PATCH 5/7] riscv: " Kefeng Wang
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Kefeng Wang @ 2024-04-02  7:51 UTC (permalink / raw
  To: akpm
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Kefeng Wang, Paul Walmsley, Russell King, surenb,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Nicholas Piggin,
	Andy Lutomirski, Alexander Gordeev, Will Deacon, Gerald Schaefer,
	linux-arm-kernel

The vma->vm_flags have already been checked under the per-VMA lock; if
it is a bad access, directly handle the error and return. There is no
need to call lock_mm_and_find_vma() and check vm_flags again.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/powerpc/mm/fault.c | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
index 53335ae21a40..215690452495 100644
--- a/arch/powerpc/mm/fault.c
+++ b/arch/powerpc/mm/fault.c
@@ -71,23 +71,26 @@ static noinline int bad_area_nosemaphore(struct pt_regs *regs, unsigned long add
 	return __bad_area_nosemaphore(regs, address, SEGV_MAPERR);
 }
 
-static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code)
+static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code,
+		      struct mm_struct *mm, struct vm_area_struct *vma)
 {
-	struct mm_struct *mm = current->mm;
 
 	/*
 	 * Something tried to access memory that isn't in our memory map..
 	 * Fix it, but check if it's kernel or user first..
 	 */
-	mmap_read_unlock(mm);
+	if (mm)
+		mmap_read_unlock(mm);
+	else
+		vma_end_read(vma);
 
 	return __bad_area_nosemaphore(regs, address, si_code);
 }
 
 static noinline int bad_access_pkey(struct pt_regs *regs, unsigned long address,
+				    struct mm_struct *mm,
 				    struct vm_area_struct *vma)
 {
-	struct mm_struct *mm = current->mm;
 	int pkey;
 
 	/*
@@ -109,7 +112,10 @@ static noinline int bad_access_pkey(struct pt_regs *regs, unsigned long address,
 	 */
 	pkey = vma_pkey(vma);
 
-	mmap_read_unlock(mm);
+	if (mm)
+		mmap_read_unlock(mm);
+	else
+		vma_end_read(vma);
 
 	/*
 	 * If we are in kernel mode, bail out with a SEGV, this will
@@ -124,9 +130,10 @@ static noinline int bad_access_pkey(struct pt_regs *regs, unsigned long address,
 	return 0;
 }
 
-static noinline int bad_access(struct pt_regs *regs, unsigned long address)
+static noinline int bad_access(struct pt_regs *regs, unsigned long address,
+			       struct mm_struct *mm, struct vm_area_struct *vma)
 {
-	return __bad_area(regs, address, SEGV_ACCERR);
+	return __bad_area(regs, address, SEGV_ACCERR, mm, vma);
 }
 
 static int do_sigbus(struct pt_regs *regs, unsigned long address,
@@ -479,13 +486,13 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 
 	if (unlikely(access_pkey_error(is_write, is_exec,
 				       (error_code & DSISR_KEYFAULT), vma))) {
-		vma_end_read(vma);
-		goto lock_mmap;
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		return bad_access_pkey(regs, address, NULL, vma);
 	}
 
 	if (unlikely(access_error(is_write, is_exec, vma))) {
-		vma_end_read(vma);
-		goto lock_mmap;
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		return bad_access(regs, address, NULL, vma);
 	}
 
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
@@ -521,10 +528,10 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
 
 	if (unlikely(access_pkey_error(is_write, is_exec,
 				       (error_code & DSISR_KEYFAULT), vma)))
-		return bad_access_pkey(regs, address, vma);
+		return bad_access_pkey(regs, address, mm, vma);
 
 	if (unlikely(access_error(is_write, is_exec, vma)))
-		return bad_access(regs, address);
+		return bad_access(regs, address, mm, vma);
 
 	/*
 	 * If for any reason at all we couldn't handle the fault,
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/7] riscv: mm: accelerate pagefault when badaccess
  2024-04-02  7:51 [PATCH 0/7] arch/mm/fault: accelerate pagefault when badaccess Kefeng Wang
                   ` (3 preceding siblings ...)
  2024-04-02  7:51 ` [PATCH 4/7] powerpc: mm: accelerate pagefault when badaccess Kefeng Wang
@ 2024-04-02  7:51 ` Kefeng Wang
  2024-04-03  5:37   ` Suren Baghdasaryan
  2024-04-02  7:51 ` [PATCH 6/7] s390: " Kefeng Wang
  2024-04-02  7:51 ` [PATCH 7/7] x86: " Kefeng Wang
  6 siblings, 1 reply; 20+ messages in thread
From: Kefeng Wang @ 2024-04-02  7:51 UTC (permalink / raw
  To: akpm
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Kefeng Wang, Paul Walmsley, Russell King, surenb,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Nicholas Piggin,
	Andy Lutomirski, Alexander Gordeev, Will Deacon, Gerald Schaefer,
	linux-arm-kernel

The vma->vm_flags have already been checked under the per-VMA lock; if
it is a bad access, directly handle the error and return. There is no
need to call lock_mm_and_find_vma() and check vm_flags again.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/riscv/mm/fault.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
index 3ba1d4dde5dd..b3fcf7d67efb 100644
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -292,7 +292,10 @@ void handle_page_fault(struct pt_regs *regs)
 
 	if (unlikely(access_error(cause, vma))) {
 		vma_end_read(vma);
-		goto lock_mmap;
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		tsk->thread.bad_cause = SEGV_ACCERR;
+		bad_area_nosemaphore(regs, code, addr);
+		return;
 	}
 
 	fault = handle_mm_fault(vma, addr, flags | FAULT_FLAG_VMA_LOCK, regs);
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/7] s390: mm: accelerate pagefault when badaccess
  2024-04-02  7:51 [PATCH 0/7] arch/mm/fault: accelerate pagefault when badaccess Kefeng Wang
                   ` (4 preceding siblings ...)
  2024-04-02  7:51 ` [PATCH 5/7] riscv: " Kefeng Wang
@ 2024-04-02  7:51 ` Kefeng Wang
  2024-04-02  7:51 ` [PATCH 7/7] x86: " Kefeng Wang
  6 siblings, 0 replies; 20+ messages in thread
From: Kefeng Wang @ 2024-04-02  7:51 UTC (permalink / raw
  To: akpm
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Kefeng Wang, Paul Walmsley, Russell King, surenb,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Nicholas Piggin,
	Andy Lutomirski, Alexander Gordeev, Will Deacon, Gerald Schaefer,
	linux-arm-kernel

The vma->vm_flags have already been checked under the per-VMA lock; if
it is a bad access, directly handle the error and return. There is no
need to call lock_mm_and_find_vma() and check vm_flags again.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/s390/mm/fault.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c
index c421dd44ffbe..162ca2576fd4 100644
--- a/arch/s390/mm/fault.c
+++ b/arch/s390/mm/fault.c
@@ -325,7 +325,8 @@ static void do_exception(struct pt_regs *regs, int access)
 		goto lock_mmap;
 	if (!(vma->vm_flags & access)) {
 		vma_end_read(vma);
-		goto lock_mmap;
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		return handle_fault_error_nolock(regs, SEGV_ACCERR);
 	}
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
 	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 7/7] x86: mm: accelerate pagefault when badaccess
  2024-04-02  7:51 [PATCH 0/7] arch/mm/fault: accelerate pagefault when badaccess Kefeng Wang
                   ` (5 preceding siblings ...)
  2024-04-02  7:51 ` [PATCH 6/7] s390: " Kefeng Wang
@ 2024-04-02  7:51 ` Kefeng Wang
  2024-04-03  5:59   ` Suren Baghdasaryan
  6 siblings, 1 reply; 20+ messages in thread
From: Kefeng Wang @ 2024-04-02  7:51 UTC (permalink / raw
  To: akpm
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Kefeng Wang, Paul Walmsley, Russell King, surenb,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Nicholas Piggin,
	Andy Lutomirski, Alexander Gordeev, Will Deacon, Gerald Schaefer,
	linux-arm-kernel

The vma->vm_flags have already been checked under the per-VMA lock; if
it is a bad access, directly handle the error and return. There is no
need to call lock_mm_and_find_vma() and check vm_flags again.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/x86/mm/fault.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a4cc20d0036d..67b18adc75dd 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -866,14 +866,17 @@ bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 
 static void
 __bad_area(struct pt_regs *regs, unsigned long error_code,
-	   unsigned long address, u32 pkey, int si_code)
+	   unsigned long address, struct mm_struct *mm,
+	   struct vm_area_struct *vma, u32 pkey, int si_code)
 {
-	struct mm_struct *mm = current->mm;
 	/*
 	 * Something tried to access memory that isn't in our memory map..
 	 * Fix it, but check if it's kernel or user first..
 	 */
-	mmap_read_unlock(mm);
+	if (mm)
+		mmap_read_unlock(mm);
+	else
+		vma_end_read(vma);
 
 	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
 }
@@ -897,7 +900,8 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
 
 static noinline void
 bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
-		      unsigned long address, struct vm_area_struct *vma)
+		      unsigned long address, struct mm_struct *mm,
+		      struct vm_area_struct *vma)
 {
 	/*
 	 * This OSPKE check is not strictly necessary at runtime.
@@ -927,9 +931,9 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
 		 */
 		u32 pkey = vma_pkey(vma);
 
-		__bad_area(regs, error_code, address, pkey, SEGV_PKUERR);
+		__bad_area(regs, error_code, address, mm, vma, pkey, SEGV_PKUERR);
 	} else {
-		__bad_area(regs, error_code, address, 0, SEGV_ACCERR);
+		__bad_area(regs, error_code, address, mm, vma, 0, SEGV_ACCERR);
 	}
 }
 
@@ -1357,8 +1361,9 @@ void do_user_addr_fault(struct pt_regs *regs,
 		goto lock_mmap;
 
 	if (unlikely(access_error(error_code, vma))) {
-		vma_end_read(vma);
-		goto lock_mmap;
+		bad_area_access_error(regs, error_code, address, NULL, vma);
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+		return;
 	}
 	fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
 	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
@@ -1394,7 +1399,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * we can handle it..
 	 */
 	if (unlikely(access_error(error_code, vma))) {
-		bad_area_access_error(regs, error_code, address, vma);
+		bad_area_access_error(regs, error_code, address, mm, vma);
 		return;
 	}
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/7] arm64: mm: cleanup __do_page_fault()
  2024-04-02  7:51 ` [PATCH 1/7] arm64: mm: cleanup __do_page_fault() Kefeng Wang
@ 2024-04-03  5:11   ` Suren Baghdasaryan
  2024-04-03 18:24   ` Catalin Marinas
  1 sibling, 0 replies; 20+ messages in thread
From: Suren Baghdasaryan @ 2024-04-03  5:11 UTC (permalink / raw
  To: Kefeng Wang
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Paul Walmsley, Russell King, Nicholas Piggin,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Andy Lutomirski, akpm,
	Gerald Schaefer, Will Deacon, Alexander Gordeev, linux-arm-kernel

On Tue, Apr 2, 2024 at 12:53 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
> __do_page_fault() only checks vma->vm_flags and calls handle_mm_fault(),
> and it is only called by do_page_fault(), so let's squash it into
> do_page_fault() to clean up the code.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  arch/arm64/mm/fault.c | 27 +++++++--------------------
>  1 file changed, 7 insertions(+), 20 deletions(-)
>
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 8251e2fea9c7..9bb9f395351a 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -486,25 +486,6 @@ static void do_bad_area(unsigned long far, unsigned long esr,
>         }
>  }
>
> -#define VM_FAULT_BADMAP                ((__force vm_fault_t)0x010000)
> -#define VM_FAULT_BADACCESS     ((__force vm_fault_t)0x020000)
> -
> -static vm_fault_t __do_page_fault(struct mm_struct *mm,
> -                                 struct vm_area_struct *vma, unsigned long addr,
> -                                 unsigned int mm_flags, unsigned long vm_flags,
> -                                 struct pt_regs *regs)
> -{
> -       /*
> -        * Ok, we have a good vm_area for this memory access, so we can handle
> -        * it.
> -        * Check that the permissions on the VMA allow for the fault which
> -        * occurred.
> -        */
> -       if (!(vma->vm_flags & vm_flags))
> -               return VM_FAULT_BADACCESS;
> -       return handle_mm_fault(vma, addr, mm_flags, regs);
> -}
> -
>  static bool is_el0_instruction_abort(unsigned long esr)
>  {
>         return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW;
> @@ -519,6 +500,9 @@ static bool is_write_abort(unsigned long esr)
>         return (esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM);
>  }
>
> +#define VM_FAULT_BADMAP                ((__force vm_fault_t)0x010000)
> +#define VM_FAULT_BADACCESS     ((__force vm_fault_t)0x020000)
> +
>  static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>                                    struct pt_regs *regs)
>  {
> @@ -617,7 +601,10 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>                 goto done;
>         }
>
> -       fault = __do_page_fault(mm, vma, addr, mm_flags, vm_flags, regs);
> +       if (!(vma->vm_flags & vm_flags))
> +               fault = VM_FAULT_BADACCESS;
> +       else
> +               fault = handle_mm_fault(vma, addr, mm_flags, regs);
>
>         /* Quick path to respond to signals */
>         if (fault_signal_pending(fault, regs)) {
> --
> 2.27.0
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/7] arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS
  2024-04-02  7:51 ` [PATCH 2/7] arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS Kefeng Wang
@ 2024-04-03  5:19   ` Suren Baghdasaryan
  2024-04-03  5:30     ` Suren Baghdasaryan
  2024-04-03 18:32   ` Catalin Marinas
  1 sibling, 1 reply; 20+ messages in thread
From: Suren Baghdasaryan @ 2024-04-03  5:19 UTC (permalink / raw
  To: Kefeng Wang
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Paul Walmsley, Russell King, Nicholas Piggin,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Andy Lutomirski, akpm,
	Gerald Schaefer, Will Deacon, Alexander Gordeev, linux-arm-kernel

On Tue, Apr 2, 2024 at 12:53 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
> The vma->vm_flags have already been checked under the per-VMA lock; if
> it is a bad access, directly set fault to VM_FAULT_BADACCESS and handle
> the error. There is no need to call lock_mm_and_find_vma() and check
> vm_flags again, and the latency of lmbench 'lat_sig -P 1 prot lat_sig'
> drops by 34%.

The change makes sense to me. Per-VMA lock is enough to keep
vma->vm_flags stable, so no need to retry with mmap_lock.

>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  arch/arm64/mm/fault.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 9bb9f395351a..405f9aa831bd 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -572,7 +572,9 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>
>         if (!(vma->vm_flags & vm_flags)) {
>                 vma_end_read(vma);
> -               goto lock_mmap;
> +               fault = VM_FAULT_BADACCESS;
> +               count_vm_vma_lock_event(VMA_LOCK_SUCCESS);

nit: VMA_LOCK_SUCCESS accounting here seems correct to me but
unrelated to the main change. Either splitting into a separate patch
or mentioning this additional fixup in the changelog would be helpful.

> +               goto done;
>         }
>         fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, regs);
>         if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
> --
> 2.27.0
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/7] arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS
  2024-04-03  5:19   ` Suren Baghdasaryan
@ 2024-04-03  5:30     ` Suren Baghdasaryan
  2024-04-03  6:13       ` Kefeng Wang
  0 siblings, 1 reply; 20+ messages in thread
From: Suren Baghdasaryan @ 2024-04-03  5:30 UTC (permalink / raw
  To: Kefeng Wang
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Paul Walmsley, Russell King, Nicholas Piggin,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Andy Lutomirski, akpm,
	Gerald Schaefer, Will Deacon, Alexander Gordeev, linux-arm-kernel

On Tue, Apr 2, 2024 at 10:19 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Tue, Apr 2, 2024 at 12:53 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >
> > The vma->vm_flags have already been checked under the per-VMA lock; if
> > it is a bad access, directly set fault to VM_FAULT_BADACCESS and handle
> > the error. There is no need to call lock_mm_and_find_vma() and check
> > vm_flags again, and the latency of lmbench 'lat_sig -P 1 prot lat_sig'
> > drops by 34%.
>
> The change makes sense to me. Per-VMA lock is enough to keep
> vma->vm_flags stable, so no need to retry with mmap_lock.
>
> >
> > Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
>
> > ---
> >  arch/arm64/mm/fault.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > index 9bb9f395351a..405f9aa831bd 100644
> > --- a/arch/arm64/mm/fault.c
> > +++ b/arch/arm64/mm/fault.c
> > @@ -572,7 +572,9 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
> >
> >         if (!(vma->vm_flags & vm_flags)) {
> >                 vma_end_read(vma);
> > -               goto lock_mmap;
> > +               fault = VM_FAULT_BADACCESS;
> > +               count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
>
> nit: VMA_LOCK_SUCCESS accounting here seems correct to me but
> unrelated to the main change. Either splitting into a separate patch
> or mentioning this additional fixup in the changelog would be helpful.

The above nit applies to all the patches after this one, so I won't
comment on each one separately. If you decide to split or adjust the
changelog please do that for each patch.

>
> > +               goto done;
> >         }
> >         fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, regs);
> >         if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
> > --
> > 2.27.0
> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/7] arm: mm: accelerate pagefault when VM_FAULT_BADACCESS
  2024-04-02  7:51 ` [PATCH 3/7] arm: " Kefeng Wang
@ 2024-04-03  5:30   ` Suren Baghdasaryan
  0 siblings, 0 replies; 20+ messages in thread
From: Suren Baghdasaryan @ 2024-04-03  5:30 UTC (permalink / raw
  To: Kefeng Wang
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Paul Walmsley, Russell King, Nicholas Piggin,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Andy Lutomirski, akpm,
	Gerald Schaefer, Will Deacon, Alexander Gordeev, linux-arm-kernel

On Tue, Apr 2, 2024 at 12:53 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
> The vma->vm_flags have already been checked under the per-VMA lock; if
> it is a bad access, directly set fault to VM_FAULT_BADACCESS and handle
> the error, so there is no need to call lock_mm_and_find_vma() and check
> vm_flags again.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>



> ---
>  arch/arm/mm/fault.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> index 439dc6a26bb9..5c4b417e24f9 100644
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -294,7 +294,9 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
>
>         if (!(vma->vm_flags & vm_flags)) {
>                 vma_end_read(vma);
> -               goto lock_mmap;
> +               count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
> +               fault = VM_FAULT_BADACCESS;
> +               goto bad_area;
>         }
>         fault = handle_mm_fault(vma, addr, flags | FAULT_FLAG_VMA_LOCK, regs);
>         if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
> --
> 2.27.0
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 4/7] powerpc: mm: accelerate pagefault when badaccess
  2024-04-02  7:51 ` [PATCH 4/7] powerpc: mm: accelerate pagefault when badaccess Kefeng Wang
@ 2024-04-03  5:34   ` Suren Baghdasaryan
  0 siblings, 0 replies; 20+ messages in thread
From: Suren Baghdasaryan @ 2024-04-03  5:34 UTC (permalink / raw
  To: Kefeng Wang
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Paul Walmsley, Russell King, Nicholas Piggin,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Andy Lutomirski, akpm,
	Gerald Schaefer, Will Deacon, Alexander Gordeev, linux-arm-kernel

On Tue, Apr 2, 2024 at 12:53 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
> The vma->vm_flags have already been checked under the per-VMA lock; if
> it is a bad access, directly handle the error and return. There is no
> need to call lock_mm_and_find_vma() and check vm_flags again.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Code-wise this looks correct to me and is almost identical to the x86
change, but someone with more experience with this architecture should
review it.

> ---
>  arch/powerpc/mm/fault.c | 33 ++++++++++++++++++++-------------
>  1 file changed, 20 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 53335ae21a40..215690452495 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -71,23 +71,26 @@ static noinline int bad_area_nosemaphore(struct pt_regs *regs, unsigned long add
>         return __bad_area_nosemaphore(regs, address, SEGV_MAPERR);
>  }
>
> -static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code)
> +static int __bad_area(struct pt_regs *regs, unsigned long address, int si_code,
> +                     struct mm_struct *mm, struct vm_area_struct *vma)
>  {
> -       struct mm_struct *mm = current->mm;
>
>         /*
>          * Something tried to access memory that isn't in our memory map..
>          * Fix it, but check if it's kernel or user first..
>          */
> -       mmap_read_unlock(mm);
> +       if (mm)
> +               mmap_read_unlock(mm);
> +       else
> +               vma_end_read(vma);
>
>         return __bad_area_nosemaphore(regs, address, si_code);
>  }
>
>  static noinline int bad_access_pkey(struct pt_regs *regs, unsigned long address,
> +                                   struct mm_struct *mm,
>                                     struct vm_area_struct *vma)
>  {
> -       struct mm_struct *mm = current->mm;
>         int pkey;
>
>         /*
> @@ -109,7 +112,10 @@ static noinline int bad_access_pkey(struct pt_regs *regs, unsigned long address,
>          */
>         pkey = vma_pkey(vma);
>
> -       mmap_read_unlock(mm);
> +       if (mm)
> +               mmap_read_unlock(mm);
> +       else
> +               vma_end_read(vma);
>
>         /*
>          * If we are in kernel mode, bail out with a SEGV, this will
> @@ -124,9 +130,10 @@ static noinline int bad_access_pkey(struct pt_regs *regs, unsigned long address,
>         return 0;
>  }
>
> -static noinline int bad_access(struct pt_regs *regs, unsigned long address)
> +static noinline int bad_access(struct pt_regs *regs, unsigned long address,
> +                              struct mm_struct *mm, struct vm_area_struct *vma)
>  {
> -       return __bad_area(regs, address, SEGV_ACCERR);
> +       return __bad_area(regs, address, SEGV_ACCERR, mm, vma);
>  }
>
>  static int do_sigbus(struct pt_regs *regs, unsigned long address,
> @@ -479,13 +486,13 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
>
>         if (unlikely(access_pkey_error(is_write, is_exec,
>                                        (error_code & DSISR_KEYFAULT), vma))) {
> -               vma_end_read(vma);
> -               goto lock_mmap;
> +               count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
> +               return bad_access_pkey(regs, address, NULL, vma);
>         }
>
>         if (unlikely(access_error(is_write, is_exec, vma))) {
> -               vma_end_read(vma);
> -               goto lock_mmap;
> +               count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
> +               return bad_access(regs, address, NULL, vma);
>         }
>
>         fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
> @@ -521,10 +528,10 @@ static int ___do_page_fault(struct pt_regs *regs, unsigned long address,
>
>         if (unlikely(access_pkey_error(is_write, is_exec,
>                                        (error_code & DSISR_KEYFAULT), vma)))
> -               return bad_access_pkey(regs, address, vma);
> +               return bad_access_pkey(regs, address, mm, vma);
>
>         if (unlikely(access_error(is_write, is_exec, vma)))
> -               return bad_access(regs, address);
> +               return bad_access(regs, address, mm, vma);
>
>         /*
>          * If for any reason at all we couldn't handle the fault,
> --
> 2.27.0
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 5/7] riscv: mm: accelerate pagefault when badaccess
  2024-04-02  7:51 ` [PATCH 5/7] riscv: " Kefeng Wang
@ 2024-04-03  5:37   ` Suren Baghdasaryan
  0 siblings, 0 replies; 20+ messages in thread
From: Suren Baghdasaryan @ 2024-04-03  5:37 UTC (permalink / raw
  To: Kefeng Wang
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Paul Walmsley, Russell King, Nicholas Piggin,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Andy Lutomirski, akpm,
	Gerald Schaefer, Will Deacon, Alexander Gordeev, linux-arm-kernel

On Tue, Apr 2, 2024 at 12:53 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
> The vma->vm_flags have already been checked under the per-VMA lock; if
> it is a bad access, directly handle the error and return. There is no
> need to call lock_mm_and_find_vma() and check vm_flags again.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  arch/riscv/mm/fault.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
> index 3ba1d4dde5dd..b3fcf7d67efb 100644
> --- a/arch/riscv/mm/fault.c
> +++ b/arch/riscv/mm/fault.c
> @@ -292,7 +292,10 @@ void handle_page_fault(struct pt_regs *regs)
>
>         if (unlikely(access_error(cause, vma))) {
>                 vma_end_read(vma);
> -               goto lock_mmap;
> +               count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
> +               tsk->thread.bad_cause = SEGV_ACCERR;
> +               bad_area_nosemaphore(regs, code, addr);
> +               return;
>         }
>
>         fault = handle_mm_fault(vma, addr, flags | FAULT_FLAG_VMA_LOCK, regs);
> --
> 2.27.0
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/7] x86: mm: accelerate pagefault when badaccess
  2024-04-02  7:51 ` [PATCH 7/7] x86: " Kefeng Wang
@ 2024-04-03  5:59   ` Suren Baghdasaryan
  2024-04-03  7:58     ` Kefeng Wang
  0 siblings, 1 reply; 20+ messages in thread
From: Suren Baghdasaryan @ 2024-04-03  5:59 UTC (permalink / raw
  To: Kefeng Wang
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Paul Walmsley, Russell King, Nicholas Piggin,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Andy Lutomirski, akpm,
	Gerald Schaefer, Will Deacon, Alexander Gordeev, linux-arm-kernel

On Tue, Apr 2, 2024 at 12:53 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
> The vma->vm_flags have already been checked under the per-VMA lock; if
> it is a bad access, directly handle the error and return. There is no
> need to call lock_mm_and_find_vma() and check vm_flags again.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Looks safe to me.
Using (mm != NULL) to indicate that we are holding mmap_lock is not
ideal but I guess that works.
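
For context, the convention in this patch boils down to the two call
sites below (a condensed view of the hunks quoted further down, not the
full change):

        /* per-VMA lock path: mmap_lock is not held, so pass mm == NULL */
        bad_area_access_error(regs, error_code, address, NULL, vma);

        /* mmap_lock path: pass mm so the helper releases mmap_lock */
        bad_area_access_error(regs, error_code, address, mm, vma);

        /* ...and __bad_area() then drops whichever lock is held: */
        if (mm)
                mmap_read_unlock(mm);
        else
                vma_end_read(vma);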

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  arch/x86/mm/fault.c | 23 ++++++++++++++---------
>  1 file changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index a4cc20d0036d..67b18adc75dd 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -866,14 +866,17 @@ bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
>
>  static void
>  __bad_area(struct pt_regs *regs, unsigned long error_code,
> -          unsigned long address, u32 pkey, int si_code)
> +          unsigned long address, struct mm_struct *mm,
> +          struct vm_area_struct *vma, u32 pkey, int si_code)
>  {
> -       struct mm_struct *mm = current->mm;
>         /*
>          * Something tried to access memory that isn't in our memory map..
>          * Fix it, but check if it's kernel or user first..
>          */
> -       mmap_read_unlock(mm);
> +       if (mm)
> +               mmap_read_unlock(mm);
> +       else
> +               vma_end_read(vma);
>
>         __bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
>  }
> @@ -897,7 +900,8 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
>
>  static noinline void
>  bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
> -                     unsigned long address, struct vm_area_struct *vma)
> +                     unsigned long address, struct mm_struct *mm,
> +                     struct vm_area_struct *vma)
>  {
>         /*
>          * This OSPKE check is not strictly necessary at runtime.
> @@ -927,9 +931,9 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
>                  */
>                 u32 pkey = vma_pkey(vma);
>
> -               __bad_area(regs, error_code, address, pkey, SEGV_PKUERR);
> +               __bad_area(regs, error_code, address, mm, vma, pkey, SEGV_PKUERR);
>         } else {
> -               __bad_area(regs, error_code, address, 0, SEGV_ACCERR);
> +               __bad_area(regs, error_code, address, mm, vma, 0, SEGV_ACCERR);
>         }
>  }
>
> @@ -1357,8 +1361,9 @@ void do_user_addr_fault(struct pt_regs *regs,
>                 goto lock_mmap;
>
>         if (unlikely(access_error(error_code, vma))) {
> -               vma_end_read(vma);
> -               goto lock_mmap;
> +               bad_area_access_error(regs, error_code, address, NULL, vma);
> +               count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
> +               return;
>         }
>         fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
>         if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
> @@ -1394,7 +1399,7 @@ void do_user_addr_fault(struct pt_regs *regs,
>          * we can handle it..
>          */
>         if (unlikely(access_error(error_code, vma))) {
> -               bad_area_access_error(regs, error_code, address, vma);
> +               bad_area_access_error(regs, error_code, address, mm, vma);
>                 return;
>         }
>
> --
> 2.27.0
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/7] arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS
  2024-04-03  5:30     ` Suren Baghdasaryan
@ 2024-04-03  6:13       ` Kefeng Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Kefeng Wang @ 2024-04-03  6:13 UTC (permalink / raw
  To: Suren Baghdasaryan
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Paul Walmsley, Russell King, Nicholas Piggin,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Andy Lutomirski, akpm,
	Gerald Schaefer, Will Deacon, Alexander Gordeev, linux-arm-kernel



On 2024/4/3 13:30, Suren Baghdasaryan wrote:
> On Tue, Apr 2, 2024 at 10:19 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>
>> On Tue, Apr 2, 2024 at 12:53 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>
>>> The vma->vm_flags have already been checked under the per-VMA lock; if
>>> it is a bad access, directly set fault to VM_FAULT_BADACCESS and handle
>>> the error. There is no need to call lock_mm_and_find_vma() and check
>>> vm_flags again, and the latency of lmbench 'lat_sig -P 1 prot lat_sig'
>>> drops by 34%.
>>
>> The change makes sense to me. Per-VMA lock is enough to keep
>> vma->vm_flags stable, so no need to retry with mmap_lock.
>>
>>>
>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>
>> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
>>
>>> ---
>>>   arch/arm64/mm/fault.c | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>>> index 9bb9f395351a..405f9aa831bd 100644
>>> --- a/arch/arm64/mm/fault.c
>>> +++ b/arch/arm64/mm/fault.c
>>> @@ -572,7 +572,9 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>>>
>>>          if (!(vma->vm_flags & vm_flags)) {
>>>                  vma_end_read(vma);
>>> -               goto lock_mmap;
>>> +               fault = VM_FAULT_BADACCESS;
>>> +               count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
>>
>> nit: VMA_LOCK_SUCCESS accounting here seems correct to me but
>> unrelated to the main change. Either splitting into a separate patch
>> or mentioning this additional fixup in the changelog would be helpful.
> 
> The above nit applies to all the patches after this one, so I won't
> comment on each one separately. If you decide to split or adjust the
> changelog please do that for each patch.

I will update the changelog for each patch. Thanks for your review and
suggestions.

> 
>>
>>> +               goto done;
>>>          }
>>>          fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, regs);
>>>          if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
>>> --
>>> 2.27.0
>>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/7] x86: mm: accelerate pagefault when badaccess
  2024-04-03  5:59   ` Suren Baghdasaryan
@ 2024-04-03  7:58     ` Kefeng Wang
  2024-04-03 14:23       ` Suren Baghdasaryan
  0 siblings, 1 reply; 20+ messages in thread
From: Kefeng Wang @ 2024-04-03  7:58 UTC (permalink / raw
  To: Suren Baghdasaryan
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Paul Walmsley, Russell King, Nicholas Piggin,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Andy Lutomirski, akpm,
	Gerald Schaefer, Will Deacon, Alexander Gordeev, linux-arm-kernel



On 2024/4/3 13:59, Suren Baghdasaryan wrote:
> On Tue, Apr 2, 2024 at 12:53 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>
>> The vma->vm_flags have already been checked under the per-VMA lock; if
>> it is a bad access, directly handle the error and return. There is no
>> need to call lock_mm_and_find_vma() and check vm_flags again.
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> 
> Looks safe to me.
> Using (mm != NULL) to indicate that we are holding mmap_lock is not
> ideal but I guess that works.
> 

Yes, I will add this part into the changelog too:

The access_error() of the vma is already checked under the per-VMA
lock; if it is a bad access, directly handle the error, with no need
to retry with the mmap_lock again. In order to release the correct
lock, pass the mm_struct into bad_area_access_error(); if mm is NULL,
release the vma lock, otherwise release the mmap_lock. Since the page
fault is handled under the per-VMA lock, count it as a vma lock event
with VMA_LOCK_SUCCESS.

Thanks.


> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> 
>> ---
>>   arch/x86/mm/fault.c | 23 ++++++++++++++---------
>>   1 file changed, 14 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index a4cc20d0036d..67b18adc75dd 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -866,14 +866,17 @@ bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
>>
>>   static void
>>   __bad_area(struct pt_regs *regs, unsigned long error_code,
>> -          unsigned long address, u32 pkey, int si_code)
>> +          unsigned long address, struct mm_struct *mm,
>> +          struct vm_area_struct *vma, u32 pkey, int si_code)
>>   {
>> -       struct mm_struct *mm = current->mm;
>>          /*
>>           * Something tried to access memory that isn't in our memory map..
>>           * Fix it, but check if it's kernel or user first..
>>           */
>> -       mmap_read_unlock(mm);
>> +       if (mm)
>> +               mmap_read_unlock(mm);
>> +       else
>> +               vma_end_read(vma);
>>
>>          __bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
>>   }
>> @@ -897,7 +900,8 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
>>
>>   static noinline void
>>   bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
>> -                     unsigned long address, struct vm_area_struct *vma)
>> +                     unsigned long address, struct mm_struct *mm,
>> +                     struct vm_area_struct *vma)
>>   {
>>          /*
>>           * This OSPKE check is not strictly necessary at runtime.
>> @@ -927,9 +931,9 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
>>                   */
>>                  u32 pkey = vma_pkey(vma);
>>
>> -               __bad_area(regs, error_code, address, pkey, SEGV_PKUERR);
>> +               __bad_area(regs, error_code, address, mm, vma, pkey, SEGV_PKUERR);
>>          } else {
>> -               __bad_area(regs, error_code, address, 0, SEGV_ACCERR);
>> +               __bad_area(regs, error_code, address, mm, vma, 0, SEGV_ACCERR);
>>          }
>>   }
>>
>> @@ -1357,8 +1361,9 @@ void do_user_addr_fault(struct pt_regs *regs,
>>                  goto lock_mmap;
>>
>>          if (unlikely(access_error(error_code, vma))) {
>> -               vma_end_read(vma);
>> -               goto lock_mmap;
>> +               bad_area_access_error(regs, error_code, address, NULL, vma);
>> +               count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
>> +               return;
>>          }
>>          fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
>>          if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
>> @@ -1394,7 +1399,7 @@ void do_user_addr_fault(struct pt_regs *regs,
>>           * we can handle it..
>>           */
>>          if (unlikely(access_error(error_code, vma))) {
>> -               bad_area_access_error(regs, error_code, address, vma);
>> +               bad_area_access_error(regs, error_code, address, mm, vma);
>>                  return;
>>          }
>>
>> --
>> 2.27.0
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 7/7] x86: mm: accelerate pagefault when badaccess
  2024-04-03  7:58     ` Kefeng Wang
@ 2024-04-03 14:23       ` Suren Baghdasaryan
  0 siblings, 0 replies; 20+ messages in thread
From: Suren Baghdasaryan @ 2024-04-03 14:23 UTC (permalink / raw
  To: Kefeng Wang
  Cc: x86, linux-s390, Albert Ou, linuxppc-dev, Peter Zijlstra,
	Catalin Marinas, Paul Walmsley, Russell King, Nicholas Piggin,
	Dave Hansen, linux-riscv, Palmer Dabbelt, Andy Lutomirski, akpm,
	Gerald Schaefer, Will Deacon, Alexander Gordeev, linux-arm-kernel

On Wed, Apr 3, 2024 at 12:58 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
>
> On 2024/4/3 13:59, Suren Baghdasaryan wrote:
> > On Tue, Apr 2, 2024 at 12:53 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
> >>
> >> The vma->vm_flags have already been checked under the per-VMA lock; if
> >> it is a bad access, directly handle the error and return. There is no
> >> need to call lock_mm_and_find_vma() and check vm_flags again.
> >>
> >> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> >
> > Looks safe to me.
> > Using (mm != NULL) to indicate that we are holding mmap_lock is not
> > ideal but I guess that works.
> >
>
> Yes, I will add this part into the changelog too:
>
> The access_error() of the vma is already checked under the per-VMA
> lock; if it is a bad access, directly handle the error, with no need
> to retry with the mmap_lock again. In order to release the correct
> lock, pass the mm_struct into bad_area_access_error(); if mm is NULL,
> release the vma lock, otherwise release the mmap_lock. Since the page
> fault is handled under the per-VMA lock, count it as a vma lock event
> with VMA_LOCK_SUCCESS.

The part about passing mm_struct is unnecessary IMHO. It explains "how
you do things", but a changelog should describe only "what you do" and
"why you do that". The rest we can see from the code.

>
> Thanks.
>
>
> > Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> >
> >> ---
> >>   arch/x86/mm/fault.c | 23 ++++++++++++++---------
> >>   1 file changed, 14 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> >> index a4cc20d0036d..67b18adc75dd 100644
> >> --- a/arch/x86/mm/fault.c
> >> +++ b/arch/x86/mm/fault.c
> >> @@ -866,14 +866,17 @@ bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
> >>
> >>   static void
> >>   __bad_area(struct pt_regs *regs, unsigned long error_code,
> >> -          unsigned long address, u32 pkey, int si_code)
> >> +          unsigned long address, struct mm_struct *mm,
> >> +          struct vm_area_struct *vma, u32 pkey, int si_code)
> >>   {
> >> -       struct mm_struct *mm = current->mm;
> >>          /*
> >>           * Something tried to access memory that isn't in our memory map..
> >>           * Fix it, but check if it's kernel or user first..
> >>           */
> >> -       mmap_read_unlock(mm);
> >> +       if (mm)
> >> +               mmap_read_unlock(mm);
> >> +       else
> >> +               vma_end_read(vma);
> >>
> >>          __bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
> >>   }
> >> @@ -897,7 +900,8 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
> >>
> >>   static noinline void
> >>   bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
> >> -                     unsigned long address, struct vm_area_struct *vma)
> >> +                     unsigned long address, struct mm_struct *mm,
> >> +                     struct vm_area_struct *vma)
> >>   {
> >>          /*
> >>           * This OSPKE check is not strictly necessary at runtime.
> >> @@ -927,9 +931,9 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
> >>                   */
> >>                  u32 pkey = vma_pkey(vma);
> >>
> >> -               __bad_area(regs, error_code, address, pkey, SEGV_PKUERR);
> >> +               __bad_area(regs, error_code, address, mm, vma, pkey, SEGV_PKUERR);
> >>          } else {
> >> -               __bad_area(regs, error_code, address, 0, SEGV_ACCERR);
> >> +               __bad_area(regs, error_code, address, mm, vma, 0, SEGV_ACCERR);
> >>          }
> >>   }
> >>
> >> @@ -1357,8 +1361,9 @@ void do_user_addr_fault(struct pt_regs *regs,
> >>                  goto lock_mmap;
> >>
> >>          if (unlikely(access_error(error_code, vma))) {
> >> -               vma_end_read(vma);
> >> -               goto lock_mmap;
> >> +               bad_area_access_error(regs, error_code, address, NULL, vma);
> >> +               count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
> >> +               return;
> >>          }
> >>          fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
> >>          if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
> >> @@ -1394,7 +1399,7 @@ void do_user_addr_fault(struct pt_regs *regs,
> >>           * we can handle it..
> >>           */
> >>          if (unlikely(access_error(error_code, vma))) {
> >> -               bad_area_access_error(regs, error_code, address, vma);
> >> +               bad_area_access_error(regs, error_code, address, mm, vma);
> >>                  return;
> >>          }
> >>
> >> --
> >> 2.27.0
> >>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/7] arm64: mm: cleanup __do_page_fault()
  2024-04-02  7:51 ` [PATCH 1/7] arm64: mm: cleanup __do_page_fault() Kefeng Wang
  2024-04-03  5:11   ` Suren Baghdasaryan
@ 2024-04-03 18:24   ` Catalin Marinas
  1 sibling, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2024-04-03 18:24 UTC (permalink / raw
  To: Kefeng Wang
  Cc: x86, Albert Ou, linux-s390, Peter Zijlstra, Dave Hansen,
	Russell King, surenb, linuxppc-dev, linux-riscv, Palmer Dabbelt,
	Nicholas Piggin, Andy Lutomirski, Paul Walmsley, akpm,
	Gerald Schaefer, Will Deacon, Alexander Gordeev, linux-arm-kernel

On Tue, Apr 02, 2024 at 03:51:36PM +0800, Kefeng Wang wrote:
> __do_page_fault() only checks vma->vm_flags and calls handle_mm_fault(),
> and it is only called by do_page_fault(), so let's squash it into
> do_page_fault() to clean up the code.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/7] arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS
  2024-04-02  7:51 ` [PATCH 2/7] arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS Kefeng Wang
  2024-04-03  5:19   ` Suren Baghdasaryan
@ 2024-04-03 18:32   ` Catalin Marinas
  1 sibling, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2024-04-03 18:32 UTC (permalink / raw
  To: Kefeng Wang
  Cc: x86, Albert Ou, linux-s390, Peter Zijlstra, Dave Hansen,
	Russell King, surenb, linuxppc-dev, linux-riscv, Palmer Dabbelt,
	Nicholas Piggin, Andy Lutomirski, Paul Walmsley, akpm,
	Gerald Schaefer, Will Deacon, Alexander Gordeev, linux-arm-kernel

On Tue, Apr 02, 2024 at 03:51:37PM +0800, Kefeng Wang wrote:
> The vma->vm_flags have already been checked under the per-VMA lock; if
> it is a bad access, directly set fault to VM_FAULT_BADACCESS and handle
> the error. There is no need to call lock_mm_and_find_vma() and check
> vm_flags again, and the latency of lmbench 'lat_sig -P 1 prot lat_sig'
> drops by 34%.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  arch/arm64/mm/fault.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 9bb9f395351a..405f9aa831bd 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -572,7 +572,9 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
>  
>  	if (!(vma->vm_flags & vm_flags)) {
>  		vma_end_read(vma);
> -		goto lock_mmap;
> +		fault = VM_FAULT_BADACCESS;
> +		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
> +		goto done;
>  	}
>  	fault = handle_mm_fault(vma, addr, mm_flags | FAULT_FLAG_VMA_LOCK, regs);
>  	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))

I think this makes sense. A concurrent modification of vma->vm_flags
(e.g. mprotect()) would do a vma_start_write(), so there is no need to
recheck with the mmap_lock held.
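
Roughly, the ordering that makes this safe (a simplified sketch of the
per-VMA locking as I read it, not the exact implementation):

        /*
         *   fault path                        concurrent mprotect()
         *
         *   vma = lock_vma_under_rcu(mm, addr);
         *     // per-VMA read lock held
         *   check vma->vm_flags               vma_start_write(vma);
         *     // bad access is definitive       // waits for readers before
         *   vma_end_read(vma);                  // vm_flags is updated
         *
         * vma_start_write() cannot complete while the fault path holds
         * the per-VMA read lock, so the vm_flags value checked under
         * that lock cannot change concurrently.
         */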

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-04-03 18:32 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-02  7:51 [PATCH 0/7] arch/mm/fault: accelerate pagefault when badaccess Kefeng Wang
2024-04-02  7:51 ` [PATCH 1/7] arm64: mm: cleanup __do_page_fault() Kefeng Wang
2024-04-03  5:11   ` Suren Baghdasaryan
2024-04-03 18:24   ` Catalin Marinas
2024-04-02  7:51 ` [PATCH 2/7] arm64: mm: accelerate pagefault when VM_FAULT_BADACCESS Kefeng Wang
2024-04-03  5:19   ` Suren Baghdasaryan
2024-04-03  5:30     ` Suren Baghdasaryan
2024-04-03  6:13       ` Kefeng Wang
2024-04-03 18:32   ` Catalin Marinas
2024-04-02  7:51 ` [PATCH 3/7] arm: " Kefeng Wang
2024-04-03  5:30   ` Suren Baghdasaryan
2024-04-02  7:51 ` [PATCH 4/7] powerpc: mm: accelerate pagefault when badaccess Kefeng Wang
2024-04-03  5:34   ` Suren Baghdasaryan
2024-04-02  7:51 ` [PATCH 5/7] riscv: " Kefeng Wang
2024-04-03  5:37   ` Suren Baghdasaryan
2024-04-02  7:51 ` [PATCH 6/7] s390: " Kefeng Wang
2024-04-02  7:51 ` [PATCH 7/7] x86: " Kefeng Wang
2024-04-03  5:59   ` Suren Baghdasaryan
2024-04-03  7:58     ` Kefeng Wang
2024-04-03 14:23       ` Suren Baghdasaryan
