Linux-ARM-Kernel Archive mirror
 help / color / mirror / Atom feed
From: Yang Shi <yang@os.amperecomputing.com>
To: catalin.marinas@arm.com, will@kernel.org,
	scott@os.amperecomputing.com, cl@gentwo.org
Cc: yang@os.amperecomputing.com,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH] arm64: mm: force write fault for atomic RMW instructions
Date: Tue,  7 May 2024 15:35:58 -0700	[thread overview]
Message-ID: <20240507223558.3039562-1-yang@os.amperecomputing.com> (raw)

The atomic RMW instructions, for example, ldadd, actually does load +
add + store in one instruction, it may trigger two page faults, the
first fault is a read fault, the second fault is a write fault.

Some applications use atomic RMW instructions to populate memory, for
example, openjdk uses atomic-add-0 to do pretouch (populate heap memory
at launch time) between v18 and v22.

But the double page fault has some problems:

1. Noticeable TLB overhead.  The kernel actually installs zero page with
   readonly PTE for the read fault.  The write fault will trigger a
   write-protection fault (CoW).  The CoW will allocate a new page and
   make the PTE point to the new page, this needs TLB invalidations.  The
   tlb invalidation and the mandatory memory barriers may incur
   significant overhead, particularly on the machines with many cores.

2. Break up huge pages.  If THP is on the read fault will install huge
   zero pages.  The later CoW will break up the huge page and allocate
   base pages instead of huge page.  The applications have to rely on
   khugepaged (kernel thread) to collapse huge pages asynchronously.
   This also incurs noticeable performance penalty.

3. 512x page faults with huge page.  Due to #2, the applications have to
   have page faults for every 4K area for the write, this makes the speed
   up by using huge page actually gone.

So it sounds pointless to have two page faults since we know the memory
will be definitely written very soon.  Forcing write fault for atomic RMW
instruction makes some sense and it can solve the aforementioned problems:

Firstly, it just allocates zero'ed page, no tlb invalidation and memory
barriers anymore.
Secondly, it can populate writable huge pages in the first place and
don't break them up.  Just one page fault is needed for 2M area instrad
of 512 faults and also save cpu time by not using khugepaged.

A simple micro benchmark which populates 1G memory shows the number of
page faults is reduced by half and the time spent by system is reduced
by 60% on a VM running on Ampere Altra platform.

And the benchmark for anonymous read fault on 1G memory, file read fault
on 1G file (cold page cache and warm page cache) don't show noticeable
regression.

Some other architectures also have code inspection in page fault path,
for example, SPARC and x86.

Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
---
 arch/arm64/include/asm/insn.h |  1 +
 arch/arm64/mm/fault.c         | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index db1aeacd4cd9..5d5a3fbeecc0 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -319,6 +319,7 @@ static __always_inline u32 aarch64_insn_get_##abbr##_value(void)	\
  * "-" means "don't care"
  */
 __AARCH64_INSN_FUNCS(class_branch_sys,	0x1c000000, 0x14000000)
+__AARCH64_INSN_FUNCS(class_atomic,	0x3b200c00, 0x38200000)
 
 __AARCH64_INSN_FUNCS(adr,	0x9F000000, 0x10000000)
 __AARCH64_INSN_FUNCS(adrp,	0x9F000000, 0x90000000)
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 8251e2fea9c7..f7bceedf5ef3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -529,6 +529,7 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 	unsigned int mm_flags = FAULT_FLAG_DEFAULT;
 	unsigned long addr = untagged_addr(far);
 	struct vm_area_struct *vma;
+	unsigned int insn;
 
 	if (kprobe_page_fault(regs, esr))
 		return 0;
@@ -586,6 +587,24 @@ static int __kprobes do_page_fault(unsigned long far, unsigned long esr,
 	if (!vma)
 		goto lock_mmap;
 
+	if (mm_flags & (FAULT_FLAG_WRITE | FAULT_FLAG_INSTRUCTION))
+		goto continue_fault;
+
+	pagefault_disable();
+
+	if (get_user(insn, (unsigned int __user *) instruction_pointer(regs))) {
+		pagefault_enable();
+		goto continue_fault;
+	}
+
+	if (aarch64_insn_is_class_atomic(insn)) {
+		vm_flags = VM_WRITE;
+		mm_flags |= FAULT_FLAG_WRITE;
+	}
+
+	pagefault_enable();
+
+continue_fault:
 	if (!(vma->vm_flags & vm_flags)) {
 		vma_end_read(vma);
 		goto lock_mmap;
-- 
2.41.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

             reply	other threads:[~2024-05-07 22:36 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-07 22:35 Yang Shi [this message]
2024-05-07 22:42 ` [PATCH] arm64: mm: force write fault for atomic RMW instructions Christoph Lameter (Ampere)
2024-05-08  6:45 ` Anshuman Khandual
2024-05-08 17:15   ` Christoph Lameter (Ampere)
2024-05-09  4:23     ` Anshuman Khandual
2024-05-13 22:39       ` Christoph Lameter (Ampere)
2024-05-08 18:37   ` Yang Shi
2024-05-09  4:31     ` Anshuman Khandual
2024-05-09 21:46       ` Yang Shi
2024-05-10  4:28         ` Anshuman Khandual
2024-05-10 16:37           ` Yang Shi
2024-05-10 12:11 ` Catalin Marinas
2024-05-10 17:13   ` Yang Shi
2024-05-13 22:41     ` Christoph Lameter (Ampere)
2024-05-14 10:39     ` Catalin Marinas
2024-05-14 15:57       ` David Hildenbrand
2024-05-17 16:30       ` Yang Shi
2024-05-17 17:25         ` Catalin Marinas
2024-05-17 17:35           ` Yang Shi
2024-05-14  3:19   ` Yang Shi
2024-05-14 10:53     ` Catalin Marinas
2024-05-17 16:10       ` Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240507223558.3039562-1-yang@os.amperecomputing.com \
    --to=yang@os.amperecomputing.com \
    --cc=catalin.marinas@arm.com \
    --cc=cl@gentwo.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=scott@os.amperecomputing.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).