Sparclinux Archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Ryan Roberts <ryan.roberts@arm.com>,
	Russell King <linux@armlinux.org.uk>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Dinh Nguyen <dinguyen@kernel.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	"David S. Miller" <davem@davemloft.net>,
	linux-arm-kernel@lists.infradead.org,
	linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org,
	linux-s390@vger.kernel.org, sparclinux@vger.kernel.org
Subject: [PATCH v1 00/11] mm/memory: optimize fork() with PTE-mapped THP
Date: Mon, 22 Jan 2024 20:41:49 +0100	[thread overview]
Message-ID: <20240122194200.381241-1-david@redhat.com> (raw)

Now that the rmap overhaul[1] is upstream that provides a clean interface
for rmap batching, let's implement PTE batching during fork when processing
PTE-mapped THPs.

This series is partially based on Ryan's previous work[2] to implement
cont-pte support on arm64, but its a complete rewrite based on [1] to
optimize all architectures independent of any such PTE bits, and to
use the new rmap batching functions that simplify the code and prepare
for further rmap accounting changes.

We collect consecutive PTEs that map consecutive pages of the same large
folio, making sure that the other PTE bits are compatible, and (a) adjust
the refcount only once per batch, (b) call rmap handling functions only
once per batch and (c) perform batch PTE setting/updates.

While this series should be beneficial for adding cont-pte support on
ARM64[2], it's one of the requirements for maintaining a total mapcount[3]
for large folios with minimal added overhead and further changes[4] that
build up on top of the total mapcount.

Independent of all that, this series results in a speedup during fork with
PTE-mapped THP, which is the default with THPs that are smaller than a PMD
(for example, 16KiB to 1024KiB mTHPs for anonymous memory[5]).

On an Intel Xeon Silver 4210R CPU, fork'ing with 1GiB of PTE-mapped folios
of the same size (stddev < 1%) results in the following runtimes
for fork() (shorter is better):

Folio Size | v6.8-rc1 |      New | Change
------------------------------------------
      4KiB | 0.014328 | 0.014265 |     0%
     16KiB | 0.014263 | 0.013293 |   - 7%
     32KiB | 0.014334 | 0.012355 |   -14%
     64KiB | 0.014046 | 0.011837 |   -16%
    128KiB | 0.014011 | 0.011536 |   -18%
    256KiB | 0.013993 | 0.01134  |   -19%
    512KiB | 0.013983 | 0.011311 |   -19%
   1024KiB | 0.013986 | 0.011282 |   -19%
   2048KiB | 0.014305 | 0.011496 |   -20%

Next up is PTE batching when unmapping, that I'll probably send out
based on this series this/next week.

Only tested on x86-64. Compile-tested on most other architectures. Will
do more testing and double-check the arch changes while this is getting
some review.

[1] https://lkml.kernel.org/r/20231220224504.646757-1-david@redhat.com
[2] https://lkml.kernel.org/r/20231218105100.172635-1-ryan.roberts@arm.com
[3] https://lkml.kernel.org/r/20230809083256.699513-1-david@redhat.com
[4] https://lkml.kernel.org/r/20231124132626.235350-1-david@redhat.com
[5] https://lkml.kernel.org/r/20231207161211.2374093-1-ryan.roberts@arm.com

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-s390@vger.kernel.org
Cc: sparclinux@vger.kernel.org

David Hildenbrand (11):
  arm/pgtable: define PFN_PTE_SHIFT on arm and arm64
  nios2/pgtable: define PFN_PTE_SHIFT
  powerpc/pgtable: define PFN_PTE_SHIFT
  risc: pgtable: define PFN_PTE_SHIFT
  s390/pgtable: define PFN_PTE_SHIFT
  sparc/pgtable: define PFN_PTE_SHIFT
  mm/memory: factor out copying the actual PTE in copy_present_pte()
  mm/memory: pass PTE to copy_present_pte()
  mm/memory: optimize fork() with PTE-mapped THP
  mm/memory: ignore dirty/accessed/soft-dirty bits in folio_pte_batch()
  mm/memory: ignore writable bit in folio_pte_batch()

 arch/arm/include/asm/pgtable.h      |   2 +
 arch/arm64/include/asm/pgtable.h    |   2 +
 arch/nios2/include/asm/pgtable.h    |   2 +
 arch/powerpc/include/asm/pgtable.h  |   2 +
 arch/riscv/include/asm/pgtable.h    |   2 +
 arch/s390/include/asm/pgtable.h     |   2 +
 arch/sparc/include/asm/pgtable_64.h |   2 +
 include/linux/pgtable.h             |  17 ++-
 mm/memory.c                         | 188 +++++++++++++++++++++-------
 9 files changed, 173 insertions(+), 46 deletions(-)


base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
-- 
2.43.0


             reply	other threads:[~2024-01-22 19:42 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-22 19:41 David Hildenbrand [this message]
2024-01-22 19:41 ` [PATCH v1 01/11] arm/pgtable: define PFN_PTE_SHIFT on arm and arm64 David Hildenbrand
2024-01-23 10:34   ` Ryan Roberts
2024-01-23 10:48     ` David Hildenbrand
2024-01-23 11:02       ` David Hildenbrand
2024-01-23 11:17         ` Ryan Roberts
2024-01-23 11:33           ` David Hildenbrand
2024-01-23 11:44             ` Ryan Roberts
2024-01-23 11:08       ` Ryan Roberts
2024-01-23 11:16         ` Christophe Leroy
2024-01-23 11:31           ` David Hildenbrand
2024-01-23 11:38             ` Ryan Roberts
2024-01-23 11:40               ` David Hildenbrand
2024-01-24  5:45                 ` Aneesh Kumar K.V
2024-01-23 11:48               ` Christophe Leroy
2024-01-23 11:53                 ` David Hildenbrand
2024-01-24  5:46             ` Aneesh Kumar K.V
2024-01-23 11:10       ` Christophe Leroy
2024-01-23 15:01     ` Matthew Wilcox
2024-01-23 15:22       ` Ryan Roberts
2024-01-22 19:41 ` [PATCH v1 02/11] nios2/pgtable: define PFN_PTE_SHIFT David Hildenbrand
2024-01-22 19:41 ` [PATCH v1 03/11] powerpc/pgtable: " David Hildenbrand
2024-01-22 19:41 ` [PATCH v1 04/11] risc: pgtable: " David Hildenbrand
2024-01-22 20:03   ` Alexandre Ghiti
2024-01-22 20:08     ` David Hildenbrand
2024-01-22 19:41 ` [PATCH v1 05/11] s390/pgtable: " David Hildenbrand
2024-01-22 19:41 ` [PATCH v1 06/11] sparc/pgtable: " David Hildenbrand
2024-01-22 19:41 ` [PATCH v1 07/11] mm/memory: factor out copying the actual PTE in copy_present_pte() David Hildenbrand
2024-01-23 10:45   ` Ryan Roberts
2024-01-22 19:41 ` [PATCH v1 08/11] mm/memory: pass PTE to copy_present_pte() David Hildenbrand
2024-01-23 10:47   ` Ryan Roberts
2024-01-22 19:41 ` [PATCH v1 09/11] mm/memory: optimize fork() with PTE-mapped THP David Hildenbrand
2024-01-23 12:01   ` Ryan Roberts
2024-01-23 12:19     ` David Hildenbrand
2024-01-23 12:28       ` Ryan Roberts
2024-01-22 19:41 ` [PATCH v1 10/11] mm/memory: ignore dirty/accessed/soft-dirty bits in folio_pte_batch() David Hildenbrand
2024-01-23 12:25   ` Ryan Roberts
2024-01-23 13:06     ` David Hildenbrand
2024-01-23 13:42       ` Ryan Roberts
2024-01-23 13:55         ` David Hildenbrand
2024-01-23 14:13           ` David Hildenbrand
2024-01-23 14:27             ` Ryan Roberts
2024-01-22 19:42 ` [PATCH v1 11/11] mm/memory: ignore writable bit " David Hildenbrand
2024-01-23 12:35   ` Ryan Roberts
2024-01-23 19:15 ` [PATCH v1 00/11] mm/memory: optimize fork() with PTE-mapped THP Ryan Roberts
2024-01-23 19:33   ` David Hildenbrand
2024-01-23 19:43     ` Ryan Roberts
2024-01-23 20:14       ` David Hildenbrand
2024-01-23 20:43         ` Ryan Roberts

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240122194200.381241-1-david@redhat.com \
    --to=david@redhat.com \
    --cc=agordeev@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@kernel.org \
    --cc=aou@eecs.berkeley.edu \
    --cc=borntraeger@linux.ibm.com \
    --cc=catalin.marinas@arm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=davem@davemloft.net \
    --cc=dinguyen@kernel.org \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=ryan.roberts@arm.com \
    --cc=sparclinux@vger.kernel.org \
    --cc=svens@linux.ibm.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).