All the mail mirrored from lore.kernel.org
* [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters
@ 2024-04-12 11:48 Barry Song
  2024-04-12 11:48 ` [PATCH v6 1/4] mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters Barry Song
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Barry Song @ 2024-04-12 11:48 UTC (permalink / raw
  To: akpm, linux-mm
  Cc: cerasuolodomenico, chrisl, david, kasong, linux-kernel, peterx,
	ryan.roberts, surenb, v-songbaohua, willy, yosryahmed, yuzhao,
	corbet

From: Barry Song <v-songbaohua@oppo.com>

The patchset introduces a framework to facilitate mTHP counters, starting
with the allocation and swap-out counters. Currently, only five new nodes
are appended to the stats directory for each mTHP size.

/sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
	anon_fault_alloc
	anon_fault_fallback
	anon_fault_fallback_charge
	anon_swpout
	anon_swpout_fallback

These nodes are crucial for us to monitor the fragmentation levels of
both the buddy system and the swap partitions. In the future, we may
consider adding additional nodes for further insights.
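
For anyone who wants to poke at the interface by hand, below is a minimal
userspace sketch (not part of the patchset) that dumps the new counters.
The hugepages-64kB directory name is only an assumption for illustration;
list /sys/kernel/mm/transparent_hugepage/ to see which sizes your
kernel/CPU combination actually exposes.

#include <stdio.h>

/* The five per-order counters introduced by this series. */
static const char *stats[] = {
    "anon_fault_alloc",
    "anon_fault_fallback",
    "anon_fault_fallback_charge",
    "anon_swpout",
    "anon_swpout_fallback",
};

int main(void)
{
    char path[256];
    unsigned long val;
    FILE *f;
    int i;

    for (i = 0; i < 5; i++) {
        /* hugepages-64kB is an assumed example size */
        snprintf(path, sizeof(path),
                 "/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/%s",
                 stats[i]);
        f = fopen(path, "r");
        if (!f)
            continue;
        if (fscanf(f, "%lu", &val) == 1)
            printf("%-28s %lu\n", stats[i], val);
        fclose(f);
    }
    return 0;
}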

-v6:
  * collect reviewed-by tags for patches 2/4, 3/4, 4/4, Ryan;
  * move back to static array by using MAX_PTRS_PER_PTE, Ryan;
  * move to for_each_possible_cpu to handle cpu hotplug, Ryan;
  * other minor cleanups according to Ryan;
 
-v5:
  * rename anon_alloc to anon_fault_alloc, Barry/Ryan;
  * add anon_fault_fallback_charge, Ryan;
  * move to dynamic alloc_percpu as powerpc's PMD_ORDER is not const,
    kernel test robot;
  * make anon_fault_alloc and anon_fault_fallback more consistent
    with thp_fault_alloc and thp_fault_fallback, Ryan;
  * handle cpu hotplug properly, Ryan;
  * add docs for new sysfs nodes and ABI, Andrew.
  link:
  https://lore.kernel.org/linux-mm/20240412073740.294272-1-21cnbao@gmail.com/

-v4:
  * Many thanks to David and Ryan for your patience and valuable insights
    throughout the numerous renaming efforts!
  * Guard the case order > PMD_ORDER in count func rather than in callers,
    Ryan;
  * Add swpout counters;
  * Add a helper DEFINE_MTHP_STAT_ATTR to avoid code duplication for various
    counters;
  link:
  https://lore.kernel.org/linux-mm/20240405102704.77559-1-21cnbao@gmail.com/

-v3:
  https://lore.kernel.org/linux-mm/20240403035502.71356-1-21cnbao@gmail.com/

Barry Song (4):
  mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback
    counters
  mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters
  mm: add docs for per-order mTHP counters and transhuge_page ABI
  mm: correct the docs for thp_fault_alloc and thp_fault_fallback

 .../sys-kernel-mm-transparent-hugepage        | 17 ++++++
 Documentation/admin-guide/mm/transhuge.rst    | 32 ++++++++++-
 include/linux/huge_mm.h                       | 23 ++++++++
 mm/huge_memory.c                              | 56 +++++++++++++++++++
 mm/memory.c                                   |  5 ++
 mm/page_io.c                                  |  1 +
 mm/vmscan.c                                   |  3 +
 7 files changed, 135 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/ABI/testing/sys-kernel-mm-transparent-hugepage

-- 
2.34.1




* [PATCH v6 1/4] mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters
  2024-04-12 11:48 [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters Barry Song
@ 2024-04-12 11:48 ` Barry Song
  2024-04-12 11:59   ` Ryan Roberts
  2024-04-16  8:12   ` David Hildenbrand
  2024-04-12 11:48 ` [PATCH v6 2/4] mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters Barry Song
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Barry Song @ 2024-04-12 11:48 UTC (permalink / raw
  To: akpm, linux-mm
  Cc: cerasuolodomenico, chrisl, david, kasong, linux-kernel, peterx,
	ryan.roberts, surenb, v-songbaohua, willy, yosryahmed, yuzhao,
	corbet

From: Barry Song <v-songbaohua@oppo.com>

Profiling a system blindly with mTHP has become challenging due to the
lack of visibility into its operations.  Presenting the success rate of
mTHP allocations appears to be a pressing need.

Recently, I've been experiencing significant difficulty debugging
performance improvements and regressions without these figures.  It's
crucial for us to understand the true effectiveness of mTHP in real-world
scenarios, especially in systems with fragmented memory.

This patch establishes the framework for per-order mTHP
counters. It begins by introducing the anon_fault_alloc and
anon_fault_fallback counters. Additionally, to maintain consistency
with thp_fault_fallback_charge in /proc/vmstat, this patch also tracks
anon_fault_fallback_charge when mem_cgroup_charge fails for mTHP.
Incorporating additional counters should now be straightforward as well.
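
To illustrate that last point, adding one more counter on top of this
framework should boil down to three small steps. The anon_swpin name below
is purely hypothetical and only meant to show the pattern; it is not part
of this patch:

/* 1) extend the enum in include/linux/huge_mm.h */
enum mthp_stat_item {
    MTHP_STAT_ANON_FAULT_ALLOC,
    MTHP_STAT_ANON_FAULT_FALLBACK,
    MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
    MTHP_STAT_ANON_SWPIN,        /* hypothetical new counter */
    __MTHP_STAT_COUNT
};

/*
 * 2) define the sysfs attribute in mm/huge_memory.c and append
 *    &anon_swpin_attr.attr to stats_attrs[]
 */
DEFINE_MTHP_STAT_ATTR(anon_swpin, MTHP_STAT_ANON_SWPIN);

/* 3) bump the counter from the relevant code path */
count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_SWPIN);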

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Yu Zhao <yuzhao@google.com>
---
 include/linux/huge_mm.h | 21 +++++++++++++++++
 mm/huge_memory.c        | 52 +++++++++++++++++++++++++++++++++++++++++
 mm/memory.c             |  5 ++++
 3 files changed, 78 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e896ca4760f6..d4fdb2641070 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -264,6 +264,27 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 					  enforce_sysfs, orders);
 }
 
+enum mthp_stat_item {
+	MTHP_STAT_ANON_FAULT_ALLOC,
+	MTHP_STAT_ANON_FAULT_FALLBACK,
+	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
+	__MTHP_STAT_COUNT
+};
+
+struct mthp_stat {
+	unsigned long stats[ilog2(MAX_PTRS_PER_PTE) + 1][__MTHP_STAT_COUNT];
+};
+
+DECLARE_PER_CPU(struct mthp_stat, mthp_stats);
+
+static inline void count_mthp_stat(int order, enum mthp_stat_item item)
+{
+	if (order <= 0 || order > PMD_ORDER)
+		return;
+
+	this_cpu_inc(mthp_stats.stats[order][item]);
+}
+
 #define transparent_hugepage_use_zero_page()				\
 	(transparent_hugepage_flags &					\
 	 (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index dc30139590e6..dfc38cc83a04 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -526,6 +526,48 @@ static const struct kobj_type thpsize_ktype = {
 	.sysfs_ops = &kobj_sysfs_ops,
 };
 
+DEFINE_PER_CPU(struct mthp_stat, mthp_stats) = {{{0}}};
+
+static unsigned long sum_mthp_stat(int order, enum mthp_stat_item item)
+{
+	unsigned long sum = 0;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct mthp_stat *this = &per_cpu(mthp_stats, cpu);
+
+		sum += this->stats[order][item];
+	}
+
+	return sum;
+}
+
+#define DEFINE_MTHP_STAT_ATTR(_name, _index)				\
+static ssize_t _name##_show(struct kobject *kobj,			\
+			struct kobj_attribute *attr, char *buf)		\
+{									\
+	int order = to_thpsize(kobj)->order;				\
+									\
+	return sysfs_emit(buf, "%lu\n", sum_mthp_stat(order, _index));	\
+}									\
+static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
+
+DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
+DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
+
+static struct attribute *stats_attrs[] = {
+	&anon_fault_alloc_attr.attr,
+	&anon_fault_fallback_attr.attr,
+	&anon_fault_fallback_charge_attr.attr,
+	NULL,
+};
+
+static struct attribute_group stats_attr_group = {
+	.name = "stats",
+	.attrs = stats_attrs,
+};
+
 static struct thpsize *thpsize_create(int order, struct kobject *parent)
 {
 	unsigned long size = (PAGE_SIZE << order) / SZ_1K;
@@ -549,6 +591,12 @@ static struct thpsize *thpsize_create(int order, struct kobject *parent)
 		return ERR_PTR(ret);
 	}
 
+	ret = sysfs_create_group(&thpsize->kobj, &stats_attr_group);
+	if (ret) {
+		kobject_put(&thpsize->kobj);
+		return ERR_PTR(ret);
+	}
+
 	thpsize->order = order;
 	return thpsize;
 }
@@ -880,6 +928,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
 		folio_put(folio);
 		count_vm_event(THP_FAULT_FALLBACK);
 		count_vm_event(THP_FAULT_FALLBACK_CHARGE);
+		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
+		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
 		return VM_FAULT_FALLBACK;
 	}
 	folio_throttle_swaprate(folio, gfp);
@@ -929,6 +979,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
 		mm_inc_nr_ptes(vma->vm_mm);
 		spin_unlock(vmf->ptl);
 		count_vm_event(THP_FAULT_ALLOC);
+		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
 		count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
 	}
 
@@ -1050,6 +1101,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 	folio = vma_alloc_folio(gfp, HPAGE_PMD_ORDER, vma, haddr, true);
 	if (unlikely(!folio)) {
 		count_vm_event(THP_FAULT_FALLBACK);
+		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
 		return VM_FAULT_FALLBACK;
 	}
 	return __do_huge_pmd_anonymous_page(vmf, &folio->page, gfp);
diff --git a/mm/memory.c b/mm/memory.c
index 649a547fe8e3..f31da2de19c6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4368,6 +4368,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 		folio = vma_alloc_folio(gfp, order, vma, addr, true);
 		if (folio) {
 			if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
+				count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
 				folio_put(folio);
 				goto next;
 			}
@@ -4376,6 +4377,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 			return folio;
 		}
 next:
+		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
 		order = next_order(&orders, order);
 	}
 
@@ -4485,6 +4487,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 
 	folio_ref_add(folio, nr_pages - 1);
 	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
+#endif
 	folio_add_new_anon_rmap(folio, vma, addr);
 	folio_add_lru_vma(folio, vma);
 setpte:
-- 
2.34.1




* [PATCH v6 2/4] mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters
  2024-04-12 11:48 [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters Barry Song
  2024-04-12 11:48 ` [PATCH v6 1/4] mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters Barry Song
@ 2024-04-12 11:48 ` Barry Song
  2024-04-16  8:14   ` David Hildenbrand
  2024-04-12 11:48 ` [PATCH v6 3/4] mm: add docs for per-order mTHP counters and transhuge_page ABI Barry Song
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Barry Song @ 2024-04-12 11:48 UTC (permalink / raw
  To: akpm, linux-mm
  Cc: cerasuolodomenico, chrisl, david, kasong, linux-kernel, peterx,
	ryan.roberts, surenb, v-songbaohua, willy, yosryahmed, yuzhao,
	corbet

From: Barry Song <v-songbaohua@oppo.com>

This helps to display the fragmentation situation of the swapfile by
showing what proportion of large folios could be swapped out without
being split.  So far, we only
support non-split swapout for anon memory, with the possibility of
expanding to shmem in the future.  So, we add the "anon" prefix to the
counter names.
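
As a rough sketch of how these two counters are meant to be consumed
together (the helper and the example numbers are illustrative only, not
part of the kernel change):

/*
 * Fraction of large-folio swapouts that had to fall back to splitting.
 * E.g. anon_swpout = 900 and anon_swpout_fallback = 100 would give 0.10,
 * i.e. 10% of the large folios reaching swap-out could not go out whole.
 */
static double anon_swpout_split_ratio(unsigned long swpout,
                                      unsigned long swpout_fallback)
{
    unsigned long total = swpout + swpout_fallback;

    return total ? (double)swpout_fallback / total : 0.0;
}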

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Yu Zhao <yuzhao@google.com>
---
 include/linux/huge_mm.h | 2 ++
 mm/huge_memory.c        | 4 ++++
 mm/page_io.c            | 1 +
 mm/vmscan.c             | 3 +++
 4 files changed, 10 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index d4fdb2641070..7cd07b83a3d0 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -268,6 +268,8 @@ enum mthp_stat_item {
 	MTHP_STAT_ANON_FAULT_ALLOC,
 	MTHP_STAT_ANON_FAULT_FALLBACK,
 	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
+	MTHP_STAT_ANON_SWPOUT,
+	MTHP_STAT_ANON_SWPOUT_FALLBACK,
 	__MTHP_STAT_COUNT
 };
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index dfc38cc83a04..58f2c4745d80 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -555,11 +555,15 @@ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
 DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
 DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
 DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
+DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
+DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
 
 static struct attribute *stats_attrs[] = {
 	&anon_fault_alloc_attr.attr,
 	&anon_fault_fallback_attr.attr,
 	&anon_fault_fallback_charge_attr.attr,
+	&anon_swpout_attr.attr,
+	&anon_swpout_fallback_attr.attr,
 	NULL,
 };
 
diff --git a/mm/page_io.c b/mm/page_io.c
index a9a7c236aecc..46c603dddf04 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -217,6 +217,7 @@ static inline void count_swpout_vm_event(struct folio *folio)
 		count_memcg_folio_events(folio, THP_SWPOUT, 1);
 		count_vm_event(THP_SWPOUT);
 	}
+	count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_SWPOUT);
 #endif
 	count_vm_events(PSWPOUT, folio_nr_pages(folio));
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bca2d9981c95..49bd94423961 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1231,6 +1231,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 						goto activate_locked;
 				}
 				if (!add_to_swap(folio)) {
+					int __maybe_unused order = folio_order(folio);
+
 					if (!folio_test_large(folio))
 						goto activate_locked_split;
 					/* Fallback to swap normal pages */
@@ -1242,6 +1244,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 							THP_SWPOUT_FALLBACK, 1);
 						count_vm_event(THP_SWPOUT_FALLBACK);
 					}
+					count_mthp_stat(order, MTHP_STAT_ANON_SWPOUT_FALLBACK);
 #endif
 					if (!add_to_swap(folio))
 						goto activate_locked_split;
-- 
2.34.1




* [PATCH v6 3/4] mm: add docs for per-order mTHP counters and transhuge_page ABI
  2024-04-12 11:48 [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters Barry Song
  2024-04-12 11:48 ` [PATCH v6 1/4] mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters Barry Song
  2024-04-12 11:48 ` [PATCH v6 2/4] mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters Barry Song
@ 2024-04-12 11:48 ` Barry Song
  2024-04-16  8:15   ` David Hildenbrand
  2024-04-12 11:48 ` [PATCH v6 4/4] mm: correct the docs for thp_fault_alloc and thp_fault_fallback Barry Song
  2024-04-12 12:54 ` [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters David Hildenbrand
  4 siblings, 1 reply; 15+ messages in thread
From: Barry Song @ 2024-04-12 11:48 UTC (permalink / raw
  To: akpm, linux-mm
  Cc: cerasuolodomenico, chrisl, david, kasong, linux-kernel, peterx,
	ryan.roberts, surenb, v-songbaohua, willy, yosryahmed, yuzhao,
	corbet

From: Barry Song <v-songbaohua@oppo.com>

This patch includes documentation for mTHP counters and an ABI file
for sys-kernel-mm-transparent-hugepage, which appears to have been
missing for some time.

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
---
 .../sys-kernel-mm-transparent-hugepage        | 17 +++++++++++
 Documentation/admin-guide/mm/transhuge.rst    | 28 +++++++++++++++++++
 2 files changed, 45 insertions(+)
 create mode 100644 Documentation/ABI/testing/sys-kernel-mm-transparent-hugepage

diff --git a/Documentation/ABI/testing/sys-kernel-mm-transparent-hugepage b/Documentation/ABI/testing/sys-kernel-mm-transparent-hugepage
new file mode 100644
index 000000000000..33163eba5342
--- /dev/null
+++ b/Documentation/ABI/testing/sys-kernel-mm-transparent-hugepage
@@ -0,0 +1,17 @@
+What:		/sys/kernel/mm/transparent_hugepage/
+Date:		April 2024
+Contact:	Linux memory management mailing list <linux-mm@kvack.org>
+Description:
+		/sys/kernel/mm/transparent_hugepage/ contains a number of files and
+		subdirectories,
+			- defrag
+			- enabled
+			- hpage_pmd_size
+			- khugepaged
+			- shmem_enabled
+			- use_zero_page
+			- subdirectories of the form hugepages-<size>kB, where <size>
+			  is the page size of the hugepages supported by the kernel/CPU
+			  combination.
+
+		See Documentation/admin-guide/mm/transhuge.rst for details.
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 04eb45a2f940..e0fe17affeb3 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -447,6 +447,34 @@ thp_swpout_fallback
 	Usually because failed to allocate some continuous swap space
 	for the huge page.
 
+In /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats, there are
+also individual counters for each huge page size, which can be utilized to
+monitor the system's effectiveness in providing huge pages for use. Each
+counter has its own corresponding file.
+
+anon_fault_alloc
+	is incremented every time a huge page is successfully
+	allocated and charged to handle a page fault.
+
+anon_fault_fallback
+	is incremented if a page fault fails to allocate or charge
+	a huge page and instead falls back to using huge pages with
+	lower orders or small pages.
+
+anon_fault_fallback_charge
+	is incremented if a page fault fails to charge a huge page and
+	instead falls back to using huge pages with lower orders or
+	small pages even though the allocation was successful.
+
+anon_swpout
+	is incremented every time a huge page is swapped out in one
+	piece without splitting.
+
+anon_swpout_fallback
+	is incremented if a huge page has to be split before swapout.
+	Usually this is because the kernel failed to allocate some
+	contiguous swap space for the huge page.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use. There are some counters in ``/proc/vmstat`` to help
-- 
2.34.1




* [PATCH v6 4/4] mm: correct the docs for thp_fault_alloc and thp_fault_fallback
  2024-04-12 11:48 [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters Barry Song
                   ` (2 preceding siblings ...)
  2024-04-12 11:48 ` [PATCH v6 3/4] mm: add docs for per-order mTHP counters and transhuge_page ABI Barry Song
@ 2024-04-12 11:48 ` Barry Song
  2024-04-16  8:16   ` David Hildenbrand
  2024-04-12 12:54 ` [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters David Hildenbrand
  4 siblings, 1 reply; 15+ messages in thread
From: Barry Song @ 2024-04-12 11:48 UTC (permalink / raw
  To: akpm, linux-mm
  Cc: cerasuolodomenico, chrisl, david, kasong, linux-kernel, peterx,
	ryan.roberts, surenb, v-songbaohua, willy, yosryahmed, yuzhao,
	corbet

From: Barry Song <v-songbaohua@oppo.com>

The documentation does not align with the code. In
__do_huge_pmd_anonymous_page(), THP_FAULT_FALLBACK is incremented when
mem_cgroup_charge() fails, despite the allocation succeeding, whereas
THP_FAULT_ALLOC is only incremented after a successful charge.
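
A condensed sketch of that flow (simplified from the hunks in patch 1/4;
locking and the new mTHP counters are omitted) shows why the wording needs
"and charged" / "or charge":

/* __do_huge_pmd_anonymous_page(), heavily simplified */
if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
    folio_put(folio);
    count_vm_event(THP_FAULT_FALLBACK);        /* allocation succeeded ...  */
    count_vm_event(THP_FAULT_FALLBACK_CHARGE); /* ... but the charge failed */
    return VM_FAULT_FALLBACK;
}
/* ... page table entry is installed ... */
count_vm_event(THP_FAULT_ALLOC);               /* only after a successful charge */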

Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
---
 Documentation/admin-guide/mm/transhuge.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index e0fe17affeb3..f82300b9193f 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -369,7 +369,7 @@ monitor how successfully the system is providing huge pages for use.
 
 thp_fault_alloc
 	is incremented every time a huge page is successfully
-	allocated to handle a page fault.
+	allocated and charged to handle a page fault.
 
 thp_collapse_alloc
 	is incremented by khugepaged when it has found
@@ -377,7 +377,7 @@ thp_collapse_alloc
 	successfully allocated a new huge page to store the data.
 
 thp_fault_fallback
-	is incremented if a page fault fails to allocate
+	is incremented if a page fault fails to allocate or charge
 	a huge page and instead falls back to using small pages.
 
 thp_fault_fallback_charge
-- 
2.34.1




* Re: [PATCH v6 1/4] mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters
  2024-04-12 11:48 ` [PATCH v6 1/4] mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters Barry Song
@ 2024-04-12 11:59   ` Ryan Roberts
  2024-04-16  8:12   ` David Hildenbrand
  1 sibling, 0 replies; 15+ messages in thread
From: Ryan Roberts @ 2024-04-12 11:59 UTC (permalink / raw
  To: Barry Song, akpm, linux-mm
  Cc: cerasuolodomenico, chrisl, david, kasong, linux-kernel, peterx,
	surenb, v-songbaohua, willy, yosryahmed, yuzhao, corbet

On 12/04/2024 12:48, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> Profiling a system blindly with mTHP has become challenging due to the
> lack of visibility into its operations.  Presenting the success rate of
> mTHP allocations appears to be pressing need.
> 
> Recently, I've been experiencing significant difficulty debugging
> performance improvements and regressions without these figures.  It's
> crucial for us to understand the true effectiveness of mTHP in real-world
> scenarios, especially in systems with fragmented memory.
> 
> This patch establishes the framework for per-order mTHP
> counters. It begins by introducing the anon_fault_alloc and
> anon_fault_fallback counters. Additionally, to maintain consistency
> with thp_fault_fallback_charge in /proc/vmstat, this patch also tracks
> anon_fault_fallback_charge when mem_cgroup_charge fails for mTHP.
> Incorporating additional counters should now be straightforward as well.
> 
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> Cc: Chris Li <chrisl@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
> Cc: Kairui Song <kasong@tencent.com>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Yosry Ahmed <yosryahmed@google.com>
> Cc: Yu Zhao <yuzhao@google.com>

LGTM!

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

> ---
>  include/linux/huge_mm.h | 21 +++++++++++++++++
>  mm/huge_memory.c        | 52 +++++++++++++++++++++++++++++++++++++++++
>  mm/memory.c             |  5 ++++
>  3 files changed, 78 insertions(+)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index e896ca4760f6..d4fdb2641070 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -264,6 +264,27 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
>  					  enforce_sysfs, orders);
>  }
>  
> +enum mthp_stat_item {
> +	MTHP_STAT_ANON_FAULT_ALLOC,
> +	MTHP_STAT_ANON_FAULT_FALLBACK,
> +	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
> +	__MTHP_STAT_COUNT
> +};
> +
> +struct mthp_stat {
> +	unsigned long stats[ilog2(MAX_PTRS_PER_PTE) + 1][__MTHP_STAT_COUNT];
> +};
> +
> +DECLARE_PER_CPU(struct mthp_stat, mthp_stats);
> +
> +static inline void count_mthp_stat(int order, enum mthp_stat_item item)
> +{
> +	if (order <= 0 || order > PMD_ORDER)
> +		return;
> +
> +	this_cpu_inc(mthp_stats.stats[order][item]);
> +}
> +
>  #define transparent_hugepage_use_zero_page()				\
>  	(transparent_hugepage_flags &					\
>  	 (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index dc30139590e6..dfc38cc83a04 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -526,6 +526,48 @@ static const struct kobj_type thpsize_ktype = {
>  	.sysfs_ops = &kobj_sysfs_ops,
>  };
>  
> +DEFINE_PER_CPU(struct mthp_stat, mthp_stats) = {{{0}}};
> +
> +static unsigned long sum_mthp_stat(int order, enum mthp_stat_item item)
> +{
> +	unsigned long sum = 0;
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		struct mthp_stat *this = &per_cpu(mthp_stats, cpu);
> +
> +		sum += this->stats[order][item];
> +	}
> +
> +	return sum;
> +}
> +
> +#define DEFINE_MTHP_STAT_ATTR(_name, _index)				\
> +static ssize_t _name##_show(struct kobject *kobj,			\
> +			struct kobj_attribute *attr, char *buf)		\
> +{									\
> +	int order = to_thpsize(kobj)->order;				\
> +									\
> +	return sysfs_emit(buf, "%lu\n", sum_mthp_stat(order, _index));	\
> +}									\
> +static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
> +
> +DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
> +DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
> +DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> +
> +static struct attribute *stats_attrs[] = {
> +	&anon_fault_alloc_attr.attr,
> +	&anon_fault_fallback_attr.attr,
> +	&anon_fault_fallback_charge_attr.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group stats_attr_group = {
> +	.name = "stats",
> +	.attrs = stats_attrs,
> +};
> +
>  static struct thpsize *thpsize_create(int order, struct kobject *parent)
>  {
>  	unsigned long size = (PAGE_SIZE << order) / SZ_1K;
> @@ -549,6 +591,12 @@ static struct thpsize *thpsize_create(int order, struct kobject *parent)
>  		return ERR_PTR(ret);
>  	}
>  
> +	ret = sysfs_create_group(&thpsize->kobj, &stats_attr_group);
> +	if (ret) {
> +		kobject_put(&thpsize->kobj);
> +		return ERR_PTR(ret);
> +	}
> +
>  	thpsize->order = order;
>  	return thpsize;
>  }
> @@ -880,6 +928,8 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>  		folio_put(folio);
>  		count_vm_event(THP_FAULT_FALLBACK);
>  		count_vm_event(THP_FAULT_FALLBACK_CHARGE);
> +		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
> +		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>  		return VM_FAULT_FALLBACK;
>  	}
>  	folio_throttle_swaprate(folio, gfp);
> @@ -929,6 +979,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>  		mm_inc_nr_ptes(vma->vm_mm);
>  		spin_unlock(vmf->ptl);
>  		count_vm_event(THP_FAULT_ALLOC);
> +		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
>  		count_memcg_event_mm(vma->vm_mm, THP_FAULT_ALLOC);
>  	}
>  
> @@ -1050,6 +1101,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
>  	folio = vma_alloc_folio(gfp, HPAGE_PMD_ORDER, vma, haddr, true);
>  	if (unlikely(!folio)) {
>  		count_vm_event(THP_FAULT_FALLBACK);
> +		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_FALLBACK);
>  		return VM_FAULT_FALLBACK;
>  	}
>  	return __do_huge_pmd_anonymous_page(vmf, &folio->page, gfp);
> diff --git a/mm/memory.c b/mm/memory.c
> index 649a547fe8e3..f31da2de19c6 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4368,6 +4368,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>  		folio = vma_alloc_folio(gfp, order, vma, addr, true);
>  		if (folio) {
>  			if (mem_cgroup_charge(folio, vma->vm_mm, gfp)) {
> +				count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>  				folio_put(folio);
>  				goto next;
>  			}
> @@ -4376,6 +4377,7 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>  			return folio;
>  		}
>  next:
> +		count_mthp_stat(order, MTHP_STAT_ANON_FAULT_FALLBACK);
>  		order = next_order(&orders, order);
>  	}
>  
> @@ -4485,6 +4487,9 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>  
>  	folio_ref_add(folio, nr_pages - 1);
>  	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> +#endif
>  	folio_add_new_anon_rmap(folio, vma, addr);
>  	folio_add_lru_vma(folio, vma);
>  setpte:



* Re: [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters
  2024-04-12 11:48 [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters Barry Song
                   ` (3 preceding siblings ...)
  2024-04-12 11:48 ` [PATCH v6 4/4] mm: correct the docs for thp_fault_alloc and thp_fault_fallback Barry Song
@ 2024-04-12 12:54 ` David Hildenbrand
  2024-04-12 13:16   ` Barry Song
  4 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-04-12 12:54 UTC (permalink / raw
  To: Barry Song, akpm, linux-mm
  Cc: cerasuolodomenico, chrisl, kasong, linux-kernel, peterx,
	ryan.roberts, surenb, v-songbaohua, willy, yosryahmed, yuzhao,
	corbet

On 12.04.24 13:48, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> The patchset introduces a framework to facilitate mTHP counters, starting
> with the allocation and swap-out counters. Currently, only four new nodes
> are appended to the stats directory for each mTHP size.
> 
> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> 	anon_fault_alloc
> 	anon_fault_fallback
> 	anon_fault_fallback_charge
> 	anon_swpout
> 	anon_swpout_fallback
> 
> These nodes are crucial for us to monitor the fragmentation levels of
> both the buddy system and the swap partitions. In the future, we may
> consider adding additional nodes for further insights.
> 
> -v6:
>    * collect reviewed-by tags for patch2/4, 3/4, 4/4, Ryan;
>    * move back to static array by using MAX_PTRS_PER_PTE, Ryan;
>    * move to for_each_possible_cpu to handle cpu hotplug, Ryan;
>    * other minor cleanups according to Ryan;

Please *really* do not send multiple versions of the same patch set on a
single day.

-- 
Cheers,

David / dhildenb



* Re: [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters
  2024-04-12 12:54 ` [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters David Hildenbrand
@ 2024-04-12 13:16   ` Barry Song
  2024-04-16  8:08     ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Barry Song @ 2024-04-12 13:16 UTC (permalink / raw
  To: David Hildenbrand
  Cc: akpm, linux-mm, cerasuolodomenico, chrisl, kasong, linux-kernel,
	peterx, ryan.roberts, surenb, v-songbaohua, willy, yosryahmed,
	yuzhao, corbet

On Sat, Apr 13, 2024 at 12:54 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 12.04.24 13:48, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > The patchset introduces a framework to facilitate mTHP counters, starting
> > with the allocation and swap-out counters. Currently, only four new nodes
> > are appended to the stats directory for each mTHP size.
> >
> > /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
> >       anon_fault_alloc
> >       anon_fault_fallback
> >       anon_fault_fallback_charge
> >       anon_swpout
> >       anon_swpout_fallback
> >
> > These nodes are crucial for us to monitor the fragmentation levels of
> > both the buddy system and the swap partitions. In the future, we may
> > consider adding additional nodes for further insights.
> >
> > -v6:
> >    * collect reviewed-by tags for patch2/4, 3/4, 4/4, Ryan;
> >    * move back to static array by using MAX_PTRS_PER_PTE, Ryan;
> >    * move to for_each_possible_cpu to handle cpu hotplug, Ryan;
> >    * other minor cleanups according to Ryan;
>
> Please *really* not multiple versions of the same patch set on one a
> single day.

Ok. I will leave more time for you to review the older versions before moving
to a new version.

For v5->v6, it is quite a straightforward re-spin, though I can understand
it might be a bit annoying if you got v6 while you were reading v5.

>
> --
> Cheers,
>
> David / dhildenb

Thanks
Barry


* Re: [PATCH v6 0/4] mm: add per-order mTHP alloc and swpout counters
  2024-04-12 13:16   ` Barry Song
@ 2024-04-16  8:08     ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-04-16  8:08 UTC (permalink / raw
  To: Barry Song
  Cc: akpm, linux-mm, cerasuolodomenico, chrisl, kasong, linux-kernel,
	peterx, ryan.roberts, surenb, v-songbaohua, willy, yosryahmed,
	yuzhao, corbet

On 12.04.24 15:16, Barry Song wrote:
> On Sat, Apr 13, 2024 at 12:54 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 12.04.24 13:48, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> The patchset introduces a framework to facilitate mTHP counters, starting
>>> with the allocation and swap-out counters. Currently, only four new nodes
>>> are appended to the stats directory for each mTHP size.
>>>
>>> /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
>>>        anon_fault_alloc
>>>        anon_fault_fallback
>>>        anon_fault_fallback_charge
>>>        anon_swpout
>>>        anon_swpout_fallback
>>>
>>> These nodes are crucial for us to monitor the fragmentation levels of
>>> both the buddy system and the swap partitions. In the future, we may
>>> consider adding additional nodes for further insights.
>>>
>>> -v6:
>>>     * collect reviewed-by tags for patch2/4, 3/4, 4/4, Ryan;
>>>     * move back to static array by using MAX_PTRS_PER_PTE, Ryan;
>>>     * move to for_each_possible_cpu to handle cpu hotplug, Ryan;
>>>     * other minor cleanups according to Ryan;
>>
>> Please *really* not multiple versions of the same patch set on one a
>> single day.
> 
> Ok. I will leave more time for you to review the older versions before moving
> to a new version.

Yes please. There is nothing gained from sending out stuff too fast,
besides mixing discussions, the same questions/comments ... and effectively
more work for reviewers.

-- 
Cheers,

David / dhildenb



* Re: [PATCH v6 1/4] mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters
  2024-04-12 11:48 ` [PATCH v6 1/4] mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters Barry Song
  2024-04-12 11:59   ` Ryan Roberts
@ 2024-04-16  8:12   ` David Hildenbrand
  1 sibling, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-04-16  8:12 UTC (permalink / raw
  To: Barry Song, akpm, linux-mm
  Cc: cerasuolodomenico, chrisl, kasong, linux-kernel, peterx,
	ryan.roberts, surenb, v-songbaohua, willy, yosryahmed, yuzhao,
	corbet

On 12.04.24 13:48, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> Profiling a system blindly with mTHP has become challenging due to the
> lack of visibility into its operations.  Presenting the success rate of
> mTHP allocations appears to be pressing need.
> 
> Recently, I've been experiencing significant difficulty debugging
> performance improvements and regressions without these figures.  It's
> crucial for us to understand the true effectiveness of mTHP in real-world
> scenarios, especially in systems with fragmented memory.
> 
> This patch establishes the framework for per-order mTHP
> counters. It begins by introducing the anon_fault_alloc and
> anon_fault_fallback counters. Additionally, to maintain consistency
> with thp_fault_fallback_charge in /proc/vmstat, this patch also tracks
> anon_fault_fallback_charge when mem_cgroup_charge fails for mTHP.
> Incorporating additional counters should now be straightforward as well.
> 
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> Cc: Chris Li <chrisl@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
> Cc: Kairui Song <kasong@tencent.com>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Yosry Ahmed <yosryahmed@google.com>
> Cc: Yu Zhao <yuzhao@google.com>
> ---

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



* Re: [PATCH v6 2/4] mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters
  2024-04-12 11:48 ` [PATCH v6 2/4] mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters Barry Song
@ 2024-04-16  8:14   ` David Hildenbrand
  2024-04-16  8:16     ` Barry Song
  2024-04-16  8:17     ` David Hildenbrand
  0 siblings, 2 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-04-16  8:14 UTC (permalink / raw
  To: Barry Song, akpm, linux-mm
  Cc: cerasuolodomenico, chrisl, kasong, linux-kernel, peterx,
	ryan.roberts, surenb, v-songbaohua, willy, yosryahmed, yuzhao,
	corbet

On 12.04.24 13:48, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> This helps to display the fragmentation situation of the swapfile, knowing
> the proportion of how much we haven't split large folios.  So far, we only
> support non-split swapout for anon memory, with the possibility of
> expanding to shmem in the future.  So, we add the "anon" prefix to the
> counter names.
> 
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Chris Li <chrisl@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
> Cc: Kairui Song <kasong@tencent.com>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Yosry Ahmed <yosryahmed@google.com>
> Cc: Yu Zhao <yuzhao@google.com>
> ---
>   include/linux/huge_mm.h | 2 ++
>   mm/huge_memory.c        | 4 ++++
>   mm/page_io.c            | 1 +
>   mm/vmscan.c             | 3 +++
>   4 files changed, 10 insertions(+)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index d4fdb2641070..7cd07b83a3d0 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -268,6 +268,8 @@ enum mthp_stat_item {
>   	MTHP_STAT_ANON_FAULT_ALLOC,
>   	MTHP_STAT_ANON_FAULT_FALLBACK,
>   	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
> +	MTHP_STAT_ANON_SWPOUT,
> +	MTHP_STAT_ANON_SWPOUT_FALLBACK,
>   	__MTHP_STAT_COUNT
>   };
>   
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index dfc38cc83a04..58f2c4745d80 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -555,11 +555,15 @@ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
>   DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
>   DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
>   DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> +DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
> +DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
>   
>   static struct attribute *stats_attrs[] = {
>   	&anon_fault_alloc_attr.attr,
>   	&anon_fault_fallback_attr.attr,
>   	&anon_fault_fallback_charge_attr.attr,
> +	&anon_swpout_attr.attr,
> +	&anon_swpout_fallback_attr.attr,
>   	NULL,
>   };
>   
> diff --git a/mm/page_io.c b/mm/page_io.c
> index a9a7c236aecc..46c603dddf04 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -217,6 +217,7 @@ static inline void count_swpout_vm_event(struct folio *folio)
>   		count_memcg_folio_events(folio, THP_SWPOUT, 1);
>   		count_vm_event(THP_SWPOUT);
>   	}
> +	count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_SWPOUT);
>   #endif
>   	count_vm_events(PSWPOUT, folio_nr_pages(folio));
>   }
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bca2d9981c95..49bd94423961 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1231,6 +1231,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>   						goto activate_locked;
>   				}
>   				if (!add_to_swap(folio)) {
> +					int __maybe_unused order = folio_order(folio);
> +
>   					if (!folio_test_large(folio))
>   						goto activate_locked_split;
>   					/* Fallback to swap normal pages */
> @@ -1242,6 +1244,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>   							THP_SWPOUT_FALLBACK, 1);
>   						count_vm_event(THP_SWPOUT_FALLBACK);
>   					}
> +					count_mthp_stat(order, MTHP_STAT_ANON_SWPOUT_FALLBACK);

Why the temporary variable for order?

count_mthp_stat(folio_order(folio),
                MTHP_STAT_ANON_SWPOUT_FALLBACK);

... but now I do wonder if we want to pass the folio to count_mthp_stat() ?

Anyhow

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



* Re: [PATCH v6 3/4] mm: add docs for per-order mTHP counters and transhuge_page ABI
  2024-04-12 11:48 ` [PATCH v6 3/4] mm: add docs for per-order mTHP counters and transhuge_page ABI Barry Song
@ 2024-04-16  8:15   ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-04-16  8:15 UTC (permalink / raw
  To: Barry Song, akpm, linux-mm
  Cc: cerasuolodomenico, chrisl, kasong, linux-kernel, peterx,
	ryan.roberts, surenb, v-songbaohua, willy, yosryahmed, yuzhao,
	corbet

On 12.04.24 13:48, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> This patch includes documentation for mTHP counters and an ABI file
> for sys-kernel-mm-transparent-hugepage, which appears to have been
> missing for some time.
> 
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Chris Li <chrisl@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
> Cc: Kairui Song <kasong@tencent.com>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Yosry Ahmed <yosryahmed@google.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



* Re: [PATCH v6 4/4] mm: correct the docs for thp_fault_alloc and thp_fault_fallback
  2024-04-12 11:48 ` [PATCH v6 4/4] mm: correct the docs for thp_fault_alloc and thp_fault_fallback Barry Song
@ 2024-04-16  8:16   ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-04-16  8:16 UTC (permalink / raw
  To: Barry Song, akpm, linux-mm
  Cc: cerasuolodomenico, chrisl, kasong, linux-kernel, peterx,
	ryan.roberts, surenb, v-songbaohua, willy, yosryahmed, yuzhao,
	corbet

On 12.04.24 13:48, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> The documentation does not align with the code. In
> __do_huge_pmd_anonymous_page(), THP_FAULT_FALLBACK is incremented when
> mem_cgroup_charge() fails, despite the allocation succeeding, whereas
> THP_FAULT_ALLOC is only incremented after a successful charge.
> 
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Chris Li <chrisl@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
> Cc: Kairui Song <kasong@tencent.com>
> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Yosry Ahmed <yosryahmed@google.com>
> Cc: Yu Zhao <yuzhao@google.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> ---

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



* Re: [PATCH v6 2/4] mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters
  2024-04-16  8:14   ` David Hildenbrand
@ 2024-04-16  8:16     ` Barry Song
  2024-04-16  8:17     ` David Hildenbrand
  1 sibling, 0 replies; 15+ messages in thread
From: Barry Song @ 2024-04-16  8:16 UTC (permalink / raw
  To: David Hildenbrand
  Cc: akpm, linux-mm, cerasuolodomenico, chrisl, kasong, linux-kernel,
	peterx, ryan.roberts, surenb, v-songbaohua, willy, yosryahmed,
	yuzhao, corbet

On Tue, Apr 16, 2024 at 8:14 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 12.04.24 13:48, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > This helps to display the fragmentation situation of the swapfile, knowing
> > the proportion of how much we haven't split large folios.  So far, we only
> > support non-split swapout for anon memory, with the possibility of
> > expanding to shmem in the future.  So, we add the "anon" prefix to the
> > counter names.
> >
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
> > Cc: Chris Li <chrisl@kernel.org>
> > Cc: David Hildenbrand <david@redhat.com>
> > Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
> > Cc: Kairui Song <kasong@tencent.com>
> > Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Ryan Roberts <ryan.roberts@arm.com>
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Yosry Ahmed <yosryahmed@google.com>
> > Cc: Yu Zhao <yuzhao@google.com>
> > ---
> >   include/linux/huge_mm.h | 2 ++
> >   mm/huge_memory.c        | 4 ++++
> >   mm/page_io.c            | 1 +
> >   mm/vmscan.c             | 3 +++
> >   4 files changed, 10 insertions(+)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index d4fdb2641070..7cd07b83a3d0 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -268,6 +268,8 @@ enum mthp_stat_item {
> >       MTHP_STAT_ANON_FAULT_ALLOC,
> >       MTHP_STAT_ANON_FAULT_FALLBACK,
> >       MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
> > +     MTHP_STAT_ANON_SWPOUT,
> > +     MTHP_STAT_ANON_SWPOUT_FALLBACK,
> >       __MTHP_STAT_COUNT
> >   };
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index dfc38cc83a04..58f2c4745d80 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -555,11 +555,15 @@ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
> >   DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
> >   DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
> >   DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
> > +DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
> > +DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> >
> >   static struct attribute *stats_attrs[] = {
> >       &anon_fault_alloc_attr.attr,
> >       &anon_fault_fallback_attr.attr,
> >       &anon_fault_fallback_charge_attr.attr,
> > +     &anon_swpout_attr.attr,
> > +     &anon_swpout_fallback_attr.attr,
> >       NULL,
> >   };
> >
> > diff --git a/mm/page_io.c b/mm/page_io.c
> > index a9a7c236aecc..46c603dddf04 100644
> > --- a/mm/page_io.c
> > +++ b/mm/page_io.c
> > @@ -217,6 +217,7 @@ static inline void count_swpout_vm_event(struct folio *folio)
> >               count_memcg_folio_events(folio, THP_SWPOUT, 1);
> >               count_vm_event(THP_SWPOUT);
> >       }
> > +     count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_SWPOUT);
> >   #endif
> >       count_vm_events(PSWPOUT, folio_nr_pages(folio));
> >   }
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index bca2d9981c95..49bd94423961 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1231,6 +1231,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >                                               goto activate_locked;
> >                               }
> >                               if (!add_to_swap(folio)) {
> > +                                     int __maybe_unused order = folio_order(folio);
> > +
> >                                       if (!folio_test_large(folio))
> >                                               goto activate_locked_split;
> >                                       /* Fallback to swap normal pages */
> > @@ -1242,6 +1244,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >                                                       THP_SWPOUT_FALLBACK, 1);
> >                                               count_vm_event(THP_SWPOUT_FALLBACK);
> >                                       }
> > +                                     count_mthp_stat(order, MTHP_STAT_ANON_SWPOUT_FALLBACK);
>
> Why the temporary variable for order?
>
> count_mthp_stat(folio_order(order),
>                  MTHP_STAT_ANON_SWPOUT_FALLBACK);
>
> ... but now I do wonder if we want to pass the folio to count_mthp_stat() ?

Because we have called split_folio_to_list() before counting, folio_order()
no longer returns the original order at that point. That is also why Ryan is
using if (nr_pages >= HPAGE_PMD_NR) rather than pmd_mappable.
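
To spell that out, here is a compressed, annotated sketch of the
shrink_folio_list() flow around this hunk. The split call itself sits in
the lines the diff context elides, and the THP_SWPOUT_FALLBACK bookkeeping
between the split and the new counter is dropped here:

if (!add_to_swap(folio)) {
    /* capture the order before the folio may be split */
    int __maybe_unused order = folio_order(folio);

    if (!folio_test_large(folio))
        goto activate_locked_split;
    /* Fallback to swap normal pages */
    if (split_folio_to_list(folio, folio_list))
        goto activate_locked;
    /* after the split, folio_order(folio) would report 0 */
    count_mthp_stat(order, MTHP_STAT_ANON_SWPOUT_FALLBACK);
    if (!add_to_swap(folio))
        goto activate_locked_split;
}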


>
> Anyhow
>
> Acked-by: David Hildenbrand <david@redhat.com>

thanks!

>
> --
> Cheers,
>
> David / dhildenb
>
Barry


* Re: [PATCH v6 2/4] mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters
  2024-04-16  8:14   ` David Hildenbrand
  2024-04-16  8:16     ` Barry Song
@ 2024-04-16  8:17     ` David Hildenbrand
  1 sibling, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-04-16  8:17 UTC (permalink / raw
  To: Barry Song, akpm, linux-mm
  Cc: cerasuolodomenico, chrisl, kasong, linux-kernel, peterx,
	ryan.roberts, surenb, v-songbaohua, willy, yosryahmed, yuzhao,
	corbet

On 16.04.24 10:14, David Hildenbrand wrote:
> On 12.04.24 13:48, Barry Song wrote:
>> From: Barry Song <v-songbaohua@oppo.com>
>>
>> This helps to display the fragmentation situation of the swapfile, knowing
>> the proportion of how much we haven't split large folios.  So far, we only
>> support non-split swapout for anon memory, with the possibility of
>> expanding to shmem in the future.  So, we add the "anon" prefix to the
>> counter names.
>>
>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: Chris Li <chrisl@kernel.org>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
>> Cc: Kairui Song <kasong@tencent.com>
>> Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
>> Cc: Peter Xu <peterx@redhat.com>
>> Cc: Ryan Roberts <ryan.roberts@arm.com>
>> Cc: Suren Baghdasaryan <surenb@google.com>
>> Cc: Yosry Ahmed <yosryahmed@google.com>
>> Cc: Yu Zhao <yuzhao@google.com>
>> ---
>>    include/linux/huge_mm.h | 2 ++
>>    mm/huge_memory.c        | 4 ++++
>>    mm/page_io.c            | 1 +
>>    mm/vmscan.c             | 3 +++
>>    4 files changed, 10 insertions(+)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index d4fdb2641070..7cd07b83a3d0 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -268,6 +268,8 @@ enum mthp_stat_item {
>>    	MTHP_STAT_ANON_FAULT_ALLOC,
>>    	MTHP_STAT_ANON_FAULT_FALLBACK,
>>    	MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
>> +	MTHP_STAT_ANON_SWPOUT,
>> +	MTHP_STAT_ANON_SWPOUT_FALLBACK,
>>    	__MTHP_STAT_COUNT
>>    };
>>    
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index dfc38cc83a04..58f2c4745d80 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -555,11 +555,15 @@ static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
>>    DEFINE_MTHP_STAT_ATTR(anon_fault_alloc, MTHP_STAT_ANON_FAULT_ALLOC);
>>    DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
>>    DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
>> +DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
>> +DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
>>    
>>    static struct attribute *stats_attrs[] = {
>>    	&anon_fault_alloc_attr.attr,
>>    	&anon_fault_fallback_attr.attr,
>>    	&anon_fault_fallback_charge_attr.attr,
>> +	&anon_swpout_attr.attr,
>> +	&anon_swpout_fallback_attr.attr,
>>    	NULL,
>>    };
>>    
>> diff --git a/mm/page_io.c b/mm/page_io.c
>> index a9a7c236aecc..46c603dddf04 100644
>> --- a/mm/page_io.c
>> +++ b/mm/page_io.c
>> @@ -217,6 +217,7 @@ static inline void count_swpout_vm_event(struct folio *folio)
>>    		count_memcg_folio_events(folio, THP_SWPOUT, 1);
>>    		count_vm_event(THP_SWPOUT);
>>    	}
>> +	count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_SWPOUT);
>>    #endif
>>    	count_vm_events(PSWPOUT, folio_nr_pages(folio));
>>    }
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index bca2d9981c95..49bd94423961 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1231,6 +1231,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>    						goto activate_locked;
>>    				}
>>    				if (!add_to_swap(folio)) {
>> +					int __maybe_unused order = folio_order(folio);
>> +
>>    					if (!folio_test_large(folio))
>>    						goto activate_locked_split;
>>    					/* Fallback to swap normal pages */
>> @@ -1242,6 +1244,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>>    							THP_SWPOUT_FALLBACK, 1);
>>    						count_vm_event(THP_SWPOUT_FALLBACK);
>>    					}
>> +					count_mthp_stat(order, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> 
> Why the temporary variable for order?
> 
> count_mthp_stat(folio_order(order),
>                   MTHP_STAT_ANON_SWPOUT_FALLBACK);
> 
> ... but now I do wonder if we want to pass the folio to count_mthp_stat() ?

... and now I realize that that doesn't make sense if we fail to
allocate the folio in the first place. So all good.

-- 
Cheers,

David / dhildenb



