All the mail mirrored from lore.kernel.org
* [PATCH 0/5] Minor improvements and cleanups to ext4 mballoc
@ 2024-03-26 21:38 Kemeng Shi
  2024-03-26 21:38 ` [PATCH 1/5] ext4: keep "prefetch_grp" and "nr" consistent Kemeng Shi
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Kemeng Shi @ 2024-03-26 21:38 UTC (permalink / raw
  To: tytso, adilger.kernel, linux-ext4, linux-kernel
  Cc: jack, ojaswin, ritesh.list

This series contains some minor improvements and cleanups to ext4
mballoc. No failures were found in the "kvm-xfstests smoke" test or the
unit tests. More details can be found in the respective patches. Thanks!

Kemeng Shi (5):
  ext4: keep "prefetch_grp" and "nr" consistent
  ext4: add test_mb_mark_used_cost to estimate cost of mb_mark_used
  ext4: call ext4_mb_mark_free_simple in mb_mark_used to clear bits
  ext4: use correct criteria name instead stale integer number in
    comment
  ext4: expand next_linear_group to remove repeat check for linear scan.

 fs/ext4/ext4.h         | 15 +++++--
 fs/ext4/mballoc-test.c | 56 ++++++++++++++++++++++++++
 fs/ext4/mballoc.c      | 89 +++++++++++++++++-------------------------
 fs/ext4/mballoc.h      |  4 +-
 4 files changed, 105 insertions(+), 59 deletions(-)

-- 
2.30.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/5] ext4: keep "prefetch_grp" and "nr" consistent
  2024-03-26 21:38 [PATCH 0/5] Minor improvements and cleanups to ext4 mballoc Kemeng Shi
@ 2024-03-26 21:38 ` Kemeng Shi
  2024-03-29  7:52   ` Ojaswin Mujoo
  2024-04-04 13:22   ` Jan Kara
  2024-03-26 21:38 ` [PATCH 2/5] ext4: add test_mb_mark_used_cost to estimate cost of mb_mark_used Kemeng Shi
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 16+ messages in thread
From: Kemeng Shi @ 2024-03-26 21:38 UTC (permalink / raw
  To: tytso, adilger.kernel, linux-ext4, linux-kernel
  Cc: jack, ojaswin, ritesh.list

Keep "prefetch_grp" and "nr" consistent to avoid to call
ext4_mb_prefetch_fini with non-prefetched groups.
When we step into next criteria, "prefetch_grp" is set to prefetch start
of new criteria while "nr" is number of the prefetched group in previous
criteria. If previous criteria and next criteria are both inexpensive
(< CR_GOAL_LEN_SLOW) and prefetch_ios reachs sbi->s_mb_prefetch_limit
in previous criteria, "prefetch_grp" and "nr" will be inconsistent and
may introduce unexpected cost to do ext4_mb_init_group for non-prefetched
groups.
Reset "nr" to 0 when we reset "prefetch_grp" to start of prefech in next
criteria to ensure "prefetch_grp" and "nr" are consistent.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 12b3f196010b..a61fc52956b2 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2856,6 +2856,7 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
 		group = ac->ac_g_ex.fe_group;
 		ac->ac_groups_linear_remaining = sbi->s_mb_max_linear_groups;
 		prefetch_grp = group;
+		nr = 0;
 
 		for (i = 0, new_cr = cr; i < ngroups; i++,
 		     ext4_mb_choose_next_group(ac, &new_cr, &group, ngroups)) {
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/5] ext4: add test_mb_mark_used_cost to estimate cost of mb_mark_used
  2024-03-26 21:38 [PATCH 0/5] Minor improvements and cleanups to ext4 mballoc Kemeng Shi
  2024-03-26 21:38 ` [PATCH 1/5] ext4: keep "prefetch_grp" and "nr" consistent Kemeng Shi
@ 2024-03-26 21:38 ` Kemeng Shi
  2024-03-29  7:26   ` kernel test robot
  2024-03-26 21:38 ` [PATCH 3/5] ext4: call ext4_mb_mark_free_simple in mb_mark_used to clear bits Kemeng Shi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Kemeng Shi @ 2024-03-26 21:38 UTC (permalink / raw
  To: tytso, adilger.kernel, linux-ext4, linux-kernel
  Cc: jack, ojaswin, ritesh.list

Add test_mb_mark_used_cost to estimate the runtime of mb_mark_used.

The result of the unit test is as follows:
$ ./tools/testing/kunit/kunit.py run --kunitconfig=fs/ext4/.kunitconfig --raw_output
...
        # Subtest: test_mb_mark_used_cost
    # test_mb_mark_used_cost: costed jiffies 311
        ok 1 block_bits=10 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
    # test_mb_mark_used_cost: costed jiffies 304
        ok 2 block_bits=12 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
        ok 3 block_bits=16 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
 # SKIP blocksize exceeds pagesize
    # test_mb_mark_used_cost.speed: slow
    # test_mb_mark_used_cost: pass:2 fail:0 skip:1 total:3
    ok 7 test_mb_mark_used_cost
...

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc-test.c | 56 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
index 044ca5238f41..cb1a551f9596 100644
--- a/fs/ext4/mballoc-test.c
+++ b/fs/ext4/mballoc-test.c
@@ -859,6 +859,56 @@ static void test_mb_free_blocks(struct kunit *test)
 	ext4_mb_unload_buddy(&e4b);
 }
 
+#define COUNT_FOR_ESTIMATE 1000000
+static void test_mb_mark_used_cost(struct kunit *test)
+{
+	struct ext4_buddy e4b;
+	struct super_block *sb = (struct super_block *)test->priv;
+	struct ext4_free_extent ex;
+	int ret;
+	struct test_range ranges[TEST_RANGE_COUNT];
+	int i, j;
+	unsigned long start, end, all = 0;
+
+	/* buddy cache assumes that each page contains at least one block */
+	if (sb->s_blocksize > PAGE_SIZE)
+		kunit_skip(test, "blocksize exceeds pagesize");
+
+	ret = ext4_mb_load_buddy(sb, TEST_GOAL_GROUP, &e4b);
+	KUNIT_ASSERT_EQ(test, ret, 0);
+
+	ex.fe_group = TEST_GOAL_GROUP;
+	for (j = 0; j < COUNT_FOR_ESTIMATE; j++) {
+		mbt_generate_test_ranges(sb, ranges, TEST_RANGE_COUNT);
+		start = jiffies;
+		for (i = 0; i < TEST_RANGE_COUNT; i++) {
+			if (ranges[i].len == 0)
+				continue;
+
+			ex.fe_start = ranges[i].start;
+			ex.fe_len = ranges[i].len;
+			ext4_lock_group(sb, TEST_GOAL_GROUP);
+			mb_mark_used(&e4b, &ex);
+			ext4_unlock_group(sb, TEST_GOAL_GROUP);
+		}
+		end = jiffies;
+		all += (end - start);
+
+		for (i = 0; i < TEST_RANGE_COUNT; i++) {
+			if (ranges[i].len == 0)
+				continue;
+
+			ext4_lock_group(sb, TEST_GOAL_GROUP);
+			mb_free_blocks(NULL, &e4b, ranges[i].start,
+				       ranges[i].len);
+			ext4_unlock_group(sb, TEST_GOAL_GROUP);
+		}
+	}
+
+	kunit_info(test, "costed jiffies %lu\n", all);
+	ext4_mb_unload_buddy(&e4b);
+}
+
 static const struct mbt_ext4_block_layout mbt_test_layouts[] = {
 	{
 		.blocksize_bits = 10,
@@ -894,6 +944,10 @@ static void mbt_show_layout(const struct mbt_ext4_block_layout *layout,
 }
 KUNIT_ARRAY_PARAM(mbt_layouts, mbt_test_layouts, mbt_show_layout);
 
+static const struct kunit_attributes slow_attr = {
+	.speed = KUNIT_SPEED_SLOW,
+};
+
 static struct kunit_case mbt_test_cases[] = {
 	KUNIT_CASE_PARAM(test_new_blocks_simple, mbt_layouts_gen_params),
 	KUNIT_CASE_PARAM(test_free_blocks_simple, mbt_layouts_gen_params),
@@ -901,6 +955,8 @@ static struct kunit_case mbt_test_cases[] = {
 	KUNIT_CASE_PARAM(test_mb_mark_used, mbt_layouts_gen_params),
 	KUNIT_CASE_PARAM(test_mb_free_blocks, mbt_layouts_gen_params),
 	KUNIT_CASE_PARAM(test_mark_diskspace_used, mbt_layouts_gen_params),
+	KUNIT_CASE_PARAM_ATTR(test_mb_mark_used_cost, mbt_layouts_gen_params,
+			      slow_attr),
 	{}
 };
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/5] ext4: call ext4_mb_mark_free_simple in mb_mark_used to clear bits
  2024-03-26 21:38 [PATCH 0/5] Minor improvements and cleanups to ext4 mballoc Kemeng Shi
  2024-03-26 21:38 ` [PATCH 1/5] ext4: keep "prefetch_grp" and "nr" consistent Kemeng Shi
  2024-03-26 21:38 ` [PATCH 2/5] ext4: add test_mb_mark_used_cost to estimate cost of mb_mark_used Kemeng Shi
@ 2024-03-26 21:38 ` Kemeng Shi
  2024-04-04 14:16   ` Jan Kara
  2024-03-26 21:38 ` [PATCH 4/5] ext4: use correct criteria name instead stale integer number in comment Kemeng Shi
  2024-03-26 21:38 ` [PATCH 5/5] ext4: expand next_linear_group to remove repeat check for linear scan Kemeng Shi
  4 siblings, 1 reply; 16+ messages in thread
From: Kemeng Shi @ 2024-03-26 21:38 UTC (permalink / raw
  To: tytso, adilger.kernel, linux-ext4, linux-kernel
  Cc: jack, ojaswin, ritesh.list

Function ext4_mb_mark_free_simple can find the order for bit clearing in
O(1) cost, while mb_mark_used searches for the order in O(distance from
chunk order to target order) and introduces unnecessary bit flips.

Consider we have 4 contiguous free bits and are going to mark bits 0-2 in use.
initial state of buddy bitmap:
order 2 |           0           |
order 1 |     1     |     1     |
order 0 |  1  |  1  |  1  |  1  |

mark whole chunk inuse
order 2 |           1           |
order 1 |     1     |     1     |
order 0 |  1  |  1  |  1  |  1  |

split chunk to order 1
order 2 |           1           |
order 1 |     0     |     0     |
order 0 |  1  |  1  |  1  |  1  |

set the first bit in order 1 to mark bit 0-1 inuse
set the second bit in order 1 for split
order 2 |           1           |
order 1 |     1     |     1     |
order 0 |  1  |  1  |  1  |  1  |

step 3: split the second bit in order 1 to order 0
order 2 |           1           |
order 1 |     1     |     1     |
order 0 |  1  |  1  |  0  |  0  |

step 4: set the third bit in order 0 to mark bit 2 inuse.
order 2 |           1           |
order 1 |     1     |     1     |
order 0 |  1  |  1  |  1  |  0  |
There are two unnecessary splits and three unnecessary bit flips.

With ext4_mb_mark_free_simple, we will clear the 4th bit in order 0
with O(1) search and no extra bit flip.
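
For illustration only, the buddy decomposition that ext4_mb_mark_free_simple
relies on can be sketched as the simplified userspace snippet below (not the
in-tree helper; it just prints which buddy bit would be cleared for a free
range [first, first + len)):

	#include <stdio.h>

	/* decompose [first, first + len) into buddy-aligned chunks */
	void mark_free_simple_sketch(unsigned int first, unsigned int len)
	{
		while (len) {
			/* largest order allowed by the alignment of 'first' */
			unsigned int align = first ? __builtin_ctz(first) : 31;
			/* largest order that still fits in the remaining length */
			unsigned int fit = 31 - __builtin_clz(len);
			unsigned int order = align < fit ? align : fit;

			/* a single bit clear covers the whole (1 << order) chunk */
			printf("clear bit %u at order %u\n", first >> order, order);

			first += 1u << order;
			len -= 1u << order;
		}
	}

For the example above (free range = bit 3, length 1) this selects order 0
directly, so only the 4th bit in order 0 is cleared.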

The cost estimated by test_mb_mark_used_cost is as follows:
Before (three runs of test):
    # test_mb_mark_used_cost: costed jiffies 311
    # test_mb_mark_used_cost: costed jiffies 304
    # test_mb_mark_used_cost: costed jiffies 305
    # test_mb_mark_used_cost: costed jiffies 323
    # test_mb_mark_used_cost: costed jiffies 317
    # test_mb_mark_used_cost: costed jiffies 317
After (three runs of test):
    # test_mb_mark_used_cost: costed jiffies 166
    # test_mb_mark_used_cost: costed jiffies 152
    # test_mb_mark_used_cost: costed jiffies 159
    # test_mb_mark_used_cost: costed jiffies 138
    # test_mb_mark_used_cost: costed jiffies 149

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a61fc52956b2..62d468379722 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2040,13 +2040,12 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
 	int ord;
 	int mlen = 0;
 	int max = 0;
-	int cur;
 	int start = ex->fe_start;
 	int len = ex->fe_len;
 	unsigned ret = 0;
 	int len0 = len;
 	void *buddy;
-	bool split = false;
+	int ord_start, ord_end;
 
 	BUG_ON(start + len > (e4b->bd_sb->s_blocksize << 3));
 	BUG_ON(e4b->bd_group != ex->fe_group);
@@ -2071,16 +2070,12 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
 
 	/* let's maintain buddy itself */
 	while (len) {
-		if (!split)
-			ord = mb_find_order_for_block(e4b, start);
+		ord = mb_find_order_for_block(e4b, start);
 
 		if (((start >> ord) << ord) == start && len >= (1 << ord)) {
 			/* the whole chunk may be allocated at once! */
 			mlen = 1 << ord;
-			if (!split)
-				buddy = mb_find_buddy(e4b, ord, &max);
-			else
-				split = false;
+			buddy = mb_find_buddy(e4b, ord, &max);
 			BUG_ON((start >> ord) >= max);
 			mb_set_bit(start >> ord, buddy);
 			e4b->bd_info->bb_counters[ord]--;
@@ -2094,20 +2089,28 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
 		if (ret == 0)
 			ret = len | (ord << 16);
 
-		/* we have to split large buddy */
 		BUG_ON(ord <= 0);
 		buddy = mb_find_buddy(e4b, ord, &max);
 		mb_set_bit(start >> ord, buddy);
 		e4b->bd_info->bb_counters[ord]--;
 
-		ord--;
-		cur = (start >> ord) & ~1U;
-		buddy = mb_find_buddy(e4b, ord, &max);
-		mb_clear_bit(cur, buddy);
-		mb_clear_bit(cur + 1, buddy);
-		e4b->bd_info->bb_counters[ord]++;
-		e4b->bd_info->bb_counters[ord]++;
-		split = true;
+		ord_start = (start >> ord) << ord;
+		ord_end = ord_start + (1 << ord);
+		if (start > ord_start)
+			ext4_mb_mark_free_simple(e4b->bd_sb, e4b->bd_buddy,
+						 ord_start, start - ord_start,
+						 e4b->bd_info);
+
+		if (start + len < ord_end) {
+			ext4_mb_mark_free_simple(e4b->bd_sb, e4b->bd_buddy,
+						 start + len,
+						 ord_end - (start + len),
+						 e4b->bd_info);
+			break;
+		}
+
+		len = start + len - ord_end;
+		start = ord_end;
 	}
 	mb_set_largest_free_order(e4b->bd_sb, e4b->bd_info);
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/5] ext4: use correct criteria name instead stale integer number in comment
  2024-03-26 21:38 [PATCH 0/5] Minor improvements and cleanups to ext4 mballoc Kemeng Shi
                   ` (2 preceding siblings ...)
  2024-03-26 21:38 ` [PATCH 3/5] ext4: call ext4_mb_mark_free_simple in mb_mark_used to clear bits Kemeng Shi
@ 2024-03-26 21:38 ` Kemeng Shi
  2024-03-29  7:15   ` Ojaswin Mujoo
  2024-04-04 14:19   ` Jan Kara
  2024-03-26 21:38 ` [PATCH 5/5] ext4: expand next_linear_group to remove repeat check for linear scan Kemeng Shi
  4 siblings, 2 replies; 16+ messages in thread
From: Kemeng Shi @ 2024-03-26 21:38 UTC (permalink / raw
  To: tytso, adilger.kernel, linux-ext4, linux-kernel
  Cc: jack, ojaswin, ritesh.list

Use correct criteria name instead of stale integer number in comment

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/ext4.h    | 15 ++++++++++++---
 fs/ext4/mballoc.c | 14 ++++++++------
 fs/ext4/mballoc.h |  4 ++--
 3 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 023571f8dd1b..9b90013c59a3 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -213,11 +213,20 @@ enum criteria {
 #define EXT4_MB_USE_RESERVED		0x2000
 /* Do strict check for free blocks while retrying block allocation */
 #define EXT4_MB_STRICT_CHECK		0x4000
-/* Large fragment size list lookup succeeded at least once for cr = 0 */
+/*
+ * Large fragment size list lookup succeeded at least once for cr =
+ * CR_POWER2_ALIGNED
+ */
 #define EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED		0x8000
-/* Avg fragment size rb tree lookup succeeded at least once for cr = 1 */
+/*
+ * Avg fragment size rb tree lookup succeeded at least once for cr =
+ * CR_GOAL_LEN_FAST
+ */
 #define EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED		0x00010000
-/* Avg fragment size rb tree lookup succeeded at least once for cr = 1.5 */
+/*
+ * Avg fragment size rb tree lookup succeeded at least once for cr =
+ * CR_BEST_AVAIL_LEN
+ */
 #define EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED		0x00020000
 
 struct ext4_allocation_request {
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 62d468379722..0f8a34513bf6 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1131,8 +1131,9 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
 		ext4_mb_choose_next_group_best_avail(ac, new_cr, group);
 	} else {
 		/*
-		 * TODO: For CR=2, we can arrange groups in an rb tree sorted by
-		 * bb_free. But until that happens, we should never come here.
+		 * TODO: For CR=CR_GOAL_LEN_SLOW, we can arrange groups in an
+		 * rb tree sorted by bb_free. But until that happens, we should
+		 * never come here.
 		 */
 		WARN_ON(1);
 	}
@@ -3444,10 +3445,11 @@ static int ext4_mb_init_backend(struct super_block *sb)
 	}
 	if (sbi->s_mb_prefetch > ext4_get_groups_count(sb))
 		sbi->s_mb_prefetch = ext4_get_groups_count(sb);
-	/* now many real IOs to prefetch within a single allocation at cr=0
-	 * given cr=0 is an CPU-related optimization we shouldn't try to
-	 * load too many groups, at some point we should start to use what
-	 * we've got in memory.
+	/*
+	 * now many real IOs to prefetch within a single allocation at
+	 * cr=CR_POWER2_ALIGNED. Given cr=CR_POWER2_ALIGNED is an CPU-related
+	 * optimization we shouldn't try to load too many groups, at some point
+	 * we should start to use what we've got in memory.
 	 * with an average random access time 5ms, it'd take a second to get
 	 * 200 groups (* N with flex_bg), so let's make this limit 4
 	 */
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index 56938532b4ce..042437d8860f 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -187,8 +187,8 @@ struct ext4_allocation_context {
 	struct ext4_free_extent ac_f_ex;
 
 	/*
-	 * goal len can change in CR1.5, so save the original len. This is
-	 * used while adjusting the PA window and for accounting.
+	 * goal len can change in CR_BEST_AVAIL_LEN, so save the original len.
+	 * This is used while adjusting the PA window and for accounting.
 	 */
 	ext4_grpblk_t	ac_orig_goal_len;
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/5] ext4: expand next_linear_group to remove repeat check for linear scan.
  2024-03-26 21:38 [PATCH 0/5] Minor improvements and cleanups to ext4 mballoc Kemeng Shi
                   ` (3 preceding siblings ...)
  2024-03-26 21:38 ` [PATCH 4/5] ext4: use correct criteria name instead stale integer number in comment Kemeng Shi
@ 2024-03-26 21:38 ` Kemeng Shi
  2024-03-29  7:14   ` Ojaswin Mujoo
  4 siblings, 1 reply; 16+ messages in thread
From: Kemeng Shi @ 2024-03-26 21:38 UTC (permalink / raw
  To: tytso, adilger.kernel, linux-ext4, linux-kernel
  Cc: jack, ojaswin, ritesh.list

Expand next_linear_group to remove repeat check for linear scan.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 37 ++++++-------------------------------
 1 file changed, 6 insertions(+), 31 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 0f8a34513bf6..561780a274cd 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1075,31 +1075,6 @@ static inline int should_optimize_scan(struct ext4_allocation_context *ac)
 	return 1;
 }
 
-/*
- * Return next linear group for allocation. If linear traversal should not be
- * performed, this function just returns the same group
- */
-static ext4_group_t
-next_linear_group(struct ext4_allocation_context *ac, ext4_group_t group,
-		  ext4_group_t ngroups)
-{
-	if (!should_optimize_scan(ac))
-		goto inc_and_return;
-
-	if (ac->ac_groups_linear_remaining) {
-		ac->ac_groups_linear_remaining--;
-		goto inc_and_return;
-	}
-
-	return group;
-inc_and_return:
-	/*
-	 * Artificially restricted ngroups for non-extent
-	 * files makes group > ngroups possible on first loop.
-	 */
-	return group + 1 >= ngroups ? 0 : group + 1;
-}
-
 /*
  * ext4_mb_choose_next_group: choose next group for allocation.
  *
@@ -1118,12 +1093,12 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
 {
 	*new_cr = ac->ac_criteria;
 
-	if (!should_optimize_scan(ac) || ac->ac_groups_linear_remaining) {
-		*group = next_linear_group(ac, *group, ngroups);
-		return;
-	}
-
-	if (*new_cr == CR_POWER2_ALIGNED) {
+	if (!should_optimize_scan(ac))
+		*group = *group + 1 >= ngroups ? 0 : *group + 1;
+	else if (ac->ac_groups_linear_remaining) {
+		ac->ac_groups_linear_remaining--;
+		*group = *group + 1 >= ngroups ? 0 : *group + 1;
+	} else if (*new_cr == CR_POWER2_ALIGNED) {
 		ext4_mb_choose_next_group_p2_aligned(ac, new_cr, group);
 	} else if (*new_cr == CR_GOAL_LEN_FAST) {
 		ext4_mb_choose_next_group_goal_fast(ac, new_cr, group);
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] ext4: expand next_linear_group to remove repeat check for linear scan.
  2024-03-26 21:38 ` [PATCH 5/5] ext4: expand next_linear_group to remove repeat check for linear scan Kemeng Shi
@ 2024-03-29  7:14   ` Ojaswin Mujoo
  2024-04-03  6:57     ` Kemeng Shi
  0 siblings, 1 reply; 16+ messages in thread
From: Ojaswin Mujoo @ 2024-03-29  7:14 UTC (permalink / raw
  To: Kemeng Shi
  Cc: tytso, adilger.kernel, linux-ext4, linux-kernel, jack,
	ritesh.list

On Wed, Mar 27, 2024 at 05:38:23AM +0800, Kemeng Shi wrote:
> Expand next_linear_group to remove repeat check for linear scan.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/ext4/mballoc.c | 37 ++++++-------------------------------
>  1 file changed, 6 insertions(+), 31 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 0f8a34513bf6..561780a274cd 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1075,31 +1075,6 @@ static inline int should_optimize_scan(struct ext4_allocation_context *ac)
>   return 1;
>  }
>  
> -/*
> - * Return next linear group for allocation. If linear traversal should not be
> - * performed, this function just returns the same group
> - */
> -static ext4_group_t
> -next_linear_group(struct ext4_allocation_context *ac, ext4_group_t group,
> -     ext4_group_t ngroups)
> -{
> - if (!should_optimize_scan(ac))
> -   goto inc_and_return;
> -
> - if (ac->ac_groups_linear_remaining) {
> -   ac->ac_groups_linear_remaining--;
> -   goto inc_and_return;
> - }
> -
> - return group;
> -inc_and_return:
> - /*
> -  * Artificially restricted ngroups for non-extent
> -  * files makes group > ngroups possible on first loop.
> -  */
> - return group + 1 >= ngroups ? 0 : group + 1;
> -}
> -
>  /*
>   * ext4_mb_choose_next_group: choose next group for allocation.
>   *
> @@ -1118,12 +1093,12 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
>  {
>   *new_cr = ac->ac_criteria;
>  
> - if (!should_optimize_scan(ac) || ac->ac_groups_linear_remaining) {
> -   *group = next_linear_group(ac, *group, ngroups);
> -   return;
> - }
> -
> - if (*new_cr == CR_POWER2_ALIGNED) {
> + if (!should_optimize_scan(ac))
> +   *group = *group + 1 >= ngroups ? 0 : *group + 1;
> + else if (ac->ac_groups_linear_remaining) {
> +   ac->ac_groups_linear_remaining--;
> +   *group = *group + 1 >= ngroups ? 0 : *group + 1;
> + } else if (*new_cr == CR_POWER2_ALIGNED) {


Hi Kemeng, thanks for the cleanups

I feel that open coding this logic and having a single if for the linear scan
and non-linear scan cases makes the code a bit harder to follow, and we are
losing some comments as well.

Since our main aim is to avoid the double checking, maybe we can keep
next_linear_group() strictly for getting the next linear group correctly and
keep the rest of the checks outside. So something like:

static ext4_group_t
next_linear_group(ext4_group_t group, ext4_group_t ngroups)
{

  /*
   * Artificially restricted ngroups for non-extent
   * files makes group > ngroups possible on first loop.
   */
  return group + 1 >= ngroups ? 0 : group + 1;
}

static void ext4_mb_choose_next_group(...)
{
  ...

  /*
   * Fallback to linear scan when optimized scanning is disabled
   */
  if (!should_optimize_scan(ac)) {
    *group = next_linear_group(*group, ngroups);
    return;
  }

  /*
   * Optimized scanning can return non adjacent groups which can cause
   * seek overhead for rotational disks. So try few linear groups before 
   * trying optimized scan.
   */
  if (ac->ac_groups_linear_remaining) {
    *group = next_linear_group(*group, ngroups);
    ac->ac_groups_linear_remaining--;
    return;
  }
  
  ...
}

Let me know your thoughts.

Regards,
ojaswin

>     ext4_mb_choose_next_group_p2_aligned(ac, new_cr, group);
>   } else if (*new_cr == CR_GOAL_LEN_FAST) {
>     ext4_mb_choose_next_group_goal_fast(ac, new_cr, group);
> -- 
> 2.30.0
> 




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] ext4: use correct criteria name instead stale integer number in comment
  2024-03-26 21:38 ` [PATCH 4/5] ext4: use correct criteria name instead stale integer number in comment Kemeng Shi
@ 2024-03-29  7:15   ` Ojaswin Mujoo
  2024-04-04 14:19   ` Jan Kara
  1 sibling, 0 replies; 16+ messages in thread
From: Ojaswin Mujoo @ 2024-03-29  7:15 UTC (permalink / raw
  To: Kemeng Shi
  Cc: tytso, adilger.kernel, linux-ext4, linux-kernel, jack,
	ritesh.list

On Wed, Mar 27, 2024 at 05:38:22AM +0800, Kemeng Shi wrote:
> Use correct criteria name instead of stale integer number in comment
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/ext4/ext4.h    | 15 ++++++++++++---
>  fs/ext4/mballoc.c | 14 ++++++++------
>  fs/ext4/mballoc.h |  4 ++--
>  3 files changed, 22 insertions(+), 11 deletions(-)
> 

Thanks for the cleanup! Feel free to add:

Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>

> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 023571f8dd1b..9b90013c59a3 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -213,11 +213,20 @@ enum criteria {
>  #define EXT4_MB_USE_RESERVED		0x2000
>  /* Do strict check for free blocks while retrying block allocation */
>  #define EXT4_MB_STRICT_CHECK		0x4000
> -/* Large fragment size list lookup succeeded at least once for cr = 0 */
> +/*
> + * Large fragment size list lookup succeeded at least once for cr =
> + * CR_POWER2_ALIGNED
> + */
>  #define EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED		0x8000
> -/* Avg fragment size rb tree lookup succeeded at least once for cr = 1 */
> +/*
> + * Avg fragment size rb tree lookup succeeded at least once for cr =
> + * CR_GOAL_LEN_FAST
> + */
>  #define EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED		0x00010000
> -/* Avg fragment size rb tree lookup succeeded at least once for cr = 1.5 */
> +/*
> + * Avg fragment size rb tree lookup succeeded at least once for cr =
> + * CR_BEST_AVAIL_LEN
> + */
>  #define EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED		0x00020000
>  
>  struct ext4_allocation_request {
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 62d468379722..0f8a34513bf6 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1131,8 +1131,9 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
>  		ext4_mb_choose_next_group_best_avail(ac, new_cr, group);
>  	} else {
>  		/*
> -		 * TODO: For CR=2, we can arrange groups in an rb tree sorted by
> -		 * bb_free. But until that happens, we should never come here.
> +		 * TODO: For CR=CR_GOAL_LEN_SLOW, we can arrange groups in an
> +		 * rb tree sorted by bb_free. But until that happens, we should
> +		 * never come here.
>  		 */
>  		WARN_ON(1);
>  	}
> @@ -3444,10 +3445,11 @@ static int ext4_mb_init_backend(struct super_block *sb)
>  	}
>  	if (sbi->s_mb_prefetch > ext4_get_groups_count(sb))
>  		sbi->s_mb_prefetch = ext4_get_groups_count(sb);
> -	/* now many real IOs to prefetch within a single allocation at cr=0
> -	 * given cr=0 is an CPU-related optimization we shouldn't try to
> -	 * load too many groups, at some point we should start to use what
> -	 * we've got in memory.
> +	/*
> +	 * now many real IOs to prefetch within a single allocation at
> +	 * cr=CR_POWER2_ALIGNED. Given cr=CR_POWER2_ALIGNED is an CPU-related
> +	 * optimization we shouldn't try to load too many groups, at some point
> +	 * we should start to use what we've got in memory.
>  	 * with an average random access time 5ms, it'd take a second to get
>  	 * 200 groups (* N with flex_bg), so let's make this limit 4
>  	 */
> diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
> index 56938532b4ce..042437d8860f 100644
> --- a/fs/ext4/mballoc.h
> +++ b/fs/ext4/mballoc.h
> @@ -187,8 +187,8 @@ struct ext4_allocation_context {
>  	struct ext4_free_extent ac_f_ex;
>  
>  	/*
> -	 * goal len can change in CR1.5, so save the original len. This is
> -	 * used while adjusting the PA window and for accounting.
> +	 * goal len can change in CR_BEST_AVAIL_LEN, so save the original len.
> +	 * This is used while adjusting the PA window and for accounting.
>  	 */
>  	ext4_grpblk_t	ac_orig_goal_len;
>  
> -- 
> 2.30.0
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] ext4: add test_mb_mark_used_cost to estimate cost of mb_mark_used
  2024-03-26 21:38 ` [PATCH 2/5] ext4: add test_mb_mark_used_cost to estimate cost of mb_mark_used Kemeng Shi
@ 2024-03-29  7:26   ` kernel test robot
  0 siblings, 0 replies; 16+ messages in thread
From: kernel test robot @ 2024-03-29  7:26 UTC (permalink / raw
  To: Kemeng Shi, tytso, adilger.kernel, linux-ext4, linux-kernel
  Cc: oe-kbuild-all, jack, ojaswin, ritesh.list

Hi Kemeng,

kernel test robot noticed the following build errors:

[auto build test ERROR on tytso-ext4/dev]
[also build test ERROR on linus/master v6.9-rc1 next-20240328]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Kemeng-Shi/ext4-keep-prefetch_grp-and-nr-consistent/20240326-214754
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
patch link:    https://lore.kernel.org/r/20240326213823.528302-3-shikemeng%40huaweicloud.com
patch subject: [PATCH 2/5] ext4: add test_mb_mark_used_cost to estimate cost of mb_mark_used
config: x86_64-randconfig-072-20240329 (https://download.01.org/0day-ci/archive/20240329/202403291544.lxme27eF-lkp@intel.com/config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240329/202403291544.lxme27eF-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202403291544.lxme27eF-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/kunit/static_stub.h:18:0,
                    from fs/ext4/mballoc.c:21:
>> fs/ext4/mballoc-test.c:959:10: error: initializer element is not constant
             slow_attr),
             ^
   include/kunit/test.h:218:13: note: in definition of macro 'KUNIT_CASE_PARAM_ATTR'
        .attr = attributes, .module_name = KBUILD_MODNAME}
                ^~~~~~~~~~
   fs/ext4/mballoc-test.c:959:10: note: (near initialization for 'mbt_test_cases[6].attr')
             slow_attr),
             ^
   include/kunit/test.h:218:13: note: in definition of macro 'KUNIT_CASE_PARAM_ATTR'
        .attr = attributes, .module_name = KBUILD_MODNAME}
                ^~~~~~~~~~


vim +959 fs/ext4/mballoc-test.c

   950	
   951	static struct kunit_case mbt_test_cases[] = {
   952		KUNIT_CASE_PARAM(test_new_blocks_simple, mbt_layouts_gen_params),
   953		KUNIT_CASE_PARAM(test_free_blocks_simple, mbt_layouts_gen_params),
   954		KUNIT_CASE_PARAM(test_mb_generate_buddy, mbt_layouts_gen_params),
   955		KUNIT_CASE_PARAM(test_mb_mark_used, mbt_layouts_gen_params),
   956		KUNIT_CASE_PARAM(test_mb_free_blocks, mbt_layouts_gen_params),
   957		KUNIT_CASE_PARAM(test_mark_diskspace_used, mbt_layouts_gen_params),
   958		KUNIT_CASE_PARAM_ATTR(test_mb_mark_used_cost, mbt_layouts_gen_params,
 > 959				      slow_attr),
   960		{}
   961	};
   962	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] ext4: keep "prefetch_grp" and "nr" consistent
  2024-03-26 21:38 ` [PATCH 1/5] ext4: keep "prefetch_grp" and "nr" consistent Kemeng Shi
@ 2024-03-29  7:52   ` Ojaswin Mujoo
  2024-04-04 13:22   ` Jan Kara
  1 sibling, 0 replies; 16+ messages in thread
From: Ojaswin Mujoo @ 2024-03-29  7:52 UTC (permalink / raw
  To: Kemeng Shi
  Cc: tytso, adilger.kernel, linux-ext4, linux-kernel, jack,
	ritesh.list

On Wed, Mar 27, 2024 at 05:38:19AM +0800, Kemeng Shi wrote:
> Keep "prefetch_grp" and "nr" consistent to avoid to call
> ext4_mb_prefetch_fini with non-prefetched groups.
> When we step into next criteria, "prefetch_grp" is set to prefetch start
> of new criteria while "nr" is number of the prefetched group in previous
> criteria. If previous criteria and next criteria are both inexpensive
> (< CR_GOAL_LEN_SLOW) and prefetch_ios reachs sbi->s_mb_prefetch_limit
> in previous criteria, "prefetch_grp" and "nr" will be inconsistent and
> may introduce unexpected cost to do ext4_mb_init_group for non-prefetched
> groups.
> Reset "nr" to 0 when we reset "prefetch_grp" to start of prefech in next
> criteria to ensure "prefetch_grp" and "nr" are consistent.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>  fs/ext4/mballoc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 12b3f196010b..a61fc52956b2 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2856,6 +2856,7 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>  		group = ac->ac_g_ex.fe_group;
>  		ac->ac_groups_linear_remaining = sbi->s_mb_max_linear_groups;
>  		prefetch_grp = group;
> +		nr = 0;

Looks good, feel free to add:

Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>

To add some thoughts, I think the way we depend on group and nr being in
sync for prefetch and prefetch_fini to work is very fragile and I've
seen some issues there in the past as well. I believe a better approach
would be to use some kind of list which is populated with group nos by
prefetch() and then looked up by prefetch_fini() to initialize the buddy
and other data structures.
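
Very roughly, I'm imagining something like the sketch below (illustrative
only - all the new names here are made up and error handling is omitted):

	struct mb_prefetched_group {
		struct list_head	node;
		ext4_group_t		group;
	};

	/* prefetch side: remember every group whose bitmap read was started */
	list_add_tail(&pg->node, &ac->ac_prefetched_groups);

	/* fini side: init buddies only for groups that were really prefetched */
	list_for_each_entry_safe(pg, tmp, &ac->ac_prefetched_groups, node) {
		ext4_mb_init_group(sb, pg->group, GFP_NOFS);
		list_del(&pg->node);
		kfree(pg);
	}

That way prefetch_fini() never has to guess which groups the prefetch pass
actually touched.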

As the comment on ext4_mb_prefetch_fini() suggests, we can also look
into eliminating this function completely by using a bg worker queue
kicked off when prefetch of the bitmap is complete, although we'll also
need to take care of the other caller of these functions, i.e. the
lazy initialization thread that prefetches block bitmaps in the BG
(ext4_lazyinit_thread()).

Regards,
ojaswin

>  
>  		for (i = 0, new_cr = cr; i < ngroups; i++,
>  		     ext4_mb_choose_next_group(ac, &new_cr, &group, ngroups)) {
> -- 
> 2.30.0
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] ext4: expand next_linear_group to remove repeat check for linear scan.
  2024-03-29  7:14   ` Ojaswin Mujoo
@ 2024-04-03  6:57     ` Kemeng Shi
  0 siblings, 0 replies; 16+ messages in thread
From: Kemeng Shi @ 2024-04-03  6:57 UTC (permalink / raw
  To: Ojaswin Mujoo
  Cc: tytso, adilger.kernel, linux-ext4, linux-kernel, jack,
	ritesh.list



on 3/29/2024 3:14 PM, Ojaswin Mujoo wrote:
> On Wed, Mar 27, 2024 at 05:38:23AM +0800, Kemeng Shi wrote:
>> Expand next_linear_group to remove repeat check for linear scan.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>  fs/ext4/mballoc.c | 37 ++++++-------------------------------
>>  1 file changed, 6 insertions(+), 31 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 0f8a34513bf6..561780a274cd 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1075,31 +1075,6 @@ static inline int should_optimize_scan(struct ext4_allocation_context *ac)
>>   return 1;
>>  }
>>  
>> -/*
>> - * Return next linear group for allocation. If linear traversal should not be
>> - * performed, this function just returns the same group
>> - */
>> -static ext4_group_t
>> -next_linear_group(struct ext4_allocation_context *ac, ext4_group_t group,
>> -     ext4_group_t ngroups)
>> -{
>> - if (!should_optimize_scan(ac))
>> -   goto inc_and_return;
>> -
>> - if (ac->ac_groups_linear_remaining) {
>> -   ac->ac_groups_linear_remaining--;
>> -   goto inc_and_return;
>> - }
>> -
>> - return group;
>> -inc_and_return:
>> - /*
>> -  * Artificially restricted ngroups for non-extent
>> -  * files makes group > ngroups possible on first loop.
>> -  */
>> - return group + 1 >= ngroups ? 0 : group + 1;
>> -}
>> -
>>  /*
>>   * ext4_mb_choose_next_group: choose next group for allocation.
>>   *
>> @@ -1118,12 +1093,12 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
>>  {
>>   *new_cr = ac->ac_criteria;
>>  
>> - if (!should_optimize_scan(ac) || ac->ac_groups_linear_remaining) {
>> -   *group = next_linear_group(ac, *group, ngroups);
>> -   return;
>> - }
>> -
>> - if (*new_cr == CR_POWER2_ALIGNED) {
>> + if (!should_optimize_scan(ac))
>> +   *group = *group + 1 >= ngroups ? 0 : *group + 1;
>> + else if (ac->ac_groups_linear_remaining) {
>> +   ac->ac_groups_linear_remaining--;
>> +   *group = *group + 1 >= ngroups ? 0 : *group + 1;
>> + } else if (*new_cr == CR_POWER2_ALIGNED) {
> 
> 
> Hi Kemeng, thanks for the cleanups
> 
> I feel that open coding this logic and having a single if for the linear scan
> and non-linear scan cases makes the code a bit harder to follow, and we are
> losing some comments as well.
> 
> Since our main aim is to avoid the double checking, maybe we can keep
> next_linear_group() strictly for getting the next linear group correctly and
> keep the rest of the checks outside. So something like:
> 
> static ext4_group_t
> next_linear_group(ext4_group_t group, ext4_group_t ngroups)
> {
> 
>   /*
>    * Artificially restricted ngroups for non-extent
>    * files makes group > ngroups possible on first loop.
>    */
>   return group + 1 >= ngroups ? 0 : group + 1;
> }
> 
> static void ext4_mb_choose_next_group(...)
> {
>   ...
> 
>   /*
>    * Fallback to linear scan when optimized scanning is disabled
>    */
>   if (!should_optimize_scan(ac)) {
>     *group = next_linear_group(*group, ngroups);
>     return;
>   }
> 
>   /*
>    * Optimized scanning can return non adjacent groups which can cause
>    * seek overhead for rotational disks. So try few linear groups before 
>    * trying optimized scan.
>    */
>   if (ac->ac_groups_linear_remaining) {
>     *group = next_linear_group(*group, ngroups);
>     ac->ac_groups_linear_remaining--;
>     return;
>   }
>   
>   ...
> }
> 
> Let me know your thoughts.
This makes sense to me. I will do it in the next version. Thanks for the advice.

Kemeng
> 
> Regards,
> ojaswin
> 
>>     ext4_mb_choose_next_group_p2_aligned(ac, new_cr, group);
>>   } else if (*new_cr == CR_GOAL_LEN_FAST) {
>>     ext4_mb_choose_next_group_goal_fast(ac, new_cr, group);
>> -- 
>> 2.30.0
>>
> 
> 
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] ext4: keep "prefetch_grp" and "nr" consistent
  2024-03-26 21:38 ` [PATCH 1/5] ext4: keep "prefetch_grp" and "nr" consistent Kemeng Shi
  2024-03-29  7:52   ` Ojaswin Mujoo
@ 2024-04-04 13:22   ` Jan Kara
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Kara @ 2024-04-04 13:22 UTC (permalink / raw
  To: Kemeng Shi
  Cc: tytso, adilger.kernel, linux-ext4, linux-kernel, jack, ojaswin,
	ritesh.list

On Wed 27-03-24 05:38:19, Kemeng Shi wrote:
> Keep "prefetch_grp" and "nr" consistent to avoid to call
> ext4_mb_prefetch_fini with non-prefetched groups.
> When we step into next criteria, "prefetch_grp" is set to prefetch start
> of new criteria while "nr" is number of the prefetched group in previous
> criteria. If previous criteria and next criteria are both inexpensive
> (< CR_GOAL_LEN_SLOW) and prefetch_ios reachs sbi->s_mb_prefetch_limit
> in previous criteria, "prefetch_grp" and "nr" will be inconsistent and
> may introduce unexpected cost to do ext4_mb_init_group for non-prefetched
> groups.
> Reset "nr" to 0 when we reset "prefetch_grp" to start of prefech in next
> criteria to ensure "prefetch_grp" and "nr" are consistent.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/mballoc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 12b3f196010b..a61fc52956b2 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2856,6 +2856,7 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>  		group = ac->ac_g_ex.fe_group;
>  		ac->ac_groups_linear_remaining = sbi->s_mb_max_linear_groups;
>  		prefetch_grp = group;
> +		nr = 0;
>  
>  		for (i = 0, new_cr = cr; i < ngroups; i++,
>  		     ext4_mb_choose_next_group(ac, &new_cr, &group, ngroups)) {
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] ext4: call ext4_mb_mark_free_simple in mb_mark_used to clear bits
  2024-03-26 21:38 ` [PATCH 3/5] ext4: call ext4_mb_mark_free_simple in mb_mark_used to clear bits Kemeng Shi
@ 2024-04-04 14:16   ` Jan Kara
  2024-04-07  6:31     ` Kemeng Shi
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2024-04-04 14:16 UTC (permalink / raw
  To: Kemeng Shi
  Cc: tytso, adilger.kernel, linux-ext4, linux-kernel, jack, ojaswin,
	ritesh.list

On Wed 27-03-24 05:38:21, Kemeng Shi wrote:
> Function ext4_mb_mark_free_simple can find the order for bit clearing in
> O(1) cost, while mb_mark_used searches for the order in O(distance from
> chunk order to target order) and introduces unnecessary bit flips.

Let me see if I understand you right. I agree that mb_mark_used() is
actually O(log(bitmap_size)^2) because each call to
mb_find_order_for_block() is O(log(bitmap_size)). Do I understand your
concern right?
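
(For reference, a rough and simplified sketch of what mb_find_order_for_block()
does - not the verbatim in-tree code - is to walk the per-order buddy bitmaps
upwards until it finds a free covering bit:

	for (order = 1; order <= max_order; order++) {
		buddy = mb_find_buddy(e4b, order, &max);
		if (!mb_test_bit(block >> order, buddy))
			return order;
	}
	return 0;

so a single call can take up to max_order ~ log(bitmap_size) steps.)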

> Consider we have 4 contiguous free bits and are going to mark bits 0-2 in use.
> initial state of buddy bitmap:
> order 2 |           0           |
> order 1 |     1     |     1     |
> order 0 |  1  |  1  |  1  |  1  |
>
> mark whole chunk inuse
> order 2 |           1           |
> order 1 |     1     |     1     |
> order 0 |  1  |  1  |  1  |  1  |
> 
> split chunk to order 1
> order 2 |           1           |
> order 1 |     0     |     0     |
> order 0 |  1  |  1  |  1  |  1  |
> 
> set the first bit in order 1 to mark bit 0-1 inuse
> set the second bit in order 1 for split
> order 2 |           1           |
> order 1 |     1     |     1     |
> order 0 |  1  |  1  |  1  |  1  |
> 
> step 3: split the second bit in order 1 to order 0
> order 2 |           1           |
> order 1 |     1     |     1     |
> order 0 |  1  |  1  |  0  |  0  |
> 
> step 4: set the third bit in order 0 to mark bit 2 inuse.
> order 2 |           1           |
> order 1 |     1     |     1     |
> order 0 |  1  |  1  |  1  |  0  |
> There are two unnecessary splits and three unnecessary bit flips.
> 
> With ext4_mb_mark_free_simple, we will clear the 4th bit in order 0
> with O(1) search and no extra bit flip.

However this looks like a bit ugly way to speed it up, I'm not even sure
this would result in practical speedups and asymptotically, I think the
complexity is still O(log^2). Also the extra bit flips are not really a
concern I'd say as they are in the same cacheline anyway. The unnecessary
overhead (if at all measurable) comes from the O(log^2) behavior. And there
I agree we could do better by not starting the block order search from 1 in
all the cases - we know the found order will be first increasing for some
time and then decreasing again so with some effort we could amortize all
block order searches to O(log) time. But it makes the code more complex and
I'm not convinced this is all worth it. So if you want to go in this direction,
then please provide (micro-)benchmarks from real hardware (not just
theoretical cost estimations) showing the benefit. Thanks.

								Honza

> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index a61fc52956b2..62d468379722 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2040,13 +2040,12 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>  	int ord;
>  	int mlen = 0;
>  	int max = 0;
> -	int cur;
>  	int start = ex->fe_start;
>  	int len = ex->fe_len;
>  	unsigned ret = 0;
>  	int len0 = len;
>  	void *buddy;
> -	bool split = false;
> +	int ord_start, ord_end;
>  
>  	BUG_ON(start + len > (e4b->bd_sb->s_blocksize << 3));
>  	BUG_ON(e4b->bd_group != ex->fe_group);
> @@ -2071,16 +2070,12 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>  
>  	/* let's maintain buddy itself */
>  	while (len) {
> -		if (!split)
> -			ord = mb_find_order_for_block(e4b, start);
> +		ord = mb_find_order_for_block(e4b, start);
>  
>  		if (((start >> ord) << ord) == start && len >= (1 << ord)) {
>  			/* the whole chunk may be allocated at once! */
>  			mlen = 1 << ord;
> -			if (!split)
> -				buddy = mb_find_buddy(e4b, ord, &max);
> -			else
> -				split = false;
> +			buddy = mb_find_buddy(e4b, ord, &max);
>  			BUG_ON((start >> ord) >= max);
>  			mb_set_bit(start >> ord, buddy);
>  			e4b->bd_info->bb_counters[ord]--;
> @@ -2094,20 +2089,28 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>  		if (ret == 0)
>  			ret = len | (ord << 16);
>  
> -		/* we have to split large buddy */
>  		BUG_ON(ord <= 0);
>  		buddy = mb_find_buddy(e4b, ord, &max);
>  		mb_set_bit(start >> ord, buddy);
>  		e4b->bd_info->bb_counters[ord]--;
>  
> -		ord--;
> -		cur = (start >> ord) & ~1U;
> -		buddy = mb_find_buddy(e4b, ord, &max);
> -		mb_clear_bit(cur, buddy);
> -		mb_clear_bit(cur + 1, buddy);
> -		e4b->bd_info->bb_counters[ord]++;
> -		e4b->bd_info->bb_counters[ord]++;
> -		split = true;
> +		ord_start = (start >> ord) << ord;
> +		ord_end = ord_start + (1 << ord);
> +		if (start > ord_start)
> +			ext4_mb_mark_free_simple(e4b->bd_sb, e4b->bd_buddy,
> +						 ord_start, start - ord_start,
> +						 e4b->bd_info);
> +
> +		if (start + len < ord_end) {
> +			ext4_mb_mark_free_simple(e4b->bd_sb, e4b->bd_buddy,
> +						 start + len,
> +						 ord_end - (start + len),
> +						 e4b->bd_info);
> +			break;
> +		}
> +
> +		len = start + len - ord_end;
> +		start = ord_end;
>  	}
>  	mb_set_largest_free_order(e4b->bd_sb, e4b->bd_info);
>  
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] ext4: use correct criteria name instead stale integer number in comment
  2024-03-26 21:38 ` [PATCH 4/5] ext4: use correct criteria name instead stale integer number in comment Kemeng Shi
  2024-03-29  7:15   ` Ojaswin Mujoo
@ 2024-04-04 14:19   ` Jan Kara
  2024-04-07  3:21     ` Kemeng Shi
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Kara @ 2024-04-04 14:19 UTC (permalink / raw
  To: Kemeng Shi
  Cc: tytso, adilger.kernel, linux-ext4, linux-kernel, jack, ojaswin,
	ritesh.list

On Wed 27-03-24 05:38:22, Kemeng Shi wrote:
> Use correct criteria name instead of stale integer number in comment
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

Looks good. But since the symbolic names already have CR prefix, we
probably don't have to write e.g.:

/* Large fragment size list lookup succeeded at least once for cr =
 * CR_POWER2_ALIGNED */

But we can write directly:

/* Large fragment size list lookup succeeded at least once for
 * CR_POWER2_ALIGNED */

								Honza

> ---
>  fs/ext4/ext4.h    | 15 ++++++++++++---
>  fs/ext4/mballoc.c | 14 ++++++++------
>  fs/ext4/mballoc.h |  4 ++--
>  3 files changed, 22 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 023571f8dd1b..9b90013c59a3 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -213,11 +213,20 @@ enum criteria {
>  #define EXT4_MB_USE_RESERVED		0x2000
>  /* Do strict check for free blocks while retrying block allocation */
>  #define EXT4_MB_STRICT_CHECK		0x4000
> -/* Large fragment size list lookup succeeded at least once for cr = 0 */
> +/*
> + * Large fragment size list lookup succeeded at least once for cr =
> + * CR_POWER2_ALIGNED
> + */
>  #define EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED		0x8000
> -/* Avg fragment size rb tree lookup succeeded at least once for cr = 1 */
> +/*
> + * Avg fragment size rb tree lookup succeeded at least once for cr =
> + * CR_GOAL_LEN_FAST
> + */
>  #define EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED		0x00010000
> -/* Avg fragment size rb tree lookup succeeded at least once for cr = 1.5 */
> +/*
> + * Avg fragment size rb tree lookup succeeded at least once for cr =
> + * CR_BEST_AVAIL_LEN
> + */
>  #define EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED		0x00020000
>  
>  struct ext4_allocation_request {
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 62d468379722..0f8a34513bf6 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -1131,8 +1131,9 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
>  		ext4_mb_choose_next_group_best_avail(ac, new_cr, group);
>  	} else {
>  		/*
> -		 * TODO: For CR=2, we can arrange groups in an rb tree sorted by
> -		 * bb_free. But until that happens, we should never come here.
> +		 * TODO: For CR=CR_GOAL_LEN_SLOW, we can arrange groups in an
> +		 * rb tree sorted by bb_free. But until that happens, we should
> +		 * never come here.
>  		 */
>  		WARN_ON(1);
>  	}
> @@ -3444,10 +3445,11 @@ static int ext4_mb_init_backend(struct super_block *sb)
>  	}
>  	if (sbi->s_mb_prefetch > ext4_get_groups_count(sb))
>  		sbi->s_mb_prefetch = ext4_get_groups_count(sb);
> -	/* now many real IOs to prefetch within a single allocation at cr=0
> -	 * given cr=0 is an CPU-related optimization we shouldn't try to
> -	 * load too many groups, at some point we should start to use what
> -	 * we've got in memory.
> +	/*
> +	 * now many real IOs to prefetch within a single allocation at
> +	 * cr=CR_POWER2_ALIGNED. Given cr=CR_POWER2_ALIGNED is an CPU-related
> +	 * optimization we shouldn't try to load too many groups, at some point
> +	 * we should start to use what we've got in memory.
>  	 * with an average random access time 5ms, it'd take a second to get
>  	 * 200 groups (* N with flex_bg), so let's make this limit 4
>  	 */
> diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
> index 56938532b4ce..042437d8860f 100644
> --- a/fs/ext4/mballoc.h
> +++ b/fs/ext4/mballoc.h
> @@ -187,8 +187,8 @@ struct ext4_allocation_context {
>  	struct ext4_free_extent ac_f_ex;
>  
>  	/*
> -	 * goal len can change in CR1.5, so save the original len. This is
> -	 * used while adjusting the PA window and for accounting.
> +	 * goal len can change in CR_BEST_AVAIL_LEN, so save the original len.
> +	 * This is used while adjusting the PA window and for accounting.
>  	 */
>  	ext4_grpblk_t	ac_orig_goal_len;
>  
> -- 
> 2.30.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] ext4: use correct criteria name instead stale integer number in comment
  2024-04-04 14:19   ` Jan Kara
@ 2024-04-07  3:21     ` Kemeng Shi
  0 siblings, 0 replies; 16+ messages in thread
From: Kemeng Shi @ 2024-04-07  3:21 UTC (permalink / raw
  To: Jan Kara
  Cc: tytso, adilger.kernel, linux-ext4, linux-kernel, ojaswin,
	ritesh.list



on 4/4/2024 10:19 PM, Jan Kara wrote:
> On Wed 27-03-24 05:38:22, Kemeng Shi wrote:
>> Use correct criteria name instead of stale integer number in comment
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> 
> Looks good. But since the symbolic names already have CR prefix, we
> probably don't have to write e.g.:
> 
> /* Large fragment size list lookup succeeded at least once for cr =
>  * CR_POWER2_ALIGNED */
> 
> But we can write directly:
> 
> /* Large fragment size list lookup succeeded at least once for
>  * CR_POWER2_ALIGNED */
Sure, will do it in the next version. Thanks.

Kemeng
> 
> 								Honza
> 
>> ---
>>  fs/ext4/ext4.h    | 15 ++++++++++++---
>>  fs/ext4/mballoc.c | 14 ++++++++------
>>  fs/ext4/mballoc.h |  4 ++--
>>  3 files changed, 22 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 023571f8dd1b..9b90013c59a3 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -213,11 +213,20 @@ enum criteria {
>>  #define EXT4_MB_USE_RESERVED		0x2000
>>  /* Do strict check for free blocks while retrying block allocation */
>>  #define EXT4_MB_STRICT_CHECK		0x4000
>> -/* Large fragment size list lookup succeeded at least once for cr = 0 */
>> +/*
>> + * Large fragment size list lookup succeeded at least once for cr =
>> + * CR_POWER2_ALIGNED
>> + */
>>  #define EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED		0x8000
>> -/* Avg fragment size rb tree lookup succeeded at least once for cr = 1 */
>> +/*
>> + * Avg fragment size rb tree lookup succeeded at least once for cr =
>> + * CR_GOAL_LEN_FAST
>> + */
>>  #define EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED		0x00010000
>> -/* Avg fragment size rb tree lookup succeeded at least once for cr = 1.5 */
>> +/*
>> + * Avg fragment size rb tree lookup succeeded at least once for cr =
>> + * CR_BEST_AVAIL_LEN
>> + */
>>  #define EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED		0x00020000
>>  
>>  struct ext4_allocation_request {
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index 62d468379722..0f8a34513bf6 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -1131,8 +1131,9 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
>>  		ext4_mb_choose_next_group_best_avail(ac, new_cr, group);
>>  	} else {
>>  		/*
>> -		 * TODO: For CR=2, we can arrange groups in an rb tree sorted by
>> -		 * bb_free. But until that happens, we should never come here.
>> +		 * TODO: For CR=CR_GOAL_LEN_SLOW, we can arrange groups in an
>> +		 * rb tree sorted by bb_free. But until that happens, we should
>> +		 * never come here.
>>  		 */
>>  		WARN_ON(1);
>>  	}
>> @@ -3444,10 +3445,11 @@ static int ext4_mb_init_backend(struct super_block *sb)
>>  	}
>>  	if (sbi->s_mb_prefetch > ext4_get_groups_count(sb))
>>  		sbi->s_mb_prefetch = ext4_get_groups_count(sb);
>> -	/* now many real IOs to prefetch within a single allocation at cr=0
>> -	 * given cr=0 is an CPU-related optimization we shouldn't try to
>> -	 * load too many groups, at some point we should start to use what
>> -	 * we've got in memory.
>> +	/*
>> +	 * now many real IOs to prefetch within a single allocation at
>> +	 * cr=CR_POWER2_ALIGNED. Given cr=CR_POWER2_ALIGNED is an CPU-related
>> +	 * optimization we shouldn't try to load too many groups, at some point
>> +	 * we should start to use what we've got in memory.
>>  	 * with an average random access time 5ms, it'd take a second to get
>>  	 * 200 groups (* N with flex_bg), so let's make this limit 4
>>  	 */
>> diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
>> index 56938532b4ce..042437d8860f 100644
>> --- a/fs/ext4/mballoc.h
>> +++ b/fs/ext4/mballoc.h
>> @@ -187,8 +187,8 @@ struct ext4_allocation_context {
>>  	struct ext4_free_extent ac_f_ex;
>>  
>>  	/*
>> -	 * goal len can change in CR1.5, so save the original len. This is
>> -	 * used while adjusting the PA window and for accounting.
>> +	 * goal len can change in CR_BEST_AVAIL_LEN, so save the original len.
>> +	 * This is used while adjusting the PA window and for accounting.
>>  	 */
>>  	ext4_grpblk_t	ac_orig_goal_len;
>>  
>> -- 
>> 2.30.0
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] ext4: call ext4_mb_mark_free_simple in mb_mark_used to clear bits
  2024-04-04 14:16   ` Jan Kara
@ 2024-04-07  6:31     ` Kemeng Shi
  0 siblings, 0 replies; 16+ messages in thread
From: Kemeng Shi @ 2024-04-07  6:31 UTC (permalink / raw
  To: Jan Kara
  Cc: tytso, adilger.kernel, linux-ext4, linux-kernel, ojaswin,
	ritesh.list



on 4/4/2024 10:16 PM, Jan Kara wrote:
> On Wed 27-03-24 05:38:21, Kemeng Shi wrote:
>> Function ext4_mb_mark_free_simple could search order for bit clearing in
>> O(1) cost while mb_mark_used will search order in O(distance from chunk
>> order to target order) and introduce unnecessary bit flips.
> 
> Let me see if I understand you right. I agree that mb_mark_used() is
> actually O(log(bitmap_size)^2) because each call to
> mb_find_order_for_block() is O(log(bitmap_size)). Do I understand your
> concern right?
Sorry for the confusion. Actually, my concern is the number of bit-clear
operations done after mb_find_order_for_block to mark the partial part of
a block chunk free.

In mb_mark_used, we find a free chunk and mark it in use. For a chunk in
the middle of the passed range, we can simply mark the whole chunk in use.
For a chunk at the end of the range, we need to mark only a partial part
of the chunk in use. To do that, we first mark the whole chunk in use and
then do several rounds of bit-clear work ("mb_find_buddy(...);
mb_clear_bit(...); ...") to mark the partial part of the chunk free again.
The number of calls to "mb_find_buddy(); ..." is [order of free chunk] -
[last order to free partial part of chunk], which is what I meant by
"distance from chunk order to target order" in the changelog.

The repeated "mb_find_buddy(...); ..." work aims to mark a contiguous range
of blocks free, which is exactly the work ext4_mb_mark_free_simple already
does, and it does so in a more effective way than the freeing code in
mb_mark_used. So we can simply find the range that needs to be set free in
the chunk at the end of the range and call ext4_mb_mark_free_simple to
reuse the effective, existing code to free a contiguous range of blocks.

> 
>> Consider we have 4 continuous free bits and going to mark bit 0-2 inuse.
>> initial state of buddy bitmap:
>> order 2 |           0           |
>> order 1 |     1     |     1     |
>> order 0 |  1  |  1  |  1  |  1  |
>>
>> mark whole chunk inuse
>> order 2 |           1           |
>> order 1 |     1     |     1     |
>> order 0 |  1  |  1  |  1  |  1  |
>>
>> split chunk to order 1
>> order 2 |           1           |
>> order 1 |     0     |     0     |
>> order 0 |  1  |  1  |  1  |  1  |
>>
>> set the first bit in order 1 to mark bit 0-1 inuse
>> set the second bit in order 1 for split
>> order 2 |           1           |
>> order 1 |     1     |     1     |
>> order 0 |  1  |  1  |  1  |  1  |
>>
>> step 3: split the second bit in order 1 to order 0
>> order 2 |           1           |
>> order 1 |     1     |     1     |
>> order 0 |  1  |  1  |  0  |  0  |
>>
>> step 4: set the third bit in order 0 to mark bit 2 inuse.
>> order 2 |           1           |
>> order 1 |     1     |     1     |
>> order 0 |  1  |  1  |  1  |  0  |
>> There are two unnecessary splits and three unnecessary bit flips.
>>
>> With ext4_mb_mark_free_simple, we will clear the 4th bit in order 0
>> with O(1) search and no extra bit flip.
> 
> However this looks like a bit ugly way to speed it up, I'm not even sure
> this would result in practical speedups and asymptotically, I think the
> complexity is still O(log^2). Also the extra bit flips are not really a
> concern I'd say as they are in the same cacheline anyway. The unnecessary
> overhead (if at all measurable) comes from the O(log^2) behavior. And there
> I agree we could do better by not starting the block order search from 1 in
> all the cases - we know the found order will be first increasing for some
> time and then decreasing again so with some effort we could amortize all
> block order searches to O(log) time. But it makes the code more complex and
> I'm not convinced this is all worth it. So if you want to go in this direction,
> then please provide (micro-)benchmarks from real hardware (not just
> theoretical cost estimations) showing the benefit. Thanks.
> 
> 								Honza
> 
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index a61fc52956b2..62d468379722 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -2040,13 +2040,12 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>>  	int ord;
>>  	int mlen = 0;
>>  	int max = 0;
>> -	int cur;
>>  	int start = ex->fe_start;
>>  	int len = ex->fe_len;
>>  	unsigned ret = 0;
>>  	int len0 = len;
>>  	void *buddy;
>> -	bool split = false;
>> +	int ord_start, ord_end;
>>  
>>  	BUG_ON(start + len > (e4b->bd_sb->s_blocksize << 3));
>>  	BUG_ON(e4b->bd_group != ex->fe_group);
>> @@ -2071,16 +2070,12 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>>  
>>  	/* let's maintain buddy itself */
>>  	while (len) {
>> -		if (!split)
>> -			ord = mb_find_order_for_block(e4b, start);
>> +		ord = mb_find_order_for_block(e4b, start);
>>  
>>  		if (((start >> ord) << ord) == start && len >= (1 << ord)) {
>>  			/* the whole chunk may be allocated at once! */
>>  			mlen = 1 << ord;
>> -			if (!split)
>> -				buddy = mb_find_buddy(e4b, ord, &max);
>> -			else
>> -				split = false;
>> +			buddy = mb_find_buddy(e4b, ord, &max);
>>  			BUG_ON((start >> ord) >= max);
>>  			mb_set_bit(start >> ord, buddy);
>>  			e4b->bd_info->bb_counters[ord]--;
>> @@ -2094,20 +2089,28 @@ static int mb_mark_used(struct ext4_buddy *e4b, struct ext4_free_extent *ex)
>>  		if (ret == 0)
>>  			ret = len | (ord << 16);
>>  
>> -		/* we have to split large buddy */
>>  		BUG_ON(ord <= 0);
>>  		buddy = mb_find_buddy(e4b, ord, &max);
>>  		mb_set_bit(start >> ord, buddy);
>>  		e4b->bd_info->bb_counters[ord]--;
>>  
>> -		ord--;
>> -		cur = (start >> ord) & ~1U;
>> -		buddy = mb_find_buddy(e4b, ord, &max);
>> -		mb_clear_bit(cur, buddy);
>> -		mb_clear_bit(cur + 1, buddy);
>> -		e4b->bd_info->bb_counters[ord]++;
>> -		e4b->bd_info->bb_counters[ord]++;
>> -		split = true;
>> +		ord_start = (start >> ord) << ord;
>> +		ord_end = ord_start + (1 << ord);
>> +		if (start > ord_start)
>> +			ext4_mb_mark_free_simple(e4b->bd_sb, e4b->bd_buddy,
>> +						 ord_start, start - ord_start,
>> +						 e4b->bd_info);
>> +
>> +		if (start + len < ord_end) {
>> +			ext4_mb_mark_free_simple(e4b->bd_sb, e4b->bd_buddy,
>> +						 start + len,
>> +						 ord_end - (start + len),
>> +						 e4b->bd_info);
>> +			break;
>> +		}
>> +
>> +		len = start + len - ord_end;
>> +		start = ord_end;
>>  	}
>>  	mb_set_largest_free_order(e4b->bd_sb, e4b->bd_info);
>>  
>> -- 
>> 2.30.0
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread
