Linux-mm Archive mirror
* Re: [PATCH 0/9] Fixes and cleanups to compaction
  2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
@ 2023-08-05  3:14 ` Matthew Wilcox
  2023-08-05  4:07   ` Kemeng Shi
  2023-08-05 11:07 ` [PATCH 2/9] mm/compaction: call list_is_{first}/{last} more intuitively in move_freelist_{head}/{tail} Kemeng Shi
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Matthew Wilcox @ 2023-08-05  3:14 UTC (permalink / raw)
  To: Kemeng Shi; +Cc: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david

On Sat, Aug 05, 2023 at 07:07:02PM +0800, Kemeng Shi wrote:
> Hi all, this is another series to do fix and clean up to compaction.
> Patch 1-2 fix and clean up freepage list operation.
> Patch 3-4 fix and clean up isolation of freepages
> Patch 7-9 factor code to check if compaction is needed for allocation
> order.
> More details can be found in respective patches. Thanks!

As with your last patch series, half of the patches are missing.
Looks like they didn't make it to lore.kernel.org either:

https://lore.kernel.org/linux-mm/20230804110454.2935878-1-shikemeng@huaweicloud.com/
https://lore.kernel.org/linux-mm/20230805110711.2975149-1-shikemeng@huaweicloud.com/


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/9] Fixes and cleanups to compaction
  2023-08-05  3:14 ` Matthew Wilcox
@ 2023-08-05  4:07   ` Kemeng Shi
  0 siblings, 0 replies; 33+ messages in thread
From: Kemeng Shi @ 2023-08-05  4:07 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david

Hi Matthew, thanks for informing me of this. You can find the full series on
lore.kernel.org at [1] and [2].
I contacted owner-linux-mm@kvack.org and was told that all patches were received and
forwarded successfully. Then I contacted meta@public-inbox.org, which is the only email
I found on the help page of lore.kernel.org/linux-mm/, but there has been no response yet.
Please let me know if there is any other way I can get help. Thanks!

[1] https://lore.kernel.org/all/20230804110454.2935878-1-shikemeng@huaweicloud.com/
[2] https://lore.kernel.org/all/20230805110711.2975149-1-shikemeng@huaweicloud.com/

on 8/5/2023 11:14 AM, Matthew Wilcox wrote:
> On Sat, Aug 05, 2023 at 07:07:02PM +0800, Kemeng Shi wrote:
>> Hi all, this is another series to do fix and clean up to compaction.
>> Patch 1-2 fix and clean up freepage list operation.
>> Patch 3-4 fix and clean up isolation of freepages
>> Patch 7-9 factor code to check if compaction is needed for allocation
>> order.
>> More details can be found in respective patches. Thanks!
> 
> As with your last patch series, half of the patches are missing.
> Looks like they didn't make it to lore.kernel.org either:
> 
> https://lore.kernel.org/linux-mm/20230804110454.2935878-1-shikemeng@huaweicloud.com/
> https://lore.kernel.org/linux-mm/20230805110711.2975149-1-shikemeng@huaweicloud.com/
> 
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 0/9] Fixes and cleanups to compaction
@ 2023-08-05 11:07 Kemeng Shi
  2023-08-05  3:14 ` Matthew Wilcox
                   ` (8 more replies)
  0 siblings, 9 replies; 33+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
  To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng

Hi all, this is another series of fixes and cleanups to compaction.
Patches 1-2 fix and clean up the freepage list operations.
Patches 3-4 fix and clean up the isolation of freepages.
Patches 7-9 factor out the code that checks whether compaction is needed for an
allocation order.
More details can be found in the respective patches. Thanks!

Kemeng Shi (9):
  mm/compaction: use correct list in move_freelist_{head}/{tail}
  mm/compaction: call list_is_{first}/{last} more intuitively in
    move_freelist_{head}/{tail}
  mm/compaction: correctly return failure with bogus compound_order in
    strict mode
  mm/compaction: simplify pfn iteration in isolate_freepages_range
  mm/compaction: remove repeat compact_blockskip_flush check in
    reset_isolation_suitable
  mm/compaction: rename is_via_compact_memory to
    compaction_with_allocation_order
  mm/compaction: factor out code to test if we should run compaction for
    target order
  mm/compaction: call compaction_suit_allocation_order in
    kcompactd_node_suitable
  mm/compaction: call compaction_suit_allocation_order in
    kcompactd_do_work

 mm/compaction.c | 119 +++++++++++++++++++++++-------------------------
 1 file changed, 57 insertions(+), 62 deletions(-)

-- 
2.30.0



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 2/9] mm/compaction: call list_is_{first}/{last} more intuitively in move_freelist_{head}/{tail}
  2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
  2023-08-05  3:14 ` Matthew Wilcox
@ 2023-08-05 11:07 ` Kemeng Shi
  2023-08-15  7:49   ` Baolin Wang
  2023-08-05 11:07 ` [PATCH 5/9] mm/compaction: remove repeat compact_blockskip_flush check in reset_isolation_suitable Kemeng Shi
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
  To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng

We use move_freelist_head after list_for_each_entry_reverse to skip
recent pages, and there is no need to do an actual move if all freepages
were searched in list_for_each_entry_reverse, i.e. freepage points to the
first page in the freelist. It is more intuitive to call list_is_first
with the list entry as the first argument and the list head as the second
argument to check whether the list entry is the first one, instead of
calling list_is_last with the list entry and list head passed in reverse
order.

Similarly, calling list_is_last in move_freelist_tail is more intuitive.
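
For reference, the two helpers, roughly as they are defined in include/linux/list.h
(the list entry is the first argument, the list head the second):

    static inline int list_is_first(const struct list_head *list,
                                    const struct list_head *head)
    {
            return list->prev == head;
    }

    static inline int list_is_last(const struct list_head *list,
                                   const struct list_head *head)
    {
            return list->next == head;
    }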

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 mm/compaction.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 513b1caeb4fa..fa1b100b0d10 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1395,7 +1395,7 @@ move_freelist_head(struct list_head *freelist, struct page *freepage)
 {
 	LIST_HEAD(sublist);
 
-	if (!list_is_last(freelist, &freepage->buddy_list)) {
+	if (!list_is_first(&freepage->buddy_list, freelist)) {
 		list_cut_before(&sublist, freelist, &freepage->buddy_list);
 		list_splice_tail(&sublist, freelist);
 	}
@@ -1412,7 +1412,7 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
 {
 	LIST_HEAD(sublist);
 
-	if (!list_is_first(freelist, &freepage->buddy_list)) {
+	if (!list_is_last(&freepage->buddy_list, freelist)) {
 		list_cut_position(&sublist, freelist, &freepage->buddy_list);
 		list_splice_tail(&sublist, freelist);
 	}
-- 
2.30.0



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 5/9] mm/compaction: remove repeat compact_blockskip_flush check in reset_isolation_suitable
  2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
  2023-08-05  3:14 ` Matthew Wilcox
  2023-08-05 11:07 ` [PATCH 2/9] mm/compaction: call list_is_{first}/{last} more intuitively in move_freelist_{head}/{tail} Kemeng Shi
@ 2023-08-05 11:07 ` Kemeng Shi
  2023-08-15  8:42   ` Baolin Wang
  2023-08-05 11:07 ` [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order Kemeng Shi
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
  To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng

We already have a compact_blockskip_flush check in __reset_isolation_suitable,
so just remove the repeated compact_blockskip_flush check done before calling
__reset_isolation_suitable in reset_isolation_suitable.
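
For context, the existing check sits at the top of __reset_isolation_suitable();
paraphrased and abridged from mm/compaction.c around this series:

    static void __reset_isolation_suitable(struct zone *zone)
    {
            ...
            /* Only flush if a full compaction finished recently */
            if (!zone->compact_blockskip_flush)
                    return;

            zone->compact_blockskip_flush = false;
            ...
    }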

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 mm/compaction.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 8d7d38073d30..d8416d3dd445 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -434,9 +434,7 @@ void reset_isolation_suitable(pg_data_t *pgdat)
 		if (!populated_zone(zone))
 			continue;
 
-		/* Only flush if a full compaction finished recently */
-		if (zone->compact_blockskip_flush)
-			__reset_isolation_suitable(zone);
+		__reset_isolation_suitable(zone);
 	}
 }
 
-- 
2.30.0



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order
  2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
                   ` (2 preceding siblings ...)
  2023-08-05 11:07 ` [PATCH 5/9] mm/compaction: remove repeat compact_blockskip_flush check in reset_isolation_suitable Kemeng Shi
@ 2023-08-05 11:07 ` Kemeng Shi
  2023-08-15  8:58   ` Baolin Wang
  2023-08-05 11:07 ` [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order Kemeng Shi
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
  To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng

We have order = -1 via proactive compaction as well, so is_via_compact_memory
is no longer a proper name.
As cc->order informs compaction to satisfy an allocation with that order,
rename it to compaction_with_allocation_order.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 mm/compaction.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index d8416d3dd445..b5a699ed526b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2055,12 +2055,11 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
 }
 
 /*
- * order == -1 is expected when compacting via
- * /proc/sys/vm/compact_memory
+ * compact to satisfy allocation with target order
  */
-static inline bool is_via_compact_memory(int order)
+static inline bool compaction_with_allocation_order(int order)
 {
-	return order == -1;
+	return order != -1;
 }
 
 /*
@@ -2200,7 +2199,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
 		goto out;
 	}
 
-	if (is_via_compact_memory(cc->order))
+	if (!compaction_with_allocation_order(cc->order))
 		return COMPACT_CONTINUE;
 
 	/*
@@ -2390,7 +2389,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 
 	cc->migratetype = gfp_migratetype(cc->gfp_mask);
 
-	if (!is_via_compact_memory(cc->order)) {
+	if (compaction_with_allocation_order(cc->order)) {
 		unsigned long watermark;
 
 		/* Allocation can already succeed, nothing to do */
-- 
2.30.0



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order
  2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
                   ` (3 preceding siblings ...)
  2023-08-05 11:07 ` [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order Kemeng Shi
@ 2023-08-05 11:07 ` Kemeng Shi
  2023-08-15  8:53   ` Baolin Wang
  2023-08-05 11:07 ` [PATCH 9/9] mm/compaction: call compaction_suit_allocation_order in kcompactd_do_work Kemeng Shi
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
  To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng

We always do the zone_watermark_ok check and the compaction_suitable check
together to test whether compaction for the target order should be run.
Factor this code out in preparation for removing the repeated code.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 mm/compaction.c | 42 +++++++++++++++++++++++++++++-------------
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index b5a699ed526b..26787ebb0297 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2365,6 +2365,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 	return false;
 }
 
+/*
+ * Should we do compaction for target allocation order.
+ * Return COMPACT_SUCCESS if allocation for target order can be already
+ * satisfied
+ * Return COMPACT_SKIPPED if compaction for target order is likely to fail
+ * Return COMPACT_CONTINUE if compaction for target order should be runned
+ */
+static inline enum compact_result
+compaction_suit_allocation_order(struct zone *zone, unsigned int order,
+				 int highest_zoneidx, unsigned int alloc_flags)
+{
+	unsigned long watermark;
+
+	watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
+	if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
+			      alloc_flags))
+		return COMPACT_SUCCESS;
+
+	if (!compaction_suitable(zone, order, highest_zoneidx))
+		return COMPACT_SKIPPED;
+
+	return COMPACT_CONTINUE;
+}
+
 static enum compact_result
 compact_zone(struct compact_control *cc, struct capture_control *capc)
 {
@@ -2390,19 +2414,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 	cc->migratetype = gfp_migratetype(cc->gfp_mask);
 
 	if (compaction_with_allocation_order(cc->order)) {
-		unsigned long watermark;
-
-		/* Allocation can already succeed, nothing to do */
-		watermark = wmark_pages(cc->zone,
-					cc->alloc_flags & ALLOC_WMARK_MASK);
-		if (zone_watermark_ok(cc->zone, cc->order, watermark,
-				      cc->highest_zoneidx, cc->alloc_flags))
-			return COMPACT_SUCCESS;
-
-		/* Compaction is likely to fail */
-		if (!compaction_suitable(cc->zone, cc->order,
-					 cc->highest_zoneidx))
-			return COMPACT_SKIPPED;
+		ret = compaction_suit_allocation_order(cc->zone, cc->order,
+						       cc->highest_zoneidx,
+						       cc->alloc_flags);
+		if (ret != COMPACT_CONTINUE)
+			return ret;
 	}
 
 	/*
-- 
2.30.0



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 9/9] mm/compaction: call compaction_suit_allocation_order in kcompactd_do_work
  2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
                   ` (4 preceding siblings ...)
  2023-08-05 11:07 ` [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order Kemeng Shi
@ 2023-08-05 11:07 ` Kemeng Shi
       [not found] ` <20230805110711.2975149-2-shikemeng@huaweicloud.com>
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 33+ messages in thread
From: Kemeng Shi @ 2023-08-05 11:07 UTC (permalink / raw)
  To: linux-mm, linux-kernel, akpm, baolin.wang, mgorman, david; +Cc: shikemeng

Use the compaction_suit_allocation_order helper in kcompactd_do_work to
remove the repeated code.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 mm/compaction.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index f4b6c520038a..05c27a1ef68a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2963,12 +2963,8 @@ static void kcompactd_do_work(pg_data_t *pgdat)
 		if (compaction_deferred(zone, cc.order))
 			continue;
 
-		/* Allocation can already succeed, nothing to do */
-		if (zone_watermark_ok(zone, cc.order,
-				      min_wmark_pages(zone), zoneid, 0))
-			continue;
-
-		if (!compaction_suitable(zone, cc.order, zoneid))
+		if (compaction_suit_allocation_order(zone,
+				cc.order, zoneid, 0) != COMPACT_CONTINUE)
 			continue;
 
 		if (kthread_should_stop())
-- 
2.30.0



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/9] mm/compaction: use correct list in move_freelist_{head}/{tail}
       [not found] ` <20230805110711.2975149-2-shikemeng@huaweicloud.com>
@ 2023-08-05 17:11   ` Andrew Morton
  2023-08-07  0:37     ` Kemeng Shi
  2023-08-15  7:16   ` Baolin Wang
  1 sibling, 1 reply; 33+ messages in thread
From: Andrew Morton @ 2023-08-05 17:11 UTC (permalink / raw)
  To: Kemeng Shi; +Cc: linux-mm, linux-kernel, baolin.wang, mgorman, david

On Sat,  5 Aug 2023 19:07:03 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:

> The freepage is chained with buddy_list in freelist head. Use buddy_list
> instead of lru to correct the list operation.
> 
> ...
>
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1395,8 +1395,8 @@ move_freelist_head(struct list_head *freelist, struct page *freepage)
>  {
>  	LIST_HEAD(sublist);
>  
> -	if (!list_is_last(freelist, &freepage->lru)) {
> -		list_cut_before(&sublist, freelist, &freepage->lru);
> +	if (!list_is_last(freelist, &freepage->buddy_list)) {
> +		list_cut_before(&sublist, freelist, &freepage->buddy_list);
>  		list_splice_tail(&sublist, freelist);
>  	}
>  }
> @@ -1412,8 +1412,8 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
>  {
>  	LIST_HEAD(sublist);
>  
> -	if (!list_is_first(freelist, &freepage->lru)) {
> -		list_cut_position(&sublist, freelist, &freepage->lru);
> +	if (!list_is_first(freelist, &freepage->buddy_list)) {
> +		list_cut_position(&sublist, freelist, &freepage->buddy_list);
>  		list_splice_tail(&sublist, freelist);
>  	}
>  }

This looks like a significant error.  Can we speculate about the
possible runtime effects?



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/9] mm/compaction: use correct list in move_freelist_{head}/{tail}
  2023-08-05 17:11   ` [PATCH 1/9] mm/compaction: use correct list in move_freelist_{head}/{tail} Andrew Morton
@ 2023-08-07  0:37     ` Kemeng Shi
  0 siblings, 0 replies; 33+ messages in thread
From: Kemeng Shi @ 2023-08-07  0:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-mm, linux-kernel, baolin.wang, mgorman, david



on 8/6/2023 1:11 AM, Andrew Morton wrote:
> On Sat,  5 Aug 2023 19:07:03 +0800 Kemeng Shi <shikemeng@huaweicloud.com> wrote:
> 
>> The freepage is chained with buddy_list in freelist head. Use buddy_list
>> instead of lru to correct the list operation.
>>
>> ...
>>
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1395,8 +1395,8 @@ move_freelist_head(struct list_head *freelist, struct page *freepage)
>>  {
>>  	LIST_HEAD(sublist);
>>  
>> -	if (!list_is_last(freelist, &freepage->lru)) {
>> -		list_cut_before(&sublist, freelist, &freepage->lru);
>> +	if (!list_is_last(freelist, &freepage->buddy_list)) {
>> +		list_cut_before(&sublist, freelist, &freepage->buddy_list);
>>  		list_splice_tail(&sublist, freelist);
>>  	}
>>  }
>> @@ -1412,8 +1412,8 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
>>  {
>>  	LIST_HEAD(sublist);
>>  
>> -	if (!list_is_first(freelist, &freepage->lru)) {
>> -		list_cut_position(&sublist, freelist, &freepage->lru);
>> +	if (!list_is_first(freelist, &freepage->buddy_list)) {
>> +		list_cut_position(&sublist, freelist, &freepage->buddy_list);
>>  		list_splice_tail(&sublist, freelist);
>>  	}
>>  }
> 
> This looks like a significant error.  Can we speculate about the
> possible runtime effects?
> 

It seems there are no runtime effects for now, as lru and buddy_list share
the same memory address in a union.
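
For reference, an abridged sketch of the relevant union in struct page (see
include/linux/mm_types.h; surrounding fields omitted):

    struct page {
            unsigned long flags;
            union {
                    struct {        /* Page cache and anonymous pages */
                            union {
                                    struct list_head lru;        /* LRU / pagecache list */
                                    struct list_head buddy_list; /* free page on a buddy freelist */
                                    struct list_head pcp_list;   /* free page on a PCP list */
                            };
                            ...
                    };
                    ...
            };
            ...
    };

so &page->lru and &page->buddy_list resolve to the same address.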



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/9] mm/compaction: use correct list in move_freelist_{head}/{tail}
       [not found] ` <20230805110711.2975149-2-shikemeng@huaweicloud.com>
  2023-08-05 17:11   ` [PATCH 1/9] mm/compaction: use correct list in move_freelist_{head}/{tail} Andrew Morton
@ 2023-08-15  7:16   ` Baolin Wang
  1 sibling, 0 replies; 33+ messages in thread
From: Baolin Wang @ 2023-08-15  7:16 UTC (permalink / raw)
  To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david



On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> The freepage is chained with buddy_list in freelist head. Use buddy_list
> instead of lru to correct the list operation.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

> ---
>   mm/compaction.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index ea61922a1619..513b1caeb4fa 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1395,8 +1395,8 @@ move_freelist_head(struct list_head *freelist, struct page *freepage)
>   {
>   	LIST_HEAD(sublist);
>   
> -	if (!list_is_last(freelist, &freepage->lru)) {
> -		list_cut_before(&sublist, freelist, &freepage->lru);
> +	if (!list_is_last(freelist, &freepage->buddy_list)) {
> +		list_cut_before(&sublist, freelist, &freepage->buddy_list);
>   		list_splice_tail(&sublist, freelist);
>   	}
>   }
> @@ -1412,8 +1412,8 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
>   {
>   	LIST_HEAD(sublist);
>   
> -	if (!list_is_first(freelist, &freepage->lru)) {
> -		list_cut_position(&sublist, freelist, &freepage->lru);
> +	if (!list_is_first(freelist, &freepage->buddy_list)) {
> +		list_cut_position(&sublist, freelist, &freepage->buddy_list);
>   		list_splice_tail(&sublist, freelist);
>   	}
>   }


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/9] mm/compaction: call list_is_{first}/{last} more intuitively in move_freelist_{head}/{tail}
  2023-08-05 11:07 ` [PATCH 2/9] mm/compaction: call list_is_{first}/{last} more intuitively in move_freelist_{head}/{tail} Kemeng Shi
@ 2023-08-15  7:49   ` Baolin Wang
  0 siblings, 0 replies; 33+ messages in thread
From: Baolin Wang @ 2023-08-15  7:49 UTC (permalink / raw)
  To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david



On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> We use move_freelist_head after list_for_each_entry_reverse to skip
> recent pages. And there is no need to do actual move if all freepages
> are searched in list_for_each_entry_reverse, e.g. freepage point to
> first page in freelist. It's more intuitively to call list_is_first
> with list entry as the first argument and list head as the second
> argument to check if list entry is the first list entry instead of
> call list_is_last with list entry and list head passed in reverse.
> 
> Similarly, call list_is_last in move_freelist_tail is more intuitively.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

Make sense to me.
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

> ---
>   mm/compaction.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 513b1caeb4fa..fa1b100b0d10 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1395,7 +1395,7 @@ move_freelist_head(struct list_head *freelist, struct page *freepage)
>   {
>   	LIST_HEAD(sublist);
>   
> -	if (!list_is_last(freelist, &freepage->buddy_list)) {
> +	if (!list_is_first(&freepage->buddy_list, freelist)) {
>   		list_cut_before(&sublist, freelist, &freepage->buddy_list);
>   		list_splice_tail(&sublist, freelist);
>   	}
> @@ -1412,7 +1412,7 @@ move_freelist_tail(struct list_head *freelist, struct page *freepage)
>   {
>   	LIST_HEAD(sublist);
>   
> -	if (!list_is_first(freelist, &freepage->buddy_list)) {
> +	if (!list_is_last(&freepage->buddy_list, freelist)) {
>   		list_cut_position(&sublist, freelist, &freepage->buddy_list);
>   		list_splice_tail(&sublist, freelist);
>   	}


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/9] mm/compaction: correctly return failure with bogus compound_order in strict mode
       [not found] ` <20230805110711.2975149-4-shikemeng@huaweicloud.com>
@ 2023-08-15  8:28   ` Baolin Wang
  2023-08-15  9:22     ` Kemeng Shi
  0 siblings, 1 reply; 33+ messages in thread
From: Baolin Wang @ 2023-08-15  8:28 UTC (permalink / raw)
  To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david



On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> In strict mode, we should return 0 if there is any hole in pageblock. If
> we successfully isolated pages at beginning at pageblock and then have a
> bogus compound_order outside pageblock in next page. We will abort search
> loop with blockpfn > end_pfn. Although we will limit blockpfn to end_pfn,
> we will treat it as a successful isolation in strict mode as blockpfn is
> not < end_pfn and return partial isolated pages. Then
> isolate_freepages_range may success unexpectly with hole in isolated
> range.

Yes, that can happen.

> This patch also removes unnecessary limit for blockpfn to go outside
> by buddy page introduced in fixed commit or by stride introduced after
> fixed commit. Caller could use returned blockpfn to check if full
> pageblock is scanned by test if blockpfn >= end and to get next pfn to
> scan inside isolate_freepages_block on demand.

IMO, I don't think removing the pageblock restriction is worth it, since
it does not fix anything and will make people more confused, at least me.

That is to say, it would be surprising that blockpfn can go outside of
the pageblock after calling isolate_freepages_block() to scan only one
pageblock, and I have not looked in detail at whether this can cause other
potential problems.

> Fixes: 9fcd6d2e052ee ("mm, compaction: skip compound pages by order in free scanner")
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>   mm/compaction.c | 21 +++++++++++----------
>   1 file changed, 11 insertions(+), 10 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index fa1b100b0d10..684f6e6cd8bc 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -631,6 +631,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>   				page += (1UL << order) - 1;
>   				nr_scanned += (1UL << order) - 1;
>   			}
> +			/*
> +			 * There is a tiny chance that we have read bogus
> +			 * compound_order(), so be careful to not go outside
> +			 * of the pageblock.
> +			 */
> +			if (unlikely(blockpfn >= end_pfn))
> +				blockpfn = end_pfn - 1;

So we can just add this validation to ensure that
isolate_freepages_block() returns 0 if a failure happens, which can fix
your problem.

> +
>   			goto isolate_fail;
>   		}
>   
> @@ -677,17 +685,10 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>   	if (locked)
>   		spin_unlock_irqrestore(&cc->zone->lock, flags);
>   
> -	/*
> -	 * There is a tiny chance that we have read bogus compound_order(),
> -	 * so be careful to not go outside of the pageblock.
> -	 */
> -	if (unlikely(blockpfn > end_pfn))
> -		blockpfn = end_pfn;
> -
>   	trace_mm_compaction_isolate_freepages(*start_pfn, blockpfn,
>   					nr_scanned, total_isolated);
>   
> -	/* Record how far we have got within the block */
> +	/* Record how far we have got */
>   	*start_pfn = blockpfn;
>   
>   	/*
> @@ -1443,7 +1444,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
>   	isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
>   
>   	/* Skip this pageblock in the future as it's full or nearly full */
> -	if (start_pfn == end_pfn && !cc->no_set_skip_hint)
> +	if (start_pfn >= end_pfn && !cc->no_set_skip_hint)
>   		set_pageblock_skip(page);
>   }
>   
> @@ -1712,7 +1713,7 @@ static void isolate_freepages(struct compact_control *cc)
>   					block_end_pfn, freelist, stride, false);
>   
>   		/* Update the skip hint if the full pageblock was scanned */
> -		if (isolate_start_pfn == block_end_pfn)
> +		if (isolate_start_pfn >= block_end_pfn)
>   			update_pageblock_skip(cc, page, block_start_pfn -
>   					      pageblock_nr_pages);
>   


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
       [not found] ` <20230805110711.2975149-5-shikemeng@huaweicloud.com>
@ 2023-08-15  8:38   ` Baolin Wang
  2023-08-15  9:32     ` Kemeng Shi
  0 siblings, 1 reply; 33+ messages in thread
From: Baolin Wang @ 2023-08-15  8:38 UTC (permalink / raw)
  To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david



On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> We call isolate_freepages_block in strict mode, continuous pages in
> pageblock will be isolated if isolate_freepages_block successed.
> Then pfn + isolated will point to start of next pageblock to scan
> no matter how many pageblocks are isolated in isolate_freepages_block.
> Use pfn + isolated as start of next pageblock to scan to simplify the
> iteration.

IIUC, isolate_freepages_block() can isolate high-order free pages, which
means pfn + isolated can be larger than block_end_pfn. So in your patch,
'block_start_pfn' and 'block_end_pfn' can end up in different pageblocks,
which will break pageblock_pfn_to_page().

> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>   mm/compaction.c | 14 ++------------
>   1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 684f6e6cd8bc..8d7d38073d30 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>   	block_end_pfn = pageblock_end_pfn(pfn);
>   
>   	for (; pfn < end_pfn; pfn += isolated,
> -				block_start_pfn = block_end_pfn,
> -				block_end_pfn += pageblock_nr_pages) {
> +				block_start_pfn = pfn,
> +				block_end_pfn = pfn + pageblock_nr_pages) {
>   		/* Protect pfn from changing by isolate_freepages_block */
>   		unsigned long isolate_start_pfn = pfn;
>   
> -		/*
> -		 * pfn could pass the block_end_pfn if isolated freepage
> -		 * is more than pageblock order. In this case, we adjust
> -		 * scanning range to right one.
> -		 */
> -		if (pfn >= block_end_pfn) {
> -			block_start_pfn = pageblock_start_pfn(pfn);
> -			block_end_pfn = pageblock_end_pfn(pfn);
> -		}
> -
>   		block_end_pfn = min(block_end_pfn, end_pfn);
>   
>   		if (!pageblock_pfn_to_page(block_start_pfn,


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 5/9] mm/compaction: remove repeat compact_blockskip_flush check in reset_isolation_suitable
  2023-08-05 11:07 ` [PATCH 5/9] mm/compaction: remove repeat compact_blockskip_flush check in reset_isolation_suitable Kemeng Shi
@ 2023-08-15  8:42   ` Baolin Wang
  0 siblings, 0 replies; 33+ messages in thread
From: Baolin Wang @ 2023-08-15  8:42 UTC (permalink / raw)
  To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david



On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> We have compact_blockskip_flush check in __reset_isolation_suitable, just
> remove repeat check before __reset_isolation_suitable in
> compact_blockskip_flush.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

> ---
>   mm/compaction.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 8d7d38073d30..d8416d3dd445 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -434,9 +434,7 @@ void reset_isolation_suitable(pg_data_t *pgdat)
>   		if (!populated_zone(zone))
>   			continue;
>   
> -		/* Only flush if a full compaction finished recently */
> -		if (zone->compact_blockskip_flush)
> -			__reset_isolation_suitable(zone);
> +		__reset_isolation_suitable(zone);
>   	}
>   }
>   


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order
  2023-08-05 11:07 ` [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order Kemeng Shi
@ 2023-08-15  8:53   ` Baolin Wang
  2023-08-15 12:10     ` Kemeng Shi
  0 siblings, 1 reply; 33+ messages in thread
From: Baolin Wang @ 2023-08-15  8:53 UTC (permalink / raw)
  To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david



On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> We always do zone_watermark_ok check and compaction_suitable check
> together to test if compaction for target order should be runned.
> Factor these code out for preparation to remove repeat code.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>   mm/compaction.c | 42 +++++++++++++++++++++++++++++-------------
>   1 file changed, 29 insertions(+), 13 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index b5a699ed526b..26787ebb0297 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2365,6 +2365,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>   	return false;
>   }
>   
> +/*
> + * Should we do compaction for target allocation order.
> + * Return COMPACT_SUCCESS if allocation for target order can be already
> + * satisfied
> + * Return COMPACT_SKIPPED if compaction for target order is likely to fail
> + * Return COMPACT_CONTINUE if compaction for target order should be runned
> + */
> +static inline enum compact_result
> +compaction_suit_allocation_order(struct zone *zone, unsigned int order,
> +				 int highest_zoneidx, unsigned int alloc_flags)
> +{
> +	unsigned long watermark;
> +
> +	watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);

IIUC, the watermark used in patch 8 and patch 9 is different, right? 
Have you measured the impact of modifying this watermark?

> +	if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
> +			      alloc_flags))
> +		return COMPACT_SUCCESS;
> +
> +	if (!compaction_suitable(zone, order, highest_zoneidx))
> +		return COMPACT_SKIPPED;
> +
> +	return COMPACT_CONTINUE;
> +}
> +
>   static enum compact_result
>   compact_zone(struct compact_control *cc, struct capture_control *capc)
>   {
> @@ -2390,19 +2414,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>   	cc->migratetype = gfp_migratetype(cc->gfp_mask);
>   
>   	if (compaction_with_allocation_order(cc->order)) {
> -		unsigned long watermark;
> -
> -		/* Allocation can already succeed, nothing to do */
> -		watermark = wmark_pages(cc->zone,
> -					cc->alloc_flags & ALLOC_WMARK_MASK);
> -		if (zone_watermark_ok(cc->zone, cc->order, watermark,
> -				      cc->highest_zoneidx, cc->alloc_flags))
> -			return COMPACT_SUCCESS;
> -
> -		/* Compaction is likely to fail */
> -		if (!compaction_suitable(cc->zone, cc->order,
> -					 cc->highest_zoneidx))
> -			return COMPACT_SKIPPED;
> +		ret = compaction_suit_allocation_order(cc->zone, cc->order,
> +						       cc->highest_zoneidx,
> +						       cc->alloc_flags);
> +		if (ret != COMPACT_CONTINUE)
> +			return ret;
>   	}
>   
>   	/*


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order
  2023-08-05 11:07 ` [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order Kemeng Shi
@ 2023-08-15  8:58   ` Baolin Wang
  2023-08-15 12:04     ` Kemeng Shi
  0 siblings, 1 reply; 33+ messages in thread
From: Baolin Wang @ 2023-08-15  8:58 UTC (permalink / raw)
  To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david



On 8/5/2023 7:07 PM, Kemeng Shi wrote:
> We have order = -1 via proactive compaction, the is_via_compact_memory is
> not proper name anymore.
> As cc->order informs the compaction to satisfy a allocation with that
> order, so rename it to compaction_with_allocation_order.
> 
> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
> ---
>   mm/compaction.c | 11 +++++------
>   1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index d8416d3dd445..b5a699ed526b 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2055,12 +2055,11 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>   }
>   
>   /*
> - * order == -1 is expected when compacting via
> - * /proc/sys/vm/compact_memory
> + * compact to satisfy allocation with target order
>    */
> -static inline bool is_via_compact_memory(int order)
> +static inline bool compaction_with_allocation_order(int order)

I know naming is hard, but this name is not good enough to show the
compaction mode, while the original one could.

>   {
> -	return order == -1;
> +	return order != -1;
>   }
>   
>   /*
> @@ -2200,7 +2199,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>   		goto out;
>   	}
>   
> -	if (is_via_compact_memory(cc->order))
> +	if (!compaction_with_allocation_order(cc->order))
>   		return COMPACT_CONTINUE;
>   
>   	/*
> @@ -2390,7 +2389,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>   
>   	cc->migratetype = gfp_migratetype(cc->gfp_mask);
>   
> -	if (!is_via_compact_memory(cc->order)) {
> +	if (compaction_with_allocation_order(cc->order)) {
>   		unsigned long watermark;
>   
>   		/* Allocation can already succeed, nothing to do */


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/9] mm/compaction: correctly return failure with bogus compound_order in strict mode
  2023-08-15  8:28   ` [PATCH 3/9] mm/compaction: correctly return failure with bogus compound_order in strict mode Baolin Wang
@ 2023-08-15  9:22     ` Kemeng Shi
  0 siblings, 0 replies; 33+ messages in thread
From: Kemeng Shi @ 2023-08-15  9:22 UTC (permalink / raw)
  To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david



on 8/15/2023 4:28 PM, Baolin Wang wrote:
> 
> 
> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>> In strict mode, we should return 0 if there is any hole in pageblock. If
>> we successfully isolated pages at beginning at pageblock and then have a
>> bogus compound_order outside pageblock in next page. We will abort search
>> loop with blockpfn > end_pfn. Although we will limit blockpfn to end_pfn,
>> we will treat it as a successful isolation in strict mode as blockpfn is
>> not < end_pfn and return partial isolated pages. Then
>> isolate_freepages_range may success unexpectly with hole in isolated
>> range.
> 
> Yes, that can be happened.
> 
>> This patch also removes unnecessary limit for blockpfn to go outside
>> by buddy page introduced in fixed commit or by stride introduced after
>> fixed commit. Caller could use returned blockpfn to check if full
>> pageblock is scanned by test if blockpfn >= end and to get next pfn to
>> scan inside isolate_freepages_block on demand.
> 
> IMO, I don't think removing the pageblock restriction is worth it, since it did not fix anything and will make people more confused, at least to me.
> 
> That is to say, it will be surprised that the blockpfn can go outside of the pageblock after calling isolate_freepages_block() to just scan only one pageblock, and I did not see in detail if this can cause other potential problems.
> 
>> Fixes: 9fcd6d2e052ee ("mm, compaction: skip compound pages by order in free scanner")
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>   mm/compaction.c | 21 +++++++++++----------
>>   1 file changed, 11 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index fa1b100b0d10..684f6e6cd8bc 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -631,6 +631,14 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>>                   page += (1UL << order) - 1;
>>                   nr_scanned += (1UL << order) - 1;
>>               }
>> +            /*
>> +             * There is a tiny chance that we have read bogus
>> +             * compound_order(), so be careful to not go outside
>> +             * of the pageblock.
>> +             */
>> +            if (unlikely(blockpfn >= end_pfn))
>> +                blockpfn = end_pfn - 1;
> 
> So we can just add this validation to ensure that the isolate_freepages_block() can return 0 if failure is happened, which can fix your problem.
> 
Thanks for the feedback! Sure, I will do this in the next version.
>> +
>>               goto isolate_fail;
>>           }
>>   @@ -677,17 +685,10 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>>       if (locked)
>>           spin_unlock_irqrestore(&cc->zone->lock, flags);
>>   -    /*
>> -     * There is a tiny chance that we have read bogus compound_order(),
>> -     * so be careful to not go outside of the pageblock.
>> -     */
>> -    if (unlikely(blockpfn > end_pfn))
>> -        blockpfn = end_pfn;
>> -
>>       trace_mm_compaction_isolate_freepages(*start_pfn, blockpfn,
>>                       nr_scanned, total_isolated);
>>   -    /* Record how far we have got within the block */
>> +    /* Record how far we have got */
>>       *start_pfn = blockpfn;
>>         /*
>> @@ -1443,7 +1444,7 @@ fast_isolate_around(struct compact_control *cc, unsigned long pfn)
>>       isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
>>         /* Skip this pageblock in the future as it's full or nearly full */
>> -    if (start_pfn == end_pfn && !cc->no_set_skip_hint)
>> +    if (start_pfn >= end_pfn && !cc->no_set_skip_hint)
>>           set_pageblock_skip(page);
>>   }
>>   @@ -1712,7 +1713,7 @@ static void isolate_freepages(struct compact_control *cc)
>>                       block_end_pfn, freelist, stride, false);
>>             /* Update the skip hint if the full pageblock was scanned */
>> -        if (isolate_start_pfn == block_end_pfn)
>> +        if (isolate_start_pfn >= block_end_pfn)
>>               update_pageblock_skip(cc, page, block_start_pfn -
>>                             pageblock_nr_pages);
>>   
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
  2023-08-15  8:38   ` [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range Baolin Wang
@ 2023-08-15  9:32     ` Kemeng Shi
  2023-08-15 10:07       ` Baolin Wang
  0 siblings, 1 reply; 33+ messages in thread
From: Kemeng Shi @ 2023-08-15  9:32 UTC (permalink / raw)
  To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david



on 8/15/2023 4:38 PM, Baolin Wang wrote:
> 
> 
> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>> We call isolate_freepages_block in strict mode, continuous pages in
>> pageblock will be isolated if isolate_freepages_block successed.
>> Then pfn + isolated will point to start of next pageblock to scan
>> no matter how many pageblocks are isolated in isolate_freepages_block.
>> Use pfn + isolated as start of next pageblock to scan to simplify the
>> iteration.
> 
> IIUC, the isolate_freepages_block() can isolate high-order free pages, which means the pfn + isolated can be larger than the block_end_pfn. So in your patch, the 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, that will break pageblock_pfn_to_page().
> 
In the for loop's update statement, we always update block_start_pfn to pfn
and block_end_pfn to pfn + pageblock_nr_pages, so they should point to the
same pageblock. I guess you missed the change to the update of
block_end_pfn. :)
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>   mm/compaction.c | 14 ++------------
>>   1 file changed, 2 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 684f6e6cd8bc..8d7d38073d30 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>       block_end_pfn = pageblock_end_pfn(pfn);
>>         for (; pfn < end_pfn; pfn += isolated,
>> -                block_start_pfn = block_end_pfn,
>> -                block_end_pfn += pageblock_nr_pages) {
>> +                block_start_pfn = pfn,
>> +                block_end_pfn = pfn + pageblock_nr_pages) {
>>           /* Protect pfn from changing by isolate_freepages_block */
>>           unsigned long isolate_start_pfn = pfn;
>>   -        /*
>> -         * pfn could pass the block_end_pfn if isolated freepage
>> -         * is more than pageblock order. In this case, we adjust
>> -         * scanning range to right one.
>> -         */
>> -        if (pfn >= block_end_pfn) {
>> -            block_start_pfn = pageblock_start_pfn(pfn);
>> -            block_end_pfn = pageblock_end_pfn(pfn);
>> -        }
>> -
>>           block_end_pfn = min(block_end_pfn, end_pfn);
>>             if (!pageblock_pfn_to_page(block_start_pfn,
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
  2023-08-15  9:32     ` Kemeng Shi
@ 2023-08-15 10:07       ` Baolin Wang
  2023-08-15 10:37         ` Kemeng Shi
  0 siblings, 1 reply; 33+ messages in thread
From: Baolin Wang @ 2023-08-15 10:07 UTC (permalink / raw)
  To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david



On 8/15/2023 5:32 PM, Kemeng Shi wrote:
> 
> 
> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>
>>
>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>> We call isolate_freepages_block in strict mode, continuous pages in
>>> pageblock will be isolated if isolate_freepages_block successed.
>>> Then pfn + isolated will point to start of next pageblock to scan
>>> no matter how many pageblocks are isolated in isolate_freepages_block.
>>> Use pfn + isolated as start of next pageblock to scan to simplify the
>>> iteration.
>>
>> IIUC, the isolate_freepages_block() can isolate high-order free pages, which means the pfn + isolated can be larger than the block_end_pfn. So in your patch, the 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, that will break pageblock_pfn_to_page().
>>
> In for update statement, we always update block_start_pfn to pfn and

I mean, you changed to:
1) pfn += isolated;
2) block_start_pfn = pfn;
3) block_end_pfn = pfn + pageblock_nr_pages;

But in 1), pfn + isolated can go outside of the current pageblock if a
high-order page is isolated, e.g. it can land in the middle of the next
pageblock. So block_start_pfn can point to the middle of the next
pageblock rather than its start, and after 3) block_end_pfn can point into
yet another pageblock. Or did I miss something?

> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
> to the same pageblock. I guess you missed the change to update of
> block_end_pfn. :)
>>>
>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>> ---
>>>    mm/compaction.c | 14 ++------------
>>>    1 file changed, 2 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>>        block_end_pfn = pageblock_end_pfn(pfn);
>>>          for (; pfn < end_pfn; pfn += isolated,
>>> -                block_start_pfn = block_end_pfn,
>>> -                block_end_pfn += pageblock_nr_pages) {
>>> +                block_start_pfn = pfn,
>>> +                block_end_pfn = pfn + pageblock_nr_pages) {
>>>            /* Protect pfn from changing by isolate_freepages_block */
>>>            unsigned long isolate_start_pfn = pfn;
>>>    -        /*
>>> -         * pfn could pass the block_end_pfn if isolated freepage
>>> -         * is more than pageblock order. In this case, we adjust
>>> -         * scanning range to right one.
>>> -         */
>>> -        if (pfn >= block_end_pfn) {
>>> -            block_start_pfn = pageblock_start_pfn(pfn);
>>> -            block_end_pfn = pageblock_end_pfn(pfn);
>>> -        }
>>> -
>>>            block_end_pfn = min(block_end_pfn, end_pfn);
>>>              if (!pageblock_pfn_to_page(block_start_pfn,
>>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
  2023-08-15 10:07       ` Baolin Wang
@ 2023-08-15 10:37         ` Kemeng Shi
  2023-08-19 11:58           ` Baolin Wang
  0 siblings, 1 reply; 33+ messages in thread
From: Kemeng Shi @ 2023-08-15 10:37 UTC (permalink / raw)
  To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david



on 8/15/2023 6:07 PM, Baolin Wang wrote:
> 
> 
> On 8/15/2023 5:32 PM, Kemeng Shi wrote:
>>
>>
>> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>>
>>>
>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>> We call isolate_freepages_block in strict mode, continuous pages in
>>>> pageblock will be isolated if isolate_freepages_block successed.
>>>> Then pfn + isolated will point to start of next pageblock to scan
>>>> no matter how many pageblocks are isolated in isolate_freepages_block.
>>>> Use pfn + isolated as start of next pageblock to scan to simplify the
>>>> iteration.
>>>
>>> IIUC, the isolate_freepages_block() can isolate high-order free pages, which means the pfn + isolated can be larger than the block_end_pfn. So in your patch, the 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, that will break pageblock_pfn_to_page().
>>>
>> In for update statement, we always update block_start_pfn to pfn and
> 
> I mean, you changed to:
> 1) pfn += isolated;
> 2) block_start_pfn = pfn;
> 3) block_end_pfn = pfn + pageblock_nr_pages;
> 
> But in 1) pfn + isolated can go outside of the currnet pageblock if isolating a high-order page, for example, located in the middle of the next pageblock. So that the block_start_pfn can point to the middle of the next pageblock, not the start position. Meanwhile after 3), the block_end_pfn can point another pageblock. Or I missed something else?
> 
Ah, I missed explaining this in the changelog.
In the case where we have a buddy page with order higher than the pageblock
order:
1. a buddy page is aligned to its order
2. the order of the page is higher than the pageblock order
So the page is aligned to the pageblock order, which means the pfn of the
page and the isolated page count are both pageblock-order aligned, and
therefore pfn + isolated is pageblock-order aligned as well.
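
To illustrate the alignment argument above with hypothetical numbers
(assuming pageblock_order == 9, i.e. pageblock_nr_pages == 512):

    /*
     * pfn      = 0x40000      head of an order-10 buddy page, aligned to 1024 pfns
     * isolated = 1UL << 10    1024 pages isolated in one step
     *
     * Both values are multiples of pageblock_nr_pages (512), so
     * pfn + isolated == 0x40400 is still pageblock aligned, and the
     * block_start_pfn/block_end_pfn derived from it span a single pageblock.
     */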
>> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
>> to the same pageblock. I guess you missed the change to update of
>> block_end_pfn. :)
>>>>
>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>> ---
>>>>    mm/compaction.c | 14 ++------------
>>>>    1 file changed, 2 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>>>        block_end_pfn = pageblock_end_pfn(pfn);
>>>>          for (; pfn < end_pfn; pfn += isolated,
>>>> -                block_start_pfn = block_end_pfn,
>>>> -                block_end_pfn += pageblock_nr_pages) {
>>>> +                block_start_pfn = pfn,
>>>> +                block_end_pfn = pfn + pageblock_nr_pages) {
>>>>            /* Protect pfn from changing by isolate_freepages_block */
>>>>            unsigned long isolate_start_pfn = pfn;
>>>>    -        /*
>>>> -         * pfn could pass the block_end_pfn if isolated freepage
>>>> -         * is more than pageblock order. In this case, we adjust
>>>> -         * scanning range to right one.
>>>> -         */
>>>> -        if (pfn >= block_end_pfn) {
>>>> -            block_start_pfn = pageblock_start_pfn(pfn);
>>>> -            block_end_pfn = pageblock_end_pfn(pfn);
>>>> -        }
>>>> -
>>>>            block_end_pfn = min(block_end_pfn, end_pfn);
>>>>              if (!pageblock_pfn_to_page(block_start_pfn,
>>>
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order
  2023-08-15  8:58   ` Baolin Wang
@ 2023-08-15 12:04     ` Kemeng Shi
  2023-08-19 12:14       ` Baolin Wang
  0 siblings, 1 reply; 33+ messages in thread
From: Kemeng Shi @ 2023-08-15 12:04 UTC (permalink / raw)
  To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david



on 8/15/2023 4:58 PM, Baolin Wang wrote:
> 
> 
> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>> We have order = -1 via proactive compaction, the is_via_compact_memory is
>> not proper name anymore.
>> As cc->order informs the compaction to satisfy a allocation with that
>> order, so rename it to compaction_with_allocation_order.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>   mm/compaction.c | 11 +++++------
>>   1 file changed, 5 insertions(+), 6 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index d8416d3dd445..b5a699ed526b 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -2055,12 +2055,11 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>>   }
>>     /*
>> - * order == -1 is expected when compacting via
>> - * /proc/sys/vm/compact_memory
>> + * compact to satisfy allocation with target order
>>    */
>> -static inline bool is_via_compact_memory(int order)
>> +static inline bool compaction_with_allocation_order(int order)
> 
> I know naming is hard, but this name is not good enough that can show the compaction mode. But the original one could.
> 
Yes, I agree with this, but the name and comment of is_via_compact_memory may
mislead the reader into thinking that order == -1 is equivalent to compaction
triggered from /proc/sys/vm/compact_memory.
Actually, we have several ways to trigger compaction with order == -1:
1. via /proc/sys/vm/compact_memory
2. via /sys/devices/system/node/nodeX/compact
3. via proactive compaction

Rather than indicating that compaction was triggered by compact_memory or
anything else, the check on order == -1 tells whether compaction was triggered
to satisfy a high-order allocation; when it was, we stop compaction once an
allocation of the target order can succeed.
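
For context, a rough sketch of where cc->order ends up as -1; the function
bodies are paraphrased and abridged from mm/compaction.c:

    /* /proc/sys/vm/compact_memory and /sys/devices/system/node/nodeX/compact
     * both end up compacting whole nodes via compact_node(). */
    static void compact_node(int nid)
    {
            struct compact_control cc = {
                    .order = -1,
                    .whole_zone = true,
                    ...
            };
            ...
    }

    /* Proactive compaction, driven from kcompactd, likewise uses order == -1. */
    static void proactive_compact_node(pg_data_t *pgdat)
    {
            struct compact_control cc = {
                    .order = -1,
                    .proactive_compaction = true,
                    ...
            };
            ...
    }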

>>   {
>> -    return order == -1;
>> +    return order != -1;
>>   }
>>     /*
>> @@ -2200,7 +2199,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>>           goto out;
>>       }
>>   -    if (is_via_compact_memory(cc->order))
>> +    if (!compaction_with_allocation_order(cc->order))
>>           return COMPACT_CONTINUE;
>>         /*
>> @@ -2390,7 +2389,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>         cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>   -    if (!is_via_compact_memory(cc->order)) {
>> +    if (compaction_with_allocation_order(cc->order)) {
>>           unsigned long watermark;
>>             /* Allocation can already succeed, nothing to do */
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order
  2023-08-15  8:53   ` Baolin Wang
@ 2023-08-15 12:10     ` Kemeng Shi
  2023-08-19 12:27       ` Baolin Wang
  0 siblings, 1 reply; 33+ messages in thread
From: Kemeng Shi @ 2023-08-15 12:10 UTC (permalink / raw)
  To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david



on 8/15/2023 4:53 PM, Baolin Wang wrote:
> 
> 
> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>> We always do zone_watermark_ok check and compaction_suitable check
>> together to test if compaction for target order should be runned.
>> Factor these code out for preparation to remove repeat code.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>> ---
>>   mm/compaction.c | 42 +++++++++++++++++++++++++++++-------------
>>   1 file changed, 29 insertions(+), 13 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index b5a699ed526b..26787ebb0297 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -2365,6 +2365,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>>       return false;
>>   }
>>   +/*
>> + * Should we do compaction for target allocation order.
>> + * Return COMPACT_SUCCESS if allocation for target order can be already
>> + * satisfied
>> + * Return COMPACT_SKIPPED if compaction for target order is likely to fail
>> + * Return COMPACT_CONTINUE if compaction for target order should be runned
>> + */
>> +static inline enum compact_result
>> +compaction_suit_allocation_order(struct zone *zone, unsigned int order,
>> +                 int highest_zoneidx, unsigned int alloc_flags)
>> +{
>> +    unsigned long watermark;
>> +
>> +    watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
> 
> IIUC, the watermark used in patch 8 and patch 9 is different, right? Have you measured the impact of modifying this watermark?
> 
Actually, no functional change is intended. Since wmark_pages() with
alloc_flags = 0 is equivalent to min_wmark_pages(), patches 8 and 9 still use
the original watermark.
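
The equivalence follows from the watermark definitions, roughly as they appear
in include/linux/mmzone.h and mm/internal.h (paraphrased from memory):

    /* include/linux/mmzone.h: WMARK_MIN is the first enum zone_watermarks value, i.e. 0 */
    #define wmark_pages(z, i)      (z->_watermark[i] + z->watermark_boost)
    #define min_wmark_pages(z)     wmark_pages(z, WMARK_MIN)

    /* mm/internal.h: */
    #define ALLOC_WMARK_MIN        WMARK_MIN
    #define ALLOC_WMARK_MASK       (ALLOC_NO_WATERMARKS - 1)

    /* With alloc_flags == 0:
     *   alloc_flags & ALLOC_WMARK_MASK == 0 == WMARK_MIN
     * so wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK) == min_wmark_pages(zone).
     */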

>> +    if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
>> +                  alloc_flags))
>> +        return COMPACT_SUCCESS;
>> +
>> +    if (!compaction_suitable(zone, order, highest_zoneidx))
>> +        return COMPACT_SKIPPED;
>> +
>> +    return COMPACT_CONTINUE;
>> +}
>> +
>>   static enum compact_result
>>   compact_zone(struct compact_control *cc, struct capture_control *capc)
>>   {
>> @@ -2390,19 +2414,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>       cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>         if (compaction_with_allocation_order(cc->order)) {
>> -        unsigned long watermark;
>> -
>> -        /* Allocation can already succeed, nothing to do */
>> -        watermark = wmark_pages(cc->zone,
>> -                    cc->alloc_flags & ALLOC_WMARK_MASK);
>> -        if (zone_watermark_ok(cc->zone, cc->order, watermark,
>> -                      cc->highest_zoneidx, cc->alloc_flags))
>> -            return COMPACT_SUCCESS;
>> -
>> -        /* Compaction is likely to fail */
>> -        if (!compaction_suitable(cc->zone, cc->order,
>> -                     cc->highest_zoneidx))
>> -            return COMPACT_SKIPPED;
>> +        ret = compaction_suit_allocation_order(cc->zone, cc->order,
>> +                               cc->highest_zoneidx,
>> +                               cc->alloc_flags);
>> +        if (ret != COMPACT_CONTINUE)
>> +            return ret;
>>       }
>>         /*
> 
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
  2023-08-15 10:37         ` Kemeng Shi
@ 2023-08-19 11:58           ` Baolin Wang
  2023-08-22  1:37             ` Kemeng Shi
  0 siblings, 1 reply; 33+ messages in thread
From: Baolin Wang @ 2023-08-19 11:58 UTC (permalink / raw)
  To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david



On 8/15/2023 6:37 PM, Kemeng Shi wrote:
> 
> 
> on 8/15/2023 6:07 PM, Baolin Wang wrote:
>>
>>
>> On 8/15/2023 5:32 PM, Kemeng Shi wrote:
>>>
>>>
>>> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>>>
>>>>
>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>> We call isolate_freepages_block in strict mode, continuous pages in
>>>>> pageblock will be isolated if isolate_freepages_block successed.
>>>>> Then pfn + isolated will point to start of next pageblock to scan
>>>>> no matter how many pageblocks are isolated in isolate_freepages_block.
>>>>> Use pfn + isolated as start of next pageblock to scan to simplify the
>>>>> iteration.
>>>>
>>>> IIUC, the isolate_freepages_block() can isolate high-order free pages, which means the pfn + isolated can be larger than the block_end_pfn. So in your patch, the 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, that will break pageblock_pfn_to_page().
>>>>
>>> In for update statement, we always update block_start_pfn to pfn and
>>
>> I mean, you changed to:
>> 1) pfn += isolated;
>> 2) block_start_pfn = pfn;
>> 3) block_end_pfn = pfn + pageblock_nr_pages;
>>
>> But in 1) pfn + isolated can go outside of the currnet pageblock if isolating a high-order page, for example, located in the middle of the next pageblock. So that the block_start_pfn can point to the middle of the next pageblock, not the start position. Meanwhile after 3), the block_end_pfn can point another pageblock. Or I missed something else?
>>
> Ah, I miss to explain this in changelog.
> In case we could we have buddy page with order higher than pageblock:
> 1. page in buddy page is aligned with it's order
> 2. order of page is higher than pageblock order
> Then page is aligned with pageblock order. So pfn of page and isolated pages
> count are both aligned pageblock order. So pfn + isolated is pageblock order
> aligned.

That's not what I mean. pfn + isolated is not always pageblock-aligned,
since isolate_freepages_block() can isolate high-order free pages
(for example: order-1, order-2 ...).

Suppose the pageblock size is 2M and we isolate one pageblock (say the pfn
range is 0 - 511 to make the arithmetic easy) with isolate_freepages_block():
if pfn 0 to pfn 510 are all order-0 pages but pfn 511 is an order-1 page,
you will isolate 513 pages from this pageblock, which makes 'pfn + isolated'
not pageblock aligned.

>>> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
>>> to the same pageblock. I guess you missed the change to update of
>>> block_end_pfn. :)
>>>>>
>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>> ---
>>>>>     mm/compaction.c | 14 ++------------
>>>>>     1 file changed, 2 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>>>> --- a/mm/compaction.c
>>>>> +++ b/mm/compaction.c
>>>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>>>>         block_end_pfn = pageblock_end_pfn(pfn);
>>>>>           for (; pfn < end_pfn; pfn += isolated,
>>>>> -                block_start_pfn = block_end_pfn,
>>>>> -                block_end_pfn += pageblock_nr_pages) {
>>>>> +                block_start_pfn = pfn,
>>>>> +                block_end_pfn = pfn + pageblock_nr_pages) {
>>>>>             /* Protect pfn from changing by isolate_freepages_block */
>>>>>             unsigned long isolate_start_pfn = pfn;
>>>>>     -        /*
>>>>> -         * pfn could pass the block_end_pfn if isolated freepage
>>>>> -         * is more than pageblock order. In this case, we adjust
>>>>> -         * scanning range to right one.
>>>>> -         */
>>>>> -        if (pfn >= block_end_pfn) {
>>>>> -            block_start_pfn = pageblock_start_pfn(pfn);
>>>>> -            block_end_pfn = pageblock_end_pfn(pfn);
>>>>> -        }
>>>>> -
>>>>>             block_end_pfn = min(block_end_pfn, end_pfn);
>>>>>               if (!pageblock_pfn_to_page(block_start_pfn,
>>>>
>>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order
  2023-08-15 12:04     ` Kemeng Shi
@ 2023-08-19 12:14       ` Baolin Wang
  2023-08-22  1:51         ` Kemeng Shi
  0 siblings, 1 reply; 33+ messages in thread
From: Baolin Wang @ 2023-08-19 12:14 UTC (permalink / raw)
  To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david



On 8/15/2023 8:04 PM, Kemeng Shi wrote:
> 
> 
> on 8/15/2023 4:58 PM, Baolin Wang wrote:
>>
>>
>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>> We have order = -1 via proactive compaction, the is_via_compact_memory is
>>> not proper name anymore.
>>> As cc->order informs the compaction to satisfy a allocation with that
>>> order, so rename it to compaction_with_allocation_order.
>>>
>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>> ---
>>>    mm/compaction.c | 11 +++++------
>>>    1 file changed, 5 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index d8416d3dd445..b5a699ed526b 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -2055,12 +2055,11 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>>>    }
>>>      /*
>>> - * order == -1 is expected when compacting via
>>> - * /proc/sys/vm/compact_memory
>>> + * compact to satisfy allocation with target order
>>>     */
>>> -static inline bool is_via_compact_memory(int order)
>>> +static inline bool compaction_with_allocation_order(int order)
>>
>> I know naming is hard, but this name is not good enough that can show the compaction mode. But the original one could.
>>
> Yes, I agree with this, but name and comment of is_via_compact_memory may
> mislead reader that order == -1 is equivalent to compaction from
> /proc/sys/vm/compact_memory.
> Actually, we have several approaches to trigger compaction with order == -1:
> 1. via /proc/sys/vm/compact_memory
> 2. via /sys/devices/system/node/nodex/compact
> 3. via proactive compact

They can all be called proactive compaction.

> 
> Instead of indicate compaction is tirggerred by compact_memocy or anything,
> order == -1 implies if compaction is triggerrred to meet allocation with high
> order and we will stop compaction if allocation with target order will success.

IMO, the is_via_compact_memory() function helps people distinguish the
compaction logic we have under direct compaction or kcompactd compaction,
while proactive compaction does not concern itself with these details. But
compaction_with_allocation_order() just makes me wonder why we compare
against -1. So I don't think this patch is worth it; but, as you said above,
we can add more comments to make it clearer.

>>>    {
>>> -    return order == -1;
>>> +    return order != -1;
>>>    }
>>>      /*
>>> @@ -2200,7 +2199,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>>>            goto out;
>>>        }
>>>    -    if (is_via_compact_memory(cc->order))
>>> +    if (!compaction_with_allocation_order(cc->order))
>>>            return COMPACT_CONTINUE;
>>>          /*
>>> @@ -2390,7 +2389,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>          cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>>    -    if (!is_via_compact_memory(cc->order)) {
>>> +    if (compaction_with_allocation_order(cc->order)) {
>>>            unsigned long watermark;
>>>              /* Allocation can already succeed, nothing to do */
>>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order
  2023-08-15 12:10     ` Kemeng Shi
@ 2023-08-19 12:27       ` Baolin Wang
  2023-08-22  1:57         ` Kemeng Shi
  0 siblings, 1 reply; 33+ messages in thread
From: Baolin Wang @ 2023-08-19 12:27 UTC (permalink / raw)
  To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david



On 8/15/2023 8:10 PM, Kemeng Shi wrote:
> 
> 
> on 8/15/2023 4:53 PM, Baolin Wang wrote:
>>
>>
>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>> We always do zone_watermark_ok check and compaction_suitable check
>>> together to test if compaction for target order should be runned.
>>> Factor these code out for preparation to remove repeat code.
>>>
>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>> ---
>>>    mm/compaction.c | 42 +++++++++++++++++++++++++++++-------------
>>>    1 file changed, 29 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index b5a699ed526b..26787ebb0297 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -2365,6 +2365,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>>>        return false;
>>>    }
>>>    +/*
>>> + * Should we do compaction for target allocation order.
>>> + * Return COMPACT_SUCCESS if allocation for target order can be already
>>> + * satisfied
>>> + * Return COMPACT_SKIPPED if compaction for target order is likely to fail
>>> + * Return COMPACT_CONTINUE if compaction for target order should be runned
>>> + */
>>> +static inline enum compact_result
>>> +compaction_suit_allocation_order(struct zone *zone, unsigned int order,
>>> +                 int highest_zoneidx, unsigned int alloc_flags)
>>> +{
>>> +    unsigned long watermark;
>>> +
>>> +    watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
>>
>> IIUC, the watermark used in patch 8 and patch 9 is different, right? Have you measured the impact of modifying this watermark?
>>
> Actually, there is no functional change intended. Consider wmark_pages with
> alloc_flags = 0 is equivalent to min_wmark_pages, patch 8 and patch 9 still
> use original watermark.

Can you use the ALLOC_WMARK_MIN macro to make it clearer?

And I think patch 8 and patch 9 should be squashed into patch 7 to 
convert all at once.

>>> +    if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
>>> +                  alloc_flags))
>>> +        return COMPACT_SUCCESS;
>>> +
>>> +    if (!compaction_suitable(zone, order, highest_zoneidx))
>>> +        return COMPACT_SKIPPED;
>>> +
>>> +    return COMPACT_CONTINUE;
>>> +}
>>> +
>>>    static enum compact_result
>>>    compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>    {
>>> @@ -2390,19 +2414,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>        cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>>          if (compaction_with_allocation_order(cc->order)) {
>>> -        unsigned long watermark;
>>> -
>>> -        /* Allocation can already succeed, nothing to do */
>>> -        watermark = wmark_pages(cc->zone,
>>> -                    cc->alloc_flags & ALLOC_WMARK_MASK);
>>> -        if (zone_watermark_ok(cc->zone, cc->order, watermark,
>>> -                      cc->highest_zoneidx, cc->alloc_flags))
>>> -            return COMPACT_SUCCESS;
>>> -
>>> -        /* Compaction is likely to fail */
>>> -        if (!compaction_suitable(cc->zone, cc->order,
>>> -                     cc->highest_zoneidx))
>>> -            return COMPACT_SKIPPED;
>>> +        ret = compaction_suit_allocation_order(cc->zone, cc->order,
>>> +                               cc->highest_zoneidx,
>>> +                               cc->alloc_flags);
>>> +        if (ret != COMPACT_CONTINUE)
>>> +            return ret;
>>>        }
>>>          /*
>>
>>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
  2023-08-19 11:58           ` Baolin Wang
@ 2023-08-22  1:37             ` Kemeng Shi
  2023-08-24  2:19               ` Baolin Wang
  0 siblings, 1 reply; 33+ messages in thread
From: Kemeng Shi @ 2023-08-22  1:37 UTC (permalink / raw)
  To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david



on 8/19/2023 7:58 PM, Baolin Wang wrote:
> 
> 
> On 8/15/2023 6:37 PM, Kemeng Shi wrote:
>>
>>
>> on 8/15/2023 6:07 PM, Baolin Wang wrote:
>>>
>>>
>>> On 8/15/2023 5:32 PM, Kemeng Shi wrote:
>>>>
>>>>
>>>> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>>> We call isolate_freepages_block in strict mode, continuous pages in
>>>>>> pageblock will be isolated if isolate_freepages_block successed.
>>>>>> Then pfn + isolated will point to start of next pageblock to scan
>>>>>> no matter how many pageblocks are isolated in isolate_freepages_block.
>>>>>> Use pfn + isolated as start of next pageblock to scan to simplify the
>>>>>> iteration.
>>>>>
>>>>> IIUC, the isolate_freepages_block() can isolate high-order free pages, which means the pfn + isolated can be larger than the block_end_pfn. So in your patch, the 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, that will break pageblock_pfn_to_page().
>>>>>
>>>> In for update statement, we always update block_start_pfn to pfn and
>>>
>>> I mean, you changed to:
>>> 1) pfn += isolated;
>>> 2) block_start_pfn = pfn;
>>> 3) block_end_pfn = pfn + pageblock_nr_pages;
>>>
>>> But in 1) pfn + isolated can go outside of the currnet pageblock if isolating a high-order page, for example, located in the middle of the next pageblock. So that the block_start_pfn can point to the middle of the next pageblock, not the start position. Meanwhile after 3), the block_end_pfn can point another pageblock. Or I missed something else?
>>>
>> Ah, I miss to explain this in changelog.
>> In case we could we have buddy page with order higher than pageblock:
>> 1. page in buddy page is aligned with it's order
>> 2. order of page is higher than pageblock order
>> Then page is aligned with pageblock order. So pfn of page and isolated pages
>> count are both aligned pageblock order. So pfn + isolated is pageblock order
>> aligned.
> 
> That's not what I mean. pfn + isolated is not always pageblock-aligned, since the isolate_freepages_block() can isolated high-order free pages (for example: order-1, order-2 ...).
> 
> Suppose the pageblock size is 2M, when isolating a pageblock (suppose the pfn range is 0 - 511 to make the arithmetic easy) by isolate_freepages_block(), and suppose pfn 0 to pfn 510 are all order-0 page, but pfn 511 is order-1 page, so you will isolate 513 pages from this pageblock, which will make 'pfn + isolated' not pageblock aligned.
This is also not supposed to happen, as low-order buddy pages never span
across the boundary of a higher-order page:
In the buddy system, we always split an order N page into two order N - 1
pages as follows:
|        order N        |
|order N - 1|order N - 1|
So buddy pages of order N - 1 never cross an order N boundary. Similarly,
buddy pages of order N - 2 never cross an order N - 1 boundary, and so on.
Hence any page of order less than N never crosses an order N boundary.
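A concrete (made-up) layout, assuming pageblock_order = 9, i.e. 512-page
pageblocks:

    pageblock at pfn 512..1023, all free as order-0 pages:
        isolated = 512, next scan pfn = 512 + 512 = 1024   -> pageblock aligned

    pageblock at pfn 1024..1535 is the first half of a free order-10 page
    starting at pfn 1024:
        isolate_freepages_block() takes the whole order-10 page, so
        isolated = 1024, next scan pfn = 1024 + 1024 = 2048 -> still aligned

The order-1 page at pfn 511 from your example cannot exist in the buddy
allocator, because a free page of order o always starts at a pfn aligned to
(1 << o), so it cannot begin on the last pfn of a pageblock.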

> 
>>>> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
>>>> to the same pageblock. I guess you missed the change to update of
>>>> block_end_pfn. :)
>>>>>>
>>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>>> ---
>>>>>>     mm/compaction.c | 14 ++------------
>>>>>>     1 file changed, 2 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>>>>> --- a/mm/compaction.c
>>>>>> +++ b/mm/compaction.c
>>>>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>>>>>         block_end_pfn = pageblock_end_pfn(pfn);
>>>>>>           for (; pfn < end_pfn; pfn += isolated,
>>>>>> -                block_start_pfn = block_end_pfn,
>>>>>> -                block_end_pfn += pageblock_nr_pages) {
>>>>>> +                block_start_pfn = pfn,
>>>>>> +                block_end_pfn = pfn + pageblock_nr_pages) {
>>>>>>             /* Protect pfn from changing by isolate_freepages_block */
>>>>>>             unsigned long isolate_start_pfn = pfn;
>>>>>>     -        /*
>>>>>> -         * pfn could pass the block_end_pfn if isolated freepage
>>>>>> -         * is more than pageblock order. In this case, we adjust
>>>>>> -         * scanning range to right one.
>>>>>> -         */
>>>>>> -        if (pfn >= block_end_pfn) {
>>>>>> -            block_start_pfn = pageblock_start_pfn(pfn);
>>>>>> -            block_end_pfn = pageblock_end_pfn(pfn);
>>>>>> -        }
>>>>>> -
>>>>>>             block_end_pfn = min(block_end_pfn, end_pfn);
>>>>>>               if (!pageblock_pfn_to_page(block_start_pfn,
>>>>>
>>>
> 
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order
  2023-08-19 12:14       ` Baolin Wang
@ 2023-08-22  1:51         ` Kemeng Shi
  2023-08-24  2:20           ` Baolin Wang
  0 siblings, 1 reply; 33+ messages in thread
From: Kemeng Shi @ 2023-08-22  1:51 UTC (permalink / raw)
  To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david



on 8/19/2023 8:14 PM, Baolin Wang wrote:
> 
> 
> On 8/15/2023 8:04 PM, Kemeng Shi wrote:
>>
>>
>> on 8/15/2023 4:58 PM, Baolin Wang wrote:
>>>
>>>
>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>> We have order = -1 via proactive compaction, the is_via_compact_memory is
>>>> not proper name anymore.
>>>> As cc->order informs the compaction to satisfy a allocation with that
>>>> order, so rename it to compaction_with_allocation_order.
>>>>
>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>> ---
>>>>    mm/compaction.c | 11 +++++------
>>>>    1 file changed, 5 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>> index d8416d3dd445..b5a699ed526b 100644
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -2055,12 +2055,11 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>>>>    }
>>>>      /*
>>>> - * order == -1 is expected when compacting via
>>>> - * /proc/sys/vm/compact_memory
>>>> + * compact to satisfy allocation with target order
>>>>     */
>>>> -static inline bool is_via_compact_memory(int order)
>>>> +static inline bool compaction_with_allocation_order(int order)
>>>
>>> I know naming is hard, but this name is not good enough that can show the compaction mode. But the original one could.
>>>
>> Yes, I agree with this, but name and comment of is_via_compact_memory may
>> mislead reader that order == -1 is equivalent to compaction from
>> /proc/sys/vm/compact_memory.
>> Actually, we have several approaches to trigger compaction with order == -1:
>> 1. via /proc/sys/vm/compact_memory
>> 2. via /sys/devices/system/node/nodex/compact
>> 3. via proactive compact
> 
> They can all be called proactive compaction.
I considered renaming it to is_proactive_compaction. But "proactive compaction"
in the comments of compaction.c mostly refers to compaction triggered from
/proc/sys/vm/compaction_proactiveness. So "proactive compaction" itself looks
ambiguous...
> 
>>
>> Instead of indicate compaction is tirggerred by compact_memocy or anything,
>> order == -1 implies if compaction is triggerrred to meet allocation with high
>> order and we will stop compaction if allocation with target order will success.
> 
> IMO, the is_via_compact_memory() function helps people better distinguish the compaction logic we have under direct compaction or kcompactd compaction, while proactive compaction does not concern itself with these details. But compaction_with_allocation_order() will make me just wonder why we should compare with -1. So I don't think this patch is worth it, but as you said above, we can add more comments to make it more clear.
> 
Sure, I don't insist on this.
Does it look good to you to just change the comment of is_via_compact_memory
to the following (a rough in-code sketch follows the list)?
We need to do compaction proactively with order == -1.
order == -1 is expected for proactive compaction via:
1. /proc/sys/vm/compact_memory
2. /sys/devices/system/node/nodex/compact
3. /proc/sys/vm/compaction_proactiveness
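In code, that would sit above the existing helper roughly like this (sketch
only, condensing the wording proposed above):

    /*
     * order == -1 is expected for proactive compaction via:
     * 1. /proc/sys/vm/compact_memory
     * 2. /sys/devices/system/node/nodex/compact
     * 3. /proc/sys/vm/compaction_proactiveness
     */
    static inline bool is_via_compact_memory(int order)
    {
        return order == -1;
    }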

>>>>    {
>>>> -    return order == -1;
>>>> +    return order != -1;
>>>>    }
>>>>      /*
>>>> @@ -2200,7 +2199,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>>>>            goto out;
>>>>        }
>>>>    -    if (is_via_compact_memory(cc->order))
>>>> +    if (!compaction_with_allocation_order(cc->order))
>>>>            return COMPACT_CONTINUE;
>>>>          /*
>>>> @@ -2390,7 +2389,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>>          cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>>>    -    if (!is_via_compact_memory(cc->order)) {
>>>> +    if (compaction_with_allocation_order(cc->order)) {
>>>>            unsigned long watermark;
>>>>              /* Allocation can already succeed, nothing to do */
>>>
> 
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order
  2023-08-19 12:27       ` Baolin Wang
@ 2023-08-22  1:57         ` Kemeng Shi
  2023-08-24  2:25           ` Baolin Wang
  0 siblings, 1 reply; 33+ messages in thread
From: Kemeng Shi @ 2023-08-22  1:57 UTC (permalink / raw)
  To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david



on 8/19/2023 8:27 PM, Baolin Wang wrote:
> 
> 
> On 8/15/2023 8:10 PM, Kemeng Shi wrote:
>>
>>
>> on 8/15/2023 4:53 PM, Baolin Wang wrote:
>>>
>>>
>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>> We always do zone_watermark_ok check and compaction_suitable check
>>>> together to test if compaction for target order should be runned.
>>>> Factor these code out for preparation to remove repeat code.
>>>>
>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>> ---
>>>>    mm/compaction.c | 42 +++++++++++++++++++++++++++++-------------
>>>>    1 file changed, 29 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>> index b5a699ed526b..26787ebb0297 100644
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -2365,6 +2365,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>>>>        return false;
>>>>    }
>>>>    +/*
>>>> + * Should we do compaction for target allocation order.
>>>> + * Return COMPACT_SUCCESS if allocation for target order can be already
>>>> + * satisfied
>>>> + * Return COMPACT_SKIPPED if compaction for target order is likely to fail
>>>> + * Return COMPACT_CONTINUE if compaction for target order should be runned
>>>> + */
>>>> +static inline enum compact_result
>>>> +compaction_suit_allocation_order(struct zone *zone, unsigned int order,
>>>> +                 int highest_zoneidx, unsigned int alloc_flags)
>>>> +{
>>>> +    unsigned long watermark;
>>>> +
>>>> +    watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
>>>
>>> IIUC, the watermark used in patch 8 and patch 9 is different, right? Have you measured the impact of modifying this watermark?
>>>
>> Actually, there is no functional change intended. Consider wmark_pages with
>> alloc_flags = 0 is equivalent to min_wmark_pages, patch 8 and patch 9 still
>> use original watermark.
> 
> Can you use ALLOC_WMARK_MIN macro to make it more clear?
Sorry, I can't quite follow this. The watermark should differ with different
alloc_flags rather than being hard-coded to WMARK_MIN.
Patch 8 and patch 9 use the WMARK_MIN watermark because they get alloc_flags = 0.
> 
> And I think patch 8 and patch 9 should be squashed into patch 7 to convert all at once.
Sure, I can do this in the next version.
> 
>>>> +    if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
>>>> +                  alloc_flags))
>>>> +        return COMPACT_SUCCESS;
>>>> +
>>>> +    if (!compaction_suitable(zone, order, highest_zoneidx))
>>>> +        return COMPACT_SKIPPED;
>>>> +
>>>> +    return COMPACT_CONTINUE;
>>>> +}
>>>> +
>>>>    static enum compact_result
>>>>    compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>>    {
>>>> @@ -2390,19 +2414,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>>        cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>>>          if (compaction_with_allocation_order(cc->order)) {
>>>> -        unsigned long watermark;
>>>> -
>>>> -        /* Allocation can already succeed, nothing to do */
>>>> -        watermark = wmark_pages(cc->zone,
>>>> -                    cc->alloc_flags & ALLOC_WMARK_MASK);
>>>> -        if (zone_watermark_ok(cc->zone, cc->order, watermark,
>>>> -                      cc->highest_zoneidx, cc->alloc_flags))
>>>> -            return COMPACT_SUCCESS;
>>>> -
>>>> -        /* Compaction is likely to fail */
>>>> -        if (!compaction_suitable(cc->zone, cc->order,
>>>> -                     cc->highest_zoneidx))
>>>> -            return COMPACT_SKIPPED;
>>>> +        ret = compaction_suit_allocation_order(cc->zone, cc->order,
>>>> +                               cc->highest_zoneidx,
>>>> +                               cc->alloc_flags);
>>>> +        if (ret != COMPACT_CONTINUE)
>>>> +            return ret;
>>>>        }
>>>>          /*
>>>
>>>
> 
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range
  2023-08-22  1:37             ` Kemeng Shi
@ 2023-08-24  2:19               ` Baolin Wang
  0 siblings, 0 replies; 33+ messages in thread
From: Baolin Wang @ 2023-08-24  2:19 UTC (permalink / raw)
  To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david



On 8/22/2023 9:37 AM, Kemeng Shi wrote:
> 
> 
> on 8/19/2023 7:58 PM, Baolin Wang wrote:
>>
>>
>> On 8/15/2023 6:37 PM, Kemeng Shi wrote:
>>>
>>>
>>> on 8/15/2023 6:07 PM, Baolin Wang wrote:
>>>>
>>>>
>>>> On 8/15/2023 5:32 PM, Kemeng Shi wrote:
>>>>>
>>>>>
>>>>> on 8/15/2023 4:38 PM, Baolin Wang wrote:
>>>>>>
>>>>>>
>>>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>>>> We call isolate_freepages_block in strict mode, continuous pages in
>>>>>>> pageblock will be isolated if isolate_freepages_block successed.
>>>>>>> Then pfn + isolated will point to start of next pageblock to scan
>>>>>>> no matter how many pageblocks are isolated in isolate_freepages_block.
>>>>>>> Use pfn + isolated as start of next pageblock to scan to simplify the
>>>>>>> iteration.
>>>>>>
>>>>>> IIUC, the isolate_freepages_block() can isolate high-order free pages, which means the pfn + isolated can be larger than the block_end_pfn. So in your patch, the 'block_start_pfn' and 'block_end_pfn' can be in different pageblocks, that will break pageblock_pfn_to_page().
>>>>>>
>>>>> In for update statement, we always update block_start_pfn to pfn and
>>>>
>>>> I mean, you changed to:
>>>> 1) pfn += isolated;
>>>> 2) block_start_pfn = pfn;
>>>> 3) block_end_pfn = pfn + pageblock_nr_pages;
>>>>
>>>> But in 1) pfn + isolated can go outside of the currnet pageblock if isolating a high-order page, for example, located in the middle of the next pageblock. So that the block_start_pfn can point to the middle of the next pageblock, not the start position. Meanwhile after 3), the block_end_pfn can point another pageblock. Or I missed something else?
>>>>
>>> Ah, I miss to explain this in changelog.
>>> In case we could we have buddy page with order higher than pageblock:
>>> 1. page in buddy page is aligned with it's order
>>> 2. order of page is higher than pageblock order
>>> Then page is aligned with pageblock order. So pfn of page and isolated pages
>>> count are both aligned pageblock order. So pfn + isolated is pageblock order
>>> aligned.
>>
>> That's not what I mean. pfn + isolated is not always pageblock-aligned, since the isolate_freepages_block() can isolated high-order free pages (for example: order-1, order-2 ...).
>>
>> Suppose the pageblock size is 2M, when isolating a pageblock (suppose the pfn range is 0 - 511 to make the arithmetic easy) by isolate_freepages_block(), and suppose pfn 0 to pfn 510 are all order-0 page, but pfn 511 is order-1 page, so you will isolate 513 pages from this pageblock, which will make 'pfn + isolated' not pageblock aligned.

I realized I made a bad example, sorry for the noise.

After more thought, I agree that 'pfn + isolated' is always pageblock
aligned in strict mode. So feel free to add:
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

> This is also no supposed to happen as low order buddy pages should never span
> cross boundary of high order pages:
> In buddy system, we always split order N pages into two order N - 1 pages as
> following:
> |        order N        |
> |order N - 1|order N - 1|
> So buddy pages with order N - 1 will never cross boudary of order N. Similar,
> buddy pages with order N - 2 will never cross boudary of order N - 1 and so
> on. Then any pages with order less than N will never cross boudary of order
> N.
> 
>>
>>>>> update block_end_pfn to pfn + pageblock_nr_pages. So they should point
>>>>> to the same pageblock. I guess you missed the change to update of
>>>>> block_end_pfn. :)
>>>>>>>
>>>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>>>> ---
>>>>>>>      mm/compaction.c | 14 ++------------
>>>>>>>      1 file changed, 2 insertions(+), 12 deletions(-)
>>>>>>>
>>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>>>> index 684f6e6cd8bc..8d7d38073d30 100644
>>>>>>> --- a/mm/compaction.c
>>>>>>> +++ b/mm/compaction.c
>>>>>>> @@ -733,21 +733,11 @@ isolate_freepages_range(struct compact_control *cc,
>>>>>>>          block_end_pfn = pageblock_end_pfn(pfn);
>>>>>>>            for (; pfn < end_pfn; pfn += isolated,
>>>>>>> -                block_start_pfn = block_end_pfn,
>>>>>>> -                block_end_pfn += pageblock_nr_pages) {
>>>>>>> +                block_start_pfn = pfn,
>>>>>>> +                block_end_pfn = pfn + pageblock_nr_pages) {
>>>>>>>              /* Protect pfn from changing by isolate_freepages_block */
>>>>>>>              unsigned long isolate_start_pfn = pfn;
>>>>>>>      -        /*
>>>>>>> -         * pfn could pass the block_end_pfn if isolated freepage
>>>>>>> -         * is more than pageblock order. In this case, we adjust
>>>>>>> -         * scanning range to right one.
>>>>>>> -         */
>>>>>>> -        if (pfn >= block_end_pfn) {
>>>>>>> -            block_start_pfn = pageblock_start_pfn(pfn);
>>>>>>> -            block_end_pfn = pageblock_end_pfn(pfn);
>>>>>>> -        }
>>>>>>> -
>>>>>>>              block_end_pfn = min(block_end_pfn, end_pfn);
>>>>>>>                if (!pageblock_pfn_to_page(block_start_pfn,
>>>>>>
>>>>
>>
>>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order
  2023-08-22  1:51         ` Kemeng Shi
@ 2023-08-24  2:20           ` Baolin Wang
  0 siblings, 0 replies; 33+ messages in thread
From: Baolin Wang @ 2023-08-24  2:20 UTC (permalink / raw)
  To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david



On 8/22/2023 9:51 AM, Kemeng Shi wrote:
> 
> 
> on 8/19/2023 8:14 PM, Baolin Wang wrote:
>>
>>
>> On 8/15/2023 8:04 PM, Kemeng Shi wrote:
>>>
>>>
>>> on 8/15/2023 4:58 PM, Baolin Wang wrote:
>>>>
>>>>
>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>> We have order = -1 via proactive compaction, the is_via_compact_memory is
>>>>> not proper name anymore.
>>>>> As cc->order informs the compaction to satisfy a allocation with that
>>>>> order, so rename it to compaction_with_allocation_order.
>>>>>
>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>> ---
>>>>>     mm/compaction.c | 11 +++++------
>>>>>     1 file changed, 5 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>> index d8416d3dd445..b5a699ed526b 100644
>>>>> --- a/mm/compaction.c
>>>>> +++ b/mm/compaction.c
>>>>> @@ -2055,12 +2055,11 @@ static isolate_migrate_t isolate_migratepages(struct compact_control *cc)
>>>>>     }
>>>>>       /*
>>>>> - * order == -1 is expected when compacting via
>>>>> - * /proc/sys/vm/compact_memory
>>>>> + * compact to satisfy allocation with target order
>>>>>      */
>>>>> -static inline bool is_via_compact_memory(int order)
>>>>> +static inline bool compaction_with_allocation_order(int order)
>>>>
>>>> I know naming is hard, but this name is not good enough that can show the compaction mode. But the original one could.
>>>>
>>> Yes, I agree with this, but name and comment of is_via_compact_memory may
>>> mislead reader that order == -1 is equivalent to compaction from
>>> /proc/sys/vm/compact_memory.
>>> Actually, we have several approaches to trigger compaction with order == -1:
>>> 1. via /proc/sys/vm/compact_memory
>>> 2. via /sys/devices/system/node/nodex/compact
>>> 3. via proactive compact
>>
>> They can all be called proactive compaction.
> I have considered rename to is_proactive_compaction. But "proactive compaction"
> in comments of compaction.c mostly implies to compaction triggerred from
> /proc/sys/vm/compaction_proactiveness. So "proactive compaction" itself looks
> ambiguous...
>>
>>>
>>> Instead of indicate compaction is tirggerred by compact_memocy or anything,
>>> order == -1 implies if compaction is triggerrred to meet allocation with high
>>> order and we will stop compaction if allocation with target order will success.
>>
>> IMO, the is_via_compact_memory() function helps people better distinguish the compaction logic we have under direct compaction or kcompactd compaction, while proactive compaction does not concern itself with these details. But compaction_with_allocation_order() will make me just wonder why we should compare with -1. So I don't think this patch is worth it, but as you said above, we can add more comments to make it more clear.
>>
> Sure, no insistant on this.
> Is it looks good to you just change comment of is_via_compact_memory to:
> We need do compaction proactively with order == -1
> order == -1 is expected for proactive compaction via:
> 1. via /proc/sys/vm/compact_memory
> 2. via /sys/devices/system/node/nodex/compact
> 3. /proc/sys/vm/compaction_proactiveness

Looks good to me. Thanks.

> 
>>>>>     {
>>>>> -    return order == -1;
>>>>> +    return order != -1;
>>>>>     }
>>>>>       /*
>>>>> @@ -2200,7 +2199,7 @@ static enum compact_result __compact_finished(struct compact_control *cc)
>>>>>             goto out;
>>>>>         }
>>>>>     -    if (is_via_compact_memory(cc->order))
>>>>> +    if (!compaction_with_allocation_order(cc->order))
>>>>>             return COMPACT_CONTINUE;
>>>>>           /*
>>>>> @@ -2390,7 +2389,7 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>>>           cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>>>>     -    if (!is_via_compact_memory(cc->order)) {
>>>>> +    if (compaction_with_allocation_order(cc->order)) {
>>>>>             unsigned long watermark;
>>>>>               /* Allocation can already succeed, nothing to do */
>>>>
>>
>>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order
  2023-08-22  1:57         ` Kemeng Shi
@ 2023-08-24  2:25           ` Baolin Wang
  2023-08-24  2:59             ` Kemeng Shi
  0 siblings, 1 reply; 33+ messages in thread
From: Baolin Wang @ 2023-08-24  2:25 UTC (permalink / raw)
  To: Kemeng Shi, linux-mm, linux-kernel, akpm, mgorman, david



On 8/22/2023 9:57 AM, Kemeng Shi wrote:
> 
> 
> on 8/19/2023 8:27 PM, Baolin Wang wrote:
>>
>>
>> On 8/15/2023 8:10 PM, Kemeng Shi wrote:
>>>
>>>
>>> on 8/15/2023 4:53 PM, Baolin Wang wrote:
>>>>
>>>>
>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>> We always do zone_watermark_ok check and compaction_suitable check
>>>>> together to test if compaction for target order should be runned.
>>>>> Factor these code out for preparation to remove repeat code.
>>>>>
>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>> ---
>>>>>     mm/compaction.c | 42 +++++++++++++++++++++++++++++-------------
>>>>>     1 file changed, 29 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>> index b5a699ed526b..26787ebb0297 100644
>>>>> --- a/mm/compaction.c
>>>>> +++ b/mm/compaction.c
>>>>> @@ -2365,6 +2365,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>>>>>         return false;
>>>>>     }
>>>>>     +/*
>>>>> + * Should we do compaction for target allocation order.
>>>>> + * Return COMPACT_SUCCESS if allocation for target order can be already
>>>>> + * satisfied
>>>>> + * Return COMPACT_SKIPPED if compaction for target order is likely to fail
>>>>> + * Return COMPACT_CONTINUE if compaction for target order should be runned
>>>>> + */
>>>>> +static inline enum compact_result
>>>>> +compaction_suit_allocation_order(struct zone *zone, unsigned int order,
>>>>> +                 int highest_zoneidx, unsigned int alloc_flags)
>>>>> +{
>>>>> +    unsigned long watermark;
>>>>> +
>>>>> +    watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
>>>>
>>>> IIUC, the watermark used in patch 8 and patch 9 is different, right? Have you measured the impact of modifying this watermark?
>>>>
>>> Actually, there is no functional change intended. Consider wmark_pages with
>>> alloc_flags = 0 is equivalent to min_wmark_pages, patch 8 and patch 9 still
>>> use original watermark.
>>
>> Can you use ALLOC_WMARK_MIN macro to make it more clear?
> Sorry, I can't quite follow this. The watermark should differ with different
> alloc_flags instead of WMARK_MIN hard-coded.
> Patch 8 and patch 9 use watermark with WMARK_MIN as they get alloc_flags = 0.

I mean you can pass 'alloc_flags=ALLOC_WMARK_MIN' instead of a magic 
number 0 when calling compaction_suit_allocation_order() in patch 8 and 
patch 9.
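For illustration only (the real call sites are in patch 8 and patch 9, which
are not quoted in this sub-thread), the suggestion amounts to something like:

    ret = compaction_suit_allocation_order(zone, order,
                                           highest_zoneidx,
                                           ALLOC_WMARK_MIN);
    if (ret != COMPACT_CONTINUE)
        return ret;    /* or skip this zone, depending on the caller */

instead of passing a bare 0 as the last argument; the effective watermark is
the same either way since ALLOC_WMARK_MIN is 0.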

>> And I think patch 8 and patch 9 should be squashed into patch 7 to convert all at once.
> Sure, i could do this in next version.
>>
>>>>> +    if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
>>>>> +                  alloc_flags))
>>>>> +        return COMPACT_SUCCESS;
>>>>> +
>>>>> +    if (!compaction_suitable(zone, order, highest_zoneidx))
>>>>> +        return COMPACT_SKIPPED;
>>>>> +
>>>>> +    return COMPACT_CONTINUE;
>>>>> +}
>>>>> +
>>>>>     static enum compact_result
>>>>>     compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>>>     {
>>>>> @@ -2390,19 +2414,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>>>         cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>>>>           if (compaction_with_allocation_order(cc->order)) {
>>>>> -        unsigned long watermark;
>>>>> -
>>>>> -        /* Allocation can already succeed, nothing to do */
>>>>> -        watermark = wmark_pages(cc->zone,
>>>>> -                    cc->alloc_flags & ALLOC_WMARK_MASK);
>>>>> -        if (zone_watermark_ok(cc->zone, cc->order, watermark,
>>>>> -                      cc->highest_zoneidx, cc->alloc_flags))
>>>>> -            return COMPACT_SUCCESS;
>>>>> -
>>>>> -        /* Compaction is likely to fail */
>>>>> -        if (!compaction_suitable(cc->zone, cc->order,
>>>>> -                     cc->highest_zoneidx))
>>>>> -            return COMPACT_SKIPPED;
>>>>> +        ret = compaction_suit_allocation_order(cc->zone, cc->order,
>>>>> +                               cc->highest_zoneidx,
>>>>> +                               cc->alloc_flags);
>>>>> +        if (ret != COMPACT_CONTINUE)
>>>>> +            return ret;
>>>>>         }
>>>>>           /*
>>>>
>>>>
>>
>>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order
  2023-08-24  2:25           ` Baolin Wang
@ 2023-08-24  2:59             ` Kemeng Shi
  0 siblings, 0 replies; 33+ messages in thread
From: Kemeng Shi @ 2023-08-24  2:59 UTC (permalink / raw)
  To: Baolin Wang, linux-mm, linux-kernel, akpm, mgorman, david



on 8/24/2023 10:25 AM, Baolin Wang wrote:
> 
> 
> On 8/22/2023 9:57 AM, Kemeng Shi wrote:
>>
>>
>> on 8/19/2023 8:27 PM, Baolin Wang wrote:
>>>
>>>
>>> On 8/15/2023 8:10 PM, Kemeng Shi wrote:
>>>>
>>>>
>>>> on 8/15/2023 4:53 PM, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 8/5/2023 7:07 PM, Kemeng Shi wrote:
>>>>>> We always do zone_watermark_ok check and compaction_suitable check
>>>>>> together to test if compaction for target order should be runned.
>>>>>> Factor these code out for preparation to remove repeat code.
>>>>>>
>>>>>> Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
>>>>>> ---
>>>>>>     mm/compaction.c | 42 +++++++++++++++++++++++++++++-------------
>>>>>>     1 file changed, 29 insertions(+), 13 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>>> index b5a699ed526b..26787ebb0297 100644
>>>>>> --- a/mm/compaction.c
>>>>>> +++ b/mm/compaction.c
>>>>>> @@ -2365,6 +2365,30 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>>>>>>         return false;
>>>>>>     }
>>>>>>     +/*
>>>>>> + * Should we do compaction for target allocation order.
>>>>>> + * Return COMPACT_SUCCESS if allocation for target order can be already
>>>>>> + * satisfied
>>>>>> + * Return COMPACT_SKIPPED if compaction for target order is likely to fail
>>>>>> + * Return COMPACT_CONTINUE if compaction for target order should be runned
>>>>>> + */
>>>>>> +static inline enum compact_result
>>>>>> +compaction_suit_allocation_order(struct zone *zone, unsigned int order,
>>>>>> +                 int highest_zoneidx, unsigned int alloc_flags)
>>>>>> +{
>>>>>> +    unsigned long watermark;
>>>>>> +
>>>>>> +    watermark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK);
>>>>>
>>>>> IIUC, the watermark used in patch 8 and patch 9 is different, right? Have you measured the impact of modifying this watermark?
>>>>>
>>>> Actually, there is no functional change intended. Consider wmark_pages with
>>>> alloc_flags = 0 is equivalent to min_wmark_pages, patch 8 and patch 9 still
>>>> use original watermark.
>>>
>>> Can you use ALLOC_WMARK_MIN macro to make it more clear?
>> Sorry, I can't quite follow this. The watermark should differ with different
>> alloc_flags instead of WMARK_MIN hard-coded.
>> Patch 8 and patch 9 use watermark with WMARK_MIN as they get alloc_flags = 0.
> 
> I mean you can pass 'alloc_flags=ALLOC_WMARK_MIN' instead of a magic number 0 when calling compaction_suit_allocation_order() in patch 8 and patch 9.
> 
Thanks for explaining; this does make it better. I will do this in the next version.
>>> And I think patch 8 and patch 9 should be squashed into patch 7 to convert all at once.
>> Sure, i could do this in next version.
>>>
>>>>>> +    if (zone_watermark_ok(zone, order, watermark, highest_zoneidx,
>>>>>> +                  alloc_flags))
>>>>>> +        return COMPACT_SUCCESS;
>>>>>> +
>>>>>> +    if (!compaction_suitable(zone, order, highest_zoneidx))
>>>>>> +        return COMPACT_SKIPPED;
>>>>>> +
>>>>>> +    return COMPACT_CONTINUE;
>>>>>> +}
>>>>>> +
>>>>>>     static enum compact_result
>>>>>>     compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>>>>     {
>>>>>> @@ -2390,19 +2414,11 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>>>>>>         cc->migratetype = gfp_migratetype(cc->gfp_mask);
>>>>>>           if (compaction_with_allocation_order(cc->order)) {
>>>>>> -        unsigned long watermark;
>>>>>> -
>>>>>> -        /* Allocation can already succeed, nothing to do */
>>>>>> -        watermark = wmark_pages(cc->zone,
>>>>>> -                    cc->alloc_flags & ALLOC_WMARK_MASK);
>>>>>> -        if (zone_watermark_ok(cc->zone, cc->order, watermark,
>>>>>> -                      cc->highest_zoneidx, cc->alloc_flags))
>>>>>> -            return COMPACT_SUCCESS;
>>>>>> -
>>>>>> -        /* Compaction is likely to fail */
>>>>>> -        if (!compaction_suitable(cc->zone, cc->order,
>>>>>> -                     cc->highest_zoneidx))
>>>>>> -            return COMPACT_SKIPPED;
>>>>>> +        ret = compaction_suit_allocation_order(cc->zone, cc->order,
>>>>>> +                               cc->highest_zoneidx,
>>>>>> +                               cc->alloc_flags);
>>>>>> +        if (ret != COMPACT_CONTINUE)
>>>>>> +            return ret;
>>>>>>         }
>>>>>>           /*
>>>>>
>>>>>
>>>
>>>
> 



^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2023-08-24  2:59 UTC | newest]

Thread overview: 33+ messages
2023-08-05 11:07 [PATCH 0/9] Fixes and cleanups to compaction Kemeng Shi
2023-08-05  3:14 ` Matthew Wilcox
2023-08-05  4:07   ` Kemeng Shi
2023-08-05 11:07 ` [PATCH 2/9] mm/compaction: call list_is_{first}/{last} more intuitively in move_freelist_{head}/{tail} Kemeng Shi
2023-08-15  7:49   ` Baolin Wang
2023-08-05 11:07 ` [PATCH 5/9] mm/compaction: remove repeat compact_blockskip_flush check in reset_isolation_suitable Kemeng Shi
2023-08-15  8:42   ` Baolin Wang
2023-08-05 11:07 ` [PATCH 6/9] mm/compaction: rename is_via_compact_memory to compaction_with_allocation_order Kemeng Shi
2023-08-15  8:58   ` Baolin Wang
2023-08-15 12:04     ` Kemeng Shi
2023-08-19 12:14       ` Baolin Wang
2023-08-22  1:51         ` Kemeng Shi
2023-08-24  2:20           ` Baolin Wang
2023-08-05 11:07 ` [PATCH 7/9] mm/compaction: factor out code to test if we should run compaction for target order Kemeng Shi
2023-08-15  8:53   ` Baolin Wang
2023-08-15 12:10     ` Kemeng Shi
2023-08-19 12:27       ` Baolin Wang
2023-08-22  1:57         ` Kemeng Shi
2023-08-24  2:25           ` Baolin Wang
2023-08-24  2:59             ` Kemeng Shi
2023-08-05 11:07 ` [PATCH 9/9] mm/compaction: call compaction_suit_allocation_order in kcompactd_do_work Kemeng Shi
     [not found] ` <20230805110711.2975149-2-shikemeng@huaweicloud.com>
2023-08-05 17:11   ` [PATCH 1/9] mm/compaction: use correct list in move_freelist_{head}/{tail} Andrew Morton
2023-08-07  0:37     ` Kemeng Shi
2023-08-15  7:16   ` Baolin Wang
     [not found] ` <20230805110711.2975149-4-shikemeng@huaweicloud.com>
2023-08-15  8:28   ` [PATCH 3/9] mm/compaction: correctly return failure with bogus compound_order in strict mode Baolin Wang
2023-08-15  9:22     ` Kemeng Shi
     [not found] ` <20230805110711.2975149-5-shikemeng@huaweicloud.com>
2023-08-15  8:38   ` [PATCH 4/9] mm/compaction: simplify pfn iteration in isolate_freepages_range Baolin Wang
2023-08-15  9:32     ` Kemeng Shi
2023-08-15 10:07       ` Baolin Wang
2023-08-15 10:37         ` Kemeng Shi
2023-08-19 11:58           ` Baolin Wang
2023-08-22  1:37             ` Kemeng Shi
2023-08-24  2:19               ` Baolin Wang
