* [PATCH v2 0/2] Reduce lock contention related with large folio
@ 2023-04-25  8:46 Yin Fengwei
  2023-04-25  8:46 ` [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list Yin Fengwei
  2023-04-25  8:46 ` [PATCH v2 2/2] lru: allow large batched add large folio to lru list Yin Fengwei
  0 siblings, 2 replies; 15+ messages in thread
From: Yin Fengwei @ 2023-04-25  8:46 UTC
  To: linux-mm, akpm, willy, yuzhao, ryan.roberts, ying.huang; +Cc: fengwei.yin

Ryan tried to enable large folios for anonymous mapping [1].

Unlike large folios for the page cache, which don't trigger frequent
page allocation/free, large folios for anonymous mapping are allocated
and freed much more frequently. So large folios for anonymous mapping
expose some lock contention.

Ryan mentioned the deferred queue lock in [1]. We also hit two other
lock contention points: the lru lock and the zone lock.

This series mitigates the deferred queue lock contention and reduces
the lru lock contention to some extent.

Patch 1 reduces deferred queue lock contention by not acquiring the
queue lock when checking whether the folio is in the deferred list.
The page_fault1 test of will-it-scale showed a 60% reduction in
deferred queue lock contention.

Patch 2 reduces lru lock contention by allowing large folios to be
added to the lru list in batches. The page_fault1 test of
will-it-scale showed a 20% reduction in lru lock contention.

The zone lock contention happens on the large folio free path. It is
related to commit f26b3fa04611 ("mm/page_alloc: limit number of
high-order pages on PCP during bulk free") and will not be addressed
by this series.


[1]
https://lore.kernel.org/linux-mm/20230414130303.2345383-1-ryan.roberts@arm.com/

Changelog from v1:

For patch2:
  - Add the Reported-by from Huang Ying, which I missed by mistake.

  - Fix a kernel panic. The folio parameter passed to folio_batch_add()
    may not be a direct folio reference:
    - For the mlock usage, add a new interface with an extra nr_pages
      parameter; callers compute nr_pages from the real folio reference.
    - For swap, shadow and dax entries passed as the folio parameter,
      treat nr_pages as 1 (see the sketch after this changelog).

    With the fix, the stress testing ran 12 hours without any issue,
    whereas it previously hit a kernel panic within around 3 minutes.

  - Update the lock contention numbers in the commit message.

  - Change the field name from pages_nr to nr_pages, per Ying's suggestion.

This version still uses PAGEVEC_SIZE as the maximum nr_pages in an
fbatch. We can revise that after a decision is made about the page
order for anonymous large folios.
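
As an aside on the value-entry handling above: swap, shadow and dax
entries are distinguished from real folio pointers by a low-bit tag,
in the spirit of the kernel's xa_is_value()/xa_mk_value() convention.
A minimal userspace sketch of the idea (the helper names and the
order-2 page count below are illustrative, not the kernel code):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Value entries encode an integer shifted left by one with the low
 * bit set, so they can never alias a pointer to an aligned object. */
static inline void *mk_value(unsigned long v)
{
	return (void *)((v << 1) | 1);
}

static inline bool is_value(const void *entry)
{
	return (uintptr_t)entry & 1;
}

int main(void)
{
	int folio;	/* stand-in for a real folio */
	void *entries[] = { &folio, mk_value(42) };

	for (int i = 0; i < 2; i++) {
		/* A real folio would report folio_nr_pages(); value
		 * entries (swap/shadow/dax) are counted as one page. */
		unsigned int nr_pages = is_value(entries[i]) ? 1 : 4;
		printf("entry %d: nr_pages=%u\n", i, nr_pages);
	}
	return 0;
}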


Yin Fengwei (2):
  THP: avoid lock when check whether THP is in deferred list
  lru: allow large batched add large folio to lru list

 include/linux/pagevec.h | 46 ++++++++++++++++++++++++++++++++++++++---
 mm/huge_memory.c        | 19 ++++++++++++++---
 mm/mlock.c              |  7 +++----
 mm/swap.c               |  3 +--
 4 files changed, 63 insertions(+), 12 deletions(-)

-- 
2.30.2




* [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
  2023-04-25  8:46 [PATCH v2 0/2] Reduce lock contention related with large folio Yin Fengwei
@ 2023-04-25  8:46 ` Yin Fengwei
  2023-04-25 12:38   ` Kirill A. Shutemov
                     ` (2 more replies)
  2023-04-25  8:46 ` [PATCH v2 2/2] lru: allow large batched add large folio to lru list Yin Fengwei
  1 sibling, 3 replies; 15+ messages in thread
From: Yin Fengwei @ 2023-04-25  8:46 UTC
  To: linux-mm, akpm, willy, yuzhao, ryan.roberts, ying.huang; +Cc: fengwei.yin

free_transhuge_page() acquires split queue lock then check
whether the THP was added to deferred list or not.

It's safe to check whether the THP is in deferred list or not.
   When code hit free_transhuge_page(), there is no one tries
   to update the folio's _deferred_list.

   If folio is not in deferred_list, it's safe to check without
   acquiring lock.

   If folio is in deferred_list, the other node in deferred_list
   adding/deleteing doesn't impact the return value of
   list_epmty(@folio->_deferred_list).
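
The stability argument can be seen in a minimal userspace model of the
kernel's circular list (a sketch, not the kernel implementation):
list_empty(&node) only compares the node's own next pointer against
the node itself, and neighbouring add/delete operations never change
that comparison:

#include <stdbool.h>
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

int main(void)
{
	struct list_head queue, a, b;

	INIT_LIST_HEAD(&queue);
	list_add(&a, &queue);
	list_add(&b, &queue);	/* queue -> b -> a -> queue */

	/* Deleting the neighbour rewrites b's links but can never make
	 * b point back at itself, so list_empty(&b) stays false. */
	list_del(&a);
	printf("b empty after neighbour delete: %d\n", list_empty(&b));
	return 0;
}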

Running page_fault1 of will-it-scale + order 2 folio for anonymous
mapping with 96 processes on an Ice Lake 48C/96T test box, we could
see the 61% split_queue_lock contention:
-   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
    release_pages
   - 70.93% release_pages
      - 61.42% free_transhuge_page
         + 60.77% _raw_spin_lock_irqsave

With this patch applied, the split_queue_lock contention is less
than 1%.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
---
 mm/huge_memory.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 032fb0ef9cd1..c620f1f12247 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
 	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
 	unsigned long flags;
 
-	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-	if (!list_empty(&folio->_deferred_list)) {
+	/*
+	 * At this point, there is no one trying to queue the folio
+	 * to deferred_list. folio->_deferred_list is not possible
+	 * being updated.
+	 *
+	 * If folio is already added to deferred_list, add/delete to/from
+	 * deferred_list will not impact list_empty(&folio->_deferred_list).
+	 * It's safe to check list_empty(&folio->_deferred_list) without
+	 * acquiring the lock.
+	 *
+	 * If folio is not in deferred_list, it's safe to check without
+	 * acquiring the lock.
+	 */
+	if (data_race(!list_empty(&folio->_deferred_list))) {
+		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 		ds_queue->split_queue_len--;
 		list_del(&folio->_deferred_list);
+		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	}
-	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	free_compound_page(page);
 }
 
-- 
2.30.2




* [PATCH v2 2/2] lru: allow large batched add large folio to lru list
  2023-04-25  8:46 [PATCH v2 0/2] Reduce lock contention related with large folio Yin Fengwei
  2023-04-25  8:46 ` [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list Yin Fengwei
@ 2023-04-25  8:46 ` Yin Fengwei
  1 sibling, 0 replies; 15+ messages in thread
From: Yin Fengwei @ 2023-04-25  8:46 UTC
  To: linux-mm, akpm, willy, yuzhao, ryan.roberts, ying.huang; +Cc: fengwei.yin

Currently, large folios are not added to the lru list in batches,
which causes high lru lock contention once large folios are enabled
for anonymous mapping.

Running page_fault1 of will-it-scale + order 2 folios with 96
processes on an Ice Lake 48C/96T box, the lru lock contention is
around 65%:
-   65.38%     0.17%  page_fault1_pro  [kernel.kallsyms]           [k] folio_lruvec_lock_irqsave
   - 65.21% folio_lruvec_lock_irqsave

With this patch applied, the lru lock contention dropped to 47% in the
same testing:
-   46.64%     0.24%  page_fault1_pro  [kernel.kallsyms]           [k] folio_lruvec_lock_irqsave
   + 46.40% folio_lruvec_lock_irqsave
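
The core of the change is to account pages rather than folios when
deciding that a batch is full. A toy userspace model of that
accounting (illustrative names; the real change is the pagevec and
folio_batch code in include/linux/pagevec.h below):

#include <stdio.h>

#define BATCH_SLOTS 15	/* mirrors PAGEVEC_SIZE */

struct batch {
	unsigned char nr;		/* folios in the batch */
	unsigned short nr_pages;	/* pages those folios cover */
	void *folios[BATCH_SLOTS];
};

/* Returns 0 when the batch should be flushed: either the slots are
 * full or the accumulated page count exceeds the slot count. The
 * page-count bound is what keeps lru-lock hold time bounded once
 * large folios are batched. */
static unsigned int batch_add(struct batch *b, void *folio,
			      unsigned int nr_pages)
{
	b->folios[b->nr++] = folio;
	b->nr_pages += nr_pages;
	if (b->nr_pages > BATCH_SLOTS)
		return 0;
	return BATCH_SLOTS - b->nr;
}

int main(void)
{
	struct batch b = { 0 };
	int folio;

	/* Four order-2 folios (4 pages each) pass 15 pages, so the
	 * fourth add signals a flush even though slots remain free. */
	for (int i = 1; i <= 4; i++)
		printf("add %d -> space %u\n", i,
		       batch_add(&b, &folio, 4));
	return 0;
}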

Reported-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
---
 include/linux/pagevec.h | 46 ++++++++++++++++++++++++++++++++++++++---
 mm/mlock.c              |  7 +++----
 mm/swap.c               |  3 +--
 3 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index f582f7213ea5..9479b7b50bc6 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -10,6 +10,7 @@
 #define _LINUX_PAGEVEC_H
 
 #include <linux/xarray.h>
+#include <linux/mm.h>
 
 /* 15 pointers + header align the pagevec structure to a power of two */
 #define PAGEVEC_SIZE	15
@@ -22,6 +23,7 @@ struct address_space;
 struct pagevec {
 	unsigned char nr;
 	bool percpu_pvec_drained;
+	unsigned short nr_pages;
 	struct page *pages[PAGEVEC_SIZE];
 };
 
@@ -30,12 +32,14 @@ void __pagevec_release(struct pagevec *pvec);
 static inline void pagevec_init(struct pagevec *pvec)
 {
 	pvec->nr = 0;
+	pvec->nr_pages = 0;
 	pvec->percpu_pvec_drained = false;
 }
 
 static inline void pagevec_reinit(struct pagevec *pvec)
 {
 	pvec->nr = 0;
+	pvec->nr_pages = 0;
 }
 
 static inline unsigned pagevec_count(struct pagevec *pvec)
@@ -54,7 +58,12 @@ static inline unsigned pagevec_space(struct pagevec *pvec)
 static inline unsigned pagevec_add(struct pagevec *pvec, struct page *page)
 {
 	pvec->pages[pvec->nr++] = page;
-	return pagevec_space(pvec);
+	pvec->nr_pages += compound_nr(page);
+
+	if (pvec->nr_pages > PAGEVEC_SIZE)
+		return 0;
+	else
+		return pagevec_space(pvec);
 }
 
 static inline void pagevec_release(struct pagevec *pvec)
@@ -75,6 +84,7 @@ static inline void pagevec_release(struct pagevec *pvec)
 struct folio_batch {
 	unsigned char nr;
 	bool percpu_pvec_drained;
+	unsigned short nr_pages;
 	struct folio *folios[PAGEVEC_SIZE];
 };
 
@@ -92,12 +102,14 @@ static_assert(offsetof(struct pagevec, pages) ==
 static inline void folio_batch_init(struct folio_batch *fbatch)
 {
 	fbatch->nr = 0;
+	fbatch->nr_pages = 0;
 	fbatch->percpu_pvec_drained = false;
 }
 
 static inline void folio_batch_reinit(struct folio_batch *fbatch)
 {
 	fbatch->nr = 0;
+	fbatch->nr_pages = 0;
 }
 
 static inline unsigned int folio_batch_count(struct folio_batch *fbatch)
@@ -110,6 +122,32 @@ static inline unsigned int fbatch_space(struct folio_batch *fbatch)
 	return PAGEVEC_SIZE - fbatch->nr;
 }
 
+/**
+ * folio_batch_add_nr_pages() - Add a folio to a batch.
+ * @fbatch: The folio batch.
+ * @folio: The folio to add.
+ * @nr_pages: The number of pages the folio adds to the batch.
+ *
+ * The folio is added to the end of the batch.
+ * The batch must have previously been initialised using folio_batch_init().
+ *
+ * Return: The number of slots still available.
+ * Note: the folio parameter may not be a direct folio reference, so
+ *       folio_nr_pages(folio) cannot be used here.
+ *       Currently, this function is only called from mlock.c.
+ */
+static inline unsigned folio_batch_add_nr_pages(struct folio_batch *fbatch,
+		struct folio *folio, unsigned int nr_pages)
+{
+	fbatch->folios[fbatch->nr++] = folio;
+	fbatch->nr_pages += nr_pages;
+
+	if (fbatch->nr_pages > PAGEVEC_SIZE)
+		return 0;
+	else
+		return fbatch_space(fbatch);
+}
+
 /**
  * folio_batch_add() - Add a folio to a batch.
  * @fbatch: The folio batch.
@@ -123,8 +161,10 @@ static inline unsigned int fbatch_space(struct folio_batch *fbatch)
 static inline unsigned folio_batch_add(struct folio_batch *fbatch,
 		struct folio *folio)
 {
-	fbatch->folios[fbatch->nr++] = folio;
-	return fbatch_space(fbatch);
+	unsigned int nr_pages;
+
+	nr_pages = xa_is_value(folio) ? 1 : folio_nr_pages(folio);
+	return folio_batch_add_nr_pages(fbatch, folio, nr_pages);
 }
 
 static inline void folio_batch_release(struct folio_batch *fbatch)
diff --git a/mm/mlock.c b/mm/mlock.c
index 617469fce96d..2c7e5043aebe 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -243,19 +243,18 @@ bool need_mlock_drain(int cpu)
 void mlock_folio(struct folio *folio)
 {
 	struct folio_batch *fbatch;
+	unsigned int nr_pages = folio_nr_pages(folio);
 
 	local_lock(&mlock_fbatch.lock);
 	fbatch = this_cpu_ptr(&mlock_fbatch.fbatch);
 
 	if (!folio_test_set_mlocked(folio)) {
-		int nr_pages = folio_nr_pages(folio);
-
 		zone_stat_mod_folio(folio, NR_MLOCK, nr_pages);
 		__count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages);
 	}
 
 	folio_get(folio);
-	if (!folio_batch_add(fbatch, mlock_lru(folio)) ||
+	if (!folio_batch_add_nr_pages(fbatch, mlock_lru(folio), nr_pages) ||
 	    folio_test_large(folio) || lru_cache_disabled())
 		mlock_folio_batch(fbatch);
 	local_unlock(&mlock_fbatch.lock);
@@ -278,7 +277,7 @@ void mlock_new_folio(struct folio *folio)
 	__count_vm_events(UNEVICTABLE_PGMLOCKED, nr_pages);
 
 	folio_get(folio);
-	if (!folio_batch_add(fbatch, mlock_new(folio)) ||
+	if (!folio_batch_add_nr_pages(fbatch, mlock_new(folio), nr_pages) ||
 	    folio_test_large(folio) || lru_cache_disabled())
 		mlock_folio_batch(fbatch);
 	local_unlock(&mlock_fbatch.lock);
diff --git a/mm/swap.c b/mm/swap.c
index 57cb01b042f6..0f8554aeb338 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -228,8 +228,7 @@ static void folio_batch_move_lru(struct folio_batch *fbatch, move_fn_t move_fn)
 static void folio_batch_add_and_move(struct folio_batch *fbatch,
 		struct folio *folio, move_fn_t move_fn)
 {
-	if (folio_batch_add(fbatch, folio) && !folio_test_large(folio) &&
-	    !lru_cache_disabled())
+	if (folio_batch_add(fbatch, folio) && !lru_cache_disabled())
 		return;
 	folio_batch_move_lru(fbatch, move_fn);
 }
-- 
2.30.2




* Re: [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
  2023-04-25  8:46 ` [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list Yin Fengwei
@ 2023-04-25 12:38   ` Kirill A. Shutemov
  2023-04-26  1:47     ` Yin Fengwei
                       ` (2 more replies)
  2023-04-26  1:13   ` Huang, Ying
  2023-04-26  8:11   ` Ryan Roberts
  2 siblings, 3 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2023-04-25 12:38 UTC
  To: Yin Fengwei; +Cc: linux-mm, akpm, willy, yuzhao, ryan.roberts, ying.huang

On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
> free_transhuge_page() acquires split queue lock then check
> whether the THP was added to deferred list or not.
> 
> It's safe to check whether the THP is in deferred list or not.
>    When code hit free_transhuge_page(), there is no one tries
>    to update the folio's _deferred_list.
> 
>    If folio is not in deferred_list, it's safe to check without
>    acquiring lock.
> 
>    If folio is in deferred_list, the other node in deferred_list
>    adding/deleteing doesn't impact the return value of
>    list_epmty(@folio->_deferred_list).

Typo.

> 
> Running page_fault1 of will-it-scale + order 2 folio for anonymous
> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
> see the 61% split_queue_lock contention:
> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>     release_pages
>    - 70.93% release_pages
>       - 61.42% free_transhuge_page
>          + 60.77% _raw_spin_lock_irqsave
> 
> With this patch applied, the split_queue_lock contention is less
> than 1%.
> 
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  mm/huge_memory.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 032fb0ef9cd1..c620f1f12247 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> -	if (!list_empty(&folio->_deferred_list)) {
> +	/*
> +	 * At this point, there is no one trying to queue the folio
> +	 * to deferred_list. folio->_deferred_list is not possible
> +	 * being updated.
> +	 *
> +	 * If folio is already added to deferred_list, add/delete to/from
> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
> +	 * It's safe to check list_empty(&folio->_deferred_list) without
> +	 * acquiring the lock.
> +	 *
> +	 * If folio is not in deferred_list, it's safe to check without
> +	 * acquiring the lock.
> +	 */
> +	if (data_race(!list_empty(&folio->_deferred_list))) {
> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);

Recheck under lock?

>  		ds_queue->split_queue_len--;
>  		list_del(&folio->_deferred_list);
> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	}
> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	free_compound_page(page);
>  }
>  
> -- 
> 2.30.2
> 
> 

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



* Re: [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
  2023-04-25  8:46 ` [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list Yin Fengwei
  2023-04-25 12:38   ` Kirill A. Shutemov
@ 2023-04-26  1:13   ` Huang, Ying
  2023-04-26  1:48     ` Yin Fengwei
  2023-04-26  8:11   ` Ryan Roberts
  2 siblings, 1 reply; 15+ messages in thread
From: Huang, Ying @ 2023-04-26  1:13 UTC
  To: Yin Fengwei; +Cc: linux-mm, akpm, willy, yuzhao, ryan.roberts

Yin Fengwei <fengwei.yin@intel.com> writes:

> free_transhuge_page() acquires split queue lock then check
> whether the THP was added to deferred list or not.
>
> It's safe to check whether the THP is in deferred list or not.
>    When code hit free_transhuge_page(), there is no one tries
>    to update the folio's _deferred_list.

I think that it's clearer to enumerate all places pages are added and
removed from deferred list.  Then we can find out whether there's code
path that may race with this.

Take a glance at the search result of `grep split_queue_lock -r mm`.  It
seems that deferred_split_scan() may race with free_transhuge_page(), so
we need to recheck with the lock held as Kirill pointed out.

Best Regards,
Huang, Ying

>    If folio is not in deferred_list, it's safe to check without
>    acquiring lock.
>
>    If folio is in deferred_list, the other node in deferred_list
>    adding/deleteing doesn't impact the return value of
>    list_epmty(@folio->_deferred_list).
>
> Running page_fault1 of will-it-scale + order 2 folio for anonymous
> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
> see the 61% split_queue_lock contention:
> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>     release_pages
>    - 70.93% release_pages
>       - 61.42% free_transhuge_page
>          + 60.77% _raw_spin_lock_irqsave
>
> With this patch applied, the split_queue_lock contention is less
> than 1%.
>
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  mm/huge_memory.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 032fb0ef9cd1..c620f1f12247 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> -	if (!list_empty(&folio->_deferred_list)) {
> +	/*
> +	 * At this point, there is no one trying to queue the folio
> +	 * to deferred_list. folio->_deferred_list is not possible
> +	 * being updated.
> +	 *
> +	 * If folio is already added to deferred_list, add/delete to/from
> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
> +	 * It's safe to check list_empty(&folio->_deferred_list) without
> +	 * acquiring the lock.
> +	 *
> +	 * If folio is not in deferred_list, it's safe to check without
> +	 * acquiring the lock.
> +	 */
> +	if (data_race(!list_empty(&folio->_deferred_list))) {
> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>  		ds_queue->split_queue_len--;
>  		list_del(&folio->_deferred_list);
> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	}
> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	free_compound_page(page);
>  }



* Re: [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
  2023-04-25 12:38   ` Kirill A. Shutemov
@ 2023-04-26  1:47     ` Yin Fengwei
  2023-04-26  2:08     ` Yin Fengwei
  2023-04-28  6:28     ` Yin, Fengwei
  2 siblings, 0 replies; 15+ messages in thread
From: Yin Fengwei @ 2023-04-26  1:47 UTC
  To: Kirill A. Shutemov
  Cc: linux-mm, akpm, willy, yuzhao, ryan.roberts, ying.huang



On 4/25/23 20:38, Kirill A. Shutemov wrote:
> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>> free_transhuge_page() acquires split queue lock then check
>> whether the THP was added to deferred list or not.
>>
>> It's safe to check whether the THP is in deferred list or not.
>>    When code hit free_transhuge_page(), there is no one tries
>>    to update the folio's _deferred_list.
>>
>>    If folio is not in deferred_list, it's safe to check without
>>    acquiring lock.
>>
>>    If folio is in deferred_list, the other node in deferred_list
>>    adding/deleteing doesn't impact the return value of
>>    list_epmty(@folio->_deferred_list).
> 
> Typo.
Oops. Will correct it in next version.

> 
>>
>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>> see the 61% split_queue_lock contention:
>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>     release_pages
>>    - 70.93% release_pages
>>       - 61.42% free_transhuge_page
>>          + 60.77% _raw_spin_lock_irqsave
>>
>> With this patch applied, the split_queue_lock contention is less
>> than 1%.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 032fb0ef9cd1..c620f1f12247 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>> -	if (!list_empty(&folio->_deferred_list)) {
>> +	/*
>> +	 * At this point, there is no one trying to queue the folio
>> +	 * to deferred_list. folio->_deferred_list is not possible
>> +	 * being updated.
>> +	 *
>> +	 * If folio is already added to deferred_list, add/delete to/from
>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>> +	 * acquiring the lock.
>> +	 *
>> +	 * If folio is not in deferred_list, it's safe to check without
>> +	 * acquiring the lock.
>> +	 */
>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> 
> Recheck under lock?
My understanding is that even if there is a race, it doesn't impact the
correctness of list_empty(&folio->_deferred_list).
  - If the folio is not in the deferred_list, list_empty() always
    returns true.
  - If the folio is in the deferred_list, even while a neighbouring
    element is being added to/removed from the deferred_list,
    list_empty() always returns false.

There is one precondition:
  No other user adds/removes this folio to/from the deferred_list
  concurrently.

I think that holds for free_transhuge_page(), so a recheck after taking
the lock is not necessary. Thanks

Regards
Yin, Fengwei

> 
>>  		ds_queue->split_queue_len--;
>>  		list_del(&folio->_deferred_list);
>> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	}
>> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	free_compound_page(page);
>>  }
>>  
>> -- 
>> 2.30.2
>>
>>
> 



* Re: [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
  2023-04-26  1:13   ` Huang, Ying
@ 2023-04-26  1:48     ` Yin Fengwei
  0 siblings, 0 replies; 15+ messages in thread
From: Yin Fengwei @ 2023-04-26  1:48 UTC
  To: Huang, Ying; +Cc: linux-mm, akpm, willy, yuzhao, ryan.roberts



On 4/26/23 09:13, Huang, Ying wrote:
> Yin Fengwei <fengwei.yin@intel.com> writes:
> 
>> free_transhuge_page() acquires split queue lock then check
>> whether the THP was added to deferred list or not.
>>
>> It's safe to check whether the THP is in deferred list or not.
>>    When code hit free_transhuge_page(), there is no one tries
>>    to update the folio's _deferred_list.
> 
> I think that it's clearer to enumerate all places pages are added and
> removed from deferred list.  Then we can find out whether there's code
> path that may race with this.
> 
> Take a glance at the search result of `grep split_queue_lock -r mm`.  It
> seems that deferred_split_scan() may race with free_transhuge_page(), so
> we need to recheck with the lock held as Kirill pointed out.
My understanding is that the recheck after taking the lock is not
necessary. See my reply to Kirill. Thanks.


Regards
Yin, Fengwei

> 
> Best Regards,
> Huang, Ying
> 
>>    If folio is not in deferred_list, it's safe to check without
>>    acquiring lock.
>>
>>    If folio is in deferred_list, the other node in deferred_list
>>    adding/deleteing doesn't impact the return value of
>>    list_epmty(@folio->_deferred_list).
>>
>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>> see the 61% split_queue_lock contention:
>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>     release_pages
>>    - 70.93% release_pages
>>       - 61.42% free_transhuge_page
>>          + 60.77% _raw_spin_lock_irqsave
>>
>> With this patch applied, the split_queue_lock contention is less
>> than 1%.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 032fb0ef9cd1..c620f1f12247 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>> -	if (!list_empty(&folio->_deferred_list)) {
>> +	/*
>> +	 * At this point, there is no one trying to queue the folio
>> +	 * to deferred_list. folio->_deferred_list is not possible
>> +	 * being updated.
>> +	 *
>> +	 * If folio is already added to deferred_list, add/delete to/from
>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>> +	 * acquiring the lock.
>> +	 *
>> +	 * If folio is not in deferred_list, it's safe to check without
>> +	 * acquiring the lock.
>> +	 */
>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>  		ds_queue->split_queue_len--;
>>  		list_del(&folio->_deferred_list);
>> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	}
>> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	free_compound_page(page);
>>  }



* Re: [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
  2023-04-25 12:38   ` Kirill A. Shutemov
  2023-04-26  1:47     ` Yin Fengwei
@ 2023-04-26  2:08     ` Yin Fengwei
  2023-04-26  8:17       ` Ryan Roberts
  2023-04-28  6:28     ` Yin, Fengwei
  2 siblings, 1 reply; 15+ messages in thread
From: Yin Fengwei @ 2023-04-26  2:08 UTC
  To: Kirill A. Shutemov
  Cc: linux-mm, akpm, willy, yuzhao, ryan.roberts, ying.huang



On 4/25/23 20:38, Kirill A. Shutemov wrote:
> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>> free_transhuge_page() acquires split queue lock then check
>> whether the THP was added to deferred list or not.
>>
>> It's safe to check whether the THP is in deferred list or not.
>>    When code hit free_transhuge_page(), there is no one tries
>>    to update the folio's _deferred_list.
>>
>>    If folio is not in deferred_list, it's safe to check without
>>    acquiring lock.
>>
>>    If folio is in deferred_list, the other node in deferred_list
>>    adding/deleteing doesn't impact the return value of
>>    list_epmty(@folio->_deferred_list).
> 
> Typo.
> 
>>
>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>> see the 61% split_queue_lock contention:
>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>     release_pages
>>    - 70.93% release_pages
>>       - 61.42% free_transhuge_page
>>          + 60.77% _raw_spin_lock_irqsave
>>
>> With this patch applied, the split_queue_lock contention is less
>> than 1%.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 032fb0ef9cd1..c620f1f12247 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>> -	if (!list_empty(&folio->_deferred_list)) {
>> +	/*
>> +	 * At this point, there is no one trying to queue the folio
>> +	 * to deferred_list. folio->_deferred_list is not possible
>> +	 * being updated.
>> +	 *
>> +	 * If folio is already added to deferred_list, add/delete to/from
>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>> +	 * acquiring the lock.
>> +	 *
>> +	 * If folio is not in deferred_list, it's safe to check without
>> +	 * acquiring the lock.
>> +	 */
>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> 
> Recheck under lock?
Huang Ying pointed out the race with deferred_split_scan(). And yes, we
need to recheck under the lock. Will update in the next version.
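
The fix being agreed here is the classic double-checked pattern: the
unlocked read is only a cheap hint, and the authoritative test is
repeated under the lock. A minimal userspace model of that shape
(hypothetical, not the actual v3 patch):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static bool on_list = true;	/* stands in for !list_empty(...) */
static int queue_len = 1;

/* Double-checked removal: the unlocked read filters out the common
 * "never deferred" case cheaply; rechecking under the lock means a
 * racing scanner that already dequeued the object cannot make us
 * decrement queue_len twice. */
static void release_one(void)
{
	if (on_list) {			/* unlocked hint, may be stale */
		pthread_mutex_lock(&lock);
		if (on_list) {		/* recheck under the lock */
			on_list = false;
			queue_len--;
		}
		pthread_mutex_unlock(&lock);
	}
}

int main(void)
{
	release_one();
	release_one();	/* second call: unlocked hint already false */
	printf("queue_len=%d on_list=%d\n", queue_len, on_list);
	return 0;
}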


Regards
Yin, Fengwei

> 
>>  		ds_queue->split_queue_len--;
>>  		list_del(&folio->_deferred_list);
>> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	}
>> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	free_compound_page(page);
>>  }
>>  
>> -- 
>> 2.30.2
>>
>>
> 



* Re: [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
  2023-04-25  8:46 ` [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list Yin Fengwei
  2023-04-25 12:38   ` Kirill A. Shutemov
  2023-04-26  1:13   ` Huang, Ying
@ 2023-04-26  8:11   ` Ryan Roberts
  2 siblings, 0 replies; 15+ messages in thread
From: Ryan Roberts @ 2023-04-26  8:11 UTC
  To: Yin Fengwei, linux-mm, akpm, willy, yuzhao, ying.huang

On 25/04/2023 09:46, Yin Fengwei wrote:
> free_transhuge_page() acquires split queue lock then check
> whether the THP was added to deferred list or not.
> 
> It's safe to check whether the THP is in deferred list or not.
>    When code hit free_transhuge_page(), there is no one tries
>    to update the folio's _deferred_list.
> 
>    If folio is not in deferred_list, it's safe to check without
>    acquiring lock.
> 
>    If folio is in deferred_list, the other node in deferred_list
>    adding/deleteing doesn't impact the return value of
>    list_epmty(@folio->_deferred_list).
> 
> Running page_fault1 of will-it-scale + order 2 folio for anonymous
> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
> see the 61% split_queue_lock contention:
> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>     release_pages
>    - 70.93% release_pages
>       - 61.42% free_transhuge_page
>          + 60.77% _raw_spin_lock_irqsave
> 
> With this patch applied, the split_queue_lock contention is less
> than 1%.
> 
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  mm/huge_memory.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 032fb0ef9cd1..c620f1f12247 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> -	if (!list_empty(&folio->_deferred_list)) {
> +	/*
> +	 * At this point, there is no one trying to queue the folio
> +	 * to deferred_list. folio->_deferred_list is not possible
> +	 * being updated.
> +	 *
> +	 * If folio is already added to deferred_list, add/delete to/from
> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
> +	 * It's safe to check list_empty(&folio->_deferred_list) without
> +	 * acquiring the lock.
> +	 *
> +	 * If folio is not in deferred_list, it's safe to check without
> +	 * acquiring the lock.
> +	 */
> +	if (data_race(!list_empty(&folio->_deferred_list))) {
> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>  		ds_queue->split_queue_len--;
>  		list_del(&folio->_deferred_list);

I wonder if there is a race here? Could the folio have been in the deferred list
when checking, but then something removed it from the list before the lock is
taken? In this case, I guess split_queue_len would be out of sync with the
number of folios in the queue? Perhaps recheck list_empty() after taking the lock?

Thanks,
Ryan


> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	}
> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  	free_compound_page(page);
>  }
>  




* Re: [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
  2023-04-26  2:08     ` Yin Fengwei
@ 2023-04-26  8:17       ` Ryan Roberts
  0 siblings, 0 replies; 15+ messages in thread
From: Ryan Roberts @ 2023-04-26  8:17 UTC
  To: Yin Fengwei, Kirill A. Shutemov; +Cc: linux-mm, akpm, willy, yuzhao, ying.huang

On 26/04/2023 03:08, Yin Fengwei wrote:
> 
> 
> On 4/25/23 20:38, Kirill A. Shutemov wrote:
>> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>>> free_transhuge_page() acquires split queue lock then check
>>> whether the THP was added to deferred list or not.
>>>
>>> It's safe to check whether the THP is in deferred list or not.
>>>    When code hit free_transhuge_page(), there is no one tries
>>>    to update the folio's _deferred_list.
>>>
>>>    If folio is not in deferred_list, it's safe to check without
>>>    acquiring lock.
>>>
>>>    If folio is in deferred_list, the other node in deferred_list
>>>    adding/deleteing doesn't impact the return value of
>>>    list_epmty(@folio->_deferred_list).
>>
>> Typo.
>>
>>>
>>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>>> see the 61% split_queue_lock contention:
>>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>>     release_pages
>>>    - 70.93% release_pages
>>>       - 61.42% free_transhuge_page
>>>          + 60.77% _raw_spin_lock_irqsave
>>>
>>> With this patch applied, the split_queue_lock contention is less
>>> than 1%.
>>>
>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>>> ---
>>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 032fb0ef9cd1..c620f1f12247 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>>  	unsigned long flags;
>>>  
>>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>> -	if (!list_empty(&folio->_deferred_list)) {
>>> +	/*
>>> +	 * At this point, there is no one trying to queue the folio
>>> +	 * to deferred_list. folio->_deferred_list is not possible
>>> +	 * being updated.
>>> +	 *
>>> +	 * If folio is already added to deferred_list, add/delete to/from
>>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>>> +	 * acquiring the lock.
>>> +	 *
>>> +	 * If folio is not in deferred_list, it's safe to check without
>>> +	 * acquiring the lock.
>>> +	 */
>>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>
>> Recheck under lock?
> Huang Ying pointed out the race with deferred_split_scan(). And yes, we
> need to recheck under the lock. Will update in the next version.

Oops sorry - I see this was already pointed out. Disregard my previous mail.

Thanks,
Ryan


> 
> 
> Regards
> Yin, Fengwei
> 
>>
>>>  		ds_queue->split_queue_len--;
>>>  		list_del(&folio->_deferred_list);
>>> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>>  	}
>>> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>>  	free_compound_page(page);
>>>  }
>>>  
>>> -- 
>>> 2.30.2
>>>
>>>
>>




* Re: [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
  2023-04-25 12:38   ` Kirill A. Shutemov
  2023-04-26  1:47     ` Yin Fengwei
  2023-04-26  2:08     ` Yin Fengwei
@ 2023-04-28  6:28     ` Yin, Fengwei
  2023-04-28 14:02       ` Kirill A. Shutemov
  2 siblings, 1 reply; 15+ messages in thread
From: Yin, Fengwei @ 2023-04-28  6:28 UTC
  To: Kirill A. Shutemov
  Cc: linux-mm, akpm, willy, yuzhao, ryan.roberts, ying.huang

Hi Kirill,

On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote:
> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>> free_transhuge_page() acquires split queue lock then check
>> whether the THP was added to deferred list or not.
>>
>> It's safe to check whether the THP is in deferred list or not.
>>    When code hit free_transhuge_page(), there is no one tries
>>    to update the folio's _deferred_list.
>>
>>    If folio is not in deferred_list, it's safe to check without
>>    acquiring lock.
>>
>>    If folio is in deferred_list, the other node in deferred_list
>>    adding/deleteing doesn't impact the return value of
>>    list_epmty(@folio->_deferred_list).
> 
> Typo.
> 
>>
>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>> see the 61% split_queue_lock contention:
>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>     release_pages
>>    - 70.93% release_pages
>>       - 61.42% free_transhuge_page
>>          + 60.77% _raw_spin_lock_irqsave
>>
>> With this patch applied, the split_queue_lock contention is less
>> than 1%.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 032fb0ef9cd1..c620f1f12247 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>  	unsigned long flags;
>>  
>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>> -	if (!list_empty(&folio->_deferred_list)) {
>> +	/*
>> +	 * At this point, there is no one trying to queue the folio
>> +	 * to deferred_list. folio->_deferred_list is not possible
>> +	 * being updated.
>> +	 *
>> +	 * If folio is already added to deferred_list, add/delete to/from
>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>> +	 * acquiring the lock.
>> +	 *
>> +	 * If folio is not in deferred_list, it's safe to check without
>> +	 * acquiring the lock.
>> +	 */
>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> 
> Recheck under lock?
In function deferred_split_scan(), there is the following code block:
                if (folio_try_get(folio)) {
                        list_move(&folio->_deferred_list, &list);
                } else {
                        /* We lost race with folio_put() */
                        list_del_init(&folio->_deferred_list);
                        ds_queue->split_queue_len--;
                }

I am wondering what kind of "lost race with folio_put()" this can be.

My understanding is that it's not necessary to handle this case here
because free_transhuge_page() will handle it once the folio's refcount
reaches zero. But I must be missing something here. Thanks.


Regards
Yin, Fengwei

> 
>>  		ds_queue->split_queue_len--;
>>  		list_del(&folio->_deferred_list);
>> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	}
>> -	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>>  	free_compound_page(page);
>>  }
>>  
>> -- 
>> 2.30.2
>>
>>
> 



* Re: [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
  2023-04-28  6:28     ` Yin, Fengwei
@ 2023-04-28 14:02       ` Kirill A. Shutemov
  2023-04-29  8:32         ` Yin, Fengwei
  0 siblings, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2023-04-28 14:02 UTC
  To: Yin, Fengwei; +Cc: linux-mm, akpm, willy, yuzhao, ryan.roberts, ying.huang

On Fri, Apr 28, 2023 at 02:28:07PM +0800, Yin, Fengwei wrote:
> Hi Kirill,
> 
> On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote:
> > On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
> >> free_transhuge_page() acquires split queue lock then check
> >> whether the THP was added to deferred list or not.
> >>
> >> It's safe to check whether the THP is in deferred list or not.
> >>    When code hit free_transhuge_page(), there is no one tries
> >>    to update the folio's _deferred_list.
> >>
> >>    If folio is not in deferred_list, it's safe to check without
> >>    acquiring lock.
> >>
> >>    If folio is in deferred_list, the other node in deferred_list
> >>    adding/deleteing doesn't impact the return value of
> >>    list_epmty(@folio->_deferred_list).
> > 
> > Typo.
> > 
> >>
> >> Running page_fault1 of will-it-scale + order 2 folio for anonymous
> >> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
> >> see the 61% split_queue_lock contention:
> >> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
> >>     release_pages
> >>    - 70.93% release_pages
> >>       - 61.42% free_transhuge_page
> >>          + 60.77% _raw_spin_lock_irqsave
> >>
> >> With this patch applied, the split_queue_lock contention is less
> >> than 1%.
> >>
> >> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> >> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
> >> ---
> >>  mm/huge_memory.c | 19 ++++++++++++++++---
> >>  1 file changed, 16 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >> index 032fb0ef9cd1..c620f1f12247 100644
> >> --- a/mm/huge_memory.c
> >> +++ b/mm/huge_memory.c
> >> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
> >>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
> >>  	unsigned long flags;
> >>  
> >> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> >> -	if (!list_empty(&folio->_deferred_list)) {
> >> +	/*
> >> +	 * At this point, there is no one trying to queue the folio
> >> +	 * to deferred_list. folio->_deferred_list is not possible
> >> +	 * being updated.
> >> +	 *
> >> +	 * If folio is already added to deferred_list, add/delete to/from
> >> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
> >> +	 * It's safe to check list_empty(&folio->_deferred_list) without
> >> +	 * acquiring the lock.
> >> +	 *
> >> +	 * If folio is not in deferred_list, it's safe to check without
> >> +	 * acquiring the lock.
> >> +	 */
> >> +	if (data_race(!list_empty(&folio->_deferred_list))) {
> >> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> > 
> > Recheck under lock?
> In function deferred_split_scan(), there is the following code block:
>                 if (folio_try_get(folio)) {
>                         list_move(&folio->_deferred_list, &list);
>                 } else {
>                         /* We lost race with folio_put() */
>                         list_del_init(&folio->_deferred_list);
>                         ds_queue->split_queue_len--;
>                 }
> 
> I am wondering what kind of "lost race with folio_put()" this can be.
> 
> My understanding is that it's not necessary to handle this case here
> because free_transhuge_page() will handle it once the folio's refcount
> reaches zero. But I must be missing something here. Thanks.

free_transhuge_page() gets called when the refcount is already zero.
Both deferred_split_scan() and free_transhuge_page() can see the page
with zero refcount. The check makes deferred_split_scan() leave the
page to free_transhuge_page().
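
The "leave it to the freeing path" decision hinges on get-unless-zero
semantics, which can be modelled with a CAS loop (a userspace sketch
of the idea behind folio_try_get(), not the kernel code):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Take a reference only if the count is not already zero. Once it
 * hits zero the freeing path owns the object, so a scanner whose
 * try_get() fails must back off and merely unlink the entry. */
static bool try_get(atomic_int *refs)
{
	int old = atomic_load(refs);

	do {
		if (old == 0)
			return false;	/* lost race with the final put */
	} while (!atomic_compare_exchange_weak(refs, &old, old + 1));

	return true;
}

int main(void)
{
	atomic_int live = 1;	/* still referenced */
	atomic_int dead = 0;	/* final put already happened */

	printf("live: %d, dead: %d\n", try_get(&live), try_get(&dead));
	return 0;
}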

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



* Re: [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
  2023-04-28 14:02       ` Kirill A. Shutemov
@ 2023-04-29  8:32         ` Yin, Fengwei
  2023-04-29  8:46           ` Kirill A. Shutemov
  0 siblings, 1 reply; 15+ messages in thread
From: Yin, Fengwei @ 2023-04-29  8:32 UTC
  To: Kirill A. Shutemov
  Cc: linux-mm, akpm, willy, yuzhao, ryan.roberts, ying.huang

Hi Kirill,

On 4/28/2023 10:02 PM, Kirill A. Shutemov wrote:
> On Fri, Apr 28, 2023 at 02:28:07PM +0800, Yin, Fengwei wrote:
>> Hi Kirill,
>>
>> On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote:
>>> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>>>> free_transhuge_page() acquires split queue lock then check
>>>> whether the THP was added to deferred list or not.
>>>>
>>>> It's safe to check whether the THP is in deferred list or not.
>>>>    When code hit free_transhuge_page(), there is no one tries
>>>>    to update the folio's _deferred_list.
>>>>
>>>>    If folio is not in deferred_list, it's safe to check without
>>>>    acquiring lock.
>>>>
>>>>    If folio is in deferred_list, the other node in deferred_list
>>>>    adding/deleteing doesn't impact the return value of
>>>>    list_epmty(@folio->_deferred_list).
>>>
>>> Typo.
>>>
>>>>
>>>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>>>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>>>> see the 61% split_queue_lock contention:
>>>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>>>     release_pages
>>>>    - 70.93% release_pages
>>>>       - 61.42% free_transhuge_page
>>>>          + 60.77% _raw_spin_lock_irqsave
>>>>
>>>> With this patch applied, the split_queue_lock contention is less
>>>> than 1%.
>>>>
>>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>>>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> ---
>>>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 032fb0ef9cd1..c620f1f12247 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>>>  	unsigned long flags;
>>>>  
>>>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>> -	if (!list_empty(&folio->_deferred_list)) {
>>>> +	/*
>>>> +	 * At this point, there is no one trying to queue the folio
>>>> +	 * to deferred_list. folio->_deferred_list is not possible
>>>> +	 * being updated.
>>>> +	 *
>>>> +	 * If folio is already added to deferred_list, add/delete to/from
>>>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>>>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>>>> +	 * acquiring the lock.
>>>> +	 *
>>>> +	 * If folio is not in deferred_list, it's safe to check without
>>>> +	 * acquiring the lock.
>>>> +	 */
>>>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>>>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>
>>> Recheck under lock?
>> In function deferred_split_scan(), there is the following code block:
>>                 if (folio_try_get(folio)) {
>>                         list_move(&folio->_deferred_list, &list);
>>                 } else {
>>                         /* We lost race with folio_put() */
>>                         list_del_init(&folio->_deferred_list);
>>                         ds_queue->split_queue_len--;
>>                 }
>>
>> I am wondering what kind of "lost race with folio_put()" this can be.
>>
>> My understanding is that it's not necessary to handle this case here
>> because free_transhuge_page() will handle it once the folio's refcount
>> reaches zero. But I must be missing something here. Thanks.
> 
> free_transhuge_page() gets called when the refcount is already zero.
> Both deferred_split_scan() and free_transhuge_page() can see the page
> with zero refcount. The check makes deferred_split_scan() leave the
> page to free_transhuge_page().
> 
If deferred_split_scan() leaves the page to free_transhuge_page(), is it
necessary to do
        list_del_init(&folio->_deferred_list);
        ds_queue->split_queue_len--;

Can these two lines be left to free_transhuge_page() as well? Thanks.

Regards
Yin, Fengwei



* Re: [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
  2023-04-29  8:32         ` Yin, Fengwei
@ 2023-04-29  8:46           ` Kirill A. Shutemov
  2023-05-01  5:50             ` Yin, Fengwei
  0 siblings, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2023-04-29  8:46 UTC
  To: Yin, Fengwei; +Cc: linux-mm, akpm, willy, yuzhao, ryan.roberts, ying.huang

On Sat, Apr 29, 2023 at 04:32:34PM +0800, Yin, Fengwei wrote:
> Hi Kirill,
> 
> On 4/28/2023 10:02 PM, Kirill A. Shutemov wrote:
> > On Fri, Apr 28, 2023 at 02:28:07PM +0800, Yin, Fengwei wrote:
> >> Hi Kirill,
> >>
> >> On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote:
> >>> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
> >>>> free_transhuge_page() acquires split queue lock then check
> >>>> whether the THP was added to deferred list or not.
> >>>>
> >>>> It's safe to check whether the THP is in deferred list or not.
> >>>>    When code hit free_transhuge_page(), there is no one tries
> >>>>    to update the folio's _deferred_list.
> >>>>
> >>>>    If folio is not in deferred_list, it's safe to check without
> >>>>    acquiring lock.
> >>>>
> >>>>    If folio is in deferred_list, the other node in deferred_list
> >>>>    adding/deleteing doesn't impact the return value of
> >>>>    list_epmty(@folio->_deferred_list).
> >>>
> >>> Typo.
> >>>
> >>>>
> >>>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
> >>>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
> >>>> see the 61% split_queue_lock contention:
> >>>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
> >>>>     release_pages
> >>>>    - 70.93% release_pages
> >>>>       - 61.42% free_transhuge_page
> >>>>          + 60.77% _raw_spin_lock_irqsave
> >>>>
> >>>> With this patch applied, the split_queue_lock contention is less
> >>>> than 1%.
> >>>>
> >>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> >>>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
> >>>> ---
> >>>>  mm/huge_memory.c | 19 ++++++++++++++++---
> >>>>  1 file changed, 16 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> >>>> index 032fb0ef9cd1..c620f1f12247 100644
> >>>> --- a/mm/huge_memory.c
> >>>> +++ b/mm/huge_memory.c
> >>>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
> >>>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
> >>>>  	unsigned long flags;
> >>>>  
> >>>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> >>>> -	if (!list_empty(&folio->_deferred_list)) {
> >>>> +	/*
> >>>> +	 * At this point, there is no one trying to queue the folio
> >>>> +	 * to deferred_list. folio->_deferred_list is not possible
> >>>> +	 * being updated.
> >>>> +	 *
> >>>> +	 * If folio is already added to deferred_list, add/delete to/from
> >>>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
> >>>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
> >>>> +	 * acquiring the lock.
> >>>> +	 *
> >>>> +	 * If folio is not in deferred_list, it's safe to check without
> >>>> +	 * acquiring the lock.
> >>>> +	 */
> >>>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
> >>>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
> >>>
> >>> Recheck under lock?
> >> In function deferred_split_scan(), there is the following code block:
> >>                 if (folio_try_get(folio)) {
> >>                         list_move(&folio->_deferred_list, &list);
> >>                 } else {
> >>                         /* We lost race with folio_put() */
> >>                         list_del_init(&folio->_deferred_list);
> >>                         ds_queue->split_queue_len--;
> >>                 }
> >>
> >> I am wondering what kind of "lost race with folio_put()" this can be.
> >>
> >> My understanding is that it's not necessary to handle this case here
> >> because free_transhuge_page() will handle it once the folio's refcount
> >> reaches zero. But I must be missing something here. Thanks.
> > 
> > free_transhuge_page() got when refcount is already zero. Both
> > deferred_split_scan() and free_transhuge_page() can see the page with zero
> > refcount. The check makes deferred_split_scan() to leave the page to the
> > free_transhuge_page().
> > 
> If deferred_split_scan() leaves the page to free_transhuge_page(), is it
> necessary to do
>         list_del_init(&folio->_deferred_list);
>         ds_queue->split_queue_len--;
> 
> Can these two lines be left to free_transhuge_page() as well? Thanks.

I *think* (my cache is cold on deferred split) we can. But since we
already hold the lock, why not take care of it? It makes your change more
efficient.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov



* Re: [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list
  2023-04-29  8:46           ` Kirill A. Shutemov
@ 2023-05-01  5:50             ` Yin, Fengwei
  0 siblings, 0 replies; 15+ messages in thread
From: Yin, Fengwei @ 2023-05-01  5:50 UTC
  To: Kirill A. Shutemov
  Cc: linux-mm, akpm, willy, yuzhao, ryan.roberts, ying.huang

Hi Kirill,

On 4/29/2023 4:46 PM, Kirill A. Shutemov wrote:
> On Sat, Apr 29, 2023 at 04:32:34PM +0800, Yin, Fengwei wrote:
>> Hi Kirill,
>>
>> On 4/28/2023 10:02 PM, Kirill A. Shutemov wrote:
>>> On Fri, Apr 28, 2023 at 02:28:07PM +0800, Yin, Fengwei wrote:
>>>> Hi Kirill,
>>>>
>>>> On 4/25/2023 8:38 PM, Kirill A. Shutemov wrote:
>>>>> On Tue, Apr 25, 2023 at 04:46:26PM +0800, Yin Fengwei wrote:
>>>>>> free_transhuge_page() acquires split queue lock then check
>>>>>> whether the THP was added to deferred list or not.
>>>>>>
>>>>>> It's safe to check whether the THP is in deferred list or not.
>>>>>>    When code hit free_transhuge_page(), there is no one tries
>>>>>>    to update the folio's _deferred_list.
>>>>>>
>>>>>>    If folio is not in deferred_list, it's safe to check without
>>>>>>    acquiring lock.
>>>>>>
>>>>>>    If folio is in deferred_list, the other node in deferred_list
>>>>>>    adding/deleteing doesn't impact the return value of
>>>>>>    list_epmty(@folio->_deferred_list).
>>>>>
>>>>> Typo.
>>>>>
>>>>>>
>>>>>> Running page_fault1 of will-it-scale + order 2 folio for anonymous
>>>>>> mapping with 96 processes on an Ice Lake 48C/96T test box, we could
>>>>>> see the 61% split_queue_lock contention:
>>>>>> -   71.28%     0.35%  page_fault1_pro  [kernel.kallsyms]           [k]
>>>>>>     release_pages
>>>>>>    - 70.93% release_pages
>>>>>>       - 61.42% free_transhuge_page
>>>>>>          + 60.77% _raw_spin_lock_irqsave
>>>>>>
>>>>>> With this patch applied, the split_queue_lock contention is less
>>>>>> than 1%.
>>>>>>
>>>>>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>>>>>> Tested-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>>> ---
>>>>>>  mm/huge_memory.c | 19 ++++++++++++++++---
>>>>>>  1 file changed, 16 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>>> index 032fb0ef9cd1..c620f1f12247 100644
>>>>>> --- a/mm/huge_memory.c
>>>>>> +++ b/mm/huge_memory.c
>>>>>> @@ -2799,12 +2799,25 @@ void free_transhuge_page(struct page *page)
>>>>>>  	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>>>>>>  	unsigned long flags;
>>>>>>  
>>>>>> -	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>>>> -	if (!list_empty(&folio->_deferred_list)) {
>>>>>> +	/*
>>>>>> +	 * At this point, there is no one trying to queue the folio
>>>>>> +	 * to deferred_list. folio->_deferred_list is not possible
>>>>>> +	 * being updated.
>>>>>> +	 *
>>>>>> +	 * If folio is already added to deferred_list, add/delete to/from
>>>>>> +	 * deferred_list will not impact list_empty(&folio->_deferred_list).
>>>>>> +	 * It's safe to check list_empty(&folio->_deferred_list) without
>>>>>> +	 * acquiring the lock.
>>>>>> +	 *
>>>>>> +	 * If folio is not in deferred_list, it's safe to check without
>>>>>> +	 * acquiring the lock.
>>>>>> +	 */
>>>>>> +	if (data_race(!list_empty(&folio->_deferred_list))) {
>>>>>> +		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>>>>>
>>>>> Recheck under lock?
>>>> In function deferred_split_scan(), there is the following code block:
>>>>                 if (folio_try_get(folio)) {
>>>>                         list_move(&folio->_deferred_list, &list);
>>>>                 } else {
>>>>                         /* We lost race with folio_put() */
>>>>                         list_del_init(&folio->_deferred_list);
>>>>                         ds_queue->split_queue_len--;
>>>>                 }
>>>>
>>>> I am wondering what kind of "lost race with folio_put()" this can be.
>>>>
>>>> My understanding is that it's not necessary to handle this case here
>>>> because free_transhuge_page() will handle it once the folio's refcount
>>>> reaches zero. But I must be missing something here. Thanks.
>>>
>>> free_transhuge_page() gets called when the refcount is already zero.
>>> Both deferred_split_scan() and free_transhuge_page() can see the page
>>> with zero refcount. The check makes deferred_split_scan() leave the
>>> page to free_transhuge_page().
>>>
>> If deferred_split_scan() leaves the page to free_transhuge_page(), is it
>> necessary to do
>>         list_del_init(&folio->_deferred_list);
>>         ds_queue->split_queue_len--;
>>
>> Can these two lines be left to free_transhuge_page() as well? Thanks.
> 
> I *think* (my cache is cold on deferred split) we can. But since we
> already hold the lock, why not take care of it? It makes your change more
> efficient.
Thanks a lot for your confirmation. I just wanted to make sure I
understand the race here correctly (I hadn't noticed this part of the
code before Ying pointed it out).


Regards
Yin, Fengwei

> 



end of thread, newest: 2023-05-01  5:50 UTC

Thread overview: 15+ messages
2023-04-25  8:46 [PATCH v2 0/2] Reduce lock contention related with large folio Yin Fengwei
2023-04-25  8:46 ` [PATCH v2 1/2] THP: avoid lock when check whether THP is in deferred list Yin Fengwei
2023-04-25 12:38   ` Kirill A. Shutemov
2023-04-26  1:47     ` Yin Fengwei
2023-04-26  2:08     ` Yin Fengwei
2023-04-26  8:17       ` Ryan Roberts
2023-04-28  6:28     ` Yin, Fengwei
2023-04-28 14:02       ` Kirill A. Shutemov
2023-04-29  8:32         ` Yin, Fengwei
2023-04-29  8:46           ` Kirill A. Shutemov
2023-05-01  5:50             ` Yin, Fengwei
2023-04-26  1:13   ` Huang, Ying
2023-04-26  1:48     ` Yin Fengwei
2023-04-26  8:11   ` Ryan Roberts
2023-04-25  8:46 ` [PATCH v2 2/2] lru: allow large batched add large folio to lru list Yin Fengwei
