LKML Archive mirror
* [PATCH net-next v8 0/6] introduce page_pool_alloc() related API
@ 2023-09-12  8:31 Yunsheng Lin
  2023-09-12  8:31 ` [PATCH net-next v8 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA Yunsheng Lin
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Yunsheng Lin @ 2023-09-12  8:31 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Matthias Brugger,
	AngeloGioacchino Del Regno, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, linux-arm-kernel,
	linux-mediatek, bpf

In [1] & [2] & [3], there are use cases for veth and virtio_net
to use the frag support in page pool to reduce memory usage, and
they may request different frag sizes depending on the head/tail
room space for xdp_frame/shinfo and the mtu/packet size. When the
requested frag size is large enough that a single page can not
be split into more than one frag, using the frag support only
incurs a performance penalty because of the extra frag count
handling it requires.

So this patchset provides a page pool API for the driver to
allocate memory with the least memory utilization and performance
penalty when it doesn't know the size of the memory it needs
beforehand.

1. https://patchwork.kernel.org/project/netdevbpf/patch/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/
2. https://patchwork.kernel.org/project/netdevbpf/patch/20230526054621.18371-3-liangchen.linux@gmail.com/
3. https://github.com/alobakin/linux/tree/iavf-pp-frag
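
A rough sketch of the intended usage from the driver side
(illustrative only; 'pool', 'skb', 'i' and 'size' are placeholders,
the real conversion is in the veth patch of this series):

unsigned int offset, truesize = size; /* requested size in */
struct page *page;

page = page_pool_dev_alloc(pool, &offset, &truesize);
if (!page)
	goto drop;

/* page pool decided internally whether to split the page;
 * 'truesize' now holds the size actually handed out.
 */
skb_add_rx_frag(skb, i, page, offset, size, truesize);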

V8: Store the dma addr as a shifted u32 instead of using
    dma_addr_t explicitly for 32-bit arches with 64-bit DMA.
    Update the document according to the discussion in v7.

V7: Fix a compile error and a few typos, and use kernel-doc syntax.

V6: Add a PP_FLAG_PAGE_SPLIT_IN_DRIVER flag to fail the page_pool
    creation for 32-bit arches with 64-bit DMA when the driver
    tries to do the page splitting itself, adjust the requested
    size to include head/tail room in veth, and rebase on the
    latest net-next.

v5 RFC: Add a new page_pool_cache_alloc() API, and other minor
        changes as discussed in v4. As there seem to be three
        consumers that might make use of the new API, repost
        it as RFC and CC the relevant authors to see if the
        new API fits their needs.

V4: Fix a typo and add a patch to update the document about the
    frag API. PAGE_POOL_DMA_USE_PP_FRAG_COUNT is not renamed yet,
    as we may need a different thread to discuss that.

V3: Incorporate changes from the discussion with Alexander,
    mostly the inline wrapper, splitting the
    PAGE_POOL_DMA_USE_PP_FRAG_COUNT change into a separate patch
    and a comment change.
V2: Add a patch to remove the PP_FLAG_PAGE_FRAG flag and mention
    the virtio_net use case in the cover letter.
V1: Drop the RFC tag and the page_pool_frag patch.

Yunsheng Lin (6):
  page_pool: frag API support for 32-bit arch with 64-bit DMA
  page_pool: unify frag_count handling in page_pool_is_last_frag()
  page_pool: remove PP_FLAG_PAGE_FRAG
  page_pool: introduce page_pool[_cache]_alloc() API
  page_pool: update document about frag API
  net: veth: use newly added page pool API for veth with xdp

 Documentation/networking/page_pool.rst        |   4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   2 -
 .../net/ethernet/hisilicon/hns3/hns3_enet.c   |   3 +-
 .../marvell/octeontx2/nic/otx2_common.c       |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   2 +-
 drivers/net/veth.c                            |  25 +-
 drivers/net/wireless/mediatek/mt76/mac80211.c |   2 +-
 include/linux/mm_types.h                      |  13 +-
 include/net/page_pool/helpers.h               | 225 +++++++++++++++---
 include/net/page_pool/types.h                 |   6 +-
 net/core/page_pool.c                          |  31 ++-
 net/core/skbuff.c                             |   2 +-
 12 files changed, 240 insertions(+), 77 deletions(-)

-- 
2.33.0


* [PATCH net-next v8 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-09-12  8:31 [PATCH net-next v8 0/6] introduce page_pool_alloc() related API Yunsheng Lin
@ 2023-09-12  8:31 ` Yunsheng Lin
  2023-09-15  8:28   ` Jesper Dangaard Brouer
  2023-09-12  8:31 ` [PATCH net-next v8 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag() Yunsheng Lin
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Yunsheng Lin @ 2023-09-12  8:31 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin, Guillaume Tucker,
	Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet

Currently page_pool_alloc_frag() is not supported on 32-bit
arches with 64-bit DMA because of the overlap between
pp_frag_count and dma_addr_upper in 'struct page' for those
arches, which seem to be quite common, see [1]; this means
drivers may need to handle it when using the frag API.

It is assumed that the combination of the above arches with an
address space >16TB does not exist: as all those arches have a
64-bit equivalent, it seems logical to use the 64-bit version
for a system with a large address space. It is also assumed
that the dma address is page aligned when we are dma mapping a
page aligned buffer, see [2].

That means we're storing 12 bits of 0 at the lower end of a
dma address, so we can reuse those bits on the above arches to
support 32b+12b, which is 16TB of memory.

If either assumption turns out to be wrong, a warning is
emitted so that the user can report it to us.

1. https://lore.kernel.org/all/20211117075652.58299-1-linyunsheng@huawei.com/
2. https://lore.kernel.org/all/20230818145145.4b357c89@kernel.org/
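
A minimal illustration of the shifted storage described above
(standalone sketch, not part of the patch; the example address is
made up):

static void example_dma_addr_compression(void)
{
	/* 32-bit arch with 64-bit DMA: the page-aligned DMA address
	 * is stored right-shifted by PAGE_SHIFT in the 32-bit
	 * unsigned long page->dma_addr. With 4K pages this covers
	 * 32 + 12 = 44 bits, i.e. 16TB.
	 */
	dma_addr_t dma = 0x1234567000ULL;	/* page aligned, above 4GB */
	unsigned long stored = dma >> PAGE_SHIFT;	/* 0x1234567 */
	dma_addr_t back = (dma_addr_t)stored << PAGE_SHIFT;

	/* the patch warns and fails the mapping when this differs */
	WARN_ON_ONCE(back != dma);
}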

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
CC: Guillaume Tucker <guillaume.tucker@collabora.com>
---
 include/linux/mm_types.h        | 13 +------------
 include/net/page_pool/helpers.h | 20 ++++++++++++++------
 net/core/page_pool.c            | 14 +++++++++-----
 3 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 36c5b43999e6..74b49c4c7a52 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -125,18 +125,7 @@ struct page {
 			struct page_pool *pp;
 			unsigned long _pp_mapping_pad;
 			unsigned long dma_addr;
-			union {
-				/**
-				 * dma_addr_upper: might require a 64-bit
-				 * value on 32-bit architectures.
-				 */
-				unsigned long dma_addr_upper;
-				/**
-				 * For frag page support, not supported in
-				 * 32-bit architectures with 64-bit DMA.
-				 */
-				atomic_long_t pp_frag_count;
-			};
+			atomic_long_t pp_frag_count;
 		};
 		struct {	/* Tail pages of compound page */
 			unsigned long compound_head;	/* Bit zero is set */
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 94231533a369..8e1c85de4995 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -197,7 +197,7 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 	page_pool_put_full_page(pool, page, true);
 }
 
-#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
+#define PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA	\
 		(sizeof(dma_addr_t) > sizeof(unsigned long))
 
 /**
@@ -211,17 +211,25 @@ static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 {
 	dma_addr_t ret = page->dma_addr;
 
-	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
-		ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16;
+	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA)
+		ret <<= PAGE_SHIFT;
 
 	return ret;
 }
 
-static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
+static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
 {
+	if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) {
+		page->dma_addr = addr >> PAGE_SHIFT;
+
+		/* We assume page alignment to shave off bottom bits,
+		 * if this "compression" doesn't work we need to drop.
+		 */
+		return addr != (dma_addr_t)page->dma_addr << PAGE_SHIFT;
+	}
+
 	page->dma_addr = addr;
-	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
-		page->dma_addr_upper = upper_32_bits(addr);
+	return false;
 }
 
 static inline bool page_pool_put(struct page_pool *pool)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 77cb75e63aca..8a9868ea5067 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -211,10 +211,6 @@ static int page_pool_init(struct page_pool *pool,
 		 */
 	}
 
-	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
-	    pool->p.flags & PP_FLAG_PAGE_FRAG)
-		return -EINVAL;
-
 #ifdef CONFIG_PAGE_POOL_STATS
 	pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
 	if (!pool->recycle_stats)
@@ -359,12 +355,20 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
 	if (dma_mapping_error(pool->p.dev, dma))
 		return false;
 
-	page_pool_set_dma_addr(page, dma);
+	if (page_pool_set_dma_addr(page, dma))
+		goto unmap_failed;
 
 	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
 		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
 
 	return true;
+
+unmap_failed:
+	WARN_ON_ONCE("unexpected DMA address, please report to netdev@");
+	dma_unmap_page_attrs(pool->p.dev, dma,
+			     PAGE_SIZE << pool->p.order, pool->p.dma_dir,
+			     DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);
+	return false;
 }
 
 static void page_pool_set_pp_info(struct page_pool *pool,
-- 
2.33.0


* [PATCH net-next v8 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag()
  2023-09-12  8:31 [PATCH net-next v8 0/6] introduce page_pool_alloc() related API Yunsheng Lin
  2023-09-12  8:31 ` [PATCH net-next v8 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA Yunsheng Lin
@ 2023-09-12  8:31 ` Yunsheng Lin
  2023-09-14 15:17   ` Paolo Abeni
  2023-09-12  8:31 ` [PATCH net-next v8 3/6] page_pool: remove PP_FLAG_PAGE_FRAG Yunsheng Lin
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Yunsheng Lin @ 2023-09-12  8:31 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin,
	Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet

Currently when page_pool_create() is called with the
PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
allowed to be called under the below constraints:
1. page_pool_fragment_page() needs to be called to set up
   page->pp_frag_count immediately.
2. page_pool_defrag_page() often needs to be called to drain
   page->pp_frag_count when no more users will be holding
   on to that page.

Those constraints exist in order to support a page being
split into multiple frags.

And those constraints have some overhead because of the
cache line dirtying/bouncing and atomic updates.

Those constraints are unavoidable when we need a page to be
split into more than one frag, but there are also cases where
we want to avoid the above constraints and their overhead,
when a page can't be split as it can only hold one big frag
as requested by the user. Depending on the use case:
use case 1: allocate page without page splitting.
use case 2: allocate page with page splitting.
use case 3: allocate page with or without page splitting
            depending on the frag size.

Currently page pool only provides the page_pool_alloc_pages()
and page_pool_alloc_frag() APIs to enable 1 & 2 separately,
so we can not use a combination of 1 & 2 to enable 3; it is
not possible yet because of the per-page_pool flag
PP_FLAG_PAGE_FRAG.

So in order to allow allocating an unsplit page without the
overhead of a split page, while still allowing allocation of
a split page, we need to remove the per-page_pool flag in
page_pool_is_last_frag(). As best as I can think of, there
are two methods for that, as below:
1. Add a per-page flag/bit to indicate whether a page is
   split or not, which means we might need to update that
   flag/bit every time the page is recycled, dirtying the
   cache line of 'struct page' for use case 1.
2. Unify the page->pp_frag_count handling for both split and
   unsplit pages by assuming all pages in the page pool are
   split into one big frag initially.

As page pool already supports use case 1 without dirtying the
cache line of 'struct page' whenever a page is recycled, we
need to support the above use case 3 with minimal overhead,
especially without adding any noticeable overhead for use
case 1; and as we are already doing an optimization by not
updating pp_frag_count in page_pool_defrag_page() for the
last frag user, this patch chooses to unify the pp_frag_count
handling to support the above use case 3.
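
A walk-through of the unified counting after this patch
(illustrative comment only, not part of the diff):

/* unsplit page:
 *   alloc: page_pool_set_pp_info() sets pp_frag_count to 1
 *   put:   page_pool_defrag_page(page, 1) reads 1 == nr and
 *          returns 0 -> last user, recycle; no atomic update,
 *          no extra 'struct page' cache line dirtied
 *
 * page split into two frags:
 *   alloc:   page_pool_fragment_page(page, 2)
 *   1st put: read sees 2 != 1, sub_return(1) -> 1, keep page
 *   2nd put: read sees 1 == nr and returns 0 -> last user,
 *            recycle with pp_frag_count already back at 1
 */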

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool/helpers.h | 48 ++++++++++++++++++++++++---------
 net/core/page_pool.c            | 10 ++++++-
 2 files changed, 44 insertions(+), 14 deletions(-)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 8e1c85de4995..0ec81b91bed8 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -115,28 +115,50 @@ static inline long page_pool_defrag_page(struct page *page, long nr)
 	long ret;
 
 	/* If nr == pp_frag_count then we have cleared all remaining
-	 * references to the page. No need to actually overwrite it, instead
-	 * we can leave this to be overwritten by the calling function.
+	 * references to the page:
+	 * 1. 'nr == 1': no need to actually overwrite it.
+	 * 2. 'nr != 1': overwrite it with one, which is the rare case
+	 *               for frag draining.
 	 *
-	 * The main advantage to doing this is that an atomic_read is
-	 * generally a much cheaper operation than an atomic update,
-	 * especially when dealing with a page that may be partitioned
-	 * into only 2 or 3 pieces.
+	 * The main advantage to doing this is that not only do we avoid
+	 * an atomic update, as an atomic_read is generally a much cheaper
+	 * operation than an atomic update, especially when dealing with
+	 * a page that may be partitioned into only 2 or 3 pieces; but we
+	 * also unify the frag and non-frag handling by ensuring all
+	 * pages have been split into one big frag initially, and only
+	 * overwrite it when the page is split into more than one frag.
 	 */
-	if (atomic_long_read(&page->pp_frag_count) == nr)
+	if (atomic_long_read(&page->pp_frag_count) == nr) {
+		/* As we have ensured nr is always one for the constant
+		 * case using the BUILD_BUG_ON(), we only need to handle
+		 * the non-constant case here for frag count draining,
+		 * which is a rare case.
+		 */
+		BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1);
+		if (!__builtin_constant_p(nr))
+			atomic_long_set(&page->pp_frag_count, 1);
+
 		return 0;
+	}
 
 	ret = atomic_long_sub_return(nr, &page->pp_frag_count);
 	WARN_ON(ret < 0);
+
+	/* We are the last user here too, reset frag count back to 1 to
+	 * ensure all pages have been split into one big frag initially;
+	 * this should be the rare case when the last two frag users call
+	 * page_pool_defrag_page() concurrently.
+	 */
+	if (unlikely(!ret))
+		atomic_long_set(&page->pp_frag_count, 1);
+
 	return ret;
 }
 
-static inline bool page_pool_is_last_frag(struct page_pool *pool,
-					  struct page *page)
+static inline bool page_pool_is_last_frag(struct page *page)
 {
-	/* If fragments aren't enabled or count is 0 we were the last user */
-	return !(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
-	       (page_pool_defrag_page(page, 1) == 0);
+	/* If page_pool_defrag_page() returns 0, we were the last user */
+	return page_pool_defrag_page(page, 1) == 0;
 }
 
 /**
@@ -161,7 +183,7 @@ static inline void page_pool_put_page(struct page_pool *pool,
 	 * allow registering MEM_TYPE_PAGE_POOL, but shield linker.
 	 */
 #ifdef CONFIG_PAGE_POOL
-	if (!page_pool_is_last_frag(pool, page))
+	if (!page_pool_is_last_frag(page))
 		return;
 
 	page_pool_put_defragged_page(pool, page, dma_sync_size, allow_direct);
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 8a9868ea5067..403b6df2e144 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -376,6 +376,14 @@ static void page_pool_set_pp_info(struct page_pool *pool,
 {
 	page->pp = pool;
 	page->pp_magic |= PP_SIGNATURE;
+
+	/* Ensuring all pages have been split into one big frag initially:
+	 * page_pool_set_pp_info() is only called once for every page when it
+	 * is allocated from the page allocator and page_pool_fragment_page()
+	 * is dirtying the same cache line as the page->pp_magic above, so
+	 * the overhead is negligible.
+	 */
+	page_pool_fragment_page(page, 1);
 	if (pool->p.init_callback)
 		pool->p.init_callback(page, pool->p.init_arg);
 }
@@ -672,7 +680,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
 		struct page *page = virt_to_head_page(data[i]);
 
 		/* It is not the last user for the page frag case */
-		if (!page_pool_is_last_frag(pool, page))
+		if (!page_pool_is_last_frag(page))
 			continue;
 
 		page = __page_pool_put_page(pool, page, -1, false);
-- 
2.33.0


* [PATCH net-next v8 3/6] page_pool: remove PP_FLAG_PAGE_FRAG
  2023-09-12  8:31 [PATCH net-next v8 0/6] introduce page_pool_alloc() related API Yunsheng Lin
  2023-09-12  8:31 ` [PATCH net-next v8 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA Yunsheng Lin
  2023-09-12  8:31 ` [PATCH net-next v8 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag() Yunsheng Lin
@ 2023-09-12  8:31 ` Yunsheng Lin
  2023-09-12  8:31 ` [PATCH net-next v8 4/6] page_pool: introduce page_pool[_cache]_alloc() API Yunsheng Lin
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Yunsheng Lin @ 2023-09-12  8:31 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin, Michael Chan,
	Eric Dumazet, Yisen Zhuang, Salil Mehta, Sunil Goutham,
	Geetha sowjanya, Subbaraya Sundeep, hariprasad, Saeed Mahameed,
	Leon Romanovsky, Felix Fietkau, Ryder Lee, Shayne Chen, Sean Wang,
	Kalle Valo, Matthias Brugger, AngeloGioacchino Del Regno,
	Jesper Dangaard Brouer, Ilias Apalodimas, linux-rdma,
	linux-wireless, linux-arm-kernel, linux-mediatek

PP_FLAG_PAGE_FRAG is not really needed after the pp_frag_count
handling is unified and page_pool_alloc_frag() is supported on
32-bit arches with 64-bit DMA, so remove it.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c                | 2 --
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c          | 3 +--
 drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c        | 2 +-
 drivers/net/wireless/mediatek/mt76/mac80211.c            | 2 +-
 include/net/page_pool/types.h                            | 6 ++----
 net/core/page_pool.c                                     | 3 +--
 net/core/skbuff.c                                        | 2 +-
 8 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5cc0dbe12132..8c2e455f534d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3194,8 +3194,6 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
 	pp.dma_dir = bp->rx_dir;
 	pp.max_len = PAGE_SIZE;
 	pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
-	if (PAGE_SIZE > BNXT_RX_PAGE_SIZE)
-		pp.flags |= PP_FLAG_PAGE_FRAG;
 
 	rxr->page_pool = page_pool_create(&pp);
 	if (IS_ERR(rxr->page_pool)) {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index b4895c7b3efd..b9b66c1018d7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -4931,8 +4931,7 @@ static void hns3_put_ring_config(struct hns3_nic_priv *priv)
 static void hns3_alloc_page_pool(struct hns3_enet_ring *ring)
 {
 	struct page_pool_params pp_params = {
-		.flags = PP_FLAG_DMA_MAP | PP_FLAG_PAGE_FRAG |
-				PP_FLAG_DMA_SYNC_DEV,
+		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
 		.order = hns3_page_order(ring),
 		.pool_size = ring->desc_num * hns3_buf_size(ring) /
 				(PAGE_SIZE << hns3_page_order(ring)),
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index 8511906cb4e2..84573609e41b 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -1434,7 +1434,7 @@ int otx2_pool_init(struct otx2_nic *pfvf, u16 pool_id,
 		return 0;
 	}
 
-	pp_params.flags = PP_FLAG_PAGE_FRAG | PP_FLAG_DMA_MAP;
+	pp_params.flags = PP_FLAG_DMA_MAP;
 	pp_params.pool_size = min(OTX2_PAGE_POOL_SZ, numptrs);
 	pp_params.nid = NUMA_NO_NODE;
 	pp_params.dev = pfvf->dev;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index a2ae791538ed..f3cf13a8bb19 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -834,7 +834,7 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		struct page_pool_params pp_params = { 0 };
 
 		pp_params.order     = 0;
-		pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV | PP_FLAG_PAGE_FRAG;
+		pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
 		pp_params.pool_size = pool_size;
 		pp_params.nid       = node;
 		pp_params.dev       = rq->pdev;
diff --git a/drivers/net/wireless/mediatek/mt76/mac80211.c b/drivers/net/wireless/mediatek/mt76/mac80211.c
index d158320bc15d..fe7cc67b7ee2 100644
--- a/drivers/net/wireless/mediatek/mt76/mac80211.c
+++ b/drivers/net/wireless/mediatek/mt76/mac80211.c
@@ -566,7 +566,7 @@ int mt76_create_page_pool(struct mt76_dev *dev, struct mt76_queue *q)
 {
 	struct page_pool_params pp_params = {
 		.order = 0,
-		.flags = PP_FLAG_PAGE_FRAG,
+		.flags = 0,
 		.nid = NUMA_NO_NODE,
 		.dev = dev->dma_dev,
 	};
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 887e7946a597..6fc5134095ed 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -17,10 +17,8 @@
 					* Please note DMA-sync-for-CPU is still
 					* device driver responsibility
 					*/
-#define PP_FLAG_PAGE_FRAG	BIT(2) /* for page frag feature */
 #define PP_FLAG_ALL		(PP_FLAG_DMA_MAP |\
-				 PP_FLAG_DMA_SYNC_DEV |\
-				 PP_FLAG_PAGE_FRAG)
+				 PP_FLAG_DMA_SYNC_DEV)
 
 /*
  * Fast allocation side cache array/stack
@@ -45,7 +43,7 @@ struct pp_alloc_cache {
 
 /**
  * struct page_pool_params - page pool parameters
- * @flags:	PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV, PP_FLAG_PAGE_FRAG
+ * @flags:	PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV
  * @order:	2^order pages on allocation
  * @pool_size:	size of the ptr_ring
  * @nid:	NUMA node id to allocate from pages from
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 403b6df2e144..1927b9c36c23 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -756,8 +756,7 @@ struct page *page_pool_alloc_frag(struct page_pool *pool,
 	unsigned int max_size = PAGE_SIZE << pool->p.order;
 	struct page *page = pool->frag_page;
 
-	if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
-		    size > max_size))
+	if (WARN_ON(size > max_size))
 		return NULL;
 
 	size = ALIGN(size, dma_get_cache_alignment());
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4eaf7ed0d1f4..a5f98b292e03 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5748,7 +5748,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	/* In general, avoid mixing page_pool and non-page_pool allocated
 	 * pages within the same SKB. Additionally avoid dealing with clones
 	 * with page_pool pages, in case the SKB is using page_pool fragment
-	 * references (PP_FLAG_PAGE_FRAG). Since we only take full page
+	 * references (page_pool_alloc_frag()). Since we only take full page
 	 * references for cloned SKBs at the moment that would result in
 	 * inconsistent reference counts.
 	 * In theory we could take full references if @from is cloned and
-- 
2.33.0


* [PATCH net-next v8 4/6] page_pool: introduce page_pool[_cache]_alloc() API
  2023-09-12  8:31 [PATCH net-next v8 0/6] introduce page_pool_alloc() related API Yunsheng Lin
                   ` (2 preceding siblings ...)
  2023-09-12  8:31 ` [PATCH net-next v8 3/6] page_pool: remove PP_FLAG_PAGE_FRAG Yunsheng Lin
@ 2023-09-12  8:31 ` Yunsheng Lin
  2023-09-12  8:31 ` [PATCH net-next v8 5/6] page_pool: update document about frag API Yunsheng Lin
  2023-09-12  8:31 ` [PATCH net-next v8 6/6] net: veth: use newly added page pool API for veth with xdp Yunsheng Lin
  5 siblings, 0 replies; 14+ messages in thread
From: Yunsheng Lin @ 2023-09-12  8:31 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin,
	Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet

Currently page pool supports the below use cases:
use case 1: allocate a page without page splitting using the
            page_pool_alloc_pages() API, if the driver knows
            that the memory it needs is always bigger than
            half of the page allocated from page pool.
use case 2: allocate a page frag with page splitting using the
            page_pool_alloc_frag() API, if the driver knows
            that the memory it needs is always smaller than
            or equal to half of the page allocated from
            page pool.

There are emerging use cases [1] & [2] that are a mix of the
above two cases: the driver doesn't know the size of the
memory it needs beforehand, so the driver may use something
like below to allocate memory with the least memory
utilization and performance penalty:

if (size << 1 > max_size)
	page = page_pool_alloc_pages();
else
	page = page_pool_alloc_frag();

To avoid the driver doing something like the above, add the
page_pool[_cache]_alloc() API to support the above use case,
and update the true size of the memory that is actually
allocated by writing it back to '*size' for the driver, in
order to avoid exacerbating the truesize underestimate
problem.

Rename page_pool_free(), which is used in the destroy process,
to __page_pool_destroy() to avoid confusion with the newly
added API.

1. https://lore.kernel.org/all/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/
2. https://lore.kernel.org/all/20230526054621.18371-3-liangchen.linux@gmail.com/
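
For reference, the expected calling pattern for the cache variant
looks roughly like the below (simplified from the veth patch later
in this series; 'pool', 'size' and 'headroom' are placeholders):

unsigned int truesize = SKB_HEAD_ALIGN(size) + headroom;
void *data;

data = page_pool_dev_cache_alloc(pool, &truesize);
if (!data)
	goto drop;

/* 'truesize' now holds the size actually reserved for us */
skb = napi_build_skb(data, truesize);
if (!skb) {
	page_pool_cache_free(pool, data, true);
	goto drop;
}
skb_mark_for_recycle(skb);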

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool/helpers.h | 65 +++++++++++++++++++++++++++++++++
 net/core/page_pool.c            |  4 +-
 2 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 0ec81b91bed8..c0e6c7d1b219 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -82,6 +82,65 @@ static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
 	return page_pool_alloc_frag(pool, offset, size, gfp);
 }
 
+static inline struct page *page_pool_alloc(struct page_pool *pool,
+					   unsigned int *offset,
+					   unsigned int *size, gfp_t gfp)
+{
+	unsigned int max_size = PAGE_SIZE << pool->p.order;
+	struct page *page;
+
+	if ((*size << 1) > max_size) {
+		*size = max_size;
+		*offset = 0;
+		return page_pool_alloc_pages(pool, gfp);
+	}
+
+	page = page_pool_alloc_frag(pool, offset, *size, gfp);
+	if (unlikely(!page))
+		return NULL;
+
+	/* There is very likely not enough space for another frag, so append the
+	 * remaining size to the current frag to avoid the truesize underestimate
+	 * problem.
+	 */
+	if (pool->frag_offset + *size > max_size) {
+		*size = max_size - *offset;
+		pool->frag_offset = max_size;
+	}
+
+	return page;
+}
+
+static inline struct page *page_pool_dev_alloc(struct page_pool *pool,
+					       unsigned int *offset,
+					       unsigned int *size)
+{
+	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
+
+	return page_pool_alloc(pool, offset, size, gfp);
+}
+
+static inline void *page_pool_cache_alloc(struct page_pool *pool,
+					  unsigned int *size, gfp_t gfp)
+{
+	unsigned int offset;
+	struct page *page;
+
+	page = page_pool_alloc(pool, &offset, size, gfp);
+	if (unlikely(!page))
+		return NULL;
+
+	return page_address(page) + offset;
+}
+
+static inline void *page_pool_dev_cache_alloc(struct page_pool *pool,
+					      unsigned int *size)
+{
+	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
+
+	return page_pool_cache_alloc(pool, size, gfp);
+}
+
 /**
  * page_pool_get_dma_dir() - Retrieve the stored DMA direction.
  * @pool:	pool from which page was allocated
@@ -222,6 +281,12 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 #define PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA	\
 		(sizeof(dma_addr_t) > sizeof(unsigned long))
 
+static inline void page_pool_cache_free(struct page_pool *pool, void *data,
+					bool allow_direct)
+{
+	page_pool_put_page(pool, virt_to_head_page(data), -1, allow_direct);
+}
+
 /**
  * page_pool_get_dma_addr() - Retrieve the stored DMA address.
  * @page:	page allocated from a page pool
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 1927b9c36c23..74106f6d8f73 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -809,7 +809,7 @@ static void page_pool_empty_ring(struct page_pool *pool)
 	}
 }
 
-static void page_pool_free(struct page_pool *pool)
+static void __page_pool_destroy(struct page_pool *pool)
 {
 	if (pool->disconnect)
 		pool->disconnect(pool);
@@ -860,7 +860,7 @@ static int page_pool_release(struct page_pool *pool)
 	page_pool_scrub(pool);
 	inflight = page_pool_inflight(pool);
 	if (!inflight)
-		page_pool_free(pool);
+		__page_pool_destroy(pool);
 
 	return inflight;
 }
-- 
2.33.0


* [PATCH net-next v8 5/6] page_pool: update document about frag API
  2023-09-12  8:31 [PATCH net-next v8 0/6] introduce page_pool_alloc() related API Yunsheng Lin
                   ` (3 preceding siblings ...)
  2023-09-12  8:31 ` [PATCH net-next v8 4/6] page_pool: introduce page_pool[_cache]_alloc() API Yunsheng Lin
@ 2023-09-12  8:31 ` Yunsheng Lin
  2023-09-12  8:31 ` [PATCH net-next v8 6/6] net: veth: use newly added page pool API for veth with xdp Yunsheng Lin
  5 siblings, 0 replies; 14+ messages in thread
From: Yunsheng Lin @ 2023-09-12  8:31 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin,
	Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet,
	Jonathan Corbet, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, linux-doc, bpf

As more drivers begin to use the frag API, update the
document for driver authors about how to decide which API
to use.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 Documentation/networking/page_pool.rst |  4 +-
 include/net/page_pool/helpers.h        | 88 ++++++++++++++++++++++----
 2 files changed, 79 insertions(+), 13 deletions(-)

diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
index 215ebc92752c..0c0705994f51 100644
--- a/Documentation/networking/page_pool.rst
+++ b/Documentation/networking/page_pool.rst
@@ -58,7 +58,9 @@ a page will cause no race conditions is enough.
 
 .. kernel-doc:: include/net/page_pool/helpers.h
    :identifiers: page_pool_put_page page_pool_put_full_page
-		 page_pool_recycle_direct page_pool_dev_alloc_pages
+		 page_pool_recycle_direct page_pool_cache_free
+		 page_pool_dev_alloc_pages page_pool_dev_alloc_frag
+		 page_pool_dev_alloc page_pool_dev_cache_alloc
 		 page_pool_get_dma_addr page_pool_get_dma_dir
 
 .. kernel-doc:: net/core/page_pool.c
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index c0e6c7d1b219..b20afe22c17d 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -8,23 +8,47 @@
 /**
  * DOC: page_pool allocator
  *
- * The page_pool allocator is optimized for the XDP mode that
- * uses one frame per-page, but it can fallback on the
- * regular page allocator APIs.
+ * The page_pool allocator is optimized for recycling pages or page frags used
+ * by skb packets and xdp frames.
  *
- * Basic use involves replacing alloc_pages() calls with the
- * page_pool_alloc_pages() call.  Drivers should use
- * page_pool_dev_alloc_pages() replacing dev_alloc_pages().
+ * Basic use involves replacing napi_alloc_frag() and alloc_pages() calls with
+ * page_pool_cache_alloc() and page_pool_alloc(), which allocate memory with or
+ * without page splitting depending on the requested memory size.
  *
- * API keeps track of in-flight pages, in order to let API user know
- * when it is safe to free a page_pool object.  Thus, API users
- * must call page_pool_put_page() to free the page, or attach
- * the page to a page_pool-aware objects like skbs marked with
+ * If the driver knows that it always requires full pages or its allocations are
+ * always smaller than half a page, it can use one of the more specific API
+ * calls:
+ *
+ * 1. page_pool_alloc_pages(): allocate memory without page splitting when the
+ * driver knows that the memory it needs is always bigger than half of the page
+ * allocated from the page pool. There is no cache line dirtying for
+ * 'struct page' when a page is recycled back to the page pool.
+ *
+ * 2. page_pool_alloc_frag(): allocate memory with page splitting when the
+ * driver knows that the memory it needs is always smaller than or equal to
+ * half of the page allocated from the page pool. Page splitting enables memory
+ * saving and thus avoids TLB/cache misses for data access, but there is also
+ * some cost to implement page splitting, mainly some cache line
+ * dirtying/bouncing for 'struct page' and atomic operations on
+ * page->pp_frag_count.
+ *
+ * The API keeps track of in-flight pages in order to let the API user know
+ * when it is safe to free a page_pool object; the API user must call
+ * page_pool_put_page() or page_pool_cache_free() to free the pp page or the pp
+ * buffer, or attach the pp page or the pp buffer to a page_pool-aware object
+ * like an skb marked with
  * skb_mark_for_recycle().
  *
- * API user must call page_pool_put_page() once on a page, as it
- * will either recycle the page, or in case of refcnt > 1, it will
+ * page_pool_put_page() may be called multiple times on the same page if a page
+ * is split into multiple frags. For the last frag, see page_pool_is_last_frag();
+ * it will either recycle the page, or in case of page->_refcount > 1, it will
  * release the DMA mapping and in-flight state accounting.
+ *
+ * dma_sync_single_range_for_device() is only called for the last pp page user
+ * when the page_pool is created with the PP_FLAG_DMA_SYNC_DEV flag, so it
+ * depends on the last freed frag to do the sync_for_device operation for all
+ * frags in the same page when a page is split; the API user must set up
+ * pool->p.max_len and pool->p.offset correctly and ensure that
+ * page_pool_put_page() is called with dma_sync_size being -1 for the
+ * page_pool_alloc(), page_pool_cache_alloc() and page_pool_alloc_frag() APIs.
  */
 #ifndef _NET_PAGE_POOL_HELPERS_H
 #define _NET_PAGE_POOL_HELPERS_H
@@ -73,6 +97,17 @@ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
 	return page_pool_alloc_pages(pool, gfp);
 }
 
+/**
+ * page_pool_dev_alloc_frag() - allocate a page frag.
+ * @pool: pool from which to allocate
+ * @offset: offset to the allocated page
+ * @size: requested size
+ *
+ * Get a page frag from the page allocator or page_pool caches.
+ *
+ * Return:
+ * Returns the allocated page frag, otherwise NULL.
+ */
 static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
 						    unsigned int *offset,
 						    unsigned int size)
@@ -111,6 +146,17 @@ static inline struct page *page_pool_alloc(struct page_pool *pool,
 	return page;
 }
 
+/**
+ * page_pool_dev_alloc() - allocate a page or a page frag.
+ * @pool: pool from which to allocate
+ * @offset: offset to the allocated page
+ * @size: in as the requested size, out as the allocated size
+ *
+ * Get a page or a page frag from the page allocator or page_pool caches.
+ *
+ * Return:
+ * Returns a page or a page frag, otherwise NULL.
+ */
 static inline struct page *page_pool_dev_alloc(struct page_pool *pool,
 					       unsigned int *offset,
 					       unsigned int *size)
@@ -133,6 +179,16 @@ static inline void *page_pool_cache_alloc(struct page_pool *pool,
 	return page_address(page) + offset;
 }
 
+/**
+ * page_pool_dev_cache_alloc() - allocate a cache.
+ * @pool: pool from which to allocate
+ * @size: in as the requested size, out as the allocated size
+ *
+ * Get a cache from the page allocator or page_pool caches.
+ *
+ * Return:
+ * Returns the address of the allocated cache, otherwise NULL.
+ */
 static inline void *page_pool_dev_cache_alloc(struct page_pool *pool,
 					      unsigned int *size)
 {
@@ -281,6 +337,14 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 #define PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA	\
 		(sizeof(dma_addr_t) > sizeof(unsigned long))
 
+/**
+ * page_pool_cache_free() - free a cache into the page_pool
+ * @pool: pool from which cache was allocated
+ * @data: addr of the cache to be freed
+ * @allow_direct: freed by the consumer, allow lockless caching
+ *
+ * Free a cache allocated from page_pool_dev_cache_alloc().
+ */
 static inline void page_pool_cache_free(struct page_pool *pool, void *data,
 					bool allow_direct)
 {
-- 
2.33.0


* [PATCH net-next v8 6/6] net: veth: use newly added page pool API for veth with xdp
  2023-09-12  8:31 [PATCH net-next v8 0/6] introduce page_pool_alloc() related API Yunsheng Lin
                   ` (4 preceding siblings ...)
  2023-09-12  8:31 ` [PATCH net-next v8 5/6] page_pool: update document about frag API Yunsheng Lin
@ 2023-09-12  8:31 ` Yunsheng Lin
  5 siblings, 0 replies; 14+ messages in thread
From: Yunsheng Lin @ 2023-09-12  8:31 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin, Eric Dumazet,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, bpf

Use the page_pool[_cache]_alloc() API to allocate memory with
the least memory utilization and performance penalty.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 drivers/net/veth.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 9c6f4f83f22b..a31a792aa00d 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -737,10 +737,11 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 	if (skb_shared(skb) || skb_head_is_locked(skb) ||
 	    skb_shinfo(skb)->nr_frags ||
 	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
-		u32 size, len, max_head_size, off;
+		u32 size, len, max_head_size, off, truesize, page_offset;
 		struct sk_buff *nskb;
 		struct page *page;
 		int i, head_off;
+		void *data;
 
 		/* We need a private copy of the skb and data buffers since
 		 * the ebpf program can modify it. We segment the original skb
@@ -753,14 +754,17 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 		if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
 			goto drop;
 
+		size = min_t(u32, skb->len, max_head_size);
+		truesize = SKB_HEAD_ALIGN(size) + VETH_XDP_HEADROOM;
+
 		/* Allocate skb head */
-		page = page_pool_dev_alloc_pages(rq->page_pool);
-		if (!page)
+		data = page_pool_dev_cache_alloc(rq->page_pool, &truesize);
+		if (!data)
 			goto drop;
 
-		nskb = napi_build_skb(page_address(page), PAGE_SIZE);
+		nskb = napi_build_skb(data, truesize);
 		if (!nskb) {
-			page_pool_put_full_page(rq->page_pool, page, true);
+			page_pool_cache_free(rq->page_pool, data, true);
 			goto drop;
 		}
 
@@ -768,7 +772,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 		skb_copy_header(nskb, skb);
 		skb_mark_for_recycle(nskb);
 
-		size = min_t(u32, skb->len, max_head_size);
 		if (skb_copy_bits(skb, 0, nskb->data, size)) {
 			consume_skb(nskb);
 			goto drop;
@@ -783,14 +786,18 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 		len = skb->len - off;
 
 		for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
-			page = page_pool_dev_alloc_pages(rq->page_pool);
+			size = min_t(u32, len, PAGE_SIZE);
+			truesize = size;
+
+			page = page_pool_dev_alloc(rq->page_pool, &page_offset,
+						   &truesize);
 			if (!page) {
 				consume_skb(nskb);
 				goto drop;
 			}
 
-			size = min_t(u32, len, PAGE_SIZE);
-			skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
+			skb_add_rx_frag(nskb, i, page, page_offset, size,
+					truesize);
 			if (skb_copy_bits(skb, off, page_address(page),
 					  size)) {
 				consume_skb(nskb);
-- 
2.33.0


* Re: [PATCH net-next v8 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag()
  2023-09-12  8:31 ` [PATCH net-next v8 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag() Yunsheng Lin
@ 2023-09-14 15:17   ` Paolo Abeni
  2023-09-18 11:15     ` Yunsheng Lin
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Abeni @ 2023-09-14 15:17 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba
  Cc: netdev, linux-kernel, Lorenzo Bianconi, Alexander Duyck,
	Liang Chen, Alexander Lobakin, Jesper Dangaard Brouer,
	Ilias Apalodimas, Eric Dumazet

On Tue, 2023-09-12 at 16:31 +0800, Yunsheng Lin wrote:
> [...]
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 8a9868ea5067..403b6df2e144 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -376,6 +376,14 @@ static void page_pool_set_pp_info(struct page_pool *pool,
>  {
>  	page->pp = pool;
>  	page->pp_magic |= PP_SIGNATURE;
> +
> +	/* Ensuring all pages have been split into one big frag initially:
> +	 * page_pool_set_pp_info() is only called once for every page when it
> +	 * is allocated from the page allocator and page_pool_fragment_page()
> +	 * is dirtying the same cache line as the page->pp_magic above, so
> +	 * the overhead is negligible.
> +	 */
> +	page_pool_fragment_page(page, 1);
>  	if (pool->p.init_callback)
>  		pool->p.init_callback(page, pool->p.init_arg);
>  }

I think it would be nice to back the above claim with some
benchmarks (possibly even just a micro-benchmark around the
relevant APIs) and to include such info in the changelog message.

Cheers,

Paolo


* Re: [PATCH net-next v8 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-09-12  8:31 ` [PATCH net-next v8 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA Yunsheng Lin
@ 2023-09-15  8:28   ` Jesper Dangaard Brouer
  2023-09-20  8:59     ` Yunsheng Lin
  0 siblings, 1 reply; 14+ messages in thread
From: Jesper Dangaard Brouer @ 2023-09-15  8:28 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba, pabeni
  Cc: brouer, netdev, linux-kernel, Lorenzo Bianconi, Alexander Duyck,
	Liang Chen, Alexander Lobakin, Guillaume Tucker,
	Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet, Linux-MM,
	Matthew Wilcox, Mel Gorman

Hi Lin,

This looks reasonable, but given you are changing 'struct page'
(include/linux/mm_types.h) we need to Cc the MM list
<linux-mm@kvack.org>. Also Cc Wilcox.

I think it was Ilias and Duyck who validated the assumptions the
last time this patch was discussed. Thus I want to see their
review before this is applied.

-Jesper

On 12/09/2023 10.31, Yunsheng Lin wrote:
> [...]


* Re: [PATCH net-next v8 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag()
  2023-09-14 15:17   ` Paolo Abeni
@ 2023-09-18 11:15     ` Yunsheng Lin
  2023-09-19 12:47       ` Yunsheng Lin
  0 siblings, 1 reply; 14+ messages in thread
From: Yunsheng Lin @ 2023-09-18 11:15 UTC (permalink / raw)
  To: Paolo Abeni, davem, kuba
  Cc: netdev, linux-kernel, Lorenzo Bianconi, Alexander Duyck,
	Liang Chen, Alexander Lobakin, Jesper Dangaard Brouer,
	Ilias Apalodimas, Eric Dumazet

On 2023/9/14 23:17, Paolo Abeni wrote:
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -376,6 +376,14 @@ static void page_pool_set_pp_info(struct page_pool *pool,
>>  {
>>  	page->pp = pool;
>>  	page->pp_magic |= PP_SIGNATURE;
>> +
>> +	/* Ensuring all pages have been split into one big frag initially:
>> +	 * page_pool_set_pp_info() is only called once for every page when it
>> +	 * is allocated from the page allocator and page_pool_fragment_page()
>> +	 * is dirtying the same cache line as the page->pp_magic above, so
>> +	 * the overhead is negligible.
>> +	 */
>> +	page_pool_fragment_page(page, 1);
>>  	if (pool->p.init_callback)
>>  		pool->p.init_callback(page, pool->p.init_arg);
>>  }
> 
> I think it would be nice to back the above claim with some benchmarks
> (possibly even just a micro-benchmark around the relevant APIs)
> and to include such info in the changelog message.

Sure, I will adjust Jesper's micro-benchmark below to test it:
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c

Please let me know if you have a better idea for the
micro-benchmark in mind, thanks.
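
For reference, page_pool_fragment_page() amounts to a single
atomic_long_set() on page->pp_frag_count, which is why the extra store
is expected to land on the cache line already dirtied by the pp_magic
write above. A minimal sketch of the helper, as I understand it from
include/net/page_pool/helpers.h:

static inline void page_pool_fragment_page(struct page *page, long nr)
{
	atomic_long_set(&page->pp_frag_count, nr);
}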

> 
> Cheers,
> 
> Paolo
> 
> .
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v8 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag()
  2023-09-18 11:15     ` Yunsheng Lin
@ 2023-09-19 12:47       ` Yunsheng Lin
  0 siblings, 0 replies; 14+ messages in thread
From: Yunsheng Lin @ 2023-09-19 12:47 UTC (permalink / raw)
  To: Paolo Abeni, davem, kuba
  Cc: netdev, linux-kernel, Lorenzo Bianconi, Alexander Duyck,
	Liang Chen, Alexander Lobakin, Jesper Dangaard Brouer,
	Ilias Apalodimas, Eric Dumazet

On 2023/9/18 19:15, Yunsheng Lin wrote:
> On 2023/9/14 23:17, Paolo Abeni wrote:
>>> --- a/net/core/page_pool.c
>>> +++ b/net/core/page_pool.c
>>> @@ -376,6 +376,14 @@ static void page_pool_set_pp_info(struct page_pool *pool,
>>>  {
>>>  	page->pp = pool;
>>>  	page->pp_magic |= PP_SIGNATURE;
>>> +
>>> +	/* Ensuring all pages have been split into one big frag initially:
>>> +	 * page_pool_set_pp_info() is only called once for every page when it
>>> +	 * is allocated from the page allocator and page_pool_fragment_page()
>>> +	 * is dirtying the same cache line as the page->pp_magic above, so
>>> +	 * the overhead is negligible.
>>> +	 */
>>> +	page_pool_fragment_page(page, 1);
>>>  	if (pool->p.init_callback)
>>>  		pool->p.init_callback(page, pool->p.init_arg);
>>>  }
>>
>> I think it would be nice to back the above claim with some benchmarks
>> (possibly even just a micro-benchmark around the relevant APIs)
>> and to include such info in the changelog message.
> 
> Sure, I will adjust Jesper's micro-benchmark below to test it:
> https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
> 
> Please let me know if you have a better idea for the
> micro-benchmark in mind, thanks.

As I am testing on arm64 and Jesper's micro-benchmark only works on
x86, I adjusted some functions in time_bench.h, mainly using
get_cycles() instead of the hard-coded asm in tsc_start_clock()/
tsc_stop_clock(), and returning 0 in p_rdpmc()/pmc_clk().
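
Roughly, the adjustment looks like the below (only a sketch; the
function names follow time_bench.h, while the signatures and bodies
are my arm64 replacements):

#include <linux/timex.h>	/* get_cycles() */

static __always_inline uint64_t tsc_start_clock(void)
{
	/* the arch-generic cycle counter (CNTVCT_EL0 on arm64)
	 * instead of the hard-coded x86 rdtsc asm
	 */
	return get_cycles();
}

static __always_inline uint64_t tsc_stop_clock(void)
{
	return get_cycles();
}

static __always_inline uint64_t p_rdpmc(void)
{
	/* no PMC equivalent is wired up here, so the PMC based
	 * measurements simply report 0
	 */
	return 0;
}

static __always_inline uint64_t pmc_clk(void)
{
	return 0;
}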

using the below cmd:
taskset -c 0 insmod ./bench_page_pool_simple.ko

For tasklet_page_pool01_fast_path before this patchset:
3 cycles(tsc) 31.171 ns
3 cycles(tsc) 31.171 ns
3 cycles(tsc) 31.172 ns

For tasklet_page_pool01_fast_path after this patchset:
2 cycles(tsc) 27.496 ns
2 cycles(tsc) 27.484 ns
2 cycles(tsc) 27.514 ns

It seems the above difference is within the standard deviation;
see more raw performance data in [1] & [2].

I also tested how much time it takes to use the frag API to allocate
a whole page for tasklet_page_pool01_fast_path:
7 cycles(tsc) 71.179 ns
7 cycles(tsc) 75.987 ns
7 cycles(tsc) 75.795 ns

It seems to make sense to unify the frag_count handling so that the
driver can use both the frag API and the non-frag API with the least
memory utilization and performance penalty when it doesn't know the
size of the memory it needs beforehand.
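
For completeness, the mixed usage this enables looks roughly like the
below in a driver (just a sketch; 'pool' and 'size' are assumed to
come from the driver):

	struct page *page;
	unsigned int offset;

	/* frag path: sub-page allocation when a small size is known */
	page = page_pool_alloc_frag(pool, &offset, size, GFP_ATOMIC);

	/* non-frag path: a whole page from the same pool */
	page = page_pool_alloc_pages(pool, GFP_ATOMIC);

	/* both are released through the same put path */
	page_pool_put_full_page(pool, page, false);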

Please let me know if there is any other testing needed.

1. raw performance data before this patchset

root@(none)$ taskset -c 0 insmod ./bench_page_pool_simple.ko
[   70.650364] bench_page_pool_simple: Loaded
[   70.692958] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.384 ns (step:0) - (measurement period time:0.038495410 sec time_interval:38495410) - (invoke count:100000000 tsc_interval:3849532)
[   71.287361] time_bench: Type:atomic_inc Per elem: 0 cycles(tsc) 5.771 ns (step:0) - (measurement period time:0.577120890 sec time_interval:577120890) - (invoke count:100000000 tsc_interval:57712083)
[   71.451202] time_bench: Type:lock Per elem: 1 cycles(tsc) 14.621 ns (step:0) - (measurement period time:0.146210990 sec time_interval:146210990) - (invoke count:10000000 tsc_interval:14621094)
[   71.468329] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[   72.050743] time_bench: Type:no-softirq-page_pool01 Per elem: 5 cycles(tsc) 57.310 ns (step:0) - (measurement period time:0.573106430 sec time_interval:573106430) - (invoke count:10000000 tsc_interval:57310638)
[   72.069422] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[   72.648953] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.031 ns (step:0) - (measurement period time:0.570316730 sec time_interval:570316730) - (invoke count:10000000 tsc_interval:57031667)
[   72.667630] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[   74.362804] time_bench: Type:no-softirq-page_pool03 Per elem: 16 cycles(tsc) 168.631 ns (step:0) - (measurement period time:1.686315810 sec time_interval:1686315810) - (invoke count:10000000 tsc_interval:168631576)
[   74.381828] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[   74.389739] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[   74.710586] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.172 ns (step:0) - (measurement period time:0.311721410 sec time_interval:311721410) - (invoke count:10000000 tsc_interval:31172132)
[   74.729869] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[   75.257671] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 51.876 ns (step:0) - (measurement period time:0.518763020 sec time_interval:518763020) - (invoke count:10000000 tsc_interval:51876297)
[   75.276867] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[   77.005755] time_bench: Type:tasklet_page_pool03_slow Per elem: 17 cycles(tsc) 172.020 ns (step:0) - (measurement period time:1.720203240 sec time_interval:1720203240) - (invoke count:10000000 tsc_interval:172020318)

root@(none)$ taskset -c 0 insmod ./bench_page_pool_simple.ko
[  136.690195] bench_page_pool_simple: Loaded
[  136.732787] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.384 ns (step:0) - (measurement period time:0.038494310 sec time_interval:38494310) - (invoke count:100000000 tsc_interval:3849423)
[  137.327204] time_bench: Type:atomic_inc Per elem: 0 cycles(tsc) 5.771 ns (step:0) - (measurement period time:0.577134850 sec time_interval:577134850) - (invoke count:100000000 tsc_interval:57713479)
[  137.491072] time_bench: Type:lock Per elem: 1 cycles(tsc) 14.624 ns (step:0) - (measurement period time:0.146240840 sec time_interval:146240840) - (invoke count:10000000 tsc_interval:14624079)
[  137.508195] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  138.089672] time_bench: Type:no-softirq-page_pool01 Per elem: 5 cycles(tsc) 57.217 ns (step:0) - (measurement period time:0.572173750 sec time_interval:572173750) - (invoke count:10000000 tsc_interval:57217369)
[  138.108348] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  138.689834] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.227 ns (step:0) - (measurement period time:0.572271320 sec time_interval:572271320) - (invoke count:10000000 tsc_interval:57227127)
[  138.708511] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  140.405334] time_bench: Type:no-softirq-page_pool03 Per elem: 16 cycles(tsc) 168.796 ns (step:0) - (measurement period time:1.687964470 sec time_interval:1687964470) - (invoke count:10000000 tsc_interval:168796441)
[  140.428060] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  140.435970] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  140.756813] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.171 ns (step:0) - (measurement period time:0.311719350 sec time_interval:311719350) - (invoke count:10000000 tsc_interval:31171926)
[  140.776096] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  141.300653] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 51.551 ns (step:0) - (measurement period time:0.515517680 sec time_interval:515517680) - (invoke count:10000000 tsc_interval:51551762)
[  141.319853] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  143.009275] time_bench: Type:tasklet_page_pool03_slow Per elem: 16 cycles(tsc) 168.073 ns (step:0) - (measurement period time:1.680733640 sec time_interval:1680733640) - (invoke count:10000000 tsc_interval:168073359)

root@(none)$ taskset -c 0 insmod ./bench_page_pool_simple.ko
[  174.946152] bench_page_pool_simple: Loaded
[  174.988745] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.384 ns (step:0) - (measurement period time:0.038495340 sec time_interval:38495340) - (invoke count:100000000 tsc_interval:3849520)
[  175.583180] time_bench: Type:atomic_inc Per elem: 0 cycles(tsc) 5.771 ns (step:0) - (measurement period time:0.577154420 sec time_interval:577154420) - (invoke count:100000000 tsc_interval:57715437)
[  175.747009] time_bench: Type:lock Per elem: 1 cycles(tsc) 14.620 ns (step:0) - (measurement period time:0.146200740 sec time_interval:146200740) - (invoke count:10000000 tsc_interval:14620070)
[  175.764131] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  176.345767] time_bench: Type:no-softirq-page_pool01 Per elem: 5 cycles(tsc) 57.233 ns (step:0) - (measurement period time:0.572333700 sec time_interval:572333700) - (invoke count:10000000 tsc_interval:57233364)
[  176.364446] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  176.944547] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.088 ns (step:0) - (measurement period time:0.570887110 sec time_interval:570887110) - (invoke count:10000000 tsc_interval:57088706)
[  176.963225] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  178.656473] time_bench: Type:no-softirq-page_pool03 Per elem: 16 cycles(tsc) 168.438 ns (step:0) - (measurement period time:1.684389720 sec time_interval:1684389720) - (invoke count:10000000 tsc_interval:168438965)
[  178.675492] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  178.683405] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  179.004242] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.171 ns (step:0) - (measurement period time:0.311712990 sec time_interval:311712990) - (invoke count:10000000 tsc_interval:31171291)
[  179.023526] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  179.550221] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 51.765 ns (step:0) - (measurement period time:0.517655870 sec time_interval:517655870) - (invoke count:10000000 tsc_interval:51765580)
[  179.569417] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  181.292780] time_bench: Type:tasklet_page_pool03_slow Per elem: 17 cycles(tsc) 171.467 ns (step:0) - (measurement period time:1.714677840 sec time_interval:1714677840) - (invoke count:10000000 tsc_interval:171467778)

2. raw performance data after this patchset

root@(none)$ taskset -c 0 insmod ./bench_page_pool_simple.ko
[   92.210702] bench_page_pool_simple: Loaded
[   92.253767] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.389 ns (step:0) - (measurement period time:0.038968350 sec time_interval:38968350) - (invoke count:100000000 tsc_interval:3896825)
[   92.848206] time_bench: Type:atomic_inc Per elem: 0 cycles(tsc) 5.771 ns (step:0) - (measurement period time:0.577158660 sec time_interval:577158660) - (invoke count:100000000 tsc_interval:57715860)
[   93.015899] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:0.150063860 sec time_interval:150063860) - (invoke count:10000000 tsc_interval:15006381)
[   93.033022] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[   93.596905] time_bench: Type:no-softirq-page_pool01 Per elem: 5 cycles(tsc) 55.458 ns (step:0) - (measurement period time:0.554580560 sec time_interval:554580560) - (invoke count:10000000 tsc_interval:55458050)
[   93.615583] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[   94.207234] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 58.243 ns (step:0) - (measurement period time:0.582437650 sec time_interval:582437650) - (invoke count:10000000 tsc_interval:58243758)
[   94.225912] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[   95.860315] time_bench: Type:no-softirq-page_pool03 Per elem: 16 cycles(tsc) 162.554 ns (step:0) - (measurement period time:1.625544130 sec time_interval:1625544130) - (invoke count:10000000 tsc_interval:162554407)
[   95.879337] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[   95.887249] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[   96.171343] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 27.496 ns (step:0) - (measurement period time:0.274969220 sec time_interval:274969220) - (invoke count:10000000 tsc_interval:27496914)
[   96.190627] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[   96.713599] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 51.393 ns (step:0) - (measurement period time:0.513932220 sec time_interval:513932220) - (invoke count:10000000 tsc_interval:51393217)
[   96.732796] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[   98.456894] time_bench: Type:tasklet_page_pool03_slow Per elem: 17 cycles(tsc) 171.541 ns (step:0) - (measurement period time:1.715412930 sec time_interval:1715412930) - (invoke count:10000000 tsc_interval:171541286)

root@(none)$ taskset -c 0 insmod ./bench_page_pool_simple.ko
[  163.266630] bench_page_pool_simple: Loaded
[  163.309219] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.384 ns (step:0) - (measurement period time:0.038494410 sec time_interval:38494410) - (invoke count:100000000 tsc_interval:3849426)
[  163.903651] time_bench: Type:atomic_inc Per elem: 0 cycles(tsc) 5.771 ns (step:0) - (measurement period time:0.577150320 sec time_interval:577150320) - (invoke count:100000000 tsc_interval:57715025)
[  164.071342] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:0.150062900 sec time_interval:150062900) - (invoke count:10000000 tsc_interval:15006285)
[  164.088461] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  164.652649] time_bench: Type:no-softirq-page_pool01 Per elem: 5 cycles(tsc) 55.488 ns (step:0) - (measurement period time:0.554886720 sec time_interval:554886720) - (invoke count:10000000 tsc_interval:55488665)
[  164.671327] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  165.263541] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 58.299 ns (step:0) - (measurement period time:0.582999800 sec time_interval:582999800) - (invoke count:10000000 tsc_interval:58299973)
[  165.282218] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  166.917289] time_bench: Type:no-softirq-page_pool03 Per elem: 16 cycles(tsc) 162.621 ns (step:0) - (measurement period time:1.626211550 sec time_interval:1626211550) - (invoke count:10000000 tsc_interval:162621149)
[  166.936545] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  166.944456] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  167.228428] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 27.484 ns (step:0) - (measurement period time:0.274847160 sec time_interval:274847160) - (invoke count:10000000 tsc_interval:27484709)
[  167.247711] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  167.771686] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 51.493 ns (step:0) - (measurement period time:0.514934110 sec time_interval:514934110) - (invoke count:10000000 tsc_interval:51493406)
[  167.790883] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  169.517320] time_bench: Type:tasklet_page_pool03_slow Per elem: 17 cycles(tsc) 171.775 ns (step:0) - (measurement period time:1.717751440 sec time_interval:1717751440) - (invoke count:10000000 tsc_interval:171775139)

taskset -c 0 insmod ./bench_page_pool_simple.ko
[  203.494623] bench_page_pool_simple: Loaded
[  203.537239] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.384 ns (step:0) - (measurement period time:0.038493730 sec time_interval:38493730) - (invoke count:100000000 tsc_interval:3849355)
[  204.131650] time_bench: Type:atomic_inc Per elem: 0 cycles(tsc) 5.771 ns (step:0) - (measurement period time:0.577129200 sec time_interval:577129200) - (invoke count:100000000 tsc_interval:57712915)
[  204.299329] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150052320 sec time_interval:150052320) - (invoke count:10000000 tsc_interval:15005228)
[  204.316447] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  204.843709] time_bench: Type:no-softirq-page_pool01 Per elem: 5 cycles(tsc) 51.796 ns (step:0) - (measurement period time:0.517960340 sec time_interval:517960340) - (invoke count:10000000 tsc_interval:51796028)
[  204.862388] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  205.456260] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 58.465 ns (step:0) - (measurement period time:0.584658150 sec time_interval:584658150) - (invoke count:10000000 tsc_interval:58465807)
[  205.474940] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  207.109889] time_bench: Type:no-softirq-page_pool03 Per elem: 16 cycles(tsc) 162.609 ns (step:0) - (measurement period time:1.626090490 sec time_interval:1626090490) - (invoke count:10000000 tsc_interval:162609042)
[  207.132545] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  207.140457] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  207.424723] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 27.514 ns (step:0) - (measurement period time:0.275141610 sec time_interval:275141610) - (invoke count:10000000 tsc_interval:27514155)
[  207.444006] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  207.967963] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 51.491 ns (step:0) - (measurement period time:0.514916640 sec time_interval:514916640) - (invoke count:10000000 tsc_interval:51491659)
[  207.987160] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  209.712648] time_bench: Type:tasklet_page_pool03_slow Per elem: 17 cycles(tsc) 171.680 ns (step:0) - (measurement period time:1.716802930 sec time_interval:1716802930) - (invoke count:10000000 tsc_interval:171680285)

3. raw performance data for frag API after this patchset
root@(none)$ taskset -c 0 insmod ./bench_page_pool_frag.ko
[ 5981.070839] bench_page_pool_frag: Loaded
[ 5981.113253] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.384 ns (step:0) - (measurement period time:0.038492620 sec time_interval:38492620) - (invoke count:100000000 tsc_interval:3849248)
[ 5981.707686] time_bench: Type:atomic_inc Per elem: 0 cycles(tsc) 5.771 ns (step:0) - (measurement period time:0.577150800 sec time_interval:577150800) - (invoke count:100000000 tsc_interval:57715075)
[ 5981.875360] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.004 ns (step:0) - (measurement period time:0.150047290 sec time_interval:150047290) - (invoke count:10000000 tsc_interval:15004725)
[ 5981.892479] bench_page_pool_frag: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 5983.030870] time_bench: Type:no-softirq-page_pool01 Per elem: 11 cycles(tsc) 112.926 ns (step:0) - (measurement period time:1.129261630 sec time_interval:1129261630) - (invoke count:10000000 tsc_interval:112926158)
[ 5983.049896] bench_page_pool_frag: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 5984.085723] time_bench: Type:no-softirq-page_pool02 Per elem: 10 cycles(tsc) 102.678 ns (step:0) - (measurement period time:1.026786610 sec time_interval:1026786610) - (invoke count:10000000 tsc_interval:102678656)
[ 5984.104749] bench_page_pool_frag: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 5986.210648] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 209.721 ns (step:0) - (measurement period time:2.097213570 sec time_interval:2097213570) - (invoke count:10000000 tsc_interval:209721350)
[ 5986.229668] bench_page_pool_frag: pp_tasklet_handler(): in_serving_softirq fast-path
[ 5986.237406] bench_page_pool_frag: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[ 5986.958155] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 7 cycles(tsc) 71.179 ns (step:0) - (measurement period time:0.711798570 sec time_interval:711798570) - (invoke count:10000000 tsc_interval:71179850)
[ 5986.977439] bench_page_pool_frag: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[ 5987.947411] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 9 cycles(tsc) 96.110 ns (step:0) - (measurement period time:0.961105800 sec time_interval:961105800) - (invoke count:10000000 tsc_interval:96110574)
[ 5987.966608] bench_page_pool_frag: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[ 5990.018801] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 204.368 ns (step:0) - (measurement period time:2.043681900 sec time_interval:2043681900) - (invoke count:10000000 tsc_interval:204368185)

root@(none)$ taskset -c 0 insmod ./bench_page_pool_frag.ko
[10101.778903] bench_page_pool_frag: Loaded
[10101.821317] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.384 ns (step:0) - (measurement period time:0.038491670 sec time_interval:38491670) - (invoke count:100000000 tsc_interval:3849157)
[10102.415720] time_bench: Type:atomic_inc Per elem: 0 cycles(tsc) 5.771 ns (step:0) - (measurement period time:0.577120320 sec time_interval:577120320) - (invoke count:100000000 tsc_interval:57712027)
[10102.583401] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150052570 sec time_interval:150052570) - (invoke count:10000000 tsc_interval:15005251)
[10102.600521] bench_page_pool_frag: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[10103.780027] time_bench: Type:no-softirq-page_pool01 Per elem: 11 cycles(tsc) 117.037 ns (step:0) - (measurement period time:1.170377390 sec time_interval:1170377390) - (invoke count:10000000 tsc_interval:117037734)
[10103.799052] bench_page_pool_frag: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[10104.879716] time_bench: Type:no-softirq-page_pool02 Per elem: 10 cycles(tsc) 107.162 ns (step:0) - (measurement period time:1.071623350 sec time_interval:1071623350) - (invoke count:10000000 tsc_interval:107162329)
[10104.898741] bench_page_pool_frag: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[10107.050254] time_bench: Type:no-softirq-page_pool03 Per elem: 21 cycles(tsc) 214.282 ns (step:0) - (measurement period time:2.142827400 sec time_interval:2142827400) - (invoke count:10000000 tsc_interval:214282734)
[10107.072799] bench_page_pool_frag: pp_tasklet_handler(): in_serving_softirq fast-path
[10107.080540] bench_page_pool_frag: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[10107.849370] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 7 cycles(tsc) 75.987 ns (step:0) - (measurement period time:0.759878270 sec time_interval:759878270) - (invoke count:10000000 tsc_interval:75987820)
[10107.868653] bench_page_pool_frag: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[10108.884336] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 10 cycles(tsc) 100.681 ns (step:0) - (measurement period time:1.006815920 sec time_interval:1006815920) - (invoke count:10000000 tsc_interval:100681587)
[10108.903880] bench_page_pool_frag: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[10111.002467] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 209.007 ns (step:0) - (measurement period time:2.090076740 sec time_interval:2090076740) - (invoke count:10000000 tsc_interval:209007670)

root@(none)$ taskset -c 0 insmod ./bench_page_pool_frag.ko
[10135.250815] bench_page_pool_frag: Loaded
[10135.293228] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.384 ns (step:0) - (measurement period time:0.038491560 sec time_interval:38491560) - (invoke count:100000000 tsc_interval:3849145)
[10135.887668] time_bench: Type:atomic_inc Per elem: 0 cycles(tsc) 5.771 ns (step:0) - (measurement period time:0.577159960 sec time_interval:577159960) - (invoke count:100000000 tsc_interval:57715991)
[10136.055343] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.004 ns (step:0) - (measurement period time:0.150047050 sec time_interval:150047050) - (invoke count:10000000 tsc_interval:15004701)
[10136.072462] bench_page_pool_frag: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[10137.255828] time_bench: Type:no-softirq-page_pool01 Per elem: 11 cycles(tsc) 117.423 ns (step:0) - (measurement period time:1.174238030 sec time_interval:1174238030) - (invoke count:10000000 tsc_interval:117423797)
[10137.274854] bench_page_pool_frag: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[10138.356896] time_bench: Type:no-softirq-page_pool02 Per elem: 10 cycles(tsc) 107.300 ns (step:0) - (measurement period time:1.073001550 sec time_interval:1073001550) - (invoke count:10000000 tsc_interval:107300149)
[10138.375920] bench_page_pool_frag: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[10140.534446] time_bench: Type:no-softirq-page_pool03 Per elem: 21 cycles(tsc) 214.984 ns (step:0) - (measurement period time:2.149840170 sec time_interval:2149840170) - (invoke count:10000000 tsc_interval:214984011)
[10140.553464] bench_page_pool_frag: pp_tasklet_handler(): in_serving_softirq fast-path
[10140.561202] bench_page_pool_frag: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[10141.328108] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 7 cycles(tsc) 75.795 ns (step:0) - (measurement period time:0.757955870 sec time_interval:757955870) - (invoke count:10000000 tsc_interval:75795580)
[10141.347392] bench_page_pool_frag: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[10142.362664] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 10 cycles(tsc) 100.640 ns (step:0) - (measurement period time:1.006406390 sec time_interval:1006406390) - (invoke count:10000000 tsc_interval:100640634)
[10142.382208] bench_page_pool_frag: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[10144.479816] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 208.909 ns (step:0) - (measurement period time:2.089097170 sec time_interval:2089097170) - (invoke count:10000000 tsc_interval:208909713)

> 
>>
>> Cheers,
>>
>> Paolo
>>
>> .
>>
> 
> 
> .
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v8 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-09-15  8:28   ` Jesper Dangaard Brouer
@ 2023-09-20  8:59     ` Yunsheng Lin
  2023-09-25 10:37       ` Ilias Apalodimas
  0 siblings, 1 reply; 14+ messages in thread
From: Yunsheng Lin @ 2023-09-20  8:59 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, davem, kuba, pabeni
  Cc: brouer, netdev, linux-kernel, Lorenzo Bianconi, Alexander Duyck,
	Liang Chen, Alexander Lobakin, Guillaume Tucker,
	Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet, Linux-MM,
	Matthew Wilcox, Mel Gorman

On 2023/9/15 16:28, Jesper Dangaard Brouer wrote:
> Hi Lin,
> 
> This looks reasonable, but given you are changing struct page
> (include/linux/mm_types.h) we need to CC the MM list <linux-mm@kvack.org>.
> Also Cc Wilcox.
> 
> I think it was Ilias and Duyck who validated the assumptions the last
> time this patch was discussed. Thus I want to see their review before
> this is applied.

FWIW, the assumption that a PAGE_SIZE aligned buffer is also PAGE_SIZE
aligned in DMA was validated by Duyck:
https://lore.kernel.org/all/CAKgT0UfeUAUQpEffxnkc+gzXsjOrHkuMgxU_Aw0VXSJYKzaovQ@mail.gmail.com/

And I did some research to find that there seems to be no combination of
the above arches with an address space >16TB:
https://lore.kernel.org/all/2b570282-24f8-f23b-1ff7-ad836794baa9@huawei.com/
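
To make the 16TB arithmetic concrete, the round-trip check amounts to
the below (a userspace-style sketch with a hypothetical helper name,
4K pages assumed):

#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT	12	/* 4K pages assumed */

/* store a page aligned 64-bit DMA address in a 32-bit field */
static bool set_dma_addr_compressed(uint32_t *stored, uint64_t addr)
{
	*stored = (uint32_t)(addr >> PAGE_SHIFT);

	/* true (i.e. failure) iff addr is not page aligned or is
	 * beyond 1ULL << (32 + PAGE_SHIFT), i.e. 16TB
	 */
	return addr != ((uint64_t)*stored << PAGE_SHIFT);
}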

> 
> -Jesper
> 
> On 12/09/2023 10.31, Yunsheng Lin wrote:
>> Currently page_pool_alloc_frag() is not supported on 32-bit
>> arches with 64-bit DMA because of the overlap between
>> pp_frag_count and dma_addr_upper in 'struct page' for those
>> arches (which seem to be quite common, see [1]), which means
>> the driver may need to handle it when using the frag API.
>>
>> It is assumed that the combination of the above arches with an
>> address space >16TB does not exist, as all those arches have a
>> 64b equivalent, so it seems logical to use the 64b version for a
>> system with a large address space. It is also assumed that the
>> dma address is page aligned when we are dma mapping a page
>> aligned buffer, see [2].
>>
>> That means we're storing 12 bits of 0 at the lower end of a
>> dma address, so we can reuse those bits for the above arches to
>> support 32b+12b, which is 16TB of memory.
>>
>> If we make a wrong assumption, a warning is emitted so that the
>> user can report it to us.
>>
>> 1. https://lore.kernel.org/all/20211117075652.58299-1-linyunsheng@huawei.com/
>> 2. https://lore.kernel.org/all/20230818145145.4b357c89@kernel.org/
>>
>> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> CC: Lorenzo Bianconi <lorenzo@kernel.org>
>> CC: Alexander Duyck <alexander.duyck@gmail.com>
>> CC: Liang Chen <liangchen.linux@gmail.com>
>> CC: Alexander Lobakin <aleksander.lobakin@intel.com>
>> CC: Guillaume Tucker <guillaume.tucker@collabora.com>
>> ---
>>   include/linux/mm_types.h        | 13 +------------
>>   include/net/page_pool/helpers.h | 20 ++++++++++++++------
>>   net/core/page_pool.c            | 14 +++++++++-----
>>   3 files changed, 24 insertions(+), 23 deletions(-)
>>
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index 36c5b43999e6..74b49c4c7a52 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -125,18 +125,7 @@ struct page {
>>               struct page_pool *pp;
>>               unsigned long _pp_mapping_pad;
>>               unsigned long dma_addr;
>> -            union {
>> -                /**
>> -                 * dma_addr_upper: might require a 64-bit
>> -                 * value on 32-bit architectures.
>> -                 */
>> -                unsigned long dma_addr_upper;
>> -                /**
>> -                 * For frag page support, not supported in
>> -                 * 32-bit architectures with 64-bit DMA.
>> -                 */
>> -                atomic_long_t pp_frag_count;
>> -            };
>> +            atomic_long_t pp_frag_count;
>>           };
>>           struct {    /* Tail pages of compound page */
>>               unsigned long compound_head;    /* Bit zero is set */
>> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
>> index 94231533a369..8e1c85de4995 100644
>> --- a/include/net/page_pool/helpers.h
>> +++ b/include/net/page_pool/helpers.h
>> @@ -197,7 +197,7 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
>>       page_pool_put_full_page(pool, page, true);
>>   }
>>   -#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT    \
>> +#define PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA    \
>>           (sizeof(dma_addr_t) > sizeof(unsigned long))
>>     /**
>> @@ -211,17 +211,25 @@ static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>>   {
>>       dma_addr_t ret = page->dma_addr;
>>   -    if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
>> -        ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16;
>> +    if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA)
>> +        ret <<= PAGE_SHIFT;
>>         return ret;
>>   }
>>   -static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
>> +static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
>>   {
>> +    if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) {
>> +        page->dma_addr = addr >> PAGE_SHIFT;
>> +
>> +        /* We assume page alignment to shave off bottom bits,
>> +         * if this "compression" doesn't work we need to drop.
>> +         */
>> +        return addr != (dma_addr_t)page->dma_addr << PAGE_SHIFT;
>> +    }
>> +
>>       page->dma_addr = addr;
>> -    if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
>> -        page->dma_addr_upper = upper_32_bits(addr);
>> +    return false;
>>   }
>>     static inline bool page_pool_put(struct page_pool *pool)
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index 77cb75e63aca..8a9868ea5067 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -211,10 +211,6 @@ static int page_pool_init(struct page_pool *pool,
>>            */
>>       }
>>   -    if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
>> -        pool->p.flags & PP_FLAG_PAGE_FRAG)
>> -        return -EINVAL;
>> -
>>   #ifdef CONFIG_PAGE_POOL_STATS
>>       pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
>>       if (!pool->recycle_stats)
>> @@ -359,12 +355,20 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
>>       if (dma_mapping_error(pool->p.dev, dma))
>>           return false;
>>   -    page_pool_set_dma_addr(page, dma);
>> +    if (page_pool_set_dma_addr(page, dma))
>> +        goto unmap_failed;
>>         if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
>>           page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
>>         return true;
>> +
>> +unmap_failed:
>> +    WARN_ON_ONCE("unexpected DMA address, please report to netdev@");
>> +    dma_unmap_page_attrs(pool->p.dev, dma,
>> +                 PAGE_SIZE << pool->p.order, pool->p.dma_dir,
>> +                 DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);
>> +    return false;
>>   }
>>     static void page_pool_set_pp_info(struct page_pool *pool,
> 
> .
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v8 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-09-20  8:59     ` Yunsheng Lin
@ 2023-09-25 10:37       ` Ilias Apalodimas
  2023-09-25 11:31         ` Yunsheng Lin
  0 siblings, 1 reply; 14+ messages in thread
From: Ilias Apalodimas @ 2023-09-25 10:37 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Jesper Dangaard Brouer, davem, kuba, pabeni, brouer, netdev,
	linux-kernel, Lorenzo Bianconi, Alexander Duyck, Liang Chen,
	Alexander Lobakin, Guillaume Tucker, Jesper Dangaard Brouer,
	Eric Dumazet, Linux-MM, Matthew Wilcox, Mel Gorman

Hi

On Wed, 20 Sept 2023 at 11:59, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> On 2023/9/15 16:28, Jesper Dangaard Brouer wrote:
> > Hi Lin,
> >
> > This looks reasonable, but given you are changing struct page
> > (include/linux/mm_types.h) we need to CC the MM list <linux-mm@kvack.org>.
> > Also Cc Wilcox.
> >
> > I think it was Ilias and Duyck who validated the assumptions the last
> > time this patch was discussed. Thus I want to see their review before
> > this is applied.
>
> FWIW, the assumption that a PAGE_SIZE aligned buffer is also PAGE_SIZE
> aligned in DMA was validated by Duyck:
> https://lore.kernel.org/all/CAKgT0UfeUAUQpEffxnkc+gzXsjOrHkuMgxU_Aw0VXSJYKzaovQ@mail.gmail.com/
>
> And I did some research to find that there seems to be no combination of
> the above arches with an address space >16TB:
> https://lore.kernel.org/all/2b570282-24f8-f23b-1ff7-ad836794baa9@huawei.com/

Apologies for the late reply. I just saw you sent a v9; I'll review
that instead, but I am traveling right now, so it will take a while.

Thanks
/Ilias
>
> >
> > -Jesper
> >
> > On 12/09/2023 10.31, Yunsheng Lin wrote:
> >> Currently page_pool_alloc_frag() is not supported on 32-bit
> >> arches with 64-bit DMA because of the overlap between
> >> pp_frag_count and dma_addr_upper in 'struct page' for those
> >> arches (which seem to be quite common, see [1]), which means
> >> the driver may need to handle it when using the frag API.
> >>
> >> It is assumed that the combination of the above arches with an
> >> address space >16TB does not exist, as all those arches have a
> >> 64b equivalent, so it seems logical to use the 64b version for a
> >> system with a large address space. It is also assumed that the
> >> dma address is page aligned when we are dma mapping a page
> >> aligned buffer, see [2].
> >>
> >> That means we're storing 12 bits of 0 at the lower end of a
> >> dma address, so we can reuse those bits for the above arches to
> >> support 32b+12b, which is 16TB of memory.
> >>
> >> If we make a wrong assumption, a warning is emitted so that the
> >> user can report it to us.
> >>
> >> 1. https://lore.kernel.org/all/20211117075652.58299-1-linyunsheng@huawei.com/
> >> 2. https://lore.kernel.org/all/20230818145145.4b357c89@kernel.org/
> >>
> >> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> >> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> >> CC: Lorenzo Bianconi <lorenzo@kernel.org>
> >> CC: Alexander Duyck <alexander.duyck@gmail.com>
> >> CC: Liang Chen <liangchen.linux@gmail.com>
> >> CC: Alexander Lobakin <aleksander.lobakin@intel.com>
> >> CC: Guillaume Tucker <guillaume.tucker@collabora.com>
> >> ---
> >>   include/linux/mm_types.h        | 13 +------------
> >>   include/net/page_pool/helpers.h | 20 ++++++++++++++------
> >>   net/core/page_pool.c            | 14 +++++++++-----
> >>   3 files changed, 24 insertions(+), 23 deletions(-)
> >>
> >> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> >> index 36c5b43999e6..74b49c4c7a52 100644
> >> --- a/include/linux/mm_types.h
> >> +++ b/include/linux/mm_types.h
> >> @@ -125,18 +125,7 @@ struct page {
> >>               struct page_pool *pp;
> >>               unsigned long _pp_mapping_pad;
> >>               unsigned long dma_addr;
> >> -            union {
> >> -                /**
> >> -                 * dma_addr_upper: might require a 64-bit
> >> -                 * value on 32-bit architectures.
> >> -                 */
> >> -                unsigned long dma_addr_upper;
> >> -                /**
> >> -                 * For frag page support, not supported in
> >> -                 * 32-bit architectures with 64-bit DMA.
> >> -                 */
> >> -                atomic_long_t pp_frag_count;
> >> -            };
> >> +            atomic_long_t pp_frag_count;
> >>           };
> >>           struct {    /* Tail pages of compound page */
> >>               unsigned long compound_head;    /* Bit zero is set */
> >> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> >> index 94231533a369..8e1c85de4995 100644
> >> --- a/include/net/page_pool/helpers.h
> >> +++ b/include/net/page_pool/helpers.h
> >> @@ -197,7 +197,7 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
> >>       page_pool_put_full_page(pool, page, true);
> >>   }
> >>   -#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT    \
> >> +#define PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA    \
> >>           (sizeof(dma_addr_t) > sizeof(unsigned long))
> >>     /**
> >> @@ -211,17 +211,25 @@ static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
> >>   {
> >>       dma_addr_t ret = page->dma_addr;
> >>   -    if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
> >> -        ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16;
> >> +    if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA)
> >> +        ret <<= PAGE_SHIFT;
> >>         return ret;
> >>   }
> >>   -static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
> >> +static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
> >>   {
> >> +    if (PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA) {
> >> +        page->dma_addr = addr >> PAGE_SHIFT;
> >> +
> >> +        /* We assume page alignment to shave off bottom bits,
> >> +         * if this "compression" doesn't work we need to drop.
> >> +         */
> >> +        return addr != (dma_addr_t)page->dma_addr << PAGE_SHIFT;
> >> +    }
> >> +
> >>       page->dma_addr = addr;
> >> -    if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
> >> -        page->dma_addr_upper = upper_32_bits(addr);
> >> +    return false;
> >>   }
> >>     static inline bool page_pool_put(struct page_pool *pool)
> >> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> >> index 77cb75e63aca..8a9868ea5067 100644
> >> --- a/net/core/page_pool.c
> >> +++ b/net/core/page_pool.c
> >> @@ -211,10 +211,6 @@ static int page_pool_init(struct page_pool *pool,
> >>            */
> >>       }
> >>   -    if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
> >> -        pool->p.flags & PP_FLAG_PAGE_FRAG)
> >> -        return -EINVAL;
> >> -
> >>   #ifdef CONFIG_PAGE_POOL_STATS
> >>       pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
> >>       if (!pool->recycle_stats)
> >> @@ -359,12 +355,20 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
> >>       if (dma_mapping_error(pool->p.dev, dma))
> >>           return false;
> >>   -    page_pool_set_dma_addr(page, dma);
> >> +    if (page_pool_set_dma_addr(page, dma))
> >> +        goto unmap_failed;
> >>         if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
> >>           page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
> >>         return true;
> >> +
> >> +unmap_failed:
> >> +    WARN_ON_ONCE("unexpected DMA address, please report to netdev@");
> >> +    dma_unmap_page_attrs(pool->p.dev, dma,
> >> +                 PAGE_SIZE << pool->p.order, pool->p.dma_dir,
> >> +                 DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);
> >> +    return false;
> >>   }
> >>     static void page_pool_set_pp_info(struct page_pool *pool,
> >
> > .
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH net-next v8 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-09-25 10:37       ` Ilias Apalodimas
@ 2023-09-25 11:31         ` Yunsheng Lin
  0 siblings, 0 replies; 14+ messages in thread
From: Yunsheng Lin @ 2023-09-25 11:31 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Jesper Dangaard Brouer, davem, kuba, pabeni, brouer, netdev,
	linux-kernel, Lorenzo Bianconi, Alexander Duyck, Liang Chen,
	Alexander Lobakin, Guillaume Tucker, Jesper Dangaard Brouer,
	Eric Dumazet, Linux-MM, Matthew Wilcox, Mel Gorman

On 2023/9/25 18:37, Ilias Apalodimas wrote:
> Hi
> 
> On Wed, 20 Sept 2023 at 11:59, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> On 2023/9/15 16:28, Jesper Dangaard Brouer wrote:
>>> Hi Lin,
>>>
>>> This looks reasonable, but given you are changing struct page
>>> (include/linux/mm_types.h) we need to CC the MM list <linux-mm@kvack.org>.
>>> Also Cc Wilcox.
>>>
>>> I think it was Ilias and Duyck who validated the assumptions the last
>>> time this patch was discussed. Thus I want to see their review before
>>> this is applied.
>>
>> FWIW, the assumption that a PAGE_SIZE aligned buffer is also PAGE_SIZE
>> aligned in DMA was validated by Duyck:
>> https://lore.kernel.org/all/CAKgT0UfeUAUQpEffxnkc+gzXsjOrHkuMgxU_Aw0VXSJYKzaovQ@mail.gmail.com/
>>
>> And I did some research to find that there seems to be no combination of
>> the above arches with an address space >16TB:
>> https://lore.kernel.org/all/2b570282-24f8-f23b-1ff7-ad836794baa9@huawei.com/
> 
> Apologies for the late reply. I just saw you sent a v9; I'll review
> that instead, but I am traveling right now, so it will take a while.

Actually there is a newer v10, see:
https://lore.kernel.org/all/20230922091138.18014-2-linyunsheng@huawei.com/

Thanks for taking the time to review.

> 
> Thanks
> /Ilias


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread (newest: 2023-09-25 11:31 UTC)

Thread overview: 14+ messages
2023-09-12  8:31 [PATCH net-next v8 0/6] introduce page_pool_alloc() related API Yunsheng Lin
2023-09-12  8:31 ` [PATCH net-next v8 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA Yunsheng Lin
2023-09-15  8:28   ` Jesper Dangaard Brouer
2023-09-20  8:59     ` Yunsheng Lin
2023-09-25 10:37       ` Ilias Apalodimas
2023-09-25 11:31         ` Yunsheng Lin
2023-09-12  8:31 ` [PATCH net-next v8 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag() Yunsheng Lin
2023-09-14 15:17   ` Paolo Abeni
2023-09-18 11:15     ` Yunsheng Lin
2023-09-19 12:47       ` Yunsheng Lin
2023-09-12  8:31 ` [PATCH net-next v8 3/6] page_pool: remove PP_FLAG_PAGE_FRAG Yunsheng Lin
2023-09-12  8:31 ` [PATCH net-next v8 4/6] page_pool: introduce page_pool[_cache]_alloc() API Yunsheng Lin
2023-09-12  8:31 ` [PATCH net-next v8 5/6] page_pool: update document about frag API Yunsheng Lin
2023-09-12  8:31 ` [PATCH net-next v8 6/6] net: veth: use newly added page pool API for veth with xdp Yunsheng Lin
