Netfilter-Devel Archive mirror
 help / color / mirror / Atom feed
* [PATCH nf-next 0/9] nft_set_pipapo: remove cannot-fail allocations on commit and abort
@ 2024-04-03  8:41 Florian Westphal
  2024-04-03  8:41 ` [PATCH nf-next 1/9] netfilter: nft_set_pipapo: move prove_locking helper around Florian Westphal
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Florian Westphal @ 2024-04-03  8:41 UTC (permalink / raw
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

pipapo keeps one active set data (used from datapath) and one shadow
copy, in priv->clone, used from transactional path to update the set.

On abort and commit, the clone/shadow becomes the active set,
and a new clone is made for the next transaction.

The problem with this is that we cannot fail in ->commit.

This patchset rearranges priv->clone allocation so the cloning occurs on
the first insertion/removal.

set flush needs a bit of extra work, this is done by adding a iter_type
hint to the walker callbacks so that a set flush will be able to perform
the needed clone.

The dirty flag is no longer meaningful after these changes, so last
patch removes it again.

After this patch it is possible to elide calls to nft_setelem_remove
from the abort path IFF the set backend implements an abort() function,
but this change isn't included here.

Florian Westphal (9):
  netfilter: nft_set_pipapo: move prove_locking helper around
  netfilter: nft_set_pipapo: make pipapo_clone helper return NULL
  netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
  netfilter: nft_set_pipapo: prepare walk function for on-demand clone
  netfilter: nf_tables: pass new nft_iter_type hint to walker
  netfilter: nft_set_pipapo: merge deactivate helper into caller
  netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone
  netfilter: nft_set_pipapo: move cloning of match info to
    insert/removal path
  netfilter: nft_set_pipapo: remove dirty flag

 include/net/netfilter/nf_tables.h |  12 ++
 net/netfilter/nf_tables_api.c     |   1 +
 net/netfilter/nft_set_pipapo.c    | 259 +++++++++++++++---------------
 net/netfilter/nft_set_pipapo.h    |   2 -
 4 files changed, 140 insertions(+), 134 deletions(-)

-- 
2.43.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH nf-next 1/9] netfilter: nft_set_pipapo: move prove_locking helper around
  2024-04-03  8:41 [PATCH nf-next 0/9] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
@ 2024-04-03  8:41 ` Florian Westphal
  2024-04-03  8:41 ` [PATCH nf-next 2/9] netfilter: nft_set_pipapo: make pipapo_clone helper return NULL Florian Westphal
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2024-04-03  8:41 UTC (permalink / raw
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

Preparation patch, the helper will soon get called from insert
function too.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index df8de5090246..a05e5d62a78e 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1247,6 +1247,17 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 	return 0;
 }
 
+static bool nft_pipapo_transaction_mutex_held(const struct nft_set *set)
+{
+#ifdef CONFIG_PROVE_LOCKING
+	const struct net *net = read_pnet(&set->net);
+
+	return lockdep_is_held(&nft_pernet(net)->commit_mutex);
+#else
+	return true;
+#endif
+}
+
 /**
  * nft_pipapo_insert() - Validate and insert ranged elements
  * @net:	Network namespace
@@ -1799,17 +1810,6 @@ static void nft_pipapo_commit(struct nft_set *set)
 	priv->clone = new_clone;
 }
 
-static bool nft_pipapo_transaction_mutex_held(const struct nft_set *set)
-{
-#ifdef CONFIG_PROVE_LOCKING
-	const struct net *net = read_pnet(&set->net);
-
-	return lockdep_is_held(&nft_pernet(net)->commit_mutex);
-#else
-	return true;
-#endif
-}
-
 static void nft_pipapo_abort(const struct nft_set *set)
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH nf-next 2/9] netfilter: nft_set_pipapo: make pipapo_clone helper return NULL
  2024-04-03  8:41 [PATCH nf-next 0/9] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
  2024-04-03  8:41 ` [PATCH nf-next 1/9] netfilter: nft_set_pipapo: move prove_locking helper around Florian Westphal
@ 2024-04-03  8:41 ` Florian Westphal
  2024-04-03  8:41 ` [PATCH nf-next 3/9] netfilter: nft_set_pipapo: prepare destroy function for on-demand clone Florian Westphal
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2024-04-03  8:41 UTC (permalink / raw
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

Currently it returns an error pointer, but the only possible failure
is ENOMEM.

After a followup patch, we'd need to discard the errno code, i.e.

x = pipapo_clone()
if (IS_ERR(x))
	return NULL

or make more changes to fix up callers to expect IS_ERR() code
from set->ops->deactivate().

So simplify this and make it return ptr-or-null.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index a05e5d62a78e..48d5600f8836 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1395,7 +1395,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
  * pipapo_clone() - Clone matching data to create new working copy
  * @old:	Existing matching data
  *
- * Return: copy of matching data passed as 'old', error pointer on failure
+ * Return: copy of matching data passed as 'old' or NULL.
  */
 static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 {
@@ -1405,7 +1405,7 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 
 	new = kmalloc(struct_size(new, f, old->field_count), GFP_KERNEL);
 	if (!new)
-		return ERR_PTR(-ENOMEM);
+		return NULL;
 
 	new->field_count = old->field_count;
 	new->bsize_max = old->bsize_max;
@@ -1477,7 +1477,7 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 	free_percpu(new->scratch);
 	kfree(new);
 
-	return ERR_PTR(-ENOMEM);
+	return NULL;
 }
 
 /**
@@ -1797,7 +1797,7 @@ static void nft_pipapo_commit(struct nft_set *set)
 		return;
 
 	new_clone = pipapo_clone(priv->clone);
-	if (IS_ERR(new_clone))
+	if (!new_clone)
 		return;
 
 	priv->dirty = false;
@@ -1821,7 +1821,7 @@ static void nft_pipapo_abort(const struct nft_set *set)
 	m = rcu_dereference_protected(priv->match, nft_pipapo_transaction_mutex_held(set));
 
 	new_clone = pipapo_clone(m);
-	if (IS_ERR(new_clone))
+	if (!new_clone)
 		return;
 
 	priv->dirty = false;
@@ -2266,8 +2266,8 @@ static int nft_pipapo_init(const struct nft_set *set,
 
 	/* Create an initial clone of matching data for next insertion */
 	priv->clone = pipapo_clone(m);
-	if (IS_ERR(priv->clone)) {
-		err = PTR_ERR(priv->clone);
+	if (!priv->clone) {
+		err = -ENOMEM;
 		goto out_free;
 	}
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH nf-next 3/9] netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
  2024-04-03  8:41 [PATCH nf-next 0/9] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
  2024-04-03  8:41 ` [PATCH nf-next 1/9] netfilter: nft_set_pipapo: move prove_locking helper around Florian Westphal
  2024-04-03  8:41 ` [PATCH nf-next 2/9] netfilter: nft_set_pipapo: make pipapo_clone helper return NULL Florian Westphal
@ 2024-04-03  8:41 ` Florian Westphal
  2024-04-08 15:45   ` Stefano Brivio
  2024-04-03  8:41 ` [PATCH nf-next 4/9] netfilter: nft_set_pipapo: prepare walk " Florian Westphal
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2024-04-03  8:41 UTC (permalink / raw
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

Once priv->clone can be NULL in case no insertions/removals occurred
in the last transaction we need to drop set elements from priv->match
if priv->clone is NULL.

While at it, condense this function by reusing the pipapo_free_match
helper instead of open-coded version.

The rcu_barrier() is removed, its not needed: old call_rcu instances
for pipapo_reclaim_match do not access struct nft_set.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c | 27 ++++++---------------------
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 48d5600f8836..d2ac2d5560e4 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -2323,33 +2323,18 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
 	struct nft_pipapo_match *m;
-	int cpu;
 
 	m = rcu_dereference_protected(priv->match, true);
-	if (m) {
-		rcu_barrier();
-
-		for_each_possible_cpu(cpu)
-			pipapo_free_scratch(m, cpu);
-		free_percpu(m->scratch);
-		pipapo_free_fields(m);
-		kfree(m);
-		priv->match = NULL;
-	}
 
 	if (priv->clone) {
-		m = priv->clone;
-
-		nft_set_pipapo_match_destroy(ctx, set, m);
-
-		for_each_possible_cpu(cpu)
-			pipapo_free_scratch(priv->clone, cpu);
-		free_percpu(priv->clone->scratch);
-
-		pipapo_free_fields(priv->clone);
-		kfree(priv->clone);
+		nft_set_pipapo_match_destroy(ctx, set, priv->clone);
+		pipapo_free_match(priv->clone);
 		priv->clone = NULL;
+	} else {
+		nft_set_pipapo_match_destroy(ctx, set, m);
 	}
+
+	pipapo_free_match(m);
 }
 
 /**
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH nf-next 4/9] netfilter: nft_set_pipapo: prepare walk function for on-demand clone
  2024-04-03  8:41 [PATCH nf-next 0/9] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
                   ` (2 preceding siblings ...)
  2024-04-03  8:41 ` [PATCH nf-next 3/9] netfilter: nft_set_pipapo: prepare destroy function for on-demand clone Florian Westphal
@ 2024-04-03  8:41 ` Florian Westphal
  2024-04-03  8:41 ` [PATCH nf-next 5/9] netfilter: nf_tables: pass new nft_iter_type hint to walker Florian Westphal
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2024-04-03  8:41 UTC (permalink / raw
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

Right now, without pending updates, priv->clone and priv->match
will point to different memory locations, but they have identical
content.

Future patch will make priv->clone == NULL if there are no pending
changes.

We cannot just fallback to the live data in this case because there
are different types of walks:

- set dump: this can fallback to the live copy.
- flush walk: all set elements should be deactivated.
  If no single element was removed before, then we must first
  make a copy of the live data.
- deactivate/activate walks during abort or commit.
  This would always have a non-null clone.

The existing test is unreliable, if genmask is not equal to current
one, we can't infer that the transaction mutex is held, we could
be in a (lockless) dump.

Its only safe at this time because both commit and abort paths
will re-clone the live copy, so ->clone is always non-null -- something
that is about to change.

Next patch will add explicit iter type to tell when flushing
is requested (i.e., when live data must be copied first).

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c | 54 +++++++++++++++++++++++-----------
 1 file changed, 37 insertions(+), 17 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index d2ac2d5560e4..57b1508d3502 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -2102,33 +2102,23 @@ static void nft_pipapo_remove(const struct net *net, const struct nft_set *set,
 }
 
 /**
- * nft_pipapo_walk() - Walk over elements
+ * __nft_pipapo_walk() - Walk over elements in m
  * @ctx:	nftables API context
  * @set:	nftables API set representation
+ * @m:		matching data pointing to key mapping array
  * @iter:	Iterator
  *
  * As elements are referenced in the mapping array for the last field, directly
  * scan that array: there's no need to follow rule mappings from the first
  * field.
  */
-static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
-			    struct nft_set_iter *iter)
+static void __nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
+			      const struct nft_pipapo_match *m,
+			      struct nft_set_iter *iter)
 {
-	struct nft_pipapo *priv = nft_set_priv(set);
-	struct net *net = read_pnet(&set->net);
-	const struct nft_pipapo_match *m;
 	const struct nft_pipapo_field *f;
 	unsigned int i, r;
 
-	rcu_read_lock();
-	if (iter->genmask == nft_genmask_cur(net))
-		m = rcu_dereference(priv->match);
-	else
-		m = priv->clone;
-
-	if (unlikely(!m))
-		goto out;
-
 	for (i = 0, f = m->f; i < m->field_count - 1; i++, f++)
 		;
 
@@ -2148,13 +2138,43 @@ static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
 
 		iter->err = iter->fn(ctx, set, iter, &e->priv);
 		if (iter->err < 0)
-			goto out;
+			return;
 
 cont:
 		iter->count++;
 	}
+}
+
+/**
+ * nft_pipapo_walk() - Walk over elements
+ * @ctx:	nftables API context
+ * @set:	nftables API set representation
+ * @iter:	Iterator
+ *
+ * Test if destructive action is needed or not, clone active backend if needed
+ * and call the real function to work on the data.
+ */
+static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
+			    struct nft_set_iter *iter)
+{
+	struct nft_pipapo *priv = nft_set_priv(set);
+	struct net *net = read_pnet(&set->net);
+	const struct nft_pipapo_match *m;
+
+	rcu_read_lock();
+	if (iter->genmask == nft_genmask_cur(net)) {
+		m = rcu_dereference(priv->match);
+	} else {
+		m = priv->clone;
+		if (!m) /* no pending updates */
+			m = rcu_dereference(priv->match);
+	}
+
+	if (m)
+		__nft_pipapo_walk(ctx, set, m, iter);
+	else
+		WARN_ON_ONCE(1);
 
-out:
 	rcu_read_unlock();
 }
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH nf-next 5/9] netfilter: nf_tables: pass new nft_iter_type hint to walker
  2024-04-03  8:41 [PATCH nf-next 0/9] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
                   ` (3 preceding siblings ...)
  2024-04-03  8:41 ` [PATCH nf-next 4/9] netfilter: nft_set_pipapo: prepare walk " Florian Westphal
@ 2024-04-03  8:41 ` Florian Westphal
  2024-04-03  8:41 ` [PATCH nf-next 6/9] netfilter: nft_set_pipapo: merge deactivate helper into caller Florian Westphal
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2024-04-03  8:41 UTC (permalink / raw
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

This will be needed by the 'pipapo' set backend.  If this is set, then
it needs to perform copy-on-write of the active set data storage.

Its not possible to use genmask test, the walker function is also used by
(rcu locked) set listing which can run in parallel to set updates.

If priv->clone is null, then fallback to the active data storeage is
safe EXCEPT for the flush case, which must do the copy.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_tables.h | 12 +++++++++++
 net/netfilter/nf_tables_api.c     |  1 +
 net/netfilter/nft_set_pipapo.c    | 35 +++++++++++++++++++++----------
 3 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index e27c28b612e4..9912a2621344 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -307,9 +307,21 @@ static inline void *nft_elem_priv_cast(const struct nft_elem_priv *priv)
 	return (void *)priv;
 }
 
+
+/**
+ * enum nft_iter_type - nftables set iterator type
+ *
+ * @NFT_ITER_FLUSH: destructive iteration, transaction mutex must be held
+ */
+enum nft_iter_type {
+	/* undef == 0 */
+	NFT_ITER_FLUSH = 1,
+};
+
 struct nft_set;
 struct nft_set_iter {
 	u8		genmask;
+	enum nft_iter_type type:8;
 	unsigned int	count;
 	unsigned int	skip;
 	int		err;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index fd86f2720c9e..facd33e97dfe 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -7369,6 +7369,7 @@ static int nft_set_flush(struct nft_ctx *ctx, struct nft_set *set, u8 genmask)
 	struct nft_set_iter iter = {
 		.genmask	= genmask,
 		.fn		= nft_setelem_flush,
+		.type		= NFT_ITER_FLUSH,
 	};
 
 	set->ops->walk(ctx, set, &iter);
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 57b1508d3502..eca81c5e5810 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -2161,21 +2161,34 @@ static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
 	struct net *net = read_pnet(&set->net);
 	const struct nft_pipapo_match *m;
 
-	rcu_read_lock();
-	if (iter->genmask == nft_genmask_cur(net)) {
-		m = rcu_dereference(priv->match);
-	} else {
+	switch (iter->type) {
+	case NFT_ITER_FLUSH:
 		m = priv->clone;
-		if (!m) /* no pending updates */
-			m = rcu_dereference(priv->match);
-	}
+		if (!m) {
+			iter->err = -ENOMEM;
+			return;
+		}
 
-	if (m)
 		__nft_pipapo_walk(ctx, set, m, iter);
-	else
-		WARN_ON_ONCE(1);
+		break;
+	default:
+		rcu_read_lock();
+		if (iter->genmask == nft_genmask_cur(net)) {
+			m = rcu_dereference(priv->match);
+		} else {
+			m = priv->clone;
+			if (!m) /* no pending updates */
+				m = rcu_dereference(priv->match);
+		}
 
-	rcu_read_unlock();
+		if (m)
+			__nft_pipapo_walk(ctx, set, m, iter);
+		else
+			WARN_ON_ONCE(1);
+
+		rcu_read_unlock();
+		break;
+	}
 }
 
 /**
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH nf-next 6/9] netfilter: nft_set_pipapo: merge deactivate helper into caller
  2024-04-03  8:41 [PATCH nf-next 0/9] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
                   ` (4 preceding siblings ...)
  2024-04-03  8:41 ` [PATCH nf-next 5/9] netfilter: nf_tables: pass new nft_iter_type hint to walker Florian Westphal
@ 2024-04-03  8:41 ` Florian Westphal
  2024-04-03  8:41 ` [PATCH nf-next 7/9] netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone Florian Westphal
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2024-04-03  8:41 UTC (permalink / raw
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

Its the only remaining call site so there is no need for this to
be separated anymore.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c | 39 ++++++++--------------------------
 1 file changed, 9 insertions(+), 30 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index eca81c5e5810..9dd6725ada4d 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1851,52 +1851,31 @@ static void nft_pipapo_activate(const struct net *net,
 }
 
 /**
- * pipapo_deactivate() - Check that element is in set, mark as inactive
+ * nft_pipapo_deactivate() - Search for element and make it inactive
  * @net:	Network namespace
  * @set:	nftables API set representation
- * @data:	Input key data
- * @ext:	nftables API extension pointer, used to check for end element
- *
- * This is a convenience function that can be called from both
- * nft_pipapo_deactivate() and nft_pipapo_flush(), as they are in fact the same
- * operation.
+ * @elem:	nftables API element representation containing key data
  *
  * Return: deactivated element if found, NULL otherwise.
  */
-static void *pipapo_deactivate(const struct net *net, const struct nft_set *set,
-			       const u8 *data, const struct nft_set_ext *ext)
+static struct nft_elem_priv *
+nft_pipapo_deactivate(const struct net *net, const struct nft_set *set,
+		      const struct nft_set_elem *elem)
 {
 	struct nft_pipapo_elem *e;
 
-	e = pipapo_get(net, set, data, nft_genmask_next(net),
-		       nft_net_tstamp(net), GFP_KERNEL);
+	e = pipapo_get(net, set, (const u8 *)elem->key.val.data,
+		       nft_genmask_next(net), nft_net_tstamp(net), GFP_KERNEL);
 	if (IS_ERR(e))
 		return NULL;
 
 	nft_set_elem_change_active(net, set, &e->ext);
 
-	return e;
-}
-
-/**
- * nft_pipapo_deactivate() - Call pipapo_deactivate() to make element inactive
- * @net:	Network namespace
- * @set:	nftables API set representation
- * @elem:	nftables API element representation containing key data
- *
- * Return: deactivated element if found, NULL otherwise.
- */
-static struct nft_elem_priv *
-nft_pipapo_deactivate(const struct net *net, const struct nft_set *set,
-		      const struct nft_set_elem *elem)
-{
-	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
-
-	return pipapo_deactivate(net, set, (const u8 *)elem->key.val.data, ext);
+	return &e->priv;
 }
 
 /**
- * nft_pipapo_flush() - Call pipapo_deactivate() to make element inactive
+ * nft_pipapo_flush() - make element inactive
  * @net:	Network namespace
  * @set:	nftables API set representation
  * @elem_priv:	nftables API element representation containing key data
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH nf-next 7/9] netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone
  2024-04-03  8:41 [PATCH nf-next 0/9] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
                   ` (5 preceding siblings ...)
  2024-04-03  8:41 ` [PATCH nf-next 6/9] netfilter: nft_set_pipapo: merge deactivate helper into caller Florian Westphal
@ 2024-04-03  8:41 ` Florian Westphal
  2024-04-03  8:41 ` [PATCH nf-next 8/9] netfilter: nft_set_pipapo: move cloning of match info to insert/removal path Florian Westphal
  2024-04-03  8:41 ` [PATCH nf-next 9/9] netfilter: nft_set_pipapo: remove dirty flag Florian Westphal
  8 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2024-04-03  8:41 UTC (permalink / raw
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

The helper uses priv->clone unconditionally which will fail once we do
the clone conditionally on first insert or removal.

'nft get element' from userspace needs to use priv->match if priv->clone
is null.

Prepare for this by passing the match backend data as argument.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 9dd6725ada4d..2cc905e92889 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -504,6 +504,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
  * pipapo_get() - Get matching element reference given key data
  * @net:	Network namespace
  * @set:	nftables API set representation
+ * @m:		storage containing active/existing elements
  * @data:	Key data to be matched against existing elements
  * @genmask:	If set, check that element is active in given genmask
  * @tstamp:	timestamp to check for expired elements
@@ -517,17 +518,15 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
  */
 static struct nft_pipapo_elem *pipapo_get(const struct net *net,
 					  const struct nft_set *set,
+					  const struct nft_pipapo_match *m,
 					  const u8 *data, u8 genmask,
 					  u64 tstamp, gfp_t gfp)
 {
 	struct nft_pipapo_elem *ret = ERR_PTR(-ENOENT);
-	struct nft_pipapo *priv = nft_set_priv(set);
 	unsigned long *res_map, *fill_map = NULL;
-	const struct nft_pipapo_match *m;
 	const struct nft_pipapo_field *f;
 	int i;
 
-	m = priv->clone;
 	if (m->bsize_max == 0)
 		return ret;
 
@@ -612,9 +611,11 @@ static struct nft_elem_priv *
 nft_pipapo_get(const struct net *net, const struct nft_set *set,
 	       const struct nft_set_elem *elem, unsigned int flags)
 {
+	struct nft_pipapo *priv = nft_set_priv(set);
+	struct nft_pipapo_match *m = priv->clone;
 	struct nft_pipapo_elem *e;
 
-	e = pipapo_get(net, set, (const u8 *)elem->key.val.data,
+	e = pipapo_get(net, set, m, (const u8 *)elem->key.val.data,
 		       nft_genmask_cur(net), get_jiffies_64(),
 		       GFP_ATOMIC);
 	if (IS_ERR(e))
@@ -1288,7 +1289,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 	else
 		end = start;
 
-	dup = pipapo_get(net, set, start, genmask, tstamp, GFP_KERNEL);
+	dup = pipapo_get(net, set, m, start, genmask, tstamp, GFP_KERNEL);
 	if (!IS_ERR(dup)) {
 		/* Check if we already have the same exact entry */
 		const struct nft_data *dup_key, *dup_end;
@@ -1310,7 +1311,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 
 	if (PTR_ERR(dup) == -ENOENT) {
 		/* Look for partially overlapping entries */
-		dup = pipapo_get(net, set, end, nft_genmask_next(net), tstamp,
+		dup = pipapo_get(net, set, m, end, nft_genmask_next(net), tstamp,
 				 GFP_KERNEL);
 	}
 
@@ -1862,9 +1863,11 @@ static struct nft_elem_priv *
 nft_pipapo_deactivate(const struct net *net, const struct nft_set *set,
 		      const struct nft_set_elem *elem)
 {
+	const struct nft_pipapo *priv = nft_set_priv(set);
+	struct nft_pipapo_match *m = priv->clone;
 	struct nft_pipapo_elem *e;
 
-	e = pipapo_get(net, set, (const u8 *)elem->key.val.data,
+	e = pipapo_get(net, set, m, (const u8 *)elem->key.val.data,
 		       nft_genmask_next(net), nft_net_tstamp(net), GFP_KERNEL);
 	if (IS_ERR(e))
 		return NULL;
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH nf-next 8/9] netfilter: nft_set_pipapo: move cloning of match info to insert/removal path
  2024-04-03  8:41 [PATCH nf-next 0/9] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
                   ` (6 preceding siblings ...)
  2024-04-03  8:41 ` [PATCH nf-next 7/9] netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone Florian Westphal
@ 2024-04-03  8:41 ` Florian Westphal
  2024-04-08 15:45   ` Stefano Brivio
  2024-04-03  8:41 ` [PATCH nf-next 9/9] netfilter: nft_set_pipapo: remove dirty flag Florian Westphal
  8 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2024-04-03  8:41 UTC (permalink / raw
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

This set type keeps two copies of the sets' content,
   priv->match (live version, used to match from packet path)
   priv->clone (work-in-progress version of the 'future' priv->match).

All additions and removals are done on priv->clone.  When transaction
completes, priv->clone becomes priv->match and a new clone is allocated
for use by next transaction.

Problem is that the cloning requires GFP_KERNEL allocations but we
cannot fail at either commit or abort time.

This patch defers the clone until we get an insertion or removal
request.  This allows us to handle OOM situations correctly.

This also allows to remove ->dirty in a followup change:

If ->clone exists, ->dirty is always true
If ->clone is NULL, ->dirty is always false, no elements were added
or removed (except catchall elements which are external to the specific
set backend).

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c | 62 ++++++++++++++++++++++------------
 1 file changed, 41 insertions(+), 21 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 2cc905e92889..eef6a978561f 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -615,6 +615,9 @@ nft_pipapo_get(const struct net *net, const struct nft_set *set,
 	struct nft_pipapo_match *m = priv->clone;
 	struct nft_pipapo_elem *e;
 
+	if (!m)
+		m = rcu_dereference(priv->match);
+
 	e = pipapo_get(net, set, m, (const u8 *)elem->key.val.data,
 		       nft_genmask_cur(net), get_jiffies_64(),
 		       GFP_ATOMIC);
@@ -1259,6 +1262,23 @@ static bool nft_pipapo_transaction_mutex_held(const struct nft_set *set)
 #endif
 }
 
+static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old);
+
+static struct nft_pipapo_match *pipapo_maybe_clone(const struct nft_set *set)
+{
+	struct nft_pipapo *priv = nft_set_priv(set);
+	struct nft_pipapo_match *m;
+
+	if (priv->clone)
+		return priv->clone;
+
+	m = rcu_dereference_protected(priv->match,
+				      nft_pipapo_transaction_mutex_held(set));
+	priv->clone = pipapo_clone(m);
+
+	return priv->clone;
+}
+
 /**
  * nft_pipapo_insert() - Validate and insert ranged elements
  * @net:	Network namespace
@@ -1275,8 +1295,8 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
 	union nft_pipapo_map_bucket rulemap[NFT_PIPAPO_MAX_FIELDS];
 	const u8 *start = (const u8 *)elem->key.val.data, *end;
+	struct nft_pipapo_match *m = pipapo_maybe_clone(set);
 	struct nft_pipapo *priv = nft_set_priv(set);
-	struct nft_pipapo_match *m = priv->clone;
 	u8 genmask = nft_genmask_next(net);
 	struct nft_pipapo_elem *e, *dup;
 	u64 tstamp = nft_net_tstamp(net);
@@ -1284,6 +1304,9 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 	const u8 *start_p, *end_p;
 	int i, bsize_max, err = 0;
 
+	if (!m)
+		return -ENOMEM;
+
 	if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END))
 		end = (const u8 *)nft_set_ext_key_end(ext)->data;
 	else
@@ -1789,7 +1812,10 @@ static void pipapo_reclaim_match(struct rcu_head *rcu)
 static void nft_pipapo_commit(struct nft_set *set)
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
-	struct nft_pipapo_match *new_clone, *old;
+	struct nft_pipapo_match *old;
+
+	if (!priv->clone)
+		return;
 
 	if (time_after_eq(jiffies, priv->last_gc + nft_set_gc_interval(set)))
 		pipapo_gc(set, priv->clone);
@@ -1797,38 +1823,27 @@ static void nft_pipapo_commit(struct nft_set *set)
 	if (!priv->dirty)
 		return;
 
-	new_clone = pipapo_clone(priv->clone);
-	if (!new_clone)
-		return;
-
+	old = rcu_replace_pointer(priv->match, priv->clone,
+				  nft_pipapo_transaction_mutex_held(set));
+	priv->clone = NULL;
 	priv->dirty = false;
 
-	old = rcu_access_pointer(priv->match);
-	rcu_assign_pointer(priv->match, priv->clone);
 	if (old)
 		call_rcu(&old->rcu, pipapo_reclaim_match);
-
-	priv->clone = new_clone;
 }
 
 static void nft_pipapo_abort(const struct nft_set *set)
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
-	struct nft_pipapo_match *new_clone, *m;
 
 	if (!priv->dirty)
 		return;
 
-	m = rcu_dereference_protected(priv->match, nft_pipapo_transaction_mutex_held(set));
-
-	new_clone = pipapo_clone(m);
-	if (!new_clone)
+	if (!priv->clone)
 		return;
-
 	priv->dirty = false;
-
 	pipapo_free_match(priv->clone);
-	priv->clone = new_clone;
+	priv->clone = NULL;
 }
 
 /**
@@ -1863,10 +1878,15 @@ static struct nft_elem_priv *
 nft_pipapo_deactivate(const struct net *net, const struct nft_set *set,
 		      const struct nft_set_elem *elem)
 {
-	const struct nft_pipapo *priv = nft_set_priv(set);
-	struct nft_pipapo_match *m = priv->clone;
+	struct nft_pipapo_match *m = pipapo_maybe_clone(set);
 	struct nft_pipapo_elem *e;
 
+	/* removal must occur on priv->clone, if we are low on memory
+	 * we have no choice and must fail the removal request.
+	 */
+	if (!m)
+		return NULL;
+
 	e = pipapo_get(net, set, m, (const u8 *)elem->key.val.data,
 		       nft_genmask_next(net), nft_net_tstamp(net), GFP_KERNEL);
 	if (IS_ERR(e))
@@ -2145,7 +2165,7 @@ static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
 
 	switch (iter->type) {
 	case NFT_ITER_FLUSH:
-		m = priv->clone;
+		m = pipapo_maybe_clone(set);
 		if (!m) {
 			iter->err = -ENOMEM;
 			return;
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH nf-next 9/9] netfilter: nft_set_pipapo: remove dirty flag
  2024-04-03  8:41 [PATCH nf-next 0/9] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
                   ` (7 preceding siblings ...)
  2024-04-03  8:41 ` [PATCH nf-next 8/9] netfilter: nft_set_pipapo: move cloning of match info to insert/removal path Florian Westphal
@ 2024-04-03  8:41 ` Florian Westphal
  8 siblings, 0 replies; 16+ messages in thread
From: Florian Westphal @ 2024-04-03  8:41 UTC (permalink / raw
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

Its not needed anymore, after previous changes priv->clone != NULL
during commit means priv->match needs to be updated with the clone.

On abort, priv->clone needs to be discarded, it doesn't contain
anything new anymore.

Note that its now possible to resurrect
ebd032fa8818 ("netfilter: nf_tables: do not remove elements if set backend implements .abort")
to speed up the abort path, removal from pipapo sets is slow.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c | 25 -------------------------
 net/netfilter/nft_set_pipapo.h |  2 --
 2 files changed, 27 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index eef6a978561f..bb9a03426696 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1296,7 +1296,6 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 	union nft_pipapo_map_bucket rulemap[NFT_PIPAPO_MAX_FIELDS];
 	const u8 *start = (const u8 *)elem->key.val.data, *end;
 	struct nft_pipapo_match *m = pipapo_maybe_clone(set);
-	struct nft_pipapo *priv = nft_set_priv(set);
 	u8 genmask = nft_genmask_next(net);
 	struct nft_pipapo_elem *e, *dup;
 	u64 tstamp = nft_net_tstamp(net);
@@ -1367,8 +1366,6 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 	}
 
 	/* Insert */
-	priv->dirty = true;
-
 	bsize_max = m->bsize_max;
 
 	nft_pipapo_for_each_field(f, i, m) {
@@ -1733,8 +1730,6 @@ static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
 		 * NFT_SET_ELEM_DEAD_BIT.
 		 */
 		if (__nft_set_elem_expired(&e->ext, tstamp)) {
-			priv->dirty = true;
-
 			gc = nft_trans_gc_queue_sync(gc, GFP_KERNEL);
 			if (!gc)
 				return;
@@ -1820,13 +1815,9 @@ static void nft_pipapo_commit(struct nft_set *set)
 	if (time_after_eq(jiffies, priv->last_gc + nft_set_gc_interval(set)))
 		pipapo_gc(set, priv->clone);
 
-	if (!priv->dirty)
-		return;
-
 	old = rcu_replace_pointer(priv->match, priv->clone,
 				  nft_pipapo_transaction_mutex_held(set));
 	priv->clone = NULL;
-	priv->dirty = false;
 
 	if (old)
 		call_rcu(&old->rcu, pipapo_reclaim_match);
@@ -1836,12 +1827,8 @@ static void nft_pipapo_abort(const struct nft_set *set)
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
 
-	if (!priv->dirty)
-		return;
-
 	if (!priv->clone)
 		return;
-	priv->dirty = false;
 	pipapo_free_match(priv->clone);
 	priv->clone = NULL;
 }
@@ -2094,7 +2081,6 @@ static void nft_pipapo_remove(const struct net *net, const struct nft_set *set,
 		}
 
 		if (i == m->field_count) {
-			priv->dirty = true;
 			pipapo_drop(m, rulemap);
 			return;
 		}
@@ -2299,21 +2285,10 @@ static int nft_pipapo_init(const struct nft_set *set,
 		f->mt = NULL;
 	}
 
-	/* Create an initial clone of matching data for next insertion */
-	priv->clone = pipapo_clone(m);
-	if (!priv->clone) {
-		err = -ENOMEM;
-		goto out_free;
-	}
-
-	priv->dirty = false;
-
 	rcu_assign_pointer(priv->match, m);
 
 	return 0;
 
-out_free:
-	free_percpu(m->scratch);
 out_scratch:
 	kfree(m);
 
diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
index 24cd1ff73f98..0d2e40e10f7f 100644
--- a/net/netfilter/nft_set_pipapo.h
+++ b/net/netfilter/nft_set_pipapo.h
@@ -155,14 +155,12 @@ struct nft_pipapo_match {
  * @match:	Currently in-use matching data
  * @clone:	Copy where pending insertions and deletions are kept
  * @width:	Total bytes to be matched for one packet, including padding
- * @dirty:	Working copy has pending insertions or deletions
  * @last_gc:	Timestamp of last garbage collection run, jiffies
  */
 struct nft_pipapo {
 	struct nft_pipapo_match __rcu *match;
 	struct nft_pipapo_match *clone;
 	int width;
-	bool dirty;
 	unsigned long last_gc;
 };
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH nf-next 3/9] netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
  2024-04-03  8:41 ` [PATCH nf-next 3/9] netfilter: nft_set_pipapo: prepare destroy function for on-demand clone Florian Westphal
@ 2024-04-08 15:45   ` Stefano Brivio
  2024-04-09 11:07     ` Florian Westphal
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Brivio @ 2024-04-08 15:45 UTC (permalink / raw
  To: Florian Westphal; +Cc: netfilter-devel

On Wed,  3 Apr 2024 10:41:03 +0200
Florian Westphal <fw@strlen.de> wrote:

> Once priv->clone can be NULL in case no insertions/removals occurred
> in the last transaction we need to drop set elements from priv->match
> if priv->clone is NULL.
> 
> While at it, condense this function by reusing the pipapo_free_match
> helper instead of open-coded version.
> 
> The rcu_barrier() is removed, its not needed: old call_rcu instances
> for pipapo_reclaim_match do not access struct nft_set.

True, pipapo_reclaim_match() won't, but nft_set_pipapo_match_destroy()
will, right? That is:

> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/netfilter/nft_set_pipapo.c | 27 ++++++---------------------
>  1 file changed, 6 insertions(+), 21 deletions(-)
> 
> diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
> index 48d5600f8836..d2ac2d5560e4 100644
> --- a/net/netfilter/nft_set_pipapo.c
> +++ b/net/netfilter/nft_set_pipapo.c
> @@ -2323,33 +2323,18 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
>  {
>  	struct nft_pipapo *priv = nft_set_priv(set);
>  	struct nft_pipapo_match *m;
> -	int cpu;
>  
>  	m = rcu_dereference_protected(priv->match, true);
> -	if (m) {
> -		rcu_barrier();

...before b0e256f3dd2b ("netfilter: nft_set_pipapo: release elements in
clone only from destroy path"), this rcu_barrier() was needed because we'd
call nft_set_pipapo_match_destroy() on 'm'.

That call is now gone, and we could have dropped it at that point, but:

> -
> -		for_each_possible_cpu(cpu)
> -			pipapo_free_scratch(m, cpu);
> -		free_percpu(m->scratch);
> -		pipapo_free_fields(m);
> -		kfree(m);
> -		priv->match = NULL;
> -	}
>  
>  	if (priv->clone) {
> -		m = priv->clone;
> -
> -		nft_set_pipapo_match_destroy(ctx, set, m);
> -
> -		for_each_possible_cpu(cpu)
> -			pipapo_free_scratch(priv->clone, cpu);
> -		free_percpu(priv->clone->scratch);
> -
> -		pipapo_free_fields(priv->clone);
> -		kfree(priv->clone);
> +		nft_set_pipapo_match_destroy(ctx, set, priv->clone);
> +		pipapo_free_match(priv->clone);
>  		priv->clone = NULL;
> +	} else {
> +		nft_set_pipapo_match_destroy(ctx, set, m);

now it's back, so we should actually move rcu_barrier() before this
call? I think it's needed because nft_set_pipapo_match_destroy() does
access nft_set data.

>  	}
> +
> +	pipapo_free_match(m);
>  }
>  
>  /**

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH nf-next 8/9] netfilter: nft_set_pipapo: move cloning of match info to insert/removal path
  2024-04-03  8:41 ` [PATCH nf-next 8/9] netfilter: nft_set_pipapo: move cloning of match info to insert/removal path Florian Westphal
@ 2024-04-08 15:45   ` Stefano Brivio
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Brivio @ 2024-04-08 15:45 UTC (permalink / raw
  To: Florian Westphal; +Cc: netfilter-devel

On Wed,  3 Apr 2024 10:41:08 +0200
Florian Westphal <fw@strlen.de> wrote:

> This set type keeps two copies of the sets' content,
>    priv->match (live version, used to match from packet path)
>    priv->clone (work-in-progress version of the 'future' priv->match).
> 
> All additions and removals are done on priv->clone.  When transaction
> completes, priv->clone becomes priv->match and a new clone is allocated
> for use by next transaction.
> 
> Problem is that the cloning requires GFP_KERNEL allocations but we
> cannot fail at either commit or abort time.
> 
> This patch defers the clone until we get an insertion or removal
> request.  This allows us to handle OOM situations correctly.
> 
> This also allows to remove ->dirty in a followup change:
> 
> If ->clone exists, ->dirty is always true
> If ->clone is NULL, ->dirty is always false, no elements were added
> or removed (except catchall elements which are external to the specific
> set backend).
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/netfilter/nft_set_pipapo.c | 62 ++++++++++++++++++++++------------
>  1 file changed, 41 insertions(+), 21 deletions(-)
> 
> diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
> index 2cc905e92889..eef6a978561f 100644
> --- a/net/netfilter/nft_set_pipapo.c
> +++ b/net/netfilter/nft_set_pipapo.c
> @@ -615,6 +615,9 @@ nft_pipapo_get(const struct net *net, const struct nft_set *set,
>  	struct nft_pipapo_match *m = priv->clone;
>  	struct nft_pipapo_elem *e;
>  
> +	if (!m)
> +		m = rcu_dereference(priv->match);
> +
>  	e = pipapo_get(net, set, m, (const u8 *)elem->key.val.data,
>  		       nft_genmask_cur(net), get_jiffies_64(),
>  		       GFP_ATOMIC);
> @@ -1259,6 +1262,23 @@ static bool nft_pipapo_transaction_mutex_held(const struct nft_set *set)
>  #endif
>  }
>  
> +static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old);
> +
> +static struct nft_pipapo_match *pipapo_maybe_clone(const struct nft_set *set)

Nit:

/**
 * pipapo_maybe_clone() - Build clone for pending data changes, if not existing
 * @set:	nftables API set representation
 *
 * Return: newly created or existing clone, if any. NULL on allocation failure
 */

The rest of the series looks good (and like a big improvement) to me,
thanks! For all the patches minus 3/9,

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH nf-next 3/9] netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
  2024-04-08 15:45   ` Stefano Brivio
@ 2024-04-09 11:07     ` Florian Westphal
  2024-04-09 12:54       ` Stefano Brivio
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2024-04-09 11:07 UTC (permalink / raw
  To: Stefano Brivio; +Cc: Florian Westphal, netfilter-devel

Stefano Brivio <sbrivio@redhat.com> wrote:
> > The rcu_barrier() is removed, its not needed: old call_rcu instances
> > for pipapo_reclaim_match do not access struct nft_set.
> 
> True, pipapo_reclaim_match() won't, but nft_set_pipapo_match_destroy()
> will, right? That is:
> 
> > -	if (m) {
> > -		rcu_barrier();
> 
> ...before b0e256f3dd2b ("netfilter: nft_set_pipapo: release elements in
> clone only from destroy path"), this rcu_barrier() was needed because we'd
> call nft_set_pipapo_match_destroy() on 'm'.
> 
> That call is now gone, and we could have dropped it at that point, but:

I do not follow.  nft_pipapo_destroy() is not invoked asynchronously via
call_rcu, its invoked from either abort path or the gc work queue at at
time where there must be no references to nft_set anymore.

What do we wait for, i.e., which outstanding rcu callback could
reference a data structure that nft_pipapo_destroy() will free?

> > +	} else {
> > +		nft_set_pipapo_match_destroy(ctx, set, m);
> 
> now it's back, so we should actually move rcu_barrier() before this
> call? I think it's needed because nft_set_pipapo_match_destroy() does
> access nft_set data.

See above, what do we need to wait for?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH nf-next 3/9] netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
  2024-04-09 11:07     ` Florian Westphal
@ 2024-04-09 12:54       ` Stefano Brivio
  2024-04-09 13:04         ` Florian Westphal
  0 siblings, 1 reply; 16+ messages in thread
From: Stefano Brivio @ 2024-04-09 12:54 UTC (permalink / raw
  To: Florian Westphal; +Cc: netfilter-devel

On Tue, 9 Apr 2024 13:07:04 +0200
Florian Westphal <fw@strlen.de> wrote:

> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > The rcu_barrier() is removed, its not needed: old call_rcu instances
> > > for pipapo_reclaim_match do not access struct nft_set.  
> > 
> > True, pipapo_reclaim_match() won't, but nft_set_pipapo_match_destroy()
> > will, right? That is:
> >   
> > > -	if (m) {
> > > -		rcu_barrier();  
> > 
> > ...before b0e256f3dd2b ("netfilter: nft_set_pipapo: release elements in
> > clone only from destroy path"), this rcu_barrier() was needed because we'd
> > call nft_set_pipapo_match_destroy() on 'm'.
> > 
> > That call is now gone, and we could have dropped it at that point, but:  
> 
> I do not follow.  nft_pipapo_destroy() is not invoked asynchronously via
> call_rcu, its invoked from either abort path or the gc work queue at at
> time where there must be no references to nft_set anymore.

Hmm, sorry, I was all focused on nft_set_pipapo_match_destroy()
accessing nft_set, but that has nothing to do with
pipapo_reclaim_match(). However:

> What do we wait for, i.e., which outstanding rcu callback could
> reference a data structure that nft_pipapo_destroy() will free?

...we still have pipapo_free_match(), called by pipapo_reclaim_match(),
referencing the per-CPU scratch areas, and nft_pipapo_destroy() freeing
them (using pipapo_free_match() since this patch).

I guess *that* was the original reason why I added this rcu_barrier()
call here? Unless I'm missing something, nothing changed in this regard
since then.

Or is there another reason why pending call_rcu() callbacks can't touch
the same scratch areas now?

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH nf-next 3/9] netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
  2024-04-09 12:54       ` Stefano Brivio
@ 2024-04-09 13:04         ` Florian Westphal
  2024-04-09 13:10           ` Stefano Brivio
  0 siblings, 1 reply; 16+ messages in thread
From: Florian Westphal @ 2024-04-09 13:04 UTC (permalink / raw
  To: Stefano Brivio; +Cc: Florian Westphal, netfilter-devel

Stefano Brivio <sbrivio@redhat.com> wrote:
> > I do not follow.  nft_pipapo_destroy() is not invoked asynchronously via
> > call_rcu, its invoked from either abort path or the gc work queue at at
> > time where there must be no references to nft_set anymore.
> 
> Hmm, sorry, I was all focused on nft_set_pipapo_match_destroy()
> accessing nft_set, but that has nothing to do with
> pipapo_reclaim_match(). However:
> 
> > What do we wait for, i.e., which outstanding rcu callback could
> > reference a data structure that nft_pipapo_destroy() will free?
> 
> ...we still have pipapo_free_match(), called by pipapo_reclaim_match(),
> referencing the per-CPU scratch areas, and nft_pipapo_destroy() freeing
> them (using pipapo_free_match() since this patch).

But those scratchmaps are anchored in struct nft_pipapo_match.

So, if we have a call_rcu() for struct nft_pipapo_match $m, and then
get into nft_pipapo_destroy() where priv->match == $m or
priv->clone == $m we are already in trouble ($m is free'd twice).

If not, then I don't see why ordering would matter.

Can you sketch a race where pipapo_reclaim_match, running from a
(severely delayed) call_rcu, will access something that has been
released already?

I can't spot anything.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH nf-next 3/9] netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
  2024-04-09 13:04         ` Florian Westphal
@ 2024-04-09 13:10           ` Stefano Brivio
  0 siblings, 0 replies; 16+ messages in thread
From: Stefano Brivio @ 2024-04-09 13:10 UTC (permalink / raw
  To: Florian Westphal; +Cc: netfilter-devel

On Tue, 9 Apr 2024 15:04:02 +0200
Florian Westphal <fw@strlen.de> wrote:

> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > I do not follow.  nft_pipapo_destroy() is not invoked asynchronously via
> > > call_rcu, its invoked from either abort path or the gc work queue at at
> > > time where there must be no references to nft_set anymore.  
> > 
> > Hmm, sorry, I was all focused on nft_set_pipapo_match_destroy()
> > accessing nft_set, but that has nothing to do with
> > pipapo_reclaim_match(). However:
> >   
> > > What do we wait for, i.e., which outstanding rcu callback could
> > > reference a data structure that nft_pipapo_destroy() will free?  
> > 
> > ...we still have pipapo_free_match(), called by pipapo_reclaim_match(),
> > referencing the per-CPU scratch areas, and nft_pipapo_destroy() freeing
> > them (using pipapo_free_match() since this patch).  
> 
> But those scratchmaps are anchored in struct nft_pipapo_match.
> 
> So, if we have a call_rcu() for struct nft_pipapo_match $m, and then
> get into nft_pipapo_destroy() where priv->match == $m or
> priv->clone == $m we are already in trouble ($m is free'd twice).

Ah, sure, you're right, they can't be the same instance.

> If not, then I don't see why ordering would matter.
> 
> Can you sketch a race where pipapo_reclaim_match, running from a
> (severely delayed) call_rcu, will access something that has been
> released already?
> 
> I can't spot anything.

Me neither. Sorry for the noise. For the series,

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>

-- 
Stefano


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-04-09 13:11 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-03  8:41 [PATCH nf-next 0/9] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
2024-04-03  8:41 ` [PATCH nf-next 1/9] netfilter: nft_set_pipapo: move prove_locking helper around Florian Westphal
2024-04-03  8:41 ` [PATCH nf-next 2/9] netfilter: nft_set_pipapo: make pipapo_clone helper return NULL Florian Westphal
2024-04-03  8:41 ` [PATCH nf-next 3/9] netfilter: nft_set_pipapo: prepare destroy function for on-demand clone Florian Westphal
2024-04-08 15:45   ` Stefano Brivio
2024-04-09 11:07     ` Florian Westphal
2024-04-09 12:54       ` Stefano Brivio
2024-04-09 13:04         ` Florian Westphal
2024-04-09 13:10           ` Stefano Brivio
2024-04-03  8:41 ` [PATCH nf-next 4/9] netfilter: nft_set_pipapo: prepare walk " Florian Westphal
2024-04-03  8:41 ` [PATCH nf-next 5/9] netfilter: nf_tables: pass new nft_iter_type hint to walker Florian Westphal
2024-04-03  8:41 ` [PATCH nf-next 6/9] netfilter: nft_set_pipapo: merge deactivate helper into caller Florian Westphal
2024-04-03  8:41 ` [PATCH nf-next 7/9] netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone Florian Westphal
2024-04-03  8:41 ` [PATCH nf-next 8/9] netfilter: nft_set_pipapo: move cloning of match info to insert/removal path Florian Westphal
2024-04-08 15:45   ` Stefano Brivio
2024-04-03  8:41 ` [PATCH nf-next 9/9] netfilter: nft_set_pipapo: remove dirty flag Florian Westphal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).