* [PATCH net-next v4 0/3] make skip_sw actually skip software
@ 2024-03-25 20:47 Asbjørn Sloth Tønnesen
  2024-03-25 20:47 ` [PATCH net-next v4 1/3] net: sched: cls_api: add skip_sw counter Asbjørn Sloth Tønnesen
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Asbjørn Sloth Tønnesen @ 2024-03-25 20:47 UTC
  To: Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Asbjørn Sloth Tønnesen, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Vlad Buslov, Marcelo Ricardo Leitner, netdev, linux-kernel, llu

Hi,

During development of flower-route[1], which I
recently presented at FOSDEM[2], I noticed that
CPU usage would increase the more rules I installed
into the hardware for IP forwarding offloading.

We use TC flower offload for the hottest
prefixes, and leave the long tail to the normal (non-TC)
Linux network stack for slow-path IP forwarding,
so we need both the hardware and software
datapaths to perform well.

I found that skip_sw rules are quite expensive
in the kernel datapath, since they must be evaluated
and matched before the kernel checks the
skip_sw flag.
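To make that cost concrete, here is a toy userspace model (not kernel code) of the ordering described above: a filter is matched first, and only then is its skip_sw flag consulted, so a ruleset made entirely of skip_sw filters still pays one match attempt per rule for every packet. The toy_* names, the single dst_ip key and the linear search are hypothetical simplifications; flower really does flow dissection and per-mask hash lookups, which is where the pre-patch perf top profile in patch 3/3 spends its time. MODEL_SKIP_SW is assumed to match TCA_CLS_FLAGS_SKIP_SW.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match TCA_CLS_FLAGS_SKIP_SW in include/uapi/linux/pkt_cls.h. */
#define MODEL_SKIP_SW (1u << 1)

/* Hypothetical toy filter with a single match key. */
struct toy_filter {
	uint32_t dst_ip;
	uint32_t flags;
};

/* Toy software classifier. As in the pre-patch datapath, a filter is
 * matched first and its skip_sw flag is only consulted afterwards, so a
 * matching skip_sw filter is passed over and the search continues.
 * Returns the index of the filter whose actions would run in software,
 * or -1; *work counts the match attempts performed. */
static int toy_classify(const struct toy_filter *f, size_t n,
			uint32_t dst_ip, size_t *work)
{
	for (size_t i = 0; i < n; i++) {
		(*work)++; /* stands in for flow dissection and mask lookup */
		if (f[i].dst_ip == dst_ip && !(f[i].flags & MODEL_SKIP_SW))
			return (int)i;
	}
	return -1;
}

static void toy_classify_selftest(void)
{
	struct toy_filter rules[25];
	size_t work = 0;

	for (size_t i = 0; i < 25; i++)
		rules[i] = (struct toy_filter){ .dst_ip = (uint32_t)i + 1,
						.flags = MODEL_SKIP_SW };

	/* Every rule is skip_sw: nothing can ever match in software, yet
	 * the packet still pays one match attempt per rule. */
	assert(toy_classify(rules, 25, 7, &work) == -1);
	assert(work == 25);
}
```

In this model the work grows linearly with the number of offloaded rules even though the software result is always -1, which is the behaviour the series sets out to short-circuit.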

This patchset optimizes the case where all rules
are skip_sw, by implementing a TC bypass for these
cases, where TC is only used as a control plane
for the hardware path.

v4:
- Rebased onto net-next, now that net-next is open again

v3: https://lore.kernel.org/netdev/20240306165813.656931-1-ast@fiberby.net/
- Patch 3:
  - Fix source_inline
  - Fix build failure when CONFIG_NET_CLS is enabled without CONFIG_NET_CLS_ACT.

v2: https://lore.kernel.org/netdev/20240305144404.569632-1-ast@fiberby.net/
- Patch 1:
  - Add Reviewed-by from Jiri Pirko
- Patch 2:
  - Move code to avoid a forward declaration (Jiri).
- Patch 3:
  - Refactor to use a static key.
  - Add performance data for trapping or sending
    a packet to a non-existent chain (as suggested by Marcelo).

v1: https://lore.kernel.org/netdev/20240215160458.1727237-1-ast@fiberby.net/

[1] flower-route
    https://github.com/fiberby-dk/flower-route

[2] FOSDEM talk
    https://fosdem.org/2024/schedule/event/fosdem-2024-3337-flying-higher-hardware-offloading-with-bird/

Asbjørn Sloth Tønnesen (3):
  net: sched: cls_api: add skip_sw counter
  net: sched: cls_api: add filter counter
  net: sched: make skip_sw actually skip software

 include/net/pkt_cls.h     |  9 +++++++++
 include/net/sch_generic.h |  4 ++++
 net/core/dev.c            | 10 ++++++++++
 net/sched/cls_api.c       | 41 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 64 insertions(+)

-- 
2.43.0



* [PATCH net-next v4 1/3] net: sched: cls_api: add skip_sw counter
  2024-03-25 20:47 [PATCH net-next v4 0/3] make skip_sw actually skip software Asbjørn Sloth Tønnesen
@ 2024-03-25 20:47 ` Asbjørn Sloth Tønnesen
  2024-03-27 13:51   ` Simon Horman
  2024-03-28  0:46   ` Marcelo Ricardo Leitner
  2024-03-25 20:47 ` [PATCH net-next v4 2/3] net: sched: cls_api: add filter counter Asbjørn Sloth Tønnesen
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Asbjørn Sloth Tønnesen @ 2024-03-25 20:47 UTC
  To: Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Asbjørn Sloth Tønnesen, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Vlad Buslov, Marcelo Ricardo Leitner, netdev, linux-kernel, llu,
	Jiri Pirko

Maintain a count of skip_sw filters.

This counter is protected by the cb_lock, and is updated
at the same time as offloadcnt.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
---
 include/net/sch_generic.h | 1 +
 net/sched/cls_api.c       | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index cefe0c4bdae3..120a4ca6ec9b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -471,6 +471,7 @@ struct tcf_block {
 	struct flow_block flow_block;
 	struct list_head owner_list;
 	bool keep_dst;
+	atomic_t skipswcnt; /* Number of skip_sw filters */
 	atomic_t offloadcnt; /* Number of oddloaded filters */
 	unsigned int nooffloaddevcnt; /* Number of devs unable to do offload */
 	unsigned int lockeddevcnt; /* Number of devs that require rtnl lock. */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index ca5676b2668e..397c3d29659c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3483,6 +3483,8 @@ static void tcf_block_offload_inc(struct tcf_block *block, u32 *flags)
 	if (*flags & TCA_CLS_FLAGS_IN_HW)
 		return;
 	*flags |= TCA_CLS_FLAGS_IN_HW;
+	if (tc_skip_sw(*flags))
+		atomic_inc(&block->skipswcnt);
 	atomic_inc(&block->offloadcnt);
 }
 
@@ -3491,6 +3493,8 @@ static void tcf_block_offload_dec(struct tcf_block *block, u32 *flags)
 	if (!(*flags & TCA_CLS_FLAGS_IN_HW))
 		return;
 	*flags &= ~TCA_CLS_FLAGS_IN_HW;
+	if (tc_skip_sw(*flags))
+		atomic_dec(&block->skipswcnt);
 	atomic_dec(&block->offloadcnt);
 }
 
-- 
2.43.0

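For readers who want to experiment with the counting logic of patch 1/3 outside the kernel, here is a minimal userspace sketch of tcf_block_offload_inc()/tcf_block_offload_dec() as changed above. It is a model under stated assumptions, not kernel code: C11 atomics stand in for atomic_t, cb_lock is omitted, the MODEL_FLAGS_* values are assumed to match TCA_CLS_FLAGS_* in include/uapi/linux/pkt_cls.h, and all model_*/offload_model names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match TCA_CLS_FLAGS_* in include/uapi/linux/pkt_cls.h. */
#define MODEL_FLAGS_SKIP_HW (1u << 0)
#define MODEL_FLAGS_SKIP_SW (1u << 1)
#define MODEL_FLAGS_IN_HW   (1u << 2)

/* Hypothetical stand-in for the two counters in struct tcf_block. */
struct offload_model {
	atomic_int skipswcnt;  /* offloaded filters that carry skip_sw */
	atomic_int offloadcnt; /* all offloaded filters */
};

static bool model_skip_sw(uint32_t flags)
{
	return flags & MODEL_FLAGS_SKIP_SW;
}

/* Analogue of tcf_block_offload_inc(): the IN_HW bit makes the call
 * idempotent, and skip_sw filters are counted alongside offloadcnt. */
static void model_offload_inc(struct offload_model *b, uint32_t *flags)
{
	if (*flags & MODEL_FLAGS_IN_HW)
		return;
	*flags |= MODEL_FLAGS_IN_HW;
	if (model_skip_sw(*flags))
		atomic_fetch_add(&b->skipswcnt, 1);
	atomic_fetch_add(&b->offloadcnt, 1);
}

/* Analogue of tcf_block_offload_dec(): undoes exactly one inc. */
static void model_offload_dec(struct offload_model *b, uint32_t *flags)
{
	if (!(*flags & MODEL_FLAGS_IN_HW))
		return;
	*flags &= ~MODEL_FLAGS_IN_HW;
	if (model_skip_sw(*flags))
		atomic_fetch_sub(&b->skipswcnt, 1);
	atomic_fetch_sub(&b->offloadcnt, 1);
}

static void model_offload_selftest(void)
{
	struct offload_model b;
	uint32_t hw_only = MODEL_FLAGS_SKIP_SW;
	uint32_t no_flag = 0;

	atomic_init(&b.skipswcnt, 0);
	atomic_init(&b.offloadcnt, 0);

	model_offload_inc(&b, &hw_only);
	model_offload_inc(&b, &hw_only); /* already IN_HW: no-op */
	model_offload_inc(&b, &no_flag); /* offloaded, but not skip_sw */
	assert(atomic_load(&b.skipswcnt) == 1);
	assert(atomic_load(&b.offloadcnt) == 2);

	model_offload_dec(&b, &hw_only);
	assert(atomic_load(&b.skipswcnt) == 0);
	assert(atomic_load(&b.offloadcnt) == 1);
}
```

Because the counter only moves when the IN_HW bit flips, skipswcnt counts skip_sw filters that are actually in hardware, each exactly once.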


* [PATCH net-next v4 2/3] net: sched: cls_api: add filter counter
  2024-03-25 20:47 [PATCH net-next v4 0/3] make skip_sw actually skip software Asbjørn Sloth Tønnesen
  2024-03-25 20:47 ` [PATCH net-next v4 1/3] net: sched: cls_api: add skip_sw counter Asbjørn Sloth Tønnesen
@ 2024-03-25 20:47 ` Asbjørn Sloth Tønnesen
  2024-03-27 13:51   ` Simon Horman
  2024-03-28  0:46   ` Marcelo Ricardo Leitner
  2024-03-25 20:47 ` [PATCH net-next v4 3/3] net: sched: make skip_sw actually skip software Asbjørn Sloth Tønnesen
  2024-03-29  9:50 ` [PATCH net-next v4 0/3] " patchwork-bot+netdevbpf
  3 siblings, 2 replies; 11+ messages in thread
From: Asbjørn Sloth Tønnesen @ 2024-03-25 20:47 UTC
  To: Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Asbjørn Sloth Tønnesen, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Vlad Buslov, Marcelo Ricardo Leitner, netdev, linux-kernel, llu

Maintain a count of filters per block.

Counter updates are protected by cb_lock, which is
also used to protect the offload counters.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
---
 include/net/sch_generic.h |  2 ++
 net/sched/cls_api.c       | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 120a4ca6ec9b..eb3872c22fcd 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -422,6 +422,7 @@ struct tcf_proto {
 	 */
 	spinlock_t		lock;
 	bool			deleting;
+	bool			counted;
 	refcount_t		refcnt;
 	struct rcu_head		rcu;
 	struct hlist_node	destroy_ht_node;
@@ -471,6 +472,7 @@ struct tcf_block {
 	struct flow_block flow_block;
 	struct list_head owner_list;
 	bool keep_dst;
+	atomic_t filtercnt; /* Number of filters */
 	atomic_t skipswcnt; /* Number of skip_sw filters */
 	atomic_t offloadcnt; /* Number of oddloaded filters */
 	unsigned int nooffloaddevcnt; /* Number of devs unable to do offload */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 397c3d29659c..304a46ab0e0b 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -410,12 +410,30 @@ static void tcf_proto_get(struct tcf_proto *tp)
 	refcount_inc(&tp->refcnt);
 }
 
+static void tcf_block_filter_cnt_update(struct tcf_block *block, bool *counted, bool add)
+{
+	lockdep_assert_not_held(&block->cb_lock);
+
+	down_write(&block->cb_lock);
+	if (*counted != add) {
+		if (add) {
+			atomic_inc(&block->filtercnt);
+			*counted = true;
+		} else {
+			atomic_dec(&block->filtercnt);
+			*counted = false;
+		}
+	}
+	up_write(&block->cb_lock);
+}
+
 static void tcf_chain_put(struct tcf_chain *chain);
 
 static void tcf_proto_destroy(struct tcf_proto *tp, bool rtnl_held,
 			      bool sig_destroy, struct netlink_ext_ack *extack)
 {
 	tp->ops->destroy(tp, rtnl_held, extack);
+	tcf_block_filter_cnt_update(tp->chain->block, &tp->counted, false);
 	if (sig_destroy)
 		tcf_proto_signal_destroyed(tp->chain, tp);
 	tcf_chain_put(tp->chain);
@@ -2364,6 +2382,7 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 	err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
 			      flags, extack);
 	if (err == 0) {
+		tcf_block_filter_cnt_update(block, &tp->counted, true);
 		tfilter_notify(net, skb, n, tp, block, q, parent, fh,
 			       RTM_NEWTFILTER, false, rtnl_held, extack);
 		tfilter_put(tp, fh);
-- 
2.43.0



* [PATCH net-next v4 3/3] net: sched: make skip_sw actually skip software
  2024-03-25 20:47 [PATCH net-next v4 0/3] make skip_sw actually skip software Asbjørn Sloth Tønnesen
  2024-03-25 20:47 ` [PATCH net-next v4 1/3] net: sched: cls_api: add skip_sw counter Asbjørn Sloth Tønnesen
  2024-03-25 20:47 ` [PATCH net-next v4 2/3] net: sched: cls_api: add filter counter Asbjørn Sloth Tønnesen
@ 2024-03-25 20:47 ` Asbjørn Sloth Tønnesen
  2024-03-27 13:52   ` Simon Horman
  2024-03-28  0:46   ` Marcelo Ricardo Leitner
  2024-03-29  9:50 ` [PATCH net-next v4 0/3] " patchwork-bot+netdevbpf
  3 siblings, 2 replies; 11+ messages in thread
From: Asbjørn Sloth Tønnesen @ 2024-03-25 20:47 UTC
  To: Jamal Hadi Salim, Cong Wang, Jiri Pirko
  Cc: Asbjørn Sloth Tønnesen, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Vlad Buslov, Marcelo Ricardo Leitner, netdev, linux-kernel, llu

TC filters come in 3 variants:
- no flag (try to process in hardware, but fall back to software)
- skip_hw (do not process filter by hardware)
- skip_sw (do not process filter by software)

However, skip_sw is implemented so that the skip_sw
flag can only be checked after the filter has been matched.

IMHO, when skip_sw is used, it is common to use it on all rules.

So if all filters in a block are skip_sw filters, then
we can bail out early, and thus avoid having to match
the filters just to check the skip_sw flag.

This patch adds a bypass for when only TC skip_sw rules
are used. The bypass is guarded by a static key, to avoid
harming other workloads.

There are 3 ways that a packet from a skip_sw ruleset can
end up in the kernel path: falling through non-matching
rules, matching a trap action, or being sent to a
non-existent chain. Although the non-existent chain case
only improves by a few percent, I believe it's worth
optimizing the trap and fall-through use-cases.

 +----------------------------+--------+--------+--------+
 | Test description           | Pre-   | Post-  | Rel.   |
 |                            | kpps   | kpps   | chg.   |
 +----------------------------+--------+--------+--------+
 | basic forwarding + notrack | 3589.3 | 3587.9 |  1.00x |
 | switch to eswitch mode     | 3081.8 | 3094.7 |  1.00x |
 | add ingress qdisc          | 3042.9 | 3063.6 |  1.01x |
 | tc forward in hw / skip_sw |37024.7 |37028.4 |  1.00x |
 | tc forward in sw / skip_hw | 3245.0 | 3245.3 |  1.00x |
 +----------------------------+--------+--------+--------+
 | tests with only skip_sw rules below:                  |
 +----------------------------+--------+--------+--------+
 | 1 non-matching rule        | 2694.7 | 3058.7 |  1.14x |
 | 1 n-m rule, match trap     | 2611.2 | 3323.1 |  1.27x |
 | 1 n-m rule, goto non-chain | 2886.8 | 2945.9 |  1.02x |
 | 5 non-matching rules       | 1958.2 | 3061.3 |  1.56x |
 | 5 n-m rules, match trap    | 1911.9 | 3327.0 |  1.74x |
 | 5 n-m rules, goto non-chain| 2883.1 | 2947.5 |  1.02x |
 | 10 non-matching rules      | 1466.3 | 3062.8 |  2.09x |
 | 10 n-m rules, match trap   | 1444.3 | 3317.9 |  2.30x |
 | 10 n-m rules,goto non-chain| 2883.1 | 2939.5 |  1.02x |
 | 25 non-matching rules      |  838.5 | 3058.9 |  3.65x |
 | 25 n-m rules, match trap   |  824.5 | 3323.0 |  4.03x |
 | 25 n-m rules,goto non-chain| 2875.8 | 2944.7 |  1.02x |
 | 50 non-matching rules      |  488.1 | 3054.7 |  6.26x |
 | 50 n-m rules, match trap   |  484.9 | 3318.5 |  6.84x |
 | 50 n-m rules,goto non-chain| 2884.1 | 2939.7 |  1.02x |
 +----------------------------+--------+--------+--------+
 n-m = non-matching; goto non-chain = goto a non-existent chain

perf top (25 n-m skip_sw rules - pre patch):
  20.39%  [kernel]  [k] __skb_flow_dissect
  16.43%  [kernel]  [k] rhashtable_jhash2
  10.58%  [kernel]  [k] fl_classify
  10.23%  [kernel]  [k] fl_mask_lookup
   4.79%  [kernel]  [k] memset_orig
   2.58%  [kernel]  [k] tcf_classify
   1.47%  [kernel]  [k] __x86_indirect_thunk_rax
   1.42%  [kernel]  [k] __dev_queue_xmit
   1.36%  [kernel]  [k] nft_do_chain
   1.21%  [kernel]  [k] __rcu_read_lock

perf top (25 n-m skip_sw rules - post patch):
   5.12%  [kernel]  [k] __dev_queue_xmit
   4.77%  [kernel]  [k] nft_do_chain
   3.65%  [kernel]  [k] dev_gro_receive
   3.41%  [kernel]  [k] check_preemption_disabled
   3.14%  [kernel]  [k] mlx5e_skb_from_cqe_mpwrq_nonlinear
   2.88%  [kernel]  [k] __netif_receive_skb_core.constprop.0
   2.49%  [kernel]  [k] mlx5e_xmit
   2.15%  [kernel]  [k] ip_forward
   1.95%  [kernel]  [k] mlx5e_tc_restore_tunnel
   1.92%  [kernel]  [k] vlan_gro_receive

Test setup:
 DUT: Intel Xeon D-1518 (2.20GHz) w/ Nvidia/Mellanox ConnectX-6 Dx 2x100G
 Data rate measured on the switch (Extreme X690), with the DUT
 connected as a router on a stick, and pktgen and pktsink as VLANs.
 Pktgen-dpdk generated 36.6-37.7 Mpps of 64B packets across all tests.
 Full test data at https://files.fiberby.net/ast/2024/tc_skip_sw/v2_tests/

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
---
 include/net/pkt_cls.h     |  9 +++++++++
 include/net/sch_generic.h |  1 +
 net/core/dev.c            | 10 ++++++++++
 net/sched/cls_api.c       | 18 ++++++++++++++++++
 4 files changed, 38 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index a4ee43f493bb..41297bd38dff 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -74,6 +74,15 @@ static inline bool tcf_block_non_null_shared(struct tcf_block *block)
 	return block && block->index;
 }
 
+#ifdef CONFIG_NET_CLS_ACT
+DECLARE_STATIC_KEY_FALSE(tcf_bypass_check_needed_key);
+
+static inline bool tcf_block_bypass_sw(struct tcf_block *block)
+{
+	return block && block->bypass_wanted;
+}
+#endif
+
 static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
 {
 	WARN_ON(tcf_block_shared(block));
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index eb3872c22fcd..76db6be16083 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -472,6 +472,7 @@ struct tcf_block {
 	struct flow_block flow_block;
 	struct list_head owner_list;
 	bool keep_dst;
+	bool bypass_wanted;
 	atomic_t filtercnt; /* Number of filters */
 	atomic_t skipswcnt; /* Number of skip_sw filters */
 	atomic_t offloadcnt; /* Number of oddloaded filters */
diff --git a/net/core/dev.c b/net/core/dev.c
index 9a67003e49db..53f36991ea8e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2057,6 +2057,11 @@ void net_dec_egress_queue(void)
 EXPORT_SYMBOL_GPL(net_dec_egress_queue);
 #endif
 
+#ifdef CONFIG_NET_CLS_ACT
+DEFINE_STATIC_KEY_FALSE(tcf_bypass_check_needed_key);
+EXPORT_SYMBOL(tcf_bypass_check_needed_key);
+#endif
+
 DEFINE_STATIC_KEY_FALSE(netstamp_needed_key);
 EXPORT_SYMBOL(netstamp_needed_key);
 #ifdef CONFIG_JUMP_LABEL
@@ -3911,6 +3916,11 @@ static int tc_run(struct tcx_entry *entry, struct sk_buff *skb,
 	if (!miniq)
 		return ret;
 
+	if (static_branch_unlikely(&tcf_bypass_check_needed_key)) {
+		if (tcf_block_bypass_sw(miniq->block))
+			return ret;
+	}
+
 	tc_skb_cb(skb)->mru = 0;
 	tc_skb_cb(skb)->post_ct = false;
 	tcf_set_drop_reason(skb, *drop_reason);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 304a46ab0e0b..db0653993632 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -410,6 +410,23 @@ static void tcf_proto_get(struct tcf_proto *tp)
 	refcount_inc(&tp->refcnt);
 }
 
+static void tcf_maintain_bypass(struct tcf_block *block)
+{
+	int filtercnt = atomic_read(&block->filtercnt);
+	int skipswcnt = atomic_read(&block->skipswcnt);
+	bool bypass_wanted = filtercnt > 0 && filtercnt == skipswcnt;
+
+	if (bypass_wanted != block->bypass_wanted) {
+#ifdef CONFIG_NET_CLS_ACT
+		if (bypass_wanted)
+			static_branch_inc(&tcf_bypass_check_needed_key);
+		else
+			static_branch_dec(&tcf_bypass_check_needed_key);
+#endif
+		block->bypass_wanted = bypass_wanted;
+	}
+}
+
 static void tcf_block_filter_cnt_update(struct tcf_block *block, bool *counted, bool add)
 {
 	lockdep_assert_not_held(&block->cb_lock);
@@ -424,6 +441,7 @@ static void tcf_block_filter_cnt_update(struct tcf_block *block, bool *counted,
 			*counted = false;
 		}
 	}
+	tcf_maintain_bypass(block);
 	up_write(&block->cb_lock);
 }
 
-- 
2.43.0

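The interplay between tcf_maintain_bypass() and the new check in tc_run() can be modelled in userspace as below. This is a sketch under stated assumptions, not kernel code: a plain reference count stands in for tcf_bypass_check_needed_key (a real static key patches the branch at runtime, so the disabled case costs no memory load), locking is omitted, and bypass_key_refs, bypass_block and the model_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the static key: enabled while the count is non-zero,
 * like static_branch_inc()/static_branch_dec(). */
static atomic_int bypass_key_refs;

/* Hypothetical subset of struct tcf_block. */
struct bypass_block {
	int filtercnt;
	int skipswcnt;
	bool bypass_wanted;
};

/* Mirrors tcf_maintain_bypass(): only a change of bypass_wanted touches
 * the key, so each block holds at most one reference. */
static void model_maintain_bypass(struct bypass_block *b)
{
	bool wanted = b->filtercnt > 0 && b->filtercnt == b->skipswcnt;

	if (wanted != b->bypass_wanted) {
		if (wanted)
			atomic_fetch_add(&bypass_key_refs, 1);
		else
			atomic_fetch_sub(&bypass_key_refs, 1);
		b->bypass_wanted = wanted;
	}
}

/* Mirrors the check added to tc_run(): true when software
 * classification would be skipped for this block. */
static bool model_tc_run_bypasses(const struct bypass_block *b)
{
	if (atomic_load(&bypass_key_refs) > 0) /* static_branch_unlikely() */
		return b && b->bypass_wanted;  /* tcf_block_bypass_sw() */
	return false;
}

static void model_bypass_key_selftest(void)
{
	struct bypass_block hw = { .filtercnt = 2, .skipswcnt = 2 };
	struct bypass_block sw = { .filtercnt = 1, .skipswcnt = 0 };

	model_maintain_bypass(&hw);
	model_maintain_bypass(&hw); /* no transition: key unchanged */
	model_maintain_bypass(&sw);
	assert(atomic_load(&bypass_key_refs) == 1);
	assert(model_tc_run_bypasses(&hw));
	assert(!model_tc_run_bypasses(&sw)); /* key on, block still classifies */
	assert(!model_tc_run_bypasses(NULL));

	hw.filtercnt = 3;           /* a non-skip_sw filter was added */
	model_maintain_bypass(&hw);
	assert(atomic_load(&bypass_key_refs) == 0);
	assert(!model_tc_run_bypasses(&hw));
}
```

The design point the model tries to capture: hosts with no all-skip_sw block never enable the key, so their tc_run() path is unaffected, while a block only pays for the per-block bypass_wanted load once some block on the system wants the bypass.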


* Re: [PATCH net-next v4 1/3] net: sched: cls_api: add skip_sw counter
  2024-03-25 20:47 ` [PATCH net-next v4 1/3] net: sched: cls_api: add skip_sw counter Asbjørn Sloth Tønnesen
@ 2024-03-27 13:51   ` Simon Horman
  2024-03-28  0:46   ` Marcelo Ricardo Leitner
  1 sibling, 0 replies; 11+ messages in thread
From: Simon Horman @ 2024-03-27 13:51 UTC
  To: Asbjørn Sloth Tønnesen
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Vlad Buslov, Marcelo Ricardo Leitner, netdev, linux-kernel, llu,
	Jiri Pirko

On Mon, Mar 25, 2024 at 08:47:34PM +0000, Asbjørn Sloth Tønnesen wrote:
> Maintain a count of skip_sw filters.
> 
> This counter is protected by the cb_lock, and is updated
> at the same time as offloadcnt.
> 
> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
> Reviewed-by: Jiri Pirko <jiri@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [PATCH net-next v4 2/3] net: sched: cls_api: add filter counter
  2024-03-25 20:47 ` [PATCH net-next v4 2/3] net: sched: cls_api: add filter counter Asbjørn Sloth Tønnesen
@ 2024-03-27 13:51   ` Simon Horman
  2024-03-28  0:46   ` Marcelo Ricardo Leitner
  1 sibling, 0 replies; 11+ messages in thread
From: Simon Horman @ 2024-03-27 13:51 UTC
  To: Asbjørn Sloth Tønnesen
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Vlad Buslov, Marcelo Ricardo Leitner, netdev, linux-kernel, llu

On Mon, Mar 25, 2024 at 08:47:35PM +0000, Asbjørn Sloth Tønnesen wrote:
> Maintain a count of filters per block.
> 
> Counter updates are protected by cb_lock, which is
> also used to protect the offload counters.
> 
> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [PATCH net-next v4 3/3] net: sched: make skip_sw actually skip software
  2024-03-25 20:47 ` [PATCH net-next v4 3/3] net: sched: make skip_sw actually skip software Asbjørn Sloth Tønnesen
@ 2024-03-27 13:52   ` Simon Horman
  2024-03-28  0:46   ` Marcelo Ricardo Leitner
  1 sibling, 0 replies; 11+ messages in thread
From: Simon Horman @ 2024-03-27 13:52 UTC
  To: Asbjørn Sloth Tønnesen
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Vlad Buslov, Marcelo Ricardo Leitner, netdev, linux-kernel, llu

On Mon, Mar 25, 2024 at 08:47:36PM +0000, Asbjørn Sloth Tønnesen wrote:
> TC filters come in 3 variants:
> - no flag (try to process in hardware, but fallback to software))
> - skip_hw (do not process filter by hardware)
> - skip_sw (do not process filter by software)
> [...]
>
> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [PATCH net-next v4 3/3] net: sched: make skip_sw actually skip software
  2024-03-25 20:47 ` [PATCH net-next v4 3/3] net: sched: make skip_sw actually skip software Asbjørn Sloth Tønnesen
  2024-03-27 13:52   ` Simon Horman
@ 2024-03-28  0:46   ` Marcelo Ricardo Leitner
  1 sibling, 0 replies; 11+ messages in thread
From: Marcelo Ricardo Leitner @ 2024-03-28  0:46 UTC
  To: Asbjørn Sloth Tønnesen
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Vlad Buslov, netdev, linux-kernel, llu

On Mon, Mar 25, 2024 at 08:47:36PM +0000, Asbjørn Sloth Tønnesen wrote:
...
>  +----------------------------+--------+--------+--------+
>  | tests with only skip_sw rules below:                  |
>  +----------------------------+--------+--------+--------+
>  | 1 non-matching rule        | 2694.7 | 3058.7 |  1.14x |
>  | 1 n-m rule, match trap     | 2611.2 | 3323.1 |  1.27x |
>  | 1 n-m rule, goto non-chain | 2886.8 | 2945.9 |  1.02x |
>  | 5 non-matching rules       | 1958.2 | 3061.3 |  1.56x |
>  | 5 n-m rules, match trap    | 1911.9 | 3327.0 |  1.74x |
>  | 5 n-m rules, goto non-chain| 2883.1 | 2947.5 |  1.02x |
>  | 10 non-matching rules      | 1466.3 | 3062.8 |  2.09x |
>  | 10 n-m rules, match trap   | 1444.3 | 3317.9 |  2.30x |
>  | 10 n-m rules,goto non-chain| 2883.1 | 2939.5 |  1.02x |
>  | 25 non-matching rules      |  838.5 | 3058.9 |  3.65x |
>  | 25 n-m rules, match trap   |  824.5 | 3323.0 |  4.03x |
>  | 25 n-m rules,goto non-chain| 2875.8 | 2944.7 |  1.02x |
>  | 50 non-matching rules      |  488.1 | 3054.7 |  6.26x |
                                            [A]

>  | 50 n-m rules, match trap   |  484.9 | 3318.5 |  6.84x |

Interesting. I can't explain why it consistently got 10% better than
[A] after the patch. If you check tcf_classify(), even though it
resumes to action, it still searches for the right chain. Maybe
something works differently in the driver.

In the logs,
https://files.fiberby.net/ast/2024/tc_skip_sw/v2_tests/test_runs/netnext/tests/non_matching_and_trap_007/tc.txt

filter protocol 802.1Q pref 8 flower chain 0
filter protocol 802.1Q pref 8 flower chain 0 handle 0x1
  vlan_ethtype ip
  eth_type ipv4
  dst_ip 10.53.22.3
  skip_sw
  in_hw in_hw_count 1
	action order 1: gact action trap
	 random type none pass val 0
	 index 8 ref 1 bind 1 installed 20 sec used 0 sec
	Action statistics:
	Sent 29894330340 bytes 439622505 pkt (dropped 0, overlimits 0 requeues 0)
	Sent software 0 bytes 0 pkt
	Sent hardware 29894330340 bytes 439622505 pkt
	backlog 0b 0p requeues 0
	used_hw_stats delayed

It matched nicely.

>  | 50 n-m rules,goto non-chain| 2884.1 | 2939.7 |  1.02x |
                                   [B]

If we compare [A] and [B], there's still a 5.9% increase, plus
not requiring somewhat hacky rules.

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

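The percentages in this exchange are plain ratios of the kpps columns in the table above. A small C helper (hypothetical name pct_change) reproduces them: [A] (3054.7 kpps, post-patch, 50 non-matching rules) over [B] (2884.1 kpps, pre-patch, goto non-chain) gives about 5.9%, and for the 50-rule rows the post-patch trap case comes out roughly 8.6% above [A].

```c
#include <assert.h>

/* Relative change, in percent, from one kpps figure to another. */
static double pct_change(double from_kpps, double to_kpps)
{
	assert(from_kpps > 0.0);
	return (to_kpps / from_kpps - 1.0) * 100.0;
}

static void pct_change_selftest(void)
{
	/* [B] -> [A]: pre-patch goto non-chain vs. post-patch fall-through. */
	double a_vs_b = pct_change(2884.1, 3054.7);
	/* Post-patch, 50 rules: fall-through [A] vs. matching trap. */
	double trap_vs_a = pct_change(3054.7, 3318.5);

	assert(a_vs_b > 5.85 && a_vs_b < 5.95);       /* about 5.9% */
	assert(trap_vs_a > 8.55 && trap_vs_a < 8.70); /* about 8.6% */
}
```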


* Re: [PATCH net-next v4 1/3] net: sched: cls_api: add skip_sw counter
  2024-03-25 20:47 ` [PATCH net-next v4 1/3] net: sched: cls_api: add skip_sw counter Asbjørn Sloth Tønnesen
  2024-03-27 13:51   ` Simon Horman
@ 2024-03-28  0:46   ` Marcelo Ricardo Leitner
  1 sibling, 0 replies; 11+ messages in thread
From: Marcelo Ricardo Leitner @ 2024-03-28  0:46 UTC
  To: Asbjørn Sloth Tønnesen
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Vlad Buslov, netdev, linux-kernel, llu, Jiri Pirko

On Mon, Mar 25, 2024 at 08:47:34PM +0000, Asbjørn Sloth Tønnesen wrote:
> Maintain a count of skip_sw filters.
>
> This counter is protected by the cb_lock, and is updated
> at the same time as offloadcnt.
>
> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
> Reviewed-by: Jiri Pirko <jiri@nvidia.com>

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>



* Re: [PATCH net-next v4 2/3] net: sched: cls_api: add filter counter
  2024-03-25 20:47 ` [PATCH net-next v4 2/3] net: sched: cls_api: add filter counter Asbjørn Sloth Tønnesen
  2024-03-27 13:51   ` Simon Horman
@ 2024-03-28  0:46   ` Marcelo Ricardo Leitner
  1 sibling, 0 replies; 11+ messages in thread
From: Marcelo Ricardo Leitner @ 2024-03-28  0:46 UTC
  To: Asbjørn Sloth Tønnesen
  Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Vlad Buslov, netdev, linux-kernel, llu

On Mon, Mar 25, 2024 at 08:47:35PM +0000, Asbjørn Sloth Tønnesen wrote:
> Maintain a count of filters per block.
>
> Counter updates are protected by cb_lock, which is
> also used to protect the offload counters.
>
> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>



* Re: [PATCH net-next v4 0/3] make skip_sw actually skip software
  2024-03-25 20:47 [PATCH net-next v4 0/3] make skip_sw actually skip software Asbjørn Sloth Tønnesen
                   ` (2 preceding siblings ...)
  2024-03-25 20:47 ` [PATCH net-next v4 3/3] net: sched: make skip_sw actually skip software Asbjørn Sloth Tønnesen
@ 2024-03-29  9:50 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-03-29  9:50 UTC
  To: Asbjørn Sloth Tønnesen <ast@fiberby.net>
  Cc: jhs, xiyou.wangcong, jiri, daniel, davem, edumazet, kuba, pabeni,
	vladbu, mleitner, netdev, linux-kernel, llu

Hello:

This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Mon, 25 Mar 2024 20:47:33 +0000 you wrote:
> Hi,
> 
> During development of flower-route[1], which I
> recently presented at FOSDEM[2], I noticed that
> CPU usage, would increase the more rules I installed
> into the hardware for IP forwarding offloading.
> 
> [...]

Here is the summary with links:
  - [net-next,v4,1/3] net: sched: cls_api: add skip_sw counter
    https://git.kernel.org/netdev/net-next/c/f631ef39d819
  - [net-next,v4,2/3] net: sched: cls_api: add filter counter
    https://git.kernel.org/netdev/net-next/c/2081fd3445fe
  - [net-next,v4,3/3] net: sched: make skip_sw actually skip software
    https://git.kernel.org/netdev/net-next/c/047f340b36fc

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


