LKML Archive mirror
 help / color / mirror / Atom feed
* [PATCH net v2 0/3] net: gro: add flush/flush_id checks and fix wrong offset in udp
@ 2024-04-19 15:35 Richard Gobert
  2024-04-19 15:35 ` [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb Richard Gobert
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Richard Gobert @ 2024-04-19 15:35 UTC (permalink / raw
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	alexander.duyck, aleksander.lobakin, netdev, linux-kernel
  Cc: Richard Gobert

This series fixes a bug in the complete phase of UDP in GRO, in which
socket lookup fails due to using network_header when parsing encapsulated
packets. The fix is to add network_offset and inner_network_offset to
napi_gro_cb and use these offsets for socket lookup.

In addition p->flush/flush_id should be checked in all UDP flows. The
same logic from tcp_gro_receive is applied for all the flows in
udp_gro_receive_segment. This prevents packets with mismatching network
headers (flush/flush_id turned on) from merging in UDP GRO.

The original series includes a change to vxlan test which adds the local
parameter to prevent similar future bugs. I plan to submit it separately to
net-next.

This series is part of a previously submitted series to net-next:
https://lore.kernel.org/all/20240408141720.98832-1-richardbgobert@gmail.com/

v1 -> v2:
 - Use network_offsets instead of p_poff param as suggested by Willem
 - Check flush before postpull, and for all UDP GRO flows (previously
   reviewed by Willem, and was changed since)
 - v1:
 https://lore.kernel.org/netdev/20240412152120.115067-1-richardbgobert@gmail.com/

Richard Gobert (3):
  net: gro: add {inner_}network_offset to napi_gro_cb
  net: gro: fix udp bad offset in socket lookup
  net: gro: add flush check in udp_gro_receive_segment

 drivers/net/geneve.c           |  1 +
 drivers/net/vxlan/vxlan_core.c |  1 +
 include/net/gro.h              | 18 ++++++++++++++++--
 net/8021q/vlan_core.c          |  2 ++
 net/core/gro.c                 |  1 +
 net/ethernet/eth.c             |  1 +
 net/ipv4/af_inet.c             |  5 +----
 net/ipv4/gre_offload.c         |  1 +
 net/ipv4/udp.c                 |  3 ++-
 net/ipv4/udp_offload.c         | 15 +++++++++++++--
 net/ipv6/ip6_offload.c         |  8 ++++----
 net/ipv6/udp.c                 |  3 ++-
 net/ipv6/udp_offload.c         |  3 ++-
 13 files changed, 47 insertions(+), 15 deletions(-)

-- 
2.36.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb
  2024-04-19 15:35 [PATCH net v2 0/3] net: gro: add flush/flush_id checks and fix wrong offset in udp Richard Gobert
@ 2024-04-19 15:35 ` Richard Gobert
  2024-04-19 18:51   ` Willem de Bruijn
  2024-04-19 15:35 ` [PATCH net v2 2/3] net: gro: fix udp bad offset in socket lookup Richard Gobert
  2024-04-19 15:35 ` [PATCH net v2 3/3] net: gro: add flush check in udp_gro_receive_segment Richard Gobert
  2 siblings, 1 reply; 9+ messages in thread
From: Richard Gobert @ 2024-04-19 15:35 UTC (permalink / raw
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	alexander.duyck, aleksander.lobakin, netdev, linux-kernel
  Cc: Richard Gobert

This patch adds network_offset and inner_network_offset to napi_gro_cb, and
makes sure both are set correctly. In the common path there's only one
write (skb_gro_reset_offset, which replaces skb_set_network_header).

Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
---
 drivers/net/geneve.c           |  1 +
 drivers/net/vxlan/vxlan_core.c |  1 +
 include/net/gro.h              | 18 ++++++++++++++++--
 net/8021q/vlan_core.c          |  2 ++
 net/core/gro.c                 |  1 +
 net/ethernet/eth.c             |  1 +
 net/ipv4/af_inet.c             |  5 +----
 net/ipv4/gre_offload.c         |  1 +
 net/ipv6/ip6_offload.c         |  8 ++++----
 9 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 6c2835086b57..6549348cc24e 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -542,6 +542,7 @@ static struct sk_buff *geneve_gro_receive(struct sock *sk,
 	if (!ptype)
 		goto out;
 
+	NAPI_GRO_CB(skb)->inner_network_offset = hlen;
 	pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb);
 	flush = 0;
 
diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c
index ba319fc21957..c649a82eeca7 100644
--- a/drivers/net/vxlan/vxlan_core.c
+++ b/drivers/net/vxlan/vxlan_core.c
@@ -754,6 +754,7 @@ static struct sk_buff *vxlan_gpe_gro_receive(struct sock *sk,
 
 	vh = vxlan_gro_prepare_receive(sk, head, skb, &grc);
 	if (vh) {
+		NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);
 		if (!vxlan_parse_gpe_proto(vh, &protocol))
 			goto out;
 		ptype = gro_find_receive_by_type(protocol);
diff --git a/include/net/gro.h b/include/net/gro.h
index 50f1e403dbbb..1faff23b66e8 100644
--- a/include/net/gro.h
+++ b/include/net/gro.h
@@ -87,6 +87,15 @@ struct napi_gro_cb {
 
 	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
 	__wsum	csum;
+
+	/* L3 offsets */
+	union {
+		struct {
+			u16 network_offset;
+			u16 inner_network_offset;
+		};
+		u16 network_offsets[2];
+	};
 };
 
 #define NAPI_GRO_CB(skb) ((struct napi_gro_cb *)(skb)->cb)
@@ -172,12 +181,17 @@ static inline void *skb_gro_header(struct sk_buff *skb, unsigned int hlen,
 	return ptr;
 }
 
+static inline int skb_gro_network_offset(const struct sk_buff *skb)
+{
+	return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
+}
+
 static inline void *skb_gro_network_header(const struct sk_buff *skb)
 {
 	if (skb_gro_may_pull(skb, skb_gro_offset(skb)))
-		return skb_gro_header_fast(skb, skb_network_offset(skb));
+		return skb_gro_header_fast(skb, skb_gro_network_offset(skb));
 
-	return skb_network_header(skb);
+	return skb->data + skb_gro_network_offset(skb);
 }
 
 static inline __wsum inet_gro_compute_pseudo(const struct sk_buff *skb,
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
index f00158234505..9404dd551dfd 100644
--- a/net/8021q/vlan_core.c
+++ b/net/8021q/vlan_core.c
@@ -478,6 +478,8 @@ static struct sk_buff *vlan_gro_receive(struct list_head *head,
 	if (unlikely(!vhdr))
 		goto out;
 
+	NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = hlen;
+
 	type = vhdr->h_vlan_encapsulated_proto;
 
 	ptype = gro_find_receive_by_type(type);
diff --git a/net/core/gro.c b/net/core/gro.c
index 83f35d99a682..c7901253a1a8 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -371,6 +371,7 @@ static inline void skb_gro_reset_offset(struct sk_buff *skb, u32 nhoff)
 	const skb_frag_t *frag0;
 	unsigned int headlen;
 
+	NAPI_GRO_CB(skb)->network_offset = 0;
 	NAPI_GRO_CB(skb)->data_offset = 0;
 	headlen = skb_headlen(skb);
 	NAPI_GRO_CB(skb)->frag0 = skb->data;
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 2edc8b796a4e..ea589e8cde2a 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -441,6 +441,7 @@ struct sk_buff *eth_gro_receive(struct list_head *head, struct sk_buff *skb)
 
 	skb_gro_pull(skb, sizeof(*eh));
 	skb_gro_postpull_rcsum(skb, eh, sizeof(*eh));
+	NAPI_GRO_CB(skb)->inner_network_offset = hlen;
 
 	pp = indirect_call_gro_receive_inet(ptype->callbacks.gro_receive,
 					    ipv6_gro_receive, inet_gro_receive,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 55bd72997b31..7899cbd5b263 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1568,10 +1568,6 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb)
 
 	NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));
 	NAPI_GRO_CB(skb)->flush |= flush;
-	skb_set_network_header(skb, off);
-	/* The above will be needed by the transport layer if there is one
-	 * immediately following this IP hdr.
-	 */
 
 	/* Note : No need to call skb_gro_postpull_rcsum() here,
 	 * as we already checked checksum over ipv4 header was 0
@@ -1597,6 +1593,7 @@ static struct sk_buff *ipip_gro_receive(struct list_head *head,
 	}
 
 	NAPI_GRO_CB(skb)->encap_mark = 1;
+	NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);
 
 	return inet_gro_receive(head, skb);
 }
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 5028c72d494a..a1ff2bdf6206 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -224,6 +224,7 @@ static struct sk_buff *gre_gro_receive(struct list_head *head,
 	/* Adjusted NAPI_GRO_CB(skb)->csum after skb_gro_pull()*/
 	skb_gro_postpull_rcsum(skb, greh, grehlen);
 
+	NAPI_GRO_CB(skb)->inner_network_offset = hlen;
 	pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb);
 	flush = 0;
 
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index b41e35af69ea..765797ca729c 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -67,7 +67,7 @@ static int ipv6_gro_pull_exthdrs(struct sk_buff *skb, int off, int proto)
 		off += len;
 	}
 
-	skb_gro_pull(skb, off - skb_network_offset(skb));
+	skb_gro_pull(skb, off - skb_gro_network_offset(skb));
 	return proto;
 }
 
@@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 	if (unlikely(!iph))
 		goto out;
 
-	skb_set_network_header(skb, off);
-
 	flush += ntohs(iph->payload_len) != skb->len - hlen;
 
 	proto = iph->nexthdr;
@@ -259,7 +257,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
 	NAPI_GRO_CB(skb)->proto = proto;
 
 	flush--;
-	nlen = skb_network_header_len(skb);
+	nlen = skb_gro_offset(skb) - off;
 
 	list_for_each_entry(p, head, list) {
 		const struct ipv6hdr *iph2;
@@ -327,6 +325,7 @@ static struct sk_buff *sit_ip6ip6_gro_receive(struct list_head *head,
 	}
 
 	NAPI_GRO_CB(skb)->encap_mark = 1;
+	NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);
 
 	return ipv6_gro_receive(head, skb);
 }
@@ -342,6 +341,7 @@ static struct sk_buff *ip4ip6_gro_receive(struct list_head *head,
 	}
 
 	NAPI_GRO_CB(skb)->encap_mark = 1;
+	NAPI_GRO_CB(skb)->inner_network_offset = skb_gro_offset(skb);
 
 	return inet_gro_receive(head, skb);
 }
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net v2 2/3] net: gro: fix udp bad offset in socket lookup
  2024-04-19 15:35 [PATCH net v2 0/3] net: gro: add flush/flush_id checks and fix wrong offset in udp Richard Gobert
  2024-04-19 15:35 ` [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb Richard Gobert
@ 2024-04-19 15:35 ` Richard Gobert
  2024-04-19 15:35 ` [PATCH net v2 3/3] net: gro: add flush check in udp_gro_receive_segment Richard Gobert
  2 siblings, 0 replies; 9+ messages in thread
From: Richard Gobert @ 2024-04-19 15:35 UTC (permalink / raw
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	alexander.duyck, aleksander.lobakin, netdev, linux-kernel
  Cc: Richard Gobert

Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp:
additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the
complete phase of gro. The functions always return skb->network_header,
which in the case of encapsulated packets at the gro complete phase, is
always set to the innermost L3 of the packet. That means that calling
{ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in
gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_
L3/L4 may return an unexpected value.

This incorrect usage leads to a bug in GRO's UDP socket lookup.
udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These
*_hdr functions return network_header which will point to the innermost L3,
resulting in the wrong offset being used in __udp{4,6}_lib_lookup with
encapsulated packets.

To fix this issue, network_offsets union is used inside napi_gro_cb, in
which both the outer and the inner network offsets are saved.

Reproduction example:

Endpoint configuration example (fou + local address bind)

    # ip fou add port 6666 ipproto 4
    # ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip
    # ip link set tun1 up
    # ip a add 1.1.1.2/24 dev tun1

Netperf TCP_STREAM result on net-next before patch is applied:

net-next main, GRO enabled:
    $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    131072  16384  16384    5.28        2.37

net-next main, GRO disabled:
    $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    131072  16384  16384    5.01     2745.06

patch applied, GRO enabled:
    $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5
    Recv   Send    Send
    Socket Socket  Message  Elapsed
    Size   Size    Size     Time     Throughput
    bytes  bytes   bytes    secs.    10^6bits/sec

    131072  16384  16384    5.01     2877.38

Fixes: 57c67ff4bd92 ("udp: additional GRO support")
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
---
 net/ipv4/udp.c         | 3 ++-
 net/ipv4/udp_offload.c | 3 ++-
 net/ipv6/udp.c         | 3 ++-
 net/ipv6/udp_offload.c | 3 ++-
 4 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c02bf011d4a6..1399fce82b3f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -532,7 +532,8 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb,
 struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb,
 				 __be16 sport, __be16 dport)
 {
-	const struct iphdr *iph = ip_hdr(skb);
+	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
+	const struct iphdr *iph = (struct iphdr *)(skb->data + offset);
 	struct net *net = dev_net(skb->dev);
 	int iif, sdif;
 
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 3498dd1d0694..fd29d21d579c 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -718,7 +718,8 @@ EXPORT_SYMBOL(udp_gro_complete);
 
 INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 {
-	const struct iphdr *iph = ip_hdr(skb);
+	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
+	const struct iphdr *iph = (struct iphdr *)(skb->data + offset);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
 	/* do fraglist only if there is no outer UDP encap (or we already processed it) */
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 8b1dd7f51249..f7880e306410 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -272,7 +272,8 @@ static struct sock *__udp6_lib_lookup_skb(struct sk_buff *skb,
 struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
 				 __be16 sport, __be16 dport)
 {
-	const struct ipv6hdr *iph = ipv6_hdr(skb);
+	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
+	const struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + offset);
 	struct net *net = dev_net(skb->dev);
 	int iif, sdif;
 
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index bbd347de00b4..b41152dd4246 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -164,7 +164,8 @@ struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb)
 
 INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 {
-	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
+	const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
+	const struct ipv6hdr *ipv6h = (struct ipv6hdr *)(skb->data + offset);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
 	/* do fraglist only if there is no outer UDP encap (or we already processed it) */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH net v2 3/3] net: gro: add flush check in udp_gro_receive_segment
  2024-04-19 15:35 [PATCH net v2 0/3] net: gro: add flush/flush_id checks and fix wrong offset in udp Richard Gobert
  2024-04-19 15:35 ` [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb Richard Gobert
  2024-04-19 15:35 ` [PATCH net v2 2/3] net: gro: fix udp bad offset in socket lookup Richard Gobert
@ 2024-04-19 15:35 ` Richard Gobert
  2 siblings, 0 replies; 9+ messages in thread
From: Richard Gobert @ 2024-04-19 15:35 UTC (permalink / raw
  To: davem, edumazet, kuba, pabeni, dsahern, willemdebruijn.kernel,
	alexander.duyck, aleksander.lobakin, netdev, linux-kernel
  Cc: Richard Gobert

GRO-GSO path is supposed to be transparent and as such L3 flush checks are
relevant to all UDP flows merging in GRO. This patch uses the same logic
and code from tcp_gro_receive, terminating merge if flush is non zero.

Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
---
 net/ipv4/udp_offload.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index fd29d21d579c..8721fe5beca2 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -471,6 +471,7 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 	struct sk_buff *p;
 	unsigned int ulen;
 	int ret = 0;
+	int flush;
 
 	/* requires non zero csum, for symmetry with GSO */
 	if (!uh->check) {
@@ -504,13 +505,22 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 			return p;
 		}
 
+		flush = NAPI_GRO_CB(p)->flush;
+
+		if (NAPI_GRO_CB(p)->flush_id != 1 ||
+		    NAPI_GRO_CB(p)->count != 1 ||
+		    !NAPI_GRO_CB(p)->is_atomic)
+			flush |= NAPI_GRO_CB(p)->flush_id;
+		else
+			NAPI_GRO_CB(p)->is_atomic = false;
+
 		/* Terminate the flow on len mismatch or if it grow "too much".
 		 * Under small packet flood GRO count could elsewhere grow a lot
 		 * leading to excessive truesize values.
 		 * On len mismatch merge the first packet shorter than gso_size,
 		 * otherwise complete the GRO packet.
 		 */
-		if (ulen > ntohs(uh2->len)) {
+		if (ulen > ntohs(uh2->len) || flush) {
 			pp = p;
 		} else {
 			if (NAPI_GRO_CB(skb)->is_flist) {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb
  2024-04-19 15:35 ` [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb Richard Gobert
@ 2024-04-19 18:51   ` Willem de Bruijn
  2024-04-22  8:32     ` Richard Gobert
  0 siblings, 1 reply; 9+ messages in thread
From: Willem de Bruijn @ 2024-04-19 18:51 UTC (permalink / raw
  To: Richard Gobert, davem, edumazet, kuba, pabeni, dsahern,
	willemdebruijn.kernel, alexander.duyck, aleksander.lobakin,
	netdev, linux-kernel
  Cc: Richard Gobert

Richard Gobert wrote:
> This patch adds network_offset and inner_network_offset to napi_gro_cb, and
> makes sure both are set correctly. In the common path there's only one
> write (skb_gro_reset_offset, which replaces skb_set_network_header).
> 
> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
> ---
>  drivers/net/geneve.c           |  1 +
>  drivers/net/vxlan/vxlan_core.c |  1 +
>  include/net/gro.h              | 18 ++++++++++++++++--
>  net/8021q/vlan_core.c          |  2 ++
>  net/core/gro.c                 |  1 +
>  net/ethernet/eth.c             |  1 +
>  net/ipv4/af_inet.c             |  5 +----
>  net/ipv4/gre_offload.c         |  1 +
>  net/ipv6/ip6_offload.c         |  8 ++++----
>  9 files changed, 28 insertions(+), 10 deletions(-)
> 

> +static inline int skb_gro_network_offset(const struct sk_buff *skb)
> +{
> +	return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
> +}
> +


> @@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
>  	if (unlikely(!iph))
>  		goto out;
>  
> -	skb_set_network_header(skb, off);
> -

Especially for net, this is still a large patch.

Can we avoid touching all those tunnel callbacks and just set the
offsets in inet_gro_receive and ipv6_gro_receive themselves, just
as skb_set_network_header now:

@@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
        if (unlikely(!iph))
                goto out;
 
-       skb_set_network_header(skb, off);
+       NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb
  2024-04-19 18:51   ` Willem de Bruijn
@ 2024-04-22  8:32     ` Richard Gobert
  2024-04-22 16:56       ` Willem de Bruijn
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Gobert @ 2024-04-22  8:32 UTC (permalink / raw
  To: Willem de Bruijn, davem, edumazet, kuba, pabeni, dsahern,
	alexander.duyck, aleksander.lobakin, netdev, linux-kernel

Willem de Bruijn wrote:
> Richard Gobert wrote:
>> This patch adds network_offset and inner_network_offset to napi_gro_cb, and
>> makes sure both are set correctly. In the common path there's only one
>> write (skb_gro_reset_offset, which replaces skb_set_network_header).
>>
>> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
>> ---
>>  drivers/net/geneve.c           |  1 +
>>  drivers/net/vxlan/vxlan_core.c |  1 +
>>  include/net/gro.h              | 18 ++++++++++++++++--
>>  net/8021q/vlan_core.c          |  2 ++
>>  net/core/gro.c                 |  1 +
>>  net/ethernet/eth.c             |  1 +
>>  net/ipv4/af_inet.c             |  5 +----
>>  net/ipv4/gre_offload.c         |  1 +
>>  net/ipv6/ip6_offload.c         |  8 ++++----
>>  9 files changed, 28 insertions(+), 10 deletions(-)
>>
> 
>> +static inline int skb_gro_network_offset(const struct sk_buff *skb)
>> +{
>> +	return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
>> +}
>> +
> 
> 
>> @@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
>>  	if (unlikely(!iph))
>>  		goto out;
>>  
>> -	skb_set_network_header(skb, off);
>> -
> 
> Especially for net, this is still a large patch.
> 
> Can we avoid touching all those tunnel callbacks and just set the
> offsets in inet_gro_receive and ipv6_gro_receive themselves, just
> as skb_set_network_header now:
> 
> @@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
>         if (unlikely(!iph))
>                 goto out;
>  
> -       skb_set_network_header(skb, off);
> +       NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;
> 

Thanks for the reply!

Setting network_offset on dev_gro_receive and inner_network_offset only
in the tunnel callbacks is the best option IMO. I agree that
we want a small patch to net that solves the problem, although I 
think always using ->encap_mark in the common path is not ideal. 

We can avoid changing all the tunnel callbacks by always setting
inner_network_offset in {ipv6,inet}_gro_receive and initialize
network_offset to 0 in dev_gro_receive. It will result in a small
change, without using ->encap_mark.

What are your thoughts?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb
  2024-04-22  8:32     ` Richard Gobert
@ 2024-04-22 16:56       ` Willem de Bruijn
  2024-04-23  9:27         ` Richard Gobert
  0 siblings, 1 reply; 9+ messages in thread
From: Willem de Bruijn @ 2024-04-22 16:56 UTC (permalink / raw
  To: Richard Gobert, Willem de Bruijn, davem, edumazet, kuba, pabeni,
	dsahern, alexander.duyck, aleksander.lobakin, netdev,
	linux-kernel

Richard Gobert wrote:
> Willem de Bruijn wrote:
> > Richard Gobert wrote:
> >> This patch adds network_offset and inner_network_offset to napi_gro_cb, and
> >> makes sure both are set correctly. In the common path there's only one
> >> write (skb_gro_reset_offset, which replaces skb_set_network_header).
> >>
> >> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
> >> ---
> >>  drivers/net/geneve.c           |  1 +
> >>  drivers/net/vxlan/vxlan_core.c |  1 +
> >>  include/net/gro.h              | 18 ++++++++++++++++--
> >>  net/8021q/vlan_core.c          |  2 ++
> >>  net/core/gro.c                 |  1 +
> >>  net/ethernet/eth.c             |  1 +
> >>  net/ipv4/af_inet.c             |  5 +----
> >>  net/ipv4/gre_offload.c         |  1 +
> >>  net/ipv6/ip6_offload.c         |  8 ++++----
> >>  9 files changed, 28 insertions(+), 10 deletions(-)
> >>
> > 
> >> +static inline int skb_gro_network_offset(const struct sk_buff *skb)
> >> +{
> >> +	return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
> >> +}
> >> +
> > 
> > 
> >> @@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
> >>  	if (unlikely(!iph))
> >>  		goto out;
> >>  
> >> -	skb_set_network_header(skb, off);
> >> -
> > 
> > Especially for net, this is still a large patch.
> > 
> > Can we avoid touching all those tunnel callbacks and just set the
> > offsets in inet_gro_receive and ipv6_gro_receive themselves, just
> > as skb_set_network_header now:
> > 
> > @@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
> >         if (unlikely(!iph))
> >                 goto out;
> >  
> > -       skb_set_network_header(skb, off);
> > +       NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;
> > 
> 
> Thanks for the reply!
> 
> Setting network_offset on dev_gro_receive and inner_network_offset only
> in the tunnel callbacks is the best option IMO. I agree that
> we want a small patch to net that solves the problem, although I 
> think always using ->encap_mark in the common path is not ideal. 
>
> We can avoid changing all the tunnel callbacks by always setting
> inner_network_offset in {ipv6,inet}_gro_receive and initialize
> network_offset to 0 in dev_gro_receive. It will result in a small
> change, without using ->encap_mark.
> 
> What are your thoughts?

That works. It's a bit ugly that inner_network_offset will always be
set, even if a packet only traverses inet_gro_receive once. What is
your concern with testing encap_mark?

How do you want to detect in udp[46]_lib_lookup_skb which of the two
offsets to use? That would still be encap_mark based?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb
  2024-04-22 16:56       ` Willem de Bruijn
@ 2024-04-23  9:27         ` Richard Gobert
  2024-04-23 13:55           ` Willem de Bruijn
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Gobert @ 2024-04-23  9:27 UTC (permalink / raw
  To: Willem de Bruijn, davem, edumazet, kuba, pabeni, dsahern,
	alexander.duyck, aleksander.lobakin, netdev, linux-kernel

Willem de Bruijn wrote:
> Richard Gobert wrote:
>> Willem de Bruijn wrote:
>>> Richard Gobert wrote:
>>>> This patch adds network_offset and inner_network_offset to napi_gro_cb, and
>>>> makes sure both are set correctly. In the common path there's only one
>>>> write (skb_gro_reset_offset, which replaces skb_set_network_header).
>>>>
>>>> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
>>>> ---
>>>>  drivers/net/geneve.c           |  1 +
>>>>  drivers/net/vxlan/vxlan_core.c |  1 +
>>>>  include/net/gro.h              | 18 ++++++++++++++++--
>>>>  net/8021q/vlan_core.c          |  2 ++
>>>>  net/core/gro.c                 |  1 +
>>>>  net/ethernet/eth.c             |  1 +
>>>>  net/ipv4/af_inet.c             |  5 +----
>>>>  net/ipv4/gre_offload.c         |  1 +
>>>>  net/ipv6/ip6_offload.c         |  8 ++++----
>>>>  9 files changed, 28 insertions(+), 10 deletions(-)
>>>>
>>>
>>>> +static inline int skb_gro_network_offset(const struct sk_buff *skb)
>>>> +{
>>>> +	return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
>>>> +}
>>>> +
>>>
>>>
>>>> @@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
>>>>  	if (unlikely(!iph))
>>>>  		goto out;
>>>>  
>>>> -	skb_set_network_header(skb, off);
>>>> -
>>>
>>> Especially for net, this is still a large patch.
>>>
>>> Can we avoid touching all those tunnel callbacks and just set the
>>> offsets in inet_gro_receive and ipv6_gro_receive themselves, just
>>> as skb_set_network_header now:
>>>
>>> @@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
>>>         if (unlikely(!iph))
>>>                 goto out;
>>>  
>>> -       skb_set_network_header(skb, off);
>>> +       NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;
>>>
>>
>> Thanks for the reply!
>>
>> Setting network_offset on dev_gro_receive and inner_network_offset only
>> in the tunnel callbacks is the best option IMO. I agree that
>> we want a small patch to net that solves the problem, although I 
>> think always using ->encap_mark in the common path is not ideal. 
>>
>> We can avoid changing all the tunnel callbacks by always setting
>> inner_network_offset in {ipv6,inet}_gro_receive and initialize
>> network_offset to 0 in dev_gro_receive. It will result in a small
>> change, without using ->encap_mark.
>>
>> What are your thoughts?
> 
> That works. It's a bit ugly that inner_network_offset will always be
> set, even if a packet only traverses inet_gro_receive once. What is
> your concern with testing encap_mark?
> 
> How do you want to detect in udp[46]_lib_lookup_skb which of the two
> offsets to use? That would still be encap_mark based?
> 

I'd like to minimize any potential overhead, even a small one, and this way
we do not need to access encap_mark at all in the common path.

NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;

compiles to:

	movzx   eax, byte ptr [rbx+46h]
	shr     al, 1
	and     eax, 1
	mov     [rbx+rax*2+4Ch], r14w

while

NAPI_GRO_CB(skb)->inner_network_offset = off;

compiles to:

	mov     [rbx+4Eh], r14w

I do plan to add a patch to net-next after this to remove the access
entirely from inet gro callbacks, in the meantime, it looks to me like a
reasonable patch and small enough to not raise concerns.

For udp_lib_lookup I don't see a way around it so yes, it would still be
dependent on encap_mark. Since this runs in the complete phase it's less
concerning.

Let me know that you're ok with that and I'll post a v3.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb
  2024-04-23  9:27         ` Richard Gobert
@ 2024-04-23 13:55           ` Willem de Bruijn
  0 siblings, 0 replies; 9+ messages in thread
From: Willem de Bruijn @ 2024-04-23 13:55 UTC (permalink / raw
  To: Richard Gobert, Willem de Bruijn, davem, edumazet, kuba, pabeni,
	dsahern, alexander.duyck, aleksander.lobakin, netdev,
	linux-kernel

Richard Gobert wrote:
> Willem de Bruijn wrote:
> > Richard Gobert wrote:
> >> Willem de Bruijn wrote:
> >>> Richard Gobert wrote:
> >>>> This patch adds network_offset and inner_network_offset to napi_gro_cb, and
> >>>> makes sure both are set correctly. In the common path there's only one
> >>>> write (skb_gro_reset_offset, which replaces skb_set_network_header).
> >>>>
> >>>> Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
> >>>> ---
> >>>>  drivers/net/geneve.c           |  1 +
> >>>>  drivers/net/vxlan/vxlan_core.c |  1 +
> >>>>  include/net/gro.h              | 18 ++++++++++++++++--
> >>>>  net/8021q/vlan_core.c          |  2 ++
> >>>>  net/core/gro.c                 |  1 +
> >>>>  net/ethernet/eth.c             |  1 +
> >>>>  net/ipv4/af_inet.c             |  5 +----
> >>>>  net/ipv4/gre_offload.c         |  1 +
> >>>>  net/ipv6/ip6_offload.c         |  8 ++++----
> >>>>  9 files changed, 28 insertions(+), 10 deletions(-)
> >>>>
> >>>
> >>>> +static inline int skb_gro_network_offset(const struct sk_buff *skb)
> >>>> +{
> >>>> +	return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark];
> >>>> +}
> >>>> +
> >>>
> >>>
> >>>> @@ -236,8 +236,6 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
> >>>>  	if (unlikely(!iph))
> >>>>  		goto out;
> >>>>  
> >>>> -	skb_set_network_header(skb, off);
> >>>> -
> >>>
> >>> Especially for net, this is still a large patch.
> >>>
> >>> Can we avoid touching all those tunnel callbacks and just set the
> >>> offsets in inet_gro_receive and ipv6_gro_receive themselves, just
> >>> as skb_set_network_header now:
> >>>
> >>> @@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head,
> >>>         if (unlikely(!iph))
> >>>                 goto out;
> >>>  
> >>> -       skb_set_network_header(skb, off);
> >>> +       NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;
> >>>
> >>
> >> Thanks for the reply!
> >>
> >> Setting network_offset on dev_gro_receive and inner_network_offset only
> >> in the tunnel callbacks is the best option IMO. I agree that
> >> we want a small patch to net that solves the problem, although I 
> >> think always using ->encap_mark in the common path is not ideal. 
> >>
> >> We can avoid changing all the tunnel callbacks by always setting
> >> inner_network_offset in {ipv6,inet}_gro_receive and initialize
> >> network_offset to 0 in dev_gro_receive. It will result in a small
> >> change, without using ->encap_mark.
> >>
> >> What are your thoughts?
> > 
> > That works. It's a bit ugly that inner_network_offset will always be
> > set, even if a packet only traverses inet_gro_receive once. What is
> > your concern with testing encap_mark?
> > 
> > How do you want to detect in udp[46]_lib_lookup_skb which of the two
> > offsets to use? That would still be encap_mark based?
> > 
> 
> I'd like to minimize any potential overhead, even a small one, and this way
> we do not need to access encap_mark at all in the common path.
> 
> NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off;
> 
> compiles to:
> 
> 	movzx   eax, byte ptr [rbx+46h]
> 	shr     al, 1
> 	and     eax, 1
> 	mov     [rbx+rax*2+4Ch], r14w
> 
> while
> 
> NAPI_GRO_CB(skb)->inner_network_offset = off;
> 
> compiles to:
> 
> 	mov     [rbx+4Eh], r14w
> 
> I do plan to add a patch to net-next after this to remove the access
> entirely from inet gro callbacks, in the meantime, it looks to me like a
> reasonable patch and small enough to not raise concerns.
> 
> For udp_lib_lookup I don't see a way around it so yes, it would still be
> dependent on encap_mark. Since this runs in the complete phase it's less
> concerning.
> 
> Let me know that you're ok with that and I'll post a v3.

Yes, looks fine.

Main cost is memory access, and that encap_mark will be
accessed soon after in udp4_lib_lookup.

I don't expect two arithmetic instructions to matter. But this code
does now have one more store: the one in dev_gro_receive.

Either way, in the noise. Both approaches look fine to me: very
concise and essentially equivalent. Choose your preferred option.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-04-23 13:55 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-19 15:35 [PATCH net v2 0/3] net: gro: add flush/flush_id checks and fix wrong offset in udp Richard Gobert
2024-04-19 15:35 ` [PATCH net v2 1/3] net: gro: add {inner_}network_offset to napi_gro_cb Richard Gobert
2024-04-19 18:51   ` Willem de Bruijn
2024-04-22  8:32     ` Richard Gobert
2024-04-22 16:56       ` Willem de Bruijn
2024-04-23  9:27         ` Richard Gobert
2024-04-23 13:55           ` Willem de Bruijn
2024-04-19 15:35 ` [PATCH net v2 2/3] net: gro: fix udp bad offset in socket lookup Richard Gobert
2024-04-19 15:35 ` [PATCH net v2 3/3] net: gro: add flush check in udp_gro_receive_segment Richard Gobert

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).