From: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	seh@panix.com, Herbert Xu <herbert@gondor.apana.org.au>,
	Eric Dumazet <edumazet@google.com>
Subject: Re: kernel BUG at net/core/skbuff.c:4219
Date: Fri, 21 Oct 2022 03:00:22 -0700
Message-ID: <20221021100022.GA31916@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
In-Reply-To: <194f6b02-8ee7-b5d7-58f3-6a83b5ff275d@gmail.com>

On Tue, Oct 11, 2022 at 10:57:05AM -0700, Eric Dumazet wrote:
> 
> On 10/11/22 09:56, Jeremi Piotrowski wrote:
> >Hi,
> >
> >One of our Flatcar users has been hitting the kernel BUG in the subject line
> >for the past year (https://github.com/flatcar/Flatcar/issues/378). This was
> >first reported when on 5.10.25, but has been happening across kernel updates,
> >most recently with 5.15.63. The nodes where this happens are AWS EC2 instances,
> >using ENA and calico networking in eBPF mode with VXLAN encapsulation. When
> >GRO/GSO is enabled, the host hits this bug and prints the following stacktrace:
> 
> 
> I suspect eBPF code lowers gso_size ?
> 
> gso stack is not able to arbitrarily segment a GRO packet after
> gso_size being changed.
> 
> 

This was a good hint, see Tomas' response for some more observations.

This appears to still be happening with Calico v3.23 which started passing
BPF_F_ADJ_ROOM_FIXED_GSO to bpf_skb_adjust_room() on the decap (rx) path.
BPF_F_ADJ_ROOM_FIXED_GSO is not passed on the encap (tx) path. It is enough to
disable GRO to stop the BUG from being hit though, so there must be more going
on here (since the rx path no longer changes gso_size).
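
To make the gso_size arithmetic concrete, here is a rough user-space model,
not kernel code. It assumes that bpf_skb_adjust_room() lowers gso_size by the
added header length when growing (encap) and raises it when shrinking (decap),
unless BPF_F_ADJ_ROOM_FIXED_GSO is passed. The names model_gso_size, room_dir,
ROOM_GROW and ROOM_SHRINK are made up for this sketch, and 1448/50 are
illustrative values only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical direction of the adjustment: encap grows, decap shrinks. */
enum room_dir { ROOM_GROW, ROOM_SHRINK };

/*
 * Model of how gso_size is assumed to change when len_diff header bytes
 * are added or removed. fixed_gso stands in for BPF_F_ADJ_ROOM_FIXED_GSO.
 */
static uint16_t model_gso_size(uint16_t gso_size, uint16_t len_diff,
                               enum room_dir dir, bool fixed_gso)
{
    if (fixed_gso)
        return gso_size;            /* flag set: gso_size left untouched */

    if (dir == ROOM_GROW) {
        /* Growing headers: each segment's payload must get smaller. */
        assert(len_diff < gso_size);
        return (uint16_t)(gso_size - len_diff);
    }

    /* Shrinking headers: each segment's payload may get larger. */
    return (uint16_t)(gso_size + len_diff);
}
```

Under these assumptions, model_gso_size(1448, 50, ROOM_GROW, false) yields
1398, while the same call with fixed_gso set leaves the value at 1448. A GRO
packet that was aggregated with one segment size and is later asked to be
segmented with another is the situation Eric describes above.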

> >
> >[Mon Oct 10 18:22:24 2022] ------------[ cut here ]------------
> >[Mon Oct 10 18:22:24 2022] kernel BUG at net/core/skbuff.c:4219!
> >[Mon Oct 10 18:22:24 2022] invalid opcode: 0000 [#1] SMP PTI
> >[Mon Oct 10 18:22:24 2022] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 5.15.63-flatcar #1
> >[Mon Oct 10 18:22:24 2022] Hardware name: Amazon EC2 z1d.12xlarge/, BIOS 1.0 10/16/2017
> >[Mon Oct 10 18:22:24 2022] RIP: 0010:skb_segment+0xc70/0xe80
> >[Mon Oct 10 18:22:24 2022] Code: 44 24 50 48 89 44 24 30 48 8b 44 24 10 48 89 44 24 50 e9 16 f7 ff ff 0f 0b 89 44 24 2c c7 44 24 4c 00 00 00 00 e9 44 fe ff ff <0f> 0b 0f 0b 0f 0b 41 8b 7d 74 85 ff 0f 85 91 01 00 00 49 8b 95 c0
> >[Mon Oct 10 18:22:24 2022] RSP: 0018:ffffa2d38c780838 EFLAGS: 00010246
> >[Mon Oct 10 18:22:24 2022] RAX: ffff8954dd8312c0 RBX: ffff89293fbde300 RCX: ffff8957bd3d2fa0
> >[Mon Oct 10 18:22:24 2022] RDX: 0000000000000000 RSI: ffff89293fbde2c0 RDI: ffffffffffffffff
> >[Mon Oct 10 18:22:24 2022] RBP: ffffa2d38c780908 R08: 0000000000009db6 R09: 0000000000000000
> >[Mon Oct 10 18:22:24 2022] R10: 000000000000a356 R11: 000000000000a31a R12: 000000000000000b
> >[Mon Oct 10 18:22:24 2022] R13: ffff892940566100 R14: 000000000000a31a R15: ffff891ad0e5c600
> >[Mon Oct 10 18:22:24 2022] FS:  0000000000000000(0000) GS:ffff8948b9b80000(0000) knlGS:0000000000000000
> >[Mon Oct 10 18:22:24 2022] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >[Mon Oct 10 18:22:24 2022] CR2: 000000c011faf000 CR3: 0000000d66a0a001 CR4: 00000000007706e0
> >[Mon Oct 10 18:22:24 2022] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >[Mon Oct 10 18:22:24 2022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >[Mon Oct 10 18:22:24 2022] PKRU: 55555554
> >[Mon Oct 10 18:22:24 2022] Call Trace:
> >[Mon Oct 10 18:22:24 2022]  <IRQ>
> >[Mon Oct 10 18:22:24 2022] ? csum_block_add_ext (include/net/checksum.h:117)
> >[Mon Oct 10 18:22:24 2022] ? reqsk_fastopen_remove (include/linux/bitops.h:119 include/net/checksum.h:87 include/net/checksum.h:94 include/net/checksum.h:100)
> >[Mon Oct 10 18:22:24 2022] tcp_gso_segment (net/ipv4/tcp_offload.c:99)
> >[Mon Oct 10 18:22:24 2022] inet_gso_segment (net/ipv4/af_inet.c:1385)
> >[Mon Oct 10 18:22:24 2022] skb_mac_gso_segment (net/core/dev.c:3339)
> >[Mon Oct 10 18:22:24 2022] __skb_gso_segment (net/core/dev.c:3414 (discriminator 3))
> >[Mon Oct 10 18:22:24 2022] ? netif_skb_features (include/net/mpls.h:21 net/core/dev.c:3463 net/core/dev.c:3483 net/core/dev.c:3574)
> >[Mon Oct 10 18:22:24 2022] validate_xmit_skb.constprop.0 (net/core/dev.c:3672)
> >[Mon Oct 10 18:22:24 2022] validate_xmit_skb_list (net/core/dev.c:3722)
> >[Mon Oct 10 18:22:24 2022] sch_direct_xmit (net/sched/sch_generic.c:327)
> >[Mon Oct 10 18:22:24 2022] __dev_queue_xmit (net/core/dev.c:3858 net/core/dev.c:4185)
> >[Mon Oct 10 18:22:24 2022] ip_finish_output2 (include/net/neighbour.h:500 include/net/neighbour.h:514 net/ipv4/ip_output.c:228)
> >[Mon Oct 10 18:22:24 2022] ? ip_route_input_rcu (net/ipv4/route.c:1745 net/ipv4/route.c:2499 net/ipv4/route.c:2458)
> >[Mon Oct 10 18:22:24 2022] ? skb_gso_validate_network_len (net/core/skbuff.c:5561 net/core/skbuff.c:5636)
> >[Mon Oct 10 18:22:24 2022] ? __ip_finish_output (net/ipv4/ip_output.c:249 net/ipv4/ip_output.c:301 net/ipv4/ip_output.c:288)
> >[Mon Oct 10 18:22:24 2022] ip_sublist_rcv_finish (include/net/dst.h:460 net/ipv4/ip_input.c:565)
> >[Mon Oct 10 18:22:24 2022] ip_sublist_rcv (net/ipv4/ip_input.c:624)
> >[Mon Oct 10 18:22:24 2022] ? ip_sublist_rcv (net/ipv4/ip_input.c:422)
> >[Mon Oct 10 18:22:24 2022] ip_list_rcv (net/ipv4/ip_input.c:659)
> >[Mon Oct 10 18:22:24 2022] __netif_receive_skb_list_core (net/core/dev.c:5498 net/core/dev.c:5546)
> >[Mon Oct 10 18:22:24 2022] netif_receive_skb_list_internal (net/core/dev.c:5600 net/core/dev.c:5689)
> >[Mon Oct 10 18:22:24 2022] ? inet_gro_complete (net/ipv4/af_inet.c:1645)
> >[Mon Oct 10 18:22:24 2022] napi_gro_complete.constprop.0.isra.0 (include/linux/list.h:35 net/core/dev.c:5844 net/core/dev.c:5839 net/core/dev.c:5856 net/core/dev.c:5892)
> >[Mon Oct 10 18:22:24 2022] dev_gro_receive (net/core/dev.c:6119)
> >[Mon Oct 10 18:22:24 2022] napi_gro_receive (net/core/dev.c:6223)
> >[Mon Oct 10 18:22:24 2022]  0xffffffffc069d699
> >[Mon Oct 10 18:22:24 2022] ? scheduler_tick (kernel/sched/core.c:7053 kernel/sched/core.c:5278)
> >[Mon Oct 10 18:22:24 2022] __napi_poll (net/core/dev.c:7005)
> >[Mon Oct 10 18:22:24 2022] net_rx_action (net/core/dev.c:7074 net/core/dev.c:7159)
> >[Mon Oct 10 18:22:24 2022] __do_softirq (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:212 include/trace/events/irq.h:142 kernel/softirq.c:559)
> >[Mon Oct 10 18:22:24 2022] irq_exit_rcu (kernel/softirq.c:432 kernel/softirq.c:636 kernel/softirq.c:648)
> >[Mon Oct 10 18:22:24 2022] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14))
> >[Mon Oct 10 18:22:24 2022]  </IRQ>
> >[Mon Oct 10 18:22:24 2022]  <TASK>
> >[Mon Oct 10 18:22:24 2022]  asm_common_interrupt+0x21/0x40
> >[Mon Oct 10 18:22:24 2022] RIP: 0010:cpuidle_enter_state+0xc7/0x350
> >[Mon Oct 10 18:22:24 2022] Code: 8b 3d f5 e1 9b 4d e8 08 bb a7 ff 49 89 c5 0f 1f 44 00 00 31 ff e8 09 c9 a7 ff 45 84 ff 0f 85 fe 00 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 0a 01 00 00 49 63 c6 4c 2b 2c 24 48 8d 14 40 48 8d
> >[Mon Oct 10 18:22:24 2022] RSP: 0018:ffffa2d38c527ea8 EFLAGS: 00000246
> >[Mon Oct 10 18:22:24 2022] RAX: ffff8948b9bac100 RBX: 0000000000000003 RCX: 00000000ffffffff
> >[Mon Oct 10 18:22:24 2022] RDX: 0000000000000006 RSI: 0000000000000006 RDI: 0000000000000000
> >[Mon Oct 10 18:22:24 2022] RBP: ffff8948b9bb6000 R08: 0000043f38b90644 R09: 0000043f6c0b1df3
> >[Mon Oct 10 18:22:24 2022] R10: 0000000000000014 R11: 0000000000000008 R12: ffffffffb3bbd7e0
> >[Mon Oct 10 18:22:24 2022] R13: 0000043f38b90644 R14: 0000000000000003 R15: 0000000000000000
> >[Mon Oct 10 18:22:24 2022]  ? cpuidle_enter_state+0xb7/0x350
> >[Mon Oct 10 18:22:24 2022]  cpuidle_enter+0x29/0x40
> >[Mon Oct 10 18:22:24 2022]  do_idle+0x1e9/0x280
> >[Mon Oct 10 18:22:24 2022]  cpu_startup_entry+0x19/0x20
> >[Mon Oct 10 18:22:24 2022]  secondary_startup_64_no_verify+0xc2/0xcb
> >[Mon Oct 10 18:22:24 2022]  </TASK>
> >[Mon Oct 10 18:22:24 2022] Modules linked in: xt_CT ip_set_hash_net ip_set vxlan cls_bpf sch_ingress veth xt_comment xt_mark xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables nfnetlink nls_ascii nls_cp437 vfat fat mousedev intel_rapl_msr intel_rapl_common psmouse evdev i2c_piix4 i2c_core button sch_fq_codel fuse configfs ext4 crc16 mbcache jbd2 dm_verity dm_bufio aesni_intel nvme nvme_core libaes crypto_simd ena cryptd t10_pi crc_t10dif crct10dif_generic crct10dif_common btrfs blake2b_generic zstd_compress lzo_compress raid6_pq libcrc32c crc32c_generic crc32c_intel dm_mirror dm_region_hash dm_log dm_mod qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi br_netfilter bridge scsi_transport_iscsi stp llc overlay scsi_mod scsi_common
> >[Mon Oct 10 18:22:24 2022] ---[ end trace 86a2732b8f4d0b13 ]---
> >
> >Disabling GSO/GRO *seems* to prevent the BUG_ON() from getting hit but is too
> >costly in terms of performance. There are also suggestions that this happens
> >more often under heavy network load, and has also been observed when running on
> >Vmware.
> >
> >If anyone has any suggestions or needs more information to come up with a
> >theory, we'd love to get to the bottom of this.
> >
> >Jeremi

WARNING: multiple messages have this Message-ID
From: dracoding <dracodingfly@gmail.com>
To: eric.dumazet@gmail.com
Cc: edumazet@google.com, herbert@gondor.apana.org.au,
	jpiotrowski@linux.microsoft.com, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, seh@panix.com
Subject: Re: kernel BUG at net/core/skbuff.c:4219
Date: Thu, 11 Apr 2024 14:23:21 +0800
Message-ID: <20221021100022.GA31916@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
Message-ID: <20240411062321.Y7CvbLz9TuYPSw0okz4QPwPo0Q7--VVyitGnZ0mvFVo@z>
In-Reply-To: <194f6b02-8ee7-b5d7-58f3-6a83b5ff275d@gmail.com>

From: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>

> On Tue, Oct 11, 2022 at 10:57:05AM -0700, Eric Dumazet wrote:
> > 
> > On 10/11/22 09:56, Jeremi Piotrowski wrote:
> > >Hi,
> > >
> > >One of our Flatcar users has been hitting the kernel BUG in the subject line
> > >for the past year (https://github.com/flatcar/Flatcar/issues/378). This was
> > >first reported when on 5.10.25, but has been happening across kernel updates,
> > >most recently with 5.15.63. The nodes where this happens are AWS EC2 instances,
> > >using ENA and calico networking in eBPF mode with VXLAN encapsulation. When
> > >GRO/GSO is enabled, the host hits this bug and prints the following stacktrace:
> > 
> > 
> > I suspect eBPF code lowers gso_size ?
> > 
> > gso stack is not able to arbitrarily segment a GRO packet after
> > gso_size being changed.
> > 
> > 
> 
> This was a good hint, see Tomas' response for some more observations.
> 
> This appears to still be happening with Calico v3.23 which started passing
> BPF_F_ADJ_ROOM_FIXED_GSO to bpf_skb_adjust_room() on the decap (rx) path.
> BPF_F_ADJ_ROOM_FIXED_GSO is not passed on the encap (tx) path. It is enough to
> disable GRO to stop the BUG from being hit though, so there must be more going
> on here ? (since the rx path does not change gso_size any longer).
>

Hi,

I encountered a similar error with Calico v3.24.5.
It crashed at BUG_ON(skb_headlen(list_skb) > len) with the following stack trace,
but I don't know how to reproduce it.

    [exception RIP: skb_segment+3016]
    RIP: ffffffffb97df2a8  RSP: ffffa3f2cce08728  RFLAGS: 00010293
    RAX: 000000000000007d  RBX: 00000000fffff7b3  RCX: 0000000000000011
    RDX: 0000000000000000  RSI: ffff895ea32c76c0  RDI: 00000000000008c1
    RBP: ffffa3f2cce087f8   R8: 000000000000088f   R9: 0000000000000011
    R10: 000000000000090c  R11: ffff895e47e68000  R12: ffff895eb2022f00
    R13: 000000000000004b  R14: ffff895ecdaf2000  R15: ffff895eb2023f00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffffa3f2cce08720] skb_segment at ffffffffb97ded63
#10 [ffffa3f2cce08800] tcp_gso_segment at ffffffffb98d0320
#11 [ffffa3f2cce08860] tcp4_gso_segment at ffffffffb98d07a3
#12 [ffffa3f2cce08880] inet_gso_segment at ffffffffb98e6de0
#13 [ffffa3f2cce088e0] skb_mac_gso_segment at ffffffffb97f3741
#14 [ffffa3f2cce08918] skb_udp_tunnel_segment at ffffffffb98daa59
#15 [ffffa3f2cce08980] udp4_ufo_fragment at ffffffffb98db471
#16 [ffffa3f2cce089b0] inet_gso_segment at ffffffffb98e6de0
#17 [ffffa3f2cce08a10] skb_mac_gso_segment at ffffffffb97f3741
#18 [ffffa3f2cce08a48] __skb_gso_segment at ffffffffb97f388e
#19 [ffffa3f2cce08a78] validate_xmit_skb at ffffffffb97f3d6e
#20 [ffffa3f2cce08ab8] __dev_queue_xmit at ffffffffb97f4614
#21 [ffffa3f2cce08b50] dev_queue_xmit at ffffffffb97f5030
#22 [ffffa3f2cce08b60] __bpf_redirect at ffffffffb98199a8
#23 [ffffa3f2cce08b88] skb_do_redirect at ffffffffb98205cd
#24 [ffffa3f2cce08bb8] __netif_receive_skb_core at ffffffffb97f6585
#25 [ffffa3f2cce08c68] __netif_receive_skb_list_core at ffffffffb97f6c0a
#26 [ffffa3f2cce08ce8] netif_receive_skb_list_internal at ffffffffb97f6f6a
#27 [ffffa3f2cce08d60] gro_normal_list at ffffffffb97f717e
#28 [ffffa3f2cce08d80] gro_normal_one at ffffffffb97f721c
#29 [ffffa3f2cce08db8] napi_gro_complete at ffffffffb97f72ac
#30 [ffffa3f2cce08de0] napi_gro_flush at ffffffffb97f73c1
#31 [ffffa3f2cce08e30] napi_complete_done at ffffffffb97f7d1e
#32 [ffffa3f2cce08e60] ice_napi_poll at ffffffffc0477dd6 [ice]
#33 [ffffa3f2cce08ec0] __napi_poll at ffffffffb97f823e
#34 [ffffa3f2cce08ef0] net_rx_action at ffffffffb97f86f1
#35 [ffffa3f2cce08f70] __softirqentry_text_start at ffffffffb9e000dd
#36 [ffffa3f2cce08fd8] irq_exit_rcu at ffffffffb9096074
#37 [ffffa3f2cce08ff0] common_interrupt at ffffffffb9a3272a

The gso_size is 75, which may be the result of bpf_skb_adjust_room() subtracting 50 (the VXLAN header length)?
The frag_list has one element, whose head_frag is 1. The skb_shared_info struct is as follows.

struct skb_shared_info {
    nr_frags = 17 '\021', 
    gso_size = 75, 
    gso_segs = 0, 
    frag_list = 0xffff895eb2022f00, 
    gso_type = 1035, 
    destructor_arg = 0x2d656c6261747372, 
    frags = {{
        bv_page = 0xfffff80e86d4d180, 
        bv_len = 125, 
        bv_offset = 2306
      },
    ....
    }
}
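
For anyone reading the dump, gso_type = 1035 can be decoded with a small
helper like the one below. This is a sketch: the bit names and positions are
assumed from the SKB_GSO_* enum in include/linux/skbuff.h of 5.x kernels and
should be checked against the source of the kernel that crashed.
decode_gso_type and gso_bit_names are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed SKB_GSO_* bit order (5.x kernels); index i means bit (1 << i). */
static const char *const gso_bit_names[] = {
    "SKB_GSO_TCPV4",           /* 1 << 0  */
    "SKB_GSO_DODGY",           /* 1 << 1  */
    "SKB_GSO_TCP_ECN",         /* 1 << 2  */
    "SKB_GSO_TCP_FIXEDID",     /* 1 << 3  */
    "SKB_GSO_TCPV6",           /* 1 << 4  */
    "SKB_GSO_FCOE",            /* 1 << 5  */
    "SKB_GSO_GRE",             /* 1 << 6  */
    "SKB_GSO_GRE_CSUM",        /* 1 << 7  */
    "SKB_GSO_IPXIP4",          /* 1 << 8  */
    "SKB_GSO_IPXIP6",          /* 1 << 9  */
    "SKB_GSO_UDP_TUNNEL",      /* 1 << 10 */
    "SKB_GSO_UDP_TUNNEL_CSUM", /* 1 << 11 */
};

/* Write the names of all set bits into out, separated by '|'. */
static void decode_gso_type(unsigned int gso_type, char *out, size_t outlen)
{
    size_t n = sizeof(gso_bit_names) / sizeof(gso_bit_names[0]);

    assert(outlen > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (!(gso_type & (1u << i)))
            continue;
        if (out[0] != '\0')
            strncat(out, "|", outlen - strlen(out) - 1);
        strncat(out, gso_bit_names[i], outlen - strlen(out) - 1);
    }
}
```

With that table, 1035 (0x40b) decodes to
SKB_GSO_TCPV4|SKB_GSO_DODGY|SKB_GSO_TCP_FIXEDID|SKB_GSO_UDP_TUNNEL, i.e. a
TCP/IPv4 payload inside a UDP tunnel.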

Does anyone have suggestions other than disabling GRO/GSO? Can the
BPF_F_ADJ_ROOM_FIXED_GSO flag be enabled on the encap path? I'd be glad to
provide more information if needed.

fred

Thread overview: 5+ messages
2022-10-11 16:56 kernel BUG at net/core/skbuff.c:4219 Jeremi Piotrowski
2022-10-11 17:57 ` Eric Dumazet
2022-10-21 10:00   ` Jeremi Piotrowski [this message]
2024-04-11  6:23     ` dracoding
     [not found] <CAM=1FV3ODgP1+iST6zVh4EFY9WLf=Us8PTTmbH=8KF1Xc7zmvA@mail.gmail.com>
2022-10-19 19:10 ` Tomas Hruby
