* [syzbot] [net?] INFO: task hung in register_nexthop_notifier (3)
@ 2024-03-18 11:26 syzbot
  2024-03-18 14:46 ` Eric Dumazet
  2024-06-20 13:39 ` Tetsuo Handa
  0 siblings, 2 replies; 5+ messages in thread
From: syzbot @ 2024-03-18 11:26 UTC
  To: davem, dsahern, edumazet, kuba, linux-kernel, netdev, pabeni,
	syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    76839e2f1fde net/packet: Add getsockopt support for PACKET..
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=10c38e3e180000
kernel config:  https://syzkaller.appspot.com/x/.config?x=63afafeedf00ef8f
dashboard link: https://syzkaller.appspot.com/bug?extid=99b8125966713aa4b0c3
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/1fdcc43b8d91/disk-76839e2f.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/7710da23cc3a/vmlinux-76839e2f.xz
kernel image: https://storage.googleapis.com/syzbot-assets/2b8715f33de4/bzImage-76839e2f.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+99b8125966713aa4b0c3@syzkaller.appspotmail.com

INFO: task syz-executor.3:6975 blocked for more than 143 seconds.
      Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.3  state:D stack:20920 pid:6975  tgid:6975  ppid:1      flags:0x00004006
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5400 [inline]
 __schedule+0x17d1/0x49f0 kernel/sched/core.c:6727
 __schedule_loop kernel/sched/core.c:6802 [inline]
 schedule+0x149/0x260 kernel/sched/core.c:6817
 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6874
 __mutex_lock_common kernel/locking/mutex.c:684 [inline]
 __mutex_lock+0x6a3/0xd70 kernel/locking/mutex.c:752
 register_nexthop_notifier+0x84/0x290 net/ipv4/nexthop.c:3863
 nsim_fib_create+0x8a6/0xa70 drivers/net/netdevsim/fib.c:1587
 nsim_drv_probe+0x747/0xb80 drivers/net/netdevsim/dev.c:1582
 really_probe+0x29e/0xc50 drivers/base/dd.c:658
 __driver_probe_device+0x1a2/0x3e0 drivers/base/dd.c:800
 driver_probe_device+0x50/0x430 drivers/base/dd.c:830
 __device_attach_driver+0x2d6/0x530 drivers/base/dd.c:958
 bus_for_each_drv+0x24e/0x2e0 drivers/base/bus.c:457
 __device_attach+0x333/0x520 drivers/base/dd.c:1030
 bus_probe_device+0x189/0x260 drivers/base/bus.c:532
 device_add+0x8ff/0xca0 drivers/base/core.c:3639
 nsim_bus_dev_new drivers/net/netdevsim/bus.c:442 [inline]
 new_device_store+0x3f2/0x890 drivers/net/netdevsim/bus.c:173
 kernfs_fop_write_iter+0x3a4/0x500 fs/kernfs/file.c:334
 call_write_iter include/linux/fs.h:2087 [inline]
 new_sync_write fs/read_write.c:497 [inline]
 vfs_write+0xa81/0xcb0 fs/read_write.c:590
 ksys_write+0x1a0/0x2c0 fs/read_write.c:643
 do_syscall_64+0xf9/0x240
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f12ecc7caef
RSP: 002b:00007ffd6d32df00 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f12ecc7caef
RDX: 0000000000000003 RSI: 00007ffd6d32df50 RDI: 0000000000000005
RBP: 00007f12eccca045 R08: 0000000000000000 R09: 00007ffd6d32dd57
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
R13: 00007ffd6d32df50 R14: 00007f12ed8d4620 R15: 0000000000000003
 </TASK>
INFO: task syz-executor.2:7486 blocked for more than 143 seconds.
      Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.2  state:D stack:26768 pid:7486  tgid:7483  ppid:5094   flags:0x00004006
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5400 [inline]
 __schedule+0x17d1/0x49f0 kernel/sched/core.c:6727
 __schedule_loop kernel/sched/core.c:6802 [inline]
 schedule+0x149/0x260 kernel/sched/core.c:6817
 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6874
 __mutex_lock_common kernel/locking/mutex.c:684 [inline]
 __mutex_lock+0x6a3/0xd70 kernel/locking/mutex.c:752
 rtnl_lock net/core/rtnetlink.c:79 [inline]
 rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
 netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
 netlink_sendmsg+0x8e0/0xcb0 net/netlink/af_netlink.c:1905
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x221/0x270 net/socket.c:745
 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
 ___sys_sendmsg net/socket.c:2638 [inline]
 __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
 do_syscall_64+0xf9/0x240
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f406d87dda9
RSP: 002b:00007f406e5dd0c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f406d9abf80 RCX: 00007f406d87dda9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 00007f406d8ca47a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f406d9abf80 R15: 00007ffe1ff0e4a8
 </TASK>
INFO: task syz-executor.0:7485 blocked for more than 144 seconds.
      Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0  state:D stack:26768 pid:7485  tgid:7484  ppid:5093   flags:0x00004006
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5400 [inline]
 __schedule+0x17d1/0x49f0 kernel/sched/core.c:6727
 __schedule_loop kernel/sched/core.c:6802 [inline]
 schedule+0x149/0x260 kernel/sched/core.c:6817
 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6874
 __mutex_lock_common kernel/locking/mutex.c:684 [inline]
 __mutex_lock+0x6a3/0xd70 kernel/locking/mutex.c:752
 rtnl_lock net/core/rtnetlink.c:79 [inline]
 rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
 netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
 netlink_sendmsg+0x8e0/0xcb0 net/netlink/af_netlink.c:1905
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x221/0x270 net/socket.c:745
 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
 ___sys_sendmsg net/socket.c:2638 [inline]
 __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
 do_syscall_64+0xf9/0x240
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f7143c7dda9
RSP: 002b:00007f714491f0c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f7143dabf80 RCX: 00007f7143c7dda9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 00007f7143cca47a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f7143dabf80 R15: 00007ffe6918ff78
 </TASK>
INFO: task syz-executor.4:7489 blocked for more than 144 seconds.
      Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.4  state:D stack:25520 pid:7489  tgid:7488  ppid:5107   flags:0x00004006
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5400 [inline]
 __schedule+0x17d1/0x49f0 kernel/sched/core.c:6727
 __schedule_loop kernel/sched/core.c:6802 [inline]
 schedule+0x149/0x260 kernel/sched/core.c:6817
 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6874
 __mutex_lock_common kernel/locking/mutex.c:684 [inline]
 __mutex_lock+0x6a3/0xd70 kernel/locking/mutex.c:752
 ieee802154_nl_fill_phy+0xf2/0x790 net/ieee802154/nl-phy.c:45
 ieee802154_dump_phy_iter+0x110/0x200 net/ieee802154/nl-phy.c:129
 class_for_each_device+0x262/0x380 drivers/base/class.c:404
 wpan_phy_for_each+0xbf/0x100 net/ieee802154/core.c:62
 ieee802154_dump_phy+0x160/0x290 net/ieee802154/nl-phy.c:154
 genl_dumpit+0x107/0x1a0 net/netlink/genetlink.c:1025
 netlink_dump+0x6ff/0xe50 net/netlink/af_netlink.c:2269
 __netlink_dump_start+0x59d/0x780 net/netlink/af_netlink.c:2386
 genl_family_rcv_msg_dumpit net/netlink/genetlink.c:1074 [inline]
 genl_family_rcv_msg net/netlink/genetlink.c:1190 [inline]
 genl_rcv_msg+0x88c/0xec0 net/netlink/genetlink.c:1208
 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1217
 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
 netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
 netlink_sendmsg+0x8e0/0xcb0 net/netlink/af_netlink.c:1905
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x221/0x270 net/socket.c:745
 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
 ___sys_sendmsg net/socket.c:2638 [inline]
 __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
 do_syscall_64+0xf9/0x240
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7fdbb3e7dda9
RSP: 002b:00007fdbb4b300c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007fdbb3fabf80 RCX: 00007fdbb3e7dda9
RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000004
RBP: 00007fdbb3eca47a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fdbb3fabf80 R15: 00007ffc0809fb68
 </TASK>
INFO: task syz-executor.1:7491 blocked for more than 145 seconds.
      Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.1  state:D stack:27536 pid:7491  tgid:7490  ppid:6204   flags:0x00004006
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5400 [inline]
 __schedule+0x17d1/0x49f0 kernel/sched/core.c:6727
 __schedule_loop kernel/sched/core.c:6802 [inline]
 schedule+0x149/0x260 kernel/sched/core.c:6817
 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6874
 __mutex_lock_common kernel/locking/mutex.c:684 [inline]
 __mutex_lock+0x6a3/0xd70 kernel/locking/mutex.c:752
 genl_lock net/netlink/genetlink.c:33 [inline]
 genl_op_lock net/netlink/genetlink.c:58 [inline]
 genl_rcv_msg+0x121/0xec0 net/netlink/genetlink.c:1207
 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1217
 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
 netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
 netlink_sendmsg+0x8e0/0xcb0 net/netlink/af_netlink.c:1905
 sock_sendmsg_nosec net/socket.c:730 [inline]
 __sock_sendmsg+0x221/0x270 net/socket.c:745
 __sys_sendto+0x3a4/0x4f0 net/socket.c:2191
 __do_sys_sendto net/socket.c:2203 [inline]
 __se_sys_sendto net/socket.c:2199 [inline]
 __x64_sys_sendto+0xde/0x100 net/socket.c:2199
 do_syscall_64+0xf9/0x240
 entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7f13f947fa9c
RSP: 002b:00007f13fa1eef60 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f13fa1ef050 RCX: 00007f13f947fa9c
RDX: 0000000000000020 RSI: 00007f13fa1ef0a0 RDI: 0000000000000004
RBP: 0000000000000000 R08: 00007f13fa1eefb4 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000004
R13: 00007f13fa1ef008 R14: 00007f13fa1ef0a0 R15: 0000000000000000
 </TASK>

Showing all locks held in the system:
2 locks held by kworker/u4:1/12:
1 lock held by khungtaskd/29:
 #0: ffffffff8e130be0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
 #0: ffffffff8e130be0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
 #0: ffffffff8e130be0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x55/0x2a0 kernel/locking/lockdep.c:6614
2 locks held by getty/4820:
 #0: ffff888029f0d0a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
 #1: ffffc90002f062f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x6b4/0x1e10 drivers/tty/n_tty.c:2201
5 locks held by kworker/u4:7/23559:
 #0: ffff888015ea4938 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:2608 [inline]
 #0: ffff888015ea4938 ((wq_completion)netns){+.+.}-{0:0}, at: process_scheduled_works+0x825/0x1420 kernel/workqueue.c:2706
 #1: ffffc90012b8fd20 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:2608 [inline]
 #1: ffffc90012b8fd20 (net_cleanup_work){+.+.}-{0:0}, at: process_scheduled_works+0x825/0x1420 kernel/workqueue.c:2706
 #2: ffffffff8f36d250 (pernet_ops_rwsem){++++}-{3:3}, at: cleanup_net+0x16a/0xcc0 net/core/net_namespace.c:591
 #3: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: cleanup_net+0x6af/0xcc0 net/core/net_namespace.c:627
 #4: ffffffff8e136440 (rcu_state.barrier_mutex){+.+.}-{3:3}, at: rcu_barrier+0x4c/0x550 kernel/rcu/tree.c:4064
1 lock held by syz-executor.3/895:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: tun_detach drivers/net/tun.c:698 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: tun_chr_close+0x3e/0x1b0 drivers/net/tun.c:3498
1 lock held by syz-executor.3/2766:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: tun_detach drivers/net/tun.c:698 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: tun_chr_close+0x3e/0x1b0 drivers/net/tun.c:3498
1 lock held by syz-executor.3/4747:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: tun_detach drivers/net/tun.c:698 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: tun_chr_close+0x3e/0x1b0 drivers/net/tun.c:3498
7 locks held by syz-executor.3/6975:
 #0: ffff888029d6a420 (sb_writers#8){.+.+}-{0:0}, at: file_start_write include/linux/fs.h:2794 [inline]
 #0: ffff888029d6a420 (sb_writers#8){.+.+}-{0:0}, at: vfs_write+0x233/0xcb0 fs/read_write.c:586
 #1: ffff88809c7ab888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x1eb/0x500 fs/kernfs/file.c:325
 #2: ffff888021caba00 (kn->active#51){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x20f/0x500 fs/kernfs/file.c:326
 #3: ffffffff8ecb8508 (nsim_bus_dev_list_lock){+.+.}-{3:3}, at: new_device_store+0x1b4/0x890 drivers/net/netdevsim/bus.c:166
 #4: ffff88809acf50e8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:990 [inline]
 #4: ffff88809acf50e8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x8e/0x520 drivers/base/dd.c:1005
 #5: ffff88809acf0250 (&devlink->lock_key#7){+.+.}-{3:3}, at: nsim_drv_probe+0xcb/0xb80 drivers/net/netdevsim/dev.c:1534
 #6: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: register_nexthop_notifier+0x84/0x290 net/ipv4/nexthop.c:3863
1 lock held by syz-executor.2/7486:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
1 lock held by syz-executor.0/7485:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
4 locks held by syz-executor.4/7489:
 #0: ffffffff8f3de3b0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1216
 #1: ffff88808f3dd678 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}, at: __netlink_dump_start+0x119/0x780 net/netlink/af_netlink.c:2350
 #2: ffffffff8f3de268 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline]
 #2: ffffffff8f3de268 (genl_mutex){+.+.}-{3:3}, at: genl_op_lock net/netlink/genetlink.c:58 [inline]
 #2: ffffffff8f3de268 (genl_mutex){+.+.}-{3:3}, at: genl_dumpit+0xd6/0x1a0 net/netlink/genetlink.c:1024
 #3: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: ieee802154_nl_fill_phy+0xf2/0x790 net/ieee802154/nl-phy.c:45
2 locks held by syz-executor.1/7491:
 #0: ffffffff8f3de3b0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1216
 #1: ffffffff8f3de268 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline]
 #1: ffffffff8f3de268 (genl_mutex){+.+.}-{3:3}, at: genl_op_lock net/netlink/genetlink.c:58 [inline]
 #1: ffffffff8f3de268 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x121/0xec0 net/netlink/genetlink.c:1207
3 locks held by kworker/u4:0/7496:
 #0: ffff88802998bd38 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:2608 [inline]
 #0: ffff88802998bd38 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_scheduled_works+0x825/0x1420 kernel/workqueue.c:2706
 #1: ffffc9000314fd20 ((work_completion)(&(&net->ipv6.addr_chk_work)->work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:2608 [inline]
 #1: ffffc9000314fd20 ((work_completion)(&(&net->ipv6.addr_chk_work)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x825/0x1420 kernel/workqueue.c:2706
 #2: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: addrconf_verify_work+0x19/0x30 net/ipv6/addrconf.c:4731
1 lock held by syz-executor.0/7498:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
1 lock held by syz-executor.4/7505:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
1 lock held by syz-executor.1/7506:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
1 lock held by syz-executor.2/7509:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
1 lock held by syz-executor.3/7516:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
1 lock held by syz-executor.0/7528:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
1 lock held by syz-executor.4/7533:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
1 lock held by syz-executor.1/7537:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
1 lock held by syz-executor.2/7539:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
1 lock held by syz-executor.3/7543:
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
 #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 29 Comm: khungtaskd Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
 nmi_cpu_backtrace+0x49c/0x4d0 lib/nmi_backtrace.c:113
 nmi_trigger_cpumask_backtrace+0x198/0x320 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:160 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:222 [inline]
 watchdog+0xfaf/0xff0 kernel/hung_task.c:379
 kthread+0x2ef/0x390 kernel/kthread.c:388
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:243
 </TASK>
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 12 Comm: kworker/u4:1 Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Workqueue: bat_events batadv_nc_worker
RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:89 [inline]
RIP: 0010:memory_is_nonzero mm/kasan/generic.c:104 [inline]
RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
RIP: 0010:kasan_check_range+0x8c/0x290 mm/kasan/generic.c:189
Code: 3c 31 4c 89 fd 4c 29 dd 48 83 fd 10 7f 29 48 85 ed 0f 84 3e 01 00 00 4c 89 cd 48 f7 d5 48 01 dd 41 80 3b 00 0f 85 c9 01 00 00 <49> ff c3 48 ff c5 75 ee e9 1e 01 00 00 45 89 dc 41 83 e4 07 0f 84
RSP: 0018:ffffc90000117860 EFLAGS: 00000046
RAX: 0000000000000001 RBX: 1ffffffff259469f RCX: ffffffff81713084
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff92ca34f8
RBP: ffffffffffffffff R08: ffffffff92ca34ff R09: 1ffffffff259469f
R10: dffffc0000000000 R11: fffffbfff259469f R12: ffff888016ead940
R13: ffff888016ead940 R14: dffffc0000000001 R15: fffffbfff25946a0
FS:  0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c00b564de0 CR3: 000000000df32000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <NMI>
 </NMI>
 <TASK>
 instrument_atomic_read include/linux/instrumented.h:68 [inline]
 _test_bit include/asm-generic/bitops/instrumented-non-atomic.h:141 [inline]
 hlock_class kernel/locking/lockdep.c:228 [inline]
 __lock_acquire+0x1264/0x1fd0 kernel/locking/lockdep.c:5134
 lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
 _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
 spin_lock_bh include/linux/spinlock.h:356 [inline]
 batadv_nc_purge_paths+0xe8/0x3b0 net/batman-adv/network-coding.c:442
 batadv_nc_worker+0x365/0x610 net/batman-adv/network-coding.c:722
 process_one_work kernel/workqueue.c:2633 [inline]
 process_scheduled_works+0x913/0x1420 kernel/workqueue.c:2706
 worker_thread+0xa5f/0x1000 kernel/workqueue.c:2787
 kthread+0x2ef/0x390 kernel/kthread.c:388
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:243
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

* Re: [syzbot] [net?] INFO: task hung in register_nexthop_notifier (3)
  2024-03-18 11:26 [syzbot] [net?] INFO: task hung in register_nexthop_notifier (3) syzbot
@ 2024-03-18 14:46 ` Eric Dumazet
  2024-03-21  9:22   ` Antoine Tenart
  2024-06-20 13:39 ` Tetsuo Handa
  1 sibling, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2024-03-18 14:46 UTC
  To: syzbot; +Cc: davem, dsahern, kuba, linux-kernel, netdev, pabeni,
	syzkaller-bugs

On Mon, Mar 18, 2024 at 12:26 PM syzbot
<syzbot+99b8125966713aa4b0c3@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    76839e2f1fde net/packet: Add getsockopt support for PACKET..
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=10c38e3e180000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=63afafeedf00ef8f
> dashboard link: https://syzkaller.appspot.com/bug?extid=99b8125966713aa4b0c3
> compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/1fdcc43b8d91/disk-76839e2f.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/7710da23cc3a/vmlinux-76839e2f.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/2b8715f33de4/bzImage-76839e2f.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+99b8125966713aa4b0c3@syzkaller.appspotmail.com
>
> INFO: task syz-executor.3:6975 blocked for more than 143 seconds.
>       Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz-executor.3  state:D stack:20920 pid:6975  tgid:6975  ppid:1      flags:0x00004006
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5400 [inline]
>  __schedule+0x17d1/0x49f0 kernel/sched/core.c:6727
>  __schedule_loop kernel/sched/core.c:6802 [inline]
>  schedule+0x149/0x260 kernel/sched/core.c:6817
>  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6874
>  __mutex_lock_common kernel/locking/mutex.c:684 [inline]
>  __mutex_lock+0x6a3/0xd70 kernel/locking/mutex.c:752
>  register_nexthop_notifier+0x84/0x290 net/ipv4/nexthop.c:3863
>  nsim_fib_create+0x8a6/0xa70 drivers/net/netdevsim/fib.c:1587
>  nsim_drv_probe+0x747/0xb80 drivers/net/netdevsim/dev.c:1582
>  really_probe+0x29e/0xc50 drivers/base/dd.c:658
>  __driver_probe_device+0x1a2/0x3e0 drivers/base/dd.c:800
>  driver_probe_device+0x50/0x430 drivers/base/dd.c:830
>  __device_attach_driver+0x2d6/0x530 drivers/base/dd.c:958
>  bus_for_each_drv+0x24e/0x2e0 drivers/base/bus.c:457
>  __device_attach+0x333/0x520 drivers/base/dd.c:1030
>  bus_probe_device+0x189/0x260 drivers/base/bus.c:532
>  device_add+0x8ff/0xca0 drivers/base/core.c:3639
>  nsim_bus_dev_new drivers/net/netdevsim/bus.c:442 [inline]
>  new_device_store+0x3f2/0x890 drivers/net/netdevsim/bus.c:173
>  kernfs_fop_write_iter+0x3a4/0x500 fs/kernfs/file.c:334

So we have a sysfs handler ultimately calling register_nexthop_notifier() or any
other network control path requiring RTNL.

Note that we have rtnl_trylock() for a reason...

Or maybe the reason is wrong, if we could change kernfs_fop_write_iter()
to no longer hold a mutex...
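
For readers unfamiliar with the trylock idiom mentioned above, here is a
minimal userspace sketch of it. It is only an analogy under stated
assumptions, not kernel code and not a proposed fix: fake_rtnl, fake_of_mutex,
FAKE_RESTART, store_handler() and fake_write() are all hypothetical names, and
pthread mutexes stand in for rtnl_mutex and the per-file mutex taken in the
kernfs write path. The handler never sleeps on the global lock while the
caller's mutex is held; if the lock is busy it backs out and asks the caller
to retry, which is roughly what "if (!rtnl_trylock()) return
restart_syscall();" does in existing net sysfs handlers.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the kernel's global rtnl_mutex. */
pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;
/* Hypothetical stand-in for the per-open-file mutex held by the write path. */
pthread_mutex_t fake_of_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Illustrative "please retry the call" code; the value is an assumption. */
#define FAKE_RESTART (-1)

/* Counts how many times the protected work actually ran. */
int configured;

/* Handler that needs the global lock but is called with fake_of_mutex held. */
int store_handler(void)
{
    int err;

    /* Do not block here: back out so the caller can drop its own mutex. */
    if (pthread_mutex_trylock(&fake_rtnl) != 0)
        return FAKE_RESTART;

    configured++;                       /* work done under the global lock */

    err = pthread_mutex_unlock(&fake_rtnl);
    assert(err == 0);
    return 0;
}

/* Write path: takes the per-file mutex, calls the handler, then releases. */
int fake_write(void)
{
    int ret;

    pthread_mutex_lock(&fake_of_mutex);
    ret = store_handler();
    pthread_mutex_unlock(&fake_of_mutex);
    return ret;
}
```

A caller would loop on fake_write() while it returns FAKE_RESTART. The point
of the design is that the per-file mutex is released between attempts, so a
long hold of the global lock elsewhere cannot pin it indefinitely.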

>  call_write_iter include/linux/fs.h:2087 [inline]
>  new_sync_write fs/read_write.c:497 [inline]
>  vfs_write+0xa81/0xcb0 fs/read_write.c:590
>  ksys_write+0x1a0/0x2c0 fs/read_write.c:643
>  do_syscall_64+0xf9/0x240
>  entry_SYSCALL_64_after_hwframe+0x6f/0x77
> RIP: 0033:0x7f12ecc7caef
> RSP: 002b:00007ffd6d32df00 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f12ecc7caef
> RDX: 0000000000000003 RSI: 00007ffd6d32df50 RDI: 0000000000000005
> RBP: 00007f12eccca045 R08: 0000000000000000 R09: 00007ffd6d32dd57
> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
> R13: 00007ffd6d32df50 R14: 00007f12ed8d4620 R15: 0000000000000003
>  </TASK>
> INFO: task syz-executor.2:7486 blocked for more than 143 seconds.
>       Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz-executor.2  state:D stack:26768 pid:7486  tgid:7483  ppid:5094   flags:0x00004006
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5400 [inline]
>  __schedule+0x17d1/0x49f0 kernel/sched/core.c:6727
>  __schedule_loop kernel/sched/core.c:6802 [inline]
>  schedule+0x149/0x260 kernel/sched/core.c:6817
>  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6874
>  __mutex_lock_common kernel/locking/mutex.c:684 [inline]
>  __mutex_lock+0x6a3/0xd70 kernel/locking/mutex.c:752
>  rtnl_lock net/core/rtnetlink.c:79 [inline]
>  rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
>  netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
>  netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
>  netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
>  netlink_sendmsg+0x8e0/0xcb0 net/netlink/af_netlink.c:1905
>  sock_sendmsg_nosec net/socket.c:730 [inline]
>  __sock_sendmsg+0x221/0x270 net/socket.c:745
>  ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
>  ___sys_sendmsg net/socket.c:2638 [inline]
>  __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
>  do_syscall_64+0xf9/0x240
>  entry_SYSCALL_64_after_hwframe+0x6f/0x77
> RIP: 0033:0x7f406d87dda9
> RSP: 002b:00007f406e5dd0c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 00007f406d9abf80 RCX: 00007f406d87dda9
> RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
> RBP: 00007f406d8ca47a R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 000000000000000b R14: 00007f406d9abf80 R15: 00007ffe1ff0e4a8
>  </TASK>
> INFO: task syz-executor.0:7485 blocked for more than 144 seconds.
>       Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz-executor.0  state:D stack:26768 pid:7485  tgid:7484  ppid:5093   flags:0x00004006
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5400 [inline]
>  __schedule+0x17d1/0x49f0 kernel/sched/core.c:6727
>  __schedule_loop kernel/sched/core.c:6802 [inline]
>  schedule+0x149/0x260 kernel/sched/core.c:6817
>  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6874
>  __mutex_lock_common kernel/locking/mutex.c:684 [inline]
>  __mutex_lock+0x6a3/0xd70 kernel/locking/mutex.c:752
>  rtnl_lock net/core/rtnetlink.c:79 [inline]
>  rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
>  netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
>  netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
>  netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
>  netlink_sendmsg+0x8e0/0xcb0 net/netlink/af_netlink.c:1905
>  sock_sendmsg_nosec net/socket.c:730 [inline]
>  __sock_sendmsg+0x221/0x270 net/socket.c:745
>  ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
>  ___sys_sendmsg net/socket.c:2638 [inline]
>  __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
>  do_syscall_64+0xf9/0x240
>  entry_SYSCALL_64_after_hwframe+0x6f/0x77
> RIP: 0033:0x7f7143c7dda9
> RSP: 002b:00007f714491f0c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 00007f7143dabf80 RCX: 00007f7143c7dda9
> RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
> RBP: 00007f7143cca47a R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 000000000000000b R14: 00007f7143dabf80 R15: 00007ffe6918ff78
>  </TASK>
> INFO: task syz-executor.4:7489 blocked for more than 144 seconds.
>       Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz-executor.4  state:D stack:25520 pid:7489  tgid:7488  ppid:5107   flags:0x00004006
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5400 [inline]
>  __schedule+0x17d1/0x49f0 kernel/sched/core.c:6727
>  __schedule_loop kernel/sched/core.c:6802 [inline]
>  schedule+0x149/0x260 kernel/sched/core.c:6817
>  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6874
>  __mutex_lock_common kernel/locking/mutex.c:684 [inline]
>  __mutex_lock+0x6a3/0xd70 kernel/locking/mutex.c:752
>  ieee802154_nl_fill_phy+0xf2/0x790 net/ieee802154/nl-phy.c:45
>  ieee802154_dump_phy_iter+0x110/0x200 net/ieee802154/nl-phy.c:129
>  class_for_each_device+0x262/0x380 drivers/base/class.c:404
>  wpan_phy_for_each+0xbf/0x100 net/ieee802154/core.c:62
>  ieee802154_dump_phy+0x160/0x290 net/ieee802154/nl-phy.c:154
>  genl_dumpit+0x107/0x1a0 net/netlink/genetlink.c:1025
>  netlink_dump+0x6ff/0xe50 net/netlink/af_netlink.c:2269
>  __netlink_dump_start+0x59d/0x780 net/netlink/af_netlink.c:2386
>  genl_family_rcv_msg_dumpit net/netlink/genetlink.c:1074 [inline]
>  genl_family_rcv_msg net/netlink/genetlink.c:1190 [inline]
>  genl_rcv_msg+0x88c/0xec0 net/netlink/genetlink.c:1208
>  netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
>  genl_rcv+0x28/0x40 net/netlink/genetlink.c:1217
>  netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
>  netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
>  netlink_sendmsg+0x8e0/0xcb0 net/netlink/af_netlink.c:1905
>  sock_sendmsg_nosec net/socket.c:730 [inline]
>  __sock_sendmsg+0x221/0x270 net/socket.c:745
>  ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584
>  ___sys_sendmsg net/socket.c:2638 [inline]
>  __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667
>  do_syscall_64+0xf9/0x240
>  entry_SYSCALL_64_after_hwframe+0x6f/0x77
> RIP: 0033:0x7fdbb3e7dda9
> RSP: 002b:00007fdbb4b300c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 00007fdbb3fabf80 RCX: 00007fdbb3e7dda9
> RDX: 0000000000000000 RSI: 00000000200001c0 RDI: 0000000000000004
> RBP: 00007fdbb3eca47a R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 000000000000000b R14: 00007fdbb3fabf80 R15: 00007ffc0809fb68
>  </TASK>
> INFO: task syz-executor.1:7491 blocked for more than 145 seconds.
>       Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz-executor.1  state:D stack:27536 pid:7491  tgid:7490  ppid:6204   flags:0x00004006
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5400 [inline]
>  __schedule+0x17d1/0x49f0 kernel/sched/core.c:6727
>  __schedule_loop kernel/sched/core.c:6802 [inline]
>  schedule+0x149/0x260 kernel/sched/core.c:6817
>  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6874
>  __mutex_lock_common kernel/locking/mutex.c:684 [inline]
>  __mutex_lock+0x6a3/0xd70 kernel/locking/mutex.c:752
>  genl_lock net/netlink/genetlink.c:33 [inline]
>  genl_op_lock net/netlink/genetlink.c:58 [inline]
>  genl_rcv_msg+0x121/0xec0 net/netlink/genetlink.c:1207
>  netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2559
>  genl_rcv+0x28/0x40 net/netlink/genetlink.c:1217
>  netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline]
>  netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361
>  netlink_sendmsg+0x8e0/0xcb0 net/netlink/af_netlink.c:1905
>  sock_sendmsg_nosec net/socket.c:730 [inline]
>  __sock_sendmsg+0x221/0x270 net/socket.c:745
>  __sys_sendto+0x3a4/0x4f0 net/socket.c:2191
>  __do_sys_sendto net/socket.c:2203 [inline]
>  __se_sys_sendto net/socket.c:2199 [inline]
>  __x64_sys_sendto+0xde/0x100 net/socket.c:2199
>  do_syscall_64+0xf9/0x240
>  entry_SYSCALL_64_after_hwframe+0x6f/0x77
> RIP: 0033:0x7f13f947fa9c
> RSP: 002b:00007f13fa1eef60 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 00007f13fa1ef050 RCX: 00007f13f947fa9c
> RDX: 0000000000000020 RSI: 00007f13fa1ef0a0 RDI: 0000000000000004
> RBP: 0000000000000000 R08: 00007f13fa1eefb4 R09: 000000000000000c
> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000004
> R13: 00007f13fa1ef008 R14: 00007f13fa1ef0a0 R15: 0000000000000000
>  </TASK>
>
> Showing all locks held in the system:
> 2 locks held by kworker/u4:1/12:
> 1 lock held by khungtaskd/29:
>  #0: ffffffff8e130be0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
>  #0: ffffffff8e130be0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
>  #0: ffffffff8e130be0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x55/0x2a0 kernel/locking/lockdep.c:6614
> 2 locks held by getty/4820:
>  #0: ffff888029f0d0a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
>  #1: ffffc90002f062f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x6b4/0x1e10 drivers/tty/n_tty.c:2201
> 5 locks held by kworker/u4:7/23559:
>  #0: ffff888015ea4938 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:2608 [inline]
>  #0: ffff888015ea4938 ((wq_completion)netns){+.+.}-{0:0}, at: process_scheduled_works+0x825/0x1420 kernel/workqueue.c:2706
>  #1: ffffc90012b8fd20 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:2608 [inline]
>  #1: ffffc90012b8fd20 (net_cleanup_work){+.+.}-{0:0}, at: process_scheduled_works+0x825/0x1420 kernel/workqueue.c:2706
>  #2: ffffffff8f36d250 (pernet_ops_rwsem){++++}-{3:3}, at: cleanup_net+0x16a/0xcc0 net/core/net_namespace.c:591
>  #3: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: cleanup_net+0x6af/0xcc0 net/core/net_namespace.c:627
>  #4: ffffffff8e136440 (rcu_state.barrier_mutex){+.+.}-{3:3}, at: rcu_barrier+0x4c/0x550 kernel/rcu/tree.c:4064
> 1 lock held by syz-executor.3/895:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: tun_detach drivers/net/tun.c:698 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: tun_chr_close+0x3e/0x1b0 drivers/net/tun.c:3498
> 1 lock held by syz-executor.3/2766:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: tun_detach drivers/net/tun.c:698 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: tun_chr_close+0x3e/0x1b0 drivers/net/tun.c:3498
> 1 lock held by syz-executor.3/4747:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: tun_detach drivers/net/tun.c:698 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: tun_chr_close+0x3e/0x1b0 drivers/net/tun.c:3498
> 7 locks held by syz-executor.3/6975:
>  #0: ffff888029d6a420 (sb_writers#8){.+.+}-{0:0}, at: file_start_write include/linux/fs.h:2794 [inline]
>  #0: ffff888029d6a420 (sb_writers#8){.+.+}-{0:0}, at: vfs_write+0x233/0xcb0 fs/read_write.c:586
>  #1: ffff88809c7ab888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x1eb/0x500 fs/kernfs/file.c:325
>  #2: ffff888021caba00 (kn->active#51){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x20f/0x500 fs/kernfs/file.c:326
>  #3: ffffffff8ecb8508 (nsim_bus_dev_list_lock){+.+.}-{3:3}, at: new_device_store+0x1b4/0x890 drivers/net/netdevsim/bus.c:166
>  #4: ffff88809acf50e8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:990 [inline]
>  #4: ffff88809acf50e8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x8e/0x520 drivers/base/dd.c:1005
>  #5: ffff88809acf0250 (&devlink->lock_key#7){+.+.}-{3:3}, at: nsim_drv_probe+0xcb/0xb80 drivers/net/netdevsim/dev.c:1534
>  #6: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: register_nexthop_notifier+0x84/0x290 net/ipv4/nexthop.c:3863
> 1 lock held by syz-executor.2/7486:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
> 1 lock held by syz-executor.0/7485:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
> 4 locks held by syz-executor.4/7489:
>  #0: ffffffff8f3de3b0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1216
>  #1: ffff88808f3dd678 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}, at: __netlink_dump_start+0x119/0x780 net/netlink/af_netlink.c:2350
>  #2: ffffffff8f3de268 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline]
>  #2: ffffffff8f3de268 (genl_mutex){+.+.}-{3:3}, at: genl_op_lock net/netlink/genetlink.c:58 [inline]
>  #2: ffffffff8f3de268 (genl_mutex){+.+.}-{3:3}, at: genl_dumpit+0xd6/0x1a0 net/netlink/genetlink.c:1024
>  #3: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: ieee802154_nl_fill_phy+0xf2/0x790 net/ieee802154/nl-phy.c:45
> 2 locks held by syz-executor.1/7491:
>  #0: ffffffff8f3de3b0 (cb_lock){++++}-{3:3}, at: genl_rcv+0x19/0x40 net/netlink/genetlink.c:1216
>  #1: ffffffff8f3de268 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline]
>  #1: ffffffff8f3de268 (genl_mutex){+.+.}-{3:3}, at: genl_op_lock net/netlink/genetlink.c:58 [inline]
>  #1: ffffffff8f3de268 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x121/0xec0 net/netlink/genetlink.c:1207
> 3 locks held by kworker/u4:0/7496:
>  #0: ffff88802998bd38 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:2608 [inline]
>  #0: ffff88802998bd38 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_scheduled_works+0x825/0x1420 kernel/workqueue.c:2706
>  #1: ffffc9000314fd20 ((work_completion)(&(&net->ipv6.addr_chk_work)->work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:2608 [inline]
>  #1: ffffc9000314fd20 ((work_completion)(&(&net->ipv6.addr_chk_work)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x825/0x1420 kernel/workqueue.c:2706
>  #2: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: addrconf_verify_work+0x19/0x30 net/ipv6/addrconf.c:4731
> 1 lock held by syz-executor.0/7498:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
> 1 lock held by syz-executor.4/7505:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
> 1 lock held by syz-executor.1/7506:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
> 1 lock held by syz-executor.2/7509:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
> 1 lock held by syz-executor.3/7516:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
> 1 lock held by syz-executor.0/7528:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
> 1 lock held by syz-executor.4/7533:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
> 1 lock held by syz-executor.1/7537:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
> 1 lock held by syz-executor.2/7539:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
> 1 lock held by syz-executor.3/7543:
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock net/core/rtnetlink.c:79 [inline]
>  #0: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x842/0x10d0 net/core/rtnetlink.c:6592
>
> =============================================
>
> NMI backtrace for cpu 0
> CPU: 0 PID: 29 Comm: khungtaskd Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
>  nmi_cpu_backtrace+0x49c/0x4d0 lib/nmi_backtrace.c:113
>  nmi_trigger_cpumask_backtrace+0x198/0x320 lib/nmi_backtrace.c:62
>  trigger_all_cpu_backtrace include/linux/nmi.h:160 [inline]
>  check_hung_uninterruptible_tasks kernel/hung_task.c:222 [inline]
>  watchdog+0xfaf/0xff0 kernel/hung_task.c:379
>  kthread+0x2ef/0x390 kernel/kthread.c:388
>  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
>  ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:243
>  </TASK>
> Sending NMI from CPU 0 to CPUs 1:
> NMI backtrace for cpu 1
> CPU: 1 PID: 12 Comm: kworker/u4:1 Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
> Workqueue: bat_events batadv_nc_worker
> RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:89 [inline]
> RIP: 0010:memory_is_nonzero mm/kasan/generic.c:104 [inline]
> RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:129 [inline]
> RIP: 0010:memory_is_poisoned mm/kasan/generic.c:161 [inline]
> RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
> RIP: 0010:kasan_check_range+0x8c/0x290 mm/kasan/generic.c:189
> Code: 3c 31 4c 89 fd 4c 29 dd 48 83 fd 10 7f 29 48 85 ed 0f 84 3e 01 00 00 4c 89 cd 48 f7 d5 48 01 dd 41 80 3b 00 0f 85 c9 01 00 00 <49> ff c3 48 ff c5 75 ee e9 1e 01 00 00 45 89 dc 41 83 e4 07 0f 84
> RSP: 0018:ffffc90000117860 EFLAGS: 00000046
> RAX: 0000000000000001 RBX: 1ffffffff259469f RCX: ffffffff81713084
> RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff92ca34f8
> RBP: ffffffffffffffff R08: ffffffff92ca34ff R09: 1ffffffff259469f
> R10: dffffc0000000000 R11: fffffbfff259469f R12: ffff888016ead940
> R13: ffff888016ead940 R14: dffffc0000000001 R15: fffffbfff25946a0
> FS:  0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000000c00b564de0 CR3: 000000000df32000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <NMI>
>  </NMI>
>  <TASK>
>  instrument_atomic_read include/linux/instrumented.h:68 [inline]
>  _test_bit include/asm-generic/bitops/instrumented-non-atomic.h:141 [inline]
>  hlock_class kernel/locking/lockdep.c:228 [inline]
>  __lock_acquire+0x1264/0x1fd0 kernel/locking/lockdep.c:5134
>  lock_acquire+0x1e3/0x530 kernel/locking/lockdep.c:5754
>  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
>  _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
>  spin_lock_bh include/linux/spinlock.h:356 [inline]
>  batadv_nc_purge_paths+0xe8/0x3b0 net/batman-adv/network-coding.c:442
>  batadv_nc_worker+0x365/0x610 net/batman-adv/network-coding.c:722
>  process_one_work kernel/workqueue.c:2633 [inline]
>  process_scheduled_works+0x913/0x1420 kernel/workqueue.c:2706
>  worker_thread+0xa5f/0x1000 kernel/workqueue.c:2787
>  kthread+0x2ef/0x390 kernel/kthread.c:388
>  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
>  ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:243
>  </TASK>
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup

* Re: [syzbot] [net?] INFO: task hung in register_nexthop_notifier (3)
  2024-03-18 14:46 ` Eric Dumazet
@ 2024-03-21  9:22   ` Antoine Tenart
  2024-03-21 22:20     ` Hillf Danton
  0 siblings, 1 reply; 5+ messages in thread
From: Antoine Tenart @ 2024-03-21  9:22 UTC (permalink / raw
  To: Eric Dumazet, syzbot
  Cc: davem, dsahern, kuba, linux-kernel, netdev, pabeni,
	syzkaller-bugs

Quoting Eric Dumazet (2024-03-18 15:46:37)
> On Mon, Mar 18, 2024 at 12:26 PM syzbot
> <syzbot+99b8125966713aa4b0c3@syzkaller.appspotmail.com> wrote:
> >
> > INFO: task syz-executor.3:6975 blocked for more than 143 seconds.
> >       Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > task:syz-executor.3  state:D stack:20920 pid:6975  tgid:6975  ppid:1      flags:0x00004006
> > Call Trace:
> >  <TASK>
> >  context_switch kernel/sched/core.c:5400 [inline]
> >  __schedule+0x17d1/0x49f0 kernel/sched/core.c:6727
> >  __schedule_loop kernel/sched/core.c:6802 [inline]
> >  schedule+0x149/0x260 kernel/sched/core.c:6817
> >  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6874
> >  __mutex_lock_common kernel/locking/mutex.c:684 [inline]
> >  __mutex_lock+0x6a3/0xd70 kernel/locking/mutex.c:752
> >  register_nexthop_notifier+0x84/0x290 net/ipv4/nexthop.c:3863
> >  nsim_fib_create+0x8a6/0xa70 drivers/net/netdevsim/fib.c:1587
> >  nsim_drv_probe+0x747/0xb80 drivers/net/netdevsim/dev.c:1582
> >  really_probe+0x29e/0xc50 drivers/base/dd.c:658
> >  __driver_probe_device+0x1a2/0x3e0 drivers/base/dd.c:800
> >  driver_probe_device+0x50/0x430 drivers/base/dd.c:830
> >  __device_attach_driver+0x2d6/0x530 drivers/base/dd.c:958
> >  bus_for_each_drv+0x24e/0x2e0 drivers/base/bus.c:457
> >  __device_attach+0x333/0x520 drivers/base/dd.c:1030
> >  bus_probe_device+0x189/0x260 drivers/base/bus.c:532
> >  device_add+0x8ff/0xca0 drivers/base/core.c:3639
> >  nsim_bus_dev_new drivers/net/netdevsim/bus.c:442 [inline]
> >  new_device_store+0x3f2/0x890 drivers/net/netdevsim/bus.c:173
> >  kernfs_fop_write_iter+0x3a4/0x500 fs/kernfs/file.c:334
> 
> So we have a sysfs handler ultimately calling register_nexthop_notifier() or any
> other network control path requiring RTNL.
> 
> Note that we have rtnl_trylock() for a reason...

Mentioning the below in case that gives some ideas; feel free to
disregard.

When I looked at similar issues a while ago, the rtnl deadlock actually
happened with the kernfs_node refcount; I haven't looked at this one in
detail though. The mutex in there was just preventing concurrent
writers.

> Or maybe the reason is wrong, if we could change kernfs_fop_write_iter()
> to no longer hold a mutex...

At the time I found a way to safely drop the refcount of those
kernfs_node, which then allowed calling rtnl_lock from sysfs handlers:
https://lore.kernel.org/all/20231018154804.420823-1-atenart@kernel.org/T/

Note that this relied on how net devices are unregistered (calling
device_del under rtnl and later waiting for refs on the netdev to drop
outside of the lock; and a few other things), so extra modifications
would be needed to generalize the approach. Also it's a tradeoff between
fixing those deadlocks without rtnl_trylock and maintaining quite
complex logic...
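
For anyone less familiar with the rtnl_trylock() convention mentioned
above, here is a minimal userspace sketch of the back-off idea; it is not
kernel code. In net sysfs handlers the usual form is
"if (!rtnl_trylock()) return restart_syscall();". Everything below
(big_lock, active_refs, store_handler(), write_attr(), HANDLER_RESTART)
is a hypothetical stand-in built on C11 atomics, assuming one global lock
and one reference count:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for rtnl_mutex: one global lock. */
static atomic_flag big_lock = ATOMIC_FLAG_INIT;

/* Hypothetical stand-in for the kernfs_node active reference count. */
static atomic_int active_refs;

enum { HANDLER_OK = 0, HANDLER_RESTART = -1 };

/* Non-blocking acquire, analogous in spirit to rtnl_trylock(). */
static bool big_trylock(void)
{
    return !atomic_flag_test_and_set_explicit(&big_lock,
                                              memory_order_acquire);
}

static void big_unlock(void)
{
    atomic_flag_clear_explicit(&big_lock, memory_order_release);
}

/*
 * Attribute write handler. The caller holds an active reference while
 * this runs, so the handler must not sleep on the global lock: a
 * remover may hold that lock while waiting for references to drain.
 */
static int store_handler(int *value, int new_value)
{
    if (!big_trylock())
        return HANDLER_RESTART; /* back off; the caller retries later */
    *value = new_value;
    big_unlock();
    return HANDLER_OK;
}

/* Write path: take a reference, run the handler, drop the reference. */
static int write_attr(int *value, int new_value)
{
    int ret;

    atomic_fetch_add(&active_refs, 1);
    ret = store_handler(value, new_value);
    atomic_fetch_sub(&active_refs, 1);
    return ret;
}

/* Single-threaded walk-through of a contended write followed by a retry. */
int demo_contended_write(void)
{
    int value = 0;
    int first, second;

    if (!big_trylock())             /* the "remover" takes the lock first */
        return -1;
    first = write_attr(&value, 1);  /* handler backs off */
    big_unlock();
    second = write_attr(&value, 1); /* retry succeeds */

    assert(first == HANDLER_RESTART);
    assert(second == HANDLER_OK);
    assert(value == 1);
    return 0;
}
```

The handler never sleeps on the lock while its caller still holds the
reference, so a remover that holds the lock and waits for references to
drain can always make progress. The cost is that contended writes have to
be retried.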

Antoine

* Re: [syzbot] [net?] INFO: task hung in register_nexthop_notifier (3)
  2024-03-21  9:22   ` Antoine Tenart
@ 2024-03-21 22:20     ` Hillf Danton
  0 siblings, 0 replies; 5+ messages in thread
From: Hillf Danton @ 2024-03-21 22:20 UTC (permalink / raw
  To: Antoine Tenart
  Cc: linux-kernel, netdev, pabeni, syzkaller-bugs, Eric Dumazet,
	syzbot

On Thu, 21 Mar 2024 10:22:25 +0100 Antoine Tenart <atenart@kernel.org>
> Quoting Eric Dumazet (2024-03-18 15:46:37)
> > On Mon, Mar 18, 2024 at 12:26 PM syzbot
> > <syzbot+99b8125966713aa4b0c3@syzkaller.appspotmail.com> wrote:
> > >
> > > INFO: task syz-executor.3:6975 blocked for more than 143 seconds.
> > >       Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > task:syz-executor.3  state:D stack:20920 pid:6975  tgid:6975  ppid:1      flags:0x00004006
> > > Call Trace:
> > >  <TASK>
> > >  context_switch kernel/sched/core.c:5400 [inline]
> > >  __schedule+0x17d1/0x49f0 kernel/sched/core.c:6727
> > >  __schedule_loop kernel/sched/core.c:6802 [inline]
> > >  schedule+0x149/0x260 kernel/sched/core.c:6817
> > >  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6874
> > >  __mutex_lock_common kernel/locking/mutex.c:684 [inline]
> > >  __mutex_lock+0x6a3/0xd70 kernel/locking/mutex.c:752
> > >  register_nexthop_notifier+0x84/0x290 net/ipv4/nexthop.c:3863
> > >  nsim_fib_create+0x8a6/0xa70 drivers/net/netdevsim/fib.c:1587
> > >  nsim_drv_probe+0x747/0xb80 drivers/net/netdevsim/dev.c:1582
> > >  really_probe+0x29e/0xc50 drivers/base/dd.c:658
> > >  __driver_probe_device+0x1a2/0x3e0 drivers/base/dd.c:800
> > >  driver_probe_device+0x50/0x430 drivers/base/dd.c:830
> > >  __device_attach_driver+0x2d6/0x530 drivers/base/dd.c:958
> > >  bus_for_each_drv+0x24e/0x2e0 drivers/base/bus.c:457
> > >  __device_attach+0x333/0x520 drivers/base/dd.c:1030
> > >  bus_probe_device+0x189/0x260 drivers/base/bus.c:532
> > >  device_add+0x8ff/0xca0 drivers/base/core.c:3639
> > >  nsim_bus_dev_new drivers/net/netdevsim/bus.c:442 [inline]
> > >  new_device_store+0x3f2/0x890 drivers/net/netdevsim/bus.c:173
> > >  kernfs_fop_write_iter+0x3a4/0x500 fs/kernfs/file.c:334
> >
> > So we have a sysfs handler ultimately calling register_nexthop_notifier() or any
> > other network control path requiring RTNL.
> >
> > Note that we have rtnl_trylock() for a reason...
> 
> Mentioning the below in case that gives some ideas; feel free to
> disregard.
> 
> When I looked at similar issues a while ago, the rtnl deadlock actually
> happened with the kernfs_node refcount; I haven't looked at this one in
> detail though. The mutex in there was just preventing concurrent
> writers.
> 
> > Or maybe the reason is wrong, if we could change kernfs_fop_write_iter()
> > to no longer hold a mutex...

Better to look at that after working out why RCU stalled [1]

5 locks held by kworker/u4:7/23559:
 #0: ffff888015ea4938 ((wq_completion)netns){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:2608 [inline]
 #0: ffff888015ea4938 ((wq_completion)netns){+.+.}-{0:0}, at: process_scheduled_works+0x825/0x1420 kernel/workqueue.c:2706
 #1: ffffc90012b8fd20 (net_cleanup_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:2608 [inline]
 #1: ffffc90012b8fd20 (net_cleanup_work){+.+.}-{0:0}, at: process_scheduled_works+0x825/0x1420 kernel/workqueue.c:2706
 #2: ffffffff8f36d250 (pernet_ops_rwsem){++++}-{3:3}, at: cleanup_net+0x16a/0xcc0 net/core/net_namespace.c:591
 #3: ffffffff8f3798c8 (rtnl_mutex){+.+.}-{3:3}, at: cleanup_net+0x6af/0xcc0 net/core/net_namespace.c:627
 #4: ffffffff8e136440 (rcu_state.barrier_mutex){+.+.}-{3:3}, at: rcu_barrier+0x4c/0x550 kernel/rcu/tree.c:4064

[1] https://lore.kernel.org/lkml/0000000000009485160613eda067@google.com/

> 
> At the time I found a way to safely drop the refcount of those
> kernfs_node, which then allowed calling rtnl_lock from sysfs handlers:
> https://lore.kernel.org/all/20231018154804.420823-1-atenart@kernel.org/T/
> 
> Note that this relied on how net devices are unregistered (calling
> device_del under rtnl and later waiting for refs on the netdev to drop
> outside of the lock; and a few other things), so extra modifications
> would be needed to generalize the approach. Also it's a tradeoff between
> fixing those deadlocks without rtnl_trylock and maintaining quite
> complex logic...
> 
> Antoine
> 

* Re: [syzbot] [net?] INFO: task hung in register_nexthop_notifier (3)
  2024-03-18 11:26 [syzbot] [net?] INFO: task hung in register_nexthop_notifier (3) syzbot
  2024-03-18 14:46 ` Eric Dumazet
@ 2024-06-20 13:39 ` Tetsuo Handa
  1 sibling, 0 replies; 5+ messages in thread
From: Tetsuo Handa @ 2024-06-20 13:39 UTC (permalink / raw
  To: syzbot, linux-kernel, syzkaller-bugs

#syz fix: net/sched: act_api: fix possible infinite loop in tcf_idr_check_alloc()


