LKML Archive mirror
 help / color / mirror / Atom feed
* [syzbot] [net?] unregister_netdevice: waiting for DEV to become free (8)
@ 2023-06-10  1:34 syzbot
       [not found] ` <20230621070710.380373-1-astrajoan@yahoo.com>
       [not found] ` <20230819081057.330728-1-astrajoan@yahoo.com>
  0 siblings, 2 replies; 4+ messages in thread
From: syzbot @ 2023-06-10  1:34 UTC (permalink / raw
  To: arnd, bridge, davem, edumazet, kuba, linux-kernel, netdev,
	nikolay, pabeni, roopa, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    67faabbde36b selftests/bpf: Add missing prototypes for sev..
git tree:       bpf-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=1381363b280000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5335204dcdecfda
dashboard link: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=132faf93280000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10532add280000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/751a0490d875/disk-67faabbd.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/2c5106cd9f1f/vmlinux-67faabbd.xz
kernel image: https://storage.googleapis.com/syzbot-assets/62c1154294e4/bzImage-67faabbd.xz

The issue was bisected to:

commit ad2f99aedf8fa77f3ae647153284fa63c43d3055
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Jul 27 13:45:16 2021 +0000

    net: bridge: move bridge ioctls out of .ndo_do_ioctl

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=146de6f1280000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=166de6f1280000
console output: https://syzkaller.appspot.com/x/log.txt?x=126de6f1280000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
Fixes: ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")

unregister_netdevice: waiting for bridge0 to become free. Usage count = 2
leaked reference.
 __netdev_tracker_alloc include/linux/netdevice.h:4070 [inline]
 netdev_hold include/linux/netdevice.h:4099 [inline]
 dev_ifsioc+0xbc0/0xeb0 net/core/dev_ioctl.c:408
 dev_ioctl+0x250/0x1090 net/core/dev_ioctl.c:605
 sock_do_ioctl+0x15a/0x230 net/socket.c:1215
 sock_ioctl+0x1f8/0x680 net/socket.c:1318
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:870 [inline]
 __se_sys_ioctl fs/ioctl.c:856 [inline]
 __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to change bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [syzbot] [net?] unregister_netdevice: waiting for DEV to become free (8)
       [not found] ` <20230621070710.380373-1-astrajoan@yahoo.com>
@ 2023-06-21  8:46   ` Dongliang Mu
  0 siblings, 0 replies; 4+ messages in thread
From: Dongliang Mu @ 2023-06-21  8:46 UTC (permalink / raw
  To: Ziqi Zhao
  Cc: syzbot+881d65229ca4f9ae8c84, arnd, bridge, davem, edumazet, kuba,
	linux-kernel, netdev, nikolay, pabeni, roopa, syzkaller-bugs,
	skhan, ivan.orlov0322

On Wed, Jun 21, 2023 at 3:38 PM 'Ziqi Zhao' via syzkaller-bugs
<syzkaller-bugs@googlegroups.com> wrote:
>
> Hi all,
>
> I'm taking a look at this bug as part of the exercice for the Linux
> Kernel Bug Fixing Summer 2023 program. Thanks to the help from my

This is an interesting program. There are many kernel crashes on the
syzbot dashboard, which needs help.

> mentor, Ivan Orlov and Shuah Khan, I've already obtained a reproduction
> of the issue using the provided C reproducer, and I should be able to
> submit a patch by the end of this week to fix the highlighted error. If
> you have any information or suggestions, please feel free to reply to
> this thread. Any help would be greatly appreciated!

Please carefully read the guidance of submitting patches to linux
kernel [1]. Be careful about your coding style before sending.

Note that, Syzbot has the feature: patch testing. You can upload and
test your own patch to confirm that your patch is working properly.

[1] https://docs.kernel.org/process/submitting-patches.html
>
> Best regards,
> Ziqi
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20230621070710.380373-1-astrajoan%40yahoo.com.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] net: bridge: Fix refcnt issues in dev_ioctl
       [not found] ` <20230819081057.330728-1-astrajoan@yahoo.com>
@ 2023-08-19  9:25   ` Nikolay Aleksandrov
       [not found]     ` <20230819225048.dxxzv47fo64g24qx@Astras-Ubuntu>
  0 siblings, 1 reply; 4+ messages in thread
From: Nikolay Aleksandrov @ 2023-08-19  9:25 UTC (permalink / raw
  To: Ziqi Zhao, arnd, bridge, davem, edumazet, f.fainelli,
	ivan.orlov0322, keescook, kuba, hkallweit1, mudongliangabcd,
	nikolay, pabeni, roopa, skhan, syzbot+881d65229ca4f9ae8c84,
	vladimir.oltean
  Cc: linux-kernel, netdev, syzkaller-bugs

Hi Ziqi,
On 8/19/23 11:10, Ziqi Zhao wrote:
> In the bug reported by Syzbot, certain bridge devices would have a
> leaked reference created by race conditions in dev_ioctl, specifically,
> under SIOCBRADDIF or SIOCBRDELIF operations. The reference leak would

How would it leak a reference, could you elaborate?
The reference is always taken and always released after the call.

> be shown in the periodic unregister_netdevice call, which throws a
> warning and cause Syzbot to report a crash. Upon inspection of the

If you reproduced it, is the device later removed or is it really stuck?

> logic in dev_ioctl, it seems the reference was introduced to ensure
> proper access to the bridge device after rtnl_unlock. and the latter
> function is necessary to maintain the following lock order in any
> bridge related ioctl calls:
> 
> 1) br_ioctl_mutex => 2) rtnl_lock
> 
> Conceptually, though, br_ioctl_mutex could be considered more specific
> than rtnl_lock given their usages, hence swapping their order would be
> a reasonable proposal. This patch changes all related call sites to
> maintain the reversed order of the two locks:
> 
> 1) rtnl_lock => 2) br_ioctl_mutex
> 
> By doing so, the extra reference introduced in dev_ioctl is no longer
> needed, and hence the reference leak bug is now resolved.

IIRC there was no bug, it was a false-positive. The reference is held a 
bit longer but then released, so the device is deleted later.
I might be remembering wrong, but I think I briefly looked into this 
when it was reported. If that's not the case I'd be interested to see
a new report/trace because the bug might be somewhere else.

> 
> Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
> Fixes: ad2f99aedf8f ("net: bridge: move bridge ioctls out of .ndo_do_ioctl")
> Signed-off-by: Ziqi Zhao <astrajoan@yahoo.com>
> ---
>   net/bridge/br_ioctl.c | 4 ----
>   net/core/dev_ioctl.c  | 8 +-------
>   net/socket.c          | 2 ++
>   3 files changed, 3 insertions(+), 11 deletions(-)
> 

Thanks,
  Nik



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] net: bridge: Fix refcnt issues in dev_ioctl
       [not found]     ` <20230819225048.dxxzv47fo64g24qx@Astras-Ubuntu>
@ 2023-08-22 10:40       ` Nikolay Aleksandrov
  0 siblings, 0 replies; 4+ messages in thread
From: Nikolay Aleksandrov @ 2023-08-22 10:40 UTC (permalink / raw
  To: Ziqi Zhao
  Cc: arnd, bridge, davem, edumazet, f.fainelli, ivan.orlov0322,
	keescook, kuba, hkallweit1, mudongliangabcd, nikolay, pabeni,
	roopa, skhan, syzbot+881d65229ca4f9ae8c84, vladimir.oltean,
	linux-kernel, netdev, syzkaller-bugs

On 8/20/23 01:50, Ziqi Zhao wrote:
> On Sat, Aug 19, 2023 at 12:25:15PM +0300, Nikolay Aleksandrov wrote:
> Hi Nik,
> 
> Thank you so much for reviewing the patch and getting back to me!
> 
>> IIRC there was no bug, it was a false-positive. The reference is held a bit
>> longer but then released, so the device is deleted later.
> 
>> If you reproduced it, is the device later removed or is it really stuck?
> 
> I ran the reproducer again without the patch and it seems you are
> correct. It was trying to create a very short-lived bridge, then delete
> it immediately in the next call. The device in question "wpan4" never
> showed up when I polled with `ip link` in the VM, so I'd say it did not
> get stuck.
> 
>> How would it leak a reference, could you elaborate?
>> The reference is always taken and always released after the call.
> 
> This was where I got a bit confused too. The system had a timeout of
> 140 seconds for the unregister_netdevice check. If the bridge in
> question was created and deleted repeatedly, the warning would indeed
> not be an actual reference leak. But how could its reference show up
> after 140 seconds if the bridge's creation and deletion were all within
> a couple of milliseconds?
> 
> So I let the system run for a bit longer with the reproducer, and after
> ~200 seconds, the kernel crashed and complained that some tasks had
> been waiting for too long (more than 143 seconds) trying to get hold of
> the br_ioctl_mutex. This was also quite strange to me, since on the
> surface it definitely looked like a deadlock, but the strict locking
> order as I described previously should prevent any deadlocks from
> happening.
> 
> Anyways, I decided to test switching up the lock order, since both the
> refcnt warning and the task stall seemed closely related to the above
> mentioned locks. When I ran the reproducer again after the patch, both
> the warning and the stall issue went away. So I guess the patch is
> still relevant in preventing bugs in some extreme cases -- although the
> scenario created by the reproducer would probably never happen in real
> usages?
> 

Thank you for testing, but we really need to understand what is going on 
and why the device isn't getting deleted for so long. Currently I don't 
have the time to debug it properly (I'll be able to next week at the 
earliest). We can't apply the patch based only on tests without 
understanding the underlying issue. I'd look into what
the reproducer is doing exactly and also check the system state while 
the deadlock has happened. Also you can list the currently held locks 
(if CONFIG_LOCKDEP is enabled) via magic sysrq + d for example. See 
which process is holding them, what are their priorities and so on.
Try to build some theory of how a deadlock might happen and then go
about proving it. Does the 8021q module have the same problem? It uses
similar code to set its hook.

> Please let me know whether you have any thoughts on how the above
> issues were triggered, and what other information I could gather to
> further demystify this bug. Thank you again for your help!
> 
> Best regards,
> Ziqi


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-08-22 10:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-10  1:34 [syzbot] [net?] unregister_netdevice: waiting for DEV to become free (8) syzbot
     [not found] ` <20230621070710.380373-1-astrajoan@yahoo.com>
2023-06-21  8:46   ` Dongliang Mu
     [not found] ` <20230819081057.330728-1-astrajoan@yahoo.com>
2023-08-19  9:25   ` [PATCH] net: bridge: Fix refcnt issues in dev_ioctl Nikolay Aleksandrov
     [not found]     ` <20230819225048.dxxzv47fo64g24qx@Astras-Ubuntu>
2023-08-22 10:40       ` Nikolay Aleksandrov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).