Linux-Raid Archives mirror
 help / color / mirror / Atom feed
From: Nigel Croxon <ncroxon@redhat.com>
To: linux-raid@vger.kernel.org, Heinz Mauelshagen <heinzm@redhat.com>,
	Xiao Ni <xni@redhat.com>,
	song@kernel.org
Subject: regression: CPU soft lockup with raid10: check slab-out-of-bounds in md_bitmap_get_counter
Date: Fri, 19 Apr 2024 15:49:26 -0400	[thread overview]
Message-ID: <71ba5272-ab07-43ba-8232-d2da642acb4e@redhat.com> (raw)

There is a problem with this commit, it causes a CPU#x soft lockup

commit 301867b1c16805aebbc306aafa6ecdc68b73c7e5
Author: Li Nan <linan122@huawei.com>
Date:   Mon May 15 21:48:05 2023 +0800
md/raid10: check slab-out-of-bounds in md_bitmap_get_counter

Message from syslogd@rhel9 at Apr 19 14:14:55 ...
  kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [mdX_resync:6976]

dmesg:

[  104.245585] CPU: 7 PID: 3588 Comm: mdX_resync Kdump: loaded Not 
tainted 6.9.0-rc4-next-20240419 #1
[  104.245588] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.16.2-1.fc38 04/01/2014
[  104.245590] RIP: 0010:_raw_spin_unlock_irq+0x13/0x30
[  104.245598] Code: 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 
90 90 90 90 90 90 0f 1f 44 00 00 c6 07 00 90 90 90 fb 65 ff 0d 95 9f 75 
76 <74> 05 c3 cc cc cc cc 0f 1f 44 00 00 c3 cc cc cc cc cc cc cc cc cc
[  104.245601] RSP: 0018:ffffb2d74a81bbf8 EFLAGS: 00000246
[  104.245603] RAX: 0000000000000000 RBX: 0000000001000000 RCX: 
000000000000000c
[  104.245604] RDX: 0000000000000000 RSI: 0000000001000000 RDI: 
ffff926160ccd200
[  104.245606] RBP: ffffb2d74a81bcd0 R08: 0000000000000013 R09: 
0000000000000000
[  104.245607] R10: 0000000000000000 R11: ffffb2d74a81bad8 R12: 
0000000000000000
[  104.245608] R13: 0000000000000000 R14: ffff926160ccd200 R15: 
ffff926151019000
[  104.245611] FS:  0000000000000000(0000) GS:ffff9273f9580000(0000) 
knlGS:0000000000000000
[  104.245613] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  104.245614] CR2: 00007f23774d2584 CR3: 0000000104098003 CR4: 
0000000000370ef0
[  104.245616] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[  104.245617] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
[  104.245618] Call Trace:
[  104.245620]  <IRQ>
[  104.245623]  ? watchdog_timer_fn+0x1e3/0x260
[  104.245630]  ? __pfx_watchdog_timer_fn+0x10/0x10
[  104.245634]  ? __hrtimer_run_queues+0x112/0x2a0
[  104.245638]  ? hrtimer_interrupt+0xff/0x240
[  104.245640]  ? sched_clock+0xc/0x30
[  104.245644]  ? __sysvec_apic_timer_interrupt+0x54/0x140
[  104.245649]  ? sysvec_apic_timer_interrupt+0x6c/0x90
[  104.245652]  </IRQ>
[  104.245653]  <TASK>
[  104.245654]  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[  104.245659]  ? _raw_spin_unlock_irq+0x13/0x30
[  104.245661]  md_bitmap_start_sync+0x6b/0xf0
[  104.245668]  raid10_sync_request+0x25c/0x1b40 [raid10]
[  104.245676]  ? is_mddev_idle+0x132/0x150
[  104.245680]  md_do_sync+0x64b/0x1020
[  104.245683]  ? __pfx_autoremove_wake_function+0x10/0x10
[  104.245690]  md_thread+0xa7/0x170
[  104.245693]  ? __pfx_md_thread+0x10/0x10
[  104.245696]  kthread+0xcf/0x100
[  104.245700]  ? __pfx_kthread+0x10/0x10
[  104.245704]  ret_from_fork+0x30/0x50
[  104.245707]  ? __pfx_kthread+0x10/0x10
[  104.245710]  ret_from_fork_asm+0x1a/0x30
[  104.245714]  </TASK>

When you run the reproducer script below...

#!/bin/sh
vg=t
lv=t
devs="/dev/sd[c-j]"
sz=3G
isz=2G
path=/dev/$vg/$lv
mnt=/mnt/$lv

vgcreate -y $vg $devs
lvcreate --yes --nosync --type raid10 -i 2 -n $lv -L $sz $vg

mkfs.xfs $path
mkdir -p $mnt
mount $path $mnt
df -h

for i in {1..10}
do
     lvextend -y -L +$isz -r $path
     lvs
done

lvs -a -o +devices
lvchange --syncaction check $path
#lvs -ovgname,lvname,copypercent t/t         <-- this cmd to watch


             reply	other threads:[~2024-04-19 19:49 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-19 19:49 Nigel Croxon [this message]
2024-04-20  6:09 ` regression: CPU soft lockup with raid10: check slab-out-of-bounds in md_bitmap_get_counter Yu Kuai
2024-04-21 12:30   ` Nigel Croxon
2024-04-22  1:10     ` Yu Kuai
2024-04-24  6:57     ` Yu Kuai
2024-04-25 12:10       ` Nigel Croxon
2024-04-25 16:52         ` Song Liu
2024-04-30 11:07           ` Nigel Croxon
2024-05-06  6:19             ` Yu Kuai
     [not found]               ` <CAM23Vxr6unOdZoBQZBML_uxDGc_pquktfY=R6n=kDPcWG0Jsvg@mail.gmail.com>
2024-05-06 13:52                 ` Yu Kuai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=71ba5272-ab07-43ba-8232-d2da642acb4e@redhat.com \
    --to=ncroxon@redhat.com \
    --cc=heinzm@redhat.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=song@kernel.org \
    --cc=xni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).