From: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
To: Yi Zhang <yi.zhang@redhat.com>,
	linux-block <linux-block@vger.kernel.org>,
	"open list:NVM EXPRESS DRIVER" <linux-nvme@lists.infradead.org>
Subject: Re: [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012
Date: Mon, 29 Apr 2024 14:35:37 +0000	[thread overview]
Message-ID: <dcc6150c-d632-4b14-9b0d-1b3b45fb5c24@wdc.com> (raw)
In-Reply-To: <CAHj4cs8X31NnOWGVLniT5OWBRtEphxw-AcYrPaygG_uYaoeENQ@mail.gmail.com>

On 23.04.24 15:18, Yi Zhang wrote:
> Hi
> I found this issue on the latest linux-block/for-next by blktests
> nvme/tcp nvme/012, please help check it and let me know if you need
> any info/testing for it, thanks.
> 
> [ 1873.394323] run blktests nvme/012 at 2024-04-23 04:13:47
> [ 1873.761900] loop0: detected capacity change from 0 to 2097152
> [ 1873.846926] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [ 1873.987806] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> [ 1874.208883] nvmet: creating nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [ 1874.243423] nvme nvme0: creating 48 I/O queues.
> [ 1874.362383] nvme nvme0: mapped 48/0/0 default/read/poll queues.
> [ 1874.517677] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr
> 127.0.0.1:4420, hostnqn:
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> [ 1875.649484] XFS (nvme0n1): Mounting V5 Filesystem
> b5a2998b-e06a-40d0-a291-a46b67aa43db
> [ 1875.770202] XFS (nvme0n1): Ending clean mount
> [ 1936.174561] nvme nvme0: I/O tag 49 (a031) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.182573] nvme nvme0: starting error recovery
> [ 1936.187622] nvme nvme0: I/O tag 50 (5032) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.196829] nvme nvme0: I/O tag 51 (b033) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.205619] block nvme0n1: no usable path - requeuing I/O
> [ 1936.211228] nvme nvme0: I/O tag 52 (0034) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.218975] block nvme0n1: no usable path - requeuing I/O
> [ 1936.224691] nvme nvme0: I/O tag 53 (3035) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.232395] block nvme0n1: no usable path - requeuing I/O
> [ 1936.237916] nvme nvme0: I/O tag 54 (0036) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.245731] block nvme0n1: no usable path - requeuing I/O
> [ 1936.251165] block nvme0n1: no usable path - requeuing I/O
> [ 1936.256967] nvme nvme0: I/O tag 55 (8037) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.264676] block nvme0n1: no usable path - requeuing I/O
> [ 1936.270159] nvme nvme0: I/O tag 56 (1038) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.277848] block nvme0n1: no usable path - requeuing I/O
> [ 1936.283294] block nvme0n1: no usable path - requeuing I/O
> [ 1936.288743] nvme nvme0: I/O tag 57 (b039) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.296450] block nvme0n1: no usable path - requeuing I/O
> [ 1936.301920] nvme nvme0: I/O tag 58 (503a) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.309621] block nvme0n1: no usable path - requeuing I/O
> [ 1936.315076] nvme nvme0: I/O tag 59 (a03b) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.323271] nvme nvme0: I/O tag 60 (803c) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.331018] nvme nvme0: I/O tag 61 (203d) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.338769] nvme nvme0: I/O tag 62 (503e) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.346515] nvme nvme0: I/O tag 63 (c03f) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.354693] nvme nvme0: I/O tag 64 (a040) type 4 opcode 0x1 (I/O
> Cmd) QID 2 timeout
> [ 1936.362479] nvme nvme0: I/O tag 49 (0031) type 4 opcode 0x0 (I/O
> Cmd) QID 8 timeout
> [ 1936.370819] nvme nvme0: I/O tag 50 (0032) type 4 opcode 0x1 (I/O
> Cmd) QID 8 timeout
> [ 1936.378679] nvme nvme0: I/O tag 51 (0033) type 4 opcode 0x1 (I/O
> Cmd) QID 8 timeout
> [ 1936.386464] nvme nvme0: I/O tag 52 (0034) type 4 opcode 0x1 (I/O
> Cmd) QID 8 timeout
> [ 1936.394235] nvme nvme0: I/O tag 53 (0035) type 4 opcode 0x1 (I/O
> Cmd) QID 8 timeout
> [ 1936.402062] nvme nvme0: I/O tag 54 (0036) type 4 opcode 0x1 (I/O
> Cmd) QID 8 timeout
> [ 1936.409847] nvme nvme0: I/O tag 55 (0037) type 4 opcode 0x1 (I/O
> Cmd) QID 8 timeout
> [ 1936.417773] nvme nvme0: I/O tag 56 (0038) type 4 opcode 0x1 (I/O
> Cmd) QID 8 timeout
> [ 1936.449539] general protection fault, probably for non-canonical
> address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN PTI
> [ 1936.450749] nvme nvme0: Reconnecting in 10 seconds...
> [ 1936.460764] KASAN: null-ptr-deref in range
> [0x0000000000000020-0x0000000000000027]
> [ 1936.460775] CPU: 15 PID: 848 Comm: kworker/15:1H Not tainted 6.9.0-rc4.v1+ #2
> [ 1936.480528] Hardware name: Dell Inc. PowerEdge R640/08HT8T, BIOS
> 2.20.1 09/13/2023
> [ 1936.488094] Workqueue: kblockd blk_mq_run_work_fn
> [ 1936.492806] RIP: 0010:blk_flush_complete_seq+0x450/0x1060
> [ 1936.498216] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 a8 08 00 00
> 49 8d 7c 24 20 4d 89 65 38 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
> c1 ea 03 <80> 3c 02 00 0f 85 8e 08 00 00 49 8d 7d 30 4d 8b 64 24 20 48
> b8 00
> [ 1936.516960] RSP: 0018:ffffc9000f86f850 EFLAGS: 00010002
> [ 1936.522185] RAX: dffffc0000000000 RBX: 1ffff92001f0df10 RCX: 1ffff11543588c4a
> [ 1936.529318] RDX: 0000000000000004 RSI: dffffc0000000000 RDI: 0000000000000020
> [ 1936.536449] RBP: ffff88aa1a419da0 R08: ffff88c7aba3f9e0 R09: 1ffffffff38e0bb3
> [ 1936.543582] R10: 1ffff11543588c4a R11: ffffffffffe25e18 R12: 0000000000000000
> [ 1936.550714] R13: ffff88aa1ac46200 R14: ffff88a9c4560d00 R15: 0000000000000000
> [ 1936.557847] FS:  0000000000000000(0000) GS:ffff88c7aba00000(0000)
> knlGS:0000000000000000
> [ 1936.565930] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1936.571675] CR2: 00007fc90f80b004 CR3: 00000029d18a0004 CR4: 00000000007706f0
> [ 1936.578809] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1936.585942] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1936.593072] PKRU: 55555554
> [ 1936.595784] Call Trace:
> [ 1936.598237]  <TASK>
> [ 1936.600344]  ? __die_body.cold+0x19/0x25
> [ 1936.604279]  ? die_addr+0x46/0x70
> [ 1936.607606]  ? exc_general_protection+0x14f/0x250
> [ 1936.612325]  ? asm_exc_general_protection+0x26/0x30
> [ 1936.617212]  ? blk_flush_complete_seq+0x450/0x1060
> [ 1936.622012]  ? __pfx_blk_flush_complete_seq+0x10/0x10
> [ 1936.627072]  mq_flush_data_end_io+0x279/0x500
> [ 1936.631430]  ? __pfx_mq_flush_data_end_io+0x10/0x10
> [ 1936.636308]  blk_mq_end_request+0x1b8/0x690
> [ 1936.640493]  ? _raw_spin_unlock_irqrestore+0x45/0x80
> [ 1936.645460]  nvme_failover_req+0x37d/0x4e0 [nvme_core]
> [ 1936.650634]  nvme_fail_nonready_command+0x130/0x180 [nvme_core]
> [ 1936.656578]  blk_mq_dispatch_rq_list+0x3b3/0x2100
> [ 1936.661292]  ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10
> [ 1936.666436]  __blk_mq_sched_dispatch_requests+0x554/0x1510
> [ 1936.671925]  ? __pfx___lock_acquire+0x10/0x10
> [ 1936.676285]  ? __pfx_lock_acquired+0x10/0x10
> [ 1936.680558]  ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10
> [ 1936.686477]  ? __pfx_lock_acquire+0x10/0x10
> [ 1936.690668]  blk_mq_sched_dispatch_requests+0xa6/0xf0
> [ 1936.695721]  blk_mq_run_work_fn+0x122/0x2c0
> [ 1936.699908]  ? blk_mq_run_work_fn+0x114/0x2c0
> [ 1936.704266]  process_one_work+0x85d/0x13f0
> [ 1936.708371]  ? worker_thread+0xcc/0x1130
> [ 1936.712301]  ? __pfx_process_one_work+0x10/0x10
> [ 1936.716835]  ? assign_work+0x16c/0x240
> [ 1936.720596]  worker_thread+0x6da/0x1130
> [ 1936.724438]  ? __pfx_worker_thread+0x10/0x10
> [ 1936.728712]  kthread+0x2ed/0x3c0
> [ 1936.731944]  ? _raw_spin_unlock_irq+0x28/0x60
> [ 1936.736303]  ? __pfx_kthread+0x10/0x10
> [ 1936.740058]  ret_from_fork+0x31/0x70
> [ 1936.743636]  ? __pfx_kthread+0x10/0x10
> [ 1936.747387]  ret_from_fork_asm+0x1a/0x30
> [ 1936.751319]  </TASK>
> [ 1936.753516] Modules linked in: nvmet_tcp nvmet nvme_tcp
> nvme_fabrics nvme_keyring nvme_core nvme_auth intel_rapl_msr
> intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common
> isst_if_common skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp
> kvm_intel kvm spi_nor iTCO_wdt intel_pmc_bxt cdc_ether mtd usbnet
> iTCO_vendor_support dell_smbios rapl intel_cstate rfkill sunrpc dcdbas
> dell_wmi_descriptor wmi_bmof intel_uncore pcspkr bnxt_en mii tg3
> i2c_i801 spi_intel_pci i2c_smbus spi_intel mei_me ipmi_ssif mei
> lpc_ich intel_pch_thermal acpi_ipmi ipmi_si ipmi_devintf
> ipmi_msghandler nd_pmem nd_btt dax_pmem acpi_power_meter fuse loop
> nfnetlink zram xfs crct10dif_pclmul crc32_pclmul crc32c_intel
> polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3
> sha256_ssse3 sha1_ssse3 mgag200 megaraid_sas i2c_algo_bit wmi nfit
> libnvdimm [last unloaded: nvmet]
> [ 1936.829942] ---[ end trace 0000000000000000 ]---
> [ 1936.880774] RIP: 0010:blk_flush_complete_seq+0x450/0x1060
> [ 1936.886183] Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 a8 08 00 00
> 49 8d 7c 24 20 4d 89 65 38 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
> c1 ea 03 <80> 3c 02 00 0f 85 8e 08 00 00 49 8d 7d 30 4d 8b 64 24 20 48
> b8 00
> [ 1936.904925] RSP: 0018:ffffc9000f86f850 EFLAGS: 00010002
> [ 1936.910150] RAX: dffffc0000000000 RBX: 1ffff92001f0df10 RCX: 1ffff11543588c4a
> [ 1936.917285] RDX: 0000000000000004 RSI: dffffc0000000000 RDI: 0000000000000020
> [ 1936.924416] RBP: ffff88aa1a419da0 R08: ffff88c7aba3f9e0 R09: 1ffffffff38e0bb3
> [ 1936.931548] R10: 1ffff11543588c4a R11: ffffffffffe25e18 R12: 0000000000000000
> [ 1936.938680] R13: ffff88aa1ac46200 R14: ffff88a9c4560d00 R15: 0000000000000000
> [ 1936.945811] FS:  0000000000000000(0000) GS:ffff88c7aba00000(0000)
> knlGS:0000000000000000
> [ 1936.953899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1936.959642] CR2: 00007fc90f80b004 CR3: 00000029d18a0004 CR4: 00000000007706f0
> [ 1936.966776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 1936.973906] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 1936.981039] PKRU: 55555554
> [ 1936.983752] note: kworker/15:1H[848] exited with irqs disabled
> [ 1936.989617] note: kworker/15:1H[848] exited with preempt_count 1
> [ 1946.926323] nvmet: creating nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [ 1946.944673] nvme nvme0: creating 48 I/O queues.
> 
> (gdb) l *(blk_flush_complete_seq+0x450)
> 0xffffffff826c5370 is in blk_flush_complete_seq (block/blk-flush.c:133).
> 128 * After flush data completion, @rq->bio is %NULL but we need to
> 129 * complete the bio again.  @rq->biotail is guaranteed to equal the
> 130 * original @rq->bio.  Restore it.
> 131 */
> 132 rq->bio = rq->biotail;
> 133 rq->__sector = rq->bio->bi_iter.bi_sector;
> 134
> 135 /* make @rq a normal request */
> 136 rq->rq_flags &= ~RQF_FLUSH_SEQ;
> 137 rq->end_io = rq->flush.saved_end_io;
> 
> 
> 
> --
> Best Regards,
>    Yi Zhang
> 
> 
> 


I'm sorry, but I fail to recreate the reported bug:

root@virtme-ng:blktests# nvme_trtype=tcp ./check nvme/012

nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns)
     runtime  75.900s  ...

[  326.827260] run blktests nvme/012 at 2024-04-29 16:28:31
[  327.475957] loop0: detected capacity change from 0 to 2097152
[  327.538987] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  327.603405] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  327.872343] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  327.877120] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.
[  327.878095] nvme nvme0: creating 4 I/O queues.
[  327.882741] nvme nvme0: mapped 4/0/0 default/read/poll queues.
[  327.886786] nvme nvme0: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  328.465762] XFS (nvme0n1): Mounting V5 Filesystem 1637ca0c-2e42-4816-b46b-e1348e583389
[  328.510061] XFS (nvme0n1): Ending clean mount
[  328.517226] xfs filesystem being mounted at /mnt/blktests supports timestamps until 2038-01-19 (0x7fffffff)
[  386.394225] perf: interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[  472.615859] perf: interrupt took too long (3154 > 3127), lowering kernel.perf_event_max_sample_rate to 63250

nvme/012 (run mkfs and data verification fio job on NVMeOF block device-backed ns) [passed]
     runtime  75.900s  ...  203.043s

root@virtme-ng:blktests#

Any pointers on what I could be doing wrong?
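One environment difference visible in the output above (an observation of mine, not something the original message calls out): this kernel prints the CONFIG_NVME_MULTIPATH warning, while the reporter's backtrace goes through nvme_failover_req(), which is only compiled in when multipath is enabled. A sketch of checking for the option, using a hypothetical sample config file; on a real system inspect /boot/config-$(uname -r) or /proc/config.gz instead:

```shell
# Hypothetical sample config standing in for the running kernel's config.
cat > /tmp/sample-config <<'EOF'
CONFIG_NVME_CORE=m
# CONFIG_NVME_MULTIPATH is not set
EOF

# "CONFIG_NVME_MULTIPATH=y" means the failover path is compiled in;
# "# CONFIG_NVME_MULTIPATH is not set" means nvme_failover_req() cannot run.
grep NVME_MULTIPATH /tmp/sample-config
# prints: # CONFIG_NVME_MULTIPATH is not set
```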

My tested tree HEAD is:
commit 518f377c90e510032b4b0f0ebf709dfd9057043d (HEAD -> test, block-for-next)
Merge: 9d60dddf961a a4d416dc6098
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri Apr 26 07:40:36 2024 -0600

     Merge branch 'for-6.10/io_uring' into for-next

     * for-6.10/io_uring:
       io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it



Thread overview: 11+ messages
2024-04-23 13:17 [bug report] RIP: 0010:blk_flush_complete_seq+0x450/0x1060 observed during blktests nvme/tcp nvme/012 Yi Zhang
2024-04-26  8:30 ` [bug report][bisected] " Yi Zhang
2024-04-29 14:35 ` Johannes Thumshirn [this message]
2024-04-29 22:18   ` [bug report] " Chaitanya Kulkarni
2024-04-30  6:16     ` Johannes Thumshirn
2024-04-30 14:17       ` Yi Zhang
2024-05-03  7:59         ` Sagi Grimberg
2024-05-03 10:32           ` Johannes Thumshirn
2024-05-03 11:01             ` Sagi Grimberg
2024-05-03 21:14               ` Chaitanya Kulkarni
2024-05-09  6:15                 ` Yi Zhang
