* nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  From: Engel, Amit @ 2021-06-01 17:51 UTC
  To: linux-nvme@lists.infradead.org, sagi@grimberg.me
  Cc: Engel, Amit

Hello,

We hit the kernel panic below ("BUG: unable to handle kernel NULL pointer
dereference at 0000000000000230") when running with a RHEL 8.3 host.
It happens after we reboot the target-side application while multiple
controllers/connections exist.

Based on vmcore analysis, it seems that when nvme_tcp_restore_sock_calls
is called (from __nvme_tcp_stop_queue), queue->sock is NULL.

Are you familiar with such an issue?

crash> bt
PID: 193053  TASK: ffff9491bdad17c0  CPU: 7  COMMAND: "kworker/u193:9"
 #0 [ffffb2e9cfdbbb70] machine_kexec at ffffffffb245bf3e
 #1 [ffffb2e9cfdbbbc8] __crash_kexec at ffffffffb256072d
 #2 [ffffb2e9cfdbbc90] crash_kexec at ffffffffb256160d
 #3 [ffffb2e9cfdbbca8] oops_end at ffffffffb2422d4d
 #4 [ffffb2e9cfdbbcc8] no_context at ffffffffb246ba9e
 #5 [ffffb2e9cfdbbd20] do_page_fault at ffffffffb246c5c2
 #6 [ffffb2e9cfdbbd50] page_fault at ffffffffb2e0122e
    [exception RIP: _raw_write_lock_bh+23]
    RIP: ffffffffb2cd6cc7  RSP: ffffb2e9cfdbbe00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff94b2aefb4000  RCX: 0000000000000003
    RDX: 00000000000000ff  RSI: 00000000fffffe01  RDI: 0000000000000230
    RBP: ffff94923f793f40   R8: ffff9492ff1ea7f8   R9: 0000000000000000
    R10: 0000000000000000  R11: ffff9492ff1e8c64  R12: ffff94b2b7210338
    R13: 0000000000000000  R14: ffff94b27f7a4100  R15: ffff94b2b72110a0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb2e9cfdbbe00] __nvme_tcp_stop_queue at ffffffffc02dc0aa [nvme_tcp]
 #8 [ffffb2e9cfdbbe18] nvme_tcp_start_queue at ffffffffc02dcd18 [nvme_tcp]
 #9 [ffffb2e9cfdbbe38] nvme_tcp_setup_ctrl at ffffffffc02df258 [nvme_tcp]
#10 [ffffb2e9cfdbbe80] nvme_tcp_reconnect_ctrl_work at ffffffffc02df4bf [nvme_tcp]
#11 [ffffb2e9cfdbbe98] process_one_work at ffffffffb24d3477
#12 [ffffb2e9cfdbbed8] worker_thread at ffffffffb24d3b40
#13 [ffffb2e9cfdbbf10] kthread at ffffffffb24d9502
#14 [ffffb2e9cfdbbf50] ret_from_fork at ffffffffb2e00255

crash> bt -l
PID: 193053  TASK: ffff9491bdad17c0  CPU: 7  COMMAND: "kworker/u193:9"
 #0 [ffffb2e9cfdbbb70] machine_kexec at ffffffffb245bf3e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/kernel/machine_kexec_64.c: 389
 #1 [ffffb2e9cfdbbbc8] __crash_kexec at ffffffffb256072d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/kernel/kexec_core.c: 956
 #2 [ffffb2e9cfdbbc90] crash_kexec at ffffffffb256160d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./include/linux/compiler.h: 219
 #3 [ffffb2e9cfdbbca8] oops_end at ffffffffb2422d4d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/kernel/dumpstack.c: 334
 #4 [ffffb2e9cfdbbcc8] no_context at ffffffffb246ba9e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/mm/fault.c: 773
 #5 [ffffb2e9cfdbbd20] do_page_fault at ffffffffb246c5c2
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/jump_label.h: 38
 #6 [ffffb2e9cfdbbd50] page_fault at ffffffffb2e0122e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/entry/entry_64.S: 1183
    [exception RIP: _raw_write_lock_bh+23] (register state identical to the bt output above)
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/atomic.h: 194
 #7 [ffffb2e9cfdbbe00] __nvme_tcp_stop_queue at ffffffffc02dc0aa [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1486
 #8 [ffffb2e9cfdbbe18] nvme_tcp_start_queue at ffffffffc02dcd18 [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1525
 #9 [ffffb2e9cfdbbe38] nvme_tcp_setup_ctrl at ffffffffc02df258 [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1814
#10 [ffffb2e9cfdbbe80] nvme_tcp_reconnect_ctrl_work at ffffffffc02df4bf [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1962
#11 [ffffb2e9cfdbbe98] process_one_work at ffffffffb24d3477
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/jump_label.h: 38
#12 [ffffb2e9cfdbbed8] worker_thread at ffffffffb24d3b40
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./include/linux/compiler.h: 193
#13 [ffffb2e9cfdbbf10] kthread at ffffffffb24d9502
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/kernel/kthread.c: 280
#14 [ffffb2e9cfdbbf50] ret_from_fork at ffffffffb2e00255
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/entry/entry_64.S: 360

nvme/host/tcp.c
..snip
1481 static void nvme_tcp_restore_sock_calls(struct nvme_tcp_queue *queue)
1482 {
1483 	struct socket *sock = queue->sock;
1484
1485 	write_lock_bh(&sock->sk->sk_callback_lock);
1486 	sock->sk->sk_user_data = NULL;
1487 	sock->sk->sk_data_ready = queue->data_ready;
1488 	sock->sk->sk_state_change = queue->state_change;
1489 	sock->sk->sk_write_space = queue->write_space;
1490 	write_unlock_bh(&sock->sk->sk_callback_lock);
1491 }
..snip

NULL pointer dereference at 0x230 → 560 decimal

crash> struct sock -o
struct sock {
    [0] struct sock_common __sk_common;
  ...
  [560] rwlock_t sk_callback_lock;
  ...

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
* RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  From: Engel, Amit @ 2021-06-02 12:28 UTC
  To: linux-nvme@lists.infradead.org, sagi@grimberg.me
  Cc: Anner, Ran; Grupi, Elad

Hi Sagi,

A correction to the earlier analysis: it seems that sock->sk is NULL,
not queue->sock.

As part of __nvme_tcp_stop_queue, kernel_sock_shutdown and
nvme_tcp_restore_sock_calls are called. kernel_sock_shutdown leads to
nvme_tcp_state_change, which triggers err_work
(nvme_tcp_error_recovery_work).

As part of nvme_tcp_error_recovery_work, nvme_tcp_free_queue is called,
which releases the socket (sock_release).

In our case, based on the backtrace in my previous mail,
nvme_tcp_error_recovery_work is triggered (and with it sock_release)
before nvme_tcp_restore_sock_calls, which ends up as a NULL pointer
dereference at 'rwlock_t sk_callback_lock'.

Can you please review and share your input on this potential race?

Thanks,
Amit Engel
* Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  From: Sagi Grimberg @ 2021-06-08 23:39 UTC
  To: Engel, Amit, linux-nvme@lists.infradead.org
  Cc: Anner, Ran; Grupi, Elad

> Hi Sagi,
>
> A correction to the earlier analysis:
> It seems like sock->sk is NULL and not queue->sock
>
> As part of __nvme_tcp_stop_queue,
> kernel_sock_shutdown and nvme_tcp_restore_sock_calls are being called:
> kernel_sock_shutdown leads to nvme_tcp_state_change which will trigger err_work (nvme_tcp_error_recovery_work)
>
> As part of nvme_tcp_error_recovery_work, nvme_tcp_free_queue is being called which releases the socket (sock_release)
>
> In our case, based on the backtrace:
> nvme_tcp_error_recovery_work is being triggered (and so sock_release) before nvme_tcp_restore_sock_calls, which ends up
> with a NULL pointer dereference at 'rwlock_t sk_callback_lock'
>
> Can you please review and provide your inputs for this potential race?

Seems that RHEL 8.3 is missing the mutex protection on
nvme_tcp_stop_queue. I'm assuming it doesn't happen upstream?
* RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  From: Engel, Amit @ 2021-06-09 7:48 UTC
  To: Sagi Grimberg, linux-nvme@lists.infradead.org
  Cc: Anner, Ran; Grupi, Elad

Hi Sagi,

Indeed, RHEL 8.3 does not have the mutex protection on
nvme_tcp_stop_queue. However, in our case, based on the backtrace below,
we don't reach __nvme_tcp_stop_queue from nvme_tcp_stop_queue. We reach
it from:

nvme_tcp_reconnect_ctrl_work --> nvme_tcp_setup_ctrl --> nvme_tcp_start_queue --> __nvme_tcp_stop_queue

so I'm not sure how that mutex protection would help in this case.

crash> bt -l
PID: 193053  TASK: ffff9491bdad17c0  CPU: 7  COMMAND: "kworker/u193:9"
 #0 [ffffb2e9cfdbbb70] machine_kexec at ffffffffb245bf3e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/kernel/machine_kexec_64.c: 389
 #1 [ffffb2e9cfdbbbc8] __crash_kexec at ffffffffb256072d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/kernel/kexec_core.c: 956
 #2 [ffffb2e9cfdbbc90] crash_kexec at ffffffffb256160d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./include/linux/compiler.h: 219
 #3 [ffffb2e9cfdbbca8] oops_end at ffffffffb2422d4d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/kernel/dumpstack.c: 334
 #4 [ffffb2e9cfdbbcc8] no_context at ffffffffb246ba9e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/mm/fault.c: 773
 #5 [ffffb2e9cfdbbd20] do_page_fault at ffffffffb246c5c2
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/jump_label.h: 38
 #6 [ffffb2e9cfdbbd50] page_fault at ffffffffb2e0122e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/entry/entry_64.S: 1183
    [exception RIP: _raw_write_lock_bh+23]
    RIP: ffffffffb2cd6cc7  RSP: ffffb2e9cfdbbe00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff94b2aefb4000  RCX: 0000000000000003
    RDX: 00000000000000ff  RSI: 00000000fffffe01  RDI: 0000000000000230
    RBP: ffff94923f793f40   R8: ffff9492ff1ea7f8   R9: 0000000000000000
    R10: 0000000000000000  R11: ffff9492ff1e8c64  R12: ffff94b2b7210338
    R13: 0000000000000000  R14: ffff94b27f7a4100  R15: ffff94b2b72110a0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/atomic.h: 194
 #7 [ffffb2e9cfdbbe00] __nvme_tcp_stop_queue at ffffffffc02dc0aa [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1486
 #8 [ffffb2e9cfdbbe18] nvme_tcp_start_queue at ffffffffc02dcd18 [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1525
 #9 [ffffb2e9cfdbbe38] nvme_tcp_setup_ctrl at ffffffffc02df258 [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1814
#10 [ffffb2e9cfdbbe80] nvme_tcp_reconnect_ctrl_work at ffffffffc02df4bf [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1962
#11 [ffffb2e9cfdbbe98] process_one_work at ffffffffb24d3477
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/jump_label.h: 38
#12 [ffffb2e9cfdbbed8] worker_thread at ffffffffb24d3b40
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./include/linux/compiler.h: 193
#13 [ffffb2e9cfdbbf10] kthread at ffffffffb24d9502
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/kernel/kthread.c: 280
#14 [ffffb2e9cfdbbf50] ret_from_fork at ffffffffb2e00255
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/entry/entry_64.S: 360
* Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  From: Sagi Grimberg @ 2021-06-09 8:04 UTC
  To: Engel, Amit, linux-nvme@lists.infradead.org
  Cc: Anner, Ran; Grupi, Elad

> Hi Sagi,
>
> Indeed RHEL8.3 does not have the mutex protection on nvme_tcp_stop_queue
> However, in our case, based on the below back trace
> We don't get to __nvme_tcp_stop_queue from nvme_tcp_stop_queue
> We get to it from:
> nvme_tcp_reconnect_ctrl_work --> nvme_tcp_setup_ctrl --> nvme_tcp_start_queue --> __nvme_tcp_stop_queue
>
> so I'm not sure how this mutex protection will help in this case

Oh, well iirc we probably need the same mutex protection in the start
failure case then?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 216d21a6a165..00dff3654e6f 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1548,6 +1548,7 @@ static void nvme_tcp_stop_queue(struct nvme_ctrl *nctrl, int qid)
 static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
 {
 	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
+	struct nvme_tcp_queue *queue = &ctrl->queues[idx];
 	int ret;
 
 	if (idx)
@@ -1556,10 +1557,12 @@ static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
 		ret = nvmf_connect_admin_queue(nctrl);
 
 	if (!ret) {
-		set_bit(NVME_TCP_Q_LIVE, &ctrl->queues[idx].flags);
+		set_bit(NVME_TCP_Q_LIVE, &queue->flags);
 	} else {
-		if (test_bit(NVME_TCP_Q_ALLOCATED, &ctrl->queues[idx].flags))
-			__nvme_tcp_stop_queue(&ctrl->queues[idx]);
+		mutex_lock(&queue->queue_lock);
+		if (test_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
+			__nvme_tcp_stop_queue(queue);
+		mutex_unlock(&queue->queue_lock);
 		dev_err(nctrl->device,
 			"failed to connect queue: %d ret=%d\n", idx, ret);
 	}
--
* RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  From: Engel, Amit @ 2021-06-09 8:39 UTC
  To: Sagi Grimberg, linux-nvme@lists.infradead.org
  Cc: Anner, Ran; Grupi, Elad

I'm not sure that using the queue_lock mutex will help.
The race in this case is between sock_release and
nvme_tcp_restore_sock_calls: sock_release is called as part of
nvme_tcp_free_queue, which also destroys the mutex.
* Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  From: Sagi Grimberg @ 2021-06-09 9:11 UTC
  To: Engel, Amit, linux-nvme@lists.infradead.org
  Cc: Anner, Ran; Grupi, Elad

> I'm not sure that using the queue_lock mutex will help
> The race in this case is between sock_release and nvme_tcp_restore_sock_calls
> sock_release is being called as part of nvme_tcp_free_queue which is destroying the mutex

Maybe I'm not understanding the issue here.

What is the scenario again? stop_queue is called (ctx1), that triggers
error_recovery (ctx2), which then calls free_queue before ctx1 gets to
restore the sock callbacks?

err_work will first stop the queues before freeing them, so it will
serialize behind ctx1. What am I missing?
* RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  From: Engel, Amit @ 2021-06-09 11:14 UTC
  To: Sagi Grimberg, linux-nvme@lists.infradead.org
  Cc: Anner, Ran; Grupi, Elad

Correct: free_queue is called (so sock->sk becomes NULL) before
restore_sock_calls. When restore_sock_calls runs, we fail on
'write_lock_bh(&sock->sk->sk_callback_lock)'.

NULL pointer dereference at 0x230 → 560 decimal

crash> struct sock -o
struct sock {
    [0] struct sock_common __sk_common;
  ...
  [560] rwlock_t sk_callback_lock;

Stopping the queue in ctx2 does not really do anything, since the
NVME_TCP_Q_LIVE bit was already cleared (by ctx1). Can you please
explain how stopping the queue before freeing it helps to serialize
against ctx1?

The race we are describing is based on the panic backtrace I shared;
maybe our analysis is not accurate?

Thanks,
Amit
* RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  From: Engel, Amit @ 2021-06-10 8:44 UTC
  To: Sagi Grimberg, linux-nvme@lists.infradead.org

We are trying to reproduce with the upstream code now; will update.

In addition, I just posted a patch that adds the mutex lock to the
start-queue failure case as well, as suggested by Sagi.

Thanks,
Amit
* Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  From: Sagi Grimberg @ 2021-06-10 20:03 UTC
  To: Engel, Amit, linux-nvme@lists.infradead.org
  Cc: Anner, Ran; Grupi, Elad

> Correct, free_queue is being called (sock->sk becomes NULL) before restore_sock_calls
>
> When restore_sock_calls is called, we fail on 'write_lock_bh(&sock->sk->sk_callback_lock)'
>
> stop queue in ctx2 does not really do anything since 'NVME_TCP_Q_LIVE' bit is already cleared (by ctx1).
> can you please explain how stopping the queue before free helps to serialize ctx1?

What I understood from your description is:
1. ctx1 calls stop_queue - calls kernel_sock_shutdown
2. ctx1 gets to restore_sock_calls (just before)
3. ctx2 is triggered from state_change - scheduling err_work
4. ctx2 does stop_queues
5. ctx2 calls destroy_queues -> there does sock_release
6. ctx1 makes forward progress and accesses an already-freed sk

Hence with the mutex protection, ctx2 is serialized at step (4) and
cannot get to step (5) until ctx1 releases the mutex after step (6)'s
restore has completed.

But maybe I'm not interpreting this correctly?
* RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
  From: Engel, Amit @ 2021-06-13 8:35 UTC
  To: Sagi Grimberg, linux-nvme@lists.infradead.org
  Cc: Anner, Ran; Grupi, Elad

Hi Sagi,

After revisiting the upstream code, I agree with your analysis. We are
now trying to run the RHEL 8.3 nvme host with the missing mutex_lock
commit.

Anyway, I applied a patch that takes the same mutex_lock in the
start_queue failure case as well: there, nvme_tcp_start_queue calls
__nvme_tcp_stop_queue, which should be protected by the same lock.

Thank you for your help,
Amit