From: "Engel, Amit" <Amit.Engel@Dell.com>
To: Sagi Grimberg <sagi@grimberg.me>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230
Date: Thu, 10 Jun 2021 08:44:12 +0000	[thread overview]
Message-ID: <CO1PR19MB48851BDA450C1CAF23965958EE359@CO1PR19MB4885.namprd19.prod.outlook.com> (raw)
In-Reply-To: <CO1PR19MB48859E1CCD596402D4007458EE369@CO1PR19MB4885.namprd19.prod.outlook.com>

We are trying to reproduce with the upstream code now and will update.

In addition, I just posted a patch that takes the queue_lock mutex in the start-queue failure case as well, as suggested by Sagi.
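
Roughly, the idea is along these lines (a simplified sketch of the change, not the exact diff; the patch posted to the list is authoritative):

    static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
    {
            struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
            struct nvme_tcp_queue *queue = &ctrl->queues[idx];
            int ret;

            if (idx)
                    ret = nvmf_connect_io_queue(nctrl, idx, false);
            else
                    ret = nvmf_connect_admin_queue(nctrl);

            if (!ret) {
                    set_bit(NVME_TCP_Q_LIVE, &queue->flags);
            } else {
                    /* proposed change: stop the queue under queue_lock on the
                     * connect-failure path as well, so it cannot race with a
                     * concurrent stop/free of the same queue */
                    mutex_lock(&queue->queue_lock);
                    if (test_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
                            __nvme_tcp_stop_queue(queue);
                    mutex_unlock(&queue->queue_lock);
            }
            return ret;
    }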

Thanks
Amit 

-----Original Message-----
From: Engel, Amit 
Sent: Wednesday, June 9, 2021 2:14 PM
To: 'Sagi Grimberg'; linux-nvme@lists.infradead.org
Cc: Anner, Ran; Grupi, Elad
Subject: RE: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230

Correct, free_queue is being called (sock->sk becomes NULL) before restore_sock_calls.

When restore_sock_calls is called, we fail on 'write_lock_bh(&sock->sk->sk_callback_lock)'.

The NULL pointer dereference is at offset 0x230, i.e. 560 decimal, which matches the sk_callback_lock offset in struct sock:
crash> struct sock -o
struct sock {
   [0] struct sock_common __sk_common;
   ...
   [560] rwlock_t sk_callback_lock;
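
For reference, the restore path looks roughly like this (paraphrased from drivers/nvme/host/tcp.c), which is consistent with the fault address:

    static void nvme_tcp_restore_sock_calls(struct nvme_tcp_queue *queue)
    {
            struct socket *sock = queue->sock;

            /* if sock->sk is already NULL here (queue torn down by the other
             * context), this dereferences NULL + offsetof(struct sock,
             * sk_callback_lock) == 0x230, matching the faulting address */
            write_lock_bh(&sock->sk->sk_callback_lock);
            sock->sk->sk_user_data    = NULL;
            sock->sk->sk_data_ready   = queue->data_ready;
            sock->sk->sk_state_change = queue->state_change;
            sock->sk->sk_write_space  = queue->write_space;
            write_unlock_bh(&sock->sk->sk_callback_lock);
    }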

The stop_queue in ctx2 does not really do anything, since the 'NVME_TCP_Q_LIVE' bit has already been cleared (by ctx1).
Can you please explain how stopping the queue before the free helps to serialize with ctx1?
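
For context, the stop path we are looking at is roughly the following (paraphrased; queue_lock and the Q_LIVE flag names are from drivers/nvme/host/tcp.c):

    static void nvme_tcp_stop_queue(struct nvme_ctrl *nctrl, int qid)
    {
            struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
            struct nvme_tcp_queue *queue = &ctrl->queues[qid];

            mutex_lock(&queue->queue_lock);
            /* if ctx1 already cleared NVME_TCP_Q_LIVE, the test below fails
             * and __nvme_tcp_stop_queue() is skipped, so ctx2's "stop" is a
             * no-op and does not wait for ctx1 to finish restoring the sock
             * callbacks unless ctx1 also runs under queue_lock */
            if (test_and_clear_bit(NVME_TCP_Q_LIVE, &queue->flags))
                    __nvme_tcp_stop_queue(queue);
            mutex_unlock(&queue->queue_lock);
    }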

The race we are describing is based on the panic backtrace that I shared.

Maybe our analysis is not accurate?

Thanks,
Amit

-----Original Message-----
From: Sagi Grimberg <sagi@grimberg.me> 
Sent: Wednesday, June 9, 2021 12:11 PM
To: Engel, Amit; linux-nvme@lists.infradead.org
Cc: Anner, Ran; Grupi, Elad
Subject: Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230


> I'm not sure that using the queue_lock mutex will help. The race in this
> case is between sock_release and nvme_tcp_restore_sock_calls:
> sock_release is being called as part of nvme_tcp_free_queue, which is
> destroying the mutex.

Maybe I'm not understanding the issue here. What is the scenario again?
stop_queue is called (ctx1), which triggers error_recovery (ctx2), which then calls free_queue before ctx1 gets to restore the sock callbacks?

err_work will first stop the queues before freeing them, so it will serialize behind ctx1. What am I missing?
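
Roughly, the teardown order in the error-recovery path is as follows (a simplified sketch; unrelated steps elided):

    static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl, bool remove)
    {
            if (ctrl->queue_count <= 1)
                    return;
            nvme_stop_queues(ctrl);
            nvme_tcp_stop_io_queues(ctrl);            /* stop first ...     */
            nvme_cancel_tagset(ctrl);
            nvme_tcp_destroy_io_queues(ctrl, remove); /* ... then free/release */
    }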
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


Thread overview: 11+ messages
2021-06-01 17:51 nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230 Engel, Amit
2021-06-02 12:28 ` Engel, Amit
2021-06-08 23:39   ` Sagi Grimberg
2021-06-09  7:48     ` Engel, Amit
2021-06-09  8:04       ` Sagi Grimberg
2021-06-09  8:39         ` Engel, Amit
2021-06-09  9:11           ` Sagi Grimberg
2021-06-09 11:14             ` Engel, Amit
2021-06-10  8:44               ` Engel, Amit [this message]
2021-06-10 20:03               ` Sagi Grimberg
2021-06-13  8:35                 ` Engel, Amit
