All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Keith Busch <kbusch@kernel.org>
To: Nilay Shroff <nilay@linux.ibm.com>
Cc: Sagi Grimberg <sagi@grimberg.me>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Christoph Hellwig <hch@lst.de>, "axboe@fb.com" <axboe@fb.com>,
	Gregory Joyce <gjoyce@ibm.com>,
	Srimannarayana Murthy Maram <msmurthy@imap.linux.ibm.com>
Subject: Re: [Bug Report] PCIe errinject and hot-unplug causes nvme driver hang
Date: Wed, 24 Apr 2024 11:36:02 -0600	[thread overview]
Message-ID: <ZilDAv9y3dSzTPKb@kbusch-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <2e10ca7f-0da8-47e4-9bfb-4a6cbf4abaec@linux.ibm.com>

On Tue, Apr 23, 2024 at 03:22:46PM +0530, Nilay Shroff wrote:
> > 
> I tested the above patch, however, it doesn't help to solve the issue.
> I tested it for two cases listed below:
> 
> 1. Platform which doesn't support pci-error-recovery:
> -----------------------------------------------------
> On this platform when nvme_timeout() is invoked, it falls through 
> nvme_shoud_reset()
>   -> nvme_warn_reset() 
>     -> goto disable
> 
> When nvme_timeout() jumps to the label disable, it tries setting the
> controller state to RESETTING but that couldn't succeed because the 
> (logical) hot-unplug/nvme_remove() of the disk is started on another 
> thread and hence controller state has already changed to 
> DELETING/DELETING_NOIO. As nvme_timeout() couldn't set the controller 
> state to RESETTING, nvme_timeout() returns BLK_EH_DONE. In summary, 
> as nvme_timeout() couldn't cancel pending IO, the hot-unplug/nvme_remove() 
> couldn't forward progress and it keeps waiting for request queue to be freezed. 
> 
> 2. Platform supporting pci-error-recovery:
> ------------------------------------------
> Similarly, on this platform as explained for the above case, when 
> nvme_timeout() is invoked, it falls through nvme_shoud_reset()
> -> nvme_warn_reset() -> goto disable. In this case as well, 
> nvme_timeout() returns BLK_EH_DONE. Please note that though this 
> platform supports pci-error-recovery, we couldn't get through 
> nvme_error_detected() because the pci-error-recovery thread is pending 
> on acquiring mutex "pci_lock_rescan_remove". This mutex is acquired by 
> hot-unplug thread before it invokes nvme_remove() and nvme_remove() 
> is currently waiting for request queue to be freezed. For reference,
> I have already captured the task hang traces in previous email of this 
> thread where we could observe these hangs (for both pci-error-recovery
> thread as well as hot-unplig/nvme_remove()).
> 
> I understand that we don't want to cancel pending IO from the nvme_remove()
> unconditionally as if the disk is not physically hot-unplug then we still 
> want to  wait for the in-flight IO to be finished. Also looking through 
> the above cases, I think that the nvme_timeout() might be the code path 
> from where we want to cancel in-flight/pending IO if controller is 
> in the terminal state (i.e. DELETING or DELETING_NOIO). Keeping this idea in
> mind, I have worked out the below patch:
> 
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 8e0bb9692685..e45a54d84649 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1286,6 +1286,9 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
>         u32 csts = readl(dev->bar + NVME_REG_CSTS);
>         u8 opcode;
>  
> +       if (nvme_state_terminal(&dev->ctrl))
> +               goto disable;
> +
>         /* If PCI error recovery process is happening, we cannot reset or
>          * the recovery mechanism will surely fail.
>          */
> @@ -1390,8 +1393,13 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
>         return BLK_EH_RESET_TIMER;
>  
>  disable:
> -       if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING))
> +       if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
> +               if (nvme_state_terminal(&dev->ctrl)) {
> +                       nvme_dev_disable(dev, false);
> +                       nvme_sync_queues(&dev->ctrl);
> +               }
>                 return BLK_EH_DONE;
> +       }
>  
>         nvme_dev_disable(dev, false);
>         if (nvme_try_sched_reset(&dev->ctrl))
> 
> I have tested the above patch against all possible cases. Please let me know
> if this looks good or if there are any further comments.

This looks okay to me. Just a couple things:

Set nvme_dev_disable's "shutdown" parameter to "true" since we're
restarting the queues again from this state.

Remove "nvme_sync_queues()". I think that would deadlock: sync_queues
waits for the timeout work to complete, but your calling it within the
timeout work, so this would have it wait for itself.


  reply	other threads:[~2024-04-24 17:36 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-18 12:52 [Bug Report] PCIe errinject and hot-unplug causes nvme driver hang Nilay Shroff
2024-04-21 10:28 ` Sagi Grimberg
2024-04-21 16:53   ` Nilay Shroff
2024-04-21 16:56   ` Nilay Shroff
2024-04-22 13:00     ` Sagi Grimberg
2024-04-22 13:52       ` Keith Busch
2024-04-22 14:35         ` Keith Busch
2024-04-23  9:52           ` Nilay Shroff
2024-04-24 17:36             ` Keith Busch [this message]
2024-04-25 13:49               ` Nilay Shroff

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZilDAv9y3dSzTPKb@kbusch-mbp.dhcp.thefacebook.com \
    --to=kbusch@kernel.org \
    --cc=axboe@fb.com \
    --cc=gjoyce@ibm.com \
    --cc=hch@lst.de \
    --cc=linux-nvme@lists.infradead.org \
    --cc=msmurthy@imap.linux.ibm.com \
    --cc=nilay@linux.ibm.com \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.