From: Keith Busch <kbusch@kernel.org>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: Nilay Shroff <nilay@linux.ibm.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Christoph Hellwig <hch@lst.de>, "axboe@fb.com" <axboe@fb.com>,
	Gregory Joyce <gjoyce@ibm.com>,
	Srimannarayana Murthy Maram <msmurthy@imap.linux.ibm.com>
Subject: Re: [Bug Report] PCIe errinject and hot-unplug causes nvme driver hang
Date: Mon, 22 Apr 2024 08:35:04 -0600
Message-ID: <ZiZ1mB0pE6lBrJkN@kbusch-mbp.dhcp.thefacebook.com>
In-Reply-To: <ZiZrmSW6s7lY7j98@kbusch-mbp.dhcp.thefacebook.com>

On Mon, Apr 22, 2024 at 07:52:25AM -0600, Keith Busch wrote:
> On Mon, Apr 22, 2024 at 04:00:54PM +0300, Sagi Grimberg wrote:
> > > pci_rescan_remove_lock then it shall be able to recover the pci error and hence
> > > pending IOs could be finished. Later when hot-unplug task starts, it could
> > > forward progress and cleanup all resources used by the nvme disk.
> > > 
> > > So does it make sense if we unconditionally cancel the pending IOs from
> > > nvme_remove() before it forward progress to remove namespaces?
> > 
> > The driver attempts to allow inflight I/Os to complete successfully if the
> > device is still present at the remove stage. I am not sure we want to
> > unconditionally fail these I/Os. Keith?
> 
> We have a timeout handler to clean this up, but I think it was another
> PPC-specific patch that has the timeout handler do nothing if PCIe error
> recovery is in progress. That seems questionable: we should be able to
> run error handling and timeouts concurrently, but I think the error
> handling just needs to synchronize the request_queues in the
> "error_detected" path.

This:

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8e0bb9692685d..38d0215fe53fc 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1286,13 +1286,6 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
 	u32 csts = readl(dev->bar + NVME_REG_CSTS);
 	u8 opcode;
 
-	/* If PCI error recovery process is happening, we cannot reset or
-	 * the recovery mechanism will surely fail.
-	 */
-	mb();
-	if (pci_channel_offline(to_pci_dev(dev->dev)))
-		return BLK_EH_RESET_TIMER;
-
 	/*
 	 * Reset immediately if the controller is failed
 	 */
@@ -3300,6 +3293,7 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
 			return PCI_ERS_RESULT_DISCONNECT;
 		}
 		nvme_dev_disable(dev, false);
+		nvme_sync_queues(&dev->ctrl);
 		return PCI_ERS_RESULT_NEED_RESET;
 	case pci_channel_io_perm_failure:
 		dev_warn(dev->ctrl.device,
--
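
For illustration only, here is a userspace model of the ordering the
second hunk relies on. It is a sketch, not kernel code: every model_*
name is hypothetical, pthreads stand in for the block layer's timeout
work, and it assumes (as I understand nvme_sync_queues()) that syncing a
queue waits for any timeout handler already running on it. With that
wait in the "error_detected" path, the timeout handler no longer needs
to bail out while the channel is offline.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for one request_queue's timeout state. */
struct model_queue {
	pthread_mutex_t lock;
	pthread_cond_t idle;
	bool timeout_running;	/* a timeout handler is in flight */
	int timeouts_handled;	/* requests the handler cleaned up */
};

/* Models the timeout handler: it always does its cleanup, with no
 * "channel offline" early return, then reports that it is done. */
static void *model_timeout_work(void *arg)
{
	struct model_queue *q = arg;

	pthread_mutex_lock(&q->lock);
	q->timeouts_handled++;
	q->timeout_running = false;
	pthread_cond_broadcast(&q->idle);
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

/* Models the "error_detected" path: after the (omitted) disable step,
 * wait until no timeout handler is running before reporting that a
 * reset is needed. Returns the number of timeouts handled so far. */
static int model_error_detected(struct model_queue *q)
{
	int handled;

	pthread_mutex_lock(&q->lock);
	while (q->timeout_running)
		pthread_cond_wait(&q->idle, &q->lock);
	assert(!q->timeout_running);
	handled = q->timeouts_handled;
	pthread_mutex_unlock(&q->lock);
	return handled;
}
```

The caller marks timeout_running before starting the handler thread, so
model_error_detected() cannot return until that handler has finished.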


Thread overview: 10+ messages
2024-04-18 12:52 [Bug Report] PCIe errinject and hot-unplug causes nvme driver hang Nilay Shroff
2024-04-21 10:28 ` Sagi Grimberg
2024-04-21 16:53   ` Nilay Shroff
2024-04-21 16:56   ` Nilay Shroff
2024-04-22 13:00     ` Sagi Grimberg
2024-04-22 13:52       ` Keith Busch
2024-04-22 14:35         ` Keith Busch [this message]
2024-04-23  9:52           ` Nilay Shroff
2024-04-24 17:36             ` Keith Busch
2024-04-25 13:49               ` Nilay Shroff
