Linux-PCI Archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Cc: linux-pci@vger.kernel.org, Nirmal Patel <nirmal.patel@linux.intel.com>
Subject: Re: [PATCH v3] PCI: vmd: always enable host bridge hotplug support flags
Date: Thu, 2 May 2024 17:56:08 -0500	[thread overview]
Message-ID: <20240502225608.GA1553882@bhelgaas> (raw)
In-Reply-To: <bafed3eb-f698-41fa-867c-bfec87e429a4@intel.com>

On Thu, May 02, 2024 at 03:38:00PM -0700, Paul M Stillwell Jr wrote:
> On 5/2/2024 3:08 PM, Bjorn Helgaas wrote:
> > On Mon, Apr 08, 2024 at 11:39:27AM -0700, Paul M Stillwell Jr wrote:
> > > Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") added
> > > code to copy the _OSC flags from the root bridge to the host bridge for each
> > > vmd device because the AER bits were not being set correctly which was
> > > causing an AER interrupt storm for certain NVMe devices.
> > > 
> > > This works fine in bare metal environments, but causes problems when the
> > > vmd driver is run in a hypervisor environment. In a hypervisor all the
> > > _OSC bits are 0 despite what the underlying hardware indicates. This is
> > > a problem for vmd users because if vmd is enabled the user *always*
> > > wants hotplug support enabled. To solve this issue the vmd driver always
> > > enables the hotplug bits in the host bridge structure for each vmd.
> > > 
> > > Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
> > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > > Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
> > > ---
> > >   drivers/pci/controller/vmd.c | 10 ++++++++--
> > >   1 file changed, 8 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > index 87b7856f375a..583b10bd5eb7 100644
> > > --- a/drivers/pci/controller/vmd.c
> > > +++ b/drivers/pci/controller/vmd.c
> > > @@ -730,8 +730,14 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
> > >   static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
> > >   				       struct pci_host_bridge *vmd_bridge)
> > >   {
> > > -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
> > > -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
> > > +	/*
> > > +	 * there is an issue when the vmd driver is running within a hypervisor
> > > +	 * because all of the _OSC bits are 0 in that case. this disables
> > > +	 * hotplug support, but users who enable VMD in their BIOS always want
> > > +	 * hotplug suuport so always enable it.
> > > +	 */
> > > +	vmd_bridge->native_pcie_hotplug = 1;
> > > +	vmd_bridge->native_shpc_hotplug = 1;
> > 
> > Deferred for now because I think we need to figure out how to set all
> > these bits the same, or at least with a better algorithm than "here's
> > what we want in this environment."
> > 
> > Extended discussion about this at
> > https://lore.kernel.org/r/20240417201542.102-1-paul.m.stillwell.jr@intel.com
> 
> That's ok by me. I thought where we left it was that if we could find a
> solution to the Correctable Errors from the original issue that maybe we
> could revert 04b12ef163d1.
> 
> I'm not sure I would know if a patch that fixes the Correctable Errors comes
> in... We have a test case we would like to test against that was pre
> 04b12ef163d1 (BIOS has AER disabled and we hotplug a disk which results in
> AER interrupts) so we would be curious if the issues we saw before goes away
> with a new patch for Correctable Errors.

My current theory is that there's some issue with that particular
Samsung NVMe device that causes the Correctable Error flood.  Kai-Heng
says they happen even with VMD disabled.

And there are other reports that don't seem to involve VMD but do
involve this NVMe device ([144d:a80a]):

  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852420
  https://forums.unraid.net/topic/113521-constant-errors-on-logs-after-nvme-upgrade/
  https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/
  https://forum.proxmox.com/threads/pve-kernel-panics-on-reboots.144481/
  https://www.eevblog.com/forum/general-computing/linux-mint-21-02-clone-replace-1tb-nvme-with-a-2tb-nvme/

NVMe has weird power management stuff, so it's always possible we're
doing something wrong in a driver.

But I think we really need to handle Correctable Errors better by:

  - Possibly having drivers mask errors if they know about defects
  - Making the log messages less alarming, e.g.,  a single line report
  - Rate-limiting them so they're never overwhelming
  - Maybe automatically masking them in the PCI core to avoid interrupts

> > >   	vmd_bridge->native_aer = root_bridge->native_aer;
> > >   	vmd_bridge->native_pme = root_bridge->native_pme;
> > >   	vmd_bridge->native_ltr = root_bridge->native_ltr;
> > > -- 
> > > 2.39.1
> > > 
> > 
> 

      reply	other threads:[~2024-05-02 22:56 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-08 18:39 [PATCH v3] PCI: vmd: always enable host bridge hotplug support flags Paul M Stillwell Jr
2024-05-02 22:08 ` Bjorn Helgaas
2024-05-02 22:38   ` Paul M Stillwell Jr
2024-05-02 22:56     ` Bjorn Helgaas [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240502225608.GA1553882@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=nirmal.patel@linux.intel.com \
    --cc=paul.m.stillwell.jr@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).