From: Bjorn Helgaas <helgaas@kernel.org>
To: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Cc: linux-pci@vger.kernel.org, Nirmal Patel <nirmal.patel@linux.intel.com>
Subject: Re: [PATCH v3] PCI: vmd: always enable host bridge hotplug support flags
Date: Thu, 2 May 2024 17:56:08 -0500 [thread overview]
Message-ID: <20240502225608.GA1553882@bhelgaas> (raw)
In-Reply-To: <bafed3eb-f698-41fa-867c-bfec87e429a4@intel.com>
On Thu, May 02, 2024 at 03:38:00PM -0700, Paul M Stillwell Jr wrote:
> On 5/2/2024 3:08 PM, Bjorn Helgaas wrote:
> > On Mon, Apr 08, 2024 at 11:39:27AM -0700, Paul M Stillwell Jr wrote:
> > > Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") added
> > > code to copy the _OSC flags from the root bridge to the host bridge for each
> > > vmd device because the AER bits were not being set correctly which was
> > > causing an AER interrupt storm for certain NVMe devices.
> > >
> > > This works fine in bare metal environments, but causes problems when the
> > > vmd driver is run in a hypervisor environment. In a hypervisor all the
> > > _OSC bits are 0 despite what the underlying hardware indicates. This is
> > > a problem for vmd users because if vmd is enabled the user *always*
> > > wants hotplug support enabled. To solve this issue the vmd driver always
> > > enables the hotplug bits in the host bridge structure for each vmd.
> > >
> > > Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
> > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > > Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
> > > ---
> > > drivers/pci/controller/vmd.c | 10 ++++++++--
> > > 1 file changed, 8 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > index 87b7856f375a..583b10bd5eb7 100644
> > > --- a/drivers/pci/controller/vmd.c
> > > +++ b/drivers/pci/controller/vmd.c
> > > @@ -730,8 +730,14 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
> > > static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
> > > struct pci_host_bridge *vmd_bridge)
> > > {
> > > - vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
> > > - vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
> > > + /*
> > > + * there is an issue when the vmd driver is running within a hypervisor
> > > + * because all of the _OSC bits are 0 in that case. this disables
> > > + * hotplug support, but users who enable VMD in their BIOS always want
> > > + * hotplug suuport so always enable it.
> > > + */
> > > + vmd_bridge->native_pcie_hotplug = 1;
> > > + vmd_bridge->native_shpc_hotplug = 1;
> >
> > Deferred for now because I think we need to figure out how to set all
> > these bits the same, or at least with a better algorithm than "here's
> > what we want in this environment."
> >
> > Extended discussion about this at
> > https://lore.kernel.org/r/20240417201542.102-1-paul.m.stillwell.jr@intel.com
>
> That's ok by me. I thought where we left it was that if we could find a
> solution to the Correctable Errors from the original issue that maybe we
> could revert 04b12ef163d1.
>
> I'm not sure I would know if a patch that fixes the Correctable Errors comes
> in... We have a test case we would like to test against that was pre
> 04b12ef163d1 (BIOS has AER disabled and we hotplug a disk which results in
> AER interrupts) so we would be curious if the issues we saw before goes away
> with a new patch for Correctable Errors.
My current theory is that there's some issue with that particular
Samsung NVMe device that causes the Correctable Error flood. Kai-Heng
says they happen even with VMD disabled.
And there are other reports that don't seem to involve VMD but do
involve this NVMe device ([144d:a80a]):
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852420
https://forums.unraid.net/topic/113521-constant-errors-on-logs-after-nvme-upgrade/
https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/
https://forum.proxmox.com/threads/pve-kernel-panics-on-reboots.144481/
https://www.eevblog.com/forum/general-computing/linux-mint-21-02-clone-replace-1tb-nvme-with-a-2tb-nvme/
NVMe has weird power management stuff, so it's always possible we're
doing something wrong in a driver.
But I think we really need to handle Correctable Errors better by:
- Possibly having drivers mask errors if they know about defects
- Making the log messages less alarming, e.g., a single line report
- Rate-limiting them so they're never overwhelming
- Maybe automatically masking them in the PCI core to avoid interrupts
> > > vmd_bridge->native_aer = root_bridge->native_aer;
> > > vmd_bridge->native_pme = root_bridge->native_pme;
> > > vmd_bridge->native_ltr = root_bridge->native_ltr;
> > > --
> > > 2.39.1
> > >
> >
>
prev parent reply other threads:[~2024-05-02 22:56 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-08 18:39 [PATCH v3] PCI: vmd: always enable host bridge hotplug support flags Paul M Stillwell Jr
2024-05-02 22:08 ` Bjorn Helgaas
2024-05-02 22:38 ` Paul M Stillwell Jr
2024-05-02 22:56 ` Bjorn Helgaas [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240502225608.GA1553882@bhelgaas \
--to=helgaas@kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=nirmal.patel@linux.intel.com \
--cc=paul.m.stillwell.jr@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).