From: Javier Martinez Canillas <javierm@redhat.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-kernel@vger.kernel.org,
	Peter Robinson <pbrobinson@gmail.com>,
	Shawn Lin <shawn.lin@rock-chips.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Heiko Stuebner <heiko@sntech.de>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Rob Herring <robh@kernel.org>,
	linux-arm-kernel@lists.infradead.org, linux-pci@vger.kernel.org,
	linux-rockchip@lists.infradead.org,
	Michal Simek <michal.simek@xilinx.com>,
	Ley Foon Tan <ley.foon.tan@intel.com>,
	rfi@lists.rocketboards.org, Jingoo Han <jingoohan1@gmail.com>,
	Thierry Reding <thierry.reding@gmail.com>,
	Jonathan Hunter <jonathanh@nvidia.com>,
	linux-tegra@vger.kernel.org
Subject: Re: [PATCH v2] PCI: rockchip: Avoid accessing PCIe registers with clocks gated
Date: Fri, 25 Jun 2021 09:09:36 +0200
Message-ID: <5bee3702-595b-f57b-f962-28644b7e646f@redhat.com>
In-Reply-To: <20210624224040.GA3567297@bjorn-Precision-5520>

Hello Bjorn,

On 6/25/21 12:40 AM, Bjorn Helgaas wrote:
> [+cc Michal, Ley Foon, Jingoo, Thierry, Jonathan]
> 
> On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
>> IRQ handlers that are registered for shared interrupts can be called at
>> any time after they have been registered using the request_irq() function.
>>
>> It's up to drivers to ensure that it's always safe for these to be called.
>>
>> Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
>> their handlers are registered very early in the probe function, an error
>> later can lead to these handlers being executed before all the required
>> resources have been properly set up.
>>
>> For example, the rockchip_pcie_read() function used by these IRQ handlers
>> expects that some PCIe clocks will already be enabled, otherwise trying
>> to access the PCIe registers causes the read to hang and never return.
>>
>> The CONFIG_DEBUG_SHIRQ option tests if drivers are able to cope with their
>> shared interrupt handlers being called, by generating a spurious interrupt
>> just before a shared interrupt handler is unregistered.
>>
>> But this means that if the option is enabled, any error in the probe path
>> of this driver could lead to one of the IRQ handlers being executed.
> 
> I'm not an IRQ expert, but I think this is an issue regardless of
> CONFIG_DEBUG_SHIRQ, isn't it?  Anything used by an IRQ handler should
> be initialized before the handler is registered.  CONFIG_DEBUG_SHIRQ
> is just a way to help find latent problems.
>

Yes, it's an issue regardless. This debug option just checks that drivers
aren't making the wrong assumption, precisely to find issues like this one.

>> On a rockpro64 board, the following sequence of events happens:
>>
>>   1) "pcie-sys" IRQ is requested and its handler registered.
>>   2) "pcie-client" IRQ is requested and its handler registered.
>>   3) probe later fails due to readl_poll_timeout() returning a timeout.
>>   4) the "pcie-sys" IRQ is unregistered.
>>   5) CONFIG_DEBUG_SHIRQ triggers a spurious interrupt.
>>   6) "pcie-client" IRQ handler is called for this spurious interrupt.
>>   7) IRQ handler tries to read PCIE_CLIENT_INT_STATUS with clocks gated.
>>   8) the machine hangs because rockchip_pcie_read() call never returns.
>>
>> To avoid cases like this, the handlers should not be registered until
>> very late in the probe function, once all the resources have been set up.
>>
>> So let's just move all the IRQ init before the pci_host_probe() call. That
>> will prevent issues like this and seems to be the correct thing to do too.
> 
> Previously we registered rockchip_pcie_subsys_irq_handler() and
> rockchip_pcie_client_irq_handler() before the PCIe clocks were
> enabled.  That's a problem because they depend on those clocks being
> enabled, and your patch fixes that.
>
> rockchip_pcie_legacy_int_handler() depends on rockchip->irq_domain,
> which isn't initialized until rockchip_pcie_init_irq_domain().
> Previously we registered rockchip_pcie_legacy_int_handler() as the
> handler for the "legacy" IRQ before rockchip_pcie_init_irq_domain().
> 
> I think your patch *also* fixes that problem, right?
>

Correct, that's why I moved the initialization and IRQ enable after that.
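
To illustrate the ordering, here is a minimal userspace sketch. It is not
the driver code: struct model_pcie and every model_* function are
hypothetical stand-ins, and model_free_irq() only mimics the
CONFIG_DEBUG_SHIRQ idea of one spurious handler call around unregistration.
The error path gating the clocks before the handler is freed is also an
assumption of the model.

```c
#include <stdbool.h>

/* Hypothetical model of the state the IRQ handlers depend on. */
struct model_pcie {
	bool clocks_enabled;     /* stands in for the PCIe clocks */
	bool irq_domain_ready;   /* stands in for rockchip->irq_domain */
	bool handler_registered;
	int unsafe_calls;        /* handler runs with missing resources */
};

/* Stand-in for a shared IRQ handler: needs clocks and the IRQ domain. */
void model_irq_handler(struct model_pcie *p)
{
	if (!p->clocks_enabled || !p->irq_domain_ready)
		p->unsafe_calls++;  /* the real handler would hang or crash here */
}

void model_request_irq(struct model_pcie *p)
{
	p->handler_registered = true;
}

/* Mimics the debug option: one spurious call when the handler is freed. */
void model_free_irq(struct model_pcie *p)
{
	if (!p->handler_registered)
		return;
	model_irq_handler(p);
	p->handler_registered = false;
}

/* Old ordering: the handler is registered before anything else is ready. */
int model_probe_early_irq(struct model_pcie *p, bool fail)
{
	model_request_irq(p);
	p->clocks_enabled = true;
	if (fail) {
		p->clocks_enabled = false;  /* error path gates the clocks */
		model_free_irq(p);          /* spurious call hits gated clocks */
		return -1;
	}
	p->irq_domain_ready = true;
	return 0;
}

/* New ordering: the handler is registered last, once everything is ready. */
int model_probe_late_irq(struct model_pcie *p, bool fail)
{
	p->clocks_enabled = true;
	if (fail) {
		p->clocks_enabled = false;  /* no handler registered yet */
		return -1;
	}
	p->irq_domain_ready = true;
	model_request_irq(p);
	return 0;
}
```

With a failing probe, the early ordering records one unsafe handler call in
this model, while the late ordering records none because no handler exists
yet when the error path runs.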
 
> I think this is also an issue with the following other drivers.  They all
> set the handler to something that uses an IRQ domain before they
> actually initialize the domain:

Yes, I agree with your assessment and also noticed that other drivers have
similar issues. I just don't have any of those platforms to reproduce the
bugs and test a fix.

Best regards,
-- 
Javier Martinez Canillas
Software Engineer
New Platform Technologies Enablement team
RHEL Engineering


