* [PATCH] Documentation: PCI: pci-error-recovery: rearrange the general sequence
@ 2021-06-18 6:04 Wesley Sheng
2021-06-18 7:21 ` Oliver O'Halloran
2021-07-01 22:22 ` Bjorn Helgaas
0 siblings, 2 replies; 5+ messages in thread
From: Wesley Sheng @ 2021-06-18 6:04 UTC
To: linasvepstas, ruscur, oohall, bhelgaas, corbet, linux-pci,
linuxppc-dev, linux-doc, linux-kernel
Cc: wesleyshenggit, Wesley Sheng
Reset_link() callback function was called before mmio_enabled() in
pcie_do_recovery() function actually, so rearrange the general
sequence betwen step 2 and step 3 accordingly.
Signed-off-by: Wesley Sheng <wesley.sheng@amd.com>
---
Documentation/PCI/pci-error-recovery.rst | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
index 187f43a03200..ac6a8729ef28 100644
--- a/Documentation/PCI/pci-error-recovery.rst
+++ b/Documentation/PCI/pci-error-recovery.rst
@@ -184,7 +184,14 @@ is STEP 6 (Permanent Failure).
and prints an error to syslog. A reboot is then required to
get the device working again.
-STEP 2: MMIO Enabled
+STEP 2: Link Reset
+------------------
+The platform resets the link. This is a PCI-Express specific step
+and is done whenever a fatal error has been detected that can be
+"solved" by resetting the link.
+
+
+STEP 3: MMIO Enabled
--------------------
The platform re-enables MMIO to the device (but typically not the
DMA), and then calls the mmio_enabled() callback on all affected
@@ -197,8 +204,8 @@ information, if any, and eventually do things like trigger a device local
reset or some such, but not restart operations. This callback is made if
all drivers on a segment agree that they can try to recover and if no automatic
link reset was performed by the HW. If the platform can't just re-enable IOs
-without a slot reset or a link reset, it will not call this callback, and
-instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
+without a slot reset, it will not call this callback, and
+instead will have gone directly or STEP 4 (Slot Reset)
.. note::
@@ -210,7 +217,7 @@ instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
such an error might cause IOs to be re-blocked for the whole
segment, and thus invalidate the recovery that other devices
on the same segment might have done, forcing the whole segment
- into one of the next states, that is, link reset or slot reset.
+ into next states, that is, slot reset.
The driver should return one of the following result codes:
- PCI_ERS_RESULT_RECOVERED
@@ -233,17 +240,11 @@ The driver should return one of the following result codes:
The next step taken depends on the results returned by the drivers.
If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
-proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations).
+proceeds to STEP 5 (Resume Operations).
If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
proceeds to STEP 4 (Slot Reset)
-STEP 3: Link Reset
-------------------
-The platform resets the link. This is a PCI-Express specific step
-and is done whenever a fatal error has been detected that can be
-"solved" by resetting the link.
-
STEP 4: Slot Reset
------------------
--
2.25.1
* Re: [PATCH] Documentation: PCI: pci-error-recovery: rearrange the general sequence
From: Oliver O'Halloran @ 2021-06-18 7:21 UTC
To: Wesley Sheng
Cc: linasvepstas, Russell Currey, Bjorn Helgaas, Jonathan Corbet,
linux-pci, linuxppc-dev, linux-doc, Linux Kernel Mailing List,
wesleyshenggit
On Fri, Jun 18, 2021 at 4:05 PM Wesley Sheng <wesley.sheng@amd.com> wrote:
>
> Reset_link() callback function was called before mmio_enabled() in
> pcie_do_recovery() function actually, so rearrange the general
> sequence betwen step 2 and step 3 accordingly.
I don't think this is true in all cases. If pcie_do_recovery() is
called with state==pci_channel_io_normal (i.e. non-fatal AER) the link
won't be reset. EEH (ppc PCI error recovery thing) also uses
.mmio_enabled() as described.
* Re: [PATCH] Documentation: PCI: pci-error-recovery: rearrange the general sequence
From: Wesley Sheng @ 2021-06-29 3:34 UTC
To: Oliver O'Halloran, linasvepstas, ruscur, bhelgaas, corbet,
linux-pci, linuxppc-dev, linux-doc, linux-kernel
Cc: wesleyshenggit, wesley.sheng
On Fri, Jun 18, 2021 at 05:21:32PM +1000, Oliver O'Halloran wrote:
> On Fri, Jun 18, 2021 at 4:05 PM Wesley Sheng <wesley.sheng@amd.com> wrote:
> >
> > Reset_link() callback function was called before mmio_enabled() in
> > pcie_do_recovery() function actually, so rearrange the general
> > sequence betwen step 2 and step 3 accordingly.
>
> I don't think this is true in all cases. If pcie_do_recovery() is
> called with state==pci_channel_io_normal (i.e. non-fatal AER) the link
> won't be reset. EEH (ppc PCI error recovery thing) also uses
> .mmio_enabled() as described.
Yes, in the case of a non-fatal AER error, the reset_link() callback
(aer_root_reset() for AER and dpc_reset_link() for DPC) will not be
invoked. And if .error_detected() returns PCI_ERS_RESULT_CAN_RECOVER,
.mmio_enabled() is called next.
But if pcie_do_recovery() is called with state == pci_channel_io_frozen,
the reset_link() callback is called after .error_detected() but before
.mmio_enabled(). So I thought Step 2 (MMIO Enabled) and Step 3 (Link Reset)
should swap places.
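
The ordering under discussion can be illustrated with a small userspace
model. This is only a sketch of the control flow as described in this
thread, not the kernel's code: the enum names, model_recovery() and
trace_step() are hypothetical stand-ins, the drivers' votes are passed in
as parameters, and the reset itself is assumed to always succeed.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-ins for pci_channel_state_t and pci_ers_result_t.
 * Only the values discussed in this thread are modeled.
 */
enum chan_state { CHAN_IO_NORMAL, CHAN_IO_FROZEN };
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_RECOVERED, ERS_DISCONNECT };

/* Append one step name to the space-separated trace string. */
static void trace_step(char *trace, size_t len, const char *step)
{
	size_t used = strlen(trace);

	if (used < len)
		snprintf(trace + used, len - used, "%s%s", used ? " " : "", step);
}

/*
 * Model of the recovery sequence. detected_vote is what the drivers
 * return from .error_detected(); mmio_vote is what they return from
 * .mmio_enabled(). Assumption: the reset and .slot_reset() succeed.
 */
static enum ers_result model_recovery(enum chan_state state,
				      enum ers_result detected_vote,
				      enum ers_result mmio_vote,
				      char *trace, size_t len)
{
	enum ers_result status = detected_vote;

	trace[0] = '\0';
	trace_step(trace, len, "error_detected");

	/* Fatal (frozen) path only: reset before any .mmio_enabled() call. */
	if (state == CHAN_IO_FROZEN)
		trace_step(trace, len, "reset_subordinates");

	if (status == ERS_CAN_RECOVER) {
		trace_step(trace, len, "mmio_enabled");
		status = mmio_vote;
	}

	if (status == ERS_NEED_RESET) {
		trace_step(trace, len, "slot_reset");
		status = ERS_RECOVERED;
	}

	if (status != ERS_RECOVERED) {
		trace_step(trace, len, "failed");
		return status;
	}

	trace_step(trace, len, "resume");
	return status;
}
```

In this model, a frozen channel with a CAN_RECOVER vote walks through
error_detected, reset_subordinates, mmio_enabled and resume, while a
normal channel with the same vote skips the reset step. That matches both
observations above: the reset precedes .mmio_enabled() only on the fatal
path.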
* Re: [PATCH] Documentation: PCI: pci-error-recovery: rearrange the general sequence
From: Bjorn Helgaas @ 2021-07-01 22:22 UTC
To: Wesley Sheng
Cc: linasvepstas, ruscur, oohall, bhelgaas, corbet, linux-pci,
linuxppc-dev, linux-doc, linux-kernel, wesleyshenggit
Please make the subject a little more specific. "rearrange the
general sequence" doesn't say anything about what was affected.
On Fri, Jun 18, 2021 at 02:04:46PM +0800, Wesley Sheng wrote:
> Reset_link() callback function was called before mmio_enabled() in
> pcie_do_recovery() function actually, so rearrange the general
> sequence betwen step 2 and step 3 accordingly.
s/betwen/between/
Not sure "general" adds anything in this sentence. "Step 2 and step
3" are not meaningful here in the commit log. It needs to spell out
what those steps are so the log makes sense by itself.
"reset_link" does not appear in pcie_do_recovery(). I'm guessing
you're referring to the "reset_subordinates" function pointer?
> Signed-off-by: Wesley Sheng <wesley.sheng@amd.com>
I didn't quite understand your response to Oliver, so I'll wait for
your corrections and his ack before proceeding.
> ---
> Documentation/PCI/pci-error-recovery.rst | 23 ++++++++++++-----------
> 1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
> index 187f43a03200..ac6a8729ef28 100644
> --- a/Documentation/PCI/pci-error-recovery.rst
> +++ b/Documentation/PCI/pci-error-recovery.rst
> @@ -184,7 +184,14 @@ is STEP 6 (Permanent Failure).
> and prints an error to syslog. A reboot is then required to
> get the device working again.
>
> -STEP 2: MMIO Enabled
> +STEP 2: Link Reset
> +------------------
> +The platform resets the link. This is a PCI-Express specific step
> +and is done whenever a fatal error has been detected that can be
> +"solved" by resetting the link.
> +
> +
> +STEP 3: MMIO Enabled
> --------------------
> The platform re-enables MMIO to the device (but typically not the
> DMA), and then calls the mmio_enabled() callback on all affected
> @@ -197,8 +204,8 @@ information, if any, and eventually do things like trigger a device local
> reset or some such, but not restart operations. This callback is made if
> all drivers on a segment agree that they can try to recover and if no automatic
> link reset was performed by the HW. If the platform can't just re-enable IOs
> -without a slot reset or a link reset, it will not call this callback, and
> -instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
> +without a slot reset, it will not call this callback, and
> +instead will have gone directly or STEP 4 (Slot Reset)
s/or/to/ ?
> .. note::
>
> @@ -210,7 +217,7 @@ instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
> such an error might cause IOs to be re-blocked for the whole
> segment, and thus invalidate the recovery that other devices
> on the same segment might have done, forcing the whole segment
> - into one of the next states, that is, link reset or slot reset.
> + into next states, that is, slot reset.
s/into next states/into the next state/ ?
> The driver should return one of the following result codes:
> - PCI_ERS_RESULT_RECOVERED
> @@ -233,17 +240,11 @@ The driver should return one of the following result codes:
>
> The next step taken depends on the results returned by the drivers.
> If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
> -proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations).
> +proceeds to STEP 5 (Resume Operations).
>
> If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
> proceeds to STEP 4 (Slot Reset)
>
> -STEP 3: Link Reset
> -------------------
> -The platform resets the link. This is a PCI-Express specific step
> -and is done whenever a fatal error has been detected that can be
> -"solved" by resetting the link.
> -
> STEP 4: Slot Reset
> ------------------
>
> --
> 2.25.1
>
* Re: [PATCH] Documentation: PCI: pci-error-recovery: rearrange the general sequence
From: Wesley Sheng @ 2021-07-02 2:41 UTC
To: linasvepstas, ruscur, oohall, bhelgaas, corbet, linux-pci,
linuxppc-dev, linux-doc, linux-kernel
Cc: wesleyshenggit
On Thu, Jul 01, 2021 at 05:22:31PM -0500, Bjorn Helgaas wrote:
> Please make the subject a little more specific. "rearrange the
> general sequence" doesn't say anything about what was affected.
>
> On Fri, Jun 18, 2021 at 02:04:46PM +0800, Wesley Sheng wrote:
> > Reset_link() callback function was called before mmio_enabled() in
> > pcie_do_recovery() function actually, so rearrange the general
> > sequence betwen step 2 and step 3 accordingly.
>
> s/betwen/between/
>
> Not sure "general" adds anything in this sentence. "Step 2 and step
> 3" are not meaningful here in the commit log. It needs to spell out
> what those steps are so the log makes sense by itself.
>
> "reset_link" does not appear in pcie_do_recovery(). I'm guessing
> you're referring to the "reset_subordinates" function pointer?
>
Yes, you are right.
The "Provide callbacks" section of pcieaer-howto.rst refers to the
callback supplied to pcie_do_recovery() as reset_link.
> > Signed-off-by: Wesley Sheng <wesley.sheng@amd.com>
>
> I didn't quite understand your response to Oliver, so I'll wait for
> your corrections and his ack before proceeding.
>
OK.
I thought step 2 (MMIO Enabled) and step 3 (Link Reset) should swap places
in the sequence.