LKML Archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86_64: restore mask_bits in msi shutdown
@ 2008-04-11 23:26 Yinghai Lu
  2008-04-15  8:12 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Yinghai Lu @ 2008-04-11 23:26 UTC (permalink / raw
  To: Thomas Gleixner, Andrew Morton, Ingo Molnar, Eric W. Biederman
  Cc: Jeff Garzik, Ayaz Abdulla, kernel list

I can not kexec RHEL 5.1 from 2.6.25-rc3 later

caused by:
commit 89d694b9dbe769ca1004e01db0ca43964806a611
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Feb 18 18:25:17 2008 +0100

    genirq: do not leave interupts enabled on free_irq

    The default_disable() function was changed in commit:

     76d2160147f43f982dfe881404cfde9fd0a9da21
     genirq: do not mask interrupts by default

    It removed the mask function in favour of the default delayed
    interrupt disabling. Unfortunately this also broke the shutdown in
    free_irq() when the last handler is removed from the interrupt for
    those architectures which rely on the default implementations. Now we
    can end up with a enabled interrupt line after the last handler was
    removed, which can result in spurious interrupts.

    Fix this by adding a default_shutdown function, which is only
    installed, when the irqchip implementation does provide neither a
    shutdown nor a disable function.

    [@stable: affected versions: .21 - .24 ]



for MSI, default_shutdown will call mask_bit for msi device. so all mask bits will
left disabled after free_irq.
then if kexec next kernel that only can use msi_enable bit.
all device's MSI can not be used.

So try to restore MSI mask bits that is saved before using msi in first kernel.

Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

Index: linux-2.6/arch/x86/kernel/io_apic_64.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/io_apic_64.c
+++ linux-2.6/arch/x86/kernel/io_apic_64.c
@@ -2003,6 +2003,14 @@ static void set_msi_irq_affinity(unsigne
 }
 #endif /* CONFIG_SMP */
 
+static void msi_shutdown(unsigned int irq)
+{
+	struct irq_desc *desc = irq_desc + irq;
+
+	msi_restore_mask_bits(irq);
+	desc->status |= IRQ_MASKED;
+}
+
 /*
  * IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
  * which implement the MSI or MSI-X Capability Structure.
@@ -2012,6 +2020,7 @@ static struct irq_chip msi_chip = {
 	.unmask		= unmask_msi_irq,
 	.mask		= mask_msi_irq,
 	.ack		= ack_apic_edge,
+	.shutdown	= msi_shutdown,
 #ifdef CONFIG_SMP
 	.set_affinity	= set_msi_irq_affinity,
 #endif
Index: linux-2.6/drivers/pci/msi.c
===================================================================
--- linux-2.6.orig/drivers/pci/msi.c
+++ linux-2.6/drivers/pci/msi.c
@@ -123,6 +123,31 @@ static void msix_flush_writes(unsigned i
 	}
 }
 
+void msi_restore_mask_bits(unsigned int irq)
+{
+	struct msi_desc *entry;
+
+	entry = get_irq_msi(irq);
+	BUG_ON(!entry || !entry->dev);
+	switch (entry->msi_attrib.type) {
+	case PCI_CAP_ID_MSI:
+		if (entry->msi_attrib.maskbit) {
+			int pos;
+			u32 mask_bits;
+
+			pos = (long)entry->mask_base;
+			mask_bits = entry->orig_mask_bits;
+			pci_write_config_dword(entry->dev, pos, mask_bits);
+		}
+		break;
+	case PCI_CAP_ID_MSIX:
+		break;
+	default:
+		BUG();
+		break;
+	}
+}
+
 static void msi_set_mask_bit(unsigned int irq, int flag)
 {
 	struct msi_desc *entry;
@@ -376,6 +401,7 @@ static int msi_capability_init(struct pc
 		pci_read_config_dword(dev,
 			msi_mask_bits_reg(pos, is_64bit_address(control)),
 			&maskbits);
+		entry->orig_mask_bits = maskbits;
 		temp = (1 << multi_msi_capable(control));
 		temp = ((temp - 1) & ~temp);
 		maskbits |= temp;
Index: linux-2.6/include/linux/msi.h
===================================================================
--- linux-2.6.orig/include/linux/msi.h
+++ linux-2.6/include/linux/msi.h
@@ -14,6 +14,7 @@ extern void mask_msi_irq(unsigned int ir
 extern void unmask_msi_irq(unsigned int irq);
 extern void read_msi_msg(unsigned int irq, struct msi_msg *msg);
 extern void write_msi_msg(unsigned int irq, struct msi_msg *msg);
+extern void msi_restore_mask_bits(unsigned int irq);
 
 struct msi_desc {
 	struct {
@@ -30,6 +31,7 @@ struct msi_desc {
 	struct list_head list;
 
 	void __iomem *mask_base;
+	u32 orig_mask_bits;  /* restoring it for freeing */
 	struct pci_dev *dev;
 
 	/* Last set MSI message */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86_64: restore mask_bits in msi shutdown
  2008-04-11 23:26 [PATCH] x86_64: restore mask_bits in msi shutdown Yinghai Lu
@ 2008-04-15  8:12 ` Andrew Morton
  2008-04-15  8:17   ` Yinghai Lu
  2008-04-17  9:19 ` Eric W. Biederman
  2008-04-17 10:15 ` Eric W. Biederman
  2 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2008-04-15  8:12 UTC (permalink / raw
  To: yhlu.kernel
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Jeff Garzik, Ayaz Abdulla, kernel list

On Fri, 11 Apr 2008 16:26:11 -0700 Yinghai Lu <yhlu.kernel.send@gmail.com> wrote:

> I can not kexec RHEL 5.1 from 2.6.25-rc3 later
> 
> caused by:
> commit 89d694b9dbe769ca1004e01db0ca43964806a611
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Mon Feb 18 18:25:17 2008 +0100
> 
>     genirq: do not leave interupts enabled on free_irq
> 
>     The default_disable() function was changed in commit:
> 
>      76d2160147f43f982dfe881404cfde9fd0a9da21
>      genirq: do not mask interrupts by default
> 
>     It removed the mask function in favour of the default delayed
>     interrupt disabling. Unfortunately this also broke the shutdown in
>     free_irq() when the last handler is removed from the interrupt for
>     those architectures which rely on the default implementations. Now we
>     can end up with a enabled interrupt line after the last handler was
>     removed, which can result in spurious interrupts.
> 
>     Fix this by adding a default_shutdown function, which is only
>     installed, when the irqchip implementation does provide neither a
>     shutdown nor a disable function.
> 
>     [@stable: affected versions: .21 - .24 ]
> 
> 
> 
> for MSI, default_shutdown will call mask_bit for msi device. so all mask bits will
> left disabled after free_irq.
> then if kexec next kernel that only can use msi_enable bit.
> all device's MSI can not be used.
> 
> So try to restore MSI mask bits that is saved before using msi in first kernel.
> 
> Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
> 
> Index: linux-2.6/arch/x86/kernel/io_apic_64.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/io_apic_64.c
> +++ linux-2.6/arch/x86/kernel/io_apic_64.c
> @@ -2003,6 +2003,14 @@ static void set_msi_irq_affinity(unsigne
>  }
>  #endif /* CONFIG_SMP */
>  
> +static void msi_shutdown(unsigned int irq)
> +{
> +	struct irq_desc *desc = irq_desc + irq;
> +
> +	msi_restore_mask_bits(irq);
> +	desc->status |= IRQ_MASKED;
> +}

I wonder if only x86_64 needs this treatment.

> Index: linux-2.6/drivers/pci/msi.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/msi.c
> +++ linux-2.6/drivers/pci/msi.c
> @@ -123,6 +123,31 @@ static void msix_flush_writes(unsigned i
>  	}
>  }
>  
> +void msi_restore_mask_bits(unsigned int irq)
> +{
> +	struct msi_desc *entry;
> +
> +	entry = get_irq_msi(irq);
> +	BUG_ON(!entry || !entry->dev);
> +	switch (entry->msi_attrib.type) {
> +	case PCI_CAP_ID_MSI:
> +		if (entry->msi_attrib.maskbit) {
> +			int pos;
> +			u32 mask_bits;
> +
> +			pos = (long)entry->mask_base;
> +			mask_bits = entry->orig_mask_bits;
> +			pci_write_config_dword(entry->dev, pos, mask_bits);
> +		}
> +		break;
> +	case PCI_CAP_ID_MSIX:
> +		break;
> +	default:
> +		BUG();
> +		break;
> +	}
> +}

If no other architectures will ever need this, perhaps it is in the wrong
file.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86_64: restore mask_bits in msi shutdown
  2008-04-15  8:12 ` Andrew Morton
@ 2008-04-15  8:17   ` Yinghai Lu
  0 siblings, 0 replies; 8+ messages in thread
From: Yinghai Lu @ 2008-04-15  8:17 UTC (permalink / raw
  To: Andrew Morton
  Cc: Yinghai Lu, Thomas Gleixner, Ingo Molnar, Eric W. Biederman,
	Jeff Garzik, Ayaz Abdulla, kernel list

On Tue, Apr 15, 2008 at 1:12 AM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>
> On Fri, 11 Apr 2008 16:26:11 -0700 Yinghai Lu <yhlu.kernel.send@gmail.com> wrote:
>
>  > I can not kexec RHEL 5.1 from 2.6.25-rc3 later
>  >
>  > caused by:
>  > commit 89d694b9dbe769ca1004e01db0ca43964806a611
>  > Author: Thomas Gleixner <tglx@linutronix.de>
>  > Date:   Mon Feb 18 18:25:17 2008 +0100
>  >
>  >     genirq: do not leave interupts enabled on free_irq
>  >
>  >     The default_disable() function was changed in commit:
>  >
>  >      76d2160147f43f982dfe881404cfde9fd0a9da21
>  >      genirq: do not mask interrupts by default
>  >
>  >     It removed the mask function in favour of the default delayed
>  >     interrupt disabling. Unfortunately this also broke the shutdown in
>  >     free_irq() when the last handler is removed from the interrupt for
>  >     those architectures which rely on the default implementations. Now we
>  >     can end up with a enabled interrupt line after the last handler was
>  >     removed, which can result in spurious interrupts.
>  >
>  >     Fix this by adding a default_shutdown function, which is only
>  >     installed, when the irqchip implementation does provide neither a
>  >     shutdown nor a disable function.
>  >
>  >     [@stable: affected versions: .21 - .24 ]
>  >
>  >
>  >
>  > for MSI, default_shutdown will call mask_bit for msi device. so all mask bits will
>  > left disabled after free_irq.
>  > then if kexec next kernel that only can use msi_enable bit.
>  > all device's MSI can not be used.
>  >
>  > So try to restore MSI mask bits that is saved before using msi in first kernel.
>  >
>  > Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
>  >
>  > Index: linux-2.6/arch/x86/kernel/io_apic_64.c
>  > ===================================================================
>  > --- linux-2.6.orig/arch/x86/kernel/io_apic_64.c
>  > +++ linux-2.6/arch/x86/kernel/io_apic_64.c
>  > @@ -2003,6 +2003,14 @@ static void set_msi_irq_affinity(unsigne
>  >  }
>  >  #endif /* CONFIG_SMP */
>  >
>  > +static void msi_shutdown(unsigned int irq)
>  > +{
>  > +     struct irq_desc *desc = irq_desc + irq;
>  > +
>  > +     msi_restore_mask_bits(irq);
>  > +     desc->status |= IRQ_MASKED;
>  > +}
>
>  I wonder if only x86_64 needs this treatment.
>
>
>  > Index: linux-2.6/drivers/pci/msi.c
>  > ===================================================================
>  > --- linux-2.6.orig/drivers/pci/msi.c
>  > +++ linux-2.6/drivers/pci/msi.c
>  > @@ -123,6 +123,31 @@ static void msix_flush_writes(unsigned i
>  >       }
>  >  }
>  >
>  > +void msi_restore_mask_bits(unsigned int irq)
>  > +{
>  > +     struct msi_desc *entry;
>  > +
>  > +     entry = get_irq_msi(irq);
>  > +     BUG_ON(!entry || !entry->dev);
>  > +     switch (entry->msi_attrib.type) {
>  > +     case PCI_CAP_ID_MSI:
>  > +             if (entry->msi_attrib.maskbit) {
>  > +                     int pos;
>  > +                     u32 mask_bits;
>  > +
>  > +                     pos = (long)entry->mask_base;
>  > +                     mask_bits = entry->orig_mask_bits;
>  > +                     pci_write_config_dword(entry->dev, pos, mask_bits);
>  > +             }
>  > +             break;
>  > +     case PCI_CAP_ID_MSIX:
>  > +             break;
>  > +     default:
>  > +             BUG();
>  > +             break;
>  > +     }
>  > +}
>
>  If no other architectures will ever need this, perhaps it is in the wrong
>  file.

need someone has powerpc (64?) etc platform with RHEL 5.1 with PCIe
NIC that is using MSI
to test if they have the same problem with kexec.

YH

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86_64: restore mask_bits in msi shutdown
  2008-04-11 23:26 [PATCH] x86_64: restore mask_bits in msi shutdown Yinghai Lu
  2008-04-15  8:12 ` Andrew Morton
@ 2008-04-17  9:19 ` Eric W. Biederman
  2008-04-17  9:44   ` Yinghai Lu
  2008-04-17  9:51   ` Yinghai Lu
  2008-04-17 10:15 ` Eric W. Biederman
  2 siblings, 2 replies; 8+ messages in thread
From: Eric W. Biederman @ 2008-04-17  9:19 UTC (permalink / raw
  To: hlu.kernel
  Cc: Thomas Gleixner, Andrew Morton, Ingo Molnar, Jeff Garzik,
	Ayaz Abdulla, kernel list

Yinghai Lu <yhlu.kernel.send@gmail.com> writes:

> I can not kexec RHEL 5.1 from 2.6.25-rc3 later
>
> caused by:
> commit 89d694b9dbe769ca1004e01db0ca43964806a611
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Mon Feb 18 18:25:17 2008 +0100
>
>     genirq: do not leave interupts enabled on free_irq
>
>     The default_disable() function was changed in commit:
>
>      76d2160147f43f982dfe881404cfde9fd0a9da21
>      genirq: do not mask interrupts by default
>
>     It removed the mask function in favour of the default delayed
>     interrupt disabling. Unfortunately this also broke the shutdown in
>     free_irq() when the last handler is removed from the interrupt for
>     those architectures which rely on the default implementations. Now we
>     can end up with a enabled interrupt line after the last handler was
>     removed, which can result in spurious interrupts.
>
>     Fix this by adding a default_shutdown function, which is only
>     installed, when the irqchip implementation does provide neither a
>     shutdown nor a disable function.
>
>     [@stable: affected versions: .21 - .24 ]
>
>
>
> for MSI, default_shutdown will call mask_bit for msi device. so all mask bits
> will
> left disabled after free_irq.
> then if kexec next kernel that only can use msi_enable bit.
> all device's MSI can not be used.
>
> So try to restore MSI mask bits that is saved before using msi in first kernel.
>
> Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

Ouch!  In the case of MSI-X this is horrible.  Reenabling an interrupt
line when we are not using it.  That is likely to cause even stranger
things than kexec to fail.

What happens when someone next comes to use that msi interrupt is
a reasonable question.

The PCI standard describes the state the bits in the msi capability
are supposed to be in after reset, and if a driver is going to
assume some state that is the only reasonable state for a driver to
expect the hardware to be in.  So we don't need to perform a
save/restore cycle.

Could you look at having pci_disable_msi reset the mask bit
to it's default state after we have called msi_set_enable(dev, 0)?
Once the msi capability is disabled the mask bit has no affect.

You should be able to ignoring pci_disable_msix for now.

Thanks, 
Eric


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86_64: restore mask_bits in msi shutdown
  2008-04-17  9:19 ` Eric W. Biederman
@ 2008-04-17  9:44   ` Yinghai Lu
  2008-04-17  9:51   ` Yinghai Lu
  1 sibling, 0 replies; 8+ messages in thread
From: Yinghai Lu @ 2008-04-17  9:44 UTC (permalink / raw
  To: Eric W. Biederman
  Cc: hlu.kernel, Thomas Gleixner, Andrew Morton, Ingo Molnar,
	Jeff Garzik, Ayaz Abdulla, kernel list

On Thu, Apr 17, 2008 at 2:19 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Yinghai Lu <yhlu.kernel.send@gmail.com> writes:
>
>  > I can not kexec RHEL 5.1 from 2.6.25-rc3 later
>  >
>  > caused by:
>  > commit 89d694b9dbe769ca1004e01db0ca43964806a611
>  > Author: Thomas Gleixner <tglx@linutronix.de>
>  > Date:   Mon Feb 18 18:25:17 2008 +0100
>  >
>  >     genirq: do not leave interupts enabled on free_irq
>  >
>  >     The default_disable() function was changed in commit:
>  >
>  >      76d2160147f43f982dfe881404cfde9fd0a9da21
>  >      genirq: do not mask interrupts by default
>  >
>  >     It removed the mask function in favour of the default delayed
>  >     interrupt disabling. Unfortunately this also broke the shutdown in
>  >     free_irq() when the last handler is removed from the interrupt for
>  >     those architectures which rely on the default implementations. Now we
>  >     can end up with a enabled interrupt line after the last handler was
>  >     removed, which can result in spurious interrupts.
>  >
>  >     Fix this by adding a default_shutdown function, which is only
>  >     installed, when the irqchip implementation does provide neither a
>  >     shutdown nor a disable function.
>  >
>  >     [@stable: affected versions: .21 - .24 ]
>  >
>  >
>  >
>  > for MSI, default_shutdown will call mask_bit for msi device. so all mask bits
>  > will
>  > left disabled after free_irq.
>  > then if kexec next kernel that only can use msi_enable bit.
>  > all device's MSI can not be used.
>  >
>  > So try to restore MSI mask bits that is saved before using msi in first kernel.
>  >
>  > Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
>
>  Ouch!  In the case of MSI-X this is horrible.  Reenabling an interrupt
>  line when we are not using it.  That is likely to cause even stranger
>  things than kexec to fail.

only does save and restore with msi and doesn't do anything with MSIX

+void msi_restore_mask_bits(unsigned int irq)
+{
+       struct msi_desc *entry;
+
+       entry = get_irq_msi(irq);
+       BUG_ON(!entry || !entry->dev);
+       switch (entry->msi_attrib.type) {
+       case PCI_CAP_ID_MSI:
+               if (entry->msi_attrib.maskbit) {
+                       int pos;
+                       u32 mask_bits;
+
+                       pos = (long)entry->mask_base;
+                       mask_bits = entry->orig_mask_bits;
+                       pci_write_config_dword(entry->dev, pos, mask_bits);
+               }
+               break;
+       case PCI_CAP_ID_MSIX:
+               break; ======================> does not restore that for MSIX
+       default:
+               BUG();
+               break;
+       }
+}

YH

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86_64: restore mask_bits in msi shutdown
  2008-04-17  9:19 ` Eric W. Biederman
  2008-04-17  9:44   ` Yinghai Lu
@ 2008-04-17  9:51   ` Yinghai Lu
  1 sibling, 0 replies; 8+ messages in thread
From: Yinghai Lu @ 2008-04-17  9:51 UTC (permalink / raw
  To: Eric W. Biederman
  Cc: hlu.kernel, Thomas Gleixner, Andrew Morton, Ingo Molnar,
	Jeff Garzik, Ayaz Abdulla, kernel list

On Thu, Apr 17, 2008 at 2:19 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Yinghai Lu <yhlu.kernel.send@gmail.com> writes:
>
>  > I can not kexec RHEL 5.1 from 2.6.25-rc3 later
>  >
>  > caused by:
>  > commit 89d694b9dbe769ca1004e01db0ca43964806a611
>  > Author: Thomas Gleixner <tglx@linutronix.de>
>  > Date:   Mon Feb 18 18:25:17 2008 +0100
>  >
>  >     genirq: do not leave interupts enabled on free_irq
>  >
>  >     The default_disable() function was changed in commit:
>  >
>  >      76d2160147f43f982dfe881404cfde9fd0a9da21
>  >      genirq: do not mask interrupts by default
>  >
>  >     It removed the mask function in favour of the default delayed
>  >     interrupt disabling. Unfortunately this also broke the shutdown in
>  >     free_irq() when the last handler is removed from the interrupt for
>  >     those architectures which rely on the default implementations. Now we
>  >     can end up with a enabled interrupt line after the last handler was
>  >     removed, which can result in spurious interrupts.
>  >
>  >     Fix this by adding a default_shutdown function, which is only
>  >     installed, when the irqchip implementation does provide neither a
>  >     shutdown nor a disable function.
>  >
>  >     [@stable: affected versions: .21 - .24 ]
>  >
>  >
>  >
>  > for MSI, default_shutdown will call mask_bit for msi device. so all mask bits
>  > will
>  > left disabled after free_irq.
>  > then if kexec next kernel that only can use msi_enable bit.
>  > all device's MSI can not be used.
>  >
>  > So try to restore MSI mask bits that is saved before using msi in first kernel.
>  >
>  > Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
>
>  Ouch!  In the case of MSI-X this is horrible.  Reenabling an interrupt
>  line when we are not using it.  That is likely to cause even stranger
>  things than kexec to fail.
>
>  What happens when someone next comes to use that msi interrupt is
>  a reasonable question.
>
>  The PCI standard describes the state the bits in the msi capability
>  are supposed to be in after reset, and if a driver is going to
>  assume some state that is the only reasonable state for a driver to
>  expect the hardware to be in.  So we don't need to perform a
>  save/restore cycle.
>
>  Could you look at having pci_disable_msi reset the mask bit
>  to it's default state after we have called msi_set_enable(dev, 0)?
>  Once the msi capability is disabled the mask bit has no affect.

but the next kernel (RHEL 5.1) only can enable msi, and it doesn't
touch mask_bits (that is leaved as 0xff by first 2.6.25-rc2 later)
so device (nvidia mcp55 nic) doesn't work. --- if i manually used
setpci to set 0x60 to 0xfe or 0x00. it will work. also if i booted
kernel (RHEL 5.1), that 0x60
is always 0x00, even nic works with MSI.

YH

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86_64: restore mask_bits in msi shutdown
  2008-04-11 23:26 [PATCH] x86_64: restore mask_bits in msi shutdown Yinghai Lu
  2008-04-15  8:12 ` Andrew Morton
  2008-04-17  9:19 ` Eric W. Biederman
@ 2008-04-17 10:15 ` Eric W. Biederman
  2008-04-17 19:38   ` Yinghai Lu
  2 siblings, 1 reply; 8+ messages in thread
From: Eric W. Biederman @ 2008-04-17 10:15 UTC (permalink / raw
  To: yhlu.kernel
  Cc: Thomas Gleixner, Andrew Morton, Ingo Molnar, Jeff Garzik,
	Ayaz Abdulla, kernel list

Hmm.  Trying this again with what should be yh's proper address.


Yinghai Lu <yhlu.kernel.send@gmail.com> writes:

> I can not kexec RHEL 5.1 from 2.6.25-rc3 later
>
> caused by:
> commit 89d694b9dbe769ca1004e01db0ca43964806a611
> Author: Thomas Gleixner <tglx@linutronix.de>
> Date:   Mon Feb 18 18:25:17 2008 +0100
>
>     genirq: do not leave interupts enabled on free_irq
>
>     The default_disable() function was changed in commit:
>
>      76d2160147f43f982dfe881404cfde9fd0a9da21
>      genirq: do not mask interrupts by default
>
>     It removed the mask function in favour of the default delayed
>     interrupt disabling. Unfortunately this also broke the shutdown in
>     free_irq() when the last handler is removed from the interrupt for
>     those architectures which rely on the default implementations. Now we
>     can end up with a enabled interrupt line after the last handler was
>     removed, which can result in spurious interrupts.
>
>     Fix this by adding a default_shutdown function, which is only
>     installed, when the irqchip implementation does provide neither a
>     shutdown nor a disable function.
>
>     [@stable: affected versions: .21 - .24 ]
>
>
>
> for MSI, default_shutdown will call mask_bit for msi device. so all mask bits
> will
> left disabled after free_irq.
> then if kexec next kernel that only can use msi_enable bit.
> all device's MSI can not be used.
>
> So try to restore MSI mask bits that is saved before using msi in first kernel.
>
> Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>

Ouch!  In the case of MSI-X this is horrible.  Reenabling an interrupt
line when we are not using it.  That is likely to cause even stranger
things than kexec to fail.

What happens when someone next comes to use that msi interrupt is
a reasonable question.

The PCI standard describes the state the bits in the msi capability
are supposed to be in after reset, and if a driver is going to
assume some state that is the only reasonable state for a driver to
expect the hardware to be in.  So we don't need to perform a
save/restore cycle.

Could you look at having pci_disable_msi reset the mask bit
to it's default state after we have called msi_set_enable(dev, 0)?
Once the msi capability is disabled the mask bit has no affect.

You should be able to ignoring pci_disable_msix for now.

Thanks, 
Eric


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] x86_64: restore mask_bits in msi shutdown
  2008-04-17 10:15 ` Eric W. Biederman
@ 2008-04-17 19:38   ` Yinghai Lu
  0 siblings, 0 replies; 8+ messages in thread
From: Yinghai Lu @ 2008-04-17 19:38 UTC (permalink / raw
  To: Eric W. Biederman
  Cc: Thomas Gleixner, Andrew Morton, Ingo Molnar, Jeff Garzik,
	Ayaz Abdulla, kernel list

On Thu, Apr 17, 2008 at 3:15 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Hmm.  Trying this again with what should be yh's proper address.
>
>
>
>  Yinghai Lu <yhlu.kernel.send@gmail.com> writes:
>
>
>
> > I can not kexec RHEL 5.1 from 2.6.25-rc3 later
>  >
>  > caused by:
>  > commit 89d694b9dbe769ca1004e01db0ca43964806a611
>  > Author: Thomas Gleixner <tglx@linutronix.de>
>  > Date:   Mon Feb 18 18:25:17 2008 +0100
>  >
>  >     genirq: do not leave interupts enabled on free_irq
>  >
>  >     The default_disable() function was changed in commit:
>  >
>  >      76d2160147f43f982dfe881404cfde9fd0a9da21
>  >      genirq: do not mask interrupts by default
>  >
>  >     It removed the mask function in favour of the default delayed
>  >     interrupt disabling. Unfortunately this also broke the shutdown in
>  >     free_irq() when the last handler is removed from the interrupt for
>  >     those architectures which rely on the default implementations. Now we
>  >     can end up with a enabled interrupt line after the last handler was
>  >     removed, which can result in spurious interrupts.
>  >
>  >     Fix this by adding a default_shutdown function, which is only
>  >     installed, when the irqchip implementation does provide neither a
>  >     shutdown nor a disable function.
>  >
>  >     [@stable: affected versions: .21 - .24 ]
>  >
>  >
>  >
>  > for MSI, default_shutdown will call mask_bit for msi device. so all mask bits
>  > will
>  > left disabled after free_irq.
>  > then if kexec next kernel that only can use msi_enable bit.
>  > all device's MSI can not be used.
>  >
>  > So try to restore MSI mask bits that is saved before using msi in first kernel.
>  >
>  > Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
>
>
> Ouch!  In the case of MSI-X this is horrible.  Reenabling an interrupt
>  line when we are not using it.  That is likely to cause even stranger
>  things than kexec to fail.

yes It doesn't restore that for MSI-X

>
>  What happens when someone next comes to use that msi interrupt is
>  a reasonable question.

it uses MSI, but doesn't touch mask_bits

>
>  The PCI standard describes the state the bits in the msi capability
>  are supposed to be in after reset, and if a driver is going to
>  assume some state that is the only reasonable state for a driver to
>  expect the hardware to be in.  So we don't need to perform a
>  save/restore cycle.
>
>  Could you look at having pci_disable_msi reset the mask bit
>  to it's default state after we have called msi_set_enable(dev, 0)?
>  Once the msi capability is disabled the mask bit has no affect.

but the next kernel (RHEL 5.1) only can enable msi, and it doesn't
touch mask_bits (that is leaved as 0xff by first 2.6.25-rc2 later)
so device (nvidia mcp55 nic) doesn't work. --- if i manually used
setpci to set 0x60 to 0xfe or 0x00. it will work. also if i booted
kernel (RHEL 5.1), that 0x60
is always 0x00, even nic works with MSI.


YH

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2008-04-17 19:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-11 23:26 [PATCH] x86_64: restore mask_bits in msi shutdown Yinghai Lu
2008-04-15  8:12 ` Andrew Morton
2008-04-15  8:17   ` Yinghai Lu
2008-04-17  9:19 ` Eric W. Biederman
2008-04-17  9:44   ` Yinghai Lu
2008-04-17  9:51   ` Yinghai Lu
2008-04-17 10:15 ` Eric W. Biederman
2008-04-17 19:38   ` Yinghai Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).