Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt

* Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash
@ 2013-09-08 14:35 Trenta sis
  2013-09-08 14:41 ` Trenta sis
  0 siblings, 1 reply; 21+ messages in thread
From: Trenta sis @ 2013-09-08 14:35 UTC (permalink / raw
  To: xen-devel; +Cc: arrfab, JBeulich, agya.naila

Hello,

I have the same error: the server reboots automatically on every boot with
the Xen kernel. On an HS20 with Debian Wheezy, Xen hangs and the AMM
management interface shows the same errors described in previous mails.
Debian Wheezy with a non-Xen kernel boots correctly, so the problem seems
to be with the Xen kernel. The same HS20 server with Debian Lenny + Xen 3.2
or Debian Squeeze + Xen 4.0 works perfectly.

I upgraded to Debian testing and unstable, with the same results on Xen 4.1
and 4.2.

If you need more information, just ask.
How can this bug be solved?

Thanks

On Fri, Feb 08, 2013 at 03:08:08PM +0100, agya naila wrote:

Hello all. Today Xen is finally running on the IBM blade server. I added
nmi=dom0, then found the Baseboard Management Controller settings in the
BIOS configuration and disabled the 'reboot system on NMI' attribute. This
step does not eliminate the NMI problem, since I still find NMI error
interrupts in my blade server log, but Xen now ignores them and keeps
running. If anyone finds a better solution, that would be great.

Thanks for the 'workaround' info.

We still should find out what exactly generates/causes that NMI with Xen..

-- Pasi

Agya

On Thu, Feb 7, 2013 at 9:51 PM, Fabian Arrotin <arr...@centos.org>
wrote:

On 02/06/2013 02:39 PM, agya naila wrote:
> I configured it by adding nmi=ignore to my /boot/grub/grub.cfg

Just to add that I also tried the nmi=ignore parameter for Xen, and it
still hard reboots/resets automatically during the dom0 kernel boot.

Fabian
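
The workaround discussed above amounts to setting the nmi= option on the
Xen hypervisor command line. As a minimal sketch, assuming the option takes
one of the values ignore, dom0 or fatal (check the command-line
documentation for your Xen version), the hypothetical helper below replaces
any existing nmi= token in a command-line string. The function name and the
sample command line are illustrative only and do not come from the thread.

```python
# Hypothetical helper for illustration; it is not part of Xen or GRUB.
VALID_NMI_MODES = ("ignore", "dom0", "fatal")  # assumed set of accepted values


def set_nmi_option(xen_cmdline, mode):
    """Return xen_cmdline with exactly one nmi=<mode> token."""
    if mode not in VALID_NMI_MODES:
        raise ValueError(f"unsupported nmi mode: {mode!r}")
    # Drop any nmi= token already present, then append the requested one.
    tokens = [t for t in xen_cmdline.split() if not t.startswith("nmi=")]
    tokens.append(f"nmi={mode}")
    return " ".join(tokens)


if __name__ == "__main__":
    # The sample command line is made up for illustration.
    print(set_nmi_option("dom0_mem=1024M nmi=ignore", "dom0"))
```

The resulting string would go on the hypervisor line of the boot entry (the
thread edits /boot/grub/grub.cfg directly). As reported above, on the HS20
the BIOS 'reboot system on NMI' attribute also had to be disabled before
Xen stayed up.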

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash
  2013-09-08 14:35 IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash Trenta sis
@ 2013-09-08 14:41 ` Trenta sis
  2013-09-09 19:15   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 21+ messages in thread
From: Trenta sis @ 2013-09-08 14:41 UTC (permalink / raw
  To: xen-devel; +Cc: arrfab, JBeulich, agya.naila


 Hello,

I have the same error: the server reboots automatically on every boot with
the Xen kernel. On an HS20 with Debian Wheezy, Xen hangs and the AMM
management interface shows the same errors described in previous mails.
Debian Wheezy with a non-Xen kernel boots correctly, so the problem seems
to be with the Xen kernel. The same HS20 server with Debian Lenny + Xen 3.2
or Debian Squeeze + Xen 4.0 works perfectly.

I upgraded to Debian testing and unstable, with the same results on Xen 4.1
and 4.2.

If you need more information, just ask.
How can this bug be solved?

Thanks




> On Fri, Feb 08, 2013 at 03:08:08PM +0100, agya naila wrote:
>
> Hello all. Today Xen is finally running on the IBM blade server. I added
> nmi=dom0, then found the Baseboard Management Controller settings in the
> BIOS configuration and disabled the 'reboot system on NMI' attribute. This
> step does not eliminate the NMI problem, since I still find NMI error
> interrupts in my blade server log, but Xen now ignores them and keeps
> running. If anyone finds a better solution, that would be great.
>
> Thanks for the 'workaround' info.
>
> We still should find out what exactly generates/causes that NMI with Xen..
>
> -- Pasi
>
> Agya
>
> On Thu, Feb 7, 2013 at 9:51 PM, Fabian Arrotin <arr...@centos.org>
> wrote:
>
> On 02/06/2013 02:39 PM, agya naila wrote:
> > I configured it by adding nmi=ignore to my /boot/grub/grub.cfg
>
> Just to add that I also tried the nmi=ignore parameter for Xen, and it
> still hard reboots/resets automatically during the dom0 kernel boot.
>
> Fabian

* Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash
  2013-09-08 14:41 ` Trenta sis
@ 2013-09-09 19:15   ` Konrad Rzeszutek Wilk
  2013-09-12 12:47     ` Trenta sis
  0 siblings, 1 reply; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-09-09 19:15 UTC (permalink / raw
  To: Trenta sis; +Cc: arrfab, agya.naila, JBeulich, xen-devel

On Sun, Sep 08, 2013 at 04:41:02PM +0200, Trenta sis wrote:
>  Hello,
> 
> I have the same error, server is auto rebooted during every boot with
> kernel XEN, HS20 with Debian Wheezy and XEN hang on and AMM managment show
> same errors described in previous mails. With Debian wheezy wit non-xen
> kernel boots correcte, it seems that problems is with xen kernel
> Same Server HS20 with Debian Lenny+ XEN 3.2 or Debian Squeeze+XEN
> 4.0 working perfect
> 
> Upgraded to Debian testing and unstable with same results XEN 4.1 and 4.2.
> 
> If you need more information, you can ask.
> How can be solved this bug?

Did the workaround help?

And in regards to finding out exactly what causes it: there are logs in
the BMC that can point to the PCI device. Did you check those? Do they
say whether any device has a PCI SERR logged against it?

Thanks.


* Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash
  2013-09-09 19:15   ` Konrad Rzeszutek Wilk
@ 2013-09-12 12:47     ` Trenta sis
  2013-09-23 14:02       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 21+ messages in thread
From: Trenta sis @ 2013-09-12 12:47 UTC (permalink / raw
  To: Konrad Rzeszutek Wilk; +Cc: arrfab, agya.naila, JBeulich, xen-devel


Hello,

We need this server, so we have downgraded it to Debian Squeeze.
I hope to have another HS20 in a few days to run some additional tests.
I'll try to collect all the information you asked for and send it.
Sorry, one question: what is PCI SERR, and where do I find it?

Thanks for everything

2013/9/9 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

> On Sun, Sep 08, 2013 at 04:41:02PM +0200, Trenta sis wrote:
> >  Hello,
> >
> > I have the same error, server is auto rebooted during every boot with
> > kernel XEN, HS20 with Debian Wheezy and XEN hang on and AMM managment
> show
> > same errors described in previous mails. With Debian wheezy wit non-xen
> > kernel boots correcte, it seems that problems is with xen kernel
> > Same Server HS20 with Debian Lenny+ XEN 3.2 or Debian Squeeze+XEN
> > 4.0 working perfect
> >
> > Upgraded to Debian testing and unstable with same results XEN 4.1 and
> 4.2.
> >
> > If you need more information, you can ask.
> > How can be solved this bug?
>
> Did you the workaround help?
>
> And in regards to finding out exactly what causes it - well there are
> logs in the BMC that can point to it the PCI device? Did you check those?
> Do they save if there is any device that has PCI SERR on them?
>
> Thanks.
>


* Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash
  2013-09-12 12:47     ` Trenta sis
@ 2013-09-23 14:02       ` Konrad Rzeszutek Wilk
  2013-09-29 10:47         ` Trenta sis
  0 siblings, 1 reply; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-09-23 14:02 UTC (permalink / raw
  To: Trenta sis; +Cc: arrfab, agya.naila, JBeulich, xen-devel

On Thu, Sep 12, 2013 at 02:47:39PM +0200, Trenta sis wrote:
> Hello,
> 
> We need this server and we have made a downgrade to Debian Squeeze.
> I hope in a few day to have another HS20 to make some additional test, I'll
> try to get all information that you asked and send
> Sorry, one question what is  PCI SERR ? Where?

If you log in to the BladeCenter web frontend you should see the logs of
each blade. Some entries are routine, such as 'User XYZ logged in', but
others are more serious, such as an NMI or a PCI SERR (the PCI system error
signal). If you could copy-n-paste them, it could help in figuring out
which PCI device is responsible for causing the NMI.

> 
> Thanks for all
> 
> 2013/9/9 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> > On Sun, Sep 08, 2013 at 04:41:02PM +0200, Trenta sis wrote:
> > >  Hello,
> > >
> > > I have the same error, server is auto rebooted during every boot with
> > > kernel XEN, HS20 with Debian Wheezy and XEN hang on and AMM managment
> > show
> > > same errors described in previous mails. With Debian wheezy wit non-xen
> > > kernel boots correcte, it seems that problems is with xen kernel
> > > Same Server HS20 with Debian Lenny+ XEN 3.2 or Debian Squeeze+XEN
> > > 4.0 working perfect
> > >
> > > Upgraded to Debian testing and unstable with same results XEN 4.1 and
> > 4.2.
> > >
> > > If you need more information, you can ask.
> > > How can be solved this bug?
> >
> > Did you the workaround help?
> >
> > And in regards to finding out exactly what causes it - well there are
> > logs in the BMC that can point to it the PCI device? Did you check those?
> > Do they save if there is any device that has PCI SERR on them?
> >
> > Thanks.
> >


* Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash
  2013-09-23 14:02       ` Konrad Rzeszutek Wilk
@ 2013-09-29 10:47         ` Trenta sis
  2013-09-30 14:13           ` Is: 0xCF8 on extended config space instead of MCONF? Was:Re: " Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 21+ messages in thread
From: Trenta sis @ 2013-09-29 10:47 UTC (permalink / raw
  To: Konrad Rzeszutek Wilk; +Cc: arrfab, agya.naila, JBeulich, xen-devel


Hello,

The BladeCenter web frontend shows:

27 I Blade_09 09/08/13 13:25:17 0x806f0013 Chassis, (NMI State) diagnostic
interrupt
28 E Blade_09 09/08/13 13:25:12 0x10000002 SMI Hdlr: 00151743 HI Fatal
Error, HI_FERR/NERR Value= 0020
29 I Blade_09 09/08/13 13:09:14 0x806f0013 Recovery Chassis, (NMI State)
diagnostic interrupt
30 I Blade_09 09/08/13 13:09:03 0x806f0013 Chassis, (NMI State) diagnostic
interrupt
31 E Blade_09 09/08/13 13:08:58 0x10000002 SMI Hdlr: 00151743 HI Fatal
Error, HI_FERR/NERR Value= 0020
32 I Blade_09 09/08/13 12:46:26 0x806f0013 Recovery Chassis, (NMI State)
diagnostic interrupt
33 I Blade_09 09/08/13 12:46:15 0x806f0013 Chassis, (NMI State) diagnostic
interrupt
34 E Blade_09 09/08/13 12:46:11 0x10000002 SMI Hdlr: 00151743 HI Fatal
Error, HI_FERR/NERR Value= 0020
35 I Blade_09 09/08/13 12:34:13 0x806f0013 Recovery Chassis, (NMI State)
diagnostic interrupt
36 I Blade_09 09/08/13 12:34:03 0x806f0013 Chassis, (NMI State) diagnostic
interrupt
37 E Blade_09 09/08/13 12:33:58 0x10000002 SMI Hdlr: 00151743 HI Fatal
Error, HI_FERR/NERR Value= 0020
38 I Blade_09 09/08/13 12:27:25 0x806f0013 Recovery Chassis, (NMI State)
diagnostic interrupt
39 I Blade_09 09/08/13 12:27:14 0x806f0013 Chassis, (NMI State) diagnostic
interrupt
40 E Blade_09 09/08/13 12:27:10 0x10000002 SMI Hdlr: 00151743 HI Fatal
Error, HI_FERR/NERR Value= 0020
41 I Blade_09 09/08/13 12:20:45 0x806f0013 Recovery Chassis, (NMI State)
diagnostic interrupt
42 I Blade_09 09/08/13 12:20:34 0x806f0013 Chassis, (NMI State) diagnostic
interrupt
43 E Blade_09 09/08/13 12:20:30 0x10000002 SMI Hdlr: 00151743 HI Fatal
Error, HI_FERR/NERR Value= 0020
44 I Blade_09 09/08/13 12:18:20 0x806f0013 Recovery Chassis, (NMI State)
diagnostic interrupt
45 I Blade_09 09/08/13 12:18:10 0x806f0013 Chassis, (NMI State) diagnostic
interrupt
46 E Blade_09 09/08/13 12:18:05 0x10000002 SMI Hdlr: 00151743 HI Fatal
Error, HI_FERR/NERR Value= 0020
47 I Blade_09 09/08/13 12:15:47 0x806f0013 Recovery Chassis, (NMI State)
diagnostic interrupt
48 I Blade_09 09/08/13 12:15:37 0x806f0013 Chassis, (NMI State) diagnostic
interrupt
49 E Blade_09 09/08/13 12:15:32 0x10000002 SMI Hdlr: 00151743 HI Fatal
Error, HI_FERR/NERR Value= 0020
Thanks
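
A minimal sketch of how the log entries above could be parsed, for example
to pair each 'HI Fatal Error' with the NMI that follows it. The field layout
(index, severity, source, MM/DD/YY date, time, event id, message) is an
assumption inferred from the pasted text, not a documented AMM format, and
the function name is hypothetical.

```python
import re
from datetime import datetime

# Assumed entry layout, inferred from the pasted log:
# <index> <severity> <source> <MM/DD/YY> <HH:MM:SS> <event id> <message>
ENTRY = re.compile(
    r"(?P<index>\d+) (?P<severity>[A-Z]) (?P<source>\S+) "
    r"(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<event>0x[0-9a-fA-F]+) (?P<message>.*)"
)


def parse_amm_log(text):
    """Parse mail-wrapped log text into a list of entry dicts."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ENTRY.match(line)
        if match:
            entry = match.groupdict()
            entry["when"] = datetime.strptime(
                entry["date"] + " " + entry["time"], "%m/%d/%y %H:%M:%S")
            entries.append(entry)
        elif entries:
            # A line that does not start a new entry is a wrapped continuation.
            entries[-1]["message"] += " " + line
    return entries


# Two entries copied from the log above.
SAMPLE = """\
27 I Blade_09 09/08/13 13:25:17 0x806f0013 Chassis, (NMI State) diagnostic
interrupt
28 E Blade_09 09/08/13 13:25:12 0x10000002 SMI Hdlr: 00151743 HI Fatal
Error, HI_FERR/NERR Value= 0020
"""

if __name__ == "__main__":
    nmi, fatal = parse_amm_log(SAMPLE)
    delay = (nmi["when"] - fatal["when"]).total_seconds()
    print(fatal["message"])
    print("NMI logged", delay, "seconds after the fatal error")
```

In the sample, the NMI entry is stamped a few seconds after the SMI
handler's fatal error, which matches the pattern repeated through the rest
of the log.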


2013/9/23 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

> On Thu, Sep 12, 2013 at 02:47:39PM +0200, Trenta sis wrote:
> > Hello,
> >
> > We need this server and we have made a downgrade to Debian Squeeze.
> > I hope in a few day to have another HS20 to make some additional test,
> I'll
> > try to get all information that you asked and send
> > Sorry, one question what is  PCI SERR ? Where?
>
> If you log in the BladeCenter webfrontend you should see logs of
> each blade. Some of them are 'User XYZ logged in'. But in some cases
> the are more serious ones - such an NMI or PCI SERR. If you could
> copy-n-paste
> them it could help in figuring which PCI device is responsible for causing
> the NMI.
>
> >
> > Thanks for all
> >
> > 2013/9/9 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >
> > > On Sun, Sep 08, 2013 at 04:41:02PM +0200, Trenta sis wrote:
> > > >  Hello,
> > > >
> > > > I have the same error, server is auto rebooted during every boot with
> > > > kernel XEN, HS20 with Debian Wheezy and XEN hang on and AMM managment
> > > show
> > > > same errors described in previous mails. With Debian wheezy wit
> non-xen
> > > > kernel boots correcte, it seems that problems is with xen kernel
> > > > Same Server HS20 with Debian Lenny+ XEN 3.2 or Debian Squeeze+XEN
> > > > 4.0 working perfect
> > > >
> > > > Upgraded to Debian testing and unstable with same results XEN 4.1 and
> > > 4.2.
> > > >
> > > > If you need more information, you can ask.
> > > > How can be solved this bug?
> > >
> > > Did you the workaround help?
> > >
> > > And in regards to finding out exactly what causes it - well there are
> > > logs in the BMC that can point to it the PCI device? Did you check
> those?
> > > Do they save if there is any device that has PCI SERR on them?
> > >
> > > Thanks.
> > >
>


* Is: 0xCF8 on extended config space instead of MCONF? Was:Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash
  2013-09-29 10:47         ` Trenta sis
@ 2013-09-30 14:13           ` Konrad Rzeszutek Wilk
  2013-09-30 15:40             ` Jan Beulich
                               ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-09-30 14:13 UTC (permalink / raw
  To: Trenta sis; +Cc: arrfab, agya.naila, JBeulich, xen-devel

> Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
> 27 I Blade_09 09/08/13 13:25:17 0x806f0013 Chassis, (NMI State) diagnostic
> interrupt
> 28 E Blade_09 09/08/13 13:25:12 0x10000002 SMI Hdlr: 00151743 HI Fatal
> Error, HI_FERR/NERR Value= 0020

A simple Google search on HI_FERR leads to the Intel E7525 MCH datasheet:

http://www.intel.com/content/dam/doc/datasheet/e7525-memory-controller-hub-datasheet.pdf

where section
3.6.14 HI_FERR – Hub Interface First Error Register (D0:F1)

describes the register. The logged value is 0020 (is that decimal or hex?).
If it is decimal, it is 10100 in binary, which is bits 2 and 4:

bit 2:

HI Internal Parity Error Detected. This bit is sticky through reset. System 
software clears this bit by writing a ‘1’ to the location.
0 = No Internal Parity error detected.
1 = MCH HI bridge has detected an Internal Parity error. Non-fatal.

and bit 4:
HI Data Parity Error Detected. This bit is sticky through reset. System software 
clears this bit by writing a ‘1’ to the location.
0 = No HI data parity error.
1 = MCH has detected a parity error on the data phase of a HI transaction. 

But that is unlikely as these are 'non-fatal'. So if this is hex, then 0x20
is 100000 in binary and it would be bit 5, which is:

Enhanced Configuration Access Error. This bit is sticky through reset. System 
software clears this bit by writing a ‘1’ to the location.
0 = No Enhanced Configuration Access error
1 = A PCI Express* Enhanced Configuration access was mistakenly targeting 
the legacy interface. Fatal

That sounds more like it. So we touched PCIe Enhanced Configuration space
(MMCONFIG?) using the legacy interface (0xCF8?).
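
The arithmetic above can be checked with a short sketch. It relies only on
the bit descriptions quoted above from the datasheet; only bits 2, 4 and 5
are named there, so any other bit is reported as not quoted. The function
and table names are illustrative.

```python
# Bit names copied from the datasheet excerpts quoted in this mail.
# Only bits 2, 4 and 5 are quoted, so the table is deliberately incomplete.
HI_FERR_BITS = {
    2: "HI Internal Parity Error Detected",
    4: "HI Data Parity Error Detected",
    5: "Enhanced Configuration Access Error",
}


def set_bits(value):
    """Return the positions of the bits that are 1, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def decode_hi_ferr(text, base):
    """Interpret the logged digits in the given base and name each set bit."""
    value = int(text, base)
    return [(bit, HI_FERR_BITS.get(bit, "not quoted in this thread"))
            for bit in set_bits(value)]


if __name__ == "__main__":
    for base in (10, 16):
        print("base", base, decode_hi_ferr("0020", base))
```

Read as decimal, the value sets the two parity bits; read as hex, it sets
only the Enhanced Configuration Access Error bit, which is the reading that
matches the 'HI Fatal Error' text in the log.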

Jan, any thoughts? Is there a particular bug-fix we are missing in Xen 4.1 or Xen 4.2
for this?  Xen 4.0 seems to work.

Trenta,

When you used Xen 4.0 did you use the same kernel as with Xen 4.1 or Xen 4.2?


* Is: 0xCF8 on extended config space instead of MCONF? Was:Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash
  2013-09-30 14:13           ` Is: 0xCF8 on extended config space instead of MCONF? Was:Re: " Konrad Rzeszutek Wilk
@ 2013-09-30 15:40             ` Jan Beulich
  2013-10-04 16:31             ` Trenta sis
  2014-09-05 11:58             ` Trenta sis
  2 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2013-09-30 15:40 UTC (permalink / raw
  To: Trenta sis, Konrad Rzeszutek Wilk; +Cc: arrfab, agya.naila, xen-devel

>>> On 30.09.13 at 16:13, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> But that is unlikely as these are 'non-fatal'. So if this is hex, then it 
> would be bit 5, which is:
> 
> Enhanced Configuration Access Error. This bit is sticky through reset. 
> System 
> software clears this bit by writing a ‘1’ to the location.
> 0 = No Enhanced Configuration Access error
> 1 = A PCI Express* Enhanced Configuration access was mistakenly targeting 
> the legacy interface. Fatal
> 
> 
> That sounds more like it. So we touched a PCIe Enhanced Configuration 
> (MMCONFIG?)
> using the legacy interface (cf8?).
> 
> Jan, any thoughts? Is there a particular bug-fix we are missing in Xen 4.1 
> or Xen 4.2
> for this?  Xen 4.0 seems to work.

Possibly MMCONF just didn't get used on 4.0?

And no, I don't think I recall any possibly relevant change. Even more,
the description above sounds more like an error resulting from device
misbehavior than from software incorrectly doing some access.
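
For readers unfamiliar with the two access mechanisms named in this
subthread, here is a sketch of how each one addresses a configuration
register. It uses the standard PCI address layouts (port 0xCF8 for the
legacy mechanism, a memory-mapped window for MMCONFIG) and is not derived
from the HS20 or from Xen source. The function names and the register
offsets in the example are hypothetical.

```python
def cf8_address(bus, dev, func, reg):
    """Value written to I/O port 0xCF8 for a legacy config access.

    The register field is 8 bits wide, so only offsets 0x00-0xFF of a
    function's configuration space can be reached this way.
    """
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8):
        raise ValueError("bus/device/function out of range")
    if not 0 <= reg < 0x100:
        raise ValueError("legacy mechanism cannot reach offset %#x" % reg)
    # Bit 31 is the enable bit; the low two register bits are masked off.
    return 0x80000000 | (bus << 16) | (dev << 11) | (func << 8) | (reg & 0xFC)


def mmconfig_offset(bus, dev, func, reg):
    """Byte offset into the MMCONFIG window: 4 KiB per function."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8):
        raise ValueError("bus/device/function out of range")
    if not 0 <= reg < 0x1000:
        raise ValueError("config space is 4 KiB per function")
    return (bus << 20) | (dev << 15) | (func << 12) | reg


if __name__ == "__main__":
    # Device 0, function 1 as in "D0:F1"; the offsets are made up.
    print(hex(cf8_address(0, 0, 1, 0x80)))
    print(hex(mmconfig_offset(0, 0, 1, 0x100)))
```

Offsets at or above 0x100 (the extended configuration space) have no
encoding in the legacy layout, which is why the question of whether
MMCONFIG was in use on 4.0 matters here.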

Jan


* Re: Is: 0xCF8 on extended config space instead of MCONF? Was:Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash
  2013-09-30 14:13           ` Is: 0xCF8 on extended config space instead of MCONF? Was:Re: " Konrad Rzeszutek Wilk
  2013-09-30 15:40             ` Jan Beulich
@ 2013-10-04 16:31             ` Trenta sis
  2013-10-04 16:55               ` Konrad Rzeszutek Wilk
  2014-09-05 11:58             ` Trenta sis
  2 siblings, 1 reply; 21+ messages in thread
From: Trenta sis @ 2013-10-04 16:31 UTC (permalink / raw
  To: Konrad Rzeszutek Wilk; +Cc: arrfab, agya.naila, JBeulich, xen-devel


Hi,

With Xen 4.0 the kernel used was 2.6.32, the default kernel in Debian 6 (Squeeze).
Thanks

2013/9/30 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

> > Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
> > 27 I Blade_09 09/08/13 13:25:17 0x806f0013 Chassis, (NMI State)
> diagnostic
> > interrupt
> > 28 E Blade_09 09/08/13 13:25:12 0x10000002 SMI Hdlr: 00151743 HI Fatal
> > Error, HI_FERR/NERR Value= 0020
>
> Doing a simple Google search on HI_FERR tells me that it is:
>
>
> http://www.intel.com/content/dam/doc/datasheet/e7525-memory-controller-hub-datasheet.pdf
>
> and that
> 3.6.14 HI_FERR – Hub Interface First Error Register (D0:F1)
>
> has something in it. The value is 0020 (is that decimal or hex?). If it is
> decimal it is then 10100, which is bit 2 and 4:
>
> bit 2:
>
> HI Internal Parity Error Detected. This bit is sticky through reset. System
> software clears this bit by writing a ‘1’ to the location.
> 0 = No Internal Parity error detected.
> 1 = MCH HI bridge has detected an Internal Parity error. Non-fatal.
>
> and bit 4:
> HI Data Parity Error Detected. This bit is sticky through reset. System
> software
> clears this bit by writing a ‘1’ to the location.
> 0 = No HI data parity error.
> 1 = MCH has detected a parity error on the data phase of a HI transaction.
>
>
>
> But that is unlikely as these are 'non-fatal'. So if this is hex, then it
> would
> be bit 5, which is:
>
> Enhanced Configuration Access Error. This bit is sticky through reset.
> System
> software clears this bit by writing a ‘1’ to the location.
> 0 = No Enhanced Configuration Access error
> 1 = A PCI Express* Enhanced Configuration access was mistakenly targeting
> the legacy interface. Fatal
>
>
> That sounds more like it. So we touched a PCIe Enhanced Configuration
> (MMCONFIG?)
> using the legacy interface (cf8?).
>
> Jan, any thoughts? Is there a particular bug-fix we are missing in Xen 4.1
> or Xen 4.2
> for this?  Xen 4.0 seems to work.
>
> Trenta,
>
> When you used Xen 4.0 did you use the same kernel as with Xen 4.1 or Xen
> 4.2?
>


* Re: Is: 0xCF8 on extended config space instead of MCONF? Was:Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash
  2013-10-04 16:31             ` Trenta sis
@ 2013-10-04 16:55               ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-10-04 16:55 UTC (permalink / raw
  To: Trenta sis; +Cc: arrfab, agya.naila, JBeulich, xen-devel

On Fri, Oct 04, 2013 at 06:31:37PM +0200, Trenta sis wrote:
> Hi,
> 
> With Xen 4.0 kernel used was 2.6.32, default kernel Debain 6 (Squeeze)
> Thanks

So if you swap either kernel or hypervisor do you see this? Meaning
if you run with Xen 4.2 + 2.6.32 or Xen 4.0 + current kernel.
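
The swap test can be laid out as a small matrix. This sketch only restates
what the thread has reported so far; 'current' stands for whatever kernel
shipped with the failing install, and the result labels are illustrative.

```python
from itertools import product

HYPERVISORS = ["Xen 4.0", "Xen 4.2"]
KERNELS = ["2.6.32", "current"]

# Results reported in the thread so far; everything else is untested.
KNOWN = {
    ("Xen 4.0", "2.6.32"): "boots",
    ("Xen 4.2", "current"): "NMI reboot",
}


def test_matrix():
    """Return (hypervisor, kernel, result) for every combination."""
    return [(h, k, KNOWN.get((h, k), "untested"))
            for h, k in product(HYPERVISORS, KERNELS)]


if __name__ == "__main__":
    for hypervisor, kernel, result in test_matrix():
        print(f"{hypervisor:8} + {kernel:8} -> {result}")
```

If Xen 4.2 + 2.6.32 also reboots, the hypervisor is the likelier culprit; if
Xen 4.0 + the current kernel reboots, the kernel is.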

> 
> 2013/9/30 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> 
> > > Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
> > > 27 I Blade_09 09/08/13 13:25:17 0x806f0013 Chassis, (NMI State)
> > diagnostic
> > > interrupt
> > > 28 E Blade_09 09/08/13 13:25:12 0x10000002 SMI Hdlr: 00151743 HI Fatal
> > > Error, HI_FERR/NERR Value= 0020
> >
> > Doing a simple Google search on HI_FERR tells me that it is:
> >
> >
> > http://www.intel.com/content/dam/doc/datasheet/e7525-memory-controller-hub-datasheet.pdf
> >
> > and that
> > 3.6.14 HI_FERR – Hub Interface First Error Register (D0:F1)
> >
> > has something in it. The value is 0020 (is that decimal or hex?). If it is
> > decimal it is then 10100, which is bit 2 and 4:
> >
> > bit 2:
> >
> > HI Internal Parity Error Detected. This bit is sticky through reset. System
> > software clears this bit by writing a ‘1’ to the location.
> > 0 = No Internal Parity error detected.
> > 1 = MCH HI bridge has detected an Internal Parity error. Non-fatal.
> >
> > and bit 4:
> > HI Data Parity Error Detected. This bit is sticky through reset. System
> > software
> > clears this bit by writing a ‘1’ to the location.
> > 0 = No HI data parity error.
> > 1 = MCH has detected a parity error on the data phase of a HI transaction.
> >
> >
> >
> > But that is unlikely as these are 'non-fatal'. So if this is hex, then it
> > would
> > be bit 5, which is:
> >
> > Enhanced Configuration Access Error. This bit is sticky through reset.
> > System
> > software clears this bit by writing a ‘1’ to the location.
> > 0 = No Enhanced Configuration Access error
> > 1 = A PCI Express* Enhanced Configuration access was mistakenly targeting
> > the legacy interface. Fatal
> >
> >
> > That sounds more like it. So we touched a PCIe Enhanced Configuration
> > (MMCONFIG?)
> > using the legacy interface (cf8?).
> >
> > Jan, any thoughts? Is there a particular bug-fix we are missing in Xen 4.1
> > or Xen 4.2
> > for this?  Xen 4.0 seems to work.
> >
> > Trenta,
> >
> > When you used Xen 4.0 did you use the same kernel as with Xen 4.1 or Xen
> > 4.2?
> >


* Re: Is: 0xCF8 on extended config space instead of MCONF? Was:Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash
  2013-09-30 14:13           ` Is: 0xCF8 on extended config space instead of MCONF? Was:Re: " Konrad Rzeszutek Wilk
  2013-09-30 15:40             ` Jan Beulich
  2013-10-04 16:31             ` Trenta sis
@ 2014-09-05 11:58             ` Trenta sis
  2014-09-05 14:30               ` Is: 0xCF8 on extended config space instead of MCONF? Jan Beulich
  2 siblings, 1 reply; 21+ messages in thread
From: Trenta sis @ 2014-09-05 11:58 UTC (permalink / raw
  To: Konrad Rzeszutek Wilk; +Cc: arrfab, agya naila, Jan Beulich, xen-devel

Hello,

I have filed a bug and added the information you asked for. Debian 7 with
Xen 4.0 works, so the problem seems to be Xen >= 4.1.
I also tried Citrix XenServer 6.2: the same errors appear in the AMM and we
cannot install, because the server reboots automatically.

Bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=760563


I hope that this can help

Thanks

2013-09-30 16:13 GMT+02:00 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>:
>> Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
>> 27 I Blade_09 09/08/13 13:25:17 0x806f0013 Chassis, (NMI State) diagnostic
>> interrupt
>> 28 E Blade_09 09/08/13 13:25:12 0x10000002 SMI Hdlr: 00151743 HI Fatal
>> Error, HI_FERR/NERR Value= 0020
>
> Doing a simple Google search on HI_FERR tells me that it is:
>
> http://www.intel.com/content/dam/doc/datasheet/e7525-memory-controller-hub-datasheet.pdf
>
> and that
> 3.6.14 HI_FERR – Hub Interface First Error Register (D0:F1)
>
> has something in it. The value is 0020 (is that decimal or hex?). If it is
> decimal it is then 10100, which is bit 2 and 4:
>
> bit 2:
>
> HI Internal Parity Error Detected. This bit is sticky through reset. System
> software clears this bit by writing a ‘1’ to the location.
> 0 = No Internal Parity error detected.
> 1 = MCH HI bridge has detected an Internal Parity error. Non-fatal.
>
> and bit 4:
> HI Data Parity Error Detected. This bit is sticky through reset. System software
> clears this bit by writing a ‘1’ to the location.
> 0 = No HI data parity error.
> 1 = MCH has detected a parity error on the data phase of a HI transaction.
>
>
>
> But that is unlikely as these are 'non-fatal'. So if this is hex, then it would
> be bit 5, which is:
>
> Enhanced Configuration Access Error. This bit is sticky through reset. System
> software clears this bit by writing a ‘1’ to the location.
> 0 = No Enhanced Configuration Access error
> 1 = A PCI Express* Enhanced Configuration access was mistakenly targeting
> the legacy interface. Fatal
>
>
> That sounds more like it. So we touched a PCIe Enhanced Configuration (MMCONFIG?)
> using the legacy interface (cf8?).
>
> Jan, any thoughts? Is there a particular bug-fix we are missing in Xen 4.1 or Xen 4.2
> for this?  Xen 4.0 seems to work.
>
> Trenta,
>
> When you used Xen 4.0 did you use the same kernel as with Xen 4.1 or Xen 4.2?
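
The two readings of the logged value discussed above can be checked with a
few lines of Python. This is only a sketch of the arithmetic: the helper
name set_bits is made up for illustration, and it assumes the AMM prints
the register as the plain digit string "0020".

```python
def set_bits(value: int) -> list[int]:
    """Return the positions of the bits that are set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


raw = "0020"  # HI_FERR/NERR value as printed in the AMM log

as_decimal = int(raw, 10)  # reading the digits as decimal
as_hex = int(raw, 16)      # reading the digits as hexadecimal

print(f"decimal {as_decimal}: bits {set_bits(as_decimal)}")
print(f"hex 0x{as_hex:x}: bits {set_bits(as_hex)}")
```

Read as decimal the value has bits 2 and 4 set, and read as hex it has only
bit 5 set, which matches the two cases worked through in the quoted message.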

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is: 0xCF8 on extended config space instead of MCONF?
  2014-09-05 11:58             ` Trenta sis
@ 2014-09-05 14:30               ` Jan Beulich
  2014-09-08 12:53                 ` Trenta sis
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2014-09-05 14:30 UTC (permalink / raw
  To: Trenta sis; +Cc: arrfab, agya naila, xen-devel

>>> On 05.09.14 at 13:58, <trenta.sis@gmail.com> wrote:
> Hello,
> 
> I have created a bug report and added the information you asked for.
> Debian 7 with Xen 4.0 works, so the problem seems to be in Xen >= 4.1.
> I also tried Citrix XenServer 6.2: the AMM shows the same errors and we
> can't install it, because the server reboots automatically.
> 
> Bug:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=760563 

So did you try the option Ian suggested there? It is well possible
that newer Xen makes more use of the extended config space.
And if that results in the described error, this _still_ means a
hardware issue, not a software one (for which the use of said
command line option would only be a workaround).

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is: 0xCF8 on extended config space instead of MCONF?
  2014-09-05 14:30               ` Is: 0xCF8 on extended config space instead of MCONF? Jan Beulich
@ 2014-09-08 12:53                 ` Trenta sis
  2014-09-08 13:28                   ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Trenta sis @ 2014-09-08 12:53 UTC (permalink / raw
  To: Jan Beulich; +Cc: arrfab, agya naila, xen-devel

Hi,

Thanks for your answer. I have run some tests, detailed in
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=760563

It does not seem to be a hardware issue; it seems to be a problem with
Xen's mmcfg on this hardware. Will it be solved? Can I use this
workaround in a production environment?

Thanks






^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is: 0xCF8 on extended config space instead of MCONF?
  2014-09-08 12:53                 ` Trenta sis
@ 2014-09-08 13:28                   ` Jan Beulich
  2014-09-09  7:11                     ` Trenta sis
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2014-09-08 13:28 UTC (permalink / raw
  To: Trenta sis; +Cc: arrfab, agya naila, xen-devel

>>> On 08.09.14 at 14:53, <trenta.sis@gmail.com> wrote:
> It does not seem to be a hardware issue; it seems to be a problem with
> Xen's mmcfg on this hardware.

I.e. it _is_ a hardware issue.

> Will it be solved?

That's a question to the hardware vendor.

> Can I use this workaround in a production
> environment?

I guess so, if you don't require any of the functionality that needs
extended config space access, like SR-IOV.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is: 0xCF8 on extended config space instead of MCONF?
  2014-09-08 13:28                   ` Jan Beulich
@ 2014-09-09  7:11                     ` Trenta sis
  2014-09-09  8:53                       ` Ian Campbell
  2014-09-09  9:15                       ` Jan Beulich
  0 siblings, 2 replies; 21+ messages in thread
From: Trenta sis @ 2014-09-09  7:11 UTC (permalink / raw
  To: Jan Beulich; +Cc: arrfab, agya naila, xen-devel

Hi,

With Xen 4.0 or Citrix XenServer 5.6 everything works perfectly. Why do
you say it is a hardware issue when it works perfectly as long as we
don't use Xen > 4.0? I have also tried two different servers with the
same result, so after testing on both I'm not sure it is a hardware
issue...
Can you give me details about what mmcfg is? I have searched but
can't find any detailed information.


Thanks




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is: 0xCF8 on extended config space instead of MCONF?
  2014-09-09  7:11                     ` Trenta sis
@ 2014-09-09  8:53                       ` Ian Campbell
  2014-09-09 14:00                         ` Trenta sis
  2014-09-09  9:15                       ` Jan Beulich
  1 sibling, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2014-09-09  8:53 UTC (permalink / raw
  To: Trenta sis; +Cc: arrfab, agya naila, Jan Beulich, xen-devel

On Tue, 2014-09-09 at 09:11 +0200, Trenta sis wrote:
> Hi,
> 
> With Xen 4.0 or Citrix XenServer 5.6 everything works perfectly. Why do
> you say it is a hardware issue when it works perfectly as long as we
> don't use Xen > 4.0?

I'm afraid this doesn't prove anything, it could easily be that Xen <=
4.0 simply doesn't tickle the bad hardware behaviour.

> I have also tried two different servers with the same result,

Two different servers or two of the same kind? It's not unexpected that
two of the same type of server would have the same firmware/hardware
issues.

I think step one should be to make sure that you have the very latest
firmware (BIOS etc) for the hardware.

> so after testing on both I'm not sure it is a hardware issue...
> Can you give me details about what mmcfg is? I have searched but
> can't find any detailed information.

AIUI it's an extended mechanism for accessing PCI configuration space.
Xen up to and including 4.0 most likely didn't use this method, which is
why it works OK.

Ian.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is: 0xCF8 on extended config space instead of MCONF?
  2014-09-09  7:11                     ` Trenta sis
  2014-09-09  8:53                       ` Ian Campbell
@ 2014-09-09  9:15                       ` Jan Beulich
  1 sibling, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2014-09-09  9:15 UTC (permalink / raw
  To: Trenta sis; +Cc: arrfab, agya naila, xen-devel

>>> On 09.09.14 at 09:11, <trenta.sis@gmail.com> wrote:
> With Xen 4.0 or Citrix XenServer 5.6 everything works perfectly. Why do
> you say it is a hardware issue when it works perfectly as long as we
> don't use Xen > 4.0?

This sort of argumentation would preclude newer versions from adding
new functionality.

> I have also tried two different servers with the
> same result, so after testing on both I'm not sure it is a hardware
> issue...
> Can you give me details about what mmcfg is? I have searched but
> can't find any detailed information.

This is an alternative access mechanism to PCI config space, required
(except on AMD systems) to access extended config space, and also
(universally) required to access the config space of devices on PCI
segments other than zero.
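
To make the difference between the two access mechanisms concrete, here is
a rough Python sketch of how each one addresses a register. It follows the
standard PCI layouts (legacy port 0xCF8 address word, and the memory-mapped
MMCFG/ECAM offset); the function names are made up for illustration, the
example register offsets are arbitrary, and the AMD-specific extension of
the legacy mechanism mentioned above is deliberately left out.

```python
def cf8_address(bus: int, dev: int, fn: int, reg: int) -> int:
    """Value written to I/O port 0xCF8 for a legacy config space access."""
    if not 0 <= reg < 0x100:
        # Only 8 bits of register offset fit in the legacy address word.
        raise ValueError("legacy mechanism only reaches registers 0x00-0xFF")
    # Bit 31 is the enable bit; the low two bits of the offset are dropped
    # because the data port transfers an aligned 32-bit dword.
    return 0x80000000 | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xFC)


def mmcfg_offset(bus: int, dev: int, fn: int, reg: int) -> int:
    """Byte offset of a register inside the memory-mapped config window."""
    if not 0 <= reg < 0x1000:
        raise ValueError("config space is 4 KiB per function")
    # Each function gets a 4 KiB slot, so 12 bits of register offset fit.
    return (bus << 20) | (dev << 15) | (fn << 12) | reg


# Device 0, function 1 on bus 0, with arbitrary example offsets.
print(hex(cf8_address(0, 0, 1, 0x80)))    # register inside the first 256 bytes
print(hex(mmcfg_offset(0, 0, 1, 0x100)))  # first register of extended space
```

Offsets from 0x100 upwards cannot be expressed in the legacy address word at
all, which is why extended config space needs the memory-mapped mechanism.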

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is: 0xCF8 on extended config space instead of MCONF?
  2014-09-09  8:53                       ` Ian Campbell
@ 2014-09-09 14:00                         ` Trenta sis
  2014-09-09 14:39                           ` Konrad Rzeszutek Wilk
  2014-09-09 14:39                           ` Ian Campbell
  0 siblings, 2 replies; 21+ messages in thread
From: Trenta sis @ 2014-09-09 14:00 UTC (permalink / raw
  To: Ian Campbell; +Cc: arrfab, agya naila, Jan Beulich, xen-devel

Hi,

We tried two different servers of the same type (HS20), both with the
latest firmware, BIOS, etc., and got the same result: neither works.
As you described, if mmcfg is a new feature added in Xen > 4.0, that
could explain this issue on this hardware. We were running Xen 3.2 and
4.0 on this hardware without any errors or problems.
Remember that the HS20 is an IBM blade server; I don't know whether that
could be a problem...

What do you think about running with mmcfg=0? Would that be a correct
solution, and can we work without any problems or incompatibilities?

Thanks




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is: 0xCF8 on extended config space instead of MCONF?
  2014-09-09 14:00                         ` Trenta sis
@ 2014-09-09 14:39                           ` Konrad Rzeszutek Wilk
  2014-09-09 14:39                           ` Ian Campbell
  1 sibling, 0 replies; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-09-09 14:39 UTC (permalink / raw
  To: Trenta sis; +Cc: arrfab, Jan Beulich, Ian Campbell, agya naila, xen-devel

On Tue, Sep 09, 2014 at 04:00:51PM +0200, Trenta sis wrote:
> Hi,
> 
> We tried two different servers of the same type (HS20), both with the
> latest firmware, BIOS, etc., and got the same result: neither works.
> As you described, if mmcfg is a new feature added in Xen > 4.0, that
> could explain this issue on this hardware. We were running Xen 3.2 and
> 4.0 on this hardware without any errors or problems.
> Remember that the HS20 is an IBM blade server; I don't know whether that
> could be a problem...
> 
> What do you think about running with mmcfg=0? Would that be a correct
> solution, and can we work without any problems or incompatibilities?

Yes, it should work. As you say, it is blade hardware and it does
not have any SR-IOV functionality, so MMCONF usage is not needed.

SR-IOV functionality for blades came much, much later.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is: 0xCF8 on extended config space instead of MCONF?
  2014-09-09 14:00                         ` Trenta sis
  2014-09-09 14:39                           ` Konrad Rzeszutek Wilk
@ 2014-09-09 14:39                           ` Ian Campbell
  2014-09-09 14:46                             ` Trenta sis
  1 sibling, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2014-09-09 14:39 UTC (permalink / raw
  To: Trenta sis; +Cc: arrfab, agya naila, Jan Beulich, xen-devel

On Tue, 2014-09-09 at 16:00 +0200, Trenta sis wrote:
> What do you think about running with mmcfg=0? Would that be a correct
> solution, and can we work without any problems or incompatibilities?

Jan already answered this further up the thread:
        I guess so, if you don't require any of the functionality that
        needs extended config space access, like SR-IOV.

IOW you should test with this option in your use cases. If it works then
go for it.

You should also report it to your hardware vendor.

Ian.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Is: 0xCF8 on extended config space instead of MCONF?
  2014-09-09 14:39                           ` Ian Campbell
@ 2014-09-09 14:46                             ` Trenta sis
  0 siblings, 0 replies; 21+ messages in thread
From: Trenta sis @ 2014-09-09 14:46 UTC (permalink / raw
  To: Ian Campbell; +Cc: arrfab, agya naila, Jan Beulich, xen-devel

OK, thanks for everything.
We will run with mmcfg=0.

Solved, and thanks to all!
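
For anyone applying the same workaround, the sketch below shows one way the
option could be added to the hypervisor command line. It assumes a
Debian-style /etc/default/grub where Xen options live in the
GRUB_CMDLINE_XEN_DEFAULT variable (check your own distribution); the helper
name add_xen_option and the dom0_mem value in the example are illustrative
only, and the function works on the file contents as a string rather than
touching the real file.

```python
import re


def add_xen_option(grub_defaults: str, option: str) -> str:
    """Return grub_defaults with option present in GRUB_CMDLINE_XEN_DEFAULT."""
    pattern = re.compile(r'^GRUB_CMDLINE_XEN_DEFAULT="(.*)"$', re.MULTILINE)
    match = pattern.search(grub_defaults)
    if match is None:
        # Variable not set yet: append a new line carrying only this option.
        sep = "" if grub_defaults.endswith("\n") or not grub_defaults else "\n"
        return f'{grub_defaults}{sep}GRUB_CMDLINE_XEN_DEFAULT="{option}"\n'
    options = match.group(1).split()
    if option in options:
        return grub_defaults  # already present, leave the text untouched
    options.append(option)
    joined = " ".join(options)
    new_line = f'GRUB_CMDLINE_XEN_DEFAULT="{joined}"'
    return grub_defaults[:match.start()] + new_line + grub_defaults[match.end():]


example = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_XEN_DEFAULT="dom0_mem=1024M"\n'
print(add_xen_option(example, "mmcfg=0"))
```

After changing the real file, the boot configuration still has to be
regenerated (on Debian, typically with update-grub) and the machine rebooted
before the option takes effect.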




^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2014-09-09 14:46 UTC | newest]

Thread overview: 21+ messages
2013-09-08 14:35 IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash Trenta sis
2013-09-08 14:41 ` Trenta sis
2013-09-09 19:15   ` Konrad Rzeszutek Wilk
2013-09-12 12:47     ` Trenta sis
2013-09-23 14:02       ` Konrad Rzeszutek Wilk
2013-09-29 10:47         ` Trenta sis
2013-09-30 14:13           ` Is: 0xCF8 on extended config space instead of MCONF? Was:Re: " Konrad Rzeszutek Wilk
2013-09-30 15:40             ` Jan Beulich
2013-10-04 16:31             ` Trenta sis
2013-10-04 16:55               ` Konrad Rzeszutek Wilk
2014-09-05 11:58             ` Trenta sis
2014-09-05 14:30               ` Is: 0xCF8 on extended config space instead of MCONF? Jan Beulich
2014-09-08 12:53                 ` Trenta sis
2014-09-08 13:28                   ` Jan Beulich
2014-09-09  7:11                     ` Trenta sis
2014-09-09  8:53                       ` Ian Campbell
2014-09-09 14:00                         ` Trenta sis
2014-09-09 14:39                           ` Konrad Rzeszutek Wilk
2014-09-09 14:39                           ` Ian Campbell
2014-09-09 14:46                             ` Trenta sis
2014-09-09  9:15                       ` Jan Beulich
