All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost
  2014-03-03  3:24 [PATCH 1/3] drivers/vfio: Rework offsetofend() Gavin Shan
@ 2014-03-03  3:24 ` Gavin Shan
  2014-03-03  3:51   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 16+ messages in thread
From: Gavin Shan @ 2014-03-03  3:24 UTC (permalink / raw
  To: kvm; +Cc: alex.williamson, benh, aik, Gavin Shan

The problem is specific to the case of BIST issued applied to IPR
adapter on the guest side. After BIST reset, we lose everything
in MSIx table and we never have chance update MSIx messages for
those enabled interrupts to MSIx table.

The patch fixes it by writing MSIx message to MSIx table before
reenabling them.

Reported-by: Wen Xiong <wenxiong@us.ibm.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 drivers/vfio/pci/vfio_pci_intrs.c |   19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 2103576..279ebd0 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -17,6 +17,7 @@
 #include <linux/interrupt.h>
 #include <linux/eventfd.h>
 #include <linux/pci.h>
+#include <linux/msi.h>
 #include <linux/file.h>
 #include <linux/poll.h>
 #include <linux/vfio.h>
@@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 	struct pci_dev *pdev = vdev->pdev;
 	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
 	char *name = msix ? "vfio-msix" : "vfio-msi";
+	struct msi_msg msg;
 	struct eventfd_ctx *trigger;
 	int ret;
 
@@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 		return PTR_ERR(trigger);
 	}
 
+	/* We possiblly lose the MSI/MSIx message in some cases.
+	 * For example, BIST reset on IPR adapter. The MSIx table
+	 * is cleaned out. However, we never get chance to put
+	 * MSIx messages to MSIx table because all MSIx stuff is
+	 * being cached in QEMU. Here, we had the trick to put the
+	 * MSI/MSIx message back.
+	 *
+	 * Basically, we needn't worry about MSI messages. However,
+	 * it's not harmful and there might be cases of PCI config data
+	 * lost because of cached PCI config data in QEMU again.
+	 *
+	 * Note that we should flash the message prior to enabling
+	 * the corresponding interrupt by request_irq().
+	 */
+	 get_cached_msi_msg(irq, &msg);
+	 write_msi_msg(irq, &msg);
+
 	ret = request_irq(irq, vfio_msihandler, 0,
 			  vdev->ctx[vector].name, trigger);
 	if (ret) {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost
  2014-03-03  3:24 ` [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost Gavin Shan
@ 2014-03-03  3:51   ` Benjamin Herrenschmidt
  2014-03-03  4:43     ` Alexey Kardashevskiy
  2014-03-03  4:49     ` Alex Williamson
  0 siblings, 2 replies; 16+ messages in thread
From: Benjamin Herrenschmidt @ 2014-03-03  3:51 UTC (permalink / raw
  To: Gavin Shan; +Cc: kvm, alex.williamson, aik

On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote:
> The problem is specific to the case of BIST issued applied to IPR
> adapter on the guest side. After BIST reset, we lose everything
> in MSIx table and we never have chance update MSIx messages for
> those enabled interrupts to MSIx table.
> 
> The patch fixes it by writing MSIx message to MSIx table before
> reenabling them.

That needs a better explanation... the guest does try to restore the
MSI-X (in a way similar to resuming from suspend) by writing the
message value back into the table.

My understanding is that we trap that and turn that into a call to
vfio_msi_set_vector_signal, is that correct ?

In that case, it does indeed make sense to have that function rewrite
the cached message.

(Just trying to understand how this patch fixes the problem we saw)

I suppose other drivers would have similar issues if they have some
kind of internal "reset" path and try to save and restore the MSI-X
table.

Cheers,
Ben.

> Reported-by: Wen Xiong <wenxiong@us.ibm.com>
> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---
>  drivers/vfio/pci/vfio_pci_intrs.c |   19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 2103576..279ebd0 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -17,6 +17,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/eventfd.h>
>  #include <linux/pci.h>
> +#include <linux/msi.h>
>  #include <linux/file.h>
>  #include <linux/poll.h>
>  #include <linux/vfio.h>
> @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  	struct pci_dev *pdev = vdev->pdev;
>  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
>  	char *name = msix ? "vfio-msix" : "vfio-msi";
> +	struct msi_msg msg;
>  	struct eventfd_ctx *trigger;
>  	int ret;
>  
> @@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  		return PTR_ERR(trigger);
>  	}
>  
> +	/* We possiblly lose the MSI/MSIx message in some cases.
> +	 * For example, BIST reset on IPR adapter. The MSIx table
> +	 * is cleaned out. However, we never get chance to put
> +	 * MSIx messages to MSIx table because all MSIx stuff is
> +	 * being cached in QEMU. Here, we had the trick to put the
> +	 * MSI/MSIx message back.
> +	 *
> +	 * Basically, we needn't worry about MSI messages. However,
> +	 * it's not harmful and there might be cases of PCI config data
> +	 * lost because of cached PCI config data in QEMU again.
> +	 *
> +	 * Note that we should flash the message prior to enabling
> +	 * the corresponding interrupt by request_irq().
> +	 */
> +	 get_cached_msi_msg(irq, &msg);
> +	 write_msi_msg(irq, &msg);
> +
>  	ret = request_irq(irq, vfio_msihandler, 0,
>  			  vdev->ctx[vector].name, trigger);
>  	if (ret) {



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost
  2014-03-03  3:51   ` Benjamin Herrenschmidt
@ 2014-03-03  4:43     ` Alexey Kardashevskiy
  2014-03-03  5:00       ` Benjamin Herrenschmidt
  2014-03-03  4:49     ` Alex Williamson
  1 sibling, 1 reply; 16+ messages in thread
From: Alexey Kardashevskiy @ 2014-03-03  4:43 UTC (permalink / raw
  To: Benjamin Herrenschmidt, Gavin Shan; +Cc: kvm, alex.williamson

On 03/03/2014 02:51 PM, Benjamin Herrenschmidt wrote:
> On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote:
>> The problem is specific to the case of BIST issued applied to IPR
>> adapter on the guest side. After BIST reset, we lose everything
>> in MSIx table and we never have chance update MSIx messages for
>> those enabled interrupts to MSIx table.
>>
>> The patch fixes it by writing MSIx message to MSIx table before
>> reenabling them.
> 
> That needs a better explanation... the guest does try to restore the
> MSI-X (in a way similar to resuming from suspend) by writing the
> message value back into the table.
> 
> My understanding is that we trap that and turn that into a call to
> vfio_msi_set_vector_signal, is that correct ?


Yep.

Gavin sent me a backtrace of what happens when the guest tries writing to a
BAR:

qemu/hw/pci/msix.c::msix_table_mmio_write
	msix_handle_mask_update
	msix_fire_vector_notifier
qemu/hw/misc/vfio.c::vfio_msix_vector_use
	vfio_msix_vector_do_use

IOCTL-to-VFIO: VFIO_DEVICE_SET_IRQS
host/drivers/vfio/pci/vfio_pci.c::vfio_pci_ioctl
host/drivers/vfio/pci/vfio_pci_intrs.c::vfio_pci_set_irqs_ioctl
	vfio_pci_set_msi_trigger
	vfio_msi_set_block
	vfio_msi_set_vector_signal



While it works for our particular problem and seems correct, it has one
flaw - hw/pci/msix.c will not generate this backtrace if masking bit does
not change which can happen in general:
===
static void msix_handle_mask_update(PCIDevice *dev, int vector, bool
was_masked)
{
    bool is_masked = msix_is_masked(dev, vector);

    if (is_masked == was_masked) {
        return;
    }
===

Or if masking bit is the same, nothing bad is expected?...


> In that case, it does indeed make sense to have that function rewrite
> the cached message.
> 
> (Just trying to understand how this patch fixes the problem we saw)
> 
> I suppose other drivers would have similar issues if they have some
> kind of internal "reset" path and try to save and restore the MSI-X
> table.





> Cheers,
> Ben.
> 
>> Reported-by: Wen Xiong <wenxiong@us.ibm.com>
>> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>> ---
>>  drivers/vfio/pci/vfio_pci_intrs.c |   19 +++++++++++++++++++
>>  1 file changed, 19 insertions(+)
>>
>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
>> index 2103576..279ebd0 100644
>> --- a/drivers/vfio/pci/vfio_pci_intrs.c
>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
>> @@ -17,6 +17,7 @@
>>  #include <linux/interrupt.h>
>>  #include <linux/eventfd.h>
>>  #include <linux/pci.h>
>> +#include <linux/msi.h>
>>  #include <linux/file.h>
>>  #include <linux/poll.h>
>>  #include <linux/vfio.h>
>> @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>>  	struct pci_dev *pdev = vdev->pdev;
>>  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
>>  	char *name = msix ? "vfio-msix" : "vfio-msi";
>> +	struct msi_msg msg;
>>  	struct eventfd_ctx *trigger;
>>  	int ret;
>>  
>> @@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>>  		return PTR_ERR(trigger);
>>  	}
>>  
>> +	/* We possiblly lose the MSI/MSIx message in some cases.
>> +	 * For example, BIST reset on IPR adapter. The MSIx table
>> +	 * is cleaned out. However, we never get chance to put
>> +	 * MSIx messages to MSIx table because all MSIx stuff is
>> +	 * being cached in QEMU. Here, we had the trick to put the
>> +	 * MSI/MSIx message back.
>> +	 *
>> +	 * Basically, we needn't worry about MSI messages. However,
>> +	 * it's not harmful and there might be cases of PCI config data
>> +	 * lost because of cached PCI config data in QEMU again.
>> +	 *
>> +	 * Note that we should flash the message prior to enabling
>> +	 * the corresponding interrupt by request_irq().
>> +	 */
>> +	 get_cached_msi_msg(irq, &msg);
>> +	 write_msi_msg(irq, &msg);
>> +
>>  	ret = request_irq(irq, vfio_msihandler, 0,
>>  			  vdev->ctx[vector].name, trigger);
>>  	if (ret) {
> 
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost
  2014-03-03  3:51   ` Benjamin Herrenschmidt
  2014-03-03  4:43     ` Alexey Kardashevskiy
@ 2014-03-03  4:49     ` Alex Williamson
  2014-03-03  5:05       ` Benjamin Herrenschmidt
       [not found]       ` <20140303061036.GA4447@shangw.(null)>
  1 sibling, 2 replies; 16+ messages in thread
From: Alex Williamson @ 2014-03-03  4:49 UTC (permalink / raw
  To: Benjamin Herrenschmidt; +Cc: Gavin Shan, kvm, aik

On Mon, 2014-03-03 at 14:51 +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote:
> > The problem is specific to the case of BIST issued applied to IPR
> > adapter on the guest side. After BIST reset, we lose everything
> > in MSIx table and we never have chance update MSIx messages for
> > those enabled interrupts to MSIx table.
> > 
> > The patch fixes it by writing MSIx message to MSIx table before
> > reenabling them.
> 
> That needs a better explanation... the guest does try to restore the
> MSI-X (in a way similar to resuming from suspend) by writing the
> message value back into the table.
> 
> My understanding is that we trap that and turn that into a call to
> vfio_msi_set_vector_signal, is that correct ?
> 
> In that case, it does indeed make sense to have that function rewrite
> the cached message.
> 
> (Just trying to understand how this patch fixes the problem we saw)
> 
> I suppose other drivers would have similar issues if they have some
> kind of internal "reset" path and try to save and restore the MSI-X
> table.
> 
> Cheers,
> Ben.
> 
> > Reported-by: Wen Xiong <wenxiong@us.ibm.com>
> > Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_intrs.c |   19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> > index 2103576..279ebd0 100644
> > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > @@ -17,6 +17,7 @@
> >  #include <linux/interrupt.h>
> >  #include <linux/eventfd.h>
> >  #include <linux/pci.h>
> > +#include <linux/msi.h>
> >  #include <linux/file.h>
> >  #include <linux/poll.h>
> >  #include <linux/vfio.h>
> > @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >  	struct pci_dev *pdev = vdev->pdev;
> >  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> >  	char *name = msix ? "vfio-msix" : "vfio-msi";
> > +	struct msi_msg msg;
> >  	struct eventfd_ctx *trigger;
> >  	int ret;
> >  
> > @@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >  		return PTR_ERR(trigger);
> >  	}
> >  
> > +	/* We possiblly lose the MSI/MSIx message in some cases.
> > +	 * For example, BIST reset on IPR adapter. The MSIx table
> > +	 * is cleaned out. However, we never get chance to put
> > +	 * MSIx messages to MSIx table because all MSIx stuff is
> > +	 * being cached in QEMU. Here, we had the trick to put the
> > +	 * MSI/MSIx message back.
> > +	 *
> > +	 * Basically, we needn't worry about MSI messages. However,
> > +	 * it's not harmful and there might be cases of PCI config data
> > +	 * lost because of cached PCI config data in QEMU again.
> > +	 *
> > +	 * Note that we should flash the message prior to enabling
> > +	 * the corresponding interrupt by request_irq().
> > +	 */
> > +	 get_cached_msi_msg(irq, &msg);
> > +	 write_msi_msg(irq, &msg);
> > +
> >  	ret = request_irq(irq, vfio_msihandler, 0,
> >  			  vdev->ctx[vector].name, trigger);
> >  	if (ret) {


I understand from talking to Alexey that the BARs are reset by this
BIST, but what about the MSIX capability?  If that gets reset to be
disabled, where does it get re-enabled?  QEMU won't know about the BIST,
so probably assumes nothing changed when the guest writes the enable
bit.  VFIO doesn't allow userspace writes to the MSIX capability.  So it
seems like the above would restore the table entry, but not necessarily
whether MSIX is enabled on the device.

When I talked with Alexey I noted that we already have a BAR restore
function, vfio_bar_restore(), that tries to do some fixup when backdoor
resets are noticed.  If we were to trigger that upon noting the user
writing BAR values to what we think they're already set to, we could
extend it to trigger an interrupt restore that could fixup both the
capability and the table entries.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost
  2014-03-03  4:43     ` Alexey Kardashevskiy
@ 2014-03-03  5:00       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 16+ messages in thread
From: Benjamin Herrenschmidt @ 2014-03-03  5:00 UTC (permalink / raw
  To: Alexey Kardashevskiy; +Cc: Gavin Shan, kvm, alex.williamson

On Mon, 2014-03-03 at 15:43 +1100, Alexey Kardashevskiy wrote:
> While it works for our particular problem and seems correct, it has one
> flaw - hw/pci/msix.c will not generate this backtrace if masking bit does
> not change which can happen in general:
> ===
> static void msix_handle_mask_update(PCIDevice *dev, int vector, bool
> was_masked)
> {
>     bool is_masked = msix_is_masked(dev, vector);
> 
>     if (is_masked == was_masked) {
>         return;
>     }
> ===
> 
> Or if masking bit is the same, nothing bad is expected?...

Hrm ok, so it will work in this specific case but might not in the general
case of a driver triggering some kind of local reset on the device requiring
the MSI-X to be restored. The guest will write but qemu will swallow them ...

I think that needs to be fixed but it might be hard without introducing
a new ioctl from what I can see of the way the code is structured... Unless
qemu turns that into a disable/enable pair I suppose.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost
  2014-03-03  4:49     ` Alex Williamson
@ 2014-03-03  5:05       ` Benjamin Herrenschmidt
       [not found]       ` <20140303061036.GA4447@shangw.(null)>
  1 sibling, 0 replies; 16+ messages in thread
From: Benjamin Herrenschmidt @ 2014-03-03  5:05 UTC (permalink / raw
  To: Alex Williamson; +Cc: Gavin Shan, kvm, aik

On Sun, 2014-03-02 at 21:49 -0700, Alex Williamson wrote:

> I understand from talking to Alexey that the BARs are reset by this
> BIST, but what about the MSIX capability?  If that gets reset to be
> disabled, where does it get re-enabled?

The guest will do pci_save/restore_state iirc which will attempt
to save and restore everything, but qemu might get in the way here...

I'm not sure whether the IPR BIST clears the config space but we
probably should account for the general case of the driver in the guest
doing some kind of reset (BIST or otherwise) and trying to save and
restore state itself.

>   QEMU won't know about the BIST,
> so probably assumes nothing changed when the guest writes the enable
> bit.  VFIO doesn't allow userspace writes to the MSIX capability.  So it
> seems like the above would restore the table entry, but not necessarily
> whether MSIX is enabled on the device.

Maybe qemu shouldn't trust its cache and upon guest writes, do a read
first to check the HW state ? It's not like any of this is performance
sensitive anyway...

> When I talked with Alexey I noted that we already have a BAR restore
> function, vfio_bar_restore(), that tries to do some fixup when backdoor
> resets are noticed.  If we were to trigger that upon noting the user
> writing BAR values to what we think they're already set to, we could
> extend it to trigger an interrupt restore that could fixup both the
> capability and the table entries.  Thanks,

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost
       [not found]       ` <20140303061036.GA4447@shangw.(null)>
@ 2014-03-03 19:36         ` Alex Williamson
       [not found]           ` <20140304023018.GA21672@shangw.(null)>
  0 siblings, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2014-03-03 19:36 UTC (permalink / raw
  To: Gavin Shan; +Cc: Benjamin Herrenschmidt, kvm, aik

On Mon, 2014-03-03 at 14:10 +0800, Gavin Shan wrote:
> On Sun, Mar 02, 2014 at 09:49:49PM -0700, Alex Williamson wrote:
> >On Mon, 2014-03-03 at 14:51 +1100, Benjamin Herrenschmidt wrote:
> >> On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote:
> 
> .../...
> 
> >> 
> >> > Reported-by: Wen Xiong <wenxiong@us.ibm.com>
> >> > Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> >> > ---
> >> >  drivers/vfio/pci/vfio_pci_intrs.c |   19 +++++++++++++++++++
> >> >  1 file changed, 19 insertions(+)
> >> > 
> >> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> >> > index 2103576..279ebd0 100644
> >> > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> >> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> >> > @@ -17,6 +17,7 @@
> >> >  #include <linux/interrupt.h>
> >> >  #include <linux/eventfd.h>
> >> >  #include <linux/pci.h>
> >> > +#include <linux/msi.h>
> >> >  #include <linux/file.h>
> >> >  #include <linux/poll.h>
> >> >  #include <linux/vfio.h>
> >> > @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >> >  	struct pci_dev *pdev = vdev->pdev;
> >> >  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> >> >  	char *name = msix ? "vfio-msix" : "vfio-msi";
> >> > +	struct msi_msg msg;
> >> >  	struct eventfd_ctx *trigger;
> >> >  	int ret;
> >> >  
> >> > @@ -544,6 +546,23 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >> >  		return PTR_ERR(trigger);
> >> >  	}
> >> >  
> >> > +	/* We possiblly lose the MSI/MSIx message in some cases.
> >> > +	 * For example, BIST reset on IPR adapter. The MSIx table
> >> > +	 * is cleaned out. However, we never get chance to put
> >> > +	 * MSIx messages to MSIx table because all MSIx stuff is
> >> > +	 * being cached in QEMU. Here, we had the trick to put the
> >> > +	 * MSI/MSIx message back.
> >> > +	 *
> >> > +	 * Basically, we needn't worry about MSI messages. However,
> >> > +	 * it's not harmful and there might be cases of PCI config data
> >> > +	 * lost because of cached PCI config data in QEMU again.
> >> > +	 *
> >> > +	 * Note that we should flash the message prior to enabling
> >> > +	 * the corresponding interrupt by request_irq().
> >> > +	 */
> >> > +	 get_cached_msi_msg(irq, &msg);
> >> > +	 write_msi_msg(irq, &msg);
> >> > +
> >> >  	ret = request_irq(irq, vfio_msihandler, 0,
> >> >  			  vdev->ctx[vector].name, trigger);
> >> >  	if (ret) {
> >
> >
> >I understand from talking to Alexey that the BARs are reset by this
> >BIST, but what about the MSIX capability?  If that gets reset to be
> >disabled, where does it get re-enabled?  QEMU won't know about the BIST,
> >so probably assumes nothing changed when the guest writes the enable
> >bit.  VFIO doesn't allow userspace writes to the MSIX capability.  So it
> >seems like the above would restore the table entry, but not necessarily
> >whether MSIX is enabled on the device.
> >
> 
> Yes, It's not necessarily, but not harmful to restore the message into
> device even though the PCI device has individual enabled. Prior to
> request_irq(), the interrupt should have been masked on PIC/CPU side or
> race condition there.
> 
> >When I talked with Alexey I noted that we already have a BAR restore
> >function, vfio_bar_restore(), that tries to do some fixup when backdoor
> >resets are noticed.  If we were to trigger that upon noting the user
> >writing BAR values to what we think they're already set to, we could
> >extend it to trigger an interrupt restore that could fixup both the
> >capability and the table entries.  Thanks,
> >
> 
> Various PCI devices could have different ways to do BIST. It's a bit
> hard to capture all of them, I think. Alex, please advise which way
> to fix the issue would be better. Or we can have this patch and figure
> out the way using vfio_bar_restore() later? :-)

This approach arguably has a very niche application.  I'd rather see
this device fall into the vfio_bar_restore() and add a callout to the
vfio irq code to restore the MSIX table and MSI/MSIX capabilities if the
surprise reset occurred with either enabled.  That seems like it would
be generally useful to any device with a backdoor reset.  Besides, if
the BARs on the device are being deprogrammed by BIST, then it must
already be hitting the vfio_bar_restore code or just re-writing the
vector table entry wouldn't be enough to get it running again.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost
       [not found]           ` <20140304023018.GA21672@shangw.(null)>
@ 2014-03-04 20:45             ` Alex Williamson
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Williamson @ 2014-03-04 20:45 UTC (permalink / raw
  To: Gavin Shan; +Cc: Benjamin Herrenschmidt, kvm, aik

On Tue, 2014-03-04 at 10:30 +0800, Gavin Shan wrote:
> On Mon, Mar 03, 2014 at 12:36:16PM -0700, Alex Williamson wrote:
> >On Mon, 2014-03-03 at 14:10 +0800, Gavin Shan wrote:
> >> On Sun, Mar 02, 2014 at 09:49:49PM -0700, Alex Williamson wrote:
> >> >On Mon, 2014-03-03 at 14:51 +1100, Benjamin Herrenschmidt wrote:
> >> >> On Mon, 2014-03-03 at 11:24 +0800, Gavin Shan wrote:
> 
> ../...
> 
> >> 
> >> Various PCI devices could have different ways to do BIST. It's a bit
> >> hard to capture all of them, I think. Alex, please advise which way
> >> to fix the issue would be better. Or we can have this patch and figure
> >> out the way using vfio_bar_restore() later? :-)
> >
> >This approach arguably has a very niche application.  I'd rather see
> >this device fall into the vfio_bar_restore() and add a callout to the
> >vfio irq code to restore the MSIX table and MSI/MSIX capabilities if the
> >surprise reset occurred with either enabled.  That seems like it would
> >be generally useful to any device with a backdoor reset.  Besides, if
> >the BARs on the device are being deprogrammed by BIST, then it must
> >already be hitting the vfio_bar_restore code or just re-writing the
> >vector table entry wouldn't be enough to get it running again.  Thanks,
> >
> 
> I had some debugging output in vfio_bar_restore() for my case, which is
> pci_save_state(), BIST and then pci_restore_state() on guest side. I never
> saw those debugging output from vfio_bar_restore(). Host VFIO-PCI has cached
> PCI config space, which was built in advance, and BIST on guest side didn't
> get altered PCI_COMMAND (memory/IO enable). So we didn't call into the function
> where we expect to restore MSIx stuff, Or I missed your point? :-)

Ok, I think I'm slowly wrapping my head around exactly what's going on,
let me summarize to make sure I understand.  The driver for this device
does some degree of configuration, including enabling MSI-X for the
device.  It then initiates a BIST via non-standard memory mapped
registers which resets some parts of the device, including the MMIO
space containing the MSI-X vector table, but not including PCI config
space of the device.  In particular, the COMMAND register and BAR values
in PCI config space are left intact (and apparently the MSI-X capability
as well).

If the driver wanted to be really evil, it could do this with vectors
enabled and rely on the reset value of the MSI-X vectors being masked.
However, we get lucky and the driver has all of the vectors masked prior
to reset so that after reset we see the masked to unmasked transition
and use that to program the eventfd, which is where you'd like to
re-write the vector message.  Is that all correct?

If the rest of config space is unaffected by this reset then my
suggestion to tie it into vfio_bar_restore() probably doesn't make
sense.  In fact, it doesn't seem like there's any config space
interaction we could count on to reliably detect this.

Thinking out loud, I wonder if there's any value to reading the MSI-X
message from the device and comparing to the cached value so that we can
at least detect that we've encountered this situation.  If we did that,
we could restore all of the vectors in the table.  However, it seems
like the only value that would add would be to write a message to syslog
since the guest will need to re-write any vectors it intends to use (and
assuming we can count on vectors being masked after reset as the spec
indicates).  So, I guess I can't think of anything better than what you
proposed (with better comments to detail the situation for later).
Thanks,

Alex


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 0/3] VFIO Bug Fixes
@ 2014-03-10  5:46 Gavin Shan
  2014-03-10  5:46 ` [PATCH 1/3] drivers/vfio: Rework offsetofend() Gavin Shan
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Gavin Shan @ 2014-03-10  5:46 UTC (permalink / raw
  To: alex.williamson; +Cc: kvm, benh, aik, Gavin Shan

v1 -> v2:
	* Don't change the name of variable "flags" in [PATCH 2/3].
	* Comment and commit log cleanup in [PATCH 3/3].

Gavin Shan (3):
  drivers/vfio: Rework offsetofend()
  drivers/vfio/pci: Fix wrong MSI interrupt count
  drivers/vfio/pci: Fix MSIx message lost

---

 drivers/vfio/pci/vfio_pci.c       |    3 +--
 drivers/vfio/pci/vfio_pci_intrs.c |   11 +++++++++++
 include/linux/vfio.h              |    5 ++---
 3 files changed, 14 insertions(+), 5 deletions(-)

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] drivers/vfio: Rework offsetofend()
  2014-03-10  5:46 [PATCH 0/3] VFIO Bug Fixes Gavin Shan
@ 2014-03-10  5:46 ` Gavin Shan
  2014-03-10  5:46 ` [PATCH 2/3] drivers/vfio/pci: Fix wrong MSI interrupt count Gavin Shan
  2014-03-10  5:46 ` [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost Gavin Shan
  2 siblings, 0 replies; 16+ messages in thread
From: Gavin Shan @ 2014-03-10  5:46 UTC (permalink / raw
  To: alex.williamson; +Cc: kvm, benh, aik, Gavin Shan

The macro offsetofend() introduces unnecessary temporary variable
"tmp". The patch avoids that and saves a bit memory in stack.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 include/linux/vfio.h |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 24579a0..43f6bf4 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -86,9 +86,8 @@ extern void vfio_unregister_iommu_driver(
  * from user space.  This allows us to easily determine if the provided
  * structure is sized to include various fields.
  */
-#define offsetofend(TYPE, MEMBER) ({				\
-	TYPE tmp;						\
-	offsetof(TYPE, MEMBER) + sizeof(tmp.MEMBER); })		\
+#define offsetofend(TYPE, MEMBER) \
+	(offsetof(TYPE, MEMBER)	+ sizeof(((TYPE *)0)->MEMBER))
 
 /*
  * External user API
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/3] drivers/vfio/pci: Fix wrong MSI interrupt count
  2014-03-10  5:46 [PATCH 0/3] VFIO Bug Fixes Gavin Shan
  2014-03-10  5:46 ` [PATCH 1/3] drivers/vfio: Rework offsetofend() Gavin Shan
@ 2014-03-10  5:46 ` Gavin Shan
  2014-03-10  5:46 ` [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost Gavin Shan
  2 siblings, 0 replies; 16+ messages in thread
From: Gavin Shan @ 2014-03-10  5:46 UTC (permalink / raw
  To: alex.williamson; +Cc: kvm, benh, aik, Gavin Shan

According PCI local bus specification, the register of Message
Control for MSI (offset: 2, length: 2) has bit#0 to enable or
disable MSI logic and it shouldn't be part contributing to the
calculation of MSI interrupt count. The patch fixes the issue.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 drivers/vfio/pci/vfio_pci.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 7ba0424..6b8cd07 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -196,8 +196,7 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
 		if (pos) {
 			pci_read_config_word(vdev->pdev,
 					     pos + PCI_MSI_FLAGS, &flags);
-
-			return 1 << (flags & PCI_MSI_FLAGS_QMASK);
+			return 1 << ((flags & PCI_MSI_FLAGS_QMASK) >> 1);
 		}
 	} else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX) {
 		u8 pos;
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost
  2014-03-10  5:46 [PATCH 0/3] VFIO Bug Fixes Gavin Shan
  2014-03-10  5:46 ` [PATCH 1/3] drivers/vfio: Rework offsetofend() Gavin Shan
  2014-03-10  5:46 ` [PATCH 2/3] drivers/vfio/pci: Fix wrong MSI interrupt count Gavin Shan
@ 2014-03-10  5:46 ` Gavin Shan
  2014-03-17 22:16   ` Alex Williamson
  2014-04-02  1:16   ` Alexey Kardashevskiy
  2 siblings, 2 replies; 16+ messages in thread
From: Gavin Shan @ 2014-03-10  5:46 UTC (permalink / raw
  To: alex.williamson; +Cc: kvm, benh, aik, Gavin Shan

The problem is specific to the case of BIST reset issued to IPR
adapter on the guest side. The IPR driver calls pci_save_state(),
issues BIST reset and then pci_restore_state(). The issued BIST
cleans out MSIx table and pci_restore_state() doesn't restore
the MSIx table as expected. Eventually, MSIx messages with all
zeros are sent and causes EEH error on Power platform.

The patch fixes it by writing MSIx message to MSIx table before
reenabling the individual interrupts in the following path:

qemu/hw/pci/msix.c::msix_table_mmio_write
                    msix_handle_mask_update
                    msix_fire_vector_notifier
qemu/hw/misc/vfio.c::vfio_msix_vector_use
                    vfio_msix_vector_do_use

IOCTL Command VFIO_DEVICE_SET_IRQS to VFIO-PCI
vfio/pci/vfio_pci.c::vfio_pci_ioctl
vfio/pci/vfio_pci_intrs.c::vfio_pci_set_irqs_ioctl
                    vfio_pci_set_msi_trigger
                    vfio_msi_set_block
                    vfio_msi_set_vector_signal

Reported-by: Wen Xiong <wenxiong@us.ibm.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 drivers/vfio/pci/vfio_pci_intrs.c |   11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 2103576..83e0638 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -17,6 +17,7 @@
 #include <linux/interrupt.h>
 #include <linux/eventfd.h>
 #include <linux/pci.h>
+#include <linux/msi.h>
 #include <linux/file.h>
 #include <linux/poll.h>
 #include <linux/vfio.h>
@@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 	struct pci_dev *pdev = vdev->pdev;
 	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
 	char *name = msix ? "vfio-msix" : "vfio-msi";
+	struct msi_msg msg;
 	struct eventfd_ctx *trigger;
 	int ret;
 
@@ -544,6 +546,15 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 		return PTR_ERR(trigger);
 	}
 
+	/* We possiblly lose the MSI/MSIx message in some cases, one
+	 * of which is pci_save_state(), BIST reset and pci_restore_state()
+	 * for IPR adapter. The BIST reset cleans out MSIx table and we
+	 * don't have chance to restore it. Here, we have the trick to
+	 * restore it before enabling individual interrupts. For MSI messages,
+	 * it's harmless to write them back.
+	 */
+	get_cached_msi_msg(irq, &msg);
+	write_msi_msg(irq, &msg);
 	ret = request_irq(irq, vfio_msihandler, 0,
 			  vdev->ctx[vector].name, trigger);
 	if (ret) {
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost
  2014-03-10  5:46 ` [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost Gavin Shan
@ 2014-03-17 22:16   ` Alex Williamson
       [not found]     ` <20140318013238.GA9843@shangw.(null)>
  2014-04-02  1:16   ` Alexey Kardashevskiy
  1 sibling, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2014-03-17 22:16 UTC (permalink / raw
  To: Gavin Shan; +Cc: kvm, benh, aik

On Mon, 2014-03-10 at 13:46 +0800, Gavin Shan wrote:
> The problem is specific to the case of BIST reset issued to IPR
> adapter on the guest side. The IPR driver calls pci_save_state(),
> issues BIST reset and then pci_restore_state(). The issued BIST
> cleans out MSIx table and pci_restore_state() doesn't restore
> the MSIx table as expected. Eventually, MSIx messages with all
> zeros are sent and causes EEH error on Power platform.
> 
> The patch fixes it by writing MSIx message to MSIx table before
> reenabling the individual interrupts in the following path:
> 
> qemu/hw/pci/msix.c::msix_table_mmio_write
>                     msix_handle_mask_update
>                     msix_fire_vector_notifier
> qemu/hw/misc/vfio.c::vfio_msix_vector_use
>                     vfio_msix_vector_do_use
> 
> IOCTL Command VFIO_DEVICE_SET_IRQS to VFIO-PCI
> vfio/pci/vfio_pci.c::vfio_pci_ioctl
> vfio/pci/vfio_pci_intrs.c::vfio_pci_set_irqs_ioctl
>                     vfio_pci_set_msi_trigger
>                     vfio_msi_set_block
>                     vfio_msi_set_vector_signal
> 
> Reported-by: Wen Xiong <wenxiong@us.ibm.com>
> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---
>  drivers/vfio/pci/vfio_pci_intrs.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 2103576..83e0638 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -17,6 +17,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/eventfd.h>
>  #include <linux/pci.h>
> +#include <linux/msi.h>
>  #include <linux/file.h>
>  #include <linux/poll.h>
>  #include <linux/vfio.h>
> @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  	struct pci_dev *pdev = vdev->pdev;
>  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
>  	char *name = msix ? "vfio-msix" : "vfio-msi";
> +	struct msi_msg msg;
>  	struct eventfd_ctx *trigger;
>  	int ret;
>  
> @@ -544,6 +546,15 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  		return PTR_ERR(trigger);
>  	}
>  
> +	/* We possiblly lose the MSI/MSIx message in some cases, one
> +	 * of which is pci_save_state(), BIST reset and pci_restore_state()
> +	 * for IPR adapter. The BIST reset cleans out MSIx table and we
> +	 * don't have chance to restore it. Here, we have the trick to
> +	 * restore it before enabling individual interrupts. For MSI messages,
> +	 * it's harmless to write them back.
> +	 */
> +	get_cached_msi_msg(irq, &msg);
> +	write_msi_msg(irq, &msg);
>  	ret = request_irq(irq, vfio_msihandler, 0,
>  			  vdev->ctx[vector].name, trigger);
>  	if (ret) {


Hi Gavin,

The commit log and patch comment are still pretty confusing.  The
problem is not necessarily specific to the IPR card, it's just the only
instance we know of it, otherwise we'd make a quirk that only gets
invoked for this device.  We're also not only making the change for the
call path listed, but for any callers of the function.  I also got a bit
nervous that while it seems like it should be safe to re-write the
cached MSI message before enabling it, our argument for doing so is that
the MSIx vector table lives in device memory.  This is not true for MSI
and modifying MSI modifies all the vectors, so why are we doing it?  The
result is the patch below.  Please let me know if you approve and I'll
apply it after further testing.  Thanks,

Alex

Author: Gavin Shan <shangw@linux.vnet.ibm.com>

    vfio/pci: Restore MSIx message prior to enabling
    
    The MSIx vector table lives in device memory, which may be cleared as
    part of a backdoor device reset.  This is the case on the IBM IPR HBA
    when the BIST is run on the device.  When assigned to a QEMU guest,
    the guest driver does a pci_save_state(), issues a BIST, then does a
    pci_restore_state().  The BIST clears the MSIx vector table, but due
    to the way interrupts are configured the pci_restore_state() does not
    restore the vector table as expected.  Eventually this results in an
    EEH error on Power platforms when the device attempts to signal an
    interrupt with the zero'd table entry.
    
    Fix the problem by restoring the host cached MSI message prior to
    enabling each vector.
    
    Reported-by: Wen Xiong <wenxiong@us.ibm.com>
    Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 2103576..3058b2e 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -17,6 +17,7 @@
 #include <linux/interrupt.h>
 #include <linux/eventfd.h>
 #include <linux/pci.h>
+#include <linux/msi.h>
 #include <linux/file.h>
 #include <linux/poll.h>
 #include <linux/vfio.h>
@@ -544,6 +545,19 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 		return PTR_ERR(trigger);
 	}
 
+	/*
+	 * The MSIx vector table resides in device memory which may be cleared
+	 * via backdoor resets.  We don't allow direct access to the vector
+	 * table so even if a userspace driver attempts to save/restore around
+	 * such a reset it would be unsuccessful.  To avoid this, restore the
+	 * cached value of the message prior to enabling.
+	 */
+	if (msix) {
+		struct msi_msg msg;
+		get_cached_msi_msg(irq, &msg);
+		write_msi_msg(irq, &msg);
+	}
+
 	ret = request_irq(irq, vfio_msihandler, 0,
 			  vdev->ctx[vector].name, trigger);
 	if (ret) {



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost
       [not found]     ` <20140318013238.GA9843@shangw.(null)>
@ 2014-03-18 22:44       ` Alex Williamson
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Williamson @ 2014-03-18 22:44 UTC (permalink / raw
  To: Gavin Shan; +Cc: kvm, benh, aik

On Tue, 2014-03-18 at 09:32 +0800, Gavin Shan wrote:
> On Mon, Mar 17, 2014 at 04:16:22PM -0600, Alex Williamson wrote:
> >On Mon, 2014-03-10 at 13:46 +0800, Gavin Shan wrote:
> >> The problem is specific to the case of BIST reset issued to IPR
> >> adapter on the guest side. The IPR driver calls pci_save_state(),
> >> issues BIST reset and then pci_restore_state(). The issued BIST
> >> cleans out MSIx table and pci_restore_state() doesn't restore
> >> the MSIx table as expected. Eventually, MSIx messages with all
> >> zeros are sent and causes EEH error on Power platform.
> >> 
> >> The patch fixes it by writing MSIx message to MSIx table before
> >> reenabling the individual interrupts in the following path:
> >> 
> >> qemu/hw/pci/msix.c::msix_table_mmio_write
> >>                     msix_handle_mask_update
> >>                     msix_fire_vector_notifier
> >> qemu/hw/misc/vfio.c::vfio_msix_vector_use
> >>                     vfio_msix_vector_do_use
> >> 
> >> IOCTL Command VFIO_DEVICE_SET_IRQS to VFIO-PCI
> >> vfio/pci/vfio_pci.c::vfio_pci_ioctl
> >> vfio/pci/vfio_pci_intrs.c::vfio_pci_set_irqs_ioctl
> >>                     vfio_pci_set_msi_trigger
> >>                     vfio_msi_set_block
> >>                     vfio_msi_set_vector_signal
> >> 
> >> Reported-by: Wen Xiong <wenxiong@us.ibm.com>
> >> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> >> ---
> >>  drivers/vfio/pci/vfio_pci_intrs.c |   11 +++++++++++
> >>  1 file changed, 11 insertions(+)
> >> 
> >> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> >> index 2103576..83e0638 100644
> >> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> >> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> >> @@ -17,6 +17,7 @@
> >>  #include <linux/interrupt.h>
> >>  #include <linux/eventfd.h>
> >>  #include <linux/pci.h>
> >> +#include <linux/msi.h>
> >>  #include <linux/file.h>
> >>  #include <linux/poll.h>
> >>  #include <linux/vfio.h>
> >> @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >>  	struct pci_dev *pdev = vdev->pdev;
> >>  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> >>  	char *name = msix ? "vfio-msix" : "vfio-msi";
> >> +	struct msi_msg msg;
> >>  	struct eventfd_ctx *trigger;
> >>  	int ret;
> >>  
> >> @@ -544,6 +546,15 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >>  		return PTR_ERR(trigger);
> >>  	}
> >>  
> >> +	/* We possiblly lose the MSI/MSIx message in some cases, one
> >> +	 * of which is pci_save_state(), BIST reset and pci_restore_state()
> >> +	 * for IPR adapter. The BIST reset cleans out MSIx table and we
> >> +	 * don't have chance to restore it. Here, we have the trick to
> >> +	 * restore it before enabling individual interrupts. For MSI messages,
> >> +	 * it's harmless to write them back.
> >> +	 */
> >> +	get_cached_msi_msg(irq, &msg);
> >> +	write_msi_msg(irq, &msg);
> >>  	ret = request_irq(irq, vfio_msihandler, 0,
> >>  			  vdev->ctx[vector].name, trigger);
> >>  	if (ret) {
> >
> >
> >Hi Gavin,
> >
> >The commit log and patch comment are still pretty confusing.  The
> >problem is not necessarily specific to the IPR card, it's just the only
> >instance we know of it, otherwise we'd make a quirk that only gets
> >invoked for this device.  We're also not only making the change for the
> >call path listed, but for any callers of the function.  I also got a bit
> >nervous that while it seems like it should be safe to re-write the
> >cached MSI message before enabling it, our argument for doing so is that
> >the MSIx vector table lives in device memory.  This is not true for MSI
> >and modifying MSI modifies all the vectors, so why are we doing it?  The
> >result is the patch below.  Please let me know if you approve and I'll
> >apply it after further testing.  Thanks,
> >
> 
> Thanks, Alex. Your changes look good. We needn't touch MSI stuff and
> it's not necessarily specific to IPR adapter :-)

We still have a problem, get_cached_msi_msg() and write_msi_msg() are
not exported for module use, so we get undefined symbols trying to use
them.  Please get them exported first or figure out a different
strategy.  Thanks,

Alex

> >Author: Gavin Shan <shangw@linux.vnet.ibm.com>
> >
> >    vfio/pci: Restore MSIx message prior to enabling
> >    
> >    The MSIx vector table lives in device memory, which may be cleared as
> >    part of a backdoor device reset.  This is the case on the IBM IPR HBA
> >    when the BIST is run on the device.  When assigned to a QEMU guest,
> >    the guest driver does a pci_save_state(), issues a BIST, then does a
> >    pci_restore_state().  The BIST clears the MSIx vector table, but due
> >    to the way interrupts are configured the pci_restore_state() does not
> >    restore the vector table as expected.  Eventually this results in an
> >    EEH error on Power platforms when the device attempts to signal an
> >    interrupt with the zero'd table entry.
> >    
> >    Fix the problem by restoring the host cached MSI message prior to
> >    enabling each vector.
> >    
> >    Reported-by: Wen Xiong <wenxiong@us.ibm.com>
> >    Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> >    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> >
> >diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> >index 2103576..3058b2e 100644
> >--- a/drivers/vfio/pci/vfio_pci_intrs.c
> >+++ b/drivers/vfio/pci/vfio_pci_intrs.c
> >@@ -17,6 +17,7 @@
> > #include <linux/interrupt.h>
> > #include <linux/eventfd.h>
> > #include <linux/pci.h>
> >+#include <linux/msi.h>
> > #include <linux/file.h>
> > #include <linux/poll.h>
> > #include <linux/vfio.h>
> >@@ -544,6 +545,19 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> > 		return PTR_ERR(trigger);
> > 	}
> >
> >+	/*
> >+	 * The MSIx vector table resides in device memory which may be cleared
> >+	 * via backdoor resets.  We don't allow direct access to the vector
> >+	 * table so even if a userspace driver attempts to save/restore around
> >+	 * such a reset it would be unsuccessful.  To avoid this, restore the
> >+	 * cached value of the message prior to enabling.
> >+	 */
> >+	if (msix) {
> >+		struct msi_msg msg;
> >+		get_cached_msi_msg(irq, &msg);
> >+		write_msi_msg(irq, &msg);
> >+	}
> >+
> > 	ret = request_irq(irq, vfio_msihandler, 0,
> > 			  vdev->ctx[vector].name, trigger);
> > 	if (ret) {
> >
> >
> 




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost
  2014-03-10  5:46 ` [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost Gavin Shan
  2014-03-17 22:16   ` Alex Williamson
@ 2014-04-02  1:16   ` Alexey Kardashevskiy
  2014-04-02  1:36     ` Alex Williamson
  1 sibling, 1 reply; 16+ messages in thread
From: Alexey Kardashevskiy @ 2014-04-02  1:16 UTC (permalink / raw
  To: Gavin Shan, alex.williamson; +Cc: kvm, benh

On 03/10/2014 04:46 PM, Gavin Shan wrote:
> The problem is specific to the case of BIST reset issued to IPR
> adapter on the guest side. The IPR driver calls pci_save_state(),
> issues BIST reset and then pci_restore_state(). The issued BIST
> cleans out MSIx table and pci_restore_state() doesn't restore
> the MSIx table as expected. Eventually, MSIx messages with all
> zeros are sent and causes EEH error on Power platform.
> 
> The patch fixes it by writing MSIx message to MSIx table before
> reenabling the individual interrupts in the following path:
> 
> qemu/hw/pci/msix.c::msix_table_mmio_write
>                     msix_handle_mask_update
>                     msix_fire_vector_notifier
> qemu/hw/misc/vfio.c::vfio_msix_vector_use
>                     vfio_msix_vector_do_use
> 
> IOCTL Command VFIO_DEVICE_SET_IRQS to VFIO-PCI
> vfio/pci/vfio_pci.c::vfio_pci_ioctl
> vfio/pci/vfio_pci_intrs.c::vfio_pci_set_irqs_ioctl
>                     vfio_pci_set_msi_trigger
>                     vfio_msi_set_block
>                     vfio_msi_set_vector_signal
> 
> Reported-by: Wen Xiong <wenxiong@us.ibm.com>
> Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> ---
>  drivers/vfio/pci/vfio_pci_intrs.c |   11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 2103576..83e0638 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -17,6 +17,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/eventfd.h>
>  #include <linux/pci.h>
> +#include <linux/msi.h>
>  #include <linux/file.h>
>  #include <linux/poll.h>
>  #include <linux/vfio.h>
> @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  	struct pci_dev *pdev = vdev->pdev;
>  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
>  	char *name = msix ? "vfio-msix" : "vfio-msi";
> +	struct msi_msg msg;
>  	struct eventfd_ctx *trigger;
>  	int ret;
>  
> @@ -544,6 +546,15 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  		return PTR_ERR(trigger);
>  	}
>  
> +	/* We possiblly lose the MSI/MSIx message in some cases, one
> +	 * of which is pci_save_state(), BIST reset and pci_restore_state()
> +	 * for IPR adapter. The BIST reset cleans out MSIx table and we
> +	 * don't have chance to restore it. Here, we have the trick to
> +	 * restore it before enabling individual interrupts. For MSI messages,
> +	 * it's harmless to write them back.
> +	 */
> +	get_cached_msi_msg(irq, &msg);
> +	write_msi_msg(irq, &msg);
>  	ret = request_irq(irq, vfio_msihandler, 0,
>  			  vdev->ctx[vector].name, trigger);
>  	if (ret) {
> 


This is missing to successfully compile VFIO as modules:


diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 2c10752..80ba2e9 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -283,6 +283,7 @@ void get_cached_msi_msg(unsigned int irq, struct
msi_msg *msg)

        __get_cached_msi_msg(entry, msg);
 }
+EXPORT_SYMBOL_GPL(get_cached_msi_msg);

 void __write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
 {
@@ -327,6 +328,7 @@ void write_msi_msg(unsigned int irq, struct msi_msg *msg)

        __write_msi_msg(entry, msg);
 }
+EXPORT_SYMBOL_GPL(write_msi_msg);

 static void free_msi_irqs(struct pci_dev *dev)
 {




-- 
Alexey

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost
  2014-04-02  1:16   ` Alexey Kardashevskiy
@ 2014-04-02  1:36     ` Alex Williamson
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Williamson @ 2014-04-02  1:36 UTC (permalink / raw
  To: Alexey Kardashevskiy; +Cc: Gavin Shan, kvm, benh

On Wed, 2014-04-02 at 12:16 +1100, Alexey Kardashevskiy wrote:
> On 03/10/2014 04:46 PM, Gavin Shan wrote:
> > The problem is specific to the case of BIST reset issued to IPR
> > adapter on the guest side. The IPR driver calls pci_save_state(),
> > issues BIST reset and then pci_restore_state(). The issued BIST
> > cleans out MSIx table and pci_restore_state() doesn't restore
> > the MSIx table as expected. Eventually, MSIx messages with all
> > zeros are sent and causes EEH error on Power platform.
> > 
> > The patch fixes it by writing MSIx message to MSIx table before
> > reenabling the individual interrupts in the following path:
> > 
> > qemu/hw/pci/msix.c::msix_table_mmio_write
> >                     msix_handle_mask_update
> >                     msix_fire_vector_notifier
> > qemu/hw/misc/vfio.c::vfio_msix_vector_use
> >                     vfio_msix_vector_do_use
> > 
> > IOCTL Command VFIO_DEVICE_SET_IRQS to VFIO-PCI
> > vfio/pci/vfio_pci.c::vfio_pci_ioctl
> > vfio/pci/vfio_pci_intrs.c::vfio_pci_set_irqs_ioctl
> >                     vfio_pci_set_msi_trigger
> >                     vfio_msi_set_block
> >                     vfio_msi_set_vector_signal
> > 
> > Reported-by: Wen Xiong <wenxiong@us.ibm.com>
> > Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> > ---
> >  drivers/vfio/pci/vfio_pci_intrs.c |   11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> > index 2103576..83e0638 100644
> > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > @@ -17,6 +17,7 @@
> >  #include <linux/interrupt.h>
> >  #include <linux/eventfd.h>
> >  #include <linux/pci.h>
> > +#include <linux/msi.h>
> >  #include <linux/file.h>
> >  #include <linux/poll.h>
> >  #include <linux/vfio.h>
> > @@ -517,6 +518,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >  	struct pci_dev *pdev = vdev->pdev;
> >  	int irq = msix ? vdev->msix[vector].vector : pdev->irq + vector;
> >  	char *name = msix ? "vfio-msix" : "vfio-msi";
> > +	struct msi_msg msg;
> >  	struct eventfd_ctx *trigger;
> >  	int ret;
> >  
> > @@ -544,6 +546,15 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >  		return PTR_ERR(trigger);
> >  	}
> >  
> > +	/* We possiblly lose the MSI/MSIx message in some cases, one
> > +	 * of which is pci_save_state(), BIST reset and pci_restore_state()
> > +	 * for IPR adapter. The BIST reset cleans out MSIx table and we
> > +	 * don't have chance to restore it. Here, we have the trick to
> > +	 * restore it before enabling individual interrupts. For MSI messages,
> > +	 * it's harmless to write them back.
> > +	 */
> > +	get_cached_msi_msg(irq, &msg);
> > +	write_msi_msg(irq, &msg);
> >  	ret = request_irq(irq, vfio_msihandler, 0,
> >  			  vdev->ctx[vector].name, trigger);
> >  	if (ret) {
> > 
> 
> 
> This is missing to successfully compile VFIO as modules:

Yep, I noted this in my reply on 3/18.  Thanks,

Alex

> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index 2c10752..80ba2e9 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -283,6 +283,7 @@ void get_cached_msi_msg(unsigned int irq, struct
> msi_msg *msg)
> 
>         __get_cached_msi_msg(entry, msg);
>  }
> +EXPORT_SYMBOL_GPL(get_cached_msi_msg);
> 
>  void __write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
>  {
> @@ -327,6 +328,7 @@ void write_msi_msg(unsigned int irq, struct msi_msg *msg)
> 
>         __write_msi_msg(entry, msg);
>  }
> +EXPORT_SYMBOL_GPL(write_msi_msg);
> 
>  static void free_msi_irqs(struct pci_dev *dev)
>  {
> 
> 
> 
> 




^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-04-02  1:36 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-10  5:46 [PATCH 0/3] VFIO Bug Fixes Gavin Shan
2014-03-10  5:46 ` [PATCH 1/3] drivers/vfio: Rework offsetofend() Gavin Shan
2014-03-10  5:46 ` [PATCH 2/3] drivers/vfio/pci: Fix wrong MSI interrupt count Gavin Shan
2014-03-10  5:46 ` [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost Gavin Shan
2014-03-17 22:16   ` Alex Williamson
     [not found]     ` <20140318013238.GA9843@shangw.(null)>
2014-03-18 22:44       ` Alex Williamson
2014-04-02  1:16   ` Alexey Kardashevskiy
2014-04-02  1:36     ` Alex Williamson
  -- strict thread matches above, loose matches on Subject: below --
2014-03-03  3:24 [PATCH 1/3] drivers/vfio: Rework offsetofend() Gavin Shan
2014-03-03  3:24 ` [PATCH 3/3] drivers/vfio/pci: Fix MSIx message lost Gavin Shan
2014-03-03  3:51   ` Benjamin Herrenschmidt
2014-03-03  4:43     ` Alexey Kardashevskiy
2014-03-03  5:00       ` Benjamin Herrenschmidt
2014-03-03  4:49     ` Alex Williamson
2014-03-03  5:05       ` Benjamin Herrenschmidt
     [not found]       ` <20140303061036.GA4447@shangw.(null)>
2014-03-03 19:36         ` Alex Williamson
     [not found]           ` <20140304023018.GA21672@shangw.(null)>
2014-03-04 20:45             ` Alex Williamson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.