All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] VT-d: fix PCI device detach from virtual machine
@ 2009-02-26  9:31 Han, Weidong
  2010-06-14 23:19 ` David Woodhouse
  0 siblings, 1 reply; 7+ messages in thread
From: Han, Weidong @ 2009-02-26  9:31 UTC (permalink / raw
  To: 'dwmw2@infradead.org', 'Avi Kivity'
  Cc: 'iommu@lists.linux-foundation.org', 'kvm'

[-- Attachment #1: Type: text/plain, Size: 2461 bytes --]

When assign a device behind conventional PCI bridge or PCIe to
PCI/PCI-x bridge to a domain, it must assign its bridge and may
also need to assign secondary interface to the same domain. 

Dependent assignment is already there, but dependent
deassignment is missed when detach device from virtual machine.
This results in conventional PCI device assignment failure after
it has been assigned once. This patch addes dependent
deassignment, and fixes the issue.


Signed-off-by: Weidong Han <weidong.han@intel.com>
---
 drivers/pci/intel-iommu.c |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index f3f6865..d21a1a5 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -2772,6 +2772,33 @@ static int vm_domain_add_dev_info(struct dmar_domain *domain,
 	return 0;
 }
 
+static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
+					   struct pci_dev *pdev)
+{
+	struct pci_dev *tmp, *parent;
+
+	if (!iommu || !pdev)
+		return;
+
+	/* dependent device detach */
+	tmp = pci_find_upstream_pcie_bridge(pdev);
+	/* Secondary interface's bus number and devfn 0 */
+	if (tmp) {
+		parent = pdev->bus->self;
+		while (parent != tmp) {
+			iommu_detach_dev(iommu, parent->bus->number,
+				parent->devfn);
+			parent = parent->bus->self;
+		}
+		if (tmp->is_pcie) /* this is a PCIE-to-PCI bridge */
+			iommu_detach_dev(iommu,
+				tmp->subordinate->number, 0);
+		else /* this is a legacy PCI bridge */
+			iommu_detach_dev(iommu,
+				tmp->bus->number, tmp->devfn);
+	}
+}
+
 static void vm_domain_remove_one_dev_info(struct dmar_domain *domain,
 					  struct pci_dev *pdev)
 {
@@ -2797,6 +2824,7 @@ static void vm_domain_remove_one_dev_info(struct dmar_domain *domain,
 			spin_unlock_irqrestore(&device_domain_lock, flags);
 
 			iommu_detach_dev(iommu, info->bus, info->devfn);
+			iommu_detach_dependent_devices(iommu, pdev);
 			free_devinfo_mem(info);
 
 			spin_lock_irqsave(&device_domain_lock, flags);
@@ -2846,6 +2874,7 @@ static void vm_domain_remove_all_dev_info(struct dmar_domain *domain)
 
 		iommu = device_to_iommu(info->bus, info->devfn);
 		iommu_detach_dev(iommu, info->bus, info->devfn);
+		iommu_detach_dependent_devices(iommu, info->dev);
 
 		/* clear this iommu in iommu_bmp, update iommu count
 		 * and coherency
-- 
1.6.0.4

[-- Attachment #2: 0001-VT-d-fix-PCI-device-detach-from-virtual-machine.patch --]
[-- Type: application/octet-stream, Size: 2609 bytes --]

From 13c25cbfb13c14a41c228c88c2c5bd45e6ed347b Mon Sep 17 00:00:00 2001
From: Weidong Han <weidong.han@intel.com>
Date: Wed, 25 Feb 2009 16:58:44 +0800
Subject: [PATCH] VT-d: fix PCI device detach from virtual machine 

When assign a device behind conventional PCI bridge or PCIe to
PCI/PCI-x bridge to a domain, it must assign its bridge and may
also need to assign secondary interface to the same domain. 

Dependent assignment is already there, but dependent
deassignment is missed when detach device from virtual machine.
This results in conventional PCI device assignment failure after
it has been assigned once. This patch addes dependent
deassignment, and fixes the issue.


Signed-off-by: Weidong Han <weidong.han@intel.com>
---
 drivers/pci/intel-iommu.c |   29 +++++++++++++++++++++++++++++
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c
index f3f6865..d21a1a5 100644
--- a/drivers/pci/intel-iommu.c
+++ b/drivers/pci/intel-iommu.c
@@ -2772,6 +2772,33 @@ static int vm_domain_add_dev_info(struct dmar_domain *domain,
 	return 0;
 }
 
+static void iommu_detach_dependent_devices(struct intel_iommu *iommu,
+					   struct pci_dev *pdev)
+{
+	struct pci_dev *tmp, *parent;
+
+	if (!iommu || !pdev)
+		return;
+
+	/* dependent device detach */
+	tmp = pci_find_upstream_pcie_bridge(pdev);
+	/* Secondary interface's bus number and devfn 0 */
+	if (tmp) {
+		parent = pdev->bus->self;
+		while (parent != tmp) {
+			iommu_detach_dev(iommu, parent->bus->number,
+				parent->devfn);
+			parent = parent->bus->self;
+		}
+		if (tmp->is_pcie) /* this is a PCIE-to-PCI bridge */
+			iommu_detach_dev(iommu,
+				tmp->subordinate->number, 0);
+		else /* this is a legacy PCI bridge */
+			iommu_detach_dev(iommu,
+				tmp->bus->number, tmp->devfn);
+	}
+}
+
 static void vm_domain_remove_one_dev_info(struct dmar_domain *domain,
 					  struct pci_dev *pdev)
 {
@@ -2797,6 +2824,7 @@ static void vm_domain_remove_one_dev_info(struct dmar_domain *domain,
 			spin_unlock_irqrestore(&device_domain_lock, flags);
 
 			iommu_detach_dev(iommu, info->bus, info->devfn);
+			iommu_detach_dependent_devices(iommu, pdev);
 			free_devinfo_mem(info);
 
 			spin_lock_irqsave(&device_domain_lock, flags);
@@ -2846,6 +2874,7 @@ static void vm_domain_remove_all_dev_info(struct dmar_domain *domain)
 
 		iommu = device_to_iommu(info->bus, info->devfn);
 		iommu_detach_dev(iommu, info->bus, info->devfn);
+		iommu_detach_dependent_devices(iommu, info->dev);
 
 		/* clear this iommu in iommu_bmp, update iommu count
 		 * and coherency
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] VT-d: fix PCI device detach from virtual machine
  2009-02-26  9:31 [PATCH] VT-d: fix PCI device detach from virtual machine Han, Weidong
@ 2010-06-14 23:19 ` David Woodhouse
  2010-06-15 14:10   ` Joerg Roedel
  2010-06-17  3:35   ` Weidong Han
  0 siblings, 2 replies; 7+ messages in thread
From: David Woodhouse @ 2010-06-14 23:19 UTC (permalink / raw
  To: Han, Weidong
  Cc: 'Avi Kivity', 'iommu@lists.linux-foundation.org',
	'kvm'

On Thu, 2009-02-26 at 17:31 +0800, Han, Weidong wrote:
> When assign a device behind conventional PCI bridge or PCIe to
> PCI/PCI-x bridge to a domain, it must assign its bridge and may
> also need to assign secondary interface to the same domain. 
> 
> Dependent assignment is already there, but dependent
> deassignment is missed when detach device from virtual machine.
> This results in conventional PCI device assignment failure after
> it has been assigned once. This patch addes dependent
> deassignment, and fixes the issue. 

Um, this code makes my head hurt.

Why are we doing this in the first place? Because the IOMMU works on the
source-id in PCIe transactions, the pci_find_upstream_pcie_bridge()
function effectively tells us which PCI device our own device will be
masquerading as, for the purposes of DMA.

So why do we bother setting up a context in the IOMMU for the device
itself, when no DMA will ever appear to come from this device? And
likewise why do we bother setting up a context for intermediate PCI
bridges?

Why not just jump straight to the 'DMA proxy' device, and use that
_only_?

We'll have to cope with multiple devices behind the same 'proxy', but it
looks like our handling of that is totally screwed already...  what
happens right now if you have two PCI devices behind the same PCIe-PCI
bridge, and try to attach both of them to different domains... or both
to the _same_ domain, in fact, and then detach just one of them. I think
the answer to the latter question is that your newly-added
iommu_detach_dependent_devices() routine will tear down the mapping on
the 'proxy' device and faults will start happening for the device which
is supposed to still be mapped?

Confused... and tempted to rip it all out and start over.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] VT-d: fix PCI device detach from virtual machine
  2010-06-14 23:19 ` David Woodhouse
@ 2010-06-15 14:10   ` Joerg Roedel
  2010-06-15 14:52     ` David Woodhouse
  2010-06-17  3:35   ` Weidong Han
  1 sibling, 1 reply; 7+ messages in thread
From: Joerg Roedel @ 2010-06-15 14:10 UTC (permalink / raw
  To: David Woodhouse
  Cc: Han, Weidong, 'iommu@lists.linux-foundation.org',
	'Avi Kivity', 'kvm'

On Mon, Jun 14, 2010 at 07:19:17PM -0400, David Woodhouse wrote:
> Why not just jump straight to the 'DMA proxy' device, and use that
> _only_?

Not sure about Intel chipsets, but on AMD chipset a legacy device can be
seen by the IOMMU with both device-ids, its own and the bridge device.
So the domain needs to be configured for both device-ids in the iommu
driver.

	Joerg



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] VT-d: fix PCI device detach from virtual machine
  2010-06-15 14:10   ` Joerg Roedel
@ 2010-06-15 14:52     ` David Woodhouse
  0 siblings, 0 replies; 7+ messages in thread
From: David Woodhouse @ 2010-06-15 14:52 UTC (permalink / raw
  To: Joerg Roedel
  Cc: Han, Weidong, 'iommu@lists.linux-foundation.org',
	'Avi Kivity', 'kvm'

On Tue, 2010-06-15 at 16:10 +0200, Joerg Roedel wrote:
> On Mon, Jun 14, 2010 at 07:19:17PM -0400, David Woodhouse wrote:
> > Why not just jump straight to the 'DMA proxy' device, and use that
> > _only_?
> 
> Not sure about Intel chipsets, but on AMD chipset a legacy device can be
> seen by the IOMMU with both device-ids, its own and the bridge device.
> So the domain needs to be configured for both device-ids in the iommu
> driver.

'can be seen with both' sounds odd to me... you mean that depending on
the phase of the moon, it may show up with one or the other on the
_same_ hardware? Or is your IOMMU sufficiently different that you
actually mean both ids are present at the same time for mapping
purposes?

Or do you just mean that you can't always tell which it'll be, so if
there's a _possibility_ that an upstream bridge will act as a proxy, you
map both of them so that you don't have to worry about false positives
in your "is there a proxy?" algorithm?

That approach may make a lot of sense -- but still I don't see why we'd
set up the mapping for _every_ PCI bridge from the device up to the
suspected proxy... unless we're _really_ unsure about where the
transactions will appear to come from, perhaps?

Have you encountered these Ricoh multi-function devices and observed
them doing DMA transactions from function zero regardless of which
function is actually responsible for it:
03:00.0 SD Host controller: Ricoh Co Ltd Device e822 (rev 03)
03:00.4 FireWire (IEEE 1394): Ricoh Co Ltd Device e832 (rev 03)

Do we need to share a 'quirks' infrastructure for handling such devices?

-- 
dwmw2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] VT-d: fix PCI device detach from virtual machine
  2010-06-14 23:19 ` David Woodhouse
  2010-06-15 14:10   ` Joerg Roedel
@ 2010-06-17  3:35   ` Weidong Han
  2010-06-17  8:49     ` David Woodhouse
  1 sibling, 1 reply; 7+ messages in thread
From: Weidong Han @ 2010-06-17  3:35 UTC (permalink / raw
  To: David Woodhouse
  Cc: 'Avi Kivity', 'iommu@lists.linux-foundation.org',
	'kvm', Kay, Allen M

David Woodhouse wrote:
> On Thu, 2009-02-26 at 17:31 +0800, Han, Weidong wrote:
>   
>> When assign a device behind conventional PCI bridge or PCIe to
>> PCI/PCI-x bridge to a domain, it must assign its bridge and may
>> also need to assign secondary interface to the same domain. 
>>
>> Dependent assignment is already there, but dependent
>> deassignment is missed when detach device from virtual machine.
>> This results in conventional PCI device assignment failure after
>> it has been assigned once. This patch addes dependent
>> deassignment, and fixes the issue. 
>>     
>
> Um, this code makes my head hurt.
>
> Why are we doing this in the first place? Because the IOMMU works on the
> source-id in PCIe transactions, the pci_find_upstream_pcie_bridge()
> function effectively tells us which PCI device our own device will be
> masquerading as, for the purposes of DMA.
>
> So why do we bother setting up a context in the IOMMU for the device
> itself, when no DMA will ever appear to come from this device? And
>   
if the device is behind PCI Express-to-PCI/PCI-X bridge, the source-id 
may be the device bdf or the
source-id provided by the bridge. so it needs to map the device itself.
> likewise why do we bother setting up a context for intermediate PCI
> bridges?
>   
I'm not sure if the intermediate PCI bridges are necessary. need to 
check PCI spec.
> Why not just jump straight to the 'DMA proxy' device, and use that
> _only_?
>   
What's the 'DMA proxy' device? is it the upstream pcie-to-pci bridge?
> We'll have to cope with multiple devices behind the same 'proxy', but it
> looks like our handling of that is totally screwed already...  what
> happens right now if you have two PCI devices behind the same PCIe-PCI
> bridge, and try to attach both of them to different domains... or both
> to the _same_ domain, in fact, and then detach just one of them. I think
> the answer to the latter question is that your newly-added
> iommu_detach_dependent_devices() routine will tear down the mapping on
> the 'proxy' device and faults will start happening for the device which
> is supposed to still be mapped?
>   
all the device behind a pcie-to-pci bridge must be co-assigned to a 
single domain. So it also require users to detach them together.

Regards,
Weidong
> Confused... and tempted to rip it all out and start over.
>
>   


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] VT-d: fix PCI device detach from virtual machine
  2010-06-17  3:35   ` Weidong Han
@ 2010-06-17  8:49     ` David Woodhouse
  2010-06-17  9:15       ` Weidong Han
  0 siblings, 1 reply; 7+ messages in thread
From: David Woodhouse @ 2010-06-17  8:49 UTC (permalink / raw
  To: Weidong Han
  Cc: 'Avi Kivity', 'iommu@lists.linux-foundation.org',
	'kvm', Kay, Allen M

On Thu, 2010-06-17 at 11:35 +0800, Weidong Han wrote:
> David Woodhouse wrote:
> > So why do we bother setting up a context in the IOMMU for the device
> > itself, when no DMA will ever appear to come from this device? And
> >   
> if the device is behind PCI Express-to-PCI/PCI-X bridge, the source-id 
> may be the device bdf or the source-id provided by the bridge. so it
> needs to map the device itself.

Ah, that makes some sense, and matches what Jörg said about the AMD
IOMMU spec. Thanks.

> > likewise why do we bother setting up a context for intermediate PCI
> > bridges?
> >   
> I'm not sure if the intermediate PCI bridges are necessary. need to 
> check PCI spec.

FWIW, the AMD IOMMU doesn't do this; it only sets up the mapping for the
original device and for its 'proxy'.

> > Why not just jump straight to the 'DMA proxy' device, and use that
> > _only_?
> >   
> What's the 'DMA proxy' device? is it the upstream pcie-to-pci bridge?

Yes. Or, in the case of a certain buggy Ricoh multi-function device, it
is function zero -- all other functions do their DMA as if it came from
function zero.

This is why I'm looking at whether we need the whole tree-walking thing,
or whether we can just have a single 'proxy' device (or 'alias' as it's
called in the AMD code).

> all the device behind a pcie-to-pci bridge must be co-assigned to a 
> single domain. So it also require users to detach them together.

Do we even _have_ an API for KVM to assign multiple devices at the same
time? Or an API for KVM to _determine_ which devices are behind the same
'proxy'?

We could even have a new device hotplugged after the assignment has been
done.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] VT-d: fix PCI device detach from virtual machine
  2010-06-17  8:49     ` David Woodhouse
@ 2010-06-17  9:15       ` Weidong Han
  0 siblings, 0 replies; 7+ messages in thread
From: Weidong Han @ 2010-06-17  9:15 UTC (permalink / raw
  To: David Woodhouse
  Cc: 'Avi Kivity', 'iommu@lists.linux-foundation.org',
	'kvm', Kay, Allen M

David Woodhouse wrote:
> On Thu, 2010-06-17 at 11:35 +0800, Weidong Han wrote:
>   
>> David Woodhouse wrote:
>>     
>>> So why do we bother setting up a context in the IOMMU for the device
>>> itself, when no DMA will ever appear to come from this device? And
>>>   
>>>       
>> if the device is behind PCI Express-to-PCI/PCI-X bridge, the source-id 
>> may be the device bdf or the source-id provided by the bridge. so it
>> needs to map the device itself.
>>     
>
> Ah, that makes some sense, and matches what Jörg said about the AMD
> IOMMU spec. Thanks.
>
>   
>>> likewise why do we bother setting up a context for intermediate PCI
>>> bridges?
>>>   
>>>       
>> I'm not sure if the intermediate PCI bridges are necessary. need to 
>> check PCI spec.
>>     
>
> FWIW, the AMD IOMMU doesn't do this; it only sets up the mapping for the
> original device and for its 'proxy'.
>
>   
>>> Why not just jump straight to the 'DMA proxy' device, and use that
>>> _only_?
>>>   
>>>       
>> What's the 'DMA proxy' device? is it the upstream pcie-to-pci bridge?
>>     
>
> Yes. Or, in the case of a certain buggy Ricoh multi-function device, it
> is function zero -- all other functions do their DMA as if it came from
> function zero.
>
> This is why I'm looking at whether we need the whole tree-walking thing,
> or whether we can just have a single 'proxy' device (or 'alias' as it's
> called in the AMD code).
>
>   
>> all the device behind a pcie-to-pci bridge must be co-assigned to a 
>> single domain. So it also require users to detach them together.
>>     
>
> Do we even _have_ an API for KVM to assign multiple devices at the same
> time? Or an API for KVM to _determine_ which devices are behind the same
> 'proxy'?
>   
No. I think it's better to add some checks in management tool (e.g. 
libvirt) to determine if the devices can be assigned or not before 
device assignment, such as all devices behind same 'proxy' are 
co-assigned (checking command line), and devices can be reset (FLR, 
secondary bus reset, D0->D3 etc.) I think libvirt already has these 
checks for device assignment.


> We could even have a new device hotplugged after the assignment has been
> done.
>   

yes, but for these legacy devices, they only can be assigned and 
de-assigned at the same time.

Regards,
Weidong


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-06-17  9:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-02-26  9:31 [PATCH] VT-d: fix PCI device detach from virtual machine Han, Weidong
2010-06-14 23:19 ` David Woodhouse
2010-06-15 14:10   ` Joerg Roedel
2010-06-15 14:52     ` David Woodhouse
2010-06-17  3:35   ` Weidong Han
2010-06-17  8:49     ` David Woodhouse
2010-06-17  9:15       ` Weidong Han

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.