Xen-Devel Archive mirror
 help / color / mirror / Atom feed
* [RFC KERNEL PATCH v7 0/2] Support device passthrough when dom0 is PVH on Xen
@ 2024-05-15  6:50 Jiqian Chen
  2024-05-15  6:50 ` [RFC KERNEL PATCH v7 1/2] xen/pvh: Setup gsi for passthrough device Jiqian Chen
  2024-05-15  6:50 ` [RFC KERNEL PATCH v7 2/2] xen/privcmd: Add new syscall to get gsi from dev Jiqian Chen
  0 siblings, 2 replies; 6+ messages in thread
From: Jiqian Chen @ 2024-05-15  6:50 UTC (permalink / raw
  To: Juergen Gross, Stefano Stabellini, Bjorn Helgaas,
	Rafael J . Wysocki, Roger Pau Monné
  Cc: xen-devel, linux-pci, linux-kernel, linux-acpi, Huang Rui,
	Jiqian Chen

Hi All,
This is v7 series to support passthrough on Xen when dom0 is PVH.
v6->v7 change:
* the first patch of v6 was already merged into branch linux_next.
* patch#1: is the patch#2 of v6. move the implementation of function xen_acpi_get_gsi_info to
           file drivers/xen/acpi.c, that modification is more convenient for the subsequent
           patch to obtain gsi.
* patch#2: is the patch#3 of v6. add a new parameter "gsi" to struct pcistub_device and set
           gsi when pcistub initialize device. Then when userspace wants to get gsi by passing
           sbdf, we can return that gsi.


Best regards,
Jiqian Chen




v5->v6 change:
* patch#3: change to add a new syscall to translate irq to gsi, instead adding a new gsi sysfs.


v4->v5 changes:
* patch#1: Add Reviewed-by Stefano
* patch#2: Add Reviewed-by Stefano
* patch#3: No changes


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; use a new function
           pcistub_reset_device_state to wrap __pci_reset_function_locked and xen_reset_device_state,
           and call pcistub_reset_device_state in pci_stub.c
* patch#2: remove map_pirq from xen_pvh_passthrough_gsi


v2->v3 changes:
* patch#1: add condition to limit do xen_reset_device_state for no-pv domain in pcistub_init_device.
* patch#2: Abandoning previous implementations that call unmask_irq. To setup gsi and map pirq for
           passthrough device in pcistub_init_device.
* patch#3: Abandoning previous implementations that adds new syscall to get gsi from irq. To add a new
           sysfs for gsi, then userspace can get gsi number from sysfs.


Below is the description of v2 cover letter:
This series of patches are the v2 of the implementation of passthrough when dom0 is PVH on Xen.
We sent the v1 to upstream before, but the v1 had so many problems and we got lots of suggestions.
I will introduce all issues that these patches try to fix and the differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run \u201csudo xl pci-assignable-add <sbdf>\u201d to assign a device, pci_stub will
call \u201cpcistub_init_device() -> pci_restore_state() -> pci_restore_config_space() ->
pci_restore_config_space_range() -> pci_restore_config_dword() -> pci_write_config_dword(), the pci
config write will trigger an io interrupt to bar_write() in the xen, but the bar->enabled was set before,
the write is not allowed now, and then when bar->Qemu config the passthrough device in xen_pt_realize(),
it gets invalid bar values.

Reason: the reason is that we don't tell vPCI that the device has been reset, so the current cached state
in pdev->vpci is all out of date and is different from the real device state.

Solution: to solve this problem, the first patch of kernel(xen/pci: Add xen_reset_device_state
function) and the fist patch of xen(xen/vpci: Clear all vpci status of device) add a new hypercall to
reset the state stored in vPCI when the state of real device has changed.
Thank Roger for the suggestion of this v2, and it is different from v1
(https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.huang@amd.com/), v1 simply allow domU to
write pci bar, it does not comply with the design principles of vPCI.

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using gsi. See
xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
will call into Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: In hvm_physdev_op(), the variable "currd" is PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, it
will fail at has_pirq check.

Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0 (at present dom0 is PVH).
The second patch of xen(x86/pvh: Open PHYSDEVOP_map_pirq for PVH dom0) allow PVH dom0 do
PHYSDEVOP_map_pirq. This v2 patch is better than v1, v1 simply remove the has_pirq check
(xen https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.huang@amd.com/).

3. the gsi of a passthrough device doesn't be unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0

Problem:
3.1 callback function pci_add_dm_done() will be called when qemu config a passthrough device for domU.
This function will call xc_domain_irq_permission()-> pirq_access_permitted() to check if the gsi has
corresponding mappings in dom0. But it didn\u2019t, so failed. See
XEN_DOMCTL_irq_permission->pirq_access_permitted, "current" is PVH dom0 and it return irq is 0.
3.2 it's possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH dom0, because the
devices of PVH are using MSI(-X) interrupts. However, the IO-APIC pin must be configured for it to be
able to be mapped into a domU.

Reason: After searching codes, I find "map_pirq" and "register_gsi" will be done in function
vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi(aka ioapic's pin) is unmasked in PVH dom0.
So the two problems can be concluded to that the gsi of a passthrough device doesn't be unmasked.

Solution: to solve these problems, the second patch of kernel(xen/pvh: Unmask irq for passthrough device
in PVH dom0) call the unmask_irq() when we assign a device to be passthrough. So that passthrough devices
can have the mapping of gsi on PVH dom0 and gsi can be registered. This v2 patch is different from the
v1( kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.huang@amd.com/,
kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.huang@amd.com/ and
xen https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.huang@amd.com/),
v1 performed "map_pirq" and "register_gsi" on all pci devices on PVH dom0, which is unnecessary and may
cause multiple registration.

4. failed to map pirq for gsi
Problem: qemu will call xc_physdev_map_pirq() to map a passthrough device\u2019s gsi to pirq in function
xen_pt_realize(). But failed.

Reason: According to the implement of xc_physdev_map_pirq(), it needs gsi instead of irq, but qemu pass
irq to it and treat irq as gsi, it is got from file /sys/bus/pci/devices/xxxx:xx:xx.x/irq in function
xen_host_pci_device_get(). But actually the gsi number is not equal with irq. On PVH dom0, when it
allocates irq for a gsi in function acpi_register_gsi_ioapic(), allocation is dynamic, and follow the
principle of applying first, distributing first. And if you debug the kernel codes(see
function __irq_alloc_descs), you will find the irq number is allocated from small to large by order, but
the applying gsi number is not, gsi 38 may come before gsi 28, that causes gsi 38 get a smaller irq number
than gsi 28, and then gsi != irq.

Solution: we can record the relation between gsi and irq, then when userspace(qemu) want to use gsi, we
can do a translation. The third patch of kernel(xen/privcmd: Add new syscall to get gsi from irq) records
all the relations in acpi_register_gsi_xen_pvh() when dom0 initialize pci devices, and provide a syscall
for userspace to get the gsi from irq. The third patch of xen(tools: Add new function to get gsi from irq)
add a new function xc_physdev_gsi_from_irq() to call the new syscall added on kernel side.
And then userspace can use that function to get gsi. Then xc_physdev_map_pirq() will success. This v2 patch
is the same as v1( kernel https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.huang@amd.com/ and
xen https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.huang@amd.com/)

About the v2 patch of qemu, just change an included head file, other are similar to the v1 (
qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.huang@amd.com/), just call
xc_physdev_gsi_from_irq() to get gsi from irq.

Jiqian Chen (2):
  xen/pvh: Setup gsi for passthrough device
  xen/privcmd: Add new syscall to get gsi from dev

 arch/x86/xen/enlighten_pvh.c       | 21 +++++++++++
 drivers/acpi/pci_irq.c             |  2 +-
 drivers/xen/acpi.c                 | 50 +++++++++++++++++++++++++
 drivers/xen/privcmd.c              | 28 ++++++++++++++
 drivers/xen/xen-pciback/pci_stub.c | 59 ++++++++++++++++++++++++++++--
 include/linux/acpi.h               |  1 +
 include/uapi/xen/privcmd.h         |  7 ++++
 include/xen/acpi.h                 | 12 ++++++
 8 files changed, 176 insertions(+), 4 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC KERNEL PATCH v7 1/2] xen/pvh: Setup gsi for passthrough device
  2024-05-15  6:50 [RFC KERNEL PATCH v7 0/2] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
@ 2024-05-15  6:50 ` Jiqian Chen
  2024-05-15  6:50 ` [RFC KERNEL PATCH v7 2/2] xen/privcmd: Add new syscall to get gsi from dev Jiqian Chen
  1 sibling, 0 replies; 6+ messages in thread
From: Jiqian Chen @ 2024-05-15  6:50 UTC (permalink / raw
  To: Juergen Gross, Stefano Stabellini, Bjorn Helgaas,
	Rafael J . Wysocki, Roger Pau Monné
  Cc: xen-devel, linux-pci, linux-kernel, linux-acpi, Huang Rui,
	Jiqian Chen, Huang Rui

In PVH dom0, the gsis don't get registered, but the gsi of
a passthrough device must be configured for it to be able to be
mapped into a domU.

When assign a device to passthrough, proactively setup the gsi
of the device during that process.

Co-developed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
 arch/x86/xen/enlighten_pvh.c       | 21 +++++++++++++
 drivers/acpi/pci_irq.c             |  2 +-
 drivers/xen/acpi.c                 | 50 ++++++++++++++++++++++++++++++
 drivers/xen/xen-pciback/pci_stub.c | 21 +++++++++++++
 include/linux/acpi.h               |  1 +
 include/xen/acpi.h                 | 10 ++++++
 6 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index 27a2a02ef8fb..711cdcbc6916 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -4,6 +4,7 @@
 #include <linux/mm.h>
 
 #include <xen/hvc-console.h>
+#include <xen/acpi.h>
 
 #include <asm/bootparam.h>
 #include <asm/io_apic.h>
@@ -27,6 +28,26 @@
 bool __ro_after_init xen_pvh;
 EXPORT_SYMBOL_GPL(xen_pvh);
 
+int xen_pvh_setup_gsi(int gsi, int trigger, int polarity)
+{
+	int ret;
+	struct physdev_setup_gsi setup_gsi;
+
+	setup_gsi.gsi = gsi;
+	setup_gsi.triggering = (trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+	setup_gsi.polarity = (polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+	ret = HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+	if (ret == -EEXIST) {
+		xen_raw_printk("Already setup the GSI :%d\n", gsi);
+		ret = 0;
+	} else if (ret)
+		xen_raw_printk("Fail to setup GSI (%d)!\n", gsi);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_setup_gsi);
+
 void __init xen_pvh_init(struct boot_params *boot_params)
 {
 	u32 msr;
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30ceca2203..630fe0a34bc6 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev,
 }
 #endif /* CONFIG_X86_IO_APIC */
 
-static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
 {
 	struct acpi_prt_entry *entry = NULL;
 	struct pci_dev *bridge;
diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c
index 6893c79fd2a1..9e2096524fbc 100644
--- a/drivers/xen/acpi.c
+++ b/drivers/xen/acpi.c
@@ -30,6 +30,7 @@
  * IN THE SOFTWARE.
  */
 
+#include <linux/pci.h>
 #include <xen/acpi.h>
 #include <xen/interface/platform.h>
 #include <asm/xen/hypercall.h>
@@ -75,3 +76,52 @@ int xen_acpi_notify_hypervisor_extended_sleep(u8 sleep_state,
 	return xen_acpi_notify_hypervisor_state(sleep_state, val_a,
 						val_b, true);
 }
+
+struct acpi_prt_entry {
+	struct acpi_pci_id      id;
+	u8                      pin;
+	acpi_handle             link;
+	u32                     index;
+};
+
+int xen_acpi_get_gsi_info(struct pci_dev *dev,
+						  int *gsi_out,
+						  int *trigger_out,
+						  int *polarity_out)
+{
+	int gsi;
+	u8 pin;
+	struct acpi_prt_entry *entry;
+	int trigger = ACPI_LEVEL_SENSITIVE;
+	int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
+				      ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
+
+	if (!dev || !gsi_out || !trigger_out || !polarity_out)
+		return -EINVAL;
+
+	pin = dev->pin;
+	if (!pin)
+		return -EINVAL;
+
+	entry = acpi_pci_irq_lookup(dev, pin);
+	if (entry) {
+		if (entry->link)
+			gsi = acpi_pci_link_allocate_irq(entry->link,
+							 entry->index,
+							 &trigger, &polarity,
+							 NULL);
+		else
+			gsi = entry->index;
+	} else
+		gsi = -1;
+
+	if (gsi < 0)
+		return -EINVAL;
+
+	*gsi_out = gsi;
+	*trigger_out = trigger;
+	*polarity_out = polarity;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(xen_acpi_get_gsi_info);
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 46c40ec8a18e..2b90d832d0a7 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -21,6 +21,9 @@
 #include <xen/events.h>
 #include <xen/pci.h>
 #include <xen/xen.h>
+#ifdef CONFIG_ACPI
+#include <xen/acpi.h>
+#endif
 #include <asm/xen/hypervisor.h>
 #include <xen/interface/physdev.h>
 #include "pciback.h"
@@ -367,6 +370,9 @@ static int pcistub_match(struct pci_dev *dev)
 static int pcistub_init_device(struct pci_dev *dev)
 {
 	struct xen_pcibk_dev_data *dev_data;
+#ifdef CONFIG_ACPI
+	int gsi, trigger, polarity;
+#endif
 	int err = 0;
 
 	dev_dbg(&dev->dev, "initializing...\n");
@@ -435,6 +441,21 @@ static int pcistub_init_device(struct pci_dev *dev)
 			goto config_release;
 		pci_restore_state(dev);
 	}
+
+#ifdef CONFIG_ACPI
+	err = xen_acpi_get_gsi_info(dev, &gsi, &trigger, &polarity);
+	if (err) {
+		dev_err(&dev->dev, "Fail to get gsi info!\n");
+		goto config_release;
+	}
+
+	if (xen_initial_domain() && xen_pvh_domain()) {
+		err = xen_pvh_setup_gsi(gsi, trigger, polarity);
+		if (err)
+			goto config_release;
+	}
+#endif
+
 	/* Now disable the device (this also ensures some private device
 	 * data is setup before we export)
 	 */
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 34829f2c517a..f8690b02bba4 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -361,6 +361,7 @@ void acpi_unregister_gsi (u32 gsi);
 
 struct pci_dev;
 
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin);
 int acpi_pci_irq_enable (struct pci_dev *dev);
 void acpi_penalize_isa_irq(int irq, int active);
 bool acpi_isa_irq_available(int irq);
diff --git a/include/xen/acpi.h b/include/xen/acpi.h
index b1e11863144d..9b50027113f3 100644
--- a/include/xen/acpi.h
+++ b/include/xen/acpi.h
@@ -67,10 +67,20 @@ static inline void xen_acpi_sleep_register(void)
 		acpi_suspend_lowlevel = xen_acpi_suspend_lowlevel;
 	}
 }
+int xen_pvh_setup_gsi(int gsi, int trigger, int polarity);
 #else
 static inline void xen_acpi_sleep_register(void)
 {
 }
+
+static inline int xen_pvh_setup_gsi(int gsi, int trigger, int polarity)
+{
+	return -1;
+}
 #endif
 
+int xen_acpi_get_gsi_info(struct pci_dev *dev,
+						  int *gsi_out,
+						  int *trigger_out,
+						  int *polarity_out);
 #endif	/* _XEN_ACPI_H */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC KERNEL PATCH v7 2/2] xen/privcmd: Add new syscall to get gsi from dev
  2024-05-15  6:50 [RFC KERNEL PATCH v7 0/2] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
  2024-05-15  6:50 ` [RFC KERNEL PATCH v7 1/2] xen/pvh: Setup gsi for passthrough device Jiqian Chen
@ 2024-05-15  6:50 ` Jiqian Chen
  2024-05-15 22:42   ` Stefano Stabellini
  1 sibling, 1 reply; 6+ messages in thread
From: Jiqian Chen @ 2024-05-15  6:50 UTC (permalink / raw
  To: Juergen Gross, Stefano Stabellini, Bjorn Helgaas,
	Rafael J . Wysocki, Roger Pau Monné
  Cc: xen-devel, linux-pci, linux-kernel, linux-acpi, Huang Rui,
	Jiqian Chen, Huang Rui

In PVH dom0, it uses the linux local interrupt mechanism,
when it allocs irq for a gsi, it is dynamic, and follow
the principle of applying first, distributing first. And
the irq number is alloced from small to large, but the
applying gsi number is not, may gsi 38 comes before gsi 28,
it causes the irq number is not equal with the gsi number.
And when passthrough a device, QEMU will use device's gsi
number to do pirq mapping, but the gsi number is got from
file /sys/bus/pci/devices/<sbdf>/irq, irq!= gsi, so it will
fail when mapping.
And in current linux codes, there is no method to get gsi
for userspace.

For above purpose, record gsi of pcistub devices when init
pcistub and add a new syscall into privcmd to let userspace
can get gsi when they have a need.

Co-developed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
 drivers/xen/privcmd.c              | 28 ++++++++++++++++++++++
 drivers/xen/xen-pciback/pci_stub.c | 38 +++++++++++++++++++++++++++---
 include/uapi/xen/privcmd.h         |  7 ++++++
 include/xen/acpi.h                 |  2 ++
 4 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 67dfa4778864..5953a03b5cb0 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -45,6 +45,9 @@
 #include <xen/page.h>
 #include <xen/xen-ops.h>
 #include <xen/balloon.h>
+#ifdef CONFIG_ACPI
+#include <xen/acpi.h>
+#endif
 
 #include "privcmd.h"
 
@@ -842,6 +845,27 @@ static long privcmd_ioctl_mmap_resource(struct file *file,
 	return rc;
 }
 
+static long privcmd_ioctl_gsi_from_dev(struct file *file, void __user *udata)
+{
+	struct privcmd_gsi_from_dev kdata;
+
+	if (copy_from_user(&kdata, udata, sizeof(kdata)))
+		return -EFAULT;
+
+#ifdef CONFIG_ACPI
+	kdata.gsi = pcistub_get_gsi_from_sbdf(kdata.sbdf);
+	if (kdata.gsi == -1)
+		return -EINVAL;
+#else
+	kdata.gsi = -1;
+#endif
+
+	if (copy_to_user(udata, &kdata, sizeof(kdata)))
+		return -EFAULT;
+
+	return 0;
+}
+
 #ifdef CONFIG_XEN_PRIVCMD_EVENTFD
 /* Irqfd support */
 static struct workqueue_struct *irqfd_cleanup_wq;
@@ -1529,6 +1553,10 @@ static long privcmd_ioctl(struct file *file,
 		ret = privcmd_ioctl_ioeventfd(file, udata);
 		break;
 
+	case IOCTL_PRIVCMD_GSI_FROM_DEV:
+		ret = privcmd_ioctl_gsi_from_dev(file, udata);
+		break;
+
 	default:
 		break;
 	}
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 2b90d832d0a7..4b62b4d377a9 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -56,6 +56,9 @@ struct pcistub_device {
 
 	struct pci_dev *dev;
 	struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in use */
+#ifdef CONFIG_ACPI
+	int gsi;
+#endif
 };
 
 /* Access to pcistub_devices & seized_devices lists and the initialize_devices
@@ -88,6 +91,9 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
 
 	kref_init(&psdev->kref);
 	spin_lock_init(&psdev->lock);
+#ifdef CONFIG_ACPI
+	psdev->gsi = -1;
+#endif
 
 	return psdev;
 }
@@ -220,6 +226,25 @@ static struct pci_dev *pcistub_device_get_pci_dev(struct xen_pcibk_device *pdev,
 	return pci_dev;
 }
 
+#ifdef CONFIG_ACPI
+int pcistub_get_gsi_from_sbdf(unsigned int sbdf)
+{
+	struct pcistub_device *psdev;
+	int domain = sbdf >> 16;
+	int bus = (sbdf >> 8) & 0xff;
+	int slot = (sbdf >> 3) & 0x1f;
+	int func = sbdf & 0x7;
+
+	psdev = pcistub_device_find(domain, bus, slot, func);
+
+	if (!psdev)
+		return -1;
+
+	return psdev->gsi;
+}
+EXPORT_SYMBOL_GPL(pcistub_get_gsi_from_sbdf);
+#endif
+
 struct pci_dev *pcistub_get_pci_dev_by_slot(struct xen_pcibk_device *pdev,
 					    int domain, int bus,
 					    int slot, int func)
@@ -367,14 +392,20 @@ static int pcistub_match(struct pci_dev *dev)
 	return found;
 }
 
-static int pcistub_init_device(struct pci_dev *dev)
+static int pcistub_init_device(struct pcistub_device *psdev)
 {
 	struct xen_pcibk_dev_data *dev_data;
+	struct pci_dev *dev;
 #ifdef CONFIG_ACPI
 	int gsi, trigger, polarity;
 #endif
 	int err = 0;
 
+	if (!psdev)
+		return -EINVAL;
+
+	dev = psdev->dev;
+
 	dev_dbg(&dev->dev, "initializing...\n");
 
 	/* The PCI backend is not intended to be a module (or to work with
@@ -448,6 +479,7 @@ static int pcistub_init_device(struct pci_dev *dev)
 		dev_err(&dev->dev, "Fail to get gsi info!\n");
 		goto config_release;
 	}
+	psdev->gsi = gsi;
 
 	if (xen_initial_domain() && xen_pvh_domain()) {
 		err = xen_pvh_setup_gsi(gsi, trigger, polarity);
@@ -495,7 +527,7 @@ static int __init pcistub_init_devices_late(void)
 
 		spin_unlock_irqrestore(&pcistub_devices_lock, flags);
 
-		err = pcistub_init_device(psdev->dev);
+		err = pcistub_init_device(psdev);
 		if (err) {
 			dev_err(&psdev->dev->dev,
 				"error %d initializing device\n", err);
@@ -565,7 +597,7 @@ static int pcistub_seize(struct pci_dev *dev,
 		spin_unlock_irqrestore(&pcistub_devices_lock, flags);
 
 		/* don't want irqs disabled when calling pcistub_init_device */
-		err = pcistub_init_device(psdev->dev);
+		err = pcistub_init_device(psdev);
 
 		spin_lock_irqsave(&pcistub_devices_lock, flags);
 
diff --git a/include/uapi/xen/privcmd.h b/include/uapi/xen/privcmd.h
index 8b8c5d1420fe..220e7670a113 100644
--- a/include/uapi/xen/privcmd.h
+++ b/include/uapi/xen/privcmd.h
@@ -126,6 +126,11 @@ struct privcmd_ioeventfd {
 	__u8 pad[2];
 };
 
+struct privcmd_gsi_from_dev {
+	__u32 sbdf;
+	int gsi;
+};
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -157,5 +162,7 @@ struct privcmd_ioeventfd {
 	_IOW('P', 8, struct privcmd_irqfd)
 #define IOCTL_PRIVCMD_IOEVENTFD					\
 	_IOW('P', 9, struct privcmd_ioeventfd)
+#define IOCTL_PRIVCMD_GSI_FROM_DEV				\
+	_IOC(_IOC_NONE, 'P', 10, sizeof(struct privcmd_gsi_from_dev))
 
 #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
diff --git a/include/xen/acpi.h b/include/xen/acpi.h
index 9b50027113f3..0bf5f4884456 100644
--- a/include/xen/acpi.h
+++ b/include/xen/acpi.h
@@ -83,4 +83,6 @@ int xen_acpi_get_gsi_info(struct pci_dev *dev,
 						  int *gsi_out,
 						  int *trigger_out,
 						  int *polarity_out);
+
+int pcistub_get_gsi_from_sbdf(unsigned int sbdf);
 #endif	/* _XEN_ACPI_H */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC KERNEL PATCH v7 2/2] xen/privcmd: Add new syscall to get gsi from dev
  2024-05-15  6:50 ` [RFC KERNEL PATCH v7 2/2] xen/privcmd: Add new syscall to get gsi from dev Jiqian Chen
@ 2024-05-15 22:42   ` Stefano Stabellini
  2024-05-16  6:54     ` Chen, Jiqian
  0 siblings, 1 reply; 6+ messages in thread
From: Stefano Stabellini @ 2024-05-15 22:42 UTC (permalink / raw
  To: Jiqian Chen
  Cc: Juergen Gross, Stefano Stabellini, Bjorn Helgaas,
	Rafael J . Wysocki, Roger Pau Monné, xen-devel, linux-pci,
	linux-kernel, linux-acpi, Huang Rui, Huang Rui

On Wed, 15 May 2024, Jiqian Chen wrote:
> In PVH dom0, it uses the linux local interrupt mechanism,
> when it allocs irq for a gsi, it is dynamic, and follow
> the principle of applying first, distributing first. And
> the irq number is alloced from small to large, but the
> applying gsi number is not, may gsi 38 comes before gsi 28,
> it causes the irq number is not equal with the gsi number.
> And when passthrough a device, QEMU will use device's gsi
> number to do pirq mapping, but the gsi number is got from
> file /sys/bus/pci/devices/<sbdf>/irq, irq!= gsi, so it will
> fail when mapping.
> And in current linux codes, there is no method to get gsi
> for userspace.
> 
> For above purpose, record gsi of pcistub devices when init
> pcistub and add a new syscall into privcmd to let userspace
> can get gsi when they have a need.
> 
> Co-developed-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---
>  drivers/xen/privcmd.c              | 28 ++++++++++++++++++++++
>  drivers/xen/xen-pciback/pci_stub.c | 38 +++++++++++++++++++++++++++---
>  include/uapi/xen/privcmd.h         |  7 ++++++
>  include/xen/acpi.h                 |  2 ++
>  4 files changed, 72 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index 67dfa4778864..5953a03b5cb0 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -45,6 +45,9 @@
>  #include <xen/page.h>
>  #include <xen/xen-ops.h>
>  #include <xen/balloon.h>
> +#ifdef CONFIG_ACPI
> +#include <xen/acpi.h>
> +#endif
>  
>  #include "privcmd.h"
>  
> @@ -842,6 +845,27 @@ static long privcmd_ioctl_mmap_resource(struct file *file,
>  	return rc;
>  }
>  
> +static long privcmd_ioctl_gsi_from_dev(struct file *file, void __user *udata)
> +{
> +	struct privcmd_gsi_from_dev kdata;
> +
> +	if (copy_from_user(&kdata, udata, sizeof(kdata)))
> +		return -EFAULT;
> +
> +#ifdef CONFIG_ACPI
> +	kdata.gsi = pcistub_get_gsi_from_sbdf(kdata.sbdf);
> +	if (kdata.gsi == -1)
> +		return -EINVAL;
> +#else
> +	kdata.gsi = -1;

Should we return an error instead, like -EINVAL, to make the behavior
more similar to the CONFIG_ACPI case?


> +#endif
> +
> +	if (copy_to_user(udata, &kdata, sizeof(kdata)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>  #ifdef CONFIG_XEN_PRIVCMD_EVENTFD
>  /* Irqfd support */
>  static struct workqueue_struct *irqfd_cleanup_wq;
> @@ -1529,6 +1553,10 @@ static long privcmd_ioctl(struct file *file,
>  		ret = privcmd_ioctl_ioeventfd(file, udata);
>  		break;
>  
> +	case IOCTL_PRIVCMD_GSI_FROM_DEV:
> +		ret = privcmd_ioctl_gsi_from_dev(file, udata);
> +		break;
> +
>  	default:
>  		break;
>  	}
> diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
> index 2b90d832d0a7..4b62b4d377a9 100644
> --- a/drivers/xen/xen-pciback/pci_stub.c
> +++ b/drivers/xen/xen-pciback/pci_stub.c
> @@ -56,6 +56,9 @@ struct pcistub_device {
>  
>  	struct pci_dev *dev;
>  	struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in use */
> +#ifdef CONFIG_ACPI
> +	int gsi;
> +#endif
>  };
>  
>  /* Access to pcistub_devices & seized_devices lists and the initialize_devices
> @@ -88,6 +91,9 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
>  
>  	kref_init(&psdev->kref);
>  	spin_lock_init(&psdev->lock);
> +#ifdef CONFIG_ACPI
> +	psdev->gsi = -1;
> +#endif
>  
>  	return psdev;
>  }
> @@ -220,6 +226,25 @@ static struct pci_dev *pcistub_device_get_pci_dev(struct xen_pcibk_device *pdev,
>  	return pci_dev;
>  }
>  
> +#ifdef CONFIG_ACPI
> +int pcistub_get_gsi_from_sbdf(unsigned int sbdf)
> +{
> +	struct pcistub_device *psdev;
> +	int domain = sbdf >> 16;
> +	int bus = (sbdf >> 8) & 0xff;
> +	int slot = (sbdf >> 3) & 0x1f;
> +	int func = sbdf & 0x7;

you can use PCI_DEVFN PCI_SLOT PCI_FUNC pci_domain_nr instead of open
coding.


> +
> +	psdev = pcistub_device_find(domain, bus, slot, func);
> +
> +	if (!psdev)
> +		return -1;
> +
> +	return psdev->gsi;
> +}
> +EXPORT_SYMBOL_GPL(pcistub_get_gsi_from_sbdf);
> +#endif
> +
>  struct pci_dev *pcistub_get_pci_dev_by_slot(struct xen_pcibk_device *pdev,
>  					    int domain, int bus,
>  					    int slot, int func)
> @@ -367,14 +392,20 @@ static int pcistub_match(struct pci_dev *dev)
>  	return found;
>  }
>  
> -static int pcistub_init_device(struct pci_dev *dev)
> +static int pcistub_init_device(struct pcistub_device *psdev)
>  {
>  	struct xen_pcibk_dev_data *dev_data;
> +	struct pci_dev *dev;
>  #ifdef CONFIG_ACPI
>  	int gsi, trigger, polarity;
>  #endif
>  	int err = 0;
>  
> +	if (!psdev)
> +		return -EINVAL;
> +
> +	dev = psdev->dev;
> +
>  	dev_dbg(&dev->dev, "initializing...\n");
>  
>  	/* The PCI backend is not intended to be a module (or to work with
> @@ -448,6 +479,7 @@ static int pcistub_init_device(struct pci_dev *dev)
>  		dev_err(&dev->dev, "Fail to get gsi info!\n");
>  		goto config_release;
>  	}
> +	psdev->gsi = gsi;
>  
>  	if (xen_initial_domain() && xen_pvh_domain()) {
>  		err = xen_pvh_setup_gsi(gsi, trigger, polarity);
> @@ -495,7 +527,7 @@ static int __init pcistub_init_devices_late(void)
>  
>  		spin_unlock_irqrestore(&pcistub_devices_lock, flags);
>  
> -		err = pcistub_init_device(psdev->dev);
> +		err = pcistub_init_device(psdev);
>  		if (err) {
>  			dev_err(&psdev->dev->dev,
>  				"error %d initializing device\n", err);
> @@ -565,7 +597,7 @@ static int pcistub_seize(struct pci_dev *dev,
>  		spin_unlock_irqrestore(&pcistub_devices_lock, flags);
>  
>  		/* don't want irqs disabled when calling pcistub_init_device */
> -		err = pcistub_init_device(psdev->dev);
> +		err = pcistub_init_device(psdev);
>  
>  		spin_lock_irqsave(&pcistub_devices_lock, flags);
>  
> diff --git a/include/uapi/xen/privcmd.h b/include/uapi/xen/privcmd.h
> index 8b8c5d1420fe..220e7670a113 100644
> --- a/include/uapi/xen/privcmd.h
> +++ b/include/uapi/xen/privcmd.h
> @@ -126,6 +126,11 @@ struct privcmd_ioeventfd {
>  	__u8 pad[2];
>  };
>  
> +struct privcmd_gsi_from_dev {
> +	__u32 sbdf;
> +	int gsi;
> +};
> +
>  /*
>   * @cmd: IOCTL_PRIVCMD_HYPERCALL
>   * @arg: &privcmd_hypercall_t
> @@ -157,5 +162,7 @@ struct privcmd_ioeventfd {
>  	_IOW('P', 8, struct privcmd_irqfd)
>  #define IOCTL_PRIVCMD_IOEVENTFD					\
>  	_IOW('P', 9, struct privcmd_ioeventfd)
> +#define IOCTL_PRIVCMD_GSI_FROM_DEV				\
> +	_IOC(_IOC_NONE, 'P', 10, sizeof(struct privcmd_gsi_from_dev))
>  
>  #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
> diff --git a/include/xen/acpi.h b/include/xen/acpi.h
> index 9b50027113f3..0bf5f4884456 100644
> --- a/include/xen/acpi.h
> +++ b/include/xen/acpi.h
> @@ -83,4 +83,6 @@ int xen_acpi_get_gsi_info(struct pci_dev *dev,
>  						  int *gsi_out,
>  						  int *trigger_out,
>  						  int *polarity_out);
> +
> +int pcistub_get_gsi_from_sbdf(unsigned int sbdf);
>  #endif	/* _XEN_ACPI_H */
> -- 
> 2.34.1
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC KERNEL PATCH v7 2/2] xen/privcmd: Add new syscall to get gsi from dev
  2024-05-15 22:42   ` Stefano Stabellini
@ 2024-05-16  6:54     ` Chen, Jiqian
  2024-05-16 20:29       ` Stefano Stabellini
  0 siblings, 1 reply; 6+ messages in thread
From: Chen, Jiqian @ 2024-05-16  6:54 UTC (permalink / raw
  To: Stefano Stabellini
  Cc: Juergen Gross, Bjorn Helgaas, Rafael J . Wysocki,
	Roger Pau Monné, xen-devel@lists.xenproject.org,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org, Huang, Ray, Chen, Jiqian

On 2024/5/16 06:42, Stefano Stabellini wrote:
> On Wed, 15 May 2024, Jiqian Chen wrote:
>> In PVH dom0, it uses the linux local interrupt mechanism,
>> when it allocs irq for a gsi, it is dynamic, and follow
>> the principle of applying first, distributing first. And
>> the irq number is alloced from small to large, but the
>> applying gsi number is not, may gsi 38 comes before gsi 28,
>> it causes the irq number is not equal with the gsi number.
>> And when passthrough a device, QEMU will use device's gsi
>> number to do pirq mapping, but the gsi number is got from
>> file /sys/bus/pci/devices/<sbdf>/irq, irq!= gsi, so it will
>> fail when mapping.
>> And in current linux codes, there is no method to get gsi
>> for userspace.
>>
>> For above purpose, record gsi of pcistub devices when init
>> pcistub and add a new syscall into privcmd to let userspace
>> can get gsi when they have a need.
>>
>> Co-developed-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
>>  drivers/xen/privcmd.c              | 28 ++++++++++++++++++++++
>>  drivers/xen/xen-pciback/pci_stub.c | 38 +++++++++++++++++++++++++++---
>>  include/uapi/xen/privcmd.h         |  7 ++++++
>>  include/xen/acpi.h                 |  2 ++
>>  4 files changed, 72 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>> index 67dfa4778864..5953a03b5cb0 100644
>> --- a/drivers/xen/privcmd.c
>> +++ b/drivers/xen/privcmd.c
>> @@ -45,6 +45,9 @@
>>  #include <xen/page.h>
>>  #include <xen/xen-ops.h>
>>  #include <xen/balloon.h>
>> +#ifdef CONFIG_ACPI
>> +#include <xen/acpi.h>
>> +#endif
>>  
>>  #include "privcmd.h"
>>  
>> @@ -842,6 +845,27 @@ static long privcmd_ioctl_mmap_resource(struct file *file,
>>  	return rc;
>>  }
>>  
>> +static long privcmd_ioctl_gsi_from_dev(struct file *file, void __user *udata)
>> +{
>> +	struct privcmd_gsi_from_dev kdata;
>> +
>> +	if (copy_from_user(&kdata, udata, sizeof(kdata)))
>> +		return -EFAULT;
>> +
>> +#ifdef CONFIG_ACPI
>> +	kdata.gsi = pcistub_get_gsi_from_sbdf(kdata.sbdf);
>> +	if (kdata.gsi == -1)
>> +		return -EINVAL;
>> +#else
>> +	kdata.gsi = -1;
> 
> Should we return an error instead, like -EINVAL, to make the behavior
> more similar to the CONFIG_ACPI case?
OK, will return -EINVAL if not config acpi.
Like:
static long privcmd_ioctl_gsi_from_dev(struct file *file, void __user *udata)
{
#ifdef CONFIG_ACPI
	struct privcmd_gsi_from_dev kdata;

	if (copy_from_user(&kdata, udata, sizeof(kdata)))
		return -EFAULT;

	kdata.gsi = pcistub_get_gsi_from_sbdf(kdata.sbdf);
	if (kdata.gsi == -1)
		return -EINVAL;

	if (copy_to_user(udata, &kdata, sizeof(kdata)))
		return -EFAULT;

	return 0;
#else
	return -EINVAL;
#endif
}

> 
> 
>> +#endif
>> +
>> +	if (copy_to_user(udata, &kdata, sizeof(kdata)))
>> +		return -EFAULT;
>> +
>> +	return 0;
>> +}
>> +
>>  #ifdef CONFIG_XEN_PRIVCMD_EVENTFD
>>  /* Irqfd support */
>>  static struct workqueue_struct *irqfd_cleanup_wq;
>> @@ -1529,6 +1553,10 @@ static long privcmd_ioctl(struct file *file,
>>  		ret = privcmd_ioctl_ioeventfd(file, udata);
>>  		break;
>>  
>> +	case IOCTL_PRIVCMD_GSI_FROM_DEV:
>> +		ret = privcmd_ioctl_gsi_from_dev(file, udata);
>> +		break;
>> +
>>  	default:
>>  		break;
>>  	}
>> diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
>> index 2b90d832d0a7..4b62b4d377a9 100644
>> --- a/drivers/xen/xen-pciback/pci_stub.c
>> +++ b/drivers/xen/xen-pciback/pci_stub.c
>> @@ -56,6 +56,9 @@ struct pcistub_device {
>>  
>>  	struct pci_dev *dev;
>>  	struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in use */
>> +#ifdef CONFIG_ACPI
>> +	int gsi;
>> +#endif
>>  };
>>  
>>  /* Access to pcistub_devices & seized_devices lists and the initialize_devices
>> @@ -88,6 +91,9 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
>>  
>>  	kref_init(&psdev->kref);
>>  	spin_lock_init(&psdev->lock);
>> +#ifdef CONFIG_ACPI
>> +	psdev->gsi = -1;
>> +#endif
>>  
>>  	return psdev;
>>  }
>> @@ -220,6 +226,25 @@ static struct pci_dev *pcistub_device_get_pci_dev(struct xen_pcibk_device *pdev,
>>  	return pci_dev;
>>  }
>>  
>> +#ifdef CONFIG_ACPI
>> +int pcistub_get_gsi_from_sbdf(unsigned int sbdf)
>> +{
>> +	struct pcistub_device *psdev;
>> +	int domain = sbdf >> 16;
>> +	int bus = (sbdf >> 8) & 0xff;
>> +	int slot = (sbdf >> 3) & 0x1f;
>> +	int func = sbdf & 0x7;
> 
> you can use PCI_DEVFN PCI_SLOT PCI_FUNC pci_domain_nr instead of open
> coding.
Thanks, will change to use these in next version.
But pci_domain_nr requires passing in pci_dev.
Will change like:
	int domain = (sbdf >> 16) & 0xffff;
	int bus = PCI_BUS_NUM(sbdf);
	int slot = PCI_SLOT(sbdf);
	int func = PCI_FUNC(sbdf);

> 
> 
>> +
>> +	psdev = pcistub_device_find(domain, bus, slot, func);
>> +
>> +	if (!psdev)
>> +		return -1;
>> +
>> +	return psdev->gsi;
>> +}
>> +EXPORT_SYMBOL_GPL(pcistub_get_gsi_from_sbdf);
>> +#endif
>> +
>>  struct pci_dev *pcistub_get_pci_dev_by_slot(struct xen_pcibk_device *pdev,
>>  					    int domain, int bus,
>>  					    int slot, int func)
>> @@ -367,14 +392,20 @@ static int pcistub_match(struct pci_dev *dev)
>>  	return found;
>>  }
>>  
>> -static int pcistub_init_device(struct pci_dev *dev)
>> +static int pcistub_init_device(struct pcistub_device *psdev)
>>  {
>>  	struct xen_pcibk_dev_data *dev_data;
>> +	struct pci_dev *dev;
>>  #ifdef CONFIG_ACPI
>>  	int gsi, trigger, polarity;
>>  #endif
>>  	int err = 0;
>>  
>> +	if (!psdev)
>> +		return -EINVAL;
>> +
>> +	dev = psdev->dev;
>> +
>>  	dev_dbg(&dev->dev, "initializing...\n");
>>  
>>  	/* The PCI backend is not intended to be a module (or to work with
>> @@ -448,6 +479,7 @@ static int pcistub_init_device(struct pci_dev *dev)
>>  		dev_err(&dev->dev, "Fail to get gsi info!\n");
>>  		goto config_release;
>>  	}
>> +	psdev->gsi = gsi;
>>  
>>  	if (xen_initial_domain() && xen_pvh_domain()) {
>>  		err = xen_pvh_setup_gsi(gsi, trigger, polarity);
>> @@ -495,7 +527,7 @@ static int __init pcistub_init_devices_late(void)
>>  
>>  		spin_unlock_irqrestore(&pcistub_devices_lock, flags);
>>  
>> -		err = pcistub_init_device(psdev->dev);
>> +		err = pcistub_init_device(psdev);
>>  		if (err) {
>>  			dev_err(&psdev->dev->dev,
>>  				"error %d initializing device\n", err);
>> @@ -565,7 +597,7 @@ static int pcistub_seize(struct pci_dev *dev,
>>  		spin_unlock_irqrestore(&pcistub_devices_lock, flags);
>>  
>>  		/* don't want irqs disabled when calling pcistub_init_device */
>> -		err = pcistub_init_device(psdev->dev);
>> +		err = pcistub_init_device(psdev);
>>  
>>  		spin_lock_irqsave(&pcistub_devices_lock, flags);
>>  
>> diff --git a/include/uapi/xen/privcmd.h b/include/uapi/xen/privcmd.h
>> index 8b8c5d1420fe..220e7670a113 100644
>> --- a/include/uapi/xen/privcmd.h
>> +++ b/include/uapi/xen/privcmd.h
>> @@ -126,6 +126,11 @@ struct privcmd_ioeventfd {
>>  	__u8 pad[2];
>>  };
>>  
>> +struct privcmd_gsi_from_dev {
>> +	__u32 sbdf;
>> +	int gsi;
>> +};
>> +
>>  /*
>>   * @cmd: IOCTL_PRIVCMD_HYPERCALL
>>   * @arg: &privcmd_hypercall_t
>> @@ -157,5 +162,7 @@ struct privcmd_ioeventfd {
>>  	_IOW('P', 8, struct privcmd_irqfd)
>>  #define IOCTL_PRIVCMD_IOEVENTFD					\
>>  	_IOW('P', 9, struct privcmd_ioeventfd)
>> +#define IOCTL_PRIVCMD_GSI_FROM_DEV				\
>> +	_IOC(_IOC_NONE, 'P', 10, sizeof(struct privcmd_gsi_from_dev))
>>  
>>  #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
>> diff --git a/include/xen/acpi.h b/include/xen/acpi.h
>> index 9b50027113f3..0bf5f4884456 100644
>> --- a/include/xen/acpi.h
>> +++ b/include/xen/acpi.h
>> @@ -83,4 +83,6 @@ int xen_acpi_get_gsi_info(struct pci_dev *dev,
>>  						  int *gsi_out,
>>  						  int *trigger_out,
>>  						  int *polarity_out);
>> +
>> +int pcistub_get_gsi_from_sbdf(unsigned int sbdf);
>>  #endif	/* _XEN_ACPI_H */
>> -- 
>> 2.34.1
>>

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC KERNEL PATCH v7 2/2] xen/privcmd: Add new syscall to get gsi from dev
  2024-05-16  6:54     ` Chen, Jiqian
@ 2024-05-16 20:29       ` Stefano Stabellini
  0 siblings, 0 replies; 6+ messages in thread
From: Stefano Stabellini @ 2024-05-16 20:29 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Stefano Stabellini, Juergen Gross, Bjorn Helgaas,
	Rafael J . Wysocki, Roger Pau Monné,
	xen-devel@lists.xenproject.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	Huang, Ray

On Thu, 16 May 2024, Chen, Jiqian wrote:
> On 2024/5/16 06:42, Stefano Stabellini wrote:
> > On Wed, 15 May 2024, Jiqian Chen wrote:
> >> In PVH dom0, it uses the linux local interrupt mechanism,
> >> when it allocs irq for a gsi, it is dynamic, and follow
> >> the principle of applying first, distributing first. And
> >> the irq number is alloced from small to large, but the
> >> applying gsi number is not, may gsi 38 comes before gsi 28,
> >> it causes the irq number is not equal with the gsi number.
> >> And when passthrough a device, QEMU will use device's gsi
> >> number to do pirq mapping, but the gsi number is got from
> >> file /sys/bus/pci/devices/<sbdf>/irq, irq!= gsi, so it will
> >> fail when mapping.
> >> And in current linux codes, there is no method to get gsi
> >> for userspace.
> >>
> >> For above purpose, record gsi of pcistub devices when init
> >> pcistub and add a new syscall into privcmd to let userspace
> >> can get gsi when they have a need.
> >>
> >> Co-developed-by: Huang Rui <ray.huang@amd.com>
> >> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> >> ---
> >>  drivers/xen/privcmd.c              | 28 ++++++++++++++++++++++
> >>  drivers/xen/xen-pciback/pci_stub.c | 38 +++++++++++++++++++++++++++---
> >>  include/uapi/xen/privcmd.h         |  7 ++++++
> >>  include/xen/acpi.h                 |  2 ++
> >>  4 files changed, 72 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> >> index 67dfa4778864..5953a03b5cb0 100644
> >> --- a/drivers/xen/privcmd.c
> >> +++ b/drivers/xen/privcmd.c
> >> @@ -45,6 +45,9 @@
> >>  #include <xen/page.h>
> >>  #include <xen/xen-ops.h>
> >>  #include <xen/balloon.h>
> >> +#ifdef CONFIG_ACPI
> >> +#include <xen/acpi.h>
> >> +#endif
> >>  
> >>  #include "privcmd.h"
> >>  
> >> @@ -842,6 +845,27 @@ static long privcmd_ioctl_mmap_resource(struct file *file,
> >>  	return rc;
> >>  }
> >>  
> >> +static long privcmd_ioctl_gsi_from_dev(struct file *file, void __user *udata)
> >> +{
> >> +	struct privcmd_gsi_from_dev kdata;
> >> +
> >> +	if (copy_from_user(&kdata, udata, sizeof(kdata)))
> >> +		return -EFAULT;
> >> +
> >> +#ifdef CONFIG_ACPI
> >> +	kdata.gsi = pcistub_get_gsi_from_sbdf(kdata.sbdf);
> >> +	if (kdata.gsi == -1)
> >> +		return -EINVAL;
> >> +#else
> >> +	kdata.gsi = -1;
> > 
> > Should we return an error instead, like -EINVAL, to make the behavior
> > more similar to the CONFIG_ACPI case?
> OK, will return -EINVAL if not config acpi.
> Like:
> static long privcmd_ioctl_gsi_from_dev(struct file *file, void __user *udata)
> {
> #ifdef CONFIG_ACPI
> 	struct privcmd_gsi_from_dev kdata;
> 
> 	if (copy_from_user(&kdata, udata, sizeof(kdata)))
> 		return -EFAULT;
> 
> 	kdata.gsi = pcistub_get_gsi_from_sbdf(kdata.sbdf);
> 	if (kdata.gsi == -1)
> 		return -EINVAL;
> 
> 	if (copy_to_user(udata, &kdata, sizeof(kdata)))
> 		return -EFAULT;
> 
> 	return 0;
> #else
> 	return -EINVAL;
> #endif
> }


Yep that's fine



> >> +#endif
> >> +
> >> +	if (copy_to_user(udata, &kdata, sizeof(kdata)))
> >> +		return -EFAULT;
> >> +
> >> +	return 0;
> >> +}
> >> +
> >>  #ifdef CONFIG_XEN_PRIVCMD_EVENTFD
> >>  /* Irqfd support */
> >>  static struct workqueue_struct *irqfd_cleanup_wq;
> >> @@ -1529,6 +1553,10 @@ static long privcmd_ioctl(struct file *file,
> >>  		ret = privcmd_ioctl_ioeventfd(file, udata);
> >>  		break;
> >>  
> >> +	case IOCTL_PRIVCMD_GSI_FROM_DEV:
> >> +		ret = privcmd_ioctl_gsi_from_dev(file, udata);
> >> +		break;
> >> +
> >>  	default:
> >>  		break;
> >>  	}
> >> diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
> >> index 2b90d832d0a7..4b62b4d377a9 100644
> >> --- a/drivers/xen/xen-pciback/pci_stub.c
> >> +++ b/drivers/xen/xen-pciback/pci_stub.c
> >> @@ -56,6 +56,9 @@ struct pcistub_device {
> >>  
> >>  	struct pci_dev *dev;
> >>  	struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in use */
> >> +#ifdef CONFIG_ACPI
> >> +	int gsi;
> >> +#endif
> >>  };
> >>  
> >>  /* Access to pcistub_devices & seized_devices lists and the initialize_devices
> >> @@ -88,6 +91,9 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
> >>  
> >>  	kref_init(&psdev->kref);
> >>  	spin_lock_init(&psdev->lock);
> >> +#ifdef CONFIG_ACPI
> >> +	psdev->gsi = -1;
> >> +#endif
> >>  
> >>  	return psdev;
> >>  }
> >> @@ -220,6 +226,25 @@ static struct pci_dev *pcistub_device_get_pci_dev(struct xen_pcibk_device *pdev,
> >>  	return pci_dev;
> >>  }
> >>  
> >> +#ifdef CONFIG_ACPI
> >> +int pcistub_get_gsi_from_sbdf(unsigned int sbdf)
> >> +{
> >> +	struct pcistub_device *psdev;
> >> +	int domain = sbdf >> 16;
> >> +	int bus = (sbdf >> 8) & 0xff;
> >> +	int slot = (sbdf >> 3) & 0x1f;
> >> +	int func = sbdf & 0x7;
> > 
> > you can use PCI_DEVFN PCI_SLOT PCI_FUNC pci_domain_nr instead of open
> > coding.
> Thanks, will change to use these in next version.
> But pci_domain_nr requires passing in pci_dev.
> Will change like:
> 	int domain = (sbdf >> 16) & 0xffff;
> 	int bus = PCI_BUS_NUM(sbdf);
> 	int slot = PCI_SLOT(sbdf);
> 	int func = PCI_FUNC(sbdf);

That's fine


 
> >> +
> >> +	psdev = pcistub_device_find(domain, bus, slot, func);
> >> +
> >> +	if (!psdev)
> >> +		return -1;
> >> +
> >> +	return psdev->gsi;
> >> +}
> >> +EXPORT_SYMBOL_GPL(pcistub_get_gsi_from_sbdf);
> >> +#endif
> >> +
> >>  struct pci_dev *pcistub_get_pci_dev_by_slot(struct xen_pcibk_device *pdev,
> >>  					    int domain, int bus,
> >>  					    int slot, int func)
> >> @@ -367,14 +392,20 @@ static int pcistub_match(struct pci_dev *dev)
> >>  	return found;
> >>  }
> >>  
> >> -static int pcistub_init_device(struct pci_dev *dev)
> >> +static int pcistub_init_device(struct pcistub_device *psdev)
> >>  {
> >>  	struct xen_pcibk_dev_data *dev_data;
> >> +	struct pci_dev *dev;
> >>  #ifdef CONFIG_ACPI
> >>  	int gsi, trigger, polarity;
> >>  #endif
> >>  	int err = 0;
> >>  
> >> +	if (!psdev)
> >> +		return -EINVAL;
> >> +
> >> +	dev = psdev->dev;
> >> +
> >>  	dev_dbg(&dev->dev, "initializing...\n");
> >>  
> >>  	/* The PCI backend is not intended to be a module (or to work with
> >> @@ -448,6 +479,7 @@ static int pcistub_init_device(struct pci_dev *dev)
> >>  		dev_err(&dev->dev, "Fail to get gsi info!\n");
> >>  		goto config_release;
> >>  	}
> >> +	psdev->gsi = gsi;
> >>  
> >>  	if (xen_initial_domain() && xen_pvh_domain()) {
> >>  		err = xen_pvh_setup_gsi(gsi, trigger, polarity);
> >> @@ -495,7 +527,7 @@ static int __init pcistub_init_devices_late(void)
> >>  
> >>  		spin_unlock_irqrestore(&pcistub_devices_lock, flags);
> >>  
> >> -		err = pcistub_init_device(psdev->dev);
> >> +		err = pcistub_init_device(psdev);
> >>  		if (err) {
> >>  			dev_err(&psdev->dev->dev,
> >>  				"error %d initializing device\n", err);
> >> @@ -565,7 +597,7 @@ static int pcistub_seize(struct pci_dev *dev,
> >>  		spin_unlock_irqrestore(&pcistub_devices_lock, flags);
> >>  
> >>  		/* don't want irqs disabled when calling pcistub_init_device */
> >> -		err = pcistub_init_device(psdev->dev);
> >> +		err = pcistub_init_device(psdev);
> >>  
> >>  		spin_lock_irqsave(&pcistub_devices_lock, flags);
> >>  
> >> diff --git a/include/uapi/xen/privcmd.h b/include/uapi/xen/privcmd.h
> >> index 8b8c5d1420fe..220e7670a113 100644
> >> --- a/include/uapi/xen/privcmd.h
> >> +++ b/include/uapi/xen/privcmd.h
> >> @@ -126,6 +126,11 @@ struct privcmd_ioeventfd {
> >>  	__u8 pad[2];
> >>  };
> >>  
> >> +struct privcmd_gsi_from_dev {
> >> +	__u32 sbdf;
> >> +	int gsi;
> >> +};
> >> +
> >>  /*
> >>   * @cmd: IOCTL_PRIVCMD_HYPERCALL
> >>   * @arg: &privcmd_hypercall_t
> >> @@ -157,5 +162,7 @@ struct privcmd_ioeventfd {
> >>  	_IOW('P', 8, struct privcmd_irqfd)
> >>  #define IOCTL_PRIVCMD_IOEVENTFD					\
> >>  	_IOW('P', 9, struct privcmd_ioeventfd)
> >> +#define IOCTL_PRIVCMD_GSI_FROM_DEV				\
> >> +	_IOC(_IOC_NONE, 'P', 10, sizeof(struct privcmd_gsi_from_dev))
> >>  
> >>  #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
> >> diff --git a/include/xen/acpi.h b/include/xen/acpi.h
> >> index 9b50027113f3..0bf5f4884456 100644
> >> --- a/include/xen/acpi.h
> >> +++ b/include/xen/acpi.h
> >> @@ -83,4 +83,6 @@ int xen_acpi_get_gsi_info(struct pci_dev *dev,
> >>  						  int *gsi_out,
> >>  						  int *trigger_out,
> >>  						  int *polarity_out);
> >> +
> >> +int pcistub_get_gsi_from_sbdf(unsigned int sbdf);
> >>  #endif	/* _XEN_ACPI_H */
> >> -- 
> >> 2.34.1
> >>
> 
> -- 
> Best regards,
> Jiqian Chen.
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-05-16 20:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-15  6:50 [RFC KERNEL PATCH v7 0/2] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
2024-05-15  6:50 ` [RFC KERNEL PATCH v7 1/2] xen/pvh: Setup gsi for passthrough device Jiqian Chen
2024-05-15  6:50 ` [RFC KERNEL PATCH v7 2/2] xen/privcmd: Add new syscall to get gsi from dev Jiqian Chen
2024-05-15 22:42   ` Stefano Stabellini
2024-05-16  6:54     ` Chen, Jiqian
2024-05-16 20:29       ` Stefano Stabellini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).