[XEN PATCH v8 0/5] Support device passthrough when dom0 is PVH on Xen

Xen-Devel Archive mirror
 help / color / mirror / Atom feed

* [XEN PATCH v8 0/5] Support device passthrough when dom0 is PVH on Xen
@ 2024-05-16  9:52 Jiqian Chen
  2024-05-16  9:52 ` [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device Jiqian Chen
                   ` (4 more replies)
  0 siblings, 5 replies; 49+ messages in thread
From: Jiqian Chen @ 2024-05-16  9:52 UTC (permalink / raw
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Stewart Hildebrand, Huang Rui, Jiqian Chen

Hi All,
This is v8 series to support passthrough when dom0 is PVH
v6->v7 changes:
* patch#2: Add the domid check(domid == DOMID_SELF) to prevent self map when guest doesn't use pirq.
           That check was missed in the previous version.
* patch#4: Due to changes in the implementation of obtaining gsi in the kernel. Change to add a new function
           to get gsi by passing in the sbdf of pci device.
* patch#5: Remove the parameter "is_gsi", when there exist gsi, in pci_add_dm_done use a new function
           pci_device_set_gsi to do map_pirq and grant permission. That gets more intuitive code logic.

Best regards,
Jiqian Chen

v6->v7 changes:
* patch#4: Due to changes in the implementation of obtaining gsi in the kernel. Change to add a new function
           to get gsi from irq, instead of gsi sysfs.
* patch#5: Fix the issue with variable usage, rc->r.

v5->v6 changes:
* patch#1: Add Reviewed-by Stefano and Stewart. Rebase code and change old function vpci_remove_device,
           vpci_add_handlers to vpci_deassign_device, vpci_assign_device
* patch#2: Add Reviewed-by Stefano
* patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));"
* patch#4: Fix some coding style issues below directory tools
* patch#5: Modified some variable names and code logic to make code easier to be understood, which to use
           gsi by default and be compatible with older kernel versions to continue to use irq

v4->v5 changes:
* patch#1: add pci_lock wrap function vpci_reset_device_state
* patch#2: move the check of self map_pirq to physdev.c, and change to check if the caller has PIRQ flag, and
           just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op
* patch#3: return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));
* patch#4: is the patch#5 in v4 because patch#5 in v5 has some dependency on it. And add the handling of errno
           and add the Reviewed-by Stefano
* patch#5: is the patch#4 in v4. New implementation to add new hypercall XEN_DOMCTL_gsi_permission to grant gsi

v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; move printings behind pcidevs_unlock
* patch#2: add check to prevent PVH self map
* patch#3: new patch, The implementation of adding PHYSDEVOP_setup_gsi for PVH is treated as a separate patch
* patch#4: new patch to solve the map_pirq problem of PVH dom0. use gsi to grant irq permission in
           XEN_DOMCTL_irq_permission.
* patch#5: to be compatible with previous kernel versions, when there is no gsi sysfs, still use irq
v4 link:
https://lore.kernel.org/xen-devel/20240105070920.350113-1-Jiqian.Chen@amd.com/T/#t

v2->v3 changes:
* patch#1: move the content out of pci_reset_device_state and delete pci_reset_device_state; add
           xsm_resource_setup_pci check for PHYSDEVOP_pci_device_state_reset; add description for
		   PHYSDEVOP_pci_device_state_reset;
* patch#2: du to changes in the implementation of the second patch on kernel side(that it will do setup_gsi and
           map_pirq when assigning a device to passthrough), add PHYSDEVOP_setup_gsi for PVH dom0, and we need
		   to support self mapping.
* patch#3: du to changes in the implementation of the second patch on kernel side(that adds a new sysfs for gsi
           instead of a new syscall), so read gsi number from the sysfs of gsi.
v3 link:
https://lore.kernel.org/xen-devel/20231210164009.1551147-1-Jiqian.Chen@amd.com/T/#t

v2 link:
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-Jiqian.Chen@amd.com/T/#t
Below is the description of v2 cover letter:
This series of patches are the v2 of the implementation of passthrough when dom0 is PVH on Xen.
We sent the v1 to upstream before, but the v1 had so many problems and we got lots of suggestions.
I will introduce all issues that these patches try to fix and the differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run \u201csudo xl pci-assignable-add <sbdf>\u201d to assign a device, pci_stub will call
pcistub_init_device() -> pci_restore_state() -> pci_restore_config_space() ->
pci_restore_config_space_range() -> pci_restore_config_dword() -> pci_write_config_dword()\u201d, the pci config
write will trigger an io interrupt to bar_write() in the xen, but the
bar->enabled was set before, the write is not allowed now, and then when 
bar->Qemu config the
passthrough device in xen_pt_realize(), it gets invalid bar values.

Reason: the reason is that we don't tell vPCI that the device has been reset, so the current cached state in
pdev->vpci is all out of date and is different from the real device state.

Solution: to solve this problem, the first patch of kernel(xen/pci: Add xen_reset_device_state
function) and the fist patch of xen(xen/vpci: Clear all vpci status of device) add a new hypercall to reset the
state stored in vPCI when the state of real device has changed.
Thank Roger for the suggestion of this v2, and it is different from
v1 (https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.huang@amd.com/), v1 simply allow domU to write
pci bar, it does not comply with the design principles of vPCI.

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using gsi. See
xen_pt_realize->xc_physdev_map_pirq and pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq will call
into Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: In hvm_physdev_op(), the variable "currd" is PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, it will fail
at has_pirq check.

Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0 (at present dom0 is PVH). The
second patch of xen(x86/pvh: Open PHYSDEVOP_map_pirq for PVH dom0) allow PVH dom0 do PHYSDEVOP_map_pirq. This v2
patch is better than v1, v1 simply remove the has_pirq check
(xen https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.huang@amd.com/).

3. the gsi of a passthrough device doesn't be unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0

Problem:
3.1 callback function pci_add_dm_done() will be called when qemu config a passthrough device for domU.
This function will call xc_domain_irq_permission()-> pirq_access_permitted() to check if the gsi has corresponding
mappings in dom0. But it didn\u2019t, so failed. See XEN_DOMCTL_irq_permission->pirq_access_permitted, "current"
is PVH dom0 and it return irq is 0.
3.2 it's possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH dom0, because the devices of PVH
are using MSI(-X) interrupts. However, the IO-APIC pin must be configured for it to be able to be mapped into a domU.

Reason: After searching codes, I find "map_pirq" and "register_gsi" will be done in function
vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi(aka ioapic's pin) is unmasked in PVH dom0.
So the two problems can be concluded to that the gsi of a passthrough device doesn't be unmasked.

Solution: to solve these problems, the second patch of kernel(xen/pvh: Unmask irq for passthrough device in PVH dom0)
call the unmask_irq() when we assign a device to be passthrough. So that passthrough devices can have the mapping of
gsi on PVH dom0 and gsi can be registered. This v2 patch is different from the
v1( kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.huang@amd.com/,
kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.huang@amd.com/ and
xen https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.huang@amd.com/),
v1 performed "map_pirq" and "register_gsi" on all pci devices on PVH dom0, which is unnecessary and may cause
multiple registration.

4. failed to map pirq for gsi
Problem: qemu will call xc_physdev_map_pirq() to map a passthrough device\u2019s gsi to pirq in function
xen_pt_realize(). But failed.

Reason: According to the implement of xc_physdev_map_pirq(), it needs gsi instead of irq, but qemu pass irq to it and
treat irq as gsi, it is got from file /sys/bus/pci/devices/xxxx:xx:xx.x/irq in function xen_host_pci_device_get().
But actually the gsi number is not equal with irq. On PVH dom0, when it allocates irq for a gsi in
function acpi_register_gsi_ioapic(), allocation is dynamic, and follow the principle of applying first, distributing
first. And if you debug the kernel codes(see function __irq_alloc_descs), you will find the irq number is allocated
from small to large by order, but the applying gsi number is not, gsi 38 may come before gsi 28, that causes gsi 38
get a smaller irq number than gsi 28, and then gsi != irq.

Solution: we can record the relation between gsi and irq, then when userspace(qemu) want to use gsi, we can do a
translation. The third patch of kernel(xen/privcmd: Add new syscall to get gsi from irq) records all the relations
in acpi_register_gsi_xen_pvh() when dom0 initialize pci devices, and provide a syscall for userspace to get the gsi
from irq. The third patch of xen(tools: Add new function to get gsi from irq) add a new function
xc_physdev_gsi_from_irq() to call the new syscall added on kernel side.
And then userspace can use that function to get gsi. Then xc_physdev_map_pirq() will success. This v2 patch is the
same as v1( kernel https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.huang@amd.com/ and
xen https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.huang@amd.com/)

About the v2 patch of qemu, just change an included head file, other are similar to the
v1 ( qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.huang@amd.com/), just call
xc_physdev_gsi_from_irq() to get gsi from irq.

Jiqian Chen (5):
  xen/vpci: Clear all vpci status of device
  x86/pvh: Allow (un)map_pirq when dom0 is PVH
  x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  tools: Add new function to get gsi from dev
  domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

 tools/include/xen-sys/Linux/privcmd.h |  7 +++
 tools/include/xencall.h               |  2 +
 tools/include/xenctrl.h               |  7 +++
 tools/libs/call/core.c                |  5 +++
 tools/libs/call/libxencall.map        |  2 +
 tools/libs/call/linux.c               | 15 +++++++
 tools/libs/call/private.h             |  9 ++++
 tools/libs/ctrl/xc_domain.c           | 15 +++++++
 tools/libs/ctrl/xc_physdev.c          |  4 ++
 tools/libs/light/libxl_pci.c          | 63 +++++++++++++++++++++++++++
 xen/arch/x86/domctl.c                 | 31 +++++++++++++
 xen/arch/x86/hvm/hypercall.c          |  8 ++++
 xen/arch/x86/physdev.c                | 24 ++++++++++
 xen/drivers/pci/physdev.c             | 36 +++++++++++++++
 xen/drivers/vpci/vpci.c               | 10 +++++
 xen/include/public/domctl.h           |  9 ++++
 xen/include/public/physdev.h          |  7 +++
 xen/include/xen/vpci.h                |  6 +++
 xen/xsm/flask/hooks.c                 |  1 +
 19 files changed, 261 insertions(+)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device
  2024-05-16  9:52 [XEN PATCH v8 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
@ 2024-05-16  9:52 ` Jiqian Chen
  2024-05-16 13:08   ` Jan Beulich
  2024-05-16  9:52 ` [XEN PATCH v8 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 49+ messages in thread
From: Jiqian Chen @ 2024-05-16  9:52 UTC (permalink / raw
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Stewart Hildebrand, Huang Rui, Jiqian Chen, Huang Rui,
	Stewart Hildebrand

When a device has been reset on dom0 side, the vpci on Xen
side won't get notification, so the cached state in vpci is
all out of date compare with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of device is reset on dom0 side,
dom0 can call this hypercall to notify vpci.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c    | 36 ++++++++++++++++++++++++++++++++++++
 xen/drivers/vpci/vpci.c      | 10 ++++++++++
 xen/include/public/physdev.h |  7 +++++++
 xen/include/xen/vpci.h       |  6 ++++++
 5 files changed, 60 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 14679dd82971..56fbb69ab201 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_pci_mmcfg_reserved:
     case PHYSDEVOP_pci_device_add:
     case PHYSDEVOP_pci_device_remove:
+    case PHYSDEVOP_pci_device_state_reset:
     case PHYSDEVOP_dbgp_op:
         if ( !is_hardware_domain(currd) )
             return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..73dc8f058b0e 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
 #include <xen/init.h>
+#include <xen/vpci.h>
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case PHYSDEVOP_pci_device_state_reset: {
+        struct physdev_pci_device dev;
+        struct pci_dev *pdev;
+        pci_sbdf_t sbdf;
+
+        if ( !is_pci_passthrough_enabled() )
+            return -EOPNOTSUPP;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&dev, arg, 1) != 0 )
+            break;
+        sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
+
+        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+        if ( ret )
+            break;
+
+        pcidevs_lock();
+        pdev = pci_get_pdev(NULL, sbdf);
+        if ( !pdev )
+        {
+            pcidevs_unlock();
+            ret = -ENODEV;
+            break;
+        }
+
+        write_lock(&pdev->domain->pci_lock);
+        ret = vpci_reset_device_state(pdev);
+        write_unlock(&pdev->domain->pci_lock);
+        pcidevs_unlock();
+        if ( ret )
+            printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf);
+        break;
+    }
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 97e115dc5798..424aec2d5c46 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -115,6 +115,16 @@ int vpci_assign_device(struct pci_dev *pdev)
 
     return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+    ASSERT(pcidevs_locked());
+    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+    vpci_deassign_device(pdev);
+    return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..f5bab1f29779 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix          30
 #define PHYSDEVOP_release_msix          31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset     32
+
 struct physdev_pci_device {
     /* IN */
     uint16_t seg;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 6e4c972f35ed..93b1c1d72c05 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -30,6 +30,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_deassign_device(struct pci_dev *pdev);
+int __must_check vpci_reset_device_state(struct pci_dev *pdev);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -266,6 +267,11 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
 
 static inline void vpci_deassign_device(struct pci_dev *pdev) { }
 
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev)
+{
+    return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [XEN PATCH v8 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-05-16  9:52 [XEN PATCH v8 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
  2024-05-16  9:52 ` [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device Jiqian Chen
@ 2024-05-16  9:52 ` Jiqian Chen
  2024-05-16 13:29   ` Jan Beulich
  2024-05-16  9:52 ` [XEN PATCH v8 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 Jiqian Chen
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 49+ messages in thread
From: Jiqian Chen @ 2024-05-16  9:52 UTC (permalink / raw
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Stewart Hildebrand, Huang Rui, Jiqian Chen, Huang Rui

If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
a passthrough device by using gsi, see
xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
is not allowed because currd is PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
PHYSDEVOP_unmap_pirq for the failed path to unmap pirq. And
add a new check to prevent self map when caller has no PIRQ
flag.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
---
 xen/arch/x86/hvm/hypercall.c |  2 ++
 xen/arch/x86/physdev.c       | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 56fbb69ab201..d49fb8b548a3 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 7efa17cf4c1e..1337f95171cd 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_map_pirq: {
         physdev_map_pirq_t map;
         struct msi_info msi;
+        struct domain *d;
 
         ret = -EFAULT;
         if ( copy_from_guest(&map, arg, 1) != 0 )
             break;
 
+        d = rcu_lock_domain_by_any_id(map.domid);
+        if ( d == NULL )
+            return -ESRCH;
+        /* If caller is the same HVM guest as current, check pirq flag */
+        if ( !is_pv_domain(d) && !has_pirq(d) && map.domid == DOMID_SELF )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+        rcu_unlock_domain(d);
+
         switch ( map.type )
         {
         case MAP_PIRQ_TYPE_MSI_SEG:
@@ -343,11 +355,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case PHYSDEVOP_unmap_pirq: {
         struct physdev_unmap_pirq unmap;
+        struct domain *d;
 
         ret = -EFAULT;
         if ( copy_from_guest(&unmap, arg, 1) != 0 )
             break;
 
+        d = rcu_lock_domain_by_any_id(unmap.domid);
+        if ( d == NULL )
+            return -ESRCH;
+        /* If caller is the same HVM guest as current, check pirq flag */
+        if ( !is_pv_domain(d) && !has_pirq(d) && unmap.domid == DOMID_SELF )
+        {
+            rcu_unlock_domain(d);
+            return -EOPNOTSUPP;
+        }
+        rcu_unlock_domain(d);
+
         ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
         break;
     }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [XEN PATCH v8 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-05-16  9:52 [XEN PATCH v8 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
  2024-05-16  9:52 ` [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device Jiqian Chen
  2024-05-16  9:52 ` [XEN PATCH v8 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
@ 2024-05-16  9:52 ` Jiqian Chen
  2024-05-16 13:49   ` Jan Beulich
  2024-05-16  9:52 ` [RFC XEN PATCH v8 4/5] tools: Add new function to get gsi from dev Jiqian Chen
  2024-05-16  9:52 ` [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi Jiqian Chen
  4 siblings, 1 reply; 49+ messages in thread
From: Jiqian Chen @ 2024-05-16  9:52 UTC (permalink / raw
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Stewart Hildebrand, Huang Rui, Jiqian Chen, Huang Rui

On PVH dom0, the gsis don't get registered, but
the gsi of a passthrough device must be configured for it to
be able to be mapped into a hvm domU.
On Linux kernel side, it calles PHYSDEVOP_setup_gsi for
passthrough devices to register gsi when dom0 is PVH.
So, add PHYSDEVOP_setup_gsi for above purpose.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
 xen/arch/x86/hvm/hypercall.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index d49fb8b548a3..98e3c6b176ff 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -76,6 +76,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     case PHYSDEVOP_unmap_pirq:
         break;
 
+    case PHYSDEVOP_setup_gsi:
+        if ( !is_hardware_domain(currd) )
+            return -EOPNOTSUPP;
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC XEN PATCH v8 4/5] tools: Add new function to get gsi from dev
  2024-05-16  9:52 [XEN PATCH v8 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
                   ` (2 preceding siblings ...)
  2024-05-16  9:52 ` [XEN PATCH v8 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 Jiqian Chen
@ 2024-05-16  9:52 ` Jiqian Chen
  2024-05-16  9:52 ` [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi Jiqian Chen
  4 siblings, 0 replies; 49+ messages in thread
From: Jiqian Chen @ 2024-05-16  9:52 UTC (permalink / raw
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Stewart Hildebrand, Huang Rui, Jiqian Chen, Huang Rui

In PVH dom0, it uses the linux local interrupt mechanism,
when it allocs irq for a gsi, it is dynamic, and follow
the principle of applying first, distributing first. And
irq number is alloced from small to large, but the applying
gsi number is not, may gsi 38 comes before gsi 28, that
causes the irq number is not equal with the gsi number.
And when passthrough a device, QEMU will use its gsi number
to do pirq mapping, see xen_pt_realize->xc_physdev_map_pirq,
but the gsi number is got from file
/sys/bus/pci/devices/<sbdf>/irq, so it will fail when mapping.
And in current codes, there is no method to get gsi for
userspace.

For above purpose, add new function to get gsi. And call this
function before xc_physdev_(un)map_pirq

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
---
 tools/include/xen-sys/Linux/privcmd.h |  7 +++++++
 tools/include/xencall.h               |  2 ++
 tools/include/xenctrl.h               |  2 ++
 tools/libs/call/core.c                |  5 +++++
 tools/libs/call/libxencall.map        |  2 ++
 tools/libs/call/linux.c               | 15 +++++++++++++++
 tools/libs/call/private.h             |  9 +++++++++
 tools/libs/ctrl/xc_physdev.c          |  4 ++++
 tools/libs/light/libxl_pci.c          | 23 +++++++++++++++++++++++
 9 files changed, 69 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..977f1a058797 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
 	__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_dev {
+	__u32 sbdf;
+	int gsi;
+} privcmd_gsi_from_dev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
 	_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE				\
 	_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_DEV				\
+	_IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_dev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED				\
 	_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xencall.h b/tools/include/xencall.h
index fc95ed0fe58e..750aab070323 100644
--- a/tools/include/xencall.h
+++ b/tools/include/xencall.h
@@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
              uint64_t arg1, uint64_t arg2, uint64_t arg3,
              uint64_t arg4, uint64_t arg5);
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);
+
 /* Variant(s) of the above, as needed, returning "long" instead of "int". */
 long xencall2L(xencall_handle *xcall, unsigned int op,
                uint64_t arg1, uint64_t arg2);
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 499685594427..841db41ad7e4 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
                           uint32_t domid,
                           int pirq);
 
+int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
index 02c4f8e1aefa..6dae50c9a6ba 100644
--- a/tools/libs/call/core.c
+++ b/tools/libs/call/core.c
@@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
     return osdep_hypercall(xcall, &call);
 }
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf)
+{
+    return osdep_oscall(xcall, sbdf);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
index d18a3174e9dc..b92a0b5dc12c 100644
--- a/tools/libs/call/libxencall.map
+++ b/tools/libs/call/libxencall.map
@@ -10,6 +10,8 @@ VERS_1.0 {
 		xencall4;
 		xencall5;
 
+		xen_oscall_gsi_from_dev;
+
 		xencall_alloc_buffer;
 		xencall_free_buffer;
 		xencall_alloc_buffer_pages;
diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
index 6d588e6bea8f..92c740e176f2 100644
--- a/tools/libs/call/linux.c
+++ b/tools/libs/call/linux.c
@@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall)
     return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
 }
 
+int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
+{
+    privcmd_gsi_from_dev_t dev_gsi = {
+        .sbdf = sbdf,
+        .gsi = -1,
+    };
+
+    if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_DEV, &dev_gsi)) {
+        PERROR("failed to get gsi from dev");
+        return -1;
+    }
+
+    return dev_gsi.gsi;
+}
+
 static void *alloc_pages_bufdev(xencall_handle *xcall, size_t npages)
 {
     void *p;
diff --git a/tools/libs/call/private.h b/tools/libs/call/private.h
index 9c3aa432efe2..cd6eb5a3e66f 100644
--- a/tools/libs/call/private.h
+++ b/tools/libs/call/private.h
@@ -57,6 +57,15 @@ int osdep_xencall_close(xencall_handle *xcall);
 
 long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall);
 
+#if defined(__linux__)
+int osdep_oscall(xencall_handle *xcall, unsigned int sbdf);
+#else
+static inline int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
+{
+    return -1;
+}
+#endif
+
 void *osdep_alloc_pages(xencall_handle *xcall, size_t nr_pages);
 void osdep_free_pages(xencall_handle *xcall, void *p, size_t nr_pages);
 
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 460a8e779ce8..c1458f3a38b5 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -111,3 +111,7 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
     return rc;
 }
 
+int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf)
+{
+    return xen_oscall_gsi_from_dev(xch->xcall, sbdf);
+}
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..7e44d4c3ae2b 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
 #endif
 }
 
+#define PCI_DEVID(bus, devfn)\
+            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
+
+#define PCI_SBDF(seg, bus, devfn) \
+            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
+
 static void pci_add_dm_done(libxl__egc *egc,
                             pci_add_state *pas,
                             int rc)
@@ -1418,6 +1424,7 @@ static void pci_add_dm_done(libxl__egc *egc,
     unsigned long long start, end, flags, size;
     int irq, i;
     int r;
+    uint32_t sbdf;
     uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
     uint32_t domainid = domid;
     bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
@@ -1486,6 +1493,13 @@ static void pci_add_dm_done(libxl__egc *egc,
         goto out_no_irq;
     }
     if ((fscanf(f, "%u", &irq) == 1) && irq) {
+        sbdf = PCI_SBDF(pci->domain, pci->bus,
+                        (PCI_DEVFN(pci->dev, pci->func)));
+        r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
+        /* if fail, keep using irq; if success, r is gsi, use gsi */
+        if (r != -1) {
+            irq = r;
+        }
         r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
         if (r < 0) {
             LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
@@ -2172,8 +2186,10 @@ static void pci_remove_detached(libxl__egc *egc,
     int  irq = 0, i, stubdomid = 0;
     const char *sysfs_path;
     FILE *f;
+    uint32_t sbdf;
     uint32_t domainid = prs->domid;
     bool isstubdom;
+    int r;
 
     /* Convenience aliases */
     libxl_device_pci *const pci = &prs->pci;
@@ -2239,6 +2255,13 @@ skip_bar:
     }
 
     if ((fscanf(f, "%u", &irq) == 1) && irq) {
+        sbdf = PCI_SBDF(pci->domain, pci->bus,
+                        (PCI_DEVFN(pci->dev, pci->func)));
+        r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
+        /* if fail, keep using irq; if success, r is gsi, use gsi */
+        if (r != -1) {
+            irq = r;
+        }
         rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
         if (rc < 0) {
             /*
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-16  9:52 [XEN PATCH v8 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
                   ` (3 preceding siblings ...)
  2024-05-16  9:52 ` [RFC XEN PATCH v8 4/5] tools: Add new function to get gsi from dev Jiqian Chen
@ 2024-05-16  9:52 ` Jiqian Chen
  2024-05-16 14:01   ` Jan Beulich
  4 siblings, 1 reply; 49+ messages in thread
From: Jiqian Chen @ 2024-05-16  9:52 UTC (permalink / raw
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini, Anthony PERARD,
	Anthony PERARD, Juergen Gross, Daniel P . Smith,
	Stewart Hildebrand, Huang Rui, Jiqian Chen, Huang Rui

Some type of domain don't have PIRQ, like PVH, when
passthrough a device to guest on PVH dom0, callstack
pci_add_dm_done->XEN_DOMCTL_irq_permission will failed
at domain_pirq_to_irq.

So, add a new hypercall to grant/revoke gsi permission
when dom0 is not PV or dom0 has not PIRQ flag.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
 tools/include/xenctrl.h      |  5 +++
 tools/libs/ctrl/xc_domain.c  | 15 ++++++++
 tools/libs/light/libxl_pci.c | 72 ++++++++++++++++++++++++++++--------
 xen/arch/x86/domctl.c        | 31 ++++++++++++++++
 xen/include/public/domctl.h  |  9 +++++
 xen/xsm/flask/hooks.c        |  1 +
 6 files changed, 117 insertions(+), 16 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 841db41ad7e4..c21a79d74be3 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
                              uint32_t pirq,
                              bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+                             uint32_t domid,
+                             uint32_t gsi,
+                             bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
                                uint32_t domid,
                                unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
     return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+                             uint32_t domid,
+                             uint32_t gsi,
+                             bool allow_access)
+{
+    struct xen_domctl domctl = {
+        .cmd = XEN_DOMCTL_gsi_permission,
+        .domain = domid,
+        .u.gsi_permission.gsi = gsi,
+        .u.gsi_permission.allow_access = allow_access,
+    };
+
+    return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
                                uint32_t domid,
                                unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 7e44d4c3ae2b..1d1b81dd2844 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1412,6 +1412,37 @@ static bool pci_supp_legacy_irq(void)
 #define PCI_SBDF(seg, bus, devfn) \
             ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
 
+static int pci_device_set_gsi(libxl_ctx *ctx,
+                              libxl_domid domid,
+                              libxl_device_pci *pci,
+                              bool map,
+                              int *gsi_back)
+{
+    int r, gsi, pirq;
+    uint32_t sbdf;
+
+    sbdf = PCI_SBDF(pci->domain, pci->bus, (PCI_DEVFN(pci->dev, pci->func)));
+    r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
+    *gsi_back = r;
+    if (r < 0)
+        return r;
+
+    gsi = r;
+    pirq = r;
+    if (map)
+        r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &pirq);
+    else
+        r = xc_physdev_unmap_pirq(ctx->xch, domid, pirq);
+    if (r)
+        return r;
+
+    r = xc_domain_gsi_permission(ctx->xch, domid, gsi, map);
+    if (r && errno == EOPNOTSUPP)
+        r = xc_domain_irq_permission(ctx->xch, domid, gsi, map);
+
+    return r;
+}
+
 static void pci_add_dm_done(libxl__egc *egc,
                             pci_add_state *pas,
                             int rc)
@@ -1424,10 +1455,10 @@ static void pci_add_dm_done(libxl__egc *egc,
     unsigned long long start, end, flags, size;
     int irq, i;
     int r;
-    uint32_t sbdf;
     uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
     uint32_t domainid = domid;
     bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+    int gsi;
 
     /* Convenience aliases */
     bool starting = pas->starting;
@@ -1485,6 +1516,19 @@ static void pci_add_dm_done(libxl__egc *egc,
     fclose(f);
     if (!pci_supp_legacy_irq())
         goto out_no_irq;
+
+    r = pci_device_set_gsi(ctx, domid, pci, 1, &gsi);
+    if (gsi >= 0) {
+        if (r < 0) {
+            rc = ERROR_FAIL;
+            LOGED(ERROR, domainid,
+                  "pci_device_set_gsi gsi=%d (error=%d)", gsi, errno);
+            goto out;
+        } else {
+            goto process_permissive;
+        }
+    }
+    /* if gsi < 0, keep using irq */
     sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
                                 pci->bus, pci->dev, pci->func);
     f = fopen(sysfs_path, "r");
@@ -1493,13 +1537,6 @@ static void pci_add_dm_done(libxl__egc *egc,
         goto out_no_irq;
     }
     if ((fscanf(f, "%u", &irq) == 1) && irq) {
-        sbdf = PCI_SBDF(pci->domain, pci->bus,
-                        (PCI_DEVFN(pci->dev, pci->func)));
-        r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
-        /* if fail, keep using irq; if success, r is gsi, use gsi */
-        if (r != -1) {
-            irq = r;
-        }
         r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
         if (r < 0) {
             LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
@@ -1519,6 +1556,7 @@ static void pci_add_dm_done(libxl__egc *egc,
     }
     fclose(f);
 
+process_permissive:
     /* Don't restrict writes to the PCI config space from this VM */
     if (pci->permissive) {
         if ( sysfs_write_bdf(gc, SYSFS_PCIBACK_DRIVER"/permissive",
@@ -2186,10 +2224,10 @@ static void pci_remove_detached(libxl__egc *egc,
     int  irq = 0, i, stubdomid = 0;
     const char *sysfs_path;
     FILE *f;
-    uint32_t sbdf;
     uint32_t domainid = prs->domid;
     bool isstubdom;
     int r;
+    int gsi;
 
     /* Convenience aliases */
     libxl_device_pci *const pci = &prs->pci;
@@ -2245,6 +2283,15 @@ skip_bar:
     if (!pci_supp_legacy_irq())
         goto skip_legacy_irq;
 
+    r = pci_device_set_gsi(ctx, domid, pci, 0, &gsi);
+    if (gsi >= 0) {
+        if (r < 0) {
+            LOGED(ERROR, domainid,
+                  "pci_device_set_gsi gsi=%d (error=%d)", gsi, errno);
+        }
+        goto skip_legacy_irq;
+    }
+    /* if gsi < 0, keep using irq */
     sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
                            pci->bus, pci->dev, pci->func);
 
@@ -2255,13 +2302,6 @@ skip_bar:
     }
 
     if ((fscanf(f, "%u", &irq) == 1) && irq) {
-        sbdf = PCI_SBDF(pci->domain, pci->bus,
-                        (PCI_DEVFN(pci->dev, pci->func)));
-        r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
-        /* if fail, keep using irq; if success, r is gsi, use gsi */
-        if (r != -1) {
-            irq = r;
-        }
         rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
         if (rc < 0) {
             /*
diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 9a72d57333e9..9b8a08b2a81d 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -237,6 +237,37 @@ long arch_do_domctl(
         break;
     }
 
+    case XEN_DOMCTL_gsi_permission:
+    {
+        unsigned int gsi = domctl->u.gsi_permission.gsi;
+        int allow = domctl->u.gsi_permission.allow_access;
+
+        if ( is_pv_domain(current->domain) || has_pirq(current->domain) )
+        {
+            ret = -EOPNOTSUPP;
+            break;
+        }
+
+        if ( gsi >= nr_irqs_gsi )
+        {
+            ret = -EINVAL;
+            break;
+        }
+
+        if ( !irq_access_permitted(current->domain, gsi) ||
+             xsm_irq_permission(XSM_HOOK, d, gsi, allow) )
+        {
+            ret = -EPERM;
+            break;
+        }
+
+        if ( allow )
+            ret = irq_permit_access(d, gsi);
+        else
+            ret = irq_deny_access(d, gsi);
+        break;
+    }
+
     case XEN_DOMCTL_getpageframeinfo3:
     {
         unsigned int num = domctl->u.getpageframeinfo3.num;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index a33f9ec32b08..47e95f9ee824 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -447,6 +447,13 @@ struct xen_domctl_irq_permission {
 };
 
 
+/* XEN_DOMCTL_gsi_permission */
+struct xen_domctl_gsi_permission {
+    uint32_t gsi;
+    uint8_t allow_access;    /* flag to specify enable/disable of x86 gsi access */
+};
+
+
 /* XEN_DOMCTL_iomem_permission */
 struct xen_domctl_iomem_permission {
     uint64_aligned_t first_mfn;/* first page (physical page number) in range */
@@ -1277,6 +1284,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_vmtrace_op                    84
 #define XEN_DOMCTL_get_paging_mempool_size       85
 #define XEN_DOMCTL_set_paging_mempool_size       86
+#define XEN_DOMCTL_gsi_permission                87
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -1299,6 +1307,7 @@ struct xen_domctl {
         struct xen_domctl_setdomainhandle   setdomainhandle;
         struct xen_domctl_setdebugging      setdebugging;
         struct xen_domctl_irq_permission    irq_permission;
+        struct xen_domctl_gsi_permission    gsi_permission;
         struct xen_domctl_iomem_permission  iomem_permission;
         struct xen_domctl_ioport_permission ioport_permission;
         struct xen_domctl_hypercall_init    hypercall_init;
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 5e88c71b8e22..a5b134c91101 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -685,6 +685,7 @@ static int cf_check flask_domctl(struct domain *d, int cmd)
     case XEN_DOMCTL_shadow_op:
     case XEN_DOMCTL_ioport_permission:
     case XEN_DOMCTL_ioport_mapping:
+    case XEN_DOMCTL_gsi_permission:
 #endif
 #ifdef CONFIG_HAS_PASSTHROUGH
     /*
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device
  2024-05-16  9:52 ` [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device Jiqian Chen
@ 2024-05-16 13:08   ` Jan Beulich
  2024-05-17  8:08     ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-05-16 13:08 UTC (permalink / raw
  To: Jiqian Chen, Daniel P . Smith
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Stewart Hildebrand, Huang Rui, xen-devel

On 16.05.2024 11:52, Jiqian Chen wrote:
> @@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>  
> +    case PHYSDEVOP_pci_device_state_reset: {
> +        struct physdev_pci_device dev;
> +        struct pci_dev *pdev;
> +        pci_sbdf_t sbdf;
> +
> +        if ( !is_pci_passthrough_enabled() )
> +            return -EOPNOTSUPP;
> +
> +        ret = -EFAULT;
> +        if ( copy_from_guest(&dev, arg, 1) != 0 )
> +            break;
> +        sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
> +
> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
> +        if ( ret )
> +            break;

Daniel, is re-use of this hook appropriate here?

> +        pcidevs_lock();
> +        pdev = pci_get_pdev(NULL, sbdf);
> +        if ( !pdev )
> +        {
> +            pcidevs_unlock();
> +            ret = -ENODEV;
> +            break;
> +        }
> +
> +        write_lock(&pdev->domain->pci_lock);
> +        ret = vpci_reset_device_state(pdev);
> +        write_unlock(&pdev->domain->pci_lock);
> +        pcidevs_unlock();

Can't this be done right after the write_lock()?

> +        if ( ret )
> +            printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf);

s/PCI/vPCI/ (at least as long as nothing else is done here)

> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -115,6 +115,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>  
>      return rc;
>  }
> +
> +int vpci_reset_device_state(struct pci_dev *pdev)
> +{
> +    ASSERT(pcidevs_locked());

Is this necessary for ...

> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
> +
> +    vpci_deassign_device(pdev);
> +    return vpci_assign_device(pdev);

... either of these calls? Both functions do themselves only have the
2nd of the asserts you add.

> --- a/xen/include/public/physdev.h
> +++ b/xen/include/public/physdev.h
> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
>   */
>  #define PHYSDEVOP_prepare_msix          30
>  #define PHYSDEVOP_release_msix          31
> +/*
> + * Notify the hypervisor that a PCI device has been reset, so that any
> + * internally cached state is regenerated.  Should be called after any
> + * device reset performed by the hardware domain.
> + */
> +#define PHYSDEVOP_pci_device_state_reset     32

Nit: Just a single blank as a separator will do here, for going past the
columnar arrangement of other #define-s anyway.

>  struct physdev_pci_device {
>      /* IN */
>      uint16_t seg;

Is re-using this struct for this new sub-op sufficient? IOW are all
possible resets equal, and hence it doesn't need specifying what kind of
reset was done? For example, other than FLR most reset variants reset all
functions in one go aiui. Imo that would better require only a single
hypercall, just to avoid possible confusion. It also reads as if FLR would
not reset as many registers as other reset variants would.

This may seem to not matter right now, but this is a stable interface you
add, and hence modifying it later will be cumbersome, if possible at all.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-05-16  9:52 ` [XEN PATCH v8 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
@ 2024-05-16 13:29   ` Jan Beulich
  2024-05-17  8:44     ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-05-16 13:29 UTC (permalink / raw
  To: Jiqian Chen
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, xen-devel

On 16.05.2024 11:52, Jiqian Chen wrote:
> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
> a passthrough device by using gsi, see
> xen_pt_realize->xc_physdev_map_pirq and
> pci_add_dm_done->xc_physdev_map_pirq.

xen_pt_realize() is in qemu, which imo wants saying here (for being a different
repo), the more that pci_add_dm_done() is in libxl.

> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      {
>      case PHYSDEVOP_map_pirq:
>      case PHYSDEVOP_unmap_pirq:
> +        break;

I think this could do with a comment as to why it's permitted as well as giving
a reference to where further restrictions are enforced (or simply mentioning
the constraint of this only being permitted for management of other domains).

> --- a/xen/arch/x86/physdev.c
> +++ b/xen/arch/x86/physdev.c
> @@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      case PHYSDEVOP_map_pirq: {
>          physdev_map_pirq_t map;
>          struct msi_info msi;
> +        struct domain *d;
>  
>          ret = -EFAULT;
>          if ( copy_from_guest(&map, arg, 1) != 0 )
>              break;
>  
> +        d = rcu_lock_domain_by_any_id(map.domid);
> +        if ( d == NULL )
> +            return -ESRCH;
> +        /* If caller is the same HVM guest as current, check pirq flag */

The caller is always current. What I think you mean is "caller is same as
the subject domain". I'm also having trouble with seeing the usefulness
of saying "check pirq flag". Instead I think you want to state the
restriction here that you actually mean to enforce (which would also mean
mentioning PVH in some way, to distinguish from the "normal HVM" case).

> +        if ( !is_pv_domain(d) && !has_pirq(d) && map.domid == DOMID_SELF )

You exclude DOMID_SELF but not the domain's ID? Why not simply check d
being current->domain, thus covering both cases? Plus you could use
rcu_lock_domain_by_id() to exclude DOMID_SELF, and you could use
rcu_lock_remote_domain_by_id() to exclude the local domain altogether.
Finally I'm not even sure you need the RCU lock here (else you could
use knownalive_domain_from_domid()). But perhaps that's better to cover
the qemu-in-stubdom case, which we have to consider potentially malicious.

I'm also inclined to suggest to use is_hvm_domain() here in favor of
!is_pv_domain().

Jan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-05-16  9:52 ` [XEN PATCH v8 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 Jiqian Chen
@ 2024-05-16 13:49   ` Jan Beulich
  2024-05-17  9:00     ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-05-16 13:49 UTC (permalink / raw
  To: Jiqian Chen
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Stewart Hildebrand, Huang Rui, xen-devel

On 16.05.2024 11:52, Jiqian Chen wrote:
> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -76,6 +76,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>      case PHYSDEVOP_unmap_pirq:
>          break;
>  
> +    case PHYSDEVOP_setup_gsi:
> +        if ( !is_hardware_domain(currd) )
> +            return -EOPNOTSUPP;
> +        break;
> +
>      case PHYSDEVOP_eoi:
>      case PHYSDEVOP_irq_status_query:
>      case PHYSDEVOP_get_free_pirq:

Below here we have a hardware-domain-only block already. Any reason not
to add to there? Yes, that uses -ENOSYS. Imo that wants changing anyway,
but even without that to me it would seem more consistent overall to have
the new case simply added there.

Just to double check: Is there a respective Linux patch already (if so,
cross-linking the patches may be helpful)? Or does PVH Linux invoke the
sub-op already anyway, just that so far it fails?

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-16  9:52 ` [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi Jiqian Chen
@ 2024-05-16 14:01   ` Jan Beulich
  2024-05-17 10:45     ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-05-16 14:01 UTC (permalink / raw
  To: Jiqian Chen, Daniel P . Smith
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Stewart Hildebrand, Huang Rui, xen-devel

On 16.05.2024 11:52, Jiqian Chen wrote:
> Some type of domain don't have PIRQ, like PVH, when
> passthrough a device to guest on PVH dom0, callstack
> pci_add_dm_done->XEN_DOMCTL_irq_permission will failed
> at domain_pirq_to_irq.
> 
> So, add a new hypercall to grant/revoke gsi permission
> when dom0 is not PV or dom0 has not PIRQ flag.

Honestly I find this hard to follow, and thus not really making clear why
no other existing mechanism could be used.

> Signed-off-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
> ---

Below here in an RFC patch you typically would want to put specific items
you're seeking feedback on. Without that it's hard to tell why this is
marked RFC.

> --- a/xen/arch/x86/domctl.c
> +++ b/xen/arch/x86/domctl.c
> @@ -237,6 +237,37 @@ long arch_do_domctl(
>          break;
>      }
>  
> +    case XEN_DOMCTL_gsi_permission:
> +    {
> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
> +        int allow = domctl->u.gsi_permission.allow_access;

bool?

> +        if ( is_pv_domain(current->domain) || has_pirq(current->domain) )
> +        {
> +            ret = -EOPNOTSUPP;
> +            break;
> +        }

Such a restriction imo wants explaining in a comment.

> +        if ( gsi >= nr_irqs_gsi )
> +        {
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        if ( !irq_access_permitted(current->domain, gsi) ||

I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
source overrides may be surfaced by ACPI?

> +             xsm_irq_permission(XSM_HOOK, d, gsi, allow) )

Here I'm pretty sure you can't very well re-use an existing hook, as the
value of interest is in a different numbering space, and a possible hook
function has no way of knowing which one it is. Daniel?

> +        {
> +            ret = -EPERM;
> +            break;
> +        }
> +
> +        if ( allow )
> +            ret = irq_permit_access(d, gsi);
> +        else
> +            ret = irq_deny_access(d, gsi);

As above I'm afraid you can't assume IRQ == GSI.

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -447,6 +447,13 @@ struct xen_domctl_irq_permission {
>  };
>  
>  
> +/* XEN_DOMCTL_gsi_permission */
> +struct xen_domctl_gsi_permission {
> +    uint32_t gsi;
> +    uint8_t allow_access;    /* flag to specify enable/disable of x86 gsi access */
> +};

Explicit padding please, including a check that it's zero on input.

> +
> +
>  /* XEN_DOMCTL_iomem_permission */

No double blank lines please. In fact you will want to break the double blank
lines in leading context, inserting in the middle.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device
  2024-05-16 13:08   ` Jan Beulich
@ 2024-05-17  8:08     ` Chen, Jiqian
  2024-05-17  8:20       ` Jan Beulich
  2024-05-17 13:15       ` Stewart Hildebrand
  0 siblings, 2 replies; 49+ messages in thread
From: Chen, Jiqian @ 2024-05-17  8:08 UTC (permalink / raw
  To: Jan Beulich, Daniel P . Smith
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Chen, Jiqian

On 2024/5/16 21:08, Jan Beulich wrote:
> On 16.05.2024 11:52, Jiqian Chen wrote:
>> @@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>          break;
>>      }
>>  
>> +    case PHYSDEVOP_pci_device_state_reset: {
>> +        struct physdev_pci_device dev;
>> +        struct pci_dev *pdev;
>> +        pci_sbdf_t sbdf;
>> +
>> +        if ( !is_pci_passthrough_enabled() )
>> +            return -EOPNOTSUPP;
>> +
>> +        ret = -EFAULT;
>> +        if ( copy_from_guest(&dev, arg, 1) != 0 )
>> +            break;
>> +        sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
>> +
>> +        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>> +        if ( ret )
>> +            break;
> 
> Daniel, is re-use of this hook appropriate here?
In the v2 of this series, Daniel and Roger have agreed that reusing xsm_resource_setup_pci is enough.

> 
>> +        pcidevs_lock();
>> +        pdev = pci_get_pdev(NULL, sbdf);
>> +        if ( !pdev )
>> +        {
>> +            pcidevs_unlock();
>> +            ret = -ENODEV;
>> +            break;
>> +        }
>> +
>> +        write_lock(&pdev->domain->pci_lock);
>> +        ret = vpci_reset_device_state(pdev);
>> +        write_unlock(&pdev->domain->pci_lock);
>> +        pcidevs_unlock();
> 
> Can't this be done right after the write_lock()?
You are right, I will move it in next version.

> 
>> +        if ( ret )
>> +            printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf);
> 
> s/PCI/vPCI/ (at least as long as nothing else is done here)
OK, will change in next version.

> 
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -115,6 +115,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>  
>>      return rc;
>>  }
>> +
>> +int vpci_reset_device_state(struct pci_dev *pdev)
>> +{
>> +    ASSERT(pcidevs_locked());
> 
> Is this necessary for ...
> 
>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> +
>> +    vpci_deassign_device(pdev);
>> +    return vpci_assign_device(pdev);
> 
> ... either of these calls? Both functions do themselves only have the
> 2nd of the asserts you add.
I checked codes again; I will remove the two asserts here in next version.

> 
>> --- a/xen/include/public/physdev.h
>> +++ b/xen/include/public/physdev.h
>> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
>>   */
>>  #define PHYSDEVOP_prepare_msix          30
>>  #define PHYSDEVOP_release_msix          31
>> +/*
>> + * Notify the hypervisor that a PCI device has been reset, so that any
>> + * internally cached state is regenerated.  Should be called after any
>> + * device reset performed by the hardware domain.
>> + */
>> +#define PHYSDEVOP_pci_device_state_reset     32
> 
> Nit: Just a single blank as a separator will do here, for going past the
> columnar arrangement of other #define-s anyway.
OK.
> 
>>  struct physdev_pci_device {
>>      /* IN */
>>      uint16_t seg;
> 
> Is re-using this struct for this new sub-op sufficient? IOW are all
> possible resets equal, and hence it doesn't need specifying what kind of
> reset was done? For example, other than FLR most reset variants reset all
> functions in one go aiui. Imo that would better require only a single
> hypercall, just to avoid possible confusion. It also reads as if FLR would
> not reset as many registers as other reset variants would.
If I understood correctly that you mean in this hypercall it needs to support resetting both one function and all functions of a slot(dev)?
But it can be done for caller to use a cycle to call this reset hypercall for each slot function.

> 
> This may seem to not matter right now, but this is a stable interface you
> add, and hence modifying it later will be cumbersome, if possible at all.
> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device
  2024-05-17  8:08     ` Chen, Jiqian
@ 2024-05-17  8:20       ` Jan Beulich
  2024-05-17  9:28         ` Chen, Jiqian
  2024-05-17 13:15       ` Stewart Hildebrand
  1 sibling, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-05-17  8:20 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 17.05.2024 10:08, Chen, Jiqian wrote:
> On 2024/5/16 21:08, Jan Beulich wrote:
>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>  struct physdev_pci_device {
>>>      /* IN */
>>>      uint16_t seg;
>>
>> Is re-using this struct for this new sub-op sufficient? IOW are all
>> possible resets equal, and hence it doesn't need specifying what kind of
>> reset was done? For example, other than FLR most reset variants reset all
>> functions in one go aiui. Imo that would better require only a single
>> hypercall, just to avoid possible confusion. It also reads as if FLR would
>> not reset as many registers as other reset variants would.
> If I understood correctly that you mean in this hypercall it needs to support resetting both one function and all functions of a slot(dev)?
> But it can be done for caller to use a cycle to call this reset hypercall for each slot function.

It could, yes, but since (aiui) there needs to be an indication of the
kind of reset anyway, we can as well avoid relying on the caller doing
so (and at the same time simplify what the caller needs to do).

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH
  2024-05-16 13:29   ` Jan Beulich
@ 2024-05-17  8:44     ` Chen, Jiqian
  0 siblings, 0 replies; 49+ messages in thread
From: Chen, Jiqian @ 2024-05-17  8:44 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/5/16 21:29, Jan Beulich wrote:
> On 16.05.2024 11:52, Jiqian Chen wrote:
>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>> a passthrough device by using gsi, see
>> xen_pt_realize->xc_physdev_map_pirq and
>> pci_add_dm_done->xc_physdev_map_pirq.
> 
> xen_pt_realize() is in qemu, which imo wants saying here (for being a different
> repo), the more that pci_add_dm_done() is in libxl.
OK, I will describe more here(in qemu and in libxl).

> 
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>      {
>>      case PHYSDEVOP_map_pirq:
>>      case PHYSDEVOP_unmap_pirq:
>> +        break;
> 
> I think this could do with a comment as to why it's permitted as well as giving
> a reference to where further restrictions are enforced (or simply mentioning
> the constraint of this only being permitted for management of other domains).
Thanks, will add
/* 
  * Only being permitted for management of other domains.
  * Further restrictions are enforced in do_physdev_op.
  */

> 
>> --- a/xen/arch/x86/physdev.c
>> +++ b/xen/arch/x86/physdev.c
>> @@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>      case PHYSDEVOP_map_pirq: {
>>          physdev_map_pirq_t map;
>>          struct msi_info msi;
>> +        struct domain *d;
>>  
>>          ret = -EFAULT;
>>          if ( copy_from_guest(&map, arg, 1) != 0 )
>>              break;
>>  
>> +        d = rcu_lock_domain_by_any_id(map.domid);
>> +        if ( d == NULL )
>> +            return -ESRCH;
>> +        /* If caller is the same HVM guest as current, check pirq flag */
> 
> The caller is always current. What I think you mean is "caller is same as
> the subject domain". 
Yes, I want to prevent self-map when subject domain(domU) doesn't have X86_EMU_USE_PIRQ flag.

> I'm also having trouble with seeing the usefulness of saying "check pirq flag". Instead I think you want to state the
> restriction here that you actually mean to enforce (which would also mean
> mentioning PVH in some way, to distinguish from the "normal HVM" case).
Yes, PVH and the HVM without X86_EMU_USE_PIRQ flag,
If a HVM has X86_EMU_USE_PIRQ flag, map_pirq should be permitted.

I will change this comment to be:
/* Prevent self-map when domain has no X86_EMU_USE_PIRQ flag */

> 
>> +        if ( !is_pv_domain(d) && !has_pirq(d) && map.domid == DOMID_SELF )
> 
> You exclude DOMID_SELF but not the domain's ID? Why not simply check d
> being current->domain, thus covering both cases? 
> Plus you could use rcu_lock_domain_by_id() to exclude DOMID_SELF, and you could use
> rcu_lock_remote_domain_by_id() to exclude the local domain altogether.
But there is a case that hvm hold PIRQ flag and DOMID_SELF id will do this pirq_map, see code
physdev_map_pirq.
I think change to check d being current->domain is more suitable.

> Finally I'm not even sure you need the RCU lock here (else you could
> use knownalive_domain_from_domid()). But perhaps that's better to cover
> the qemu-in-stubdom case, which we have to consider potentially malicious.
Yes, for potential safety reasons, let's keep the RCU lock.

> 
> I'm also inclined to suggest to use is_hvm_domain() here in favor of
> !is_pv_domain().
OK, will change to is_hvm_domain in next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-05-16 13:49   ` Jan Beulich
@ 2024-05-17  9:00     ` Chen, Jiqian
  2024-05-17  9:04       ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-05-17  9:00 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org, Chen, Jiqian

On 2024/5/16 21:49, Jan Beulich wrote:
> On 16.05.2024 11:52, Jiqian Chen wrote:
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -76,6 +76,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>      case PHYSDEVOP_unmap_pirq:
>>          break;
>>  
>> +    case PHYSDEVOP_setup_gsi:
>> +        if ( !is_hardware_domain(currd) )
>> +            return -EOPNOTSUPP;
>> +        break;
>> +
>>      case PHYSDEVOP_eoi:
>>      case PHYSDEVOP_irq_status_query:
>>      case PHYSDEVOP_get_free_pirq:
> 
> Below here we have a hardware-domain-only block already. Any reason not
> to add to there? Yes, that uses -ENOSYS. Imo that wants changing anyway,
> but even without that to me it would seem more consistent overall to have
> the new case simply added there.
Ah yes, I remembered you suggest me to use EOPNOTSUPP in v4, if change to ENOSYS is also fine, I will move to below in next version.

> 
> Just to double check: Is there a respective Linux patch already (if so,
> cross-linking the patches may be helpful)?
Yes, my corresponding kernel patch:
https://lore.kernel.org/lkml/20240515065011.13797-1-Jiqian.Chen@amd.com/T/#mc56b111562d7c67886da905e306f12b3ef8076b4 
Do you mean I need to add this link into the commit message once the kernel patch is accepted?
> Or does PVH Linux invoke the sub-op already anyway, just that so far it fails? 
> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  2024-05-17  9:00     ` Chen, Jiqian
@ 2024-05-17  9:04       ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2024-05-17  9:04 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Daniel P . Smith, Hildebrand, Stewart, Huang, Ray,
	xen-devel@lists.xenproject.org

On 17.05.2024 11:00, Chen, Jiqian wrote:
> On 2024/5/16 21:49, Jan Beulich wrote:
>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>> --- a/xen/arch/x86/hvm/hypercall.c
>>> +++ b/xen/arch/x86/hvm/hypercall.c
>>> @@ -76,6 +76,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>      case PHYSDEVOP_unmap_pirq:
>>>          break;
>>>  
>>> +    case PHYSDEVOP_setup_gsi:
>>> +        if ( !is_hardware_domain(currd) )
>>> +            return -EOPNOTSUPP;
>>> +        break;
>>> +
>>>      case PHYSDEVOP_eoi:
>>>      case PHYSDEVOP_irq_status_query:
>>>      case PHYSDEVOP_get_free_pirq:
>>
>> Below here we have a hardware-domain-only block already. Any reason not
>> to add to there? Yes, that uses -ENOSYS. Imo that wants changing anyway,
>> but even without that to me it would seem more consistent overall to have
>> the new case simply added there.
> Ah yes, I remembered you suggest me to use EOPNOTSUPP in v4, if change to ENOSYS is also fine, I will move to below in next version.
> 
>>
>> Just to double check: Is there a respective Linux patch already (if so,
>> cross-linking the patches may be helpful)?
> Yes, my corresponding kernel patch:
> https://lore.kernel.org/lkml/20240515065011.13797-1-Jiqian.Chen@amd.com/T/#mc56b111562d7c67886da905e306f12b3ef8076b4 
> Do you mean I need to add this link into the commit message once the kernel patch is accepted?

Not necessarily the commit message itself, but below the --- marker.
If the kernel patch was accepted earlier than the Xen one (which imo it
better shouldn't be), you'd likely want to move to pointing at the
resulting commit.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device
  2024-05-17  8:20       ` Jan Beulich
@ 2024-05-17  9:28         ` Chen, Jiqian
  2024-05-17  9:50           ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-05-17  9:28 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

On 2024/5/17 16:20, Jan Beulich wrote:
> On 17.05.2024 10:08, Chen, Jiqian wrote:
>> On 2024/5/16 21:08, Jan Beulich wrote:
>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>  struct physdev_pci_device {
>>>>      /* IN */
>>>>      uint16_t seg;
>>>
>>> Is re-using this struct for this new sub-op sufficient? IOW are all
>>> possible resets equal, and hence it doesn't need specifying what kind of
>>> reset was done? For example, other than FLR most reset variants reset all
>>> functions in one go aiui. Imo that would better require only a single
>>> hypercall, just to avoid possible confusion. It also reads as if FLR would
>>> not reset as many registers as other reset variants would.
>> If I understood correctly that you mean in this hypercall it needs to support resetting both one function and all functions of a slot(dev)?
>> But it can be done for caller to use a cycle to call this reset hypercall for each slot function.
> 
> It could, yes, but since (aiui) there needs to be an indication of the
> kind of reset anyway, we can as well avoid relying on the caller doing
> so (and at the same time simplify what the caller needs to do).
Since the corresponding kernel patch has been merged into linux_next branch
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240515&id=b272722511d5e8ae580f01830687b8a6b2717f01,
if it's not very mandatory and necessary, just let the caller handle it temporarily.
Or it can add a new hypercall to reset all functions in one go in future potential requirement, like PHYSDEVOP_pci_device_state_reset_all_func.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device
  2024-05-17  9:28         ` Chen, Jiqian
@ 2024-05-17  9:50           ` Jan Beulich
  2024-05-17 10:00             ` Chen, Jiqian
  2024-05-17 10:03             ` Jürgen Groß
  0 siblings, 2 replies; 49+ messages in thread
From: Jan Beulich @ 2024-05-17  9:50 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 17.05.2024 11:28, Chen, Jiqian wrote:
> On 2024/5/17 16:20, Jan Beulich wrote:
>> On 17.05.2024 10:08, Chen, Jiqian wrote:
>>> On 2024/5/16 21:08, Jan Beulich wrote:
>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>  struct physdev_pci_device {
>>>>>      /* IN */
>>>>>      uint16_t seg;
>>>>
>>>> Is re-using this struct for this new sub-op sufficient? IOW are all
>>>> possible resets equal, and hence it doesn't need specifying what kind of
>>>> reset was done? For example, other than FLR most reset variants reset all
>>>> functions in one go aiui. Imo that would better require only a single
>>>> hypercall, just to avoid possible confusion. It also reads as if FLR would
>>>> not reset as many registers as other reset variants would.
>>> If I understood correctly that you mean in this hypercall it needs to support resetting both one function and all functions of a slot(dev)?
>>> But it can be done for caller to use a cycle to call this reset hypercall for each slot function.
>>
>> It could, yes, but since (aiui) there needs to be an indication of the
>> kind of reset anyway, we can as well avoid relying on the caller doing
>> so (and at the same time simplify what the caller needs to do).
> Since the corresponding kernel patch has been merged into linux_next branch
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240515&id=b272722511d5e8ae580f01830687b8a6b2717f01,
> if it's not very mandatory and necessary, just let the caller handle it temporarily.

As also mentioned for the other patch having a corresponding kernel one:
The kernel patch would imo better not be merged until the new sub-op is
actually finalized.

> Or it can add a new hypercall to reset all functions in one go in future potential requirement, like PHYSDEVOP_pci_device_state_reset_all_func.

I disagree. We shouldn't introduce incomplete sub-ops. At the very least,
if you want to stick to the present form, I'd expect you to supply reasons
why distinguishing different reset forms is not necessary (now or later).

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device
  2024-05-17  9:50           ` Jan Beulich
@ 2024-05-17 10:00             ` Chen, Jiqian
  2024-05-17 10:31               ` Jan Beulich
  2024-05-17 10:03             ` Jürgen Groß
  1 sibling, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-05-17 10:00 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

On 2024/5/17 17:50, Jan Beulich wrote:
> On 17.05.2024 11:28, Chen, Jiqian wrote:
>> On 2024/5/17 16:20, Jan Beulich wrote:
>>> On 17.05.2024 10:08, Chen, Jiqian wrote:
>>>> On 2024/5/16 21:08, Jan Beulich wrote:
>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>  struct physdev_pci_device {
>>>>>>      /* IN */
>>>>>>      uint16_t seg;
>>>>>
>>>>> Is re-using this struct for this new sub-op sufficient? IOW are all
>>>>> possible resets equal, and hence it doesn't need specifying what kind of
>>>>> reset was done? For example, other than FLR most reset variants reset all
>>>>> functions in one go aiui. Imo that would better require only a single
>>>>> hypercall, just to avoid possible confusion. It also reads as if FLR would
>>>>> not reset as many registers as other reset variants would.
>>>> If I understood correctly that you mean in this hypercall it needs to support resetting both one function and all functions of a slot(dev)?
>>>> But it can be done for caller to use a cycle to call this reset hypercall for each slot function.
>>>
>>> It could, yes, but since (aiui) there needs to be an indication of the
>>> kind of reset anyway, we can as well avoid relying on the caller doing
>>> so (and at the same time simplify what the caller needs to do).
>> Since the corresponding kernel patch has been merged into linux_next branch
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240515&id=b272722511d5e8ae580f01830687b8a6b2717f01,
>> if it's not very mandatory and necessary, just let the caller handle it temporarily.
> 
> As also mentioned for the other patch having a corresponding kernel one:
> The kernel patch would imo better not be merged until the new sub-op is
> actually finalized.
OK, what should I do next step?
Upstream a patch to revert the merged patch on kernel side?

> 
>> Or it can add a new hypercall to reset all functions in one go in future potential requirement, like PHYSDEVOP_pci_device_state_reset_all_func.
> 
> I disagree. We shouldn't introduce incomplete sub-ops. At the very least,
> if you want to stick to the present form, I'd expect you to supply reasons
> why distinguishing different reset forms is not necessary (now or later).
OK, if want to distinguish different reset, is it acceptable to add a parameter, like "u8 flag", and reset every function if corresponding bit is 1?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device
  2024-05-17  9:50           ` Jan Beulich
  2024-05-17 10:00             ` Chen, Jiqian
@ 2024-05-17 10:03             ` Jürgen Groß
  2024-05-17 10:09               ` Chen, Jiqian
  1 sibling, 1 reply; 49+ messages in thread
From: Jürgen Groß @ 2024-05-17 10:03 UTC (permalink / raw
  To: Jan Beulich, Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 17.05.24 11:50, Jan Beulich wrote:
> On 17.05.2024 11:28, Chen, Jiqian wrote:
>> On 2024/5/17 16:20, Jan Beulich wrote:
>>> On 17.05.2024 10:08, Chen, Jiqian wrote:
>>>> On 2024/5/16 21:08, Jan Beulich wrote:
>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>   struct physdev_pci_device {
>>>>>>       /* IN */
>>>>>>       uint16_t seg;
>>>>>
>>>>> Is re-using this struct for this new sub-op sufficient? IOW are all
>>>>> possible resets equal, and hence it doesn't need specifying what kind of
>>>>> reset was done? For example, other than FLR most reset variants reset all
>>>>> functions in one go aiui. Imo that would better require only a single
>>>>> hypercall, just to avoid possible confusion. It also reads as if FLR would
>>>>> not reset as many registers as other reset variants would.
>>>> If I understood correctly that you mean in this hypercall it needs to support resetting both one function and all functions of a slot(dev)?
>>>> But it can be done for caller to use a cycle to call this reset hypercall for each slot function.
>>>
>>> It could, yes, but since (aiui) there needs to be an indication of the
>>> kind of reset anyway, we can as well avoid relying on the caller doing
>>> so (and at the same time simplify what the caller needs to do).
>> Since the corresponding kernel patch has been merged into linux_next branch
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240515&id=b272722511d5e8ae580f01830687b8a6b2717f01,
>> if it's not very mandatory and necessary, just let the caller handle it temporarily.
> 
> As also mentioned for the other patch having a corresponding kernel one:
> The kernel patch would imo better not be merged until the new sub-op is
> actually finalized.

Oh, sorry to have overlooked that the interfcae change isn't yet committed on
Xen side.

I'll drop the patch from my linux-next branch.


Juergen



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device
  2024-05-17 10:03             ` Jürgen Groß
@ 2024-05-17 10:09               ` Chen, Jiqian
  0 siblings, 0 replies; 49+ messages in thread
From: Chen, Jiqian @ 2024-05-17 10:09 UTC (permalink / raw
  To: Jürgen Groß, Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

Hi Juergen:

On 2024/5/17 18:03, Jürgen Groß wrote:
> On 17.05.24 11:50, Jan Beulich wrote:
>> On 17.05.2024 11:28, Chen, Jiqian wrote:
>>> On 2024/5/17 16:20, Jan Beulich wrote:
>>>> On 17.05.2024 10:08, Chen, Jiqian wrote:
>>>>> On 2024/5/16 21:08, Jan Beulich wrote:
>>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>>   struct physdev_pci_device {
>>>>>>>       /* IN */
>>>>>>>       uint16_t seg;
>>>>>>
>>>>>> Is re-using this struct for this new sub-op sufficient? IOW are all
>>>>>> possible resets equal, and hence it doesn't need specifying what kind of
>>>>>> reset was done? For example, other than FLR most reset variants reset all
>>>>>> functions in one go aiui. Imo that would better require only a single
>>>>>> hypercall, just to avoid possible confusion. It also reads as if FLR would
>>>>>> not reset as many registers as other reset variants would.
>>>>> If I understood correctly that you mean in this hypercall it needs to support resetting both one function and all functions of a slot(dev)?
>>>>> But it can be done for caller to use a cycle to call this reset hypercall for each slot function.
>>>>
>>>> It could, yes, but since (aiui) there needs to be an indication of the
>>>> kind of reset anyway, we can as well avoid relying on the caller doing
>>>> so (and at the same time simplify what the caller needs to do).
>>> Since the corresponding kernel patch has been merged into linux_next branch
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240515&id=b272722511d5e8ae580f01830687b8a6b2717f01,
>>> if it's not very mandatory and necessary, just let the caller handle it temporarily.
>>
>> As also mentioned for the other patch having a corresponding kernel one:
>> The kernel patch would imo better not be merged until the new sub-op is
>> actually finalized.
> 
> Oh, sorry to have overlooked that the interfcae change isn't yet committed on
> Xen side.
> 
> I'll drop the patch from my linux-next branch.
Thanks. I will modify my code according to Jan's requirements and send a new version soon.

> 
> 
> Juergen
> 

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device
  2024-05-17 10:00             ` Chen, Jiqian
@ 2024-05-17 10:31               ` Jan Beulich
  2024-05-17 11:01                 ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-05-17 10:31 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 17.05.2024 12:00, Chen, Jiqian wrote:
> On 2024/5/17 17:50, Jan Beulich wrote:
>> On 17.05.2024 11:28, Chen, Jiqian wrote:
>>> On 2024/5/17 16:20, Jan Beulich wrote:
>>>> On 17.05.2024 10:08, Chen, Jiqian wrote:
>>>>> On 2024/5/16 21:08, Jan Beulich wrote:
>>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>>  struct physdev_pci_device {
>>>>>>>      /* IN */
>>>>>>>      uint16_t seg;
>>>>>>
>>>>>> Is re-using this struct for this new sub-op sufficient? IOW are all
>>>>>> possible resets equal, and hence it doesn't need specifying what kind of
>>>>>> reset was done? For example, other than FLR most reset variants reset all
>>>>>> functions in one go aiui. Imo that would better require only a single
>>>>>> hypercall, just to avoid possible confusion. It also reads as if FLR would
>>>>>> not reset as many registers as other reset variants would.
>>>>> If I understood correctly that you mean in this hypercall it needs to support resetting both one function and all functions of a slot(dev)?
>>>>> But it can be done for caller to use a cycle to call this reset hypercall for each slot function.
>>>>
>>>> It could, yes, but since (aiui) there needs to be an indication of the
>>>> kind of reset anyway, we can as well avoid relying on the caller doing
>>>> so (and at the same time simplify what the caller needs to do).
>>> Since the corresponding kernel patch has been merged into linux_next branch
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240515&id=b272722511d5e8ae580f01830687b8a6b2717f01,
>>> if it's not very mandatory and necessary, just let the caller handle it temporarily.
>>
>> As also mentioned for the other patch having a corresponding kernel one:
>> The kernel patch would imo better not be merged until the new sub-op is
>> actually finalized.
> OK, what should I do next step?
> Upstream a patch to revert the merged patch on kernel side?
> 
>>
>>> Or it can add a new hypercall to reset all functions in one go in future potential requirement, like PHYSDEVOP_pci_device_state_reset_all_func.
>>
>> I disagree. We shouldn't introduce incomplete sub-ops. At the very least,
>> if you want to stick to the present form, I'd expect you to supply reasons
>> why distinguishing different reset forms is not necessary (now or later).
> OK, if want to distinguish different reset, is it acceptable to add a parameter, like "u8 flag", and reset every function if corresponding bit is 1?

I'm afraid a boolean won't do, at least not long term. I think it wants to
be an enumeration (i.e. a set of enumeration-like #define-s). And just to
stress it again: The extra argument is _not_ primarily for the looping over
all functions. It is to convey the kind of reset that was done. The single
vs all function(s) aspect is just a useful side effect this will have.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-16 14:01   ` Jan Beulich
@ 2024-05-17 10:45     ` Chen, Jiqian
  2024-05-17 10:51       ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-05-17 10:45 UTC (permalink / raw
  To: Jan Beulich, Daniel P . Smith
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Chen, Jiqian

On 2024/5/16 22:01, Jan Beulich wrote:
> On 16.05.2024 11:52, Jiqian Chen wrote:
>> Some type of domain don't have PIRQ, like PVH, when
>> passthrough a device to guest on PVH dom0, callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will failed
>> at domain_pirq_to_irq.
>>
>> So, add a new hypercall to grant/revoke gsi permission
>> when dom0 is not PV or dom0 has not PIRQ flag.
> 
> Honestly I find this hard to follow, and thus not really making clear why
> no other existing mechanism could be used.
Sorry, I will describe more clearly in next version.

> 
>> Signed-off-by: Huang Rui <ray.huang@amd.com>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
>> ---
> 
> Below here in an RFC patch you typically would want to put specific items
> you're seeking feedback on. Without that it's hard to tell why this is
> marked RFC.
OK, I will add " RFC: wait for the third patch on kernel side is accepted." in next version.

> 
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -237,6 +237,37 @@ long arch_do_domctl(
>>          break;
>>      }
>>  
>> +    case XEN_DOMCTL_gsi_permission:
>> +    {
>> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
>> +        int allow = domctl->u.gsi_permission.allow_access;
> 
> bool?
Will change.

> 
>> +        if ( is_pv_domain(current->domain) || has_pirq(current->domain) )
>> +        {
>> +            ret = -EOPNOTSUPP;
>> +            break;
>> +        }
> 
> Such a restriction imo wants explaining in a comment.
Will add in next version.

> 
>> +        if ( gsi >= nr_irqs_gsi )
>> +        {
>> +            ret = -EINVAL;
>> +            break;
>> +        }
>> +
>> +        if ( !irq_access_permitted(current->domain, gsi) ||
> 
> I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
> source overrides may be surfaced by ACPI?
All irqs smaller than nr_irqs_gsi are gsi, aren't they?

> 
>> +             xsm_irq_permission(XSM_HOOK, d, gsi, allow) )
> 
> Here I'm pretty sure you can't very well re-use an existing hook, as the
> value of interest is in a different numbering space, and a possible hook
> function has no way of knowing which one it is. Daniel?
> 
>> +        {
>> +            ret = -EPERM;
>> +            break;
>> +        }
>> +
>> +        if ( allow )
>> +            ret = irq_permit_access(d, gsi);
>> +        else
>> +            ret = irq_deny_access(d, gsi);
> 
> As above I'm afraid you can't assume IRQ == GSI.
> 
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -447,6 +447,13 @@ struct xen_domctl_irq_permission {
>>  };
>>  
>>  
>> +/* XEN_DOMCTL_gsi_permission */
>> +struct xen_domctl_gsi_permission {
>> +    uint32_t gsi;
>> +    uint8_t allow_access;    /* flag to specify enable/disable of x86 gsi access */
>> +};
> 
> Explicit padding please, including a check that it's zero on input.
Thanks, I will add in next version.

> 
>> +
>> +
>>  /* XEN_DOMCTL_iomem_permission */
> 
> No double blank lines please. In fact you will want to break the double blank
> lines in leading context, inserting in the middle.
Will remove one.
> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-17 10:45     ` Chen, Jiqian
@ 2024-05-17 10:51       ` Jan Beulich
  2024-05-17 11:14         ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-05-17 10:51 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 17.05.2024 12:45, Chen, Jiqian wrote:
> On 2024/5/16 22:01, Jan Beulich wrote:
>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>> +        if ( gsi >= nr_irqs_gsi )
>>> +        {
>>> +            ret = -EINVAL;
>>> +            break;
>>> +        }
>>> +
>>> +        if ( !irq_access_permitted(current->domain, gsi) ||
>>
>> I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
>> source overrides may be surfaced by ACPI?
> All irqs smaller than nr_irqs_gsi are gsi, aren't they?

They are, but there's not necessarily a 1:1 mapping.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device
  2024-05-17 10:31               ` Jan Beulich
@ 2024-05-17 11:01                 ` Chen, Jiqian
  2024-05-17 11:48                   ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-05-17 11:01 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

On 2024/5/17 18:31, Jan Beulich wrote:
> On 17.05.2024 12:00, Chen, Jiqian wrote:
>> On 2024/5/17 17:50, Jan Beulich wrote:
>>> On 17.05.2024 11:28, Chen, Jiqian wrote:
>>>> On 2024/5/17 16:20, Jan Beulich wrote:
>>>>> On 17.05.2024 10:08, Chen, Jiqian wrote:
>>>>>> On 2024/5/16 21:08, Jan Beulich wrote:
>>>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>>>  struct physdev_pci_device {
>>>>>>>>      /* IN */
>>>>>>>>      uint16_t seg;
>>>>>>>
>>>>>>> Is re-using this struct for this new sub-op sufficient? IOW are all
>>>>>>> possible resets equal, and hence it doesn't need specifying what kind of
>>>>>>> reset was done? For example, other than FLR most reset variants reset all
>>>>>>> functions in one go aiui. Imo that would better require only a single
>>>>>>> hypercall, just to avoid possible confusion. It also reads as if FLR would
>>>>>>> not reset as many registers as other reset variants would.
>>>>>> If I understood correctly that you mean in this hypercall it needs to support resetting both one function and all functions of a slot(dev)?
>>>>>> But it can be done for caller to use a cycle to call this reset hypercall for each slot function.
>>>>>
>>>>> It could, yes, but since (aiui) there needs to be an indication of the
>>>>> kind of reset anyway, we can as well avoid relying on the caller doing
>>>>> so (and at the same time simplify what the caller needs to do).
>>>> Since the corresponding kernel patch has been merged into linux_next branch
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240515&id=b272722511d5e8ae580f01830687b8a6b2717f01,
>>>> if it's not very mandatory and necessary, just let the caller handle it temporarily.
>>>
>>> As also mentioned for the other patch having a corresponding kernel one:
>>> The kernel patch would imo better not be merged until the new sub-op is
>>> actually finalized.
>> OK, what should I do next step?
>> Upstream a patch to revert the merged patch on kernel side?
>>
>>>
>>>> Or it can add a new hypercall to reset all functions in one go in future potential requirement, like PHYSDEVOP_pci_device_state_reset_all_func.
>>>
>>> I disagree. We shouldn't introduce incomplete sub-ops. At the very least,
>>> if you want to stick to the present form, I'd expect you to supply reasons
>>> why distinguishing different reset forms is not necessary (now or later).
>> OK, if want to distinguish different reset, is it acceptable to add a parameter, like "u8 flag", and reset every function if corresponding bit is 1?
> 
> I'm afraid a boolean won't do, at least not long term. I think it wants to
> be an enumeration (i.e. a set of enumeration-like #define-s). And just to
> stress it again: The extra argument is _not_ primarily for the looping over
> all functions. It is to convey the kind of reset that was done. The single
> vs all function(s) aspect is just a useful side effect this will have.
Do you mean, like:
enum RESET_DEVICE_STATE {
	RESET_DEVICE_SINGLE_FUNC,
	RESET_DEVICE_ALL_FUNC,
	Others
};
If caller pass in RESET_DEVICE_SINGLE_FUNC, I call what I add in this patch, as for other types call other functions if added in future?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-17 10:51       ` Jan Beulich
@ 2024-05-17 11:14         ` Chen, Jiqian
  2024-05-17 11:50           ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-05-17 11:14 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

On 2024/5/17 18:51, Jan Beulich wrote:
> On 17.05.2024 12:45, Chen, Jiqian wrote:
>> On 2024/5/16 22:01, Jan Beulich wrote:
>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>> +        if ( gsi >= nr_irqs_gsi )
>>>> +        {
>>>> +            ret = -EINVAL;
>>>> +            break;
>>>> +        }
>>>> +
>>>> +        if ( !irq_access_permitted(current->domain, gsi) ||
>>>
>>> I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
>>> source overrides may be surfaced by ACPI?
>> All irqs smaller than nr_irqs_gsi are gsi, aren't they?
> 
> They are, but there's not necessarily a 1:1 mapping.
Oh, so do I need to add a new gsi_caps to store granted gsi?
And Dom0 defaults to own all gsis permission?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device
  2024-05-17 11:01                 ` Chen, Jiqian
@ 2024-05-17 11:48                   ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2024-05-17 11:48 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 17.05.2024 13:01, Chen, Jiqian wrote:
> On 2024/5/17 18:31, Jan Beulich wrote:
>> On 17.05.2024 12:00, Chen, Jiqian wrote:
>>> On 2024/5/17 17:50, Jan Beulich wrote:
>>>> On 17.05.2024 11:28, Chen, Jiqian wrote:
>>>>> On 2024/5/17 16:20, Jan Beulich wrote:
>>>>>> On 17.05.2024 10:08, Chen, Jiqian wrote:
>>>>>>> On 2024/5/16 21:08, Jan Beulich wrote:
>>>>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>>>>  struct physdev_pci_device {
>>>>>>>>>      /* IN */
>>>>>>>>>      uint16_t seg;
>>>>>>>>
>>>>>>>> Is re-using this struct for this new sub-op sufficient? IOW are all
>>>>>>>> possible resets equal, and hence it doesn't need specifying what kind of
>>>>>>>> reset was done? For example, other than FLR most reset variants reset all
>>>>>>>> functions in one go aiui. Imo that would better require only a single
>>>>>>>> hypercall, just to avoid possible confusion. It also reads as if FLR would
>>>>>>>> not reset as many registers as other reset variants would.
>>>>>>> If I understood correctly that you mean in this hypercall it needs to support resetting both one function and all functions of a slot(dev)?
>>>>>>> But it can be done for caller to use a cycle to call this reset hypercall for each slot function.
>>>>>>
>>>>>> It could, yes, but since (aiui) there needs to be an indication of the
>>>>>> kind of reset anyway, we can as well avoid relying on the caller doing
>>>>>> so (and at the same time simplify what the caller needs to do).
>>>>> Since the corresponding kernel patch has been merged into linux_next branch
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240515&id=b272722511d5e8ae580f01830687b8a6b2717f01,
>>>>> if it's not very mandatory and necessary, just let the caller handle it temporarily.
>>>>
>>>> As also mentioned for the other patch having a corresponding kernel one:
>>>> The kernel patch would imo better not be merged until the new sub-op is
>>>> actually finalized.
>>> OK, what should I do next step?
>>> Upstream a patch to revert the merged patch on kernel side?
>>>
>>>>
>>>>> Or it can add a new hypercall to reset all functions in one go in future potential requirement, like PHYSDEVOP_pci_device_state_reset_all_func.
>>>>
>>>> I disagree. We shouldn't introduce incomplete sub-ops. At the very least,
>>>> if you want to stick to the present form, I'd expect you to supply reasons
>>>> why distinguishing different reset forms is not necessary (now or later).
>>> OK, if want to distinguish different reset, is it acceptable to add a parameter, like "u8 flag", and reset every function if corresponding bit is 1?
>>
>> I'm afraid a boolean won't do, at least not long term. I think it wants to
>> be an enumeration (i.e. a set of enumeration-like #define-s). And just to
>> stress it again: The extra argument is _not_ primarily for the looping over
>> all functions. It is to convey the kind of reset that was done. The single
>> vs all function(s) aspect is just a useful side effect this will have.
> Do you mean, like:
> enum RESET_DEVICE_STATE {
> 	RESET_DEVICE_SINGLE_FUNC,
> 	RESET_DEVICE_ALL_FUNC,
> 	Others
> };

No. What we need to be able to tell apart in the hypervisor is (at least)
FLR and Conventional Reset. I can't tell right away whether the sub-forms
of the latter also may need telling apart. I expect you to dive into that
and make a good proposal.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-17 11:14         ` Chen, Jiqian
@ 2024-05-17 11:50           ` Jan Beulich
  2024-05-29  2:41             ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-05-17 11:50 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 17.05.2024 13:14, Chen, Jiqian wrote:
> On 2024/5/17 18:51, Jan Beulich wrote:
>> On 17.05.2024 12:45, Chen, Jiqian wrote:
>>> On 2024/5/16 22:01, Jan Beulich wrote:
>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>> +        if ( gsi >= nr_irqs_gsi )
>>>>> +        {
>>>>> +            ret = -EINVAL;
>>>>> +            break;
>>>>> +        }
>>>>> +
>>>>> +        if ( !irq_access_permitted(current->domain, gsi) ||
>>>>
>>>> I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
>>>> source overrides may be surfaced by ACPI?
>>> All irqs smaller than nr_irqs_gsi are gsi, aren't they?
>>
>> They are, but there's not necessarily a 1:1 mapping.
> Oh, so do I need to add a new gsi_caps to store granted gsi?

Probably not. You ought to be able to translate between GSI and IRQ,
and then continue to record in / check against IRQ permissions.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device
  2024-05-17  8:08     ` Chen, Jiqian
  2024-05-17  8:20       ` Jan Beulich
@ 2024-05-17 13:15       ` Stewart Hildebrand
  1 sibling, 0 replies; 49+ messages in thread
From: Stewart Hildebrand @ 2024-05-17 13:15 UTC (permalink / raw
  To: Chen, Jiqian, Jan Beulich, Daniel P . Smith
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Huang, Ray, xen-devel@lists.xenproject.org

On 5/17/24 04:08, Chen, Jiqian wrote:
> On 2024/5/16 21:08, Jan Beulich wrote:
>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>> @@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>> +        pcidevs_lock();
>>> +        pdev = pci_get_pdev(NULL, sbdf);
>>> +        if ( !pdev )
>>> +        {
>>> +            pcidevs_unlock();
>>> +            ret = -ENODEV;
>>> +            break;
>>> +        }
>>> +
>>> +        write_lock(&pdev->domain->pci_lock);
>>> +        ret = vpci_reset_device_state(pdev);
>>> +        write_unlock(&pdev->domain->pci_lock);
>>> +        pcidevs_unlock();
>>
>> Can't this be done right after the write_lock()?
> You are right, I will move it in next version.

So that we are clear on the proposed order of calls here:

+        write_lock(&pdev->domain->pci_lock);
+        pcidevs_unlock();
+        ret = vpci_reset_device_state(pdev);
+        write_unlock(&pdev->domain->pci_lock);

>>> --- a/xen/drivers/vpci/vpci.c
>>> +++ b/xen/drivers/vpci/vpci.c
>>> @@ -115,6 +115,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>>  
>>>      return rc;
>>>  }
>>> +
>>> +int vpci_reset_device_state(struct pci_dev *pdev)
>>> +{
>>> +    ASSERT(pcidevs_locked());
>>
>> Is this necessary for ...
>>
>>> +    ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>>> +
>>> +    vpci_deassign_device(pdev);
>>> +    return vpci_assign_device(pdev);
>>
>> ... either of these calls? Both functions do themselves only have the
>> 2nd of the asserts you add.
> I checked codes again; I will remove the two asserts here in next version.

Nit: I think it's okay to keep the
ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-17 11:50           ` Jan Beulich
@ 2024-05-29  2:41             ` Chen, Jiqian
  2024-05-29  6:31               ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-05-29  2:41 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

Hi,
On 2024/5/17 19:50, Jan Beulich wrote:
> On 17.05.2024 13:14, Chen, Jiqian wrote:
>> On 2024/5/17 18:51, Jan Beulich wrote:
>>> On 17.05.2024 12:45, Chen, Jiqian wrote:
>>>> On 2024/5/16 22:01, Jan Beulich wrote:
>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>> +        if ( gsi >= nr_irqs_gsi )
>>>>>> +        {
>>>>>> +            ret = -EINVAL;
>>>>>> +            break;
>>>>>> +        }
>>>>>> +
>>>>>> +        if ( !irq_access_permitted(current->domain, gsi) ||
>>>>>
>>>>> I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
>>>>> source overrides may be surfaced by ACPI?
>>>> All irqs smaller than nr_irqs_gsi are gsi, aren't they?
>>>
>>> They are, but there's not necessarily a 1:1 mapping.
>> Oh, so do I need to add a new gsi_caps to store granted gsi?
> 
> Probably not. You ought to be able to translate between GSI and IRQ,
> and then continue to record in / check against IRQ permissions.
But I found in function init_irq_data:
    for ( irq = 0; irq < nr_irqs_gsi; irq++ )
    {
        int rc;

        desc = irq_to_desc(irq);
        desc->irq = irq;

        rc = init_one_irq_desc(desc);
        if ( rc )
            return rc;
    }
Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?
What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.

Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?
> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-29  2:41             ` Chen, Jiqian
@ 2024-05-29  6:31               ` Jan Beulich
  2024-05-29  6:56                 ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-05-29  6:31 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 29.05.2024 04:41, Chen, Jiqian wrote:
> Hi,
> On 2024/5/17 19:50, Jan Beulich wrote:
>> On 17.05.2024 13:14, Chen, Jiqian wrote:
>>> On 2024/5/17 18:51, Jan Beulich wrote:
>>>> On 17.05.2024 12:45, Chen, Jiqian wrote:
>>>>> On 2024/5/16 22:01, Jan Beulich wrote:
>>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>> +        if ( gsi >= nr_irqs_gsi )
>>>>>>> +        {
>>>>>>> +            ret = -EINVAL;
>>>>>>> +            break;
>>>>>>> +        }
>>>>>>> +
>>>>>>> +        if ( !irq_access_permitted(current->domain, gsi) ||
>>>>>>
>>>>>> I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
>>>>>> source overrides may be surfaced by ACPI?
>>>>> All irqs smaller than nr_irqs_gsi are gsi, aren't they?
>>>>
>>>> They are, but there's not necessarily a 1:1 mapping.
>>> Oh, so do I need to add a new gsi_caps to store granted gsi?
>>
>> Probably not. You ought to be able to translate between GSI and IRQ,
>> and then continue to record in / check against IRQ permissions.
> But I found in function init_irq_data:
>     for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>     {
>         int rc;
> 
>         desc = irq_to_desc(irq);
>         desc->irq = irq;
> 
>         rc = init_one_irq_desc(desc);
>         if ( rc )
>             return rc;
>     }
> Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?

No, as explained before. I also don't see how you would derive that from
the code above. "nr_irqs_gsi" describes what its name says: The number of
IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
mapping.

> What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.

Which may be wrong, while that wrong-ness may not have hit anyone in
practice (for reasons that would need working out).

> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?

Again - no.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-29  6:31               ` Jan Beulich
@ 2024-05-29  6:56                 ` Chen, Jiqian
  2024-05-29  7:10                   ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-05-29  6:56 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

On 2024/5/29 14:31, Jan Beulich wrote:
> On 29.05.2024 04:41, Chen, Jiqian wrote:
>> Hi,
>> On 2024/5/17 19:50, Jan Beulich wrote:
>>> On 17.05.2024 13:14, Chen, Jiqian wrote:
>>>> On 2024/5/17 18:51, Jan Beulich wrote:
>>>>> On 17.05.2024 12:45, Chen, Jiqian wrote:
>>>>>> On 2024/5/16 22:01, Jan Beulich wrote:
>>>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>>> +        if ( gsi >= nr_irqs_gsi )
>>>>>>>> +        {
>>>>>>>> +            ret = -EINVAL;
>>>>>>>> +            break;
>>>>>>>> +        }
>>>>>>>> +
>>>>>>>> +        if ( !irq_access_permitted(current->domain, gsi) ||
>>>>>>>
>>>>>>> I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
>>>>>>> source overrides may be surfaced by ACPI?
>>>>>> All irqs smaller than nr_irqs_gsi are gsi, aren't they?
>>>>>
>>>>> They are, but there's not necessarily a 1:1 mapping.
>>>> Oh, so do I need to add a new gsi_caps to store granted gsi?
>>>
>>> Probably not. You ought to be able to translate between GSI and IRQ,
>>> and then continue to record in / check against IRQ permissions.
>> But I found in function init_irq_data:
>>     for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>>     {
>>         int rc;
>>
>>         desc = irq_to_desc(irq);
>>         desc->irq = irq;
>>
>>         rc = init_one_irq_desc(desc);
>>         if ( rc )
>>             return rc;
>>     }
>> Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?
> 
> No, as explained before. I also don't see how you would derive that from the code above.
Because here set desc->irq = irq, and it seems there is no other place to change this desc->irq, so, gsi 1 is considered to irq 1.

> "nr_irqs_gsi" describes what its name says: The number of
> IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
> mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
> mapping.
> 
>> What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
>> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.
> 
> Which may be wrong, while that wrong-ness may not have hit anyone in
> practice (for reasons that would need working out).
> 
>> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?
> 
> Again - no.
Since you are certain that they are not equal, could you tell me where show they are not equal or where build their mappings,
so that I can know how to do a conversion gsi from irq.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-29  6:56                 ` Chen, Jiqian
@ 2024-05-29  7:10                   ` Jan Beulich
  2024-05-29 11:13                     ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-05-29  7:10 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 29.05.2024 08:56, Chen, Jiqian wrote:
> On 2024/5/29 14:31, Jan Beulich wrote:
>> On 29.05.2024 04:41, Chen, Jiqian wrote:
>>> Hi,
>>> On 2024/5/17 19:50, Jan Beulich wrote:
>>>> On 17.05.2024 13:14, Chen, Jiqian wrote:
>>>>> On 2024/5/17 18:51, Jan Beulich wrote:
>>>>>> On 17.05.2024 12:45, Chen, Jiqian wrote:
>>>>>>> On 2024/5/16 22:01, Jan Beulich wrote:
>>>>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>>>> +        if ( gsi >= nr_irqs_gsi )
>>>>>>>>> +        {
>>>>>>>>> +            ret = -EINVAL;
>>>>>>>>> +            break;
>>>>>>>>> +        }
>>>>>>>>> +
>>>>>>>>> +        if ( !irq_access_permitted(current->domain, gsi) ||
>>>>>>>>
>>>>>>>> I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
>>>>>>>> source overrides may be surfaced by ACPI?
>>>>>>> All irqs smaller than nr_irqs_gsi are gsi, aren't they?
>>>>>>
>>>>>> They are, but there's not necessarily a 1:1 mapping.
>>>>> Oh, so do I need to add a new gsi_caps to store granted gsi?
>>>>
>>>> Probably not. You ought to be able to translate between GSI and IRQ,
>>>> and then continue to record in / check against IRQ permissions.
>>> But I found in function init_irq_data:
>>>     for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>>>     {
>>>         int rc;
>>>
>>>         desc = irq_to_desc(irq);
>>>         desc->irq = irq;
>>>
>>>         rc = init_one_irq_desc(desc);
>>>         if ( rc )
>>>             return rc;
>>>     }
>>> Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?
>>
>> No, as explained before. I also don't see how you would derive that from the code above.
> Because here set desc->irq = irq, and it seems there is no other place to change this desc->irq, so, gsi 1 is considered to irq 1.

What are you taking this from? The loop bound isn't nr_gsis, and the iteration
variable isn't in GSI space either; it's in IRQ numbering space. In this loop
we're merely leveraging that every GSI has a corresponding IRQ; there are no
assumptions made about the mapping between the two. Afaics at least.

>> "nr_irqs_gsi" describes what its name says: The number of
>> IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
>> mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
>> mapping.
>>
>>> What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
>>> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.
>>
>> Which may be wrong, while that wrong-ness may not have hit anyone in
>> practice (for reasons that would need working out).
>>
>>> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?
>>
>> Again - no.
> Since you are certain that they are not equal, could you tell me where show they are not equal or where build their mappings,
> so that I can know how to do a conversion gsi from irq.

I did point you at the ACPI Interrupt Source Override structure before.
We're parsing those in acpi_parse_int_src_ovr(), to give you a place to
start going from.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-29  7:10                   ` Jan Beulich
@ 2024-05-29 11:13                     ` Chen, Jiqian
  2024-05-29 12:22                       ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-05-29 11:13 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

On 2024/5/29 15:10, Jan Beulich wrote:
> On 29.05.2024 08:56, Chen, Jiqian wrote:
>> On 2024/5/29 14:31, Jan Beulich wrote:
>>> On 29.05.2024 04:41, Chen, Jiqian wrote:
>>>> Hi,
>>>> On 2024/5/17 19:50, Jan Beulich wrote:
>>>>> On 17.05.2024 13:14, Chen, Jiqian wrote:
>>>>>> On 2024/5/17 18:51, Jan Beulich wrote:
>>>>>>> On 17.05.2024 12:45, Chen, Jiqian wrote:
>>>>>>>> On 2024/5/16 22:01, Jan Beulich wrote:
>>>>>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>>>>> +        if ( gsi >= nr_irqs_gsi )
>>>>>>>>>> +        {
>>>>>>>>>> +            ret = -EINVAL;
>>>>>>>>>> +            break;
>>>>>>>>>> +        }
>>>>>>>>>> +
>>>>>>>>>> +        if ( !irq_access_permitted(current->domain, gsi) ||
>>>>>>>>>
>>>>>>>>> I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
>>>>>>>>> source overrides may be surfaced by ACPI?
>>>>>>>> All irqs smaller than nr_irqs_gsi are gsi, aren't they?
>>>>>>>
>>>>>>> They are, but there's not necessarily a 1:1 mapping.
>>>>>> Oh, so do I need to add a new gsi_caps to store granted gsi?
>>>>>
>>>>> Probably not. You ought to be able to translate between GSI and IRQ,
>>>>> and then continue to record in / check against IRQ permissions.
>>>> But I found in function init_irq_data:
>>>>     for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>>>>     {
>>>>         int rc;
>>>>
>>>>         desc = irq_to_desc(irq);
>>>>         desc->irq = irq;
>>>>
>>>>         rc = init_one_irq_desc(desc);
>>>>         if ( rc )
>>>>             return rc;
>>>>     }
>>>> Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?
>>>
>>> No, as explained before. I also don't see how you would derive that from the code above.
>> Because here set desc->irq = irq, and it seems there is no other place to change this desc->irq, so, gsi 1 is considered to irq 1.
> 
> What are you taking this from? The loop bound isn't nr_gsis, and the iteration
> variable isn't in GSI space either; it's in IRQ numbering space. In this loop
> we're merely leveraging that every GSI has a corresponding IRQ;
> there are no assumptions made about the mapping between the two. Afaics at least.
> 
>>> "nr_irqs_gsi" describes what its name says: The number of
>>> IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
>>> mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
>>> mapping.
>>>
>>>> What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
>>>> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.
>>>
>>> Which may be wrong, while that wrong-ness may not have hit anyone in
>>> practice (for reasons that would need working out).
>>>
>>>> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?
>>>
>>> Again - no.
>> Since you are certain that they are not equal, could you tell me where show they are not equal or where build their mappings,
>> so that I can know how to do a conversion gsi from irq.
> 
> I did point you at the ACPI Interrupt Source Override structure before.
> We're parsing those in acpi_parse_int_src_ovr(), to give you a place to
> start going from.
Oh! I think I know.
If I want to transform gsi to irq, I need to do below:
	int irq, entry, ioapic, pin;

	ioapic = mp_find_ioapic(gsi);
	pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
	entry = find_irq_entry(ioapic, pin, mp_INT);
	irq = pin_2_irq(entry, ioapic, pin);

Am I right?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-29 11:13                     ` Chen, Jiqian
@ 2024-05-29 12:22                       ` Jan Beulich
  2024-05-30 11:19                         ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-05-29 12:22 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 29.05.2024 13:13, Chen, Jiqian wrote:
> On 2024/5/29 15:10, Jan Beulich wrote:
>> On 29.05.2024 08:56, Chen, Jiqian wrote:
>>> On 2024/5/29 14:31, Jan Beulich wrote:
>>>> On 29.05.2024 04:41, Chen, Jiqian wrote:
>>>>> But I found in function init_irq_data:
>>>>>     for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>>>>>     {
>>>>>         int rc;
>>>>>
>>>>>         desc = irq_to_desc(irq);
>>>>>         desc->irq = irq;
>>>>>
>>>>>         rc = init_one_irq_desc(desc);
>>>>>         if ( rc )
>>>>>             return rc;
>>>>>     }
>>>>> Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?
>>>>
>>>> No, as explained before. I also don't see how you would derive that from the code above.
>>> Because here set desc->irq = irq, and it seems there is no other place to change this desc->irq, so, gsi 1 is considered to irq 1.
>>
>> What are you taking this from? The loop bound isn't nr_gsis, and the iteration
>> variable isn't in GSI space either; it's in IRQ numbering space. In this loop
>> we're merely leveraging that every GSI has a corresponding IRQ;
>> there are no assumptions made about the mapping between the two. Afaics at least.
>>
>>>> "nr_irqs_gsi" describes what its name says: The number of
>>>> IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
>>>> mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
>>>> mapping.
>>>>
>>>>> What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
>>>>> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.
>>>>
>>>> Which may be wrong, while that wrong-ness may not have hit anyone in
>>>> practice (for reasons that would need working out).
>>>>
>>>>> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?
>>>>
>>>> Again - no.
>>> Since you are certain that they are not equal, could you tell me where show they are not equal or where build their mappings,
>>> so that I can know how to do a conversion gsi from irq.
>>
>> I did point you at the ACPI Interrupt Source Override structure before.
>> We're parsing those in acpi_parse_int_src_ovr(), to give you a place to
>> start going from.
> Oh! I think I know.
> If I want to transform gsi to irq, I need to do below:
> 	int irq, entry, ioapic, pin;
> 
> 	ioapic = mp_find_ioapic(gsi);
> 	pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
> 	entry = find_irq_entry(ioapic, pin, mp_INT);
> 	irq = pin_2_irq(entry, ioapic, pin);
> 
> Am I right?

This looks plausible, yes.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-29 12:22                       ` Jan Beulich
@ 2024-05-30 11:19                         ` Chen, Jiqian
  2024-05-30 15:51                           ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-05-30 11:19 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

On 2024/5/29 20:22, Jan Beulich wrote:
> On 29.05.2024 13:13, Chen, Jiqian wrote:
>> On 2024/5/29 15:10, Jan Beulich wrote:
>>> On 29.05.2024 08:56, Chen, Jiqian wrote:
>>>> On 2024/5/29 14:31, Jan Beulich wrote:
>>>>> On 29.05.2024 04:41, Chen, Jiqian wrote:
>>>>>> But I found in function init_irq_data:
>>>>>>     for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>>>>>>     {
>>>>>>         int rc;
>>>>>>
>>>>>>         desc = irq_to_desc(irq);
>>>>>>         desc->irq = irq;
>>>>>>
>>>>>>         rc = init_one_irq_desc(desc);
>>>>>>         if ( rc )
>>>>>>             return rc;
>>>>>>     }
>>>>>> Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?
>>>>>
>>>>> No, as explained before. I also don't see how you would derive that from the code above.
>>>> Because here set desc->irq = irq, and it seems there is no other place to change this desc->irq, so, gsi 1 is considered to irq 1.
>>>
>>> What are you taking this from? The loop bound isn't nr_gsis, and the iteration
>>> variable isn't in GSI space either; it's in IRQ numbering space. In this loop
>>> we're merely leveraging that every GSI has a corresponding IRQ;
>>> there are no assumptions made about the mapping between the two. Afaics at least.
>>>
>>>>> "nr_irqs_gsi" describes what its name says: The number of
>>>>> IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
>>>>> mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
>>>>> mapping.
>>>>>
>>>>>> What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
>>>>>> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.
>>>>>
>>>>> Which may be wrong, while that wrong-ness may not have hit anyone in
>>>>> practice (for reasons that would need working out).
>>>>>
>>>>>> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?
>>>>>
>>>>> Again - no.
>>>> Since you are certain that they are not equal, could you tell me where show they are not equal or where build their mappings,
>>>> so that I can know how to do a conversion gsi from irq.
>>>
>>> I did point you at the ACPI Interrupt Source Override structure before.
>>> We're parsing those in acpi_parse_int_src_ovr(), to give you a place to
>>> start going from.
>> Oh! I think I know.
>> If I want to transform gsi to irq, I need to do below:
>> 	int irq, entry, ioapic, pin;
>>
>> 	ioapic = mp_find_ioapic(gsi);
>> 	pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
>> 	entry = find_irq_entry(ioapic, pin, mp_INT);
>> 	irq = pin_2_irq(entry, ioapic, pin);
>>
>> Am I right?
> 
> This looks plausible, yes.
I dump all mpc_config_intsrc of array mp_irqs, it shows:
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 0 dstapic 33 dstirq 2
(XEN) find_irq_entry type 3 irqtype 0 irqflag 15 srcbus 0 srcbusirq 9 dstapic 33 dstirq 9
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 1 dstapic 33 dstirq 1
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 3 dstapic 33 dstirq 3
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 4 dstapic 33 dstirq 4
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 5 dstapic 33 dstirq 5
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 6 dstapic 33 dstirq 6
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 7 dstapic 33 dstirq 7
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 8 dstapic 33 dstirq 8
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 10 dstapic 33 dstirq 10
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 11 dstapic 33 dstirq 11
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 12 dstapic 33 dstirq 12
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 13 dstapic 33 dstirq 13
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 14 dstapic 33 dstirq 14
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 15 dstapic 33 dstirq 15

It seems only Legacy irq and gsi[0:15] has a mapping in mp_irqs.
Other gsi can be considered 1:1 mapping with irq? Or are there other places reflect the mapping between irq and gsi?
And my code will be:
    case XEN_DOMCTL_gsi_permission:
    {
        unsigned int gsi = domctl->u.gsi_permission.gsi;
        int irq;
        bool allow = domctl->u.gsi_permission.allow_access;

        /*
         * gsi[0,15] is not 1:1 mapping to legacy irq[0,15], it need a
         * transformation. Other gsi is considered be 1:1 mapping to irq
         */
        if ( gsi < 16 )
            irq = gsi_2_irq(gsi);
        else
            irq = gsi;

        /*
         * If current domain is PV or it has PIRQ flag, it has a mapping
         * of gsi, pirq and irq, so it should use XEN_DOMCTL_irq_permission
         * to grant irq permission.
         */
        if ( is_pv_domain(current->domain) || has_pirq(current->domain) )
        {
            ret = -EOPNOTSUPP;
            break;
        }

        if ( gsi >= nr_irqs_gsi || irq < 0 )
        {
            ret = -EINVAL;
            break;
        }

        if ( !irq_access_permitted(current->domain, irq) ||
             xsm_irq_permission(XSM_HOOK, d, irq, allow) )
        {
            ret = -EPERM;
            break;
        }

        if ( allow )
            ret = irq_permit_access(d, irq);
        else
            ret = irq_deny_access(d, irq);
        break;
    }

Is above acceptable?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-30 11:19                         ` Chen, Jiqian
@ 2024-05-30 15:51                           ` Jan Beulich
  2024-06-04  3:04                             ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-05-30 15:51 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 30.05.2024 13:19, Chen, Jiqian wrote:
> On 2024/5/29 20:22, Jan Beulich wrote:
>> On 29.05.2024 13:13, Chen, Jiqian wrote:
>>> On 2024/5/29 15:10, Jan Beulich wrote:
>>>> On 29.05.2024 08:56, Chen, Jiqian wrote:
>>>>> On 2024/5/29 14:31, Jan Beulich wrote:
>>>>>> On 29.05.2024 04:41, Chen, Jiqian wrote:
>>>>>>> But I found in function init_irq_data:
>>>>>>>     for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>>>>>>>     {
>>>>>>>         int rc;
>>>>>>>
>>>>>>>         desc = irq_to_desc(irq);
>>>>>>>         desc->irq = irq;
>>>>>>>
>>>>>>>         rc = init_one_irq_desc(desc);
>>>>>>>         if ( rc )
>>>>>>>             return rc;
>>>>>>>     }
>>>>>>> Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?
>>>>>>
>>>>>> No, as explained before. I also don't see how you would derive that from the code above.
>>>>> Because here set desc->irq = irq, and it seems there is no other place to change this desc->irq, so, gsi 1 is considered to irq 1.
>>>>
>>>> What are you taking this from? The loop bound isn't nr_gsis, and the iteration
>>>> variable isn't in GSI space either; it's in IRQ numbering space. In this loop
>>>> we're merely leveraging that every GSI has a corresponding IRQ;
>>>> there are no assumptions made about the mapping between the two. Afaics at least.
>>>>
>>>>>> "nr_irqs_gsi" describes what its name says: The number of
>>>>>> IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
>>>>>> mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
>>>>>> mapping.
>>>>>>
>>>>>>> What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
>>>>>>> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.
>>>>>>
>>>>>> Which may be wrong, while that wrong-ness may not have hit anyone in
>>>>>> practice (for reasons that would need working out).
>>>>>>
>>>>>>> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?
>>>>>>
>>>>>> Again - no.
>>>>> Since you are certain that they are not equal, could you tell me where show they are not equal or where build their mappings,
>>>>> so that I can know how to do a conversion gsi from irq.
>>>>
>>>> I did point you at the ACPI Interrupt Source Override structure before.
>>>> We're parsing those in acpi_parse_int_src_ovr(), to give you a place to
>>>> start going from.
>>> Oh! I think I know.
>>> If I want to transform gsi to irq, I need to do below:
>>> 	int irq, entry, ioapic, pin;
>>>
>>> 	ioapic = mp_find_ioapic(gsi);
>>> 	pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
>>> 	entry = find_irq_entry(ioapic, pin, mp_INT);
>>> 	irq = pin_2_irq(entry, ioapic, pin);
>>>
>>> Am I right?
>>
>> This looks plausible, yes.
> I dump all mpc_config_intsrc of array mp_irqs, it shows:
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 0 dstapic 33 dstirq 2
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 15 srcbus 0 srcbusirq 9 dstapic 33 dstirq 9
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 1 dstapic 33 dstirq 1
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 3 dstapic 33 dstirq 3
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 4 dstapic 33 dstirq 4
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 5 dstapic 33 dstirq 5
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 6 dstapic 33 dstirq 6
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 7 dstapic 33 dstirq 7
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 8 dstapic 33 dstirq 8
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 10 dstapic 33 dstirq 10
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 11 dstapic 33 dstirq 11
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 12 dstapic 33 dstirq 12
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 13 dstapic 33 dstirq 13
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 14 dstapic 33 dstirq 14
> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 15 dstapic 33 dstirq 15
> 
> It seems only Legacy irq and gsi[0:15] has a mapping in mp_irqs.
> Other gsi can be considered 1:1 mapping with irq? Or are there other places reflect the mapping between irq and gsi?

It may be uncommon to have overrides for higher GSIs, but I don't think ACPI
disallows that.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-05-30 15:51                           ` Jan Beulich
@ 2024-06-04  3:04                             ` Chen, Jiqian
  2024-06-04  5:55                               ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-06-04  3:04 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

On 2024/5/30 23:51, Jan Beulich wrote:
> On 30.05.2024 13:19, Chen, Jiqian wrote:
>> On 2024/5/29 20:22, Jan Beulich wrote:
>>> On 29.05.2024 13:13, Chen, Jiqian wrote:
>>>> On 2024/5/29 15:10, Jan Beulich wrote:
>>>>> On 29.05.2024 08:56, Chen, Jiqian wrote:
>>>>>> On 2024/5/29 14:31, Jan Beulich wrote:
>>>>>>> On 29.05.2024 04:41, Chen, Jiqian wrote:
>>>>>>>> But I found in function init_irq_data:
>>>>>>>>     for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>>>>>>>>     {
>>>>>>>>         int rc;
>>>>>>>>
>>>>>>>>         desc = irq_to_desc(irq);
>>>>>>>>         desc->irq = irq;
>>>>>>>>
>>>>>>>>         rc = init_one_irq_desc(desc);
>>>>>>>>         if ( rc )
>>>>>>>>             return rc;
>>>>>>>>     }
>>>>>>>> Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?
>>>>>>>
>>>>>>> No, as explained before. I also don't see how you would derive that from the code above.
>>>>>> Because here set desc->irq = irq, and it seems there is no other place to change this desc->irq, so, gsi 1 is considered to irq 1.
>>>>>
>>>>> What are you taking this from? The loop bound isn't nr_gsis, and the iteration
>>>>> variable isn't in GSI space either; it's in IRQ numbering space. In this loop
>>>>> we're merely leveraging that every GSI has a corresponding IRQ;
>>>>> there are no assumptions made about the mapping between the two. Afaics at least.
>>>>>
>>>>>>> "nr_irqs_gsi" describes what its name says: The number of
>>>>>>> IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
>>>>>>> mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
>>>>>>> mapping.
>>>>>>>
>>>>>>>> What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
>>>>>>>> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.
>>>>>>>
>>>>>>> Which may be wrong, while that wrong-ness may not have hit anyone in
>>>>>>> practice (for reasons that would need working out).
>>>>>>>
>>>>>>>> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?
>>>>>>>
>>>>>>> Again - no.
>>>>>> Since you are certain that they are not equal, could you tell me where show they are not equal or where build their mappings,
>>>>>> so that I can know how to do a conversion gsi from irq.
>>>>>
>>>>> I did point you at the ACPI Interrupt Source Override structure before.
>>>>> We're parsing those in acpi_parse_int_src_ovr(), to give you a place to
>>>>> start going from.
>>>> Oh! I think I know.
>>>> If I want to transform gsi to irq, I need to do below:
>>>> 	int irq, entry, ioapic, pin;
>>>>
>>>> 	ioapic = mp_find_ioapic(gsi);
>>>> 	pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
>>>> 	entry = find_irq_entry(ioapic, pin, mp_INT);
>>>> 	irq = pin_2_irq(entry, ioapic, pin);
>>>>
>>>> Am I right?
>>>
>>> This looks plausible, yes.
>> I dump all mpc_config_intsrc of array mp_irqs, it shows:
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 0 dstapic 33 dstirq 2
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 15 srcbus 0 srcbusirq 9 dstapic 33 dstirq 9
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 1 dstapic 33 dstirq 1
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 3 dstapic 33 dstirq 3
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 4 dstapic 33 dstirq 4
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 5 dstapic 33 dstirq 5
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 6 dstapic 33 dstirq 6
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 7 dstapic 33 dstirq 7
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 8 dstapic 33 dstirq 8
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 10 dstapic 33 dstirq 10
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 11 dstapic 33 dstirq 11
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 12 dstapic 33 dstirq 12
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 13 dstapic 33 dstirq 13
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 14 dstapic 33 dstirq 14
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 15 dstapic 33 dstirq 15
>>
>> It seems only Legacy irq and gsi[0:15] has a mapping in mp_irqs.
>> Other gsi can be considered 1:1 mapping with irq? Or are there other places reflect the mapping between irq and gsi?
> 
> It may be uncommon to have overrides for higher GSIs, but I don't think ACPI
> disallows that.
Do you suggest me to add overrides for higher GSIs into array mp_irqs?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-04  3:04                             ` Chen, Jiqian
@ 2024-06-04  5:55                               ` Jan Beulich
  2024-06-04  6:01                                 ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-06-04  5:55 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 04.06.2024 05:04, Chen, Jiqian wrote:
> On 2024/5/30 23:51, Jan Beulich wrote:
>> On 30.05.2024 13:19, Chen, Jiqian wrote:
>>> On 2024/5/29 20:22, Jan Beulich wrote:
>>>> On 29.05.2024 13:13, Chen, Jiqian wrote:
>>>>> On 2024/5/29 15:10, Jan Beulich wrote:
>>>>>> On 29.05.2024 08:56, Chen, Jiqian wrote:
>>>>>>> On 2024/5/29 14:31, Jan Beulich wrote:
>>>>>>>> On 29.05.2024 04:41, Chen, Jiqian wrote:
>>>>>>>>> But I found in function init_irq_data:
>>>>>>>>>     for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>>>>>>>>>     {
>>>>>>>>>         int rc;
>>>>>>>>>
>>>>>>>>>         desc = irq_to_desc(irq);
>>>>>>>>>         desc->irq = irq;
>>>>>>>>>
>>>>>>>>>         rc = init_one_irq_desc(desc);
>>>>>>>>>         if ( rc )
>>>>>>>>>             return rc;
>>>>>>>>>     }
>>>>>>>>> Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?
>>>>>>>>
>>>>>>>> No, as explained before. I also don't see how you would derive that from the code above.
>>>>>>> Because here set desc->irq = irq, and it seems there is no other place to change this desc->irq, so, gsi 1 is considered to irq 1.
>>>>>>
>>>>>> What are you taking this from? The loop bound isn't nr_gsis, and the iteration
>>>>>> variable isn't in GSI space either; it's in IRQ numbering space. In this loop
>>>>>> we're merely leveraging that every GSI has a corresponding IRQ;
>>>>>> there are no assumptions made about the mapping between the two. Afaics at least.
>>>>>>
>>>>>>>> "nr_irqs_gsi" describes what its name says: The number of
>>>>>>>> IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
>>>>>>>> mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
>>>>>>>> mapping.
>>>>>>>>
>>>>>>>>> What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
>>>>>>>>> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.
>>>>>>>>
>>>>>>>> Which may be wrong, while that wrong-ness may not have hit anyone in
>>>>>>>> practice (for reasons that would need working out).
>>>>>>>>
>>>>>>>>> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?
>>>>>>>>
>>>>>>>> Again - no.
>>>>>>> Since you are certain that they are not equal, could you tell me where show they are not equal or where build their mappings,
>>>>>>> so that I can know how to do a conversion gsi from irq.
>>>>>>
>>>>>> I did point you at the ACPI Interrupt Source Override structure before.
>>>>>> We're parsing those in acpi_parse_int_src_ovr(), to give you a place to
>>>>>> start going from.
>>>>> Oh! I think I know.
>>>>> If I want to transform gsi to irq, I need to do below:
>>>>> 	int irq, entry, ioapic, pin;
>>>>>
>>>>> 	ioapic = mp_find_ioapic(gsi);
>>>>> 	pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
>>>>> 	entry = find_irq_entry(ioapic, pin, mp_INT);
>>>>> 	irq = pin_2_irq(entry, ioapic, pin);
>>>>>
>>>>> Am I right?
>>>>
>>>> This looks plausible, yes.
>>> I dump all mpc_config_intsrc of array mp_irqs, it shows:
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 0 dstapic 33 dstirq 2
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 15 srcbus 0 srcbusirq 9 dstapic 33 dstirq 9
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 1 dstapic 33 dstirq 1
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 3 dstapic 33 dstirq 3
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 4 dstapic 33 dstirq 4
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 5 dstapic 33 dstirq 5
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 6 dstapic 33 dstirq 6
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 7 dstapic 33 dstirq 7
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 8 dstapic 33 dstirq 8
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 10 dstapic 33 dstirq 10
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 11 dstapic 33 dstirq 11
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 12 dstapic 33 dstirq 12
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 13 dstapic 33 dstirq 13
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 14 dstapic 33 dstirq 14
>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 15 dstapic 33 dstirq 15
>>>
>>> It seems only Legacy irq and gsi[0:15] has a mapping in mp_irqs.
>>> Other gsi can be considered 1:1 mapping with irq? Or are there other places reflect the mapping between irq and gsi?
>>
>> It may be uncommon to have overrides for higher GSIs, but I don't think ACPI
>> disallows that.
> Do you suggest me to add overrides for higher GSIs into array mp_irqs?

Why "add"? That's what mp_override_legacy_irq() already does, isn't it?
Assuming of course any are surfaced at all by ACPI.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-04  5:55                               ` Jan Beulich
@ 2024-06-04  6:01                                 ` Chen, Jiqian
  2024-06-04  6:12                                   ` Jan Beulich
  2024-06-04  6:19                                   ` Jan Beulich
  0 siblings, 2 replies; 49+ messages in thread
From: Chen, Jiqian @ 2024-06-04  6:01 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

On 2024/6/4 13:55, Jan Beulich wrote:
> On 04.06.2024 05:04, Chen, Jiqian wrote:
>> On 2024/5/30 23:51, Jan Beulich wrote:
>>> On 30.05.2024 13:19, Chen, Jiqian wrote:
>>>> On 2024/5/29 20:22, Jan Beulich wrote:
>>>>> On 29.05.2024 13:13, Chen, Jiqian wrote:
>>>>>> On 2024/5/29 15:10, Jan Beulich wrote:
>>>>>>> On 29.05.2024 08:56, Chen, Jiqian wrote:
>>>>>>>> On 2024/5/29 14:31, Jan Beulich wrote:
>>>>>>>>> On 29.05.2024 04:41, Chen, Jiqian wrote:
>>>>>>>>>> But I found in function init_irq_data:
>>>>>>>>>>     for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>>>>>>>>>>     {
>>>>>>>>>>         int rc;
>>>>>>>>>>
>>>>>>>>>>         desc = irq_to_desc(irq);
>>>>>>>>>>         desc->irq = irq;
>>>>>>>>>>
>>>>>>>>>>         rc = init_one_irq_desc(desc);
>>>>>>>>>>         if ( rc )
>>>>>>>>>>             return rc;
>>>>>>>>>>     }
>>>>>>>>>> Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?
>>>>>>>>>
>>>>>>>>> No, as explained before. I also don't see how you would derive that from the code above.
>>>>>>>> Because here set desc->irq = irq, and it seems there is no other place to change this desc->irq, so, gsi 1 is considered to irq 1.
>>>>>>>
>>>>>>> What are you taking this from? The loop bound isn't nr_gsis, and the iteration
>>>>>>> variable isn't in GSI space either; it's in IRQ numbering space. In this loop
>>>>>>> we're merely leveraging that every GSI has a corresponding IRQ;
>>>>>>> there are no assumptions made about the mapping between the two. Afaics at least.
>>>>>>>
>>>>>>>>> "nr_irqs_gsi" describes what its name says: The number of
>>>>>>>>> IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
>>>>>>>>> mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
>>>>>>>>> mapping.
>>>>>>>>>
>>>>>>>>>> What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
>>>>>>>>>> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.
>>>>>>>>>
>>>>>>>>> Which may be wrong, while that wrong-ness may not have hit anyone in
>>>>>>>>> practice (for reasons that would need working out).
>>>>>>>>>
>>>>>>>>>> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?
>>>>>>>>>
>>>>>>>>> Again - no.
>>>>>>>> Since you are certain that they are not equal, could you tell me where show they are not equal or where build their mappings,
>>>>>>>> so that I can know how to do a conversion gsi from irq.
>>>>>>>
>>>>>>> I did point you at the ACPI Interrupt Source Override structure before.
>>>>>>> We're parsing those in acpi_parse_int_src_ovr(), to give you a place to
>>>>>>> start going from.
>>>>>> Oh! I think I know.
>>>>>> If I want to transform gsi to irq, I need to do below:
>>>>>> 	int irq, entry, ioapic, pin;
>>>>>>
>>>>>> 	ioapic = mp_find_ioapic(gsi);
>>>>>> 	pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
>>>>>> 	entry = find_irq_entry(ioapic, pin, mp_INT);
>>>>>> 	irq = pin_2_irq(entry, ioapic, pin);
>>>>>>
>>>>>> Am I right?
>>>>>
>>>>> This looks plausible, yes.
>>>> I dump all mpc_config_intsrc of array mp_irqs, it shows:
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 0 dstapic 33 dstirq 2
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 15 srcbus 0 srcbusirq 9 dstapic 33 dstirq 9
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 1 dstapic 33 dstirq 1
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 3 dstapic 33 dstirq 3
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 4 dstapic 33 dstirq 4
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 5 dstapic 33 dstirq 5
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 6 dstapic 33 dstirq 6
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 7 dstapic 33 dstirq 7
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 8 dstapic 33 dstirq 8
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 10 dstapic 33 dstirq 10
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 11 dstapic 33 dstirq 11
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 12 dstapic 33 dstirq 12
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 13 dstapic 33 dstirq 13
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 14 dstapic 33 dstirq 14
>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 15 dstapic 33 dstirq 15
>>>>
>>>> It seems only Legacy irq and gsi[0:15] has a mapping in mp_irqs.
>>>> Other gsi can be considered 1:1 mapping with irq? Or are there other places reflect the mapping between irq and gsi?
>>>
>>> It may be uncommon to have overrides for higher GSIs, but I don't think ACPI
>>> disallows that.
>> Do you suggest me to add overrides for higher GSIs into array mp_irqs?
> 
> Why "add"? That's what mp_override_legacy_irq() already does, isn't it?
No. mp_override_legacy_irq only overrides for gsi < 16, but not for gsi >= 16(I dump all mappings from array mp_irqs).
In my environment, gsi of my dGPU is 24.
So, how do I process for gsi >= 16?

> Assuming of course any are surfaced at all by ACPI.
> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-04  6:01                                 ` Chen, Jiqian
@ 2024-06-04  6:12                                   ` Jan Beulich
  2024-06-04  6:33                                     ` Chen, Jiqian
  2024-06-04  6:19                                   ` Jan Beulich
  1 sibling, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-06-04  6:12 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 04.06.2024 08:01, Chen, Jiqian wrote:
> On 2024/6/4 13:55, Jan Beulich wrote:
>> On 04.06.2024 05:04, Chen, Jiqian wrote:
>>> On 2024/5/30 23:51, Jan Beulich wrote:
>>>> On 30.05.2024 13:19, Chen, Jiqian wrote:
>>>>> I dump all mpc_config_intsrc of array mp_irqs, it shows:
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 0 dstapic 33 dstirq 2
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 15 srcbus 0 srcbusirq 9 dstapic 33 dstirq 9
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 1 dstapic 33 dstirq 1
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 3 dstapic 33 dstirq 3
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 4 dstapic 33 dstirq 4
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 5 dstapic 33 dstirq 5
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 6 dstapic 33 dstirq 6
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 7 dstapic 33 dstirq 7
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 8 dstapic 33 dstirq 8
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 10 dstapic 33 dstirq 10
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 11 dstapic 33 dstirq 11
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 12 dstapic 33 dstirq 12
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 13 dstapic 33 dstirq 13
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 14 dstapic 33 dstirq 14
>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 15 dstapic 33 dstirq 15
>>>>>
>>>>> It seems only Legacy irq and gsi[0:15] has a mapping in mp_irqs.
>>>>> Other gsi can be considered 1:1 mapping with irq? Or are there other places reflect the mapping between irq and gsi?
>>>>
>>>> It may be uncommon to have overrides for higher GSIs, but I don't think ACPI
>>>> disallows that.
>>> Do you suggest me to add overrides for higher GSIs into array mp_irqs?
>>
>> Why "add"? That's what mp_override_legacy_irq() already does, isn't it?
> No. mp_override_legacy_irq only overrides for gsi < 16, but not for gsi >= 16(I dump all mappings from array mp_irqs).

I assume you mean you observe so ...

> In my environment, gsi of my dGPU is 24.

... on one specific system? The function is invoked from
acpi_parse_int_src_ovr(), and I can't spot any restriction to
IRQs less than 16 there.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-04  6:01                                 ` Chen, Jiqian
  2024-06-04  6:12                                   ` Jan Beulich
@ 2024-06-04  6:19                                   ` Jan Beulich
  1 sibling, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2024-06-04  6:19 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 04.06.2024 08:01, Chen, Jiqian wrote:
> So, how do I process for gsi >= 16?

Oh, and to answer this as well: Isn't it as simple as treating
as 1:1 mapped any (valid) GSI you can't find in mp_irqs[]? You
only care about the mapping, not e.g. polarity or trigger mode
here, iirc.

Jan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-04  6:12                                   ` Jan Beulich
@ 2024-06-04  6:33                                     ` Chen, Jiqian
  2024-06-04  6:36                                       ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-06-04  6:33 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

On 2024/6/4 14:12, Jan Beulich wrote:
> On 04.06.2024 08:01, Chen, Jiqian wrote:
>> On 2024/6/4 13:55, Jan Beulich wrote:
>>> On 04.06.2024 05:04, Chen, Jiqian wrote:
>>>> On 2024/5/30 23:51, Jan Beulich wrote:
>>>>> On 30.05.2024 13:19, Chen, Jiqian wrote:
>>>>>> I dump all mpc_config_intsrc of array mp_irqs, it shows:
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 0 dstapic 33 dstirq 2
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 15 srcbus 0 srcbusirq 9 dstapic 33 dstirq 9
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 1 dstapic 33 dstirq 1
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 3 dstapic 33 dstirq 3
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 4 dstapic 33 dstirq 4
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 5 dstapic 33 dstirq 5
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 6 dstapic 33 dstirq 6
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 7 dstapic 33 dstirq 7
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 8 dstapic 33 dstirq 8
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 10 dstapic 33 dstirq 10
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 11 dstapic 33 dstirq 11
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 12 dstapic 33 dstirq 12
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 13 dstapic 33 dstirq 13
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 14 dstapic 33 dstirq 14
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 15 dstapic 33 dstirq 15
>>>>>>
>>>>>> It seems only Legacy irq and gsi[0:15] has a mapping in mp_irqs.
>>>>>> Other gsi can be considered 1:1 mapping with irq? Or are there other places reflect the mapping between irq and gsi?
>>>>>
>>>>> It may be uncommon to have overrides for higher GSIs, but I don't think ACPI
>>>>> disallows that.
>>>> Do you suggest me to add overrides for higher GSIs into array mp_irqs?
>>>
>>> Why "add"? That's what mp_override_legacy_irq() already does, isn't it?
>> No. mp_override_legacy_irq only overrides for gsi < 16, but not for gsi >= 16(I dump all mappings from array mp_irqs).
> 
> I assume you mean you observe so ...
No, after starting xen pvh dom0, I dump all entries from mp_irqs.

> 
>> In my environment, gsi of my dGPU is 24.
> 
> ... on one specific system? The function is invoked from
> acpi_parse_int_src_ovr(), and I can't spot any restriction to
> IRQs less than 16 there.
I didn't see any restriction too, but from the dump results, there are only 16 entries, see previous email. 

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-04  6:33                                     ` Chen, Jiqian
@ 2024-06-04  6:36                                       ` Jan Beulich
  2024-06-04  8:18                                         ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-06-04  6:36 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 04.06.2024 08:33, Chen, Jiqian wrote:
> On 2024/6/4 14:12, Jan Beulich wrote:
>> On 04.06.2024 08:01, Chen, Jiqian wrote:
>>> On 2024/6/4 13:55, Jan Beulich wrote:
>>>> On 04.06.2024 05:04, Chen, Jiqian wrote:
>>>>> On 2024/5/30 23:51, Jan Beulich wrote:
>>>>>> On 30.05.2024 13:19, Chen, Jiqian wrote:
>>>>>>> It seems only Legacy irq and gsi[0:15] has a mapping in mp_irqs.
>>>>>>> Other gsi can be considered 1:1 mapping with irq? Or are there other places reflect the mapping between irq and gsi?
>>>>>>
>>>>>> It may be uncommon to have overrides for higher GSIs, but I don't think ACPI
>>>>>> disallows that.
>>>>> Do you suggest me to add overrides for higher GSIs into array mp_irqs?
>>>>
>>>> Why "add"? That's what mp_override_legacy_irq() already does, isn't it?
>>> No. mp_override_legacy_irq only overrides for gsi < 16, but not for gsi >= 16(I dump all mappings from array mp_irqs).
>>
>> I assume you mean you observe so ...
> No, after starting xen pvh dom0, I dump all entries from mp_irqs.

IOW really your answer is "yes" ...

>>> In my environment, gsi of my dGPU is 24.
>>
>> ... on one specific system?

... to this question I raised. Whatever you dump on any number of
systems, there's always the chance that there's another system
where things are different.

>> The function is invoked from
>> acpi_parse_int_src_ovr(), and I can't spot any restriction to
>> IRQs less than 16 there.
> I didn't see any restriction too, but from the dump results, there are only 16 entries, see previous email. 

Hence why I tried to point out that going from observations on a
particular system isn't enough.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-04  6:36                                       ` Jan Beulich
@ 2024-06-04  8:18                                         ` Chen, Jiqian
  2024-06-04 17:17                                           ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-06-04  8:18 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

On 2024/6/4 14:36, Jan Beulich wrote:
> On 04.06.2024 08:33, Chen, Jiqian wrote:
>> On 2024/6/4 14:12, Jan Beulich wrote:
>>> On 04.06.2024 08:01, Chen, Jiqian wrote:
>>>> On 2024/6/4 13:55, Jan Beulich wrote:
>>>>> On 04.06.2024 05:04, Chen, Jiqian wrote:
>>>>>> On 2024/5/30 23:51, Jan Beulich wrote:
>>>>>>> On 30.05.2024 13:19, Chen, Jiqian wrote:
>>>>>>>> It seems only Legacy irq and gsi[0:15] has a mapping in mp_irqs.
>>>>>>>> Other gsi can be considered 1:1 mapping with irq? Or are there other places reflect the mapping between irq and gsi?
>>>>>>>
>>>>>>> It may be uncommon to have overrides for higher GSIs, but I don't think ACPI
>>>>>>> disallows that.
>>>>>> Do you suggest me to add overrides for higher GSIs into array mp_irqs?
>>>>>
>>>>> Why "add"? That's what mp_override_legacy_irq() already does, isn't it?
>>>> No. mp_override_legacy_irq only overrides for gsi < 16, but not for gsi >= 16(I dump all mappings from array mp_irqs).
>>>
>>> I assume you mean you observe so ...
>> No, after starting xen pvh dom0, I dump all entries from mp_irqs.
> 
> IOW really your answer is "yes" ...
> 
>>>> In my environment, gsi of my dGPU is 24.
>>>
>>> ... on one specific system?
> 
> ... to this question I raised. Whatever you dump on any number of
> systems, there's always the chance that there's another system
> where things are different.
> 
>>> The function is invoked from
>>> acpi_parse_int_src_ovr(), and I can't spot any restriction to
>>> IRQs less than 16 there.
>> I didn't see any restriction too, but from the dump results, there are only 16 entries, see previous email. 
> 
> Hence why I tried to point out that going from observations on a
> particular system isn't enough.
Anyway, I agree with you that I need to get mapping from mp_irqs.
I tried to get more debug information from my environment. And I attach them here, maybe you can find some problems.
acpi_parse_madt_ioapic_entries
	acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_parse_int_src_ovr, MAX_IRQ_SOURCES);
		acpi_parse_int_src_ovr
			mp_override_legacy_irq
				only process two entries, irq 0 gsi 2 and irq 9 gsi 9
There are only two entries whose type is ACPI_MADT_TYPE_INTERRUPT_OVERRIDE in MADT table. Is it normal?
And
acpi_parse_madt_ioapic_entries
	mp_config_acpi_legacy_irqs
		process the other GSIs(< 16), so that the total number of mp_irqs is 16.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-04  8:18                                         ` Chen, Jiqian
@ 2024-06-04 17:17                                           ` Jan Beulich
  2024-06-05  7:04                                             ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-06-04 17:17 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 04.06.2024 10:18, Chen, Jiqian wrote:
> I tried to get more debug information from my environment. And I attach them here, maybe you can find some problems.
> acpi_parse_madt_ioapic_entries
> 	acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_parse_int_src_ovr, MAX_IRQ_SOURCES);
> 		acpi_parse_int_src_ovr
> 			mp_override_legacy_irq
> 				only process two entries, irq 0 gsi 2 and irq 9 gsi 9
> There are only two entries whose type is ACPI_MADT_TYPE_INTERRUPT_OVERRIDE in MADT table. Is it normal?

Yes, that's what you'd typically see (or just one such entry).

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-04 17:17                                           ` Jan Beulich
@ 2024-06-05  7:04                                             ` Chen, Jiqian
  2024-06-05 10:09                                               ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-06-05  7:04 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

On 2024/6/5 01:17, Jan Beulich wrote:
> On 04.06.2024 10:18, Chen, Jiqian wrote:
>> I tried to get more debug information from my environment. And I attach them here, maybe you can find some problems.
>> acpi_parse_madt_ioapic_entries
>> 	acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_parse_int_src_ovr, MAX_IRQ_SOURCES);
>> 		acpi_parse_int_src_ovr
>> 			mp_override_legacy_irq
>> 				only process two entries, irq 0 gsi 2 and irq 9 gsi 9
>> There are only two entries whose type is ACPI_MADT_TYPE_INTERRUPT_OVERRIDE in MADT table. Is it normal?
> 
> Yes, that's what you'd typically see (or just one such entry).
Ok, let me conclude that acpi_parse_int_src_ovr get two entries from MADT table and add them into mp_irqs. They are [irq, gsi][0, 2] and [irq, gsi][9, 9].
Then in the following function mp_config_acpi_legacy_irqs initializes the 1:1 mapping of irq and gsi [0~15 except 2 and 9], and add them into mp_irqs.
But for high GSIs(>= 16), no mapping processing.
Right?

Is it that the Xen hypervisor lacks some handling of high GSIs?
For now, if hypervisor gets a high GSIs, it can't be transformed to irq, because there is no mapping between them.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-05  7:04                                             ` Chen, Jiqian
@ 2024-06-05 10:09                                               ` Jan Beulich
  2024-06-05 10:22                                                 ` Chen, Jiqian
  0 siblings, 1 reply; 49+ messages in thread
From: Jan Beulich @ 2024-06-05 10:09 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 05.06.2024 09:04, Chen, Jiqian wrote:
> On 2024/6/5 01:17, Jan Beulich wrote:
>> On 04.06.2024 10:18, Chen, Jiqian wrote:
>>> I tried to get more debug information from my environment. And I attach them here, maybe you can find some problems.
>>> acpi_parse_madt_ioapic_entries
>>> 	acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_parse_int_src_ovr, MAX_IRQ_SOURCES);
>>> 		acpi_parse_int_src_ovr
>>> 			mp_override_legacy_irq
>>> 				only process two entries, irq 0 gsi 2 and irq 9 gsi 9
>>> There are only two entries whose type is ACPI_MADT_TYPE_INTERRUPT_OVERRIDE in MADT table. Is it normal?
>>
>> Yes, that's what you'd typically see (or just one such entry).
> Ok, let me conclude that acpi_parse_int_src_ovr get two entries from MADT table and add them into mp_irqs. They are [irq, gsi][0, 2] and [irq, gsi][9, 9].
> Then in the following function mp_config_acpi_legacy_irqs initializes the 1:1 mapping of irq and gsi [0~15 except 2 and 9], and add them into mp_irqs.
> But for high GSIs(>= 16), no mapping processing.
> Right?

On that specific system of yours - yes. In the general case high GSIs
may have entries, too.

> Is it that the Xen hypervisor lacks some handling of high GSIs?

I don't think so. Unless you can point out something?

> For now, if hypervisor gets a high GSIs, it can't be transformed to irq, because there is no mapping between them.

No, in the absence of a source override (note the word "override") the
default identity mapping applies.

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-05 10:09                                               ` Jan Beulich
@ 2024-06-05 10:22                                                 ` Chen, Jiqian
  2024-06-06  6:14                                                   ` Jan Beulich
  0 siblings, 1 reply; 49+ messages in thread
From: Chen, Jiqian @ 2024-06-05 10:22 UTC (permalink / raw
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith, Chen, Jiqian

On 2024/6/5 18:09, Jan Beulich wrote:
> On 05.06.2024 09:04, Chen, Jiqian wrote:
>> On 2024/6/5 01:17, Jan Beulich wrote:
>>> On 04.06.2024 10:18, Chen, Jiqian wrote:
>>>> I tried to get more debug information from my environment. And I attach them here, maybe you can find some problems.
>>>> acpi_parse_madt_ioapic_entries
>>>> 	acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_parse_int_src_ovr, MAX_IRQ_SOURCES);
>>>> 		acpi_parse_int_src_ovr
>>>> 			mp_override_legacy_irq
>>>> 				only process two entries, irq 0 gsi 2 and irq 9 gsi 9
>>>> There are only two entries whose type is ACPI_MADT_TYPE_INTERRUPT_OVERRIDE in MADT table. Is it normal?
>>>
>>> Yes, that's what you'd typically see (or just one such entry).
>> Ok, let me conclude that acpi_parse_int_src_ovr get two entries from MADT table and add them into mp_irqs. They are [irq, gsi][0, 2] and [irq, gsi][9, 9].
>> Then in the following function mp_config_acpi_legacy_irqs initializes the 1:1 mapping of irq and gsi [0~15 except 2 and 9], and add them into mp_irqs.
>> But for high GSIs(>= 16), no mapping processing.
>> Right?
> 
> On that specific system of yours - yes. In the general case high GSIs
> may have entries, too.
> 
>> Is it that the Xen hypervisor lacks some handling of high GSIs?
> 
> I don't think so. Unless you can point out something?
Ok, so the implementation is still to get mapping from mp_irqs, I will change in next version.
Thank you.

> 
>> For now, if hypervisor gets a high GSIs, it can't be transformed to irq, because there is no mapping between them.
> 
> No, in the absence of a source override (note the word "override") the
> default identity mapping applies.
What is identity mapping? Like the mp_config_acpi_legacy_irqs does?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi
  2024-06-05 10:22                                                 ` Chen, Jiqian
@ 2024-06-06  6:14                                                   ` Jan Beulich
  0 siblings, 0 replies; 49+ messages in thread
From: Jan Beulich @ 2024-06-06  6:14 UTC (permalink / raw
  To: Chen, Jiqian
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, Anthony PERARD, Juergen Gross,
	Hildebrand, Stewart, Huang, Ray, xen-devel@lists.xenproject.org,
	Daniel P . Smith

On 05.06.2024 12:22, Chen, Jiqian wrote:
> On 2024/6/5 18:09, Jan Beulich wrote:
>> On 05.06.2024 09:04, Chen, Jiqian wrote:
>>> For now, if hypervisor gets a high GSIs, it can't be transformed to irq, because there is no mapping between them.
>>
>> No, in the absence of a source override (note the word "override") the
>> default identity mapping applies.
> What is identity mapping?

GSI == IRQ

Jan


^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2024-06-06  6:15 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-16  9:52 [XEN PATCH v8 0/5] Support device passthrough when dom0 is PVH on Xen Jiqian Chen
2024-05-16  9:52 ` [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device Jiqian Chen
2024-05-16 13:08   ` Jan Beulich
2024-05-17  8:08     ` Chen, Jiqian
2024-05-17  8:20       ` Jan Beulich
2024-05-17  9:28         ` Chen, Jiqian
2024-05-17  9:50           ` Jan Beulich
2024-05-17 10:00             ` Chen, Jiqian
2024-05-17 10:31               ` Jan Beulich
2024-05-17 11:01                 ` Chen, Jiqian
2024-05-17 11:48                   ` Jan Beulich
2024-05-17 10:03             ` Jürgen Groß
2024-05-17 10:09               ` Chen, Jiqian
2024-05-17 13:15       ` Stewart Hildebrand
2024-05-16  9:52 ` [XEN PATCH v8 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH Jiqian Chen
2024-05-16 13:29   ` Jan Beulich
2024-05-17  8:44     ` Chen, Jiqian
2024-05-16  9:52 ` [XEN PATCH v8 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0 Jiqian Chen
2024-05-16 13:49   ` Jan Beulich
2024-05-17  9:00     ` Chen, Jiqian
2024-05-17  9:04       ` Jan Beulich
2024-05-16  9:52 ` [RFC XEN PATCH v8 4/5] tools: Add new function to get gsi from dev Jiqian Chen
2024-05-16  9:52 ` [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi Jiqian Chen
2024-05-16 14:01   ` Jan Beulich
2024-05-17 10:45     ` Chen, Jiqian
2024-05-17 10:51       ` Jan Beulich
2024-05-17 11:14         ` Chen, Jiqian
2024-05-17 11:50           ` Jan Beulich
2024-05-29  2:41             ` Chen, Jiqian
2024-05-29  6:31               ` Jan Beulich
2024-05-29  6:56                 ` Chen, Jiqian
2024-05-29  7:10                   ` Jan Beulich
2024-05-29 11:13                     ` Chen, Jiqian
2024-05-29 12:22                       ` Jan Beulich
2024-05-30 11:19                         ` Chen, Jiqian
2024-05-30 15:51                           ` Jan Beulich
2024-06-04  3:04                             ` Chen, Jiqian
2024-06-04  5:55                               ` Jan Beulich
2024-06-04  6:01                                 ` Chen, Jiqian
2024-06-04  6:12                                   ` Jan Beulich
2024-06-04  6:33                                     ` Chen, Jiqian
2024-06-04  6:36                                       ` Jan Beulich
2024-06-04  8:18                                         ` Chen, Jiqian
2024-06-04 17:17                                           ` Jan Beulich
2024-06-05  7:04                                             ` Chen, Jiqian
2024-06-05 10:09                                               ` Jan Beulich
2024-06-05 10:22                                                 ` Chen, Jiqian
2024-06-06  6:14                                                   ` Jan Beulich
2024-06-04  6:19                                   ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).