Xen-Devel Archive mirror
 help / color / mirror / Atom feed
From: "Roger Pau Monné" <roger.pau@citrix.com>
To: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
Cc: xen-devel@lists.xenproject.org, Jan Beulich <jbeulich@suse.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [PATCH v8 2/6] x86/hvm: Allow access to registers on the same page as MSI-X table
Date: Fri, 10 May 2024 16:50:30 +0200	[thread overview]
Message-ID: <Zj40NvNqlJWFjcCb@macbook> (raw)
In-Reply-To: <fbe01c945d75409406ac0b02bc17d44f57a39ccf.1715313192.git-series.marmarek@invisiblethingslab.com>

On Fri, May 10, 2024 at 05:53:22AM +0200, Marek Marczykowski-Górecki wrote:
> Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
> on the same page as MSI-X table. Device model (especially one in
> stubdomain) cannot really handle those, as direct writes to that page is
> refused (page is on the mmio_ro_ranges list). Instead, extend
> msixtbl_mmio_ops to handle such accesses too.
> 
> Doing this, requires correlating read/write location with guest
> MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
> it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
> for PV would need to be done separately.
> 
> This will be also used to read Pending Bit Array, if it lives on the same
> page, making QEMU not needing /dev/mem access at all (especially helpful
> with lockdown enabled in dom0). If PBA lives on another page, QEMU will
> map it to the guest directly.
> If PBA lives on the same page, discard writes and log a message.
> Technically, writes outside of PBA could be allowed, but at this moment
> the precise location of PBA isn't saved, and also no known device abuses
> the spec in this way (at least yet).
> 
> To access those registers, msixtbl_mmio_ops need the relevant page
> mapped. MSI handling already has infrastructure for that, using fixmap,
> so try to map first/last page of the MSI-X table (if necessary) and save
> their fixmap indexes. Note that msix_get_fixmap() does reference
> counting and reuses existing mapping, so just call it directly, even if
> the page was mapped before. Also, it uses a specific range of fixmap
> indexes which doesn't include 0, so use 0 as default ("not mapped")
> value - which simplifies code a bit.
> 
> Based on assumption that all MSI-X page accesses are handled by Xen, do
> not forward adjacent accesses to other hypothetical ioreq servers, even
> if the access wasn't handled for some reason (failure to map pages etc).
> Relevant places log a message about that already.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Just one typo in a logged message, and one comment re the return type
of  adjacent_{read,write}().

> ---
> Changes in v8:
> - rename adjacent_handle to get_adjacent_idx
> - put SBDF at the start of error messages
> - use 0 for ADJACENT_DONT_HANDLE (it's FIX_RESERVED)
> - merge conditions in msixtbl_range into one "if"
> - add assert for address alignment
> - change back to setting pval to ~0UL at the start of adjacent_read
> Changes in v7:
> - simplify logic based on assumption that all access to MSI-X pages are
>   handled by Xen (Roger)
> - move calling adjacent_handle() into adjacent_{read,write}() (Roger)
> - move range check into msixtbl_addr_to_desc() (Roger)
> - fix off-by-one when initializing adj_access_idx[ADJ_IDX_LAST] (Roger)
> - no longer distinguish between unhandled write due to PBA nearby and
>   other reasons
> - add missing break after ASSERT_UNREACHABLE (Jan)
> Changes in v6:
> - use MSIX_CHECK_WARN macro
> - extend assert on fixmap_idx
> - add break in default label, after ASSERT_UNREACHABLE(), and move
>   setting default there
> - style fixes
> Changes in v5:
> - style fixes
> - include GCC version in the commit message
> - warn only once (per domain, per device) about failed adjacent access
> Changes in v4:
> - drop same_page parameter of msixtbl_find_entry(), distinguish two
>   cases in relevant callers
> - rename adj_access_table_idx to adj_access_idx
> - code style fixes
> - drop alignment check in adjacent_{read,write}() - all callers already
>   have it earlier
> - delay mapping first/last MSI-X pages until preparing device for a
>   passthrough
> v3:
>  - merge handling into msixtbl_mmio_ops
>  - extend commit message
> v2:
>  - adjust commit message
>  - pass struct domain to msixtbl_page_handler_get_hwaddr()
>  - reduce local variables used only once
>  - log a warning if write is forbidden if MSI-X and PBA lives on the same
>    page
>  - do not passthrough unaligned accesses
>  - handle accesses both before and after MSI-X table
> ---
>  xen/arch/x86/hvm/vmsi.c        | 208 ++++++++++++++++++++++++++++++++--
>  xen/arch/x86/include/asm/msi.h |   5 +-
>  xen/arch/x86/msi.c             |  42 +++++++-
>  3 files changed, 245 insertions(+), 10 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 999917983789..d506d6adaaf6 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -180,6 +180,10 @@ static bool msixtbl_initialised(const struct domain *d)
>      return d->arch.hvm.msixtbl_list.next;
>  }
>  
> +/*
> + * Lookup an msixtbl_entry on the same page as given addr. It's up to the
> + * caller to check if address is strictly part of the table - if relevant.
> + */
>  static struct msixtbl_entry *msixtbl_find_entry(
>      struct vcpu *v, unsigned long addr)
>  {
> @@ -187,8 +191,8 @@ static struct msixtbl_entry *msixtbl_find_entry(
>      struct domain *d = v->domain;
>  
>      list_for_each_entry( entry, &d->arch.hvm.msixtbl_list, list )
> -        if ( addr >= entry->gtable &&
> -             addr < entry->gtable + entry->table_len )
> +        if ( PFN_DOWN(addr) >= PFN_DOWN(entry->gtable) &&
> +             PFN_DOWN(addr) <= PFN_DOWN(entry->gtable + entry->table_len - 1) )
>              return entry;
>  
>      return NULL;
> @@ -203,6 +207,10 @@ static struct msi_desc *msixtbl_addr_to_desc(
>      if ( !entry || !entry->pdev )
>          return NULL;
>  
> +    if ( addr < entry->gtable ||
> +         addr >= entry->gtable + entry->table_len )
> +        return NULL;
> +
>      nr_entry = (addr - entry->gtable) / PCI_MSIX_ENTRY_SIZE;
>  
>      list_for_each_entry( desc, &entry->pdev->msi_list, list )
> @@ -213,6 +221,157 @@ static struct msi_desc *msixtbl_addr_to_desc(
>      return NULL;
>  }
>  
> +/*
> + * Returns:
> + *  - 0 (FIX_RESERVED) if no handling should be done
> + *  - a fixmap idx to use for handling
> + */
> +static unsigned int get_adjacent_idx(
> +    const struct msixtbl_entry *entry, unsigned long addr, bool write)
> +{
> +    unsigned int adj_type;
> +    struct arch_msix *msix;
> +
> +    if ( !entry || !entry->pdev )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return 0;
> +    }
> +
> +    if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable) && addr < entry->gtable )
> +        adj_type = ADJ_IDX_FIRST;
> +    else if ( PFN_DOWN(addr) == PFN_DOWN(entry->gtable + entry->table_len - 1) &&
> +              addr >= entry->gtable + entry->table_len )
> +        adj_type = ADJ_IDX_LAST;
> +    else
> +    {
> +        /* All callers should already do equivalent range checking. */
> +        ASSERT_UNREACHABLE();
> +        return 0;
> +    }
> +
> +    msix = entry->pdev->msix;
> +    if ( !msix )
> +    {
> +        ASSERT_UNREACHABLE();
> +        return 0;
> +    }
> +
> +    if ( !msix->adj_access_idx[adj_type] )
> +    {
> +        if ( MSIX_CHECK_WARN(msix, entry->pdev->domain->domain_id,
> +                             adjacent_not_initialized) )
> +            gprintk(XENLOG_WARNING,
> +                    "%pp: Page for adjacent(%d) MSI-X table access not initialized (addr %#lx, gtable %#lx))\n",
                                                                                                             ^ extra )
> +                    &entry->pdev->sbdf, adj_type, addr, entry->gtable);
> +        return 0;
> +    }
> +
> +    /* If PBA lives on the same page too, discard writes. */
> +    if ( write &&
> +         ((adj_type == ADJ_IDX_LAST &&
> +           msix->table.last == msix->pba.first) ||
> +          (adj_type == ADJ_IDX_FIRST &&
> +           msix->table.first == msix->pba.last)) )
> +    {
> +        if ( MSIX_CHECK_WARN(msix, entry->pdev->domain->domain_id,
> +                             adjacent_pba) )
> +            gprintk(XENLOG_WARNING,
> +                    "%pp: MSI-X table and PBA share a page, "
> +                    "discard write to adjacent memory (%#lx)\n",
> +                    &entry->pdev->sbdf, addr);
> +        return 0;
> +    }
> +
> +    return msix->adj_access_idx[adj_type];
> +}
> +
> +static int adjacent_read(
> +    const struct msixtbl_entry *entry,
> +    paddr_t address, unsigned int len, uint64_t *pval)
> +{
> +    const void __iomem *hwaddr;
> +    unsigned int fixmap_idx;
> +
> +    ASSERT(IS_ALIGNED(address, len));
> +
> +    *pval = ~0UL;
> +
> +    fixmap_idx = get_adjacent_idx(entry, address, false);
> +
> +    if ( !fixmap_idx )
> +        return X86EMUL_OKAY;
> +
> +    hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address);
> +
> +    switch ( len )
> +    {
> +    case 1:
> +        *pval = readb(hwaddr);
> +        break;
> +
> +    case 2:
> +        *pval = readw(hwaddr);
> +        break;
> +
> +    case 4:
> +        *pval = readl(hwaddr);
> +        break;
> +
> +    case 8:
> +        *pval = readq(hwaddr);
> +        break;
> +
> +    default:
> +        ASSERT_UNREACHABLE();
> +        break;
> +    }
> +
> +    return X86EMUL_OKAY;
> +}
> +
> +static int adjacent_write(
> +    const struct msixtbl_entry *entry,
> +    paddr_t address, unsigned int len, uint64_t val)
> +{
> +    void __iomem *hwaddr;
> +    unsigned int fixmap_idx;
> +
> +    ASSERT(IS_ALIGNED(address, len));
> +
> +    fixmap_idx = get_adjacent_idx(entry, address, true);
> +
> +    if ( !fixmap_idx )
> +        return X86EMUL_OKAY;
> +
> +    hwaddr = fix_to_virt(fixmap_idx) + PAGE_OFFSET(address);
> +
> +    switch ( len )
> +    {
> +    case 1:
> +        writeb(val, hwaddr);
> +        break;
> +
> +    case 2:
> +        writew(val, hwaddr);
> +        break;
> +
> +    case 4:
> +        writel(val, hwaddr);
> +        break;
> +
> +    case 8:
> +        writeq(val, hwaddr);
> +        break;
> +
> +    default:
> +        ASSERT_UNREACHABLE();
> +        break;
> +    }
> +
> +    return X86EMUL_OKAY;

Since adjacent_{read,write}() unconditionally return X86EMUL_OKAY now
they could as well just return void.

I'm fine with leaving it like this, unless the committer doesn't mind
adjusting on commit.  I should have requested earlier to make the
function return void.

Thanks, Roger.


  reply	other threads:[~2024-05-10 14:50 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-10  3:53 [PATCH v8 0/6] MSI-X support with qemu in stubdomain, and other related changes Marek Marczykowski-Górecki
2024-05-10  3:53 ` [PATCH v8 1/6] x86/msi: Extend per-domain/device warning mechanism Marek Marczykowski-Górecki
2024-05-10  3:53 ` [PATCH v8 2/6] x86/hvm: Allow access to registers on the same page as MSI-X table Marek Marczykowski-Górecki
2024-05-10 14:50   ` Roger Pau Monné [this message]
2024-05-10  3:53 ` [PATCH v8 3/6] automation: prevent QEMU access to /dev/mem in PCI passthrough tests Marek Marczykowski-Górecki
2024-05-10  3:53 ` [PATCH v8 4/6] automation: switch to a wifi card on ADL system Marek Marczykowski-Górecki
2024-05-10  3:53 ` [PATCH v8 5/6] [DO NOT APPLY] switch to qemu fork Marek Marczykowski-Górecki
2024-05-10  3:53 ` [PATCH v8 6/6] [DO NOT APPLY] switch to alternative artifact repo Marek Marczykowski-Górecki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zj40NvNqlJWFjcCb@macbook \
    --to=roger.pau@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=marmarek@invisiblethingslab.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).