QEMU-Devel Archive mirror
 help / color / mirror / Atom feed
From: Jason Wang <jasowang@redhat.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: qemu-devel@nongnu.org, Si-Wei Liu <si-wei.liu@oracle.com>,
	 "Michael S. Tsirkin" <mst@redhat.com>,
	Lei Yang <leiyang@redhat.com>, Peter Xu <peterx@redhat.com>,
	 Jonah Palmer <jonah.palmer@oracle.com>,
	Dragos Tatulea <dtatulea@nvidia.com>
Subject: Re: [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree
Date: Tue, 14 May 2024 11:56:51 +0800	[thread overview]
Message-ID: <CACGkMEtrPAMb-ZN7AAE8cjEzjZY1Hnm29J7PhUYgwv26=YcdQw@mail.gmail.com> (raw)
In-Reply-To: <CAJaqyWc18UeBHeQSoAFF1u1nkjaAfj0Y85pgSHbhV8xxExjcgg@mail.gmail.com>

On Mon, May 13, 2024 at 5:58 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Mon, May 13, 2024 at 10:28 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Mon, May 13, 2024 at 2:28 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Sat, May 11, 2024 at 6:07 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Fri, May 10, 2024 at 3:16 PM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Fri, May 10, 2024 at 6:29 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On Thu, May 9, 2024 at 3:10 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > > >
> > > > > > > On Thu, May 9, 2024 at 8:27 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, May 9, 2024 at 1:16 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, May 8, 2024 at 4:29 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, May 7, 2024 at 6:57 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, May 7, 2024 at 9:29 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Apr 12, 2024 at 3:56 PM Eugenio Perez Martin
> > > > > > > > > > > > <eperezma@redhat.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Apr 12, 2024 at 8:47 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Apr 10, 2024 at 6:03 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The guest may have overlapped memory regions, where different GPA leads
> > > > > > > > > > > > > > > to the same HVA.  This causes a problem when overlapped regions
> > > > > > > > > > > > > > > (different GPA but same translated HVA) exists in the tree, as looking
> > > > > > > > > > > > > > > them by HVA will return them twice.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think I don't understand if there's any side effect for shadow virtqueue?
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > My bad, I totally forgot to put a reference to where this comes from.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Si-Wei found that during initialization this sequences of maps /
> > > > > > > > > > > > > unmaps happens [1]:
> > > > > > > > > > > > >
> > > > > > > > > > > > > HVA                    GPA                IOVA
> > > > > > > > > > > > > -------------------------------------------------------------------------------------------------------------------------
> > > > > > > > > > > > > Map
> > > > > > > > > > > > > [0x7f7903e00000, 0x7f7983e00000)    [0x0, 0x80000000) [0x1000, 0x80000000)
> > > > > > > > > > > > > [0x7f7983e00000, 0x7f9903e00000)    [0x100000000, 0x2080000000)
> > > > > > > > > > > > > [0x80001000, 0x2000001000)
> > > > > > > > > > > > > [0x7f7903ea0000, 0x7f7903ec0000)    [0xfeda0000, 0xfedc0000)
> > > > > > > > > > > > > [0x2000001000, 0x2000021000)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Unmap
> > > > > > > > > > > > > [0x7f7903ea0000, 0x7f7903ec0000)    [0xfeda0000, 0xfedc0000) [0x1000,
> > > > > > > > > > > > > 0x20000) ???
> > > > > > > > > > > > >
> > > > > > > > > > > > > The third HVA range is contained in the first one, but exposed under a
> > > > > > > > > > > > > different GVA (aliased). This is not "flattened" by QEMU, as GPA does
> > > > > > > > > > > > > not overlap, only HVA.
> > > > > > > > > > > > >
> > > > > > > > > > > > > At the third chunk unmap, the current algorithm finds the first chunk,
> > > > > > > > > > > > > not the second one. This series is the way to tell the difference at
> > > > > > > > > > > > > unmap time.
> > > > > > > > > > > > >
> > > > > > > > > > > > > [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-04/msg00079.html
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks!
> > > > > > > > > > > >
> > > > > > > > > > > > Ok, I was wondering if we need to store GPA(GIOVA) to HVA mappings in
> > > > > > > > > > > > the iova tree to solve this issue completely. Then there won't be
> > > > > > > > > > > > aliasing issues.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I'm ok to explore that route but this has another problem. Both SVQ
> > > > > > > > > > > vrings and CVQ buffers also need to be addressable by VhostIOVATree,
> > > > > > > > > > > and they do not have GPA.
> > > > > > > > > > >
> > > > > > > > > > > At this moment vhost_svq_translate_addr is able to handle this
> > > > > > > > > > > transparently as we translate vaddr to SVQ IOVA. How can we store
> > > > > > > > > > > these new entries? Maybe a (hwaddr)-1 GPA to signal it has no GPA and
> > > > > > > > > > > then a list to go through other entries (SVQ vaddr and CVQ buffers).
> > > > > > > > > >
> > > > > > > > > > This seems to be tricky.
> > > > > > > > > >
> > > > > > > > > > As discussed, it could be another iova tree.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes but there are many ways to add another IOVATree. Let me expand & recap.
> > > > > > > > >
> > > > > > > > > Option 1 is to simply add another iova tree to VhostShadowVirtqueue.
> > > > > > > > > Let's call it gpa_iova_tree, as opposed to the current iova_tree that
> > > > > > > > > translates from vaddr to SVQ IOVA. To know which one to use is easy at
> > > > > > > > > adding or removing, like in the memory listener, but how to know at
> > > > > > > > > vhost_svq_translate_addr?
> > > > > > > >
> > > > > > > > Then we won't use virtqueue_pop() at all, we need a SVQ version of
> > > > > > > > virtqueue_pop() to translate GPA to SVQ IOVA directly?
> > > > > > > >
> > > > > > >
> > > > > > > The problem is not virtqueue_pop, that's out of the
> > > > > > > vhost_svq_translate_addr. The problem is the need of adding
> > > > > > > conditionals / complexity in all the callers of
> > > > > > >
> > > > > > > > >
> > > > > > > > > The easiest way for me is to rely on memory_region_from_host(). When
> > > > > > > > > vaddr is from the guest, it returns a valid MemoryRegion. When it is
> > > > > > > > > not, it returns NULL. I'm not sure if this is a valid use case, it
> > > > > > > > > just worked in my tests so far.
> > > > > > > > >
> > > > > > > > > Now we have the second problem: The GPA values of the regions of the
> > > > > > > > > two IOVA tree must be unique. We need to be able to find unallocated
> > > > > > > > > regions in SVQ IOVA. At this moment there is only one IOVATree, so
> > > > > > > > > this is done easily by vhost_iova_tree_map_alloc. But it is very
> > > > > > > > > complicated with two trees.
> > > > > > > >
> > > > > > > > Would it be simpler if we decouple the IOVA allocator? For example, we
> > > > > > > > can have a dedicated gtree to track the allocated IOVA ranges. It is
> > > > > > > > shared by both
> > > > > > > >
> > > > > > > > 1) Guest memory (GPA)
> > > > > > > > 2) SVQ virtqueue and buffers
> > > > > > > >
> > > > > > > > And another gtree to track the GPA to IOVA.
> > > > > > > >
> > > > > > > > The SVQ code could use either
> > > > > > > >
> > > > > > > > 1) one linear mappings that contains both SVQ virtqueue and buffers
> > > > > > > >
> > > > > > > > or
> > > > > > > >
> > > > > > > > 2) dynamic IOVA allocation/deallocation helpers
> > > > > > > >
> > > > > > > > So we don't actually need the third gtree for SVQ HVA -> SVQ IOVA?
> > > > > > > >
> > > > > > >
> > > > > > > That's possible, but that scatters the IOVA handling code instead of
> > > > > > > keeping it self-contained in VhostIOVATree.
> > > > > >
> > > > > > To me, the IOVA range/allocation is orthogonal to how IOVA is used.
> > > > > >
> > > > > > An example is the iova allocator in the kernel.
> > > > > >
> > > > > > Note that there's an even simpler IOVA "allocator" in NVME passthrough
> > > > > > code, not sure it is useful here though (haven't had a deep look at
> > > > > > that).
> > > > > >
> > > > >
> > > > > I don't know enough about them to have an opinion. I keep seeing the
> > > > > drawback of needing to synchronize both allocation & adding in all the
> > > > > places we want to modify the IOVATree. At this moment, these are the
> > > > > vhost-vdpa memory listener, the SVQ vring creation and removal, and
> > > > > net CVQ buffers. But it may be more in the future.
> > > > >
> > > > > What are the advantages of keeping these separated that justifies
> > > > > needing to synchronize in all these places, compared with keeping them
> > > > > synchronized in VhostIOVATree?
> > > >
> > > > It doesn't need to be synchronized.
> > > >
> > > > Assuming guest and SVQ shares IOVA range. IOVA only needs to track
> > > > which part of the range has been used.
> > > >
> > >
> > > Not sure if I follow, that is what I mean with "synchronized".
> >
> > Oh right.
> >
> > >
> > > > This simplifies things, we can store GPA->IOVA mappings and SVQ ->
> > > > IOVA mappings separately.
> > > >
> > >
> > > Sorry, I still cannot see the whole picture :).
> > >
> > > Let's assume we have all the GPA mapped to specific IOVA regions, so
> > > we have the first IOVA tree (GPA -> IOVA) filled. Now we enable SVQ
> > > because of the migration. How can we know where we can place SVQ
> > > vrings without having them synchronized?
> >
> > Just allocating a new IOVA range for SVQ?
> >
> > >
> > > At this moment we're using a tree. The tree nature of the current SVQ
> > > IOVA -> VA makes all nodes ordered so it is more or less easy to look
> > > for holes.
> >
> > Yes, iova allocate could still be implemented via a tree.
> >
> > >
> > > Now your proposal uses the SVQ IOVA as tree values. Should we iterate
> > > over all of them, order them, of the two trees, and then look for
> > > holes there?
> >
> > Let me clarify, correct me if I was wrong:
> >
> > 1) IOVA allocator is still implemented via a tree, we just don't need
> > to store how the IOVA is used
> > 2) A dedicated GPA -> IOVA tree, updated via listeners and is used in
> > the datapath SVQ translation
> > 3) A linear mapping or another SVQ -> IOVA tree used for SVQ
> >
>
> Ok, so the part I was missing is that now we have 3 whole trees, with
> somehow redundant information :).

Somehow, it decouples the IOVA usage out of the IOVA allocator. This
might be simple as guests and SVQ may try to share a single IOVA
address space.

>
> In some sense this is simpler than trying to get all the information
> from only two trees. On the bad side, all SVQ calls that allocate some
> region need to remember to add to one of the two other threes. That is
> what I mean by synchronized. But sure, we can go that way.

Just a suggestion, if it turns out to complicate the issue, I'm fine
to go the other way.

Thanks

>
> > Thanks
> >
> > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks!
> > > > >
> > > >
> > >
> >
>
>



      reply	other threads:[~2024-05-14  3:58 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-10 10:03 [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree Eugenio Pérez
2024-04-10 10:03 ` [RFC 1/2] iova_tree: add an id member to DMAMap Eugenio Pérez
2024-04-18 20:46   ` Si-Wei Liu
2024-04-19  8:29     ` Eugenio Perez Martin
2024-04-19 23:49       ` Si-Wei Liu
2024-04-22  8:49         ` Eugenio Perez Martin
2024-04-23 22:20           ` Si-Wei Liu
2024-04-24  7:33             ` Eugenio Perez Martin
2024-04-25 17:43               ` Si-Wei Liu
2024-04-29  8:14                 ` Eugenio Perez Martin
2024-04-29 11:19                   ` Jonah Palmer
2024-04-30 18:11                     ` Eugenio Perez Martin
2024-05-01 22:08                       ` Si-Wei Liu
2024-05-02  6:18                         ` Eugenio Perez Martin
2024-05-07  9:12                           ` Si-Wei Liu
2024-04-30  5:54                   ` Si-Wei Liu
2024-04-30 17:19                     ` Eugenio Perez Martin
2024-05-01 23:13                       ` Si-Wei Liu
2024-05-02  6:44                         ` Eugenio Perez Martin
2024-05-08  0:52                           ` Si-Wei Liu
2024-05-08 15:25                             ` Eugenio Perez Martin
2024-04-10 10:03 ` [RFC 2/2] vdpa: identify aliased maps in iova_tree Eugenio Pérez
2024-04-12  6:46 ` [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree Jason Wang
2024-04-12  7:56   ` Eugenio Perez Martin
2024-05-07  7:29     ` Jason Wang
2024-05-07 10:56       ` Eugenio Perez Martin
2024-05-08  2:29         ` Jason Wang
2024-05-08 17:15           ` Eugenio Perez Martin
2024-05-09  6:27             ` Jason Wang
2024-05-09  7:10               ` Eugenio Perez Martin
2024-05-10  4:28                 ` Jason Wang
2024-05-10  7:16                   ` Eugenio Perez Martin
2024-05-11  4:00                     ` Jason Wang
2024-05-13  6:27                       ` Eugenio Perez Martin
2024-05-13  8:28                         ` Jason Wang
2024-05-13  9:56                           ` Eugenio Perez Martin
2024-05-14  3:56                             ` Jason Wang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACGkMEtrPAMb-ZN7AAE8cjEzjZY1Hnm29J7PhUYgwv26=YcdQw@mail.gmail.com' \
    --to=jasowang@redhat.com \
    --cc=dtatulea@nvidia.com \
    --cc=eperezma@redhat.com \
    --cc=jonah.palmer@oracle.com \
    --cc=leiyang@redhat.com \
    --cc=mst@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=si-wei.liu@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).