QEMU-Devel Archive mirror
 help / color / mirror / Atom feed
From: Eugenio Perez Martin <eperezma@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: qemu-devel@nongnu.org, Si-Wei Liu <si-wei.liu@oracle.com>,
	 "Michael S. Tsirkin" <mst@redhat.com>,
	Lei Yang <leiyang@redhat.com>, Peter Xu <peterx@redhat.com>,
	 Jonah Palmer <jonah.palmer@oracle.com>,
	Dragos Tatulea <dtatulea@nvidia.com>
Subject: Re: [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree
Date: Mon, 13 May 2024 08:27:24 +0200	[thread overview]
Message-ID: <CAJaqyWeKfVXYj61sgvFrUTpOgy0k-zsLoR4JePEo0Q8XuXYbmA@mail.gmail.com> (raw)
In-Reply-To: <CACGkMEs=-teddtO4ctLdJiwm2gu3sZrKOww-TC+5o2_19Sph4w@mail.gmail.com>

On Sat, May 11, 2024 at 6:07 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Fri, May 10, 2024 at 3:16 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Fri, May 10, 2024 at 6:29 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Thu, May 9, 2024 at 3:10 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > >
> > > > On Thu, May 9, 2024 at 8:27 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Thu, May 9, 2024 at 1:16 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, May 8, 2024 at 4:29 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, May 7, 2024 at 6:57 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, May 7, 2024 at 9:29 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Apr 12, 2024 at 3:56 PM Eugenio Perez Martin
> > > > > > > > > <eperezma@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, Apr 12, 2024 at 8:47 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Apr 10, 2024 at 6:03 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > The guest may have overlapped memory regions, where different GPA leads
> > > > > > > > > > > > to the same HVA.  This causes a problem when overlapped regions
> > > > > > > > > > > > (different GPA but same translated HVA) exists in the tree, as looking
> > > > > > > > > > > > them by HVA will return them twice.
> > > > > > > > > > >
> > > > > > > > > > > I think I don't understand if there's any side effect for shadow virtqueue?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > My bad, I totally forgot to put a reference to where this comes from.
> > > > > > > > > >
> > > > > > > > > > Si-Wei found that during initialization this sequences of maps /
> > > > > > > > > > unmaps happens [1]:
> > > > > > > > > >
> > > > > > > > > > HVA                    GPA                IOVA
> > > > > > > > > > -------------------------------------------------------------------------------------------------------------------------
> > > > > > > > > > Map
> > > > > > > > > > [0x7f7903e00000, 0x7f7983e00000)    [0x0, 0x80000000) [0x1000, 0x80000000)
> > > > > > > > > > [0x7f7983e00000, 0x7f9903e00000)    [0x100000000, 0x2080000000)
> > > > > > > > > > [0x80001000, 0x2000001000)
> > > > > > > > > > [0x7f7903ea0000, 0x7f7903ec0000)    [0xfeda0000, 0xfedc0000)
> > > > > > > > > > [0x2000001000, 0x2000021000)
> > > > > > > > > >
> > > > > > > > > > Unmap
> > > > > > > > > > [0x7f7903ea0000, 0x7f7903ec0000)    [0xfeda0000, 0xfedc0000) [0x1000,
> > > > > > > > > > 0x20000) ???
> > > > > > > > > >
> > > > > > > > > > The third HVA range is contained in the first one, but exposed under a
> > > > > > > > > > different GVA (aliased). This is not "flattened" by QEMU, as GPA does
> > > > > > > > > > not overlap, only HVA.
> > > > > > > > > >
> > > > > > > > > > At the third chunk unmap, the current algorithm finds the first chunk,
> > > > > > > > > > not the second one. This series is the way to tell the difference at
> > > > > > > > > > unmap time.
> > > > > > > > > >
> > > > > > > > > > [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-04/msg00079.html
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > Ok, I was wondering if we need to store GPA(GIOVA) to HVA mappings in
> > > > > > > > > the iova tree to solve this issue completely. Then there won't be
> > > > > > > > > aliasing issues.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I'm ok to explore that route but this has another problem. Both SVQ
> > > > > > > > vrings and CVQ buffers also need to be addressable by VhostIOVATree,
> > > > > > > > and they do not have GPA.
> > > > > > > >
> > > > > > > > At this moment vhost_svq_translate_addr is able to handle this
> > > > > > > > transparently as we translate vaddr to SVQ IOVA. How can we store
> > > > > > > > these new entries? Maybe a (hwaddr)-1 GPA to signal it has no GPA and
> > > > > > > > then a list to go through other entries (SVQ vaddr and CVQ buffers).
> > > > > > >
> > > > > > > This seems to be tricky.
> > > > > > >
> > > > > > > As discussed, it could be another iova tree.
> > > > > > >
> > > > > >
> > > > > > Yes but there are many ways to add another IOVATree. Let me expand & recap.
> > > > > >
> > > > > > Option 1 is to simply add another iova tree to VhostShadowVirtqueue.
> > > > > > Let's call it gpa_iova_tree, as opposed to the current iova_tree that
> > > > > > translates from vaddr to SVQ IOVA. To know which one to use is easy at
> > > > > > adding or removing, like in the memory listener, but how to know at
> > > > > > vhost_svq_translate_addr?
> > > > >
> > > > > Then we won't use virtqueue_pop() at all, we need a SVQ version of
> > > > > virtqueue_pop() to translate GPA to SVQ IOVA directly?
> > > > >
> > > >
> > > > The problem is not virtqueue_pop, that's out of the
> > > > vhost_svq_translate_addr. The problem is the need of adding
> > > > conditionals / complexity in all the callers of
> > > >
> > > > > >
> > > > > > The easiest way for me is to rely on memory_region_from_host(). When
> > > > > > vaddr is from the guest, it returns a valid MemoryRegion. When it is
> > > > > > not, it returns NULL. I'm not sure if this is a valid use case, it
> > > > > > just worked in my tests so far.
> > > > > >
> > > > > > Now we have the second problem: The GPA values of the regions of the
> > > > > > two IOVA tree must be unique. We need to be able to find unallocated
> > > > > > regions in SVQ IOVA. At this moment there is only one IOVATree, so
> > > > > > this is done easily by vhost_iova_tree_map_alloc. But it is very
> > > > > > complicated with two trees.
> > > > >
> > > > > Would it be simpler if we decouple the IOVA allocator? For example, we
> > > > > can have a dedicated gtree to track the allocated IOVA ranges. It is
> > > > > shared by both
> > > > >
> > > > > 1) Guest memory (GPA)
> > > > > 2) SVQ virtqueue and buffers
> > > > >
> > > > > And another gtree to track the GPA to IOVA.
> > > > >
> > > > > The SVQ code could use either
> > > > >
> > > > > 1) one linear mappings that contains both SVQ virtqueue and buffers
> > > > >
> > > > > or
> > > > >
> > > > > 2) dynamic IOVA allocation/deallocation helpers
> > > > >
> > > > > So we don't actually need the third gtree for SVQ HVA -> SVQ IOVA?
> > > > >
> > > >
> > > > That's possible, but that scatters the IOVA handling code instead of
> > > > keeping it self-contained in VhostIOVATree.
> > >
> > > To me, the IOVA range/allocation is orthogonal to how IOVA is used.
> > >
> > > An example is the iova allocator in the kernel.
> > >
> > > Note that there's an even simpler IOVA "allocator" in NVME passthrough
> > > code, not sure it is useful here though (haven't had a deep look at
> > > that).
> > >
> >
> > I don't know enough about them to have an opinion. I keep seeing the
> > drawback of needing to synchronize both allocation & adding in all the
> > places we want to modify the IOVATree. At this moment, these are the
> > vhost-vdpa memory listener, the SVQ vring creation and removal, and
> > net CVQ buffers. But it may be more in the future.
> >
> > What are the advantages of keeping these separated that justifies
> > needing to synchronize in all these places, compared with keeping them
> > synchronized in VhostIOVATree?
>
> It doesn't need to be synchronized.
>
> Assuming guest and SVQ shares IOVA range. IOVA only needs to track
> which part of the range has been used.
>

Not sure if I follow, that is what I mean with "synchronized".

> This simplifies things, we can store GPA->IOVA mappings and SVQ ->
> IOVA mappings separately.
>

Sorry, I still cannot see the whole picture :).

Let's assume we have all the GPA mapped to specific IOVA regions, so
we have the first IOVA tree (GPA -> IOVA) filled. Now we enable SVQ
because of the migration. How can we know where we can place SVQ
vrings without having them synchronized?

At this moment we're using a tree. The tree nature of the current SVQ
IOVA -> VA makes all nodes ordered so it is more or less easy to look
for holes.

Now your proposal uses the SVQ IOVA as tree values. Should we iterate
over all of them, order them, of the two trees, and then look for
holes there?

> Thanks
>
> >
> > Thanks!
> >
>



  reply	other threads:[~2024-05-13  6:29 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-10 10:03 [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree Eugenio Pérez
2024-04-10 10:03 ` [RFC 1/2] iova_tree: add an id member to DMAMap Eugenio Pérez
2024-04-18 20:46   ` Si-Wei Liu
2024-04-19  8:29     ` Eugenio Perez Martin
2024-04-19 23:49       ` Si-Wei Liu
2024-04-22  8:49         ` Eugenio Perez Martin
2024-04-23 22:20           ` Si-Wei Liu
2024-04-24  7:33             ` Eugenio Perez Martin
2024-04-25 17:43               ` Si-Wei Liu
2024-04-29  8:14                 ` Eugenio Perez Martin
2024-04-29 11:19                   ` Jonah Palmer
2024-04-30 18:11                     ` Eugenio Perez Martin
2024-05-01 22:08                       ` Si-Wei Liu
2024-05-02  6:18                         ` Eugenio Perez Martin
2024-05-07  9:12                           ` Si-Wei Liu
2024-04-30  5:54                   ` Si-Wei Liu
2024-04-30 17:19                     ` Eugenio Perez Martin
2024-05-01 23:13                       ` Si-Wei Liu
2024-05-02  6:44                         ` Eugenio Perez Martin
2024-05-08  0:52                           ` Si-Wei Liu
2024-05-08 15:25                             ` Eugenio Perez Martin
2024-04-10 10:03 ` [RFC 2/2] vdpa: identify aliased maps in iova_tree Eugenio Pérez
2024-04-12  6:46 ` [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree Jason Wang
2024-04-12  7:56   ` Eugenio Perez Martin
2024-05-07  7:29     ` Jason Wang
2024-05-07 10:56       ` Eugenio Perez Martin
2024-05-08  2:29         ` Jason Wang
2024-05-08 17:15           ` Eugenio Perez Martin
2024-05-09  6:27             ` Jason Wang
2024-05-09  7:10               ` Eugenio Perez Martin
2024-05-10  4:28                 ` Jason Wang
2024-05-10  7:16                   ` Eugenio Perez Martin
2024-05-11  4:00                     ` Jason Wang
2024-05-13  6:27                       ` Eugenio Perez Martin [this message]
2024-05-13  8:28                         ` Jason Wang
2024-05-13  9:56                           ` Eugenio Perez Martin
2024-05-14  3:56                             ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJaqyWeKfVXYj61sgvFrUTpOgy0k-zsLoR4JePEo0Q8XuXYbmA@mail.gmail.com \
    --to=eperezma@redhat.com \
    --cc=dtatulea@nvidia.com \
    --cc=jasowang@redhat.com \
    --cc=jonah.palmer@oracle.com \
    --cc=leiyang@redhat.com \
    --cc=mst@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=si-wei.liu@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).