From: Eugenio Perez Martin <eperezma@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: qemu-devel@nongnu.org, Si-Wei Liu <si-wei.liu@oracle.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Lei Yang <leiyang@redhat.com>, Peter Xu <peterx@redhat.com>,
Jonah Palmer <jonah.palmer@oracle.com>,
Dragos Tatulea <dtatulea@nvidia.com>
Subject: Re: [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree
Date: Thu, 9 May 2024 09:10:02 +0200 [thread overview]
Message-ID: <CAJaqyWdA_6Mx3mkcobmBjB5NDJt3tyqTJv2JijF0agnnBFxQxw@mail.gmail.com> (raw)
In-Reply-To: <CACGkMEvdBDFvwvqb_7YXqiPd-ax4Xw7e0BLBhCt_uD6-Uf+DgA@mail.gmail.com>
On Thu, May 9, 2024 at 8:27 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, May 9, 2024 at 1:16 AM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> >
> > On Wed, May 8, 2024 at 4:29 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Tue, May 7, 2024 at 6:57 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > >
> > > > On Tue, May 7, 2024 at 9:29 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Fri, Apr 12, 2024 at 3:56 PM Eugenio Perez Martin
> > > > > <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Fri, Apr 12, 2024 at 8:47 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Apr 10, 2024 at 6:03 PM Eugenio Pérez <eperezma@redhat.com> wrote:
> > > > > > > >
> > > > > > > > The guest may have overlapped memory regions, where different GPA leads
> > > > > > > > to the same HVA. This causes a problem when overlapped regions
> > > > > > > > (different GPA but same translated HVA) exists in the tree, as looking
> > > > > > > > them by HVA will return them twice.
> > > > > > >
> > > > > > > I think I don't understand if there's any side effect for shadow virtqueue?
> > > > > > >
> > > > > >
> > > > > > My bad, I totally forgot to put a reference to where this comes from.
> > > > > >
> > > > > > Si-Wei found that during initialization this sequences of maps /
> > > > > > unmaps happens [1]:
> > > > > >
> > > > > > HVA GPA IOVA
> > > > > > -------------------------------------------------------------------------------------------------------------------------
> > > > > > Map
> > > > > > [0x7f7903e00000, 0x7f7983e00000) [0x0, 0x80000000) [0x1000, 0x80000000)
> > > > > > [0x7f7983e00000, 0x7f9903e00000) [0x100000000, 0x2080000000)
> > > > > > [0x80001000, 0x2000001000)
> > > > > > [0x7f7903ea0000, 0x7f7903ec0000) [0xfeda0000, 0xfedc0000)
> > > > > > [0x2000001000, 0x2000021000)
> > > > > >
> > > > > > Unmap
> > > > > > [0x7f7903ea0000, 0x7f7903ec0000) [0xfeda0000, 0xfedc0000) [0x1000,
> > > > > > 0x20000) ???
> > > > > >
> > > > > > The third HVA range is contained in the first one, but exposed under a
> > > > > > different GVA (aliased). This is not "flattened" by QEMU, as GPA does
> > > > > > not overlap, only HVA.
> > > > > >
> > > > > > At the third chunk unmap, the current algorithm finds the first chunk,
> > > > > > not the second one. This series is the way to tell the difference at
> > > > > > unmap time.
> > > > > >
> > > > > > [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-04/msg00079.html
> > > > > >
> > > > > > Thanks!
> > > > >
> > > > > Ok, I was wondering if we need to store GPA(GIOVA) to HVA mappings in
> > > > > the iova tree to solve this issue completely. Then there won't be
> > > > > aliasing issues.
> > > > >
> > > >
> > > > I'm ok to explore that route but this has another problem. Both SVQ
> > > > vrings and CVQ buffers also need to be addressable by VhostIOVATree,
> > > > and they do not have GPA.
> > > >
> > > > At this moment vhost_svq_translate_addr is able to handle this
> > > > transparently as we translate vaddr to SVQ IOVA. How can we store
> > > > these new entries? Maybe a (hwaddr)-1 GPA to signal it has no GPA and
> > > > then a list to go through other entries (SVQ vaddr and CVQ buffers).
> > >
> > > This seems to be tricky.
> > >
> > > As discussed, it could be another iova tree.
> > >
> >
> > Yes but there are many ways to add another IOVATree. Let me expand & recap.
> >
> > Option 1 is to simply add another iova tree to VhostShadowVirtqueue.
> > Let's call it gpa_iova_tree, as opposed to the current iova_tree that
> > translates from vaddr to SVQ IOVA. To know which one to use is easy at
> > adding or removing, like in the memory listener, but how to know at
> > vhost_svq_translate_addr?
>
> Then we won't use virtqueue_pop() at all, we need a SVQ version of
> virtqueue_pop() to translate GPA to SVQ IOVA directly?
>
The problem is not virtqueue_pop, that's out of the
vhost_svq_translate_addr. The problem is the need of adding
conditionals / complexity in all the callers of
> >
> > The easiest way for me is to rely on memory_region_from_host(). When
> > vaddr is from the guest, it returns a valid MemoryRegion. When it is
> > not, it returns NULL. I'm not sure if this is a valid use case, it
> > just worked in my tests so far.
> >
> > Now we have the second problem: The GPA values of the regions of the
> > two IOVA tree must be unique. We need to be able to find unallocated
> > regions in SVQ IOVA. At this moment there is only one IOVATree, so
> > this is done easily by vhost_iova_tree_map_alloc. But it is very
> > complicated with two trees.
>
> Would it be simpler if we decouple the IOVA allocator? For example, we
> can have a dedicated gtree to track the allocated IOVA ranges. It is
> shared by both
>
> 1) Guest memory (GPA)
> 2) SVQ virtqueue and buffers
>
> And another gtree to track the GPA to IOVA.
>
> The SVQ code could use either
>
> 1) one linear mappings that contains both SVQ virtqueue and buffers
>
> or
>
> 2) dynamic IOVA allocation/deallocation helpers
>
> So we don't actually need the third gtree for SVQ HVA -> SVQ IOVA?
>
That's possible, but that scatters the IOVA handling code instead of
keeping it self-contained in VhostIOVATree.
> >
> > Option 2a is to add another IOVATree in VhostIOVATree. I think the
> > easiest way is to keep the GPA -> SVQ IOVA in one tree, let's call it
> > iova_gpa_map, and the current vaddr -> SVQ IOVA tree in
> > iova_taddr_map. This second tree should contain both vaddr memory that
> > belongs to the guest and host-only vaddr like vrings and CVQ buffers.
> >
> > How to pass the GPA to VhostIOVATree API? To add it to DMAMap is
> > complicated, as it is shared with intel_iommu. We can add new
> > functions to VhostIOVATree that accepts vaddr plus GPA, or maybe it is
> > enough with GPA only. It should be functions to add, remove, and
> > allocate new entries. But vaddr ones must be kept, because buffers
> > might be host-only.
> >
> > Then the caller can choose which version to call: for adding and
> > removing guest memory from the memory listener, the GPA variant.
> > Adding SVQ vrings and CVQ buffers should use the current vaddr
> > versions. vhost_svq_translate_addr still needs to use
> > memory_region_from_host() to know which one to call.
>
> So the idea is, for region_del/region_add use iova_gpa_map? For the
> SVQ translation, use the iova_taddr_map?
>
> I suspect we still need to synchronize with those two trees so it
> might be still problematic as iova_taddr_map contains the overlapped
> regions.
>
Right. All adding / removing functions with GPA must also update the
current iova_taddr_map too.
> >
> > Although I didn't like this approach because it complicates
> > VhostIOVATree, I think it is the easier way except for option 4, I'll
> > explain later.
> >
> > This has an extra advantage: currently, the lookup in
> > vhost_svq_translate_addr is linear, O(1). This would allow us to use
> > the tree properly.
>
> It uses g_tree_foreach() which I guess is not O(1).
>
I'm sorry I meant O(N).
> >
> > Option 2b could be to keep them totally separated. So current
> > VhostIOVATree->iova_taddr_map only contains host-only entries, and the
> > new iova_gpa_map containst the guest entries. I don't think this case
> > adds anything except less memory usage, as the gpa map (which should
> > be the fastest) will be the same size. Also, it makes it difficult to
> > implement vhost_iova_tree_map_alloc.
> >
> > Option 3 is to not add new functions but extend the current ones. That
> > would require special values of GPA values to indicate no GPA, like
> > SVQ vrings. I think option 2a is better, but this may help to keep the
> > interface simpler.
> >
> > Option 4 is what I'm proposing in this RFC. To store the GPA as map id
> > so we can tell if the vaddr corresponds to one SVQ IOVA or another.
> > Now I'm having trouble retrieving the memory section I see in the
> > memory listener. It should not be so difficult but. The main advantage
> > is not to duplicate data structs that are already in QEMU, but maybe
> > it is not worth the effort.
> >
> > Going further with this option, we could add a flag to ignore the .id
> > member added. But it adds more and more complexity to the API so I
> > would prefer option 2a for this.
>
> Thanks
>
> >
> > > Thanks
> > >
> > > >
> > > > Thanks!
> > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > To solve this, track GPA in the DMA entry that acs as unique identifiers
> > > > > > > > to the maps. When the map needs to be removed, iova tree is able to
> > > > > > > > find the right one.
> > > > > > > >
> > > > > > > > Users that does not go to this extra layer of indirection can use the
> > > > > > > > iova tree as usual, with id = 0.
> > > > > > > >
> > > > > > > > This was found by Si-Wei Liu <si-wei.liu@oracle.com>, but I'm having a hard
> > > > > > > > time to reproduce the issue. This has been tested only without overlapping
> > > > > > > > maps. If it works with overlapping maps, it will be intergrated in the main
> > > > > > > > series.
> > > > > > > >
> > > > > > > > Comments are welcome. Thanks!
> > > > > > > >
> > > > > > > > Eugenio Pérez (2):
> > > > > > > > iova_tree: add an id member to DMAMap
> > > > > > > > vdpa: identify aliased maps in iova_tree
> > > > > > > >
> > > > > > > > hw/virtio/vhost-vdpa.c | 2 ++
> > > > > > > > include/qemu/iova-tree.h | 5 +++--
> > > > > > > > util/iova-tree.c | 3 ++-
> > > > > > > > 3 files changed, 7 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.44.0
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
next prev parent reply other threads:[~2024-05-09 7:11 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-10 10:03 [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree Eugenio Pérez
2024-04-10 10:03 ` [RFC 1/2] iova_tree: add an id member to DMAMap Eugenio Pérez
2024-04-18 20:46 ` Si-Wei Liu
2024-04-19 8:29 ` Eugenio Perez Martin
2024-04-19 23:49 ` Si-Wei Liu
2024-04-22 8:49 ` Eugenio Perez Martin
2024-04-23 22:20 ` Si-Wei Liu
2024-04-24 7:33 ` Eugenio Perez Martin
2024-04-25 17:43 ` Si-Wei Liu
2024-04-29 8:14 ` Eugenio Perez Martin
2024-04-29 11:19 ` Jonah Palmer
2024-04-30 18:11 ` Eugenio Perez Martin
2024-05-01 22:08 ` Si-Wei Liu
2024-05-02 6:18 ` Eugenio Perez Martin
2024-05-07 9:12 ` Si-Wei Liu
2024-04-30 5:54 ` Si-Wei Liu
2024-04-30 17:19 ` Eugenio Perez Martin
2024-05-01 23:13 ` Si-Wei Liu
2024-05-02 6:44 ` Eugenio Perez Martin
2024-05-08 0:52 ` Si-Wei Liu
2024-05-08 15:25 ` Eugenio Perez Martin
2024-04-10 10:03 ` [RFC 2/2] vdpa: identify aliased maps in iova_tree Eugenio Pérez
2024-04-12 6:46 ` [RFC 0/2] Identify aliased maps in vdpa SVQ iova_tree Jason Wang
2024-04-12 7:56 ` Eugenio Perez Martin
2024-05-07 7:29 ` Jason Wang
2024-05-07 10:56 ` Eugenio Perez Martin
2024-05-08 2:29 ` Jason Wang
2024-05-08 17:15 ` Eugenio Perez Martin
2024-05-09 6:27 ` Jason Wang
2024-05-09 7:10 ` Eugenio Perez Martin [this message]
2024-05-10 4:28 ` Jason Wang
2024-05-10 7:16 ` Eugenio Perez Martin
2024-05-11 4:00 ` Jason Wang
2024-05-13 6:27 ` Eugenio Perez Martin
2024-05-13 8:28 ` Jason Wang
2024-05-13 9:56 ` Eugenio Perez Martin
2024-05-14 3:56 ` Jason Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJaqyWdA_6Mx3mkcobmBjB5NDJt3tyqTJv2JijF0agnnBFxQxw@mail.gmail.com \
--to=eperezma@redhat.com \
--cc=dtatulea@nvidia.com \
--cc=jasowang@redhat.com \
--cc=jonah.palmer@oracle.com \
--cc=leiyang@redhat.com \
--cc=mst@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=si-wei.liu@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).