From: Eugenio Perez Martin <eperezma@redhat.com>
To: Sahil <icegambit91@gmail.com>
Cc: qemu-devel <qemu-devel@nongnu.org>,
	Stefano Garzarella <sgarzare@redhat.com>
Subject: Re: Intention to work on GSoC project
Date: Wed, 3 Apr 2024 20:37:49 +0200
Message-ID: <CAJaqyWc+8OJZ33TtqeBy+Vy9HdW8zzbMKqg2mNWVaFda=g0XBA@mail.gmail.com>
In-Reply-To: <1934013.taCxCBeP46@valdaarhun>

On Wed, Apr 3, 2024 at 4:36 PM Sahil <icegambit91@gmail.com> wrote:
>
> Hi,
>
> Thank you for the reply.
>
> On Tuesday, April 2, 2024 5:08:24 PM IST Eugenio Perez Martin wrote:
> > [...]
> > > > > Q2.
> > > > > In the Red Hat article, just below the first listing ("Memory layout of a
> > > > > packed virtqueue descriptor"), there's the following line referring to the
> > > > > buffer id in "virtq_desc":
> > > > > > This time, the id field is not an index for the device to look for the
> > > > > > buffer: it is an opaque value for it, only has meaning for the driver.
> > > > >
> > > > > But the device returns the buffer id when it writes the used descriptor to
> > > > > the descriptor ring. The "only has meaning for the driver" part has got me
> > > > > a little confused. Which buffer id is this that the device returns? Is it related
> > > > > to the buffer id in the available descriptor?
> > > >
> > > > In my understanding, the buffer id is the element that the avail
> > > > descriptor marks to identify the buffer when adding descriptors to
> > > > the table. The device returns the buffer id of the processed
> > > > descriptor, or of the last descriptor in a chain, and writes it to
> > > > the descriptor entry that the used idx refers to (the first one in
> > > > the chain). Then the used idx increments.
> > > >
> > > > The Packed Virtqueue blog [1] is helpful, but some details in the
> > > > examples are confusing me.
> > > >
> > > > Q1.
> > > > In the last step of the two-entries descriptor table example, it says
> > > > both buffers #0 and #1 are available for the device. I understand
> > > > descriptor[0] is available and descriptor[1] is not, but there is no ID #0
> > > > now. So did the device get buffer #0 via a notification beforehand? If so,
> > > > does that mean buffer #0 will be lost when notifications are disabled?
> >
> > I guess you mean the table labeled "Figure: Full two-entries descriptor
> > table".
> >
> > Take into account that the descriptor table is not the state of all
> > the descriptors. That information must be maintained by the device and
> > the driver internally.
> >
> > The descriptor table is used as a circular buffer, where one part is
> > writable by the driver and the other part is writable by the device.
> > The fact that the device overwrites the descriptor table entry where
> > descriptor id 0 used to be does not mean that descriptor id 0 is used.
> > It just means that the device communicates to the driver that
> > descriptor 1 is used, and both sides need to keep the descriptor state
> > coherent.
> >
> > > I too have a similar question, and understanding the relation between
> > > buffer ids in the used and available descriptors might give more
> > > insight into this. For available descriptors, the buffer id is used to
> > > associate descriptors with a particular buffer. I am still not very
> > > sure about ids in used descriptors.
> > >
> > > Regarding Q1, both buffers #0 and #1 are available. In the mentioned
> > > figure, both descriptor[0] and descriptor[1] are available. This figure
> > > follows the figure with the caption "Using first buffer out of order". So
> > > in the first figure the device reads buffer #1 and writes the used
> > > descriptor but it still has buffer #0 to read. That still belongs to the
> > > device while buffer #1 can now be handled by the driver once again. So in
> > > the next figure, the driver makes buffer #1 available again. The device
> > > can still read buffer #0 from the previous batch of available
> > > descriptors.
> > >
> > > Based on what I have understood, the driver can't touch the descriptor
> > > corresponding to buffer #0 until the device acknowledges it. I did find
> > > the figure a little confusing as well. I think once the meaning of
> > > buffer id is clear from the driver's and device's perspective, it'll be
> > > easier to understand the figure.
> >
> > I think you got it right. Please let me know if you have further questions.
>
> I would like to clarify one thing in the figure "Full two-entries descriptor
> table". The driver can only overwrite a used descriptor in the descriptor
> ring, right?

Except for the first round, the driver can only write to used entries
in the descriptor table. In other words, their avail and used flags
must be equal.
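
To make that concrete, this is roughly how the check looks. The struct
matches the packed descriptor layout from the virtio 1.1 spec (it
mirrors Linux's include/uapi/linux/virtio_ring.h); the helper is a
simplified sketch, not the exact QEMU code:

#include <endian.h>
#include <stdbool.h>
#include <stdint.h>

/* Packed descriptor layout, as defined by the virtio 1.1 spec. */
struct vring_packed_desc {
    uint64_t addr;   /* guest physical address of the buffer */
    uint32_t len;    /* buffer length */
    uint16_t id;     /* opaque buffer id, chosen by the driver */
    uint16_t flags;  /* avail/used/write flags */
};

#define VRING_PACKED_DESC_F_AVAIL 7   /* bit position of the avail flag */
#define VRING_PACKED_DESC_F_USED  15  /* bit position of the used flag */

/*
 * A descriptor entry is "used" from the driver's point of view when its
 * avail and used flags are equal and both match the driver's used wrap
 * counter.
 */
static bool desc_is_used(const struct vring_packed_desc *desc,
                         bool used_wrap_counter)
{
    uint16_t flags = le16toh(desc->flags);
    bool avail = flags & (1 << VRING_PACKED_DESC_F_AVAIL);
    bool used = flags & (1 << VRING_PACKED_DESC_F_USED);

    return avail == used && used == used_wrap_counter;
}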

> And likewise for the device?

Yes, but with avail descs. I think you got this already, but I want to
be as complete as possible here.

> So in the figure, the driver will
> have to wait until descriptor[1] is used before it can overwrite it?
>

Yes, but I think it is easier to consider that both descriptor ids 0
and 1 are already available. A descriptor id must be less than the
virtqueue size.

In that situation, an entry with a valid buffer and length can only be
invalid because of its descriptor id: either the id is greater than the
vq size, or it refers to a descriptor that is already available.
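
Just to illustrate the id check, here is a minimal driver-side sketch
(the struct and field names are made up for the example, not the actual
QEMU ones):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver-side bookkeeping of which buffer ids are exposed. */
struct desc_state_sketch {
    uint16_t vq_size;      /* virtqueue size */
    bool *id_in_flight;    /* indexed by descriptor id */
};

/*
 * A used id returned by the device is only acceptable if it is below the
 * virtqueue size and refers to a descriptor the driver made available
 * and has not yet reclaimed.
 */
static bool used_id_is_valid(const struct desc_state_sketch *s, uint16_t id)
{
    return id < s->vq_size && s->id_in_flight[id];
}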

> Suppose the device marks descriptor[0] as used. I think the driver will
> not be able to overwrite that descriptor entry because it has to go in
> order and is at descriptor[1]. Is that correct?

The device must write one descriptor as used, either id 0 or id 1, at
descriptor[0], as all the descriptors are available.

Now, it does not matter whether the device marks one or both descriptors
as used: the driver must write its next available descriptor at
descriptor[1]. This is not because descriptor[1] contains a special
field or data, but because the driver must write the avail descriptors
sequentially, so the device knows which address to poll or check after a
notification.

In other words, descriptor[1] is just buffer space for the driver to
communicate an available descriptor to the device. It does not matter
what it contained before the write, as the driver must have processed
that information before writing the new available descriptor.
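
As a rough sketch (reusing the struct and flag definitions from the
earlier snippet; names are illustrative, not the shadow virtqueue code),
publishing the next available descriptor sequentially could look like
this:

/*
 * Rough sketch of a driver writing its next available descriptor at the
 * next ring position. The flags write is what exposes the entry to the
 * device, so it must be ordered after the other fields. Endianness
 * conversion is omitted for brevity.
 */
static void publish_avail_desc(struct vring_packed_desc *ring,
                               uint16_t ring_size,
                               uint16_t *next_avail_idx,
                               bool *avail_wrap_counter,
                               uint64_t addr, uint32_t len, uint16_t id)
{
    struct vring_packed_desc *desc = &ring[*next_avail_idx];
    uint16_t flags;

    desc->addr = addr;
    desc->len = len;
    desc->id = id;

    /* avail flag = wrap counter, used flag = its inverse. */
    flags = *avail_wrap_counter ? (1u << VRING_PACKED_DESC_F_AVAIL)
                                : (1u << VRING_PACKED_DESC_F_USED);
    __atomic_thread_fence(__ATOMIC_RELEASE);
    desc->flags = flags;

    /* Ring positions are consumed strictly in order: wrap at the end of
     * the ring and toggle the wrap counter, never go backwards. */
    if (++(*next_avail_idx) == ring_size) {
        *next_avail_idx = 0;
        *avail_wrap_counter = !*avail_wrap_counter;
    }
}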

> Is it possible for the driver
> to go "backwards" in the descriptor ring?
>

Nope, not under any circumstance.

> > > I am also not very sure about what happens when notifications are
> > > disabled.
> > > I'll have to read up on that again. But I believe the driver still won't
> > > be able to touch #0 until the device uses it.
> >
> > If one side disables notifications, it needs to check the indexes or
> > the flags by its own means: timers, reading the memory in a busy loop,
> > etc.
>
> Understood. Thank you for the clarification.
>
> I have some questions from the "Virtio live migration technical deep
> dive" article [1].
>
> Q1.
> In the paragraph just above Figure 6, there is the following line:
> > the vhost kernel thread and QEMU may run in different CPU threads,
> > so these writes must be synchronized with QEMU cleaning of the dirty
> > bitmap, and this write must be seen strictly after the modifications of
> > the guest memory by the QEMU thread.
>
> I am not clear on the last part of the statement. The modification of guest
> memory is being done by the vhost device and not by the QEMU thread, right?
>

QEMU also writes to the bitmap to clean it, so it knows which memory
does not need to be resent.

Feel free to ask questions about this, but you don't need to interact
with the dirty bitmap in the project.
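
In case it helps, here is a minimal sketch of the ordering requirement
(this is not the actual QEMU/vhost code, just C11 atomics to illustrate
why the two sides must be synchronized):

#include <stdatomic.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define BITMAP_WORDS 1024   /* enough for this toy example */

static _Atomic uint64_t dirty_bitmap[BITMAP_WORDS];

/* Writer side (e.g. the vhost thread): set the dirty bit only after the
 * guest memory write, with release ordering so the data is visible
 * before the bit. */
static void log_page_write(uint64_t gpa)
{
    uint64_t page = gpa >> PAGE_SHIFT;

    atomic_fetch_or_explicit(&dirty_bitmap[page / 64],
                             UINT64_C(1) << (page % 64),
                             memory_order_release);
}

/* Reader side (e.g. the QEMU migration thread): atomically fetch and
 * clear a bitmap word, then resend the pages whose bits were set. The
 * acquire pairs with the release above, so the page contents read after
 * this are at least as new as the write that dirtied them. */
static uint64_t fetch_and_clear_word(uint64_t word_idx)
{
    return atomic_exchange_explicit(&dirty_bitmap[word_idx], 0,
                                    memory_order_acquire);
}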

> Q2.
> In the first point of the "Dynamic device state: virtqueue state" section:
> >The guest makes available N descriptors at the source of the migration,
> >so its avail idx member in the avail idx is N.
>
> I think there's a typo here: "...avail idx member in the avail ring is N"
> instead of "...avail idx is N".
>

Good catch! I'll fix it, thanks!

> Regarding the implementation of this project, can the project be broken
> down into two parts:
> 1. implementing packed virtqueues in QEMU, and

Right, but let me expand on this: QEMU already supports packed
virtqueue in the emulated device (hw/virtio/virtio.c). The missing part
is the "driver" side, needed to communicate with a vDPA device, in
hw/virtio/vhost-shadow-virtqueue.c.

> 2. providing mechanisms for (live) migration to work with packed
>     virtqueues.
>

Yes: the first part implements the packed vq without live migration,
and then live migration is tested on top. But if everything goes right,
step 2 should actually require no changes beyond step 1. It is totally
ok if we need to dedicate time to it, of course.

> I am ready to start working on the implementation. In one of your
> previous emails you had talked about moving packed virtqueue
> related implementation from the kernel's drivers/virtio/virtio_ring.c
> into vhost_shadow_virtqueue.c.
>

Right.

> My plan is to also understand how split virtqueue has been implemented
> in QEMU. I think that'll be helpful when moving the kernel's implementation
> to QEMU.
>

Sure, the split virtqueue is implemented in the same file,
hw/virtio/vhost-shadow-virtqueue.c. If you deploy vhost_vdpa + vdpa_sim
or vp_vdpa [1][2], you can:
* Run QEMU with -netdev type=vhost-vdpa,x-svq=on
* Set GDB breakpoints in interesting functions like
  vhost_handle_guest_kick and vhost_svq_flush.
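
For reference, something along these lines should work with the
in-kernel net simulator (module, device and path names here are just an
example and may need adjusting to your setup):

  modprobe vdpa_sim_net
  modprobe vhost_vdpa
  vdpa dev add name vdpa0 mgmtdev vdpasim_net

  qemu-system-x86_64 ... \
      -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vdpa0,x-svq=on \
      -device virtio-net-pci,netdev=vdpa0

and then, under gdb:

  (gdb) break vhost_handle_guest_kick
  (gdb) break vhost_svq_flush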

Thanks!

> Please let me know if I should change my approach.
>
> Thanks,
> Sahil
>
>



