From: Eugenio Perez Martin <eperezma@redhat.com>
To: Jonah Palmer <jonah.palmer@oracle.com>
Cc: qemu-devel@nongnu.org, mst@redhat.com, raphael@enfabrica.net,
	 kwolf@redhat.com, hreitz@redhat.com, jasowang@redhat.com,
	pbonzini@redhat.com,  fam@euphon.net, stefanha@redhat.com,
	qemu-block@nongnu.org,  schalla@marvell.com, leiyang@redhat.com,
	virtio-fs@lists.linux.dev,  si-wei.liu@oracle.com,
	boris.ostrovsky@oracle.com
Subject: Re: [RFC v2 1/5] virtio: Initialize sequence variables
Date: Fri, 5 Apr 2024 17:04:32 +0200	[thread overview]
Message-ID: <CAJaqyWdZQpYUK=bNe1UsV-5F+U=JC3rkN6YEYxid-Zmha+WCpQ@mail.gmail.com> (raw)
In-Reply-To: <c1e78d58-53bb-4091-b775-3d4a004a8bfd@oracle.com>

On Fri, Apr 5, 2024 at 3:59 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
>
>
>
> On 4/4/24 12:33 PM, Eugenio Perez Martin wrote:
> > On Thu, Apr 4, 2024 at 4:42 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>
> >>
> >>
> >> On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:
> >>> On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
> >>>>> On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer <jonah.palmer@oracle.com> wrote:
> >>>>>>
> >>>>>> Initialize sequence variables for VirtQueue and VirtQueueElement
> >>>>>> structures. A VirtQueue's sequence variables are initialized when a
> >>>>>> VirtQueue is being created or reset. A VirtQueueElement's sequence
> >>>>>> variable is initialized when a VirtQueueElement is being initialized.
> >>>>>> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
> >>>>>>
> >>>>>> A VirtQueue's used_seq_idx represents the next expected index in a
> >>>>>> sequence of VirtQueueElements to be processed (put on the used ring).
> >>>>>> The next VirtQueueElement added to the used ring must match this
> >>>>>> sequence number before additional elements can be safely added to the
> >>>>>> used ring. It's also particularly useful for helping find the number of
> >>>>>> new elements added to the used ring.
> >>>>>>
> >>>>>> A VirtQueue's current_seq_idx represents the current sequence index.
> >>>>>> This value is essentially a counter whose value is assigned to a new
> >>>>>> VirtQueueElement and then incremented. Given its uint16_t type, this
> >>>>>> sequence number can be between 0 and 65,535.
> >>>>>>
> >>>>>> A VirtQueueElement's seq_idx represents the sequence number assigned to
> >>>>>> the VirtQueueElement when it was created. This value must match with the
> >>>>>> VirtQueue's used_seq_idx before the element can be put on the used ring
> >>>>>> by the device.
> >>>>>>
> >>>>>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> >>>>>> ---
> >>>>>>     hw/virtio/virtio.c         | 18 ++++++++++++++++++
> >>>>>>     include/hw/virtio/virtio.h |  1 +
> >>>>>>     2 files changed, 19 insertions(+)
> >>>>>>
> >>>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >>>>>> index fb6b4ccd83..069d96df99 100644
> >>>>>> --- a/hw/virtio/virtio.c
> >>>>>> +++ b/hw/virtio/virtio.c
> >>>>>> @@ -132,6 +132,10 @@ struct VirtQueue
> >>>>>>         uint16_t used_idx;
> >>>>>>         bool used_wrap_counter;
> >>>>>>
> >>>>>> +    /* In-Order sequence indices */
> >>>>>> +    uint16_t used_seq_idx;
> >>>>>> +    uint16_t current_seq_idx;
> >>>>>> +
> >>>>>
> >>>>> I'm having a hard time understanding the difference between these and
> >>>>> last_avail_idx and used_idx. It seems to me if we replace them
> >>>>> everything will work? What am I missing?
> >>>>>
> >>>>
> >>>> For used_seq_idx, it works like used_idx; the difference is when their
> >>>> values get updated, specifically in the split VQ case.
> >>>>
> >>>> As you know, for the split VQ case, the used_idx is updated during
> >>>> virtqueue_split_flush. However, imagine a batch of elements coming in
> >>>> where virtqueue_split_fill is called multiple times before
> >>>> virtqueue_split_flush. We want to make sure we write these elements to
> >>>> the used ring in order, and we'll know their order based on used_seq_idx.
> >>>>
> >>>> Alternatively, I thought about replicating the logic for the packed VQ
> >>>> case (where this used_seq_idx isn't used) where we start looking at
> >>>> vq->used_elems[vq->used_idx] and iterate through until we find a used
> >>>> element, but I wasn't sure how to handle the case where elements get
> >>>> used (written to the used ring) and new elements get put in used_elems
> >>>> before the used_idx is updated, since this search would require us to
> >>>> always start at index vq->used_idx.
> >>>>
> >>>> For example, say three elements are getting filled (elem0 - elem2), and
> >>>> elem1 and elem0 come back first (vq->used_idx = 0):
> >>>>
> >>>> elem1 - not in-order
> >>>> elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
> >>>>            in-order, write elem0 and elem1 to used ring, mark elements as
> >>>>            used
> >>>>
> >>>> Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
> >>>> ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
> >>>> (elem1) and iterate to vq->used_idx + 2 (elem2)?
> >>>>
> >>>> Hmm... now that I'm thinking about it, maybe for the split VQ case we
> >>>> could continue looking through the vq->used_elems array until we find an
> >>>> unused element... but then again how would we (1) know if the element is
> >>>> in-order and (2) know when to stop searching?
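
If I'm reading the intent right, the flow you have in mind is roughly the
toy model below? Standalone sketch with invented names and simplified
types rather than the real VirtQueue/VirtQueueElement, not the actual
code from your series:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define QUEUE_SIZE 256

typedef struct {
    unsigned index;     /* head descriptor id                      */
    uint16_t seq_idx;   /* order in which the element was popped   */
    bool     filled;    /* the device has completed this element   */
} Elem;

typedef struct {
    /* at most QUEUE_SIZE elements are in flight, as in a real ring */
    Elem     used_elems[QUEUE_SIZE];
    uint16_t used_seq_idx;     /* next sequence number expected on the used ring */
    uint16_t current_seq_idx;  /* handed out at pop time */
} VQ;

/* pop: every element takes a ticket */
static Elem pop(VQ *vq, unsigned head)
{
    return (Elem){ .index = head, .seq_idx = vq->current_seq_idx++ };
}

/* fill: buffer the element, then drain to the used ring only while the
 * next expected sequence number is already there.
 */
static void fill(VQ *vq, Elem e)
{
    e.filled = true;
    vq->used_elems[e.seq_idx % QUEUE_SIZE] = e;

    while (vq->used_elems[vq->used_seq_idx % QUEUE_SIZE].filled) {
        Elem *next = &vq->used_elems[vq->used_seq_idx % QUEUE_SIZE];
        printf("used ring <- head %u (seq %u)\n",
               next->index, (unsigned)next->seq_idx);
        next->filled = false;
        vq->used_seq_idx++;
    }
}

int main(void)
{
    VQ vq = {0};
    Elem e0 = pop(&vq, 3), e1 = pop(&vq, 7), e2 = pop(&vq, 9);

    fill(&vq, e1);  /* out of order: buffered, nothing written yet */
    fill(&vq, e0);  /* writes e0, then the buffered e1             */
    fill(&vq, e2);  /* writes e2                                   */
    return 0;
}

That is, out-of-order completions sit in used_elems until the element
carrying the expected sequence number shows up.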
> >>>>
> >>>
> >>> Ok I think I understand the problem now. It is aggravated if we add
> >>> chained descriptors to the mix.
> >>>
> >>> We know that the order of used descriptors must be exactly the same as
> >>> the order they were made available, leaving in-order batching aside.
> >>> What if we fill vq->used_elems at virtqueue_pop and then have
> >>> virtqueue_push just mark them as used somehow? Two booleans (or a flag)
> >>> would do for a first iteration.
> >>>
> >>> If we go with this approach I think used_elems should be renamed actually.
> >>>
> >>
> >> If I'm understanding correctly, I don't think adding newly created
> >> elements to vq->used_elems at virtqueue_pop will do much for us.
> >
> > By knowing what descriptor id must go in each position of the used ring.
> >
> > Following your example, let's say avail_idx is 10 at that moment.
> > Then, the driver makes available the three elements you mention, so:
> > used_elems[10] = elem0
> > used_elems[11] = elem1
> > used_elems[12] = elem2
> >
> > Now the device uses elem1. virtqueue_push can search linearly for
> > elem->index in the used_elems[used_idx]...used_elems[avail_idx] range. As
> > the device is misbehaving, there is no need to optimize this behavior.
> > used_elems[11].index == elem->index, so we mark it as used somehow.
> > Let's say we add a boolean to VirtQueueElement.
> >
> > virtqueue_flush can scan used_elems[used_idx]...used_elems[avail_idx]
> > looking for used elements. At this moment used_elems[used_idx] is not
> > used so it stops there.
> >
> > Now elem0 is pushed. It is the first one in the
> > used_elems[used_idx]...used_elems[avail_idx] range, so we can write it
> > to the used ring at fill. used_idx++. We use the rest of the
> > descriptor until we find one in used_elems that is not used, which is
> > used_elems[12].
> >
> > After that virtqueue_flush is called. At its scanning, used_elems[10]
> > is used, so it writes it to the used ring. After that, used_elems[11]
> > is also used, so it is written also. used_elems[12] is not used, so it
> > stops there.
> >
> > Finally, elem2 is pushed, so used_elems[12] is written.
> > virtqueue_flush detects it, so it writes it to the guest's used ring.
> >
> > Let me know what you think. I find it simpler than declaring new
> > indexes, but I may be wrong.
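
To make that concrete, here is a standalone toy model of the idea.
Everything in it is invented for illustration (types, helper names), it
is not a patch against hw/virtio/virtio.c, and it already follows the
_fill/_flush split I clarify below: fill only marks the element, flush
is what writes the guest's used ring. main() replays the
elem1/elem0/elem2 ordering from above.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define QUEUE_SIZE 256

typedef struct {
    unsigned index;  /* head descriptor id the driver made available */
    bool     used;   /* the device has completed this element        */
} UsedElem;

typedef struct {
    UsedElem used_elems[QUEUE_SIZE];
    uint16_t used_idx;   /* first position not yet written to the used ring */
    uint16_t avail_idx;  /* one past the last position made available       */
} VQ;

/* pop: remember which head sits at each avail position */
static void record_at_pop(VQ *vq, unsigned head)
{
    vq->used_elems[vq->avail_idx % QUEUE_SIZE] =
        (UsedElem){ .index = head, .used = false };
    vq->avail_idx++;
}

/* fill: linear search of the in-flight window for the completed head.
 * The search only walks past the first entry when completions arrive
 * out of order, so there is no need to optimize it.
 */
static void mark_used_at_fill(VQ *vq, unsigned head)
{
    for (uint16_t i = vq->used_idx; i != vq->avail_idx; i++) {
        UsedElem *e = &vq->used_elems[i % QUEUE_SIZE];
        if (!e->used && e->index == head) {
            e->used = true;
            return;
        }
    }
    /* not found in the in-flight window: device error */
}

/* flush: write every leading used element to the guest's used ring,
 * stopping at the first one the device has not completed yet.
 */
static void flush_in_order(VQ *vq)
{
    while (vq->used_idx != vq->avail_idx &&
           vq->used_elems[vq->used_idx % QUEUE_SIZE].used) {
        printf("used ring <- head %u\n",
               vq->used_elems[vq->used_idx % QUEUE_SIZE].index);
        vq->used_idx++;
    }
}

int main(void)
{
    VQ vq = { .used_idx = 10, .avail_idx = 10 };

    record_at_pop(&vq, 0);      /* elem0 at position 10 */
    record_at_pop(&vq, 1);      /* elem1 at position 11 */
    record_at_pop(&vq, 2);      /* elem2 at position 12 */

    mark_used_at_fill(&vq, 1);  /* elem1 completes first                */
    flush_in_order(&vq);        /* writes nothing: position 10 not used */

    mark_used_at_fill(&vq, 0);  /* elem0 completes          */
    flush_in_order(&vq);        /* writes elem0, then elem1 */

    mark_used_at_fill(&vq, 2);  /* elem2 completes */
    flush_in_order(&vq);        /* writes elem2    */
    return 0;
}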
> >
>
> I think I see what you're getting at, but I just have a few clarifying
> questions about your proposal here.
>
> So you're proposing to add entries to used_elems at virtqueue_pop, ok.
>
> avail_idx = 10, then the driver makes some new entries (elems) available
> in the avail ring:
>
> used_elems[10] = elem0
> used_elems[11] = elem1
> used_elems[12] = elem2
>
> At this point, avail_idx = 13, used_idx = 10.
>
> elem1 gets used first, ok.
>
> Now, if I'm understanding correctly, you're saying that in
> virtqueue_push explicitly (not virtqueue_fill/virtqueue_flush), we scan
> used_elems[used_idx] - used_elems[avail_idx] to find used_elems[i].index
> == elem->index and mark it as used, e.g. used_elems[i].used = true.
> Okay, now used_elems[11] has been marked as used.
>
> Now we make it to virtqueue_fill. What's the role you want
> virtqueue_fill to take here (if any)?
>

Sorry I meant virtqueue_fill here.

> You say that after we mark this element as used, we go to
> virtqueue_flush and scan for used elements in used_elems[used_idx] -
> used_elems[avail_idx]. Of course, the first one of this order will be in
> used_elems[used_idx], which is currently showing the element as unused,
> so we're done with this element for now.
>
> So, what exactly is the role of virtqueue_flush here? I'm inferring here
> that you want the virtqueue_flush role (for split VQs) to do both the
> writing to the used ring (normally done in virtqueue_fill) as well as
> updating the used_idx (normally done in virtqueue_flush). Is this correct?
>

I modelled this after the packed vq scenario, where everything is updated at
_flush. But yes, I think you got it totally right.

> Next, elem0 gets used second.
>
> Again, in virtqueue_push we scan used_elems[used_idx] -
> used_elems[avail_idx] to find used_elems[i].index == elem->index and
> mark it as used. Okay, used_elems[10] has been marked as used.
>
> Then you say, "It is the first one in the used_elems[used_idx] -
> used_elems[avail_idx] range, so we can write it to the used ring at
> fill. used_idx++. We use the rest of the descriptor until we find one in
> used_elems that is not used, which is used_elems[12]."
>
> This, to me, sounds like "in virtqueue_fill, when we find an in-order match
> (e.g. used_elems[used_idx].index == elem->index), write it to the used ring
> AND increment the used_idx. Keep writing elements to the used ring while
> used_elems[used_idx].used == true and, for each element being written,
> increment used_idx."
>
> This is a bit confusing to me since next you say "After that,
> virtqueue_flush is called. At its scanning, used_elems[10] is used, so
> it writes it to the used ring. After that, used_elems[11] is also used,
> so it is written also. used_elems[12] is not used, so it stops there."
>
> This sounds very similar to what you proposed for virtqueue_fill, except
> it looks like you're also saying to do this in virtqueue_flush, hence my
> confusion.
>
> If you wouldn't mind, could you clarify the roles of virtqueue_fill and
> virtqueue_flush here for me? Thanks :)!
>

I see how they're confusing if you follow the split vq way, sorry!
* _fill: Only update used_elems (or equivalent).
* _flush: Write the information to the used ring (split) or the descriptor
  ring (packed).

> > This makes it difficult to actually batch used descriptors. My
> > proposal is to address it in another series, by delegating it to the
> > caller and recovering proper usage of virtqueue_push(idx) parameter.
> > The caller can specify either to batch as usual, or to delegate the
> > automatic (and potentially inefficient) ordering I'm proposing here.
> >
>
> Just to be clear, for this series, you'd like me to implement a solution
> that does *not* consider the case where virtqueue_fill is called
> multiple times before virtqueue_flush (and to put a solution for this in
> a separate series)?
>

No, it is supported. It is just not very efficient because of the linear search.

For it to be supported properly, the caller should pass the virtqueue_fill
idx explicitly. But that requires modifications to all devices, so I'm
proposing to do it on top.
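
For context, the idx I mean is the parameter virtqueue_fill() already
takes; a device that batches completions does roughly this today (from
memory, worth double-checking the exact signatures in
include/hw/virtio/virtio.h):

#include "hw/virtio/virtio.h"

/* Complete a batch of elements: fill each one at an explicit offset from
 * used_idx, then publish the whole batch with a single flush.
 */
static void complete_batch(VirtQueue *vq, VirtQueueElement **elems,
                           unsigned int *lens, unsigned int count)
{
    for (unsigned int i = 0; i < count; i++) {
        virtqueue_fill(vq, elems[i], lens[i], i); /* idx relative to used_idx */
    }
    virtqueue_flush(vq, count);                   /* used_idx advances by count */
}

Passing that idx correctly under in-order is what would let us avoid the
linear search, hence my proposal to do it on top.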

> Are we not concerned that we might shoot ourselves in the foot here by
> implementing a process that may not work well for a batching solution,
> especially when we have an almost-working solution for batching and
> non-batching cases?
>
> >> We
> >> could just keep adding processed elements to vq->used_elems at
> >> virtqueue_fill but instead of:
> >>
> >> vq->used_elems[seq_idx].in_num = elem->in_num;
> >> vq->used_elems[seq_idx].out_num = elem->out_num;
> >>
> >> We could do:
> >>
> >> vq->used_elems[seq_idx].in_num = 1;
> >> vq->used_elems[seq_idx].out_num = 1;
> >>
> >> We'd use in_num and out_num as separate flags. in_num could indicate
> >> whether this element has been written to the used ring, while out_num
> >> could indicate whether this element has been flushed (1 meaning not yet,
> >> 0 meaning done). In other words, when we go to write to the used ring,
> >> start at index vq->used_idx and iterate through the used elements.
> >>
> >> If a used element's in_num and out_num == 0, then this element is
> >> invalid (not yet processed) and we stop the search.
> >>
> >> If a used element's in_num and out_num == 1, then this element is valid,
> >> written to the used ring, in_num is set to 0, and the search continues.
> >>
> >> Lastly, if a used element's in_num == 0 but out_num == 1, then this
> >> element has already been written to the used ring but not yet flushed,
> >> so ignore this element and continue searching.
> >>
> >> There should never be a case where in_num == 1 and out_num == 0.
> >>
> >> However, this would probably mean that before (or right after) we
> >> actually perform the flush we'll have to iterate through the used_elems
> >> array one more time and set their out_num's to 0 to indicate the element
> >> has been flushed.
> >>
> >> Again, this is just for the batched split VQ case where we have to keep
> >> track of elements that have been written but not flushed and elements
> >> that have been written and flushed, given that we don't know which
> >> elements have actually been written to the used ring until the used_idx
> >> is updated.
> >>
> >> This approach appears more costly though if we're really trying to avoid
> >> having this new used_seq_idx VirtQueue member.
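
Just to make sure I follow the flag scheme, it boils down to these
states, right? (toy stand-in type below, not the real VirtQueueElement):

#include <stdint.h>

typedef struct {
    uint16_t in_num;   /* 1: still needs to be written to the used ring */
    uint16_t out_num;  /* 1: still needs the flush (used_idx not bumped) */
} FlaggedElem;

typedef enum {
    NOT_PROCESSED,        /* 0/0: device hasn't returned it yet -> stop scan */
    PENDING_WRITE,        /* 1/1: write to the used ring now, clear in_num   */
    WRITTEN_NOT_FLUSHED,  /* 0/1: already on the ring, skip and keep going   */
    BROKEN                /* 1/0: should never happen                        */
} ElemState;

ElemState classify(const FlaggedElem *e)
{
    if (!e->in_num && !e->out_num) {
        return NOT_PROCESSED;
    }
    if (e->in_num && e->out_num) {
        return PENDING_WRITE;
    }
    if (!e->in_num && e->out_num) {
        return WRITTEN_NOT_FLUSHED;
    }
    return BROKEN;
}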
> >>
> >>>> In any case, the use of this variable could be seen as an optimization
> >>>> as its value will tell us where to start looking in vq->used_elems
> >>>> instead of always starting at vq->used_idx.
> >>>>
> >>>> If this is like a one-shot scenario where one element gets written and
> >>>> then flushed after, then yes in this case used_seq_idx == used_idx.
> >>>>
> >>>> ------
> >>>>
> >>>> For current_seq_idx, this is pretty much just a counter. Every new
> >>>> VirtQueueElement created from virtqueue_pop is given a number and the
> >>>> counter is incremented. Like grabbing a ticket number and waiting for
> >>>> your number to be called. The next person to grab a ticket number will
> >>>> be your number + 1.
> >>>>
> >>>
> >>> So it's like last_avail_idx, isn't it?
> >>>
> >>
> >> For the split VQ case, pretty much. Though if we hit this case in
> >> virtqueue_split_pop, we may get into some trouble:
> >>
> >> if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
> >>       goto done;
> >> }
> >>
> >
> > That's a fatal mistake and the device breaks, vdev->broken = true. It
> > cannot be used anymore from that point on, because of all the checks
> > of that variable.
> >
> > Does that solve the problem?
> >
>
> Ah, it does. My apologies, I should've recognized this would result in
> the device breaking.
>
> >> However for the packed VQ case, last_avail_idx might not work in the way
> >> we'd need it to for this implementation. In virtqueue_packed_pop, we see
> >> this:
> >>
> >> elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
> >> vq->last_avail_idx += elem->ndescs;
> >>
> >> It would appear as though last_avail_idx is incremented by the total
> >> number of descriptors associated with the element, which can be greater
> >> than 1. This implementation increments by 1 for each element.
> >>
> >> Actually... It's good you mentioned this because I think my packed VQ
> >> implementation is wrong. For packed VQs, vq->used_idx is incremented by
> >> the total number of descriptors in the flushed elements and not
> >> necessarily the number of elements being flushed like in the split VQ
> >> case. I'm adding elements to vq->used_elems in a per-element sequence
> >> rather than going by the number of descriptors an element holds, which
> >> should be the case for packed VQs.
> >>
> >
> > If you keep it by your proposed index I think you can increase it by one
> > per head, as heads are the entries that are written in both cases.
> > used_idx should increment properly already.
> >
> > If you move to my proposal, both should increase by elem->ndescs as
> > you suggest here.
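
In other words, roughly (toy types, just to spell out how far the used
index advances per completed element on each layout):

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t ndescs;  /* descriptors the element occupies, >= 1 */
} Elem;

typedef struct {
    uint16_t used_idx;
    bool     packed;
} VQ;

void advance_used_idx(VQ *vq, const Elem *e)
{
    /* split: one used ring entry per head; packed: the element's
     * descriptors are consumed as a block, so advance by ndescs.
     */
    vq->used_idx += vq->packed ? e->ndescs : 1;
}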
> >
>
> Ack! Thanks!
>
> >>>> Let me know if I'm making any sense. Thanks :)
> >>>>
> >>>> Jonah
> >>>>
> >>>>>>         /* Last used index value we have signalled on */
> >>>>>>         uint16_t signalled_used;
> >>>>>>
> >>>>>> @@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
> >>>>>>             elem->in_sg[i] = iov[out_num + i];
> >>>>>>         }
> >>>>>>
> >>>>>> +    /* Assign sequence index for in-order processing */
> >>>>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> >>>>>> +        elem->seq_idx = vq->current_seq_idx++;
> >>>>>> +    }
> >>>>>> +
> >>>>>>         vq->inuse++;
> >>>>>>
> >>>>>>         trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> >>>>>> @@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
> >>>>>>         vq->shadow_avail_idx = vq->last_avail_idx;
> >>>>>>         vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
> >>>>>>
> >>>>>> +    /* Assign sequence index for in-order processing */
> >>>>>> +    if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> >>>>>> +        elem->seq_idx = vq->current_seq_idx++;
> >>>>>> +    }
> >>>>>> +
> >>>>>>         trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> >>>>>>     done:
> >>>>>>         address_space_cache_destroy(&indirect_desc_cache);
> >>>>>> @@ -2087,6 +2101,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
> >>>>>>         vdev->vq[i].notification = true;
> >>>>>>         vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
> >>>>>>         vdev->vq[i].inuse = 0;
> >>>>>> +    vdev->vq[i].used_seq_idx = 0;
> >>>>>> +    vdev->vq[i].current_seq_idx = 0;
> >>>>>>         virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
> >>>>>>     }
> >>>>>>
> >>>>>> @@ -2334,6 +2350,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
> >>>>>>         vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
> >>>>>>         vdev->vq[i].handle_output = handle_output;
> >>>>>>         vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
> >>>>>> +    vdev->vq[i].used_seq_idx = 0;
> >>>>>> +    vdev->vq[i].current_seq_idx = 0;
> >>>>>>
> >>>>>>         return &vdev->vq[i];
> >>>>>>     }
> >>>>>> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >>>>>> index b3c74a1bca..910b2a3427 100644
> >>>>>> --- a/include/hw/virtio/virtio.h
> >>>>>> +++ b/include/hw/virtio/virtio.h
> >>>>>> @@ -75,6 +75,7 @@ typedef struct VirtQueueElement
> >>>>>>         hwaddr *out_addr;
> >>>>>>         struct iovec *in_sg;
> >>>>>>         struct iovec *out_sg;
> >>>>>> +    uint16_t seq_idx;
> >>>>>>     } VirtQueueElement;
> >>>>>>
> >>>>>>     #define VIRTIO_QUEUE_MAX 1024
> >>>>>> --
> >>>>>> 2.39.3
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>


Thread overview: 14+ messages
2024-03-28 16:21 [RFC v2 0/5] virtio,vhost: Add VIRTIO_F_IN_ORDER support Jonah Palmer
2024-03-28 16:21 ` [RFC v2 1/5] virtio: Initialize sequence variables Jonah Palmer
2024-04-03 10:18   ` Eugenio Perez Martin
2024-04-03 16:51     ` Jonah Palmer
2024-04-04 11:35       ` Eugenio Perez Martin
2024-04-04 14:41         ` Jonah Palmer
2024-04-04 16:33           ` Eugenio Perez Martin
2024-04-05 13:58             ` Jonah Palmer
2024-04-05 15:04               ` Eugenio Perez Martin [this message]
2024-04-05 15:37                 ` Jonah Palmer
2024-03-28 16:22 ` [RFC v2 2/5] virtio: In-order support for split VQs Jonah Palmer
2024-03-28 16:22 ` [RFC v2 3/5] virtio: In-order support for packed VQs Jonah Palmer
2024-03-28 16:22 ` [RFC v2 4/5] vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits Jonah Palmer
2024-03-28 16:22 ` [RFC v2 5/5] virtio: Add VIRTIO_F_IN_ORDER property definition Jonah Palmer
