From: Stefan Hajnoczi <stefanha@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, Michael Roth <michael.roth@amd.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Markus Armbruster <armbru@redhat.com>,
Hanna Reitz <hreitz@redhat.com>,
qemu-block@nongnu.org,
Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Subject: Re: [PATCH v2 5/5] monitor: use aio_co_reschedule_self()
Date: Mon, 6 May 2024 14:09:08 -0400 [thread overview]
Message-ID: <20240506180908.GB7375@fedora.redhat.com> (raw)
In-Reply-To: <ZjUf3TkiBO1N264C@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 3395 bytes --]
On Fri, May 03, 2024 at 07:33:17PM +0200, Kevin Wolf wrote:
> Am 06.02.2024 um 20:06 hat Stefan Hajnoczi geschrieben:
> > The aio_co_reschedule_self() API is designed to avoid the race
> > condition between scheduling the coroutine in another AioContext and
> > yielding.
> >
> > The QMP dispatch code uses the open-coded version that appears
> > susceptible to the race condition at first glance:
> >
> > aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
> > qemu_coroutine_yield();
> >
> > The code is actually safe because the iohandler and qemu_aio_context
> > AioContext run under the Big QEMU Lock. Nevertheless, set a good example
> > and use aio_co_reschedule_self() so it's obvious that there is no race.
> >
> > Suggested-by: Hanna Reitz <hreitz@redhat.com>
> > Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
> > Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > qapi/qmp-dispatch.c | 7 ++-----
> > 1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
> > index 176b549473..f3488afeef 100644
> > --- a/qapi/qmp-dispatch.c
> > +++ b/qapi/qmp-dispatch.c
> > @@ -212,8 +212,7 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
> > * executing the command handler so that it can make progress if it
> > * involves an AIO_WAIT_WHILE().
> > */
> > - aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
> > - qemu_coroutine_yield();
> > + aio_co_reschedule_self(qemu_get_aio_context());
>
> Turns out that this one actually causes a regression. [1] This code is
> ŕun in iohandler_ctx, aio_co_reschedule_self() looks at the new context
> and compares it with qemu_get_current_aio_context() - and because both
> are qemu_aio_context, it decides that it has nothing to do. So the
> command handler coroutine actually still runs in iohandler_ctx now,
> which is not what we want.
>
> We could just revert this patch because it was only meant as a cleanup
> without a semantic difference.
>
> Or aio_co_reschedule_self() could look at qemu_coroutine_self()->ctx
> instead of using qemu_get_current_aio_context(). That would be a little
> more indirect, though, and I'm not sure if co->ctx is always up to date.
>
> Any opinions on what is the best way to fix this?
If the commit is reverted then similar bugs may be introduced again in
the future. The qemu_get_current_aio_context() API is unaware of
iohandler_ctx and this can lead to unexpected results.
I will send patches to revert the commit and add doc comments explaining
iohandler_ctx's special behavior. This will reduce, but not eliminate,
the risk of future bugs.
Modifying aio_co_reschedule_self() might be better long-term fix, but
I'm afraid it will create more bugs because it will expose the subtle
distinction between the current coroutine AioContext and non-coroutine
AioContext in new places. I think the root cause is that iohandler_ctx
isn't a full-fledged AioContext with its own event loop. iohandler_ctx
is a special superset of qemu_aio_context that the main loop monitors.
Stefan
>
> Kevin
>
> [1] https://issues.redhat.com/browse/RHEL-34618
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2024-05-06 18:10 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-06 19:06 [PATCH v2 0/5] virtio-blk: iothread-vq-mapping cleanups Stefan Hajnoczi
2024-02-06 19:06 ` [PATCH v2 1/5] virtio-blk: enforce iothread-vq-mapping validation Stefan Hajnoczi
2024-02-06 19:06 ` [PATCH v2 2/5] virtio-blk: clarify that there is at least 1 virtqueue Stefan Hajnoczi
2024-02-06 19:06 ` [PATCH v2 3/5] virtio-blk: add vq_rq[] bounds check in virtio_blk_dma_restart_cb() Stefan Hajnoczi
2024-02-06 19:06 ` [PATCH v2 4/5] virtio-blk: declare VirtIOBlock::rq with a type Stefan Hajnoczi
2024-02-06 19:06 ` [PATCH v2 5/5] monitor: use aio_co_reschedule_self() Stefan Hajnoczi
2024-05-03 17:33 ` Kevin Wolf
2024-05-06 18:09 ` Stefan Hajnoczi [this message]
2024-02-06 19:08 ` [PATCH v2 0/5] virtio-blk: iothread-vq-mapping cleanups Michael S. Tsirkin
2024-02-07 13:45 ` Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240506180908.GB7375@fedora.redhat.com \
--to=stefanha@redhat.com \
--cc=armbru@redhat.com \
--cc=hreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=manos.pitsidianakis@linaro.org \
--cc=michael.roth@amd.com \
--cc=mst@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).