From: Peter Xu <peterx@redhat.com>
To: "Wang, Wei W" <wei.w.wang@intel.com>
Cc: "Wang, Lei4" <lei4.wang@intel.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"farosas@suse.de" <farosas@suse.de>
Subject: Re: [PATCH] migration: Yield coroutine when receiving MIG_CMD_POSTCOPY_LISTEN
Date: Wed, 3 Apr 2024 12:33:44 -0400
Message-ID: <Zg2E6MKQPaG3gA1k@x1n>
In-Reply-To: <DS0PR11MB637346AE0C9777A6C25746CFDC3D2@DS0PR11MB6373.namprd11.prod.outlook.com>

On Wed, Apr 03, 2024 at 04:04:21PM +0000, Wang, Wei W wrote:
> On Wednesday, April 3, 2024 10:42 PM, Peter Xu wrote:
> > On Wed, Apr 03, 2024 at 04:35:35PM +0800, Wang, Lei wrote:
> > > We should change the following line from
> > >
> > > 	while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 100)) {
> > >
> > > to
> > >
> > > 	while (qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 100)) {
> > 
> > Stupid me.. :(  Thanks for figuring this out.
> > 
> > >
> > > After that fix, test passed and no segfault.
> > >
> > > Given that the test shows a yield to the main loop won't introduce
> > > much overhead (<1ms), how about first yield unconditionally, then we
> > > enter the while loop to wait for several ms and yield periodically?
> > 
> > Shouldn't the expectation be that this should return immediately without a
> > wait?  We're already processing LISTEN command, and on the source as you
> > said it was much after the connect().  It won't guarantee the ordering but IIUC
> > the majority should still have a direct hit?
> > 
> > What we can do, though, is reduce the 100ms timeout if you see a risk of
> > an overly large downtime when the wait is hit by accident.  We could even
> > do it in a tight loop here, considering downtime is important, but as a
> > middle ground: how about a 100ms -> 1ms poll?
> 
> Would it be better to busy-wait here, instead of blocking for even 1ms?
> It's likely that the preempt channel is waiting for the main thread to dispatch for accept(),
> but we are calling qemu_sem_timedwait here to block the main thread for 1 more ms.

I think it's about the expectation of whether we should have already
received that sem post.  My understanding is that in most cases we should
return directly and avoid such a wait.

In my experience, adding 1ms on top of the downtime is not a major issue in
corner cases like this.

We do have a lot of other potential optimizations to reduce downtime; or, to
put it the other way around, there are a lot of cases where we can hit a
much larger downtime than expected.  Consider that we don't even account
for device state in the downtime for now, either load_state or save_state:
we only count RAM, which is far from accurate, so we do have more chances
to optimize.  Some are listed here, but some may not be:

https://wiki.qemu.org/ToDo/LiveMigration#Optimizations

If you agree with my "expectation" statement above, I'd say we should avoid
busy loops in QEMU unless absolutely necessary.
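
For illustration only, here is a minimal standalone sketch of the short-timeout
poll discussed above. It is not the actual patch. It assumes Linux with POSIX
unnamed semaphores (older glibc may need -pthread), and the helper names
sem_timedwait_ms(), wait_for_post() and post_from_main_loop() are hypothetical.
sem_timedwait_ms() only mimics the convention the loop above relies on
(0 when the semaphore was taken, non-zero on timeout), and the yield callback
is a stand-in for handing control back to the main loop so it can dispatch
the accept() that eventually posts the semaphore.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical stand-in for a timed semaphore wait: returns 0 if the
 * semaphore was taken, -1 if the timeout expired first.
 */
static int sem_timedwait_ms(sem_t *sem, int ms)
{
    struct timespec deadline;
    int rc;

    assert(ms >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += ms / 1000;
    deadline.tv_nsec += (long)(ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    /* Retry if a signal interrupts the wait; the deadline is absolute. */
    do {
        rc = sem_timedwait(sem, &deadline);
    } while (rc == -1 && errno == EINTR);

    return rc == 0 ? 0 : -1;
}

/*
 * Poll for the post with a short timeout.  Each time the wait times out,
 * call yield_fn (if any) so that whatever produces the post gets a chance
 * to run.  Returns how many times the wait timed out.
 */
static int wait_for_post(sem_t *done, int poll_ms,
                         void (*yield_fn)(void *), void *opaque)
{
    int timeouts = 0;

    while (sem_timedwait_ms(done, poll_ms)) {   /* non-zero: timed out */
        timeouts++;
        if (yield_fn) {
            yield_fn(opaque);
        }
    }
    return timeouts;
}

/* Demo yield callback: pretend the main loop ran and posted the sem. */
static void post_from_main_loop(void *opaque)
{
    sem_post((sem_t *)opaque);
}
```

If the post has already arrived, which is the expected common case,
wait_for_post() returns 0 without blocking.  If it has not, the caller pays
at most one poll_ms interval before yielding, rather than spinning.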

> 
> 
> > 
> > If you agree (and also to Wei; please review this and comment if you have any!),
> > would you write up the commit log, fully test it in whatever way you can,
> > and resend as a formal patch (please do this before Friday if possible)?  You
> > can keep a "Suggested-by:" for me.  I want to queue it for
> > rc3 if it can make it. It seems important if Wei can always reproduce it.
> 
> Not sure if Lei will be able to get online, as the following two days are a Chinese holiday.
> If not, I could take over and send it late tomorrow. Let's see.

Oops, I forgot about that even though I was aware of it..

Please do so if you can.  Thank you, Wei!  (I hope you can shift some
working hours to later on!)

Let me know if that doesn't work; it'll be all fine.

Thanks,

-- 
Peter Xu


