On Thursday, April 4, 2024 12:34 AM, Peter Xu wrote:

> On Wed, Apr 03, 2024 at 04:04:21PM +0000, Wang, Wei W wrote:

> > On Wednesday, April 3, 2024 10:42 PM, Peter Xu wrote:

> > > On Wed, Apr 03, 2024 at 04:35:35PM +0800, Wang, Lei wrote:

> > > > We should change the following line from

> > > >

> > > >   while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 100)) {

> > > >

> > > > to

> > > >

> > > >   while (qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 100)) {

> > >

> > > Stupid me.. :(  Thanks for figuring this out.
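(For reference, and to make the reasoning behind dropping the "!" explicit: my understanding is that qemu_sem_timedwait() returns 0 once the sem has been posted and a non-zero value on timeout, so the corrected loop keeps retrying only while the post has not arrived yet:

    /*
     * qemu_sem_timedwait() returns 0 when the sem is acquired and
     * non-zero on timeout, so loop only while the post hasn't arrived.
     */
    while (qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 100)) {
        /* not posted yet -- retry (and possibly yield, see the sketch below) */
    }
)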

> > >

> > > >

> > > > After that fix, test passed and no segfault.

> > > >

> > > > Given that the test shows a yield to the main loop won't introduce

> > > > much overhead (<1ms), how about first yield unconditionally, then

> > > > we enter the while loop to wait for several ms and yield periodically?

> > >

> > > Shouldn't the expectation be that this should return immediately

> > > without a wait?  We're already processing LISTEN command, and on the

> > > source as you said it was much after the connect().  It won't

> > > guarantee the ordering but IIUC the majority should still have a direct hit?

> > >

> > > What we can do though is reducing the 100ms timeout if you see

> > > that's perhaps a risk of having too large a downtime when by

> > > accident.  We can even do it in a tight loop here considering

> > > downtime is important, but to provide an intermediate ground: how about

> > > 100ms -> 1ms poll?

> >

> > Would it be better to use busy wait here, instead of blocking for even 1ms

> > here?

> > It's likely that the preempt channel is waiting for the main thread to

> > dispatch for accept(), but we are calling qemu_sem_timedwait here to block

> > the main thread for 1 more ms.

>

> I think it's about the expectation of whether we should already received that

> sem post.  My understanding is in most cases we should directly return and

> avoid such wait.

 

The assumption of " should already received" might not be true.

On my machine, it seems always "not received".

 

>

> Per my previous experience, 1ms is not a major issue to be added on top of

> downtime in corner cases like this.

 

Yeah, in most cases, probably yes. But there are some usages that require relatively

low downtime. Remember that NFV usages in the early days expected the downtime to be

close to 10ms. In that case, adding 1ms would mean a ~10% performance regression.

 

>

> We do have a lot of other potential optimizations to reduce downtime, or I

> should say in the other way, that..  there can be a lot of cases where we can hit

> much larger downtime than expected. Consider when we don't even account

> downtime for device states for now, either load_state or save_state, we only

> count RAM but that's far from accurate.. and we do have more chances to

> optimize.  Some are listed here, but some may not:

>

> https://wiki.qemu.org/ToDo/LiveMigration#Optimizations

>

> If you agree with my above "expectation" statement, I'd say we should avoid

> using a busy loop whenever possible in QEMU unless extremely necessary.

 

I just posted a version with a few changes from your suggestion:

 

It still blocks on the sem in the loop, but before that it checks whether the channel

has been created. If not, it yields to the main loop. Then, when it reaches the sem,

it most likely does not need to wait.
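Roughly this shape (only a sketch to show the ordering, not the exact diff; I'm assuming the usual coroutine yield pattern and the postcopy_qemufile_dst field as the "channel created" check):

    if (migrate_postcopy_preempt()) {
        /*
         * The preempt channel is accepted from the main loop; if it hasn't
         * been created yet, yield back so the main loop can run first.
         */
        while (!mis->postcopy_qemufile_dst && qemu_in_coroutine()) {
            aio_co_schedule(qemu_get_current_aio_context(),
                            qemu_coroutine_self());
            qemu_coroutine_yield();
        }
        /* By now the post has most likely happened, so this shouldn't block. */
        while (qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 100)) {
            /* still not posted -- keep waiting with a bounded timeout */
        }
    }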

 

Another change is that I moved it to the place right before we load the states (in

loadvm_handle_cmd_packaged), because I think that's logically where we should

ensure the channel is ready (I don't have a good reason to add it when handling

the LISTEN cmd).
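Placement-wise it would sit roughly like this (illustrative only; the helper name is hypothetical and just wraps the sketch above, and I'm abbreviating the rest of loadvm_handle_cmd_packaged, including the name of the packaged-stream QEMUFile):

    /* In loadvm_handle_cmd_packaged(), before replaying the packaged stream: */
    postcopy_preempt_wait_channel_ready(mis);   /* hypothetical helper */
    ret = qemu_loadvm_state_main(packf, mis);   /* load the packaged device states */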

 

Please let me know if you disagree with any of this.