From: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Peter Xu" <peterx@redhat.com>, "Fabiano Rosas" <farosas@suse.de>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Cédric Le Goater" <clg@redhat.com>,
"Eric Blake" <eblake@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Avihai Horon" <avihaih@nvidia.com>,
"Joao Martins" <joao.m.martins@oracle.com>,
qemu-devel@nongnu.org
Subject: Re: [PATCH RFC 00/26] Multifd device state transfer support with VFIO consumer
Date: Thu, 18 Apr 2024 20:14:15 +0200
Message-ID: <aebcd78e-b8b6-44db-b2be-0bbd5acccf3f@maciej.szmigiero.name>
In-Reply-To: <ZiD4aLSre6qubuHr@redhat.com>
On 18.04.2024 12:39, Daniel P. Berrangé wrote:
> On Thu, Apr 18, 2024 at 11:50:12AM +0200, Maciej S. Szmigiero wrote:
>> On 17.04.2024 18:35, Daniel P. Berrangé wrote:
>>> On Wed, Apr 17, 2024 at 02:11:37PM +0200, Maciej S. Szmigiero wrote:
>>>> On 17.04.2024 10:36, Daniel P. Berrangé wrote:
>>>>> On Tue, Apr 16, 2024 at 04:42:39PM +0200, Maciej S. Szmigiero wrote:
>>>>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
(..)
>>>>> That said, the idea of reserving channels specifically for VFIO doesn't
>>>>> make a whole lot of sense to me either.
>>>>>
>>>>> Once we've done the RAM transfer, and are in the switchover phase
>>>>> doing device state transfer, all the multifd channels are idle.
>>>>> We should just use all those channels to transfer the device state,
>>>>> in parallel. Reserving channels just guarantees many idle channels
>>>>> during RAM transfer, and further idle channels during vmstate
>>>>> transfer.
>>>>>
>>>>> IMHO it is more flexible to just use all available multifd channel
>>>>> resources all the time.
>>>>
>>>> The reason for having dedicated device state channels is that they
>>>> provide lower downtime in my tests.
>>>>
>>>> With either 15 or 11 mixed multifd channels (no dedicated device state
>>>> channels) I get a downtime of about 1250 msec.
>>>>
>>>> Comparing that with 15 total multifd channels / 4 dedicated device
>>>> state channels, which give a downtime of about 1100 ms, using
>>>> dedicated channels yields about a 14% downtime improvement.
>>>
>>> Hmm, can you clarify: /when/ does the VFIO vmstate transfer take
>>> place? Is it transferred concurrently with the RAM? I had thought
>>> this series still has the RAM transfer iterations running first,
>>> and then the VFIO VMstate at the end, simply making use of multifd
>>> channels for parallelism of the end phase. Your reply, though, makes
>>> me question that interpretation.
>>>
>>> Let me try to illustrate channel flow in various scenarios, time
>>> flowing left to right:
>>>
>>> 1. serialized RAM, then serialized VM state (ie historical migration)
>>>
>>> main: | Init | RAM iter 1 | RAM iter 2 | ... | RAM iter N | VM State |
>>>
>>>
>>> 2. parallel RAM, then serialized VM state (ie today's multifd)
>>>
>>> main: | Init | | VM state |
>>> multifd1: | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
>>> multifd2: | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
>>> multifd3: | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
>>>
>>>
>>> 3. parallel RAM, then parallel VM state
>>>
>>> main: | Init | | VM state |
>>> multifd1: | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
>>> multifd2: | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
>>> multifd3: | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
>>> multifd4: | VFIO VM state |
>>> multifd5: | VFIO VM state |
>>>
>>>
>>> 4. parallel RAM and VFIO VM state, then remaining VM state
>>>
>>> main: | Init | | VM state |
>>> multifd1: | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
>>> multifd2: | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
>>> multifd3: | RAM iter 1 | RAM iter 2 | ... | RAM iter N |
>>> multifd4: | VFIO VM state |
>>> multifd5: | VFIO VM state |
>>>
>>>
>>> I thought this series was implementing approx (3), but are you actually
>>> implementing (4), or something else entirely ?
>>
>> You are right that this series approximately implements the scheme
>> described as number 3 in your diagrams.
>
>> However, there are some additional details worth mentioning:
>> * A relatively small amount of VFIO data is transferred from the
>> "save_live_iterate" SaveVMHandler while the VM is still running.
>>
>> This is still happening via the main migration channel.
>> Parallelizing this transfer in the future might make sense too,
>> although obviously this doesn't impact the downtime.
>>
>> * After the VM is stopped and downtime starts, the main (~400 MiB)
>> VFIO device state gets transferred via multifd channels.
>>
>> However, these multifd channels (if they are not dedicated to device
>> state transfer) aren't idle during that time.
>> Rather, they seem to be transferring the residual RAM data.
>>
>> That's most likely what causes the additional downtime observed
>> when dedicated device state transfer multifd channels aren't used.
>
> Ahh yes, I forgot about the residual dirty RAM; that makes sense as
> an explanation. Allow me to work through the scenarios though, as I
> still think my suggestion not to have separate dedicated channels is
> better...
>
>
> Let's say, hypothetically, that we have an existing deployment today
> that uses 6 multifd channels for RAM, ie:
>
> main: | Init | | VM state |
> multifd1: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd2: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd3: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd4: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd5: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd6: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
>
> That value of 6 was chosen because it corresponds to the amount
> of network & CPU utilization the admin wants to allow for this
> VM to migrate. All 6 channels are fully utilized at all times.
>
>
> If we now want to parallelize the VFIO VM state, the peak network
> and CPU utilization the admin wants to reserve for the VM should
> not change. Thus the admin will still want to configure only 6
> channels total.
>
> With your proposal the admin has to reduce RAM transfer to 4 of the
> channels, in order to then reserve 2 channels for VFIO VM state, so we
> get a flow like:
>
>
> main: | Init | | VM state |
> multifd1: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd2: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd3: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd4: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd5: | VFIO VM state |
> multifd6: | VFIO VM state |
>
> This is bad, as it reduces performance of RAM transfer. VFIO VM
> state transfer is better, but that's not a net win overall.
>
>
>
> So let's say the admin was happy to increase the number of multifd
> channels from 6 to 8.
>
> This series proposes that they would leave RAM using 6 channels as
> before, and now reserve the 2 extra ones for VFIO VM state:
>
> main: | Init | | VM state |
> multifd1: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd2: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd3: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd4: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd5: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd6: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM |
> multifd7: | VFIO VM state |
> multifd8: | VFIO VM state |
>
>
> RAM would perform as well as it did historically, and VM state would
> improve due to the 2 parallel channels, and not competing with the
> residual RAM transfer.
>
> This is what your latency comparison numbers show as a benefit for
> this channel reservation design.
>
> I believe this comparison is inappropriate / unfair though, as it is
> comparing a situation with 6 total channels against a situation with
> 8 total channels.
>
> If the admin was happy to increase the total channels to 8, then they
> should allow RAM to use all 8 channels, and then VFIO VM state +
> residual RAM to also use the very same set of 8 channels:
>
> main: | Init | | VM state |
> multifd1: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM + VFIO VM state|
> multifd2: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM + VFIO VM state|
> multifd3: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM + VFIO VM state|
> multifd4: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM + VFIO VM state|
> multifd5: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM + VFIO VM state|
> multifd6: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM + VFIO VM state|
> multifd7: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM + VFIO VM state|
> multifd8: | RAM iter 1 | RAM iter 2 | ... | RAM iter N | Residual RAM + VFIO VM state|
>
> This will speed up the initial RAM iters still further & the final
> switchover phase even more. If residual RAM is larger than VFIO VM state,
> then it will dominate the switchover latency, so having VFIO VM state
> compete is not a problem. If VFIO VM state is larger than residual RAM,
> then allowing it access to all 8 channels instead of only 2 channels
> will be a clear win.
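Your shared-pool argument can be sketched with a toy model. Every number
below (per-channel bandwidth, residual RAM size) is an invented assumption,
not a measurement; only the ~400 MiB VFIO state figure comes from my tests:

```python
# Toy back-of-the-envelope model of the switchover phase, comparing a
# shared channel pool against a static RAM/VFIO split. All figures are
# illustrative assumptions.

def switchover_time_shared(residual_ram, vfio_state, channels, bw_per_channel):
    """All channels drain residual RAM and VFIO state together."""
    return (residual_ram + vfio_state) / (channels * bw_per_channel)

def switchover_time_dedicated(residual_ram, vfio_state,
                              ram_channels, vfio_channels, bw_per_channel):
    """RAM and VFIO state drain on disjoint channel sets; the slower
    set determines the downtime."""
    return max(residual_ram / (ram_channels * bw_per_channel),
               vfio_state / (vfio_channels * bw_per_channel))

BW = 1.0    # GiB/s per channel (assumed)
VFIO = 0.4  # ~400 MiB of VFIO device state
RAM = 1.2   # residual dirty RAM at switchover (assumed)

shared = switchover_time_shared(RAM, VFIO, 8, BW)           # 1.6/8 = 0.2 s
dedicated = switchover_time_dedicated(RAM, VFIO, 6, 2, BW)  # max(0.2, 0.2) = 0.2 s
```

In this idealized model the shared pool indeed can never lose, since
total/N is at most the maximum over any static split. But my measurements
below suggest the real behaviour has costs this model ignores.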
I re-did the measurement with an increased number of multifd channels,
first with (total count/dedicated count) 25/0, then with 100/0.

The results did not improve:
With the 25/0 mixed multifd channel config I still get around 1250 msec
of downtime - the same as with the 15/0 or 11/0 mixed configs I measured
earlier.

But with the (pretty insane) 100/0 mixed channel config the whole setup
gets so far into the law of diminishing returns that the results actually
get worse: the downtime is now about 1450 msec.
I guess that's from all the extra overhead of switching between 100
multifd channels.
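That regression fits a simple model (all constants below are invented,
chosen only to be qualitatively plausible): once the physical link is
saturated, extra channels add per-channel coordination overhead without
adding any bandwidth.

```python
# Toy model of downtime vs. channel count. Transfer time shrinks only
# until the link saturates; per-channel overhead keeps growing linearly.

def downtime_model(n_channels, data_gib=1.6, bw_per_channel_gibs=0.5,
                   link_bw_gibs=4.0, per_channel_overhead_s=0.004):
    # Aggregate bandwidth is capped by the physical link, so channels
    # beyond the saturation point add overhead without adding speed.
    effective_bw = min(n_channels * bw_per_channel_gibs, link_bw_gibs)
    return data_gib / effective_bw + n_channels * per_channel_overhead_s
```

With these made-up constants the model barely changes between 15 and 25
channels but clearly regresses by 100, qualitatively matching the
1250 msec vs. 1450 msec numbers above.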
I think one of the reasons for these results is that the mixed (RAM +
device state) multifd channels participate in the RAM sync process
(MULTIFD_FLAG_SYNC) whereas the dedicated device state channels don't.

There may be other subtle performance interactions too, but I am not
100% sure about that.
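That sync participation can be sketched as one more toy model. Every
constant below is invented, and whether a few dedicated channels beat many
mixed ones depends entirely on the assumed stall cost:

```python
# Toy model of the sync effect: each MULTIFD_FLAG_SYNC barrier stalls
# every participating channel, so device state sent over mixed channels
# inherits those stalls, while dedicated device-state channels skip the
# barrier entirely. All constants are illustrative assumptions.

def device_state_time(vfio_state_gib, channels, bw_per_channel_gibs,
                      n_syncs, sync_stall_s, participates_in_sync):
    base = vfio_state_gib / (channels * bw_per_channel_gibs)
    stall = n_syncs * sync_stall_s if participates_in_sync else 0.0
    return base + stall

# 15 mixed channels pay the sync tax; 4 dedicated channels don't.
mixed = device_state_time(0.4, 15, 0.5, n_syncs=10, sync_stall_s=0.02,
                          participates_in_sync=True)       # ~0.253 s
dedicated = device_state_time(0.4, 4, 0.5, n_syncs=10, sync_stall_s=0.02,
                              participates_in_sync=False)  # 0.2 s
```

Under these assumptions the dedicated channels win despite having far less
aggregate bandwidth, which is consistent with what I measured.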
> With regards,
> Daniel
Best regards,
Maciej