QEMU-Devel Archive mirror
From: Michael Galaxy <mgalaxy@akamai.com>
To: Zheng Chuan <zhengchuan@huawei.com>, Peter Xu <peterx@redhat.com>,
	"Gonglei (Arei)" <arei.gonglei@huawei.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Yu Zhang" <yu.zhang@ionos.com>,
	"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com>,
	"Jinpu Wang" <jinpu.wang@ionos.com>,
	"Elmar Gerdes" <elmar.gerdes@ionos.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Yuval Shaia" <yuval.shaia.ml@gmail.com>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Prasanna Kumar Kalever" <prasanna.kalever@redhat.com>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Michael Roth" <michael.roth@amd.com>,
	"Prasanna Kumar Kalever" <prasanna4324@gmail.com>,
	"integration@gluster.org" <integration@gluster.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
	"devel@lists.libvirt.org" <devel@lists.libvirt.org>,
	"Hanna Reitz" <hreitz@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Song Gao" <gaosong@loongson.cn>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Wainer dos Santos Moschetta" <wainersm@redhat.com>,
	"Beraldo Leal" <bleal@redhat.com>,
	Pannengyuan <pannengyuan@huawei.com>,
	Xiexiangyou <xiexiangyou@huawei.com>
Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
Date: Mon, 13 May 2024 13:52:27 -0500
Message-ID: <cd93922f-cf58-4a42-854d-0b39c6941905@akamai.com>
In-Reply-To: <13ce4f9e-1e7c-24a9-0dc9-c40962979663@huawei.com>

One thing to keep in mind here (even though I don't have any hardware to
test with): one of the original goals of the RDMA implementation was not
simply raw throughput or raw latency, but reduced CPU utilization in
kernel space thanks to the offload. While it is entirely possible that
newer hardware with TCP might compete, the significant reductions in CPU
usage in the TCP/IP stack were a big win at the time.

Just something to consider while you're doing the testing.
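
As a rough illustration of what "CPU utilization in kernel space" means
as a metric, here is a minimal sketch, not a QEMU benchmark. It assumes a
Unix host, uses a loopback TCP transfer as a stand-in for a migration
stream, and all function names (cpu_times, loopback_transfer, measure)
are hypothetical. It reports user and system CPU seconds separately, so
the system figure can be compared across transports, not just the
throughput.

```python
import resource
import socket
import threading


def cpu_times():
    """Return (user, system) CPU seconds used so far by this process."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_utime, ru.ru_stime


def loopback_transfer(total_bytes, chunk=1 << 16):
    """Send total_bytes over a local TCP connection; return bytes received."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)
    received = [0]

    def sink():
        # Receiver side: drain the connection until the sender closes it.
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(chunk)
                if not data:
                    break
                received[0] += len(data)

    receiver = threading.Thread(target=sink)
    receiver.start()

    payload = b"\0" * chunk
    with socket.create_connection(server.getsockname()) as client:
        sent = 0
        while sent < total_bytes:
            n = min(chunk, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n

    receiver.join()
    server.close()
    return received[0]


def measure(total_bytes):
    """Run one transfer and return bytes moved plus user/system CPU deltas."""
    user0, sys0 = cpu_times()
    moved = loopback_transfer(total_bytes)
    user1, sys1 = cpu_times()
    return {"bytes": moved, "user_s": user1 - user0, "sys_s": sys1 - sys0}


if __name__ == "__main__":
    print(measure(64 * 1024 * 1024))
```

For an actual migration test, the same split would have to be taken for
the QEMU process itself (for example from the utime and stime fields of
/proc/<pid>/stat on Linux) on both source and destination, since
getrusage() here only covers the measuring process.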

- Michael

On 5/9/24 03:58, Zheng Chuan wrote:
> Hi Peter, Lei, Jinpu,
>
> On 2024/5/8 0:28, Peter Xu wrote:
>> On Tue, May 07, 2024 at 01:50:43AM +0000, Gonglei (Arei) wrote:
>>> Hello,
>>>
>>>> -----Original Message-----
>>>> From: Peter Xu [mailto:peterx@redhat.com]
>>>> Sent: Monday, May 6, 2024 11:18 PM
>>>> To: Gonglei (Arei) <arei.gonglei@huawei.com>
>>>> Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
>>>>
>>>> On Mon, May 06, 2024 at 02:06:28AM +0000, Gonglei (Arei) wrote:
>>>>> Hi, Peter
>>>> Hey, Lei,
>>>>
>>>> Happy to see you around again after years.
>>>>
>>> Haha, me too.
>>>
>>>>> RDMA features high bandwidth, low latency (in non-blocking lossless
>>>>> network), and direct remote memory access by bypassing the CPU (As you
>>>>> know, CPU resources are expensive for cloud vendors, which is one of
>>>>> the reasons why we introduced offload cards.), which TCP does not have.
>>>> It's another cost to use offload cards, v.s. preparing more cpu resources?
>>>>
>>> Software and hardware offload converged architecture is the way to go for all cloud vendors
>>> (Including comprehensive benefits in terms of performance, cost, security, and innovation speed),
>>> it's not just a matter of adding the resource of a DPU card.
>>>
>>>>> In some scenarios where fast live migration is needed (extremely short
>>>>> interruption duration and migration duration), it is very useful. To this
>>>>> end, we have also developed RDMA support for multifd.
>>>> Will any of you upstream that work?  I'm curious how intrusive it would be
>>>> to add it to multifd; if it can keep to only 5 exported functions, like what
>>>> rdma.h does right now, that would be pretty nice.  We also want to make sure it works
>>>> with arbitrarily sized loads and buffers, e.g. vfio is considering adding IO loads to
>>>> multifd channels too.
>>>>
>>> In fact, we sent the patchset to the community in 2021. Pls see:
>>> https://lore.kernel.org/all/20210203185906.GT2950@work-vm/T/
> Yes, I sent the patchset for multifd support in RDMA migration after taking it over from my colleague, and
> I am sorry for not continuing that work at the time, for various reasons.
> I also strongly agree with Lei that the RDMA protocol has some particular advantages over TCP
> in some scenarios, and we do use it in our product.
>
>> I wasn't aware of that for sure in the past..
>>
>> Multifd has changed quite a bit in the last 9.0 release, that may not apply
>> anymore.  One thing to mention is please look at Dan's comment on possible
>> use of rsocket.h:
>>
>> https://lore.kernel.org/all/ZjJm6rcqS5EhoKgK@redhat.com/
>>
>> And Jinpu did help provide an initial test result over the library:
>>
>> https://lore.kernel.org/qemu-devel/CAMGffEk8wiKNQmoUYxcaTHGtiEm2dwoCF_W7T0vMcD-i30tUkA@mail.gmail.com/
>>
>> It looks like we have a chance to apply that in QEMU.
>>
>>>
>>>> One thing to note is that the question here is not purely about a performance
>>>> comparison between rdma and nics.  It's about helping us make a decision
>>>> on whether to drop rdma; iow, even if rdma performs well, the community still
>>>> has the right to drop it if nobody can actively work on and maintain it.
>>>> It's just that if nics can perform as well, that's one more reason to drop it, unless
>>>> companies can help to provide good support and work together.
>>>>
>>> We are happy to provide the necessary review and maintenance work for RDMA
>>> if the community needs it.
>>>
>>> CC'ing Chuan Zheng.
>> I'm not sure whether you and Jinpu's team would like to work together and
>> provide a final solution for rdma over multifd.  It could be much simpler
>> than the original 2021 proposal if the rsocket API will work out.
>>
>> Thanks,
>>
> It's good news to see a socket abstraction for RDMA!
> When I developed the series above, the biggest pain was that RDMA migration has no QIOChannel abstraction, so I needed to use a 'fake channel'
> for it, which is awkward to implement.
> So, as far as I know, we can do this as follows:
> i. First, we need to evaluate whether rsocket is good enough to satisfy our fundamental QIOChannel abstraction.
> ii. If it works, we will then see whether it gives us the opportunity to hide the details of the RDMA protocol
>      inside rsocket, by removing most of the code in rdma.c and also some hacks in the main migration process.
> iii. Implement advanced features such as multifd and multi-URI for RDMA migration.
>
> Since I am not familiar with rsocket, I need some time to look at it and do some quick verification of RDMA migration based on rsocket.
> But yes, I am willing to be involved in this refactoring work and to see whether we can make this migration feature better :)
>
>


