From: Pavel Begunkov <asml.silence@gmail.com>
To: Mina Almasry <almasrymina@google.com>
Cc: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"Song Liu" <song@kernel.org>,
"Yonghong Song" <yonghong.song@linux.dev>,
"John Fastabend" <john.fastabend@gmail.com>,
"KP Singh" <kpsingh@kernel.org>,
"Stanislav Fomichev" <sdf@google.com>,
"Hao Luo" <haoluo@google.com>, "Jiri Olsa" <jolsa@kernel.org>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org,
linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
linux-arch@vger.kernel.org, bpf@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org,
dri-devel@lists.freedesktop.org,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Jonathan Corbet" <corbet@lwn.net>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Ivan Kokshaysky" <ink@jurassic.park.msu.ru>,
"Matt Turner" <mattst88@gmail.com>,
"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
"Helge Deller" <deller@gmx.de>,
"Jesper Dangaard Brouer" <hawk@kernel.org>,
"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Masami Hiramatsu" <mhiramat@kernel.org>,
"Arnd Bergmann" <arnd@arndb.de>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Andrii Nakryiko" <andrii@kernel.org>,
"David Ahern" <dsahern@kernel.org>,
"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
"Shuah Khan" <shuah@kernel.org>,
"Sumit Semwal" <sumit.semwal@linaro.org>,
"Christian König" <christian.koenig@amd.com>,
"David Wei" <dw@davidwei.uk>, "Jason Gunthorpe" <jgg@ziepe.ca>,
"Yunsheng Lin" <linyunsheng@huawei.com>,
"Shailend Chand" <shailend@google.com>,
"Harshitha Ramamurthy" <hramamurthy@google.com>,
"Shakeel Butt" <shakeelb@google.com>,
"Jeroen de Borst" <jeroendb@google.com>,
"Praveen Kaligineedi" <pkaligineedi@google.com>
Subject: Re: [RFC PATCH net-next v5 07/14] page_pool: devmem support
Date: Wed, 14 Feb 2024 15:30:25 +0000
Message-ID: <c28e1f66-84c8-40f7-b200-f18bee06cb33@gmail.com>
In-Reply-To: <CAHS8izO2zARuMovrYU3kdwSXsQAM6+SajQjDT3ckSvVOfHwaCQ@mail.gmail.com>
On 2/13/24 21:11, Mina Almasry wrote:
> On Tue, Feb 13, 2024 at 5:28 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
...
>>
>> A bit of a churn with the padding and nesting net_iov but looks
>> sturdier. No duplication, and you can just check positions of the
>> structure instead of per-field NET_IOV_ASSERT_OFFSET, which you
>> have to not forget to update e.g. when adding a new field. Also,
>
> Yes, this is nicer. If possible I'll punt it to a minor cleanup as a
> follow up change. Logistically I think if this series need-not touch
> code outside of net/, that's better.
Outside of net it should only be a small change in struct page
layout, but otherwise with struct_group_tagged things like
page->pp_magic would still work. Anyway, I'm not insisting.
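To illustrate the struct_group_tagged point, here is a minimal userspace sketch. STRUCT_GROUP_TAGGED is a simplified stand-in for the kernel macro, and fake_page / fake_net_iov are hypothetical layouts, not the real struct page or net_iov. It only shows that the grouped fields stay reachable by their plain names while the whole group can be checked by position:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-in for the kernel's struct_group_tagged(): the same
 * members are declared twice inside an anonymous union, once as an
 * anonymous struct (direct access) and once as a tagged, named struct.
 */
#define STRUCT_GROUP_TAGGED(TAG, NAME, ...) \
	union { \
		struct { __VA_ARGS__ }; \
		struct TAG { __VA_ARGS__ } NAME; \
	}

/* Hypothetical layout, NOT the real struct page. */
struct fake_page {
	unsigned long flags;
	STRUCT_GROUP_TAGGED(fake_net_iov, iov,
		unsigned long pp_magic;
		void *pp;
		unsigned long dma_addr;
	);
	int refcount;
};

/* One position check for the whole group instead of one per field. */
static_assert(offsetof(struct fake_page, iov) ==
	      offsetof(struct fake_page, pp_magic),
	      "group must start at its first member");
static_assert(sizeof(struct fake_net_iov) ==
	      offsetof(struct fake_page, refcount) -
	      offsetof(struct fake_page, pp_magic),
	      "group must span exactly its members");

/* Returns 1 if page.pp_magic and page.iov.pp_magic name the same storage. */
static int group_aliases_field(void)
{
	struct fake_page p = { 0 };

	p.pp_magic = 0x40;
	return p.iov.pp_magic == 0x40 &&
	       (void *)&p.iov == (void *)&p.pp_magic;
}
```

Adding a field inside the group changes both views at once, so there is no separate per-field assert to forget.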
>> with the change __netmem_clear_lsb can return a pointer to that
>> structure, casting struct net_iov when it's a page is a bit iffy.
>>
>> And the next question would be whether it'd be a good idea to encode
>> iov vs page not by setting a bit but via one of the fields in the
>> structure, maybe pp_magic.
>>
>
> I will push back against this, for 2 reasons:
>
> 1. I think pp_magic's first 2 bits (and maybe more) are used by mm
> code and thus I think extending usage of pp_magic in this series is a
> bit iffy and I would like to avoid it. I just don't want to touch the
> semantics of struct page if I don't have to.
> 2. I think this will be a measurable perf regression. Currently we can
> tell if a pointer is a page or net_iov without dereferencing the
> pointer and dirtying the cache-line. This will cause us to possibly
> dereference the pointer in areas where we don't need to. I think I had
> an earlier version of this code that required a dereference to tell if
> a page was devmem and Eric pointed to me it was a perf regression.
Fair enough.
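For reference, the low-bit encoding argued for above can be sketched roughly as follows. This is a userspace approximation under stated assumptions: demo_netmem_t, DEMO_NET_IOV_BIT and the demo_* helpers are hypothetical names, not the series' actual definitions. The point is only that the type test reads the handle value and never loads from the pointee:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opaque handle that can refer to either kind of object. */
typedef uintptr_t demo_netmem_t;

/* Bit 0 is free because both structures are at least 2-byte aligned. */
#define DEMO_NET_IOV_BIT ((uintptr_t)1)

struct demo_page { unsigned long pp_magic; };
struct demo_net_iov { unsigned long pp_magic; };

static demo_netmem_t page_to_demo_netmem(struct demo_page *page)
{
	return (uintptr_t)page;
}

static demo_netmem_t net_iov_to_demo_netmem(struct demo_net_iov *niov)
{
	return (uintptr_t)niov | DEMO_NET_IOV_BIT;
}

/* Inspects only the handle value; the cache line behind it is untouched. */
static bool demo_netmem_is_net_iov(demo_netmem_t netmem)
{
	return (netmem & DEMO_NET_IOV_BIT) != 0;
}

/* Strips the tag bit to recover the original pointer. */
static void *demo_netmem_clear_lsb(demo_netmem_t netmem)
{
	return (void *)(netmem & ~DEMO_NET_IOV_BIT);
}

/* Round-trips one object of each kind through the handle. */
static bool demo_roundtrip(void)
{
	struct demo_page pg = { 0 };
	struct demo_net_iov iov = { 0 };
	demo_netmem_t a = page_to_demo_netmem(&pg);
	demo_netmem_t b = net_iov_to_demo_netmem(&iov);

	return !demo_netmem_is_net_iov(a) &&
	       demo_netmem_is_net_iov(b) &&
	       demo_netmem_clear_lsb(a) == (void *)&pg &&
	       demo_netmem_clear_lsb(b) == (void *)&iov;
}
```

Encoding the type in a field such as pp_magic would instead require reading through the pointer before the kind is known.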
> I also don't see any upside of using pp_magic, other than making the
> code slightly more readable, maybe.
>
>> With that said I'm a bit concerned about the net_iov size. If each
>> represents 4096 bytes and you're registering 10MB, then you need
>> 30 pages worth of memory just for the iov array. Makes kvmalloc
>> a must even for relatively small sizes.
>>
>
> This I think is an age-old challenge with pages. 1.6% of the machine's
> memory is 'wasted' on every machine because a struct page needs to be
> allocated for each PAGE_SIZE region. We're running into the same issue
> here where if we want to refer to PAGE_SIZE regions of memory we need
> to allocate some reference to it. Note that net_iov can be relatively
> easily extended to support N order pages. Also note that in the devmem
> TCP use case it's not really an issue; the minor increase in mem
> utilization is more than offset by the saving in memory bw as compared
> to using host memory as a bounce buffer.
It's not about memory consumption per se but rather the need
to vmalloc everything because of size.
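A quick sketch of the arithmetic behind that concern. The descriptor sizes are assumptions, since the thread does not state sizeof(struct net_iov): 48 bytes reproduces the "30 pages" figure quoted above, and 64 bytes is the size that matches the roughly 1.6% struct page overhead mentioned earlier (64 / 4096). The demo_* names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u

/*
 * Pages needed for an array holding one descriptor of desc_bytes per
 * DEMO_PAGE_SIZE chunk of a region_bytes region (both rounded up).
 */
static size_t demo_iov_array_pages(size_t region_bytes, size_t desc_bytes)
{
	size_t nr_descs = (region_bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
	size_t array_bytes = nr_descs * desc_bytes;

	return (array_bytes + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}
```

For a 10MB registration this gives 2560 descriptors, i.e. 30 pages at 48 bytes each and 40 pages at 64 bytes each, which is already a sizeable physically contiguous allocation for a small region.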
> All in all I vote this is
> something that can be tuned or improved in the future if someone finds
> the extra memory usage a hurdle to using devmem TCP or this net_iov
> infra.
That's exactly what I was saying about overlaying it with
struct page, which is where the increase in size came from,
but I agree it's not critical.
>> And the final bit, I don't believe the overlay is necessary in
>> this series. Optimisations are great, but this one is a bit more on
>> the controversial side. Unless I missed something and it does make
>> things easier, it might make sense to do it separately later.
>>
>
> I completely agree, the overlay is not necessary. I implemented the
> overlay in response to Yunsheng's strong requests for more 'unified'
> processing between page and devmem. This is the most unification I can
> do IMO without violating the requirements from Jason. I'm prepared to
> remove the overlay if it turns out controversial, but so far I haven't
> seen any complaints. Jason, please do take a look if you have not
> already.
Just to be clear, I have no objections to the change, but I'm
noting that IMHO it can be removed for now if it would drag
down the set.
--
Pavel Begunkov