Linux-arch Archive mirror
From: Pavel Begunkov <asml.silence@gmail.com>
To: Mina Almasry <almasrymina@google.com>
Cc: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Song Liu" <song@kernel.org>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"KP Singh" <kpsingh@kernel.org>,
	"Stanislav Fomichev" <sdf@google.com>,
	"Hao Luo" <haoluo@google.com>, "Jiri Olsa" <jolsa@kernel.org>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org,
	linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
	sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-arch@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org,
	dri-devel@lists.freedesktop.org,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Ivan Kokshaysky" <ink@jurassic.park.msu.ru>,
	"Matt Turner" <mattst88@gmail.com>,
	"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	"Helge Deller" <deller@gmx.de>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"Ilias Apalodimas" <ilias.apalodimas@linaro.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Masami Hiramatsu" <mhiramat@kernel.org>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"David Ahern" <dsahern@kernel.org>,
	"Willem de Bruijn" <willemdebruijn.kernel@gmail.com>,
	"Shuah Khan" <shuah@kernel.org>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>,
	"David Wei" <dw@davidwei.uk>, "Jason Gunthorpe" <jgg@ziepe.ca>,
	"Yunsheng Lin" <linyunsheng@huawei.com>,
	"Shailend Chand" <shailend@google.com>,
	"Harshitha Ramamurthy" <hramamurthy@google.com>,
	"Shakeel Butt" <shakeelb@google.com>,
	"Jeroen de Borst" <jeroendb@google.com>,
	"Praveen Kaligineedi" <pkaligineedi@google.com>
Subject: Re: [RFC PATCH net-next v5 07/14] page_pool: devmem support
Date: Wed, 14 Feb 2024 15:30:25 +0000
Message-ID: <c28e1f66-84c8-40f7-b200-f18bee06cb33@gmail.com>
In-Reply-To: <CAHS8izO2zARuMovrYU3kdwSXsQAM6+SajQjDT3ckSvVOfHwaCQ@mail.gmail.com>

On 2/13/24 21:11, Mina Almasry wrote:
> On Tue, Feb 13, 2024 at 5:28 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
...
>>
>> A bit of a churn with the padding and nesting net_iov but looks
>> sturdier. No duplication, and you can just check positions of the
>> structure instead of per-field NET_IOV_ASSERT_OFFSET, which you
>> have to not forget to update e.g. when adding a new field. Also,
> 
> Yes, this is nicer. If possible I'll punt it to a minor cleanup as a
> follow-up change. Logistically I think if this series need not touch
> code outside of net/, that's better.

Outside of net/ it should only need a small change to the struct
page layout; with struct_group_tagged, accesses like
page->pp_magic would still work. Anyway, I'm not insisting.
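
For readers unfamiliar with struct_group_tagged, here is a rough
userspace sketch of the idea, not code from the series. The macro
DEMO_STRUCT_GROUP_TAGGED only imitates the kernel helper, and
demo_page, demo_net_iov, demo_pp_fields and the field list are
hypothetical, simplified stand-ins for the real structures.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's struct_group_tagged(): the members
 * are declared twice inside an anonymous union, once as an anonymous
 * struct (so they stay reachable as outer.member) and once as a named,
 * tagged struct (so the group can be handled as a single object).
 */
#define DEMO_STRUCT_GROUP_TAGGED(TAG, NAME, ...) \
	union {                                  \
		struct { __VA_ARGS__ };          \
		struct TAG { __VA_ARGS__ } NAME; \
	}

/* Hypothetical, heavily simplified layout; not the real struct page. */
struct demo_page {
	unsigned long flags;
	DEMO_STRUCT_GROUP_TAGGED(demo_pp_fields, pp_fields,
		unsigned long pp_magic;
		void *pp;
		unsigned long dma_addr;
	);
};

/* Hypothetical non-page buffer descriptor reusing the same group type. */
struct demo_net_iov {
	struct demo_pp_fields pp_fields;
	void *owner;
};

/* One position check for the whole group instead of one per field. */
_Static_assert(offsetof(struct demo_page, pp_fields) ==
	       offsetof(struct demo_page, pp_magic),
	       "group must start at its first member");

/* Both flavours can hand out a pointer to the shared group type. */
static struct demo_pp_fields *demo_page_pp_fields(struct demo_page *page)
{
	return &page->pp_fields;
}

static struct demo_pp_fields *demo_iov_pp_fields(struct demo_net_iov *iov)
{
	return &iov->pp_fields;
}
```

In this sketch page.pp_magic and page.pp_fields.pp_magic name the
same storage, so existing users of the plain field keep working
while new code can take a struct demo_pp_fields pointer.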


>> with the change __netmem_clear_lsb can return a pointer to that
>> structure, casting struct net_iov when it's a page is a bit iffy.
>>
>> And the next question would be whether it'd be a good idea to encode
>> iov vs page not by setting a bit but via one of the fields in the
>> structure, maybe pp_magic.
>>
> 
> I will push back against this, for 2 reasons:
> 
> 1. I think pp_magic's first 2 bits (and maybe more) are used by mm
> code and thus I think extending usage of pp_magic in this series is a
> bit iffy and I would like to avoid it. I just don't want to touch the
> semantics of struct page if I don't have to.
> 2. I think this will be a measurable perf regression. Currently we can
> tell if a pointer is a page or net_iov without dereferencing the
> pointer and dirtying the cache-line. This will cause us to possibly
> dereference the pointer in areas where we don't need to. I think I had
> an earlier version of this code that required a dereference to tell if
> a page was devmem and Eric pointed out to me it was a perf regression.

Fair enough.
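
To illustrate point 2 above, a minimal sketch of low-bit pointer
tagging, assuming both object types are at least 2-byte aligned so
bit 0 of a valid pointer is always zero. All demo_* names are
hypothetical and this is not the netmem code from the series; it
only shows that the kind of object can be read from the reference
value itself, without loading anything from the object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct page and struct net_iov. */
struct demo_page { unsigned long flags; };
struct demo_iov  { unsigned long pp_magic; };

/* Opaque reference: a pointer value with the kind stored in bit 0. */
typedef uintptr_t demo_netmem;
#define DEMO_IOV_BIT ((uintptr_t)1)

static demo_netmem demo_from_page(struct demo_page *page)
{
	/* Alignment guarantees the tag bit is free. */
	assert(((uintptr_t)page & DEMO_IOV_BIT) == 0);
	return (uintptr_t)page;
}

static demo_netmem demo_from_iov(struct demo_iov *iov)
{
	assert(((uintptr_t)iov & DEMO_IOV_BIT) == 0);
	return (uintptr_t)iov | DEMO_IOV_BIT;
}

/* Inspects only the reference value; the object is never dereferenced. */
static bool demo_is_iov(demo_netmem ref)
{
	return (ref & DEMO_IOV_BIT) != 0;
}

/* Returns NULL when the reference holds the other kind of object. */
static struct demo_page *demo_to_page(demo_netmem ref)
{
	return demo_is_iov(ref) ? NULL : (struct demo_page *)ref;
}

static struct demo_iov *demo_to_iov(demo_netmem ref)
{
	return demo_is_iov(ref) ? (struct demo_iov *)(ref & ~DEMO_IOV_BIT)
				: NULL;
}
```

Encoding the kind in a structure field such as pp_magic would
instead require a load from the object in demo_is_iov(), which is
the extra cache-line touch being objected to.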

> I also don't see any upside of using pp_magic, other than making the
> code slightly more readable, maybe.
> 
>> With that said I'm a bit concerned about the net_iov size. If each
>> represents 4096 bytes and you're registering 10MB, then you need
>> 30 pages worth of memory just for the iov array. Makes kvmalloc
>> a must even for relatively small sizes.
>>
> 
> This I think is an age-old challenge with pages. 1.6% of the machine's
> memory is 'wasted' on every machine because a struct page needs to be
> allocated for each PAGE_SIZE region. We're running into the same issue
> here where if we want to refer to PAGE_SIZE regions of memory we need
> to allocate some reference to it. Note that net_iov can be relatively
> easily extended to support N order pages. Also note that in the devmem
> TCP use case it's not really an issue; the minor increase in mem
> utilization is more than offset by the saving in memory bw as compared
> to using host memory as a bounce buffer.

It's not about memory consumption per se, but rather the need
to vmalloc everything because of the array size.
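
For reference, the arithmetic behind the numbers in this exchange
can be sketched as below. The helper names are hypothetical, and
the entry sizes plugged in afterwards (48 and 64 bytes) are
assumptions for illustration, not measured sizes of net_iov or
struct page in any particular config.

```c
#include <assert.h>
#include <stddef.h>

/* Round-up division; the divisor must be non-zero. */
static size_t demo_div_round_up(size_t n, size_t d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}

/* Number of fixed-size chunks (one metadata entry each) in a region. */
static size_t demo_entry_count(size_t region_bytes, size_t chunk_bytes)
{
	return demo_div_round_up(region_bytes, chunk_bytes);
}

/* Pages needed to hold one contiguous array of metadata entries. */
static size_t demo_array_pages(size_t entries, size_t entry_bytes,
			       size_t page_bytes)
{
	return demo_div_round_up(entries * entry_bytes, page_bytes);
}

/* Metadata overhead per chunk, in parts per million of the chunk size. */
static size_t demo_overhead_ppm(size_t entry_bytes, size_t chunk_bytes)
{
	assert(chunk_bytes != 0);
	return entry_bytes * 1000000u / chunk_bytes;
}
```

Taking 10MB as 10 MiB with 4096-byte chunks gives 2560 entries. An
assumed 48-byte entry needs 30 pages for the array, and an assumed
64-byte entry needs 40, so a single physically contiguous
allocation gets large quickly. With 64 bytes of metadata per
4096-byte page the overhead is 15625 ppm, about 1.6%.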

> All in all I vote this is
> something that can be tuned or improved in the future if someone finds
> the extra memory usage a hurdle to using devmem TCP or this net_iov
> infra.

That's exactly what I was saying about overlaying it with
struct page, which is where the increase in size came from,
but I agree it's not critical.

>> And the final bit, I don't believe the overlay is necessary in
>> this series. Optimisations are great, but this one is a bit more on
>> the controversial side. Unless I missed something and it does make
>> things easier, it might make sense to do it separately later.
>>
> 
> I completely agree, the overlay is not necessary. I implemented the
> overlay in response to Yunsheng's strong requests for more 'unified'
> processing between page and devmem. This is the most unification I can
> do IMO without violating the requirements from Jason. I'm prepared to
> remove the overlay if it turns out controversial, but so far I haven't
> seen any complaints. Jason, please do take a look if you have not
> already.

Just to be clear, I have no objections to the change, but I'm
noting that IMHO it can be removed for now if it'd be dragging
down the set.

-- 
Pavel Begunkov


Thread overview: 21+ messages
2023-12-18  2:40 [RFC PATCH net-next v5 00/14] Device Memory TCP Mina Almasry
2023-12-18  2:40 ` [RFC PATCH net-next v5 01/14] net: page_pool: create hooks for custom page providers Mina Almasry
2023-12-18  2:40 ` [RFC PATCH net-next v5 02/14] net: page_pool: factor out page_pool recycle check Mina Almasry
2023-12-18  2:40 ` [RFC PATCH net-next v5 03/14] net: netdev netlink api to bind dma-buf to a net device Mina Almasry
2023-12-18  2:40 ` [RFC PATCH net-next v5 04/14] netdev: support binding dma-buf to netdevice Mina Almasry
2023-12-18  2:40 ` [RFC PATCH net-next v5 05/14] netdev: netdevice devmem allocator Mina Almasry
2024-02-13 13:15   ` Pavel Begunkov
2024-02-13 20:01     ` Mina Almasry
2023-12-18  2:40 ` [RFC PATCH net-next v5 06/14] page_pool: convert to use netmem Mina Almasry
2023-12-18  2:40 ` [RFC PATCH net-next v5 07/14] page_pool: devmem support Mina Almasry
2024-02-13 13:18   ` Pavel Begunkov
2024-02-13 21:11     ` Mina Almasry
2024-02-14 15:30       ` Pavel Begunkov [this message]
2023-12-18  2:40 ` [RFC PATCH net-next v5 08/14] memory-provider: dmabuf devmem memory provider Mina Almasry
2024-02-13 13:19   ` Pavel Begunkov
2023-12-18  2:40 ` [RFC PATCH net-next v5 09/14] net: support non paged skb frags Mina Almasry
2023-12-18  2:40 ` [RFC PATCH net-next v5 10/14] net: add support for skbs with unreadable frags Mina Almasry
2023-12-18  2:40 ` [RFC PATCH net-next v5 11/14] tcp: RX path for devmem TCP Mina Almasry
2023-12-18  2:40 ` [RFC PATCH net-next v5 12/14] net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags Mina Almasry
2023-12-18  2:40 ` [RFC PATCH net-next v5 13/14] net: add devmem TCP documentation Mina Almasry
2023-12-18  2:40 ` [RFC PATCH net-next v5 14/14] selftests: add ncdevmem, netcat for devmem TCP Mina Almasry
