Linux-Modules Archive mirror
From: Luis Chamberlain <mcgrof@kernel.org>
To: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Cc: "keescook@chromium.org" <keescook@chromium.org>,
	"hch@infradead.org" <hch@infradead.org>,
	"prarit@redhat.com" <prarit@redhat.com>,
	"rppt@kernel.org" <rppt@kernel.org>,
	"catalin.marinas@arm.com" <catalin.marinas@arm.com>,
	"Torvalds, Linus" <torvalds@linux-foundation.org>,
	"willy@infradead.org" <willy@infradead.org>,
	"song@kernel.org" <song@kernel.org>,
	"patches@lists.linux.dev" <patches@lists.linux.dev>,
	"pmladek@suse.com" <pmladek@suse.com>,
	"david@redhat.com" <david@redhat.com>,
	"colin.i.king@gmail.com" <colin.i.king@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"jim.cromie@gmail.com" <jim.cromie@gmail.com>,
	"vbabka@suse.cz" <vbabka@suse.cz>,
	"christophe.leroy@csgroup.eu" <christophe.leroy@csgroup.eu>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"jbaron@akamai.com" <jbaron@akamai.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"linux-modules@vger.kernel.org" <linux-modules@vger.kernel.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"petr.pavlu@suse.com" <petr.pavlu@suse.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"Hocko, Michal" <mhocko@suse.com>,
	"dave@stgolabs.net" <dave@stgolabs.net>
Subject: Re: [RFC 2/2] kread: avoid duplicates
Date: Tue, 18 Apr 2023 11:46:13 -0700
Message-ID: <ZD7ldcZoWfeN7poU@bombadil.infradead.org>
In-Reply-To: <ZD3DYqYE4DOiJQaS@bombadil.infradead.org>

On Mon, Apr 17, 2023 at 03:08:34PM -0700, Luis Chamberlain wrote:
> On Mon, Apr 17, 2023 at 05:33:49PM +0000, Edgecombe, Rick P wrote:
> > On Sat, 2023-04-15 at 23:41 -0700, Luis Chamberlain wrote:
> > > On Sat, Apr 15, 2023 at 11:04:12PM -0700, Christoph Hellwig wrote:
> > > > On Thu, Apr 13, 2023 at 10:28:40PM -0700, Luis Chamberlain wrote:
> > > > > With this we run into 0 wasted virtual memory bytes.
> > > > 
> > > > Avoid what duplicates?
> > > 
> > > > David Hildenbrand had reported that with over 400 CPUs vmap space
> > > > runs out and it seems it was related to module loading. I took a
> > > > look and confirmed it. Module loading ends up requiring up to
> > > > 3 vmalloc allocations: typically at least twice the module size,
> > > > and in the worst case add the decompressed module size on top of
> > > > that:
> > > 
> > > a) initial kernel_read*() call
> > > b) optional module decompression
> > > c) the actual module data copy we will keep
> > > 
> > > > Duplicate module requests that come from userspace end up being
> > > > thrown in the trash bin, as only one module will be allocated.
> > > > Although there are checks for a module prior to requesting a
> > > > module, udev still doesn't do the best of a job to avoid that and
> > > > so we end up with tons of duplicate module requests. We're talking
> > > > about gigabytes of vmalloc bytes just lost because of this for
> > > > large systems and megabytes for average systems. So for example
> > > > with just 255 CPUs we can lose about 13.58 GiB, and for 8 CPUs
> > > > about 226.53 MiB.
> > > 
> > > > I have patches to curtail 1/2 of that space by doing a check in
> > > > kernel before we do the allocation in c) if the module is already
> > > > present. For a) it is harder because userspace just passes a file
> > > > descriptor. But since we can get the file path without the vmalloc
> > > > this RFC suggests maybe we can add a new kernel_read*() for module
> > > > loading where it makes sense to have only one read happen at a
> > > > time.
> > 
> > I'm wondering how difficult it would be to just try to remove the
> > vmallocs in (a) and (b) and operate on a list of pages.
> 
> Yes, I think it's worth doing that long term, if possible with seq reads.
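
For a rough feel of the a) / b) / c) accounting quoted above, here is a
minimal sketch of the arithmetic. It is only an upper bound: the function
names are made up for illustration, and it assumes the final copy in c) is
about the size of the (decompressed) module image and that a duplicate
request gets through all three steps before being thrown away.

```python
def peak_vmalloc_bytes(file_size, decompressed_size=None):
    """Rough upper bound on vmalloc bytes used by one module load attempt.

    a) buffer filled by the initial kernel_read*() call
    b) optional decompression buffer (only for compressed modules)
    c) the module copy that is kept (assumed ~ size of the module image)
    """
    image_size = decompressed_size if decompressed_size is not None else file_size
    a = file_size
    b = decompressed_size or 0
    c = image_size
    return a + b + c


def wasted_bytes(file_size, duplicates, decompressed_size=None):
    """Bytes lost if `duplicates` extra requests each pay the full cost."""
    return duplicates * peak_vmalloc_bytes(file_size, decompressed_size)
```

With these assumptions an uncompressed 1 MiB module costs up to 2 MiB per
attempt, so 10 discarded duplicate requests would account for up to 20 MiB.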

OK here's what I suggest we do then:

I'll resubmit the first patch which allows us to prove / disprove if
module-autoloading is the culprit. With that in place folks can debug
their setup and verify how udev is to blame.
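
To illustrate the kind of accounting that helps here, this is a small
userspace sketch that counts how many requests for each module were
duplicates. It is not the patch itself: the function name and the input
list are hypothetical, and the real debugging support does its tracking
inside the kernel.

```python
from collections import Counter


def duplicate_requests(requested):
    """Map module name -> number of redundant requests (requests beyond the first).

    `requested` is a hypothetical sequence of module names in the order
    they were asked for.
    """
    counts = Counter(requested)
    return {name: n - 1 for name, n in counts.items() if n > 1}
```

Any module that shows up in the result was requested more than once, and
every such extra request is a candidate for wasted vmalloc space.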

I'll drop the second kernel_read*() patch / effort and punt this as a
userspace problem as this is also not extremely pressing.
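
For reference, the "only one read at a time" idea being dropped can be
pictured roughly as below. This is a userspace sketch under assumptions,
not the RFC code: the class and method names are invented, and what a
duplicate caller should do (wait, fail, or share the result) is left open.

```python
import threading


class InFlightReads:
    """Track which file paths currently have a read in progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self._paths = set()

    def try_begin(self, path):
        """Return True if the caller may read `path`, False if a read is already in flight."""
        with self._lock:
            if path in self._paths:
                return False
            self._paths.add(path)
            return True

    def end(self, path):
        """Mark the read of `path` as finished so a later request may proceed."""
        with self._lock:
            self._paths.discard(path)
```

A caller would only allocate its buffer after try_begin() returns True,
which is what keeps duplicate requests from each paying for allocation a).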

Long term we should evaluate how we can avoid vmalloc for the kread and
module decompression.
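
One way to picture operating on a list of pages rather than one large
contiguous buffer is the userspace sketch below. The helper names and the
4096-byte page size are assumptions for illustration, and it relies on a
buffered stream so that every chunk except the last is a full page.

```python
import io

PAGE_SIZE = 4096  # assumed page size, for illustration only


def read_into_pages(stream, page_size=PAGE_SIZE):
    """Read a buffered stream into a list of page-sized chunks.

    No single allocation is larger than one page; the last chunk may be short.
    """
    pages = []
    while True:
        chunk = stream.read(page_size)
        if not chunk:
            break
        pages.append(chunk)
    return pages


def read_at(pages, offset, length, page_size=PAGE_SIZE):
    """Copy up to `length` bytes starting at `offset`, crossing page boundaries."""
    out = bytearray()
    while length > 0:
        index, within = divmod(offset, page_size)
        if index >= len(pages):
            break  # offset is past the end of the data
        piece = pages[index][within:within + length]
        if not piece:
            break  # ran off the end of a short last page
        out += piece
        offset += len(piece)
        length -= len(piece)
    return bytes(out)
```

For example, read_into_pages(io.BytesIO(data)) on 10240 bytes of data gives
three chunks of 4096, 4096 and 2048 bytes, and read_at() can still fetch a
range that straddles two of them.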

If this really becomes a pressing issue we can revisit whether we want an
in-kernel solution, but at this point that would likely be systems with
over 400-500 CPUs with KASAN enabled. Without KASAN the issue should
eventually trigger if you're enabling modules, but it's hard to say at what
point you'd hit this issue.

  Luis


Thread overview: 15+ messages
2023-04-14  5:28 [RFC 0/2] module: fix virtual memory wasted on finit_module() Luis Chamberlain
2023-04-14  5:28 ` [RFC 1/2] module: add debugging auto-load duplicate module support Luis Chamberlain
2023-04-14  5:28 ` [RFC 2/2] kread: avoid duplicates Luis Chamberlain
2023-04-14  6:35   ` Greg KH
2023-04-14 16:35     ` Luis Chamberlain
2023-04-16  6:04   ` Christoph Hellwig
2023-04-16  6:41     ` Luis Chamberlain
2023-04-16 12:50       ` Greg KH
2023-04-16 18:46         ` Luis Chamberlain
2023-04-17  6:05           ` Greg KH
2023-04-17 22:05             ` Luis Chamberlain
2023-04-17 17:33       ` Edgecombe, Rick P
2023-04-17 22:08         ` Luis Chamberlain
2023-04-18 18:46           ` Luis Chamberlain [this message]
2023-04-14 17:25 ` [RFC 0/2] module: fix virtual memory wasted on finit_module() Luis Chamberlain
