Linux-mm Archive mirror
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: Axel Rasmussen <axelrasmussen@google.com>
Cc: Oscar Salvador <osalvador@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>,
	"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Helge Deller <deller@gmx.de>,
	Ingo Molnar <mingo@redhat.com>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Liu Shixin <liushixin2@huawei.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Muchun Song <muchun.song@linux.dev>,
	"Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
	Nicholas Piggin <npiggin@gmail.com>, Peter Xu <peterx@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	x86@kernel.org
Subject: Re: [PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors
Date: Wed, 15 May 2024 22:18:31 +0200	[thread overview]
Message-ID: <20240515201831.GDZkUYlybfejSh79ix@fat_crate.local> (raw)
In-Reply-To: <CAJHvVcj+YBpLbjLy+M+b8K7fj0XvFSZLpsuY-RbCCn9ouV1WjQ@mail.gmail.com>

On Wed, May 15, 2024 at 12:19:16PM -0700, Axel Rasmussen wrote:
> An unprivileged process can allocate a VMA, use the userfaultfd API to
> install one of these PTE markers, and then register a no-op SIGBUS
> handler. Now it can access that address in a tight loop,

Maybe the userfaultfd should not allow this, I dunno. You made me look
at this thing and to me it all sounds weird. One thread does page fault
handling for the other and that helps with live migration somehow. OMG,
whaaat?

Maybe I don't understand it and probably never will...

But, for example, membarrier used do to a stupid thing of allowing one
thread to hammer another with an IPI storm. Bad bad idea. So it got
fixed.

All I'm saying is, if unprivileged processes can do crap, they should be
prevented from doing crap. Like ratelimiting the pagefaults or whatnot.

One of the recovery action strategies from memory poison is, well, you
kill the process. If you can detect the hammering process which
installed that page marker, you kill it. Problem solved.

But again, this userfaultfd thing sounds really weird so I could very
well be way wrong.

> Even in a non-contrived / non-malicious case, use of this API could
> have similar effects. If nothing else, the log message can be
> confusing to administrators: they state that an MCE occurred, whereas
> with the simulated poison API, this is not the case; it isn't a "real"
> MCE / hardware error.

Yeah, I read that part in

Documentation/admin-guide/mm/userfaultfd.rst

Simulated poison huh? Another WTF.

> In the KVM use case, the host can't just allocate a new page, because
> it doesn't know what the guest might have had stored there. Best we

Ok, let's think of real hw poison.

When doing the recovery, you don't care what's stored there because as
far as the hardware is concerned, if you consume that poison the *whole*
machine might go down.

So you lose the page. Plain and simple. And the guest can go visit the
bureau of complaints and grievances.

Still better than killing the guest or even the whole host with other
guests running on it.

> can do is propagate the poison into the guest, and let the guest OS
> deal with it as it sees fit, and mark the page poisoned on the host.

You mark the page as poison on the host and you yank it from under the
guest. That physical frame is gone and the faster all the actors
involved understand that, the better.

> I don't disagree the guest *shouldn't* reaccess it in this case. :)
> But if it did, it should get another poison event just as you say.

Yes, it shouldn't. Look at memory_failure(). This will kill whole
processes if it has to, depending on what the page is used for.

> And, live migration between physical hosts should be transparent to
> the guest. So if the guest gets a poison, and then we live migrate it,

So if I were to design this, I'd do it this way:

0. guest gets hw poison injected

1. it runs memory_failure() and it kills the processes using the page.

2. page is marked poisoned on the host so no other guest gets it.

That's it. No second accesses whatsoever. At least this is how it works
on baremetal.

This hw poisoning emulation is just silly and unnecessary.

But again, I probably am missing some aspects. It all just sounded
really weird to me that's why I thought I should ask what's behind all
that.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


  reply	other threads:[~2024-05-15 20:19 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-10 18:29 [PATCH v2 0/1] arch/fault: don't print logs for simulated poison errors Axel Rasmussen
2024-05-10 18:29 ` [PATCH v2 1/1] arch/fault: don't print logs for pte marker " Axel Rasmussen
2024-05-10 19:29   ` Peter Xu
2024-05-14 20:26     ` Oscar Salvador
2024-05-14 21:34       ` Peter Xu
2024-05-15 10:21         ` Oscar Salvador
2024-05-22 21:46           ` Peter Xu
2024-05-23  3:08             ` Oscar Salvador
2024-05-23 15:59               ` Peter Xu
2024-05-15 10:41   ` Borislav Petkov
2024-05-15 10:54     ` Oscar Salvador
2024-05-15 17:33       ` Axel Rasmussen
2024-05-15 18:32         ` Borislav Petkov
2024-05-15 19:19           ` Axel Rasmussen
2024-05-15 20:18             ` Borislav Petkov [this message]
2024-05-16 20:28               ` Axel Rasmussen
2024-05-22 22:03               ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240515201831.GDZkUYlybfejSh79ix@fat_crate.local \
    --to=bp@alien8.de \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@kernel.org \
    --cc=axelrasmussen@google.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=deller@gmx.de \
    --cc=hpa@zytor.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=liushixin2@huawei.com \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mpe@ellerman.id.au \
    --cc=muchun.song@linux.dev \
    --cc=naveen.n.rao@linux.ibm.com \
    --cc=npiggin@gmail.com \
    --cc=osalvador@suse.de \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=surenb@google.com \
    --cc=tglx@linutronix.de \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).