From: Catalin Marinas <catalin.marinas@arm.com>
To: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>,
	Will Deacon <will@kernel.org>, Joey Gouly <joey.gouly@arm.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Peter Xu <peterx@redhat.com>, Mike Rapoport <rppt@linux.ibm.com>,
	Shivansh Vij <shivanshvij@outlook.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
Date: Mon, 29 Apr 2024 13:38:33 +0100
Message-ID: <Zi-UyS5IC_truh8M@arm.com>
In-Reply-To: <f5de5685-d955-4aa0-a307-a4da927f36f0@arm.com>

On Mon, Apr 29, 2024 at 11:04:53AM +0100, Ryan Roberts wrote:
> On 26/04/2024 15:48, Catalin Marinas wrote:
> > On Thu, Apr 25, 2024 at 11:37:42AM +0100, Ryan Roberts wrote:
> >> Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap
> >> ptes; it would be cleaner to have one bit that defines "present" when valid is
> >> clear (similar to PTE_PROT_NONE today) then another bit which is only defined
> >> when "present && !valid" which tells us if this is PTE_PROT_NONE or
> >> PMD_PRESENT_INVALID (I don't think you can ever have both at the same time?).
> > 
> > I think this makes sense; maybe rename the above to PTE_PRESENT_INVALID
> > and use it for both ptes and pmds.
> 
> Yep, sounds good. I've already got a patch to do this, but it's exposed a bug in
> core-mm so will now fix that before I can validate my change. see
> https://lore.kernel.org/linux-arm-kernel/ZiuyGXt0XWwRgFh9@x1n/
> 
> With this in place, I'm proposing to remove PTE_PROT_NONE entirely and instead
> represent PROT_NONE as a present but invalid pte (PTE_VALID=0, PTE_INVALID=1)
> with both PTE_WRITE=0 and PTE_RDONLY=0.
> 
> While the HW would interpret PTE_WRITE=0/PTE_RDONLY=0 as "RW without dirty bit
> modification", this is not a problem as the pte is invalid, so the HW doesn't
> interpret it. And SW always uses the PTE_WRITE bit to interpret the writability
> of the pte. So PTE_WRITE=0/PTE_RDONLY=0 was previously an unused combination
> that we now repurpose for PROT_NONE.

Why not just keep the bits currently in PAGE_NONE (PTE_RDONLY would be
set) and check PTE_USER|PTE_UXN == 0b01, which is a unique combination
for PAGE_NONE (bar the kernel mappings)?

For ptes, it doesn't matter, we can assume that PTE_PRESENT_INVALID
means pte_protnone(). For pmds, however, we can end up with
pmd_protnone(pmd_mkinvalid(pmd)) == true for any of the PAGE_*
permissions encoded into a valid pmd. That's where a dedicated
PTE_PROT_NONE bit helped.

Let's say a CPU starts splitting a pmd and does a pmdp_invalidate*()
first to set PTE_PRESENT_INVALID. A different CPU gets a fault and since
the pmd is present, it goes and checks pmd_protnone() which returns
true, ending up on the do_huge_pmd_numa_page() path. Maybe some locks help
but it looks fragile to rely on them.

So I think for protnone we need to check some other bits (like USER and
UXN) in addition to PTE_PRESENT_INVALID.
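
To make the pmd concern concrete, here is a minimal user-space sketch, not
kernel code. Everything prefixed sk_/SK_ is hypothetical, and the bit
positions are illustrative assumptions (the position of the
PTE_PRESENT_INVALID stand-in in particular is a placeholder). It reads the
"0b01" above as "USER clear, UXN set". The naive check treats any
present-but-invalid entry as protnone, so a user pmd that has just been
through a pmd_mkinvalid()-style operation is misclassified; the stricter
check also looks at USER/UXN and is not fooled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions only (assumptions for this sketch). */
#define SK_VALID           (UINT64_C(1) << 0)
#define SK_USER            (UINT64_C(1) << 6)
#define SK_UXN             (UINT64_C(1) << 54)
#define SK_PRESENT_INVALID (UINT64_C(1) << 59) /* placeholder position */

typedef uint64_t sk_pmd_t;

/* Present but not valid: VALID clear and PRESENT_INVALID set. */
static inline bool sk_present_invalid(sk_pmd_t pmd)
{
	return (pmd & (SK_VALID | SK_PRESENT_INVALID)) == SK_PRESENT_INVALID;
}

/* Naive: every present-but-invalid entry counts as protnone. */
static inline bool sk_protnone_naive(sk_pmd_t pmd)
{
	return sk_present_invalid(pmd);
}

/* Stricter: additionally require USER clear and UXN set. */
static inline bool sk_protnone_strict(sk_pmd_t pmd)
{
	return sk_present_invalid(pmd) &&
	       (pmd & (SK_USER | SK_UXN)) == SK_UXN;
}

/* Stand-in for pmd_mkinvalid(): clear VALID, set PRESENT_INVALID. */
static inline sk_pmd_t sk_mkinvalid(sk_pmd_t pmd)
{
	return (pmd & ~SK_VALID) | SK_PRESENT_INVALID;
}

/* Walks through the window between invalidation and the racing fault. */
void sk_demo(void)
{
	sk_pmd_t user_pmd = SK_VALID | SK_USER | SK_UXN; /* ordinary user pmd */
	sk_pmd_t none_pmd = SK_PRESENT_INVALID | SK_UXN; /* PAGE_NONE-like pmd */
	sk_pmd_t split = sk_mkinvalid(user_pmd);         /* mid-split state */

	assert(sk_protnone_naive(split));    /* misclassified as protnone */
	assert(!sk_protnone_strict(split));  /* USER is still set */
	assert(sk_protnone_strict(none_pmd));
}
```

In this model the extra USER/UXN test is what distinguishes a transiently
invalidated user pmd from a genuine PAGE_NONE one, without a dedicated
software bit.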

> This will subtly change behaviour in an edge case though. Imagine:
> 
> pte_t pte;
> 
> pte = pte_modify(pte, PAGE_NONE);
> pte = pte_mkwrite_novma(pte);
> WARN_ON(pte_protnone(pte));
> 
> Should that warning fire or not? Previously, because we had a dedicated bit for
> PTE_PROT_NONE it would fire. With my proposed change it will not fire. To me
> it's more intuitive if it doesn't fire. Regardless there is no core code that
> ever does this. Once you have a protnone pte, it's terminal - nothing ever
> modifies it with these helpers AFAICS.

I don't think any core code should try to make a PAGE_NONE pte
writeable.

> Personally I think this is a nice tidy up that saves a SW bit in both present
> and swap ptes. What do you think? (I'll just post the series if it's easier to
> provide feedback in that context).

It would be nice to tidy this up and get rid of PTE_PROT_NONE as long as
it doesn't affect the pmd case I mentioned above.

> >> But there is a problem with this: __split_huge_pmd_locked() calls
> >> pmdp_invalidate() for a pmd before it determines that it is pmd_present(). So
> >> the PMD_PRESENT_INVALID can be set in a swap pte today. That feels wrong to me,
> >> but was trying to avoid the whole thing unravelling so didn't pursue.
> > 
> > Maybe what's wrong is the arm64 implementation setting this bit on a
> > swap/migration pmd (though we could handle this in the core code as
> > well, it depends what the other architectures do). The only check for
> > the PMD_PRESENT_INVALID bit is in the arm64 code and it can be absorbed
> > into the pmd_present() check. I think it is currently broken as
> > pmd_present() can return true for a swap pmd after pmd_mkinvalid().
> 
> I've posted a fix here:
> https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
> 
> My position is that you shouldn't be calling pmd_mkinvalid() on a non-present pmd.

I agree, thanks.

-- 
Catalin
