All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Ryan Roberts <ryan.roberts@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>,
	Will Deacon <will@kernel.org>, Joey Gouly <joey.gouly@arm.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Peter Xu <peterx@redhat.com>, Mike Rapoport <rppt@linux.ibm.com>,
	Shivansh Vij <shivanshvij@outlook.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
Date: Mon, 29 Apr 2024 14:01:53 +0100	[thread overview]
Message-ID: <e946c510-9ba3-4d7b-9561-5ded86086df0@arm.com> (raw)
In-Reply-To: <Zi-UyS5IC_truh8M@arm.com>

On 29/04/2024 13:38, Catalin Marinas wrote:
> On Mon, Apr 29, 2024 at 11:04:53AM +0100, Ryan Roberts wrote:
>> On 26/04/2024 15:48, Catalin Marinas wrote:
>>> On Thu, Apr 25, 2024 at 11:37:42AM +0100, Ryan Roberts wrote:
>>>> Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap
>>>> ptes; it would be cleaner to have one bit that defines "present" when valid is
>>>> clear (similar to PTE_PROT_NONE today) then another bit which is only defined
>>>> when "present && !valid" which tells us if this is PTE_PROT_NONE or
>>>> PMD_PRESENT_INVALID (I don't think you can ever have both at the same time?).
>>>
>>> I think this make sense, maybe rename the above to PTE_PRESENT_INVALID
>>> and use it for both ptes and pmds.
>>
>> Yep, sounds good. I've already got a patch to do this, but it's exposed a bug in
>> core-mm so will now fix that before I can validate my change. see
>> https://lore.kernel.org/linux-arm-kernel/ZiuyGXt0XWwRgFh9@x1n/
>>
>> With this in place, I'm proposing to remove PTE_PROT_NONE entirely and instead
>> represent PROT_NONE as a present but invalid pte (PTE_VALID=0, PTE_INVALID=1)
>> with both PTE_WRITE=0 and PTE_RDONLY=0.
>>
>> While the HW would interpret PTE_WRITE=0/PTE_RDONLY=0 as "RW without dirty bit
>> modification", this is not a problem as the pte is invalid, so the HW doesn't
>> interpret it. And SW always uses the PTE_WRITE bit to interpret the writability
>> of the pte. So PTE_WRITE=0/PTE_RDONLY=0 was previously an unused combination
>> that we now repurpose for PROT_NONE.
> 
> Why not just keep the bits currently in PAGE_NONE (PTE_RDONLY would be
> set) and check PTE_USER|PTE_UXN == 0b01 which is a unique combination
> for PAGE_NONE (bar the kernel mappings).

Yes I guess that works. I personally prefer my proposal because it is more
intuitive; you have an R bit and a W bit, and you encode RO, WR, and NONE. But
if you think reusing the kernel mapping check (PTE_USER|PTE_UXN == 0b01) is
preferable, then I'll go with that.

> 
> For ptes, it doesn't matter, we can assume that PTE_PRESENT_INVALID
> means pte_protnone(). For pmds, however, we can end up with
> pmd_protnone(pmd_mkinvalid(pmd)) == true for any of the PAGE_*
> permissions encoded into a valid pmd. That's where a dedicated
> PTE_PROT_NONE bit helped.

Yes agreed.

> 
> Let's say a CPU starts splitting a pmd and does a pmdp_invalidate*()
> first to set PTE_PRESENT_INVALID. A different CPU gets a fault and since
> the pmd is present, it goes and checks pmd_protnone() which returns
> true, ending up on do_huge_pmd_numa_page() path. Maybe some locks help
> but it looks fragile to rely on them.
> 
> So I think for protnone we need to check some other bits (like USER and
> UXN) in addition to PTE_PRESENT_INVALID.

Yes 100% agree. But using PTE_WRITE|PTE_RDONLY==0b00 is just as valid for that
purpose, I think?

> 
>> This will subtly change behaviour in an edge case though. Imagine:
>>
>> pte_t pte;
>>
>> pte = pte_modify(pte, PAGE_NONE);
>> pte = pte_mkwrite_novma(pte);
>> WARN_ON(pte_protnone(pte));
>>
>> Should that warning fire or not? Previously, because we had a dedicated bit for
>> PTE_PROT_NONE it would fire. With my proposed change it will not fire. To me
>> it's more intuitive if it doesn't fire. Regardless there is no core code that
>> ever does this. Once you have a protnone pte, its terminal - nothing ever
>> modifies it with these helpers AFAICS.
> 
> I don't think any core code should try to make page a PAGE_NONE pte
> writeable.

I looked at some other arches; some (at least alpha and hexagon) will not fire
this warning because they have R and W bits and 0b00 means NONE. Others (x86)
will fire it because they have an explicit NONE bit and don't remove it on
permission change. So I conclude its UB and fine to do either.

> 
>> Personally I think this is a nice tidy up that saves a SW bit in both present
>> and swap ptes. What do you think? (I'll just post the series if its easier to
>> provide feedback in that context).
> 
> It would be nice to tidy this up and get rid of PTE_PROT_NONE as long as
> it doesn't affect the pmd case I mentioned above.
> 
>>>> But there is a problem with this: __split_huge_pmd_locked() calls
>>>> pmdp_invalidate() for a pmd before it determines that it is pmd_present(). So
>>>> the PMD_PRESENT_INVALID can be set in a swap pte today. That feels wrong to me,
>>>> but was trying to avoid the whole thing unravelling so didn't persue.
>>>
>>> Maybe what's wrong is the arm64 implementation setting this bit on a
>>> swap/migration pmd (though we could handle this in the core code as
>>> well, it depends what the other architectures do). The only check for
>>> the PMD_PRESENT_INVALID bit is in the arm64 code and it can be absorbed
>>> into the pmd_present() check. I think it is currently broken as
>>> pmd_present() can return true for a swap pmd after pmd_mkinvalid().
>>
>> I've posted a fix here:
>> https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>>
>> My position is that you shouldn't be calling pmd_mkinvalid() on a non-present pmd.
> 
> I agree, thanks.
> 


WARNING: multiple messages have this Message-ID (diff)
From: Ryan Roberts <ryan.roberts@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>,
	Will Deacon <will@kernel.org>, Joey Gouly <joey.gouly@arm.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Peter Xu <peterx@redhat.com>, Mike Rapoport <rppt@linux.ibm.com>,
	Shivansh Vij <shivanshvij@outlook.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
Date: Mon, 29 Apr 2024 14:01:53 +0100	[thread overview]
Message-ID: <e946c510-9ba3-4d7b-9561-5ded86086df0@arm.com> (raw)
In-Reply-To: <Zi-UyS5IC_truh8M@arm.com>

On 29/04/2024 13:38, Catalin Marinas wrote:
> On Mon, Apr 29, 2024 at 11:04:53AM +0100, Ryan Roberts wrote:
>> On 26/04/2024 15:48, Catalin Marinas wrote:
>>> On Thu, Apr 25, 2024 at 11:37:42AM +0100, Ryan Roberts wrote:
>>>> Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap
>>>> ptes; it would be cleaner to have one bit that defines "present" when valid is
>>>> clear (similar to PTE_PROT_NONE today) then another bit which is only defined
>>>> when "present && !valid" which tells us if this is PTE_PROT_NONE or
>>>> PMD_PRESENT_INVALID (I don't think you can ever have both at the same time?).
>>>
>>> I think this make sense, maybe rename the above to PTE_PRESENT_INVALID
>>> and use it for both ptes and pmds.
>>
>> Yep, sounds good. I've already got a patch to do this, but it's exposed a bug in
>> core-mm so will now fix that before I can validate my change. see
>> https://lore.kernel.org/linux-arm-kernel/ZiuyGXt0XWwRgFh9@x1n/
>>
>> With this in place, I'm proposing to remove PTE_PROT_NONE entirely and instead
>> represent PROT_NONE as a present but invalid pte (PTE_VALID=0, PTE_INVALID=1)
>> with both PTE_WRITE=0 and PTE_RDONLY=0.
>>
>> While the HW would interpret PTE_WRITE=0/PTE_RDONLY=0 as "RW without dirty bit
>> modification", this is not a problem as the pte is invalid, so the HW doesn't
>> interpret it. And SW always uses the PTE_WRITE bit to interpret the writability
>> of the pte. So PTE_WRITE=0/PTE_RDONLY=0 was previously an unused combination
>> that we now repurpose for PROT_NONE.
> 
> Why not just keep the bits currently in PAGE_NONE (PTE_RDONLY would be
> set) and check PTE_USER|PTE_UXN == 0b01 which is a unique combination
> for PAGE_NONE (bar the kernel mappings).

Yes I guess that works. I personally prefer my proposal because it is more
intuitive; you have an R bit and a W bit, and you encode RO, WR, and NONE. But
if you think reusing the kernel mapping check (PTE_USER|PTE_UXN == 0b01) is
preferable, then I'll go with that.

> 
> For ptes, it doesn't matter, we can assume that PTE_PRESENT_INVALID
> means pte_protnone(). For pmds, however, we can end up with
> pmd_protnone(pmd_mkinvalid(pmd)) == true for any of the PAGE_*
> permissions encoded into a valid pmd. That's where a dedicated
> PTE_PROT_NONE bit helped.

Yes agreed.

> 
> Let's say a CPU starts splitting a pmd and does a pmdp_invalidate*()
> first to set PTE_PRESENT_INVALID. A different CPU gets a fault and since
> the pmd is present, it goes and checks pmd_protnone() which returns
> true, ending up on do_huge_pmd_numa_page() path. Maybe some locks help
> but it looks fragile to rely on them.
> 
> So I think for protnone we need to check some other bits (like USER and
> UXN) in addition to PTE_PRESENT_INVALID.

Yes 100% agree. But using PTE_WRITE|PTE_RDONLY==0b00 is just as valid for that
purpose, I think?

> 
>> This will subtly change behaviour in an edge case though. Imagine:
>>
>> pte_t pte;
>>
>> pte = pte_modify(pte, PAGE_NONE);
>> pte = pte_mkwrite_novma(pte);
>> WARN_ON(pte_protnone(pte));
>>
>> Should that warning fire or not? Previously, because we had a dedicated bit for
>> PTE_PROT_NONE it would fire. With my proposed change it will not fire. To me
>> it's more intuitive if it doesn't fire. Regardless there is no core code that
>> ever does this. Once you have a protnone pte, its terminal - nothing ever
>> modifies it with these helpers AFAICS.
> 
> I don't think any core code should try to make page a PAGE_NONE pte
> writeable.

I looked at some other arches; some (at least alpha and hexagon) will not fire
this warning because they have R and W bits and 0b00 means NONE. Others (x86)
will fire it because they have an explicit NONE bit and don't remove it on
permission change. So I conclude its UB and fine to do either.

> 
>> Personally I think this is a nice tidy up that saves a SW bit in both present
>> and swap ptes. What do you think? (I'll just post the series if its easier to
>> provide feedback in that context).
> 
> It would be nice to tidy this up and get rid of PTE_PROT_NONE as long as
> it doesn't affect the pmd case I mentioned above.
> 
>>>> But there is a problem with this: __split_huge_pmd_locked() calls
>>>> pmdp_invalidate() for a pmd before it determines that it is pmd_present(). So
>>>> the PMD_PRESENT_INVALID can be set in a swap pte today. That feels wrong to me,
>>>> but was trying to avoid the whole thing unravelling so didn't persue.
>>>
>>> Maybe what's wrong is the arm64 implementation setting this bit on a
>>> swap/migration pmd (though we could handle this in the core code as
>>> well, it depends what the other architectures do). The only check for
>>> the PMD_PRESENT_INVALID bit is in the arm64 code and it can be absorbed
>>> into the pmd_present() check. I think it is currently broken as
>>> pmd_present() can return true for a swap pmd after pmd_mkinvalid().
>>
>> I've posted a fix here:
>> https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
>>
>> My position is that you shouldn't be calling pmd_mkinvalid() on a non-present pmd.
> 
> I agree, thanks.
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2024-04-29 13:02 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-24 11:10 [PATCH v1 0/2] arm64/mm: Enable userfaultfd write-protect Ryan Roberts
2024-04-24 11:10 ` Ryan Roberts
2024-04-24 11:10 ` [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID Ryan Roberts
2024-04-24 11:10   ` Ryan Roberts
2024-04-24 16:43   ` Catalin Marinas
2024-04-24 16:43     ` Catalin Marinas
2024-04-25  8:40     ` Ryan Roberts
2024-04-25  8:40       ` Ryan Roberts
2024-04-25  9:16   ` David Hildenbrand
2024-04-25  9:16     ` David Hildenbrand
2024-04-25 10:29     ` Ryan Roberts
2024-04-25 10:29       ` Ryan Roberts
2024-04-25 10:37       ` Ryan Roberts
2024-04-25 10:37         ` Ryan Roberts
2024-04-26 14:48         ` Catalin Marinas
2024-04-26 14:48           ` Catalin Marinas
2024-04-29 10:04           ` Ryan Roberts
2024-04-29 10:04             ` Ryan Roberts
2024-04-29 12:38             ` Catalin Marinas
2024-04-29 12:38               ` Catalin Marinas
2024-04-29 13:01               ` Ryan Roberts [this message]
2024-04-29 13:01                 ` Ryan Roberts
2024-04-29 13:23                 ` Ryan Roberts
2024-04-29 13:23                   ` Ryan Roberts
2024-04-29 14:18                   ` Catalin Marinas
2024-04-29 14:18                     ` Catalin Marinas
2024-04-29 15:04                     ` Ryan Roberts
2024-04-29 15:04                       ` Ryan Roberts
2024-04-24 11:10 ` [PATCH v1 2/2] arm64/mm: Add uffd write-protect support Ryan Roberts
2024-04-24 11:10   ` Ryan Roberts
2024-04-24 11:57   ` Peter Xu
2024-04-24 11:57     ` Peter Xu
2024-04-24 12:51     ` Ryan Roberts
2024-04-24 12:51       ` Ryan Roberts
2024-04-26 13:17     ` Ryan Roberts
2024-04-26 13:17       ` Ryan Roberts
2024-04-26 13:54       ` Peter Xu
2024-04-26 13:54         ` Peter Xu
2024-04-29  9:39         ` Ryan Roberts
2024-04-29  9:39           ` Ryan Roberts
2024-04-24 16:46   ` Catalin Marinas
2024-04-24 16:46     ` Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e946c510-9ba3-4d7b-9561-5ded86086df0@arm.com \
    --to=ryan.roberts@arm.com \
    --cc=anshuman.khandual@arm.com \
    --cc=ardb@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=david@redhat.com \
    --cc=joey.gouly@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=peterx@redhat.com \
    --cc=rppt@linux.ibm.com \
    --cc=shivanshvij@outlook.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.