From: Ryan Roberts <ryan.roberts@arm.com>
To: David Hildenbrand <david@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>, Joey Gouly <joey.gouly@arm.com>,
	Ard Biesheuvel <ardb@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Peter Xu <peterx@redhat.com>, Mike Rapoport <rppt@linux.ibm.com>,
	Shivansh Vij <shivanshvij@outlook.com>
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID
Date: Thu, 25 Apr 2024 11:37:42 +0100	[thread overview]
Message-ID: <eed172b5-c71a-469f-a790-76126760ca7c@arm.com> (raw)
In-Reply-To: <df0475e1-9078-4629-b23d-0919ab1e37c2@arm.com>

On 25/04/2024 11:29, Ryan Roberts wrote:
> On 25/04/2024 10:16, David Hildenbrand wrote:
>> On 24.04.24 13:10, Ryan Roberts wrote:
>>> Previously PTE_PROT_NONE was occupying bit 58, one of the bits reserved
>>> for SW use when the PTE is valid. This is a waste of those precious SW
>>> bits since PTE_PROT_NONE can only ever be set when valid is clear.
>>> Instead let's overlay it on what would be a HW bit if valid was set.
>>>
>>> We need to be careful about which HW bit to choose since some of them
>>> must be preserved; when pte_present() is true (as it is for a
>>> PTE_PROT_NONE pte), it is legitimate for the core to call various
>>> accessors, e.g. pte_dirty(), pte_write() etc. There are also some
>>> accessors that are private to the arch which must continue to be
>>> honoured, e.g. pte_user(), pte_user_exec() etc.
>>>
>>> So we choose to overlay PTE_UXN; this effectively means that whenever a
>>> pte has PTE_PROT_NONE set, it will always report pte_user_exec() ==
>>> false, which is obviously always correct.
>>>
>>> As a result of this change, we must shuffle the layout of the
>>> arch-specific swap pte so that PTE_PROT_NONE is always zero and not
>>> overlapping with any other field. As a result of this, there is no way
>>> to keep the `type` field contiguous without conflicting with
>>> PMD_PRESENT_INVALID (bit 59), which must also be 0 for a swap pte. So
>>> let's move PMD_PRESENT_INVALID to bit 60.
>>
>> A note that some archs split/re-combine type and/or offset, to make use of every
>> bit possible :) But that's mostly relevant for 32bit.
>>
>> (and as long as PFNs can still fit into the swp offset for migration entries etc.)
> 
> Yeah, I considered splitting the type or offset field to avoid moving
> PMD_PRESENT_INVALID, but thought it was better to avoid the extra mask and shift.

Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap
ptes; it would be cleaner to have one bit that defines "present" when valid is
clear (similar to PTE_PROT_NONE today), and another bit, only defined when
"present && !valid", which tells us whether this is PTE_PROT_NONE or
PMD_PRESENT_INVALID (I don't think you can ever have both at the same time?).

But there is a problem with this: __split_huge_pmd_locked() calls
pmdp_invalidate() for a pmd before it determines that it is pmd_present(). So
PMD_PRESENT_INVALID can be set in a swap pte today. That feels wrong to me,
but I was trying to avoid the whole thing unravelling so didn't pursue it.
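
For illustration, that alternative encoding could be sketched roughly as
follows. This is only a sketch with hypothetical macro names and bit
positions chosen for the example, not the actual arm64 layout:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scheme: one bit marks an entry as "present even though
 * valid is clear"; a second bit, meaningful only in that state,
 * discriminates PROT_NONE from an invalidated huge pmd. Bit positions
 * here are illustrative only.
 */
#define XPTE_VALID		(UINT64_C(1) << 0)
#define XPTE_PRESENT_INVALID	(UINT64_C(1) << 59)	/* present && !valid */
#define XPTE_INVALID_HUGE	(UINT64_C(1) << 58)	/* 1: invalidated pmd, 0: PROT_NONE */

#define XPTE_STATE_MASK	(XPTE_VALID | XPTE_PRESENT_INVALID | XPTE_INVALID_HUGE)

/* Present if either valid, or explicitly marked present-but-invalid. */
static int xpte_present(uint64_t pte)
{
	return !!(pte & (XPTE_VALID | XPTE_PRESENT_INVALID));
}

/* PROT_NONE: present-but-invalid with the discriminator clear. */
static int xpte_protnone(uint64_t pte)
{
	return (pte & XPTE_STATE_MASK) == XPTE_PRESENT_INVALID;
}

/* Invalidated huge pmd: present-but-invalid with the discriminator set. */
static int xpmd_present_invalid(uint64_t pmd)
{
	return (pmd & XPTE_STATE_MASK) ==
	       (XPTE_PRESENT_INVALID | XPTE_INVALID_HUGE);
}
```

With this shape only XPTE_PRESENT_INVALID would need to be zero in a swap
pte; the discriminator bit would be a don't-care there, which is exactly
what the __split_huge_pmd_locked() ordering above would break.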

> 
>>
>>>
>>> In the end, this frees up bit 58 for future use as a proper SW bit (e.g.
>>> soft-dirty or uffd-wp).
>>
>> I was briefly confused about how you would use these bits as SW bits for swap
>> PTEs (which you can't as they overlay the type). See below regarding bit 3.
>>
>> I would have said here "proper SW bit for present PTEs".
> 
> Yes; I'll clarify in the next version.
> 
>>
>>>
>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>> ---
>>>   arch/arm64/include/asm/pgtable-prot.h |  4 ++--
>>>   arch/arm64/include/asm/pgtable.h      | 16 +++++++++-------
>>>   2 files changed, 11 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
>>> index dd9ee67d1d87..ef952d69fd04 100644
>>> --- a/arch/arm64/include/asm/pgtable-prot.h
>>> +++ b/arch/arm64/include/asm/pgtable-prot.h
>>> @@ -18,14 +18,14 @@
>>>   #define PTE_DIRTY        (_AT(pteval_t, 1) << 55)
>>>   #define PTE_SPECIAL        (_AT(pteval_t, 1) << 56)
>>>   #define PTE_DEVMAP        (_AT(pteval_t, 1) << 57)
>>> -#define PTE_PROT_NONE        (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
>>> +#define PTE_PROT_NONE        (PTE_UXN)         /* Reuse PTE_UXN; only when !PTE_VALID */
>>>     /*
>>>    * This bit indicates that the entry is present i.e. pmd_page()
>>>    * still points to a valid huge page in memory even if the pmd
>>>    * has been invalidated.
>>>    */
>>> -#define PMD_PRESENT_INVALID    (_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
>>> +#define PMD_PRESENT_INVALID    (_AT(pteval_t, 1) << 60) /* only when !PMD_SECT_VALID */
>>>     #define _PROT_DEFAULT        (PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
>>>   #define _PROT_SECT_DEFAULT    (PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index afdd56d26ad7..23aabff4fa6f 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -1248,20 +1248,22 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>>>    * Encode and decode a swap entry:
>>>    *    bits 0-1:    present (must be zero)
>>>    *    bits 2:        remember PG_anon_exclusive
>>> - *    bits 3-7:    swap type
>>> - *    bits 8-57:    swap offset
>>> - *    bit  58:    PTE_PROT_NONE (must be zero)
>>
>> Reading this patch alone: what happened to bit 3? Please mention that it
>> will be used as a swap pte metadata bit (uffd-wp).
> 
> Will do. It's all a bit arbitrary though. I could have put offset in 3-52, and
> then 53 would have been spare for uffd-wp. I'm not sure there is any advantage
> to either option.
> 
>>
>>> + *    bits 4-53:    swap offset
>>
>> So we'll still have 50 bits for the offset, good. We could even use bits
>> 61-63 if ever required to store bigger PFNs.
> 
> yep, or more sw bits.
> 
>>
>> LGTM
>>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
> 
> Thanks!
> 
> 


Thread overview: 42+ messages

2024-04-24 11:10 [PATCH v1 0/2] arm64/mm: Enable userfaultfd write-protect Ryan Roberts
2024-04-24 11:10 ` [PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID Ryan Roberts
2024-04-24 16:43   ` Catalin Marinas
2024-04-25  8:40     ` Ryan Roberts
2024-04-25  9:16   ` David Hildenbrand
2024-04-25 10:29     ` Ryan Roberts
2024-04-25 10:37       ` Ryan Roberts [this message]
2024-04-26 14:48         ` Catalin Marinas
2024-04-29 10:04           ` Ryan Roberts
2024-04-29 12:38             ` Catalin Marinas
2024-04-29 13:01               ` Ryan Roberts
2024-04-29 13:23                 ` Ryan Roberts
2024-04-29 14:18                   ` Catalin Marinas
2024-04-29 15:04                     ` Ryan Roberts
2024-04-24 11:10 ` [PATCH v1 2/2] arm64/mm: Add uffd write-protect support Ryan Roberts
2024-04-24 11:57   ` Peter Xu
2024-04-24 12:51     ` Ryan Roberts
2024-04-26 13:17     ` Ryan Roberts
2024-04-26 13:54       ` Peter Xu
2024-04-29  9:39         ` Ryan Roberts
2024-04-24 16:46   ` Catalin Marinas