All the mail mirrored from lore.kernel.org
* [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped.
@ 2024-03-28 16:06 Steve Wahl
  2024-03-28 16:10 ` kernel test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Steve Wahl @ 2024-03-28 16:06 UTC (permalink / raw)
  To: Steve Wahl, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel, Linux regressions mailing list,
	Pavin Joseph, stable, Eric Hagberg
  Cc: Simon Horman, Eric Biederman, Dave Young, Sarah Brofeldt,
	Russ Anderson, Dimitri Sivanich, Hou Wenlong, Andrew Morton,
	Baoquan He, Yuntao Wang, Bjorn Helgaas

When ident_pud_init() uses only gbpages to create identity maps, large
ranges of addresses not actually requested can be included in the
resulting table; a 4K request will map a full GB.  On UV systems, this
ends up including regions that will cause hardware to halt the system
if accessed (these are marked "reserved" by BIOS).  Even processor
speculation into these regions is enough to trigger the system halt.
MTRRs cannot be used to restrict this speculation either: there are
not enough MTRRs to cover all the reserved regions.
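A minimal userspace sketch of the over-mapping arithmetic, assuming the x86-64 layout where one PUD entry maps 1 GiB. The SKETCH_* macros and the gbpage_span()/gbpage_overmap() helpers are hypothetical illustrations, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 granularity: one PUD entry maps 1 GiB. */
#define SKETCH_PUD_SHIFT 30
#define SKETCH_PUD_SIZE  (UINT64_C(1) << SKETCH_PUD_SHIFT)
#define SKETCH_PUD_MASK  (~(SKETCH_PUD_SIZE - 1))

/*
 * Hypothetical helper: the range [*lo, *hi) covered when the PUD slot
 * containing 'addr' is filled with a single 1 GiB leaf entry.
 */
void gbpage_span(uint64_t addr, uint64_t end, uint64_t *lo, uint64_t *hi)
{
	assert(addr < end);
	*lo = addr & SKETCH_PUD_MASK;	/* round down to a 1 GiB boundary */
	*hi = *lo + SKETCH_PUD_SIZE;	/* one whole gbpage */
}

/* Bytes inside that gbpage which the request [addr, end) did not ask for. */
uint64_t gbpage_overmap(uint64_t addr, uint64_t end)
{
	uint64_t lo, hi;

	gbpage_span(addr, end, &lo, &hi);
	if (end > hi)
		end = hi;	/* only the part of the request inside this gbpage */
	return (hi - lo) - (end - addr);
}
```

For a 4K request at 0x40001000, gbpage_span() gives [0x40000000, 0x80000000), and gbpage_overmap() returns 0x3ffff000: everything in that GiB except the requested page.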

The fix for that would be to use gbpages only when map creation
requests include the full GB page of space, and to fall back to
smaller 2M pages when only portions of a GB page are included in the
request.
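The "full GB page only" rule can be modelled by walking a request one PUD-sized step at a time, using the same `next = (addr & PUD_MASK) + PUD_SIZE` clamping as the ident_pud_init() loop in the diff. This is a hedged userspace sketch assuming 1 GiB PUD granularity; classify_request() and struct chunk_counts are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PUD_SIZE (UINT64_C(1) << 30)	/* assumed: 1 GiB per PUD entry */
#define SKETCH_PUD_MASK (~(SKETCH_PUD_SIZE - 1))

struct chunk_counts {
	unsigned int gbpages;	/* pieces covering a whole, aligned GiB */
	unsigned int partial;	/* pieces that would fall back to 2M (pmd) entries */
};

/*
 * Hypothetical model of the per-PUD loop: split [addr, end) at 1 GiB
 * boundaries and classify each piece under the "full GB only" rule.
 */
struct chunk_counts classify_request(uint64_t addr, uint64_t end)
{
	struct chunk_counts c = { 0, 0 };
	uint64_t next;

	assert(addr <= end);
	for (; addr < end; addr = next) {
		next = (addr & SKETCH_PUD_MASK) + SKETCH_PUD_SIZE;
		if (next > end)
			next = end;

		/* A gbpage is allowed only if both ends sit on 1 GiB boundaries. */
		if ((addr & ~SKETCH_PUD_MASK) == 0 && (next & ~SKETCH_PUD_MASK) == 0)
			c.gbpages++;
		else
			c.partial++;
	}
	return c;
}
```

A request for [0x100000, 3 GiB) yields one partial piece (the first GiB, which starts at 1 MiB) and two gbpages.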

But on some other systems, possibly due to a buggy BIOS, that solution
leaves some areas out of the identity map that are needed for kexec to
succeed.  It is believed that these areas are not marked properly for
map_acpi_tables() in arch/x86/kernel/machine_kexec_64.c to catch and
map them.  Booting these systems with the nogbpages kernel command line
option also makes kexec fail, even without these changes.

So, create kexec identity maps using full GB pages on all platforms
but UV; on UV, use narrower 2MB pages in the identity map where a full
GB page would include areas outside the region requested.
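A sketch of how the two flags combine for one PUD slot, mirroring the use_gbpage computation in the ident_pud_init() hunk and the is_uv_system() check in init_pgtable() from the diff. struct sketch_info, sketch_use_gbpage() and sketch_kexec_info() are hypothetical stand-ins, and the `present` argument models pud_present(*pud); this is an illustration under those assumptions, not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PUD_SIZE (UINT64_C(1) << 30)	/* assumed 1 GiB PUD granularity */
#define SKETCH_PUD_MASK (~(SKETCH_PUD_SIZE - 1))

/* Hypothetical stand-in for the two x86_mapping_info flags in the patch. */
struct sketch_info {
	bool direct_gbpages;		/* 1 GiB pages may be used at all */
	bool direct_gbpages_only;	/* non-UV: skip the alignment checks */
};

/* Decide whether the PUD slot for [addr, next) gets a 1 GiB leaf entry. */
bool sketch_use_gbpage(const struct sketch_info *info, uint64_t addr,
		       uint64_t next, bool present)
{
	bool use_gbpage = info->direct_gbpages;

	assert(addr < next);
	if (!info->direct_gbpages_only) {
		/* UV: refuse if the gbpage would extend before or after the request */
		use_gbpage &= (addr & ~SKETCH_PUD_MASK) == 0;
		use_gbpage &= (next & ~SKETCH_PUD_MASK) == 0;
	}
	use_gbpage &= !present;	/* never overwrite an existing entry */
	return use_gbpage;
}

/* Policy as chosen in init_pgtable(): restrained gbpage use only on UV. */
struct sketch_info sketch_kexec_info(bool gbpages_enabled, bool is_uv)
{
	struct sketch_info info = {
		.direct_gbpages = gbpages_enabled,
		.direct_gbpages_only = !is_uv,
	};
	return info;
}
```

For the unaligned slot [0x40001000, 0x80000000), the non-UV policy still returns true while the UV policy returns false. One point worth keeping in mind with this policy: when direct_gbpages_only skips the alignment checks, `addr` may not be 1 GiB aligned, while a 1 GiB leaf entry can only hold a 1 GiB-aligned physical address. The pre-d794734c9bbf code rounded down with `addr &= PUD_MASK;`, so a caller of a helper like sketch_use_gbpage() would want to round `addr` down before building the entry.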

No attempt is made to coalesce mapping requests. If a request requires
a map entry at the 2M (pmd) level, subsequent mapping requests within
the same 1G region will also be at the pmd level, even if adjacent or
overlapping such requests could have been combined to map a full
gbpage.  Existing usage starts with larger regions and then adds
smaller regions, so this should not have any great consequence.
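To illustrate why request order matters when nothing is coalesced, here is a small simulation of a few PUD slots under the UV rule. The slot states and apply_request() are hypothetical; they model the three things a PUD entry can be here: empty, a 1 GiB leaf, or a pointer to a table of 2M (pmd) entries.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PUD_SIZE (UINT64_C(1) << 30)	/* assumed: 1 GiB per PUD slot */
#define SKETCH_PUD_MASK (~(SKETCH_PUD_SIZE - 1))
#define SKETCH_NR_SLOTS 8			/* model only the first 8 GiB */

/* Hypothetical per-slot state standing in for a real PUD entry. */
enum slot_state { SLOT_EMPTY, SLOT_PMD_TABLE, SLOT_GBPAGE };

/*
 * Apply one mapping request [addr, end) under the UV rule: a slot gets a
 * gbpage only if it is empty and the request covers the whole aligned GiB;
 * otherwise it is (or stays) a table of 2M entries.  Nothing ever upgrades
 * a PMD table to a gbpage, i.e. requests are not coalesced.
 */
void apply_request(enum slot_state slots[SKETCH_NR_SLOTS], uint64_t addr,
		   uint64_t end)
{
	uint64_t next;

	assert(end <= SKETCH_NR_SLOTS * SKETCH_PUD_SIZE);
	for (; addr < end; addr = next) {
		unsigned int i = (unsigned int)(addr / SKETCH_PUD_SIZE);
		int full;

		next = (addr & SKETCH_PUD_MASK) + SKETCH_PUD_SIZE;
		if (next > end)
			next = end;

		if (slots[i] == SLOT_GBPAGE)
			continue;	/* already covered by a leaf entry */

		full = (addr & ~SKETCH_PUD_MASK) == 0 &&
		       (next & ~SKETCH_PUD_MASK) == 0;
		if (full && slots[i] == SLOT_EMPTY)
			slots[i] = SLOT_GBPAGE;
		else
			slots[i] = SLOT_PMD_TABLE;	/* 2M entries would be filled here */
	}
}
```

Two adjacent half-GiB requests leave GiB 1 as a 2M table even though together they cover it, and a later full-GiB request for the same slot does not upgrade it. Requesting the full GiB first yields a gbpage that absorbs later small requests, which is the larger-regions-first order the changelog relies on.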

Signed-off-by: Steve Wahl <steve.wahl@hpe.com>

Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
Reported-by: Pavin Joseph <me@pavinjoseph.com>
Closes: https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph.com/
Link: https://lore.kernel.org/all/20240322162135.3984233-1-steve.wahl@hpe.com/
Tested-by: Pavin Joseph <me@pavinjoseph.com>
Tested-by: Eric Hagberg <ehagberg@gmail.com>
Tested-by: Sarah Brofeldt <srhb@dbc.dk>
---

v4: Incorporate fix for regression on systems relying on gbpages
    mapping more than the ranges actually requested for successful
    kexec, by limiting the effects of the change to UV systems.
    This patch is based on tip/x86/urgent.

v3: per Dave Hansen review, re-arrange changelog info,
    refactor code to use bool variable and split out conditions.

v2: per Dave Hansen review: Additional changelog info,
    moved pud_large() check earlier in the code, and
    improved the comment describing the conditions
    that restrict gbpage usage.
   

 arch/x86/include/asm/init.h        |  1 +
 arch/x86/kernel/machine_kexec_64.c | 10 ++++++++++
 arch/x86/mm/ident_map.c            | 24 +++++++++++++++++++-----
 3 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index cc9ccf61b6bd..371d9faea8bc 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -10,6 +10,7 @@ struct x86_mapping_info {
 	unsigned long page_flag;	 /* page flag for PMD or PUD entry */
 	unsigned long offset;		 /* ident mapping offset */
 	bool direct_gbpages;		 /* PUD level 1GB page support */
+	bool direct_gbpages_only;	 /* use 1GB pages exclusively */
 	unsigned long kernpg_flag;	 /* kernel pagetable flag override */
 };
 
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index b180d8e497c3..3a2f5d291a88 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -28,6 +28,7 @@
 #include <asm/setup.h>
 #include <asm/set_memory.h>
 #include <asm/cpu.h>
+#include <asm/uv/uv.h>
 
 #ifdef CONFIG_ACPI
 /*
@@ -212,6 +213,15 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 
 	if (direct_gbpages)
 		info.direct_gbpages = true;
+	/*
+	 * UV systems need restrained use of gbpages in the identity
+	 * maps to avoid system halts.  But some other systems rely on
+	 * using gbpages to expand mappings outside the regions
+	 * actually listed, to include areas required for kexec but
+	 * not explicitly named by the bios.
+	 */
+	if (!is_uv_system())
+		info.direct_gbpages_only = true;
 
 	for (i = 0; i < nr_pfn_mapped; i++) {
 		mstart = pfn_mapped[i].start << PAGE_SHIFT;
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 968d7005f4a7..a538a54aba5d 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -26,18 +26,32 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 	for (; addr < end; addr = next) {
 		pud_t *pud = pud_page + pud_index(addr);
 		pmd_t *pmd;
+		bool use_gbpage;
 
 		next = (addr & PUD_MASK) + PUD_SIZE;
 		if (next > end)
 			next = end;
 
-		if (info->direct_gbpages) {
-			pud_t pudval;
+		/* if this is already a gbpage, this portion is already mapped */
+		if (pud_leaf(*pud))
+			continue;
+
+		/* Is using a gbpage allowed? */
+		use_gbpage = info->direct_gbpages;
 
-			if (pud_present(*pud))
-				continue;
+		if (!info->direct_gbpages_only) {
+			/* Don't use gbpage if it maps more than the requested region. */
+			/* at the beginning: */
+			use_gbpage &= ((addr & ~PUD_MASK) == 0);
+			/* ... or at the end: */
+			use_gbpage &= ((next & ~PUD_MASK) == 0);
+		}
+		/* Never overwrite existing mappings */
+		use_gbpage &= !pud_present(*pud);
+
+		if (use_gbpage) {
+			pud_t pudval;
 
-			addr &= PUD_MASK;
 			pudval = __pud((addr - info->offset) | info->page_flag);
 			set_pud(pud, pudval);
 			continue;

base-commit: b6540de9b5c867b4c8bc31225db181cc017d8cc7
-- 
2.26.2



* Re: [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped.
  2024-03-28 16:06 [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped Steve Wahl
@ 2024-03-28 16:10 ` kernel test robot
  2024-03-28 16:17 ` Steve Wahl
  2024-03-29  7:15 ` Ingo Molnar
  2 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2024-03-28 16:10 UTC (permalink / raw)
  To: Steve Wahl; +Cc: stable, oe-kbuild-all

Hi,

Thanks for your patch.

FYI: kernel test robot notices the stable kernel rule is not satisfied.

The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#option-1

Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped.
Link: https://lore.kernel.org/stable/20240328160614.1838496-1-steve.wahl%40hpe.com

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki





* Re: [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped.
  2024-03-28 16:06 [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped Steve Wahl
  2024-03-28 16:10 ` kernel test robot
@ 2024-03-28 16:17 ` Steve Wahl
  2024-03-29  7:15 ` Ingo Molnar
  2 siblings, 0 replies; 10+ messages in thread
From: Steve Wahl @ 2024-03-28 16:17 UTC (permalink / raw)
  To: Steve Wahl, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel, Linux regressions mailing list,
	Pavin Joseph, stable, Eric Hagberg
  Cc: Simon Horman, Eric Biederman, Dave Young, Sarah Brofeldt,
	Russ Anderson, Dimitri Sivanich, Hou Wenlong, Andrew Morton,
	Baoquan He, Yuntao Wang, Bjorn Helgaas

Note: I Cc'd stable in the email headers by mistake.  There is no
"Cc: stable" tag; I don't want this to go into stable.

Thanks,

--> Steve


-- 
Steve Wahl, Hewlett Packard Enterprise


* Re: [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped.
  2024-03-28 16:06 [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped Steve Wahl
  2024-03-28 16:10 ` kernel test robot
  2024-03-28 16:17 ` Steve Wahl
@ 2024-03-29  7:15 ` Ingo Molnar
  2024-03-29  8:01   ` Pavin Joseph
  2 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2024-03-29  7:15 UTC (permalink / raw)
  To: Steve Wahl
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, linux-kernel,
	Linux regressions mailing list, Pavin Joseph, stable,
	Eric Hagberg, Simon Horman, Eric Biederman, Dave Young,
	Sarah Brofeldt, Russ Anderson, Dimitri Sivanich, Hou Wenlong,
	Andrew Morton, Baoquan He, Yuntao Wang, Bjorn Helgaas


* Steve Wahl <steve.wahl@hpe.com> wrote:

> When ident_pud_init() uses only gbpages to create identity maps, large
> ranges of addresses not actually requested can be included in the
> resulting table; a 4K request will map a full GB.  On UV systems, this
> ends up including regions that will cause hardware to halt the system
> if accessed (these are marked "reserved" by BIOS).  Even processor
> speculation into these regions is enough to trigger the system halt.
> And MTRRs cannot be used to restrict this speculation, there are not
> enough MTRRs to cover all the reserved regions.

Nor should MTRRs be (ab-)used for this really.

> The fix for that would be to only use gbpages when map creation 
> requests include the full GB page of space, and falling back to using 
> smaller 2M pages when only portions of a GB page are included in the 
> request.
> 
> But on some other systems, possibly due to buggy bios, that solution 
> leaves some areas out of the identity map that are needed for kexec 
> to succeed.  It is believed that these areas are not marked properly 
> for map_acpi_tables() in arch/x86/kernel/machine_kexec_64.c to catch 
> and map them.  The nogbpages kernel command line option also causes 
> these systems to fail even without these changes.

Does the 'nogbpages' kernel command line option fail on these systems
even outside of kexec (i.e. regular boot), or only in combination with
kexec?

Thanks,

	Ingo


* Re: [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped.
  2024-03-29  7:15 ` Ingo Molnar
@ 2024-03-29  8:01   ` Pavin Joseph
  2024-03-29  8:15     ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Pavin Joseph @ 2024-03-29  8:01 UTC (permalink / raw)
  To: Ingo Molnar, Steve Wahl
  Cc: Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, linux-kernel,
	Linux regressions mailing list, stable, Eric Hagberg,
	Simon Horman, Eric Biederman, Dave Young, Sarah Brofeldt,
	Russ Anderson, Dimitri Sivanich, Hou Wenlong, Andrew Morton,
	Baoquan He, Yuntao Wang, Bjorn Helgaas

On 3/29/24 12:45, Ingo Molnar wrote:
> Does the 'nogbpages' kernel command line option fail on these systems
> even outside of kexec (ie. regular boot), or only in combination with
> kexec?

Original reporter here, using nogbpages allows for normal bootup, but 
kexec fails with it on my two similar systems.

Pavin.


* Re: [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped.
  2024-03-29  8:01   ` Pavin Joseph
@ 2024-03-29  8:15     ` Ingo Molnar
  2024-03-29  8:56       ` Pavin Joseph
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2024-03-29  8:15 UTC (permalink / raw)
  To: Pavin Joseph
  Cc: Steve Wahl, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel, Linux regressions mailing list,
	stable, Eric Hagberg, Simon Horman, Eric Biederman, Dave Young,
	Sarah Brofeldt, Russ Anderson, Dimitri Sivanich, Hou Wenlong,
	Andrew Morton, Baoquan He, Yuntao Wang, Bjorn Helgaas


* Pavin Joseph <me@pavinjoseph.com> wrote:

> On 3/29/24 12:45, Ingo Molnar wrote:
> > Does the 'nogbpages' kernel command line option fail on these systems
> > even outside of kexec (ie. regular boot), or only in combination with
> > kexec?
> 
> Original reporter here, using nogbpages allows for normal bootup, but 
> kexec fails with it on my two similar systems.

Just to clarify, we have the following 3 upstream (and soon to be upstream) versions:

 v1: pre-d794734c9bbf kernels
 v2: d794734c9bbf x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
 v3: c567f2948f57 Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."

Where v1 and v3 ought to be the same in behavior.

So what does the failure matrix look like on your systems? Is my
understanding accurate:


     | regular boot | regular kexec | nogbpages boot | nogbpages kexec boot
 ----|--------------|---------------|----------------|---------------------
 v1: | OK           | OK            | OK             | FAIL
 v2: | OK           | FAIL          | FAIL           | FAIL

?

Thanks,

	Ingo


* Re: [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped.
  2024-03-29  8:15     ` Ingo Molnar
@ 2024-03-29  8:56       ` Pavin Joseph
  2024-03-29 13:30         ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Pavin Joseph @ 2024-03-29  8:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steve Wahl, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel, Linux regressions mailing list,
	stable, Eric Hagberg, Simon Horman, Eric Biederman, Dave Young,
	Sarah Brofeldt, Russ Anderson, Dimitri Sivanich, Hou Wenlong,
	Andrew Morton, Baoquan He, Yuntao Wang, Bjorn Helgaas

On 3/29/24 13:45, Ingo Molnar wrote:
> Just to clarify, we have the following 3 upstream (and soon to be upstream) versions:
> 
>   v1: pre-d794734c9bbf kernels
>   v2: d794734c9bbf x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
>   v3: c567f2948f57 Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
> 
> Where v1 and v3 ought to be the same in behavior.
> 
> So how does the failure matrix look like on your systems? Is my
> understanding accurate:
> 
> 
>             regular boot  | regular kexec | nogbpages boot | nogbpages kexec boot
>   ------------------------|---------------------------------------------------
>   v1:       OK            | OK            | OK             | FAIL
>   v2:       OK            | FAIL          | FAIL           | FAIL

Slight correction:

    | regular boot | regular kexec | nogbpages boot | nogbpages kexec boot
----|--------------|---------------|----------------|---------------------
v1: | OK           | OK            | OK             | FAIL
v2: | OK           | FAIL          | OK             | FAIL

Pavin.


* Re: [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped.
  2024-03-29  8:56       ` Pavin Joseph
@ 2024-03-29 13:30         ` Ingo Molnar
  2024-03-31  3:55           ` Eric W. Biederman
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2024-03-29 13:30 UTC (permalink / raw)
  To: Pavin Joseph
  Cc: Steve Wahl, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel, Linux regressions mailing list,
	stable, Eric Hagberg, Simon Horman, Eric Biederman, Dave Young,
	Sarah Brofeldt, Russ Anderson, Dimitri Sivanich, Hou Wenlong,
	Andrew Morton, Baoquan He, Yuntao Wang, Bjorn Helgaas


* Pavin Joseph <me@pavinjoseph.com> wrote:

> On 3/29/24 13:45, Ingo Molnar wrote:
> > Just to clarify, we have the following 3 upstream (and soon to be upstream) versions:
> > 
> >   v1: pre-d794734c9bbf kernels
> >   v2: d794734c9bbf x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
> >   v3: c567f2948f57 Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
> > 
> > Where v1 and v3 ought to be the same in behavior.
> > 
> > So how does the failure matrix look like on your systems? Is my
> > understanding accurate:

> Slight correction:
> 
>    regular boot  | regular kexec | nogbpages boot | nogbpages kexec boot
> -----------------|---------------|----------------|------------------
> v1:       OK     | OK            | OK             | FAIL
> v2:       OK     | FAIL          | OK             | FAIL

Thanks!

So the question is now: does anyone have a theory about in what fashion 
the kexec nogbpages bootup differs from the regular nogbpages bootup to 
break on your system?

I'd have expected the described root cause of the firmware not properly 
enumerating all memory areas that need to be mapped to cause trouble on 
regular, non-kexec nogbpages bootups too. What makes the kexec bootup 
special to trigger this crash?

	Ingo


* Re: [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped.
  2024-03-29 13:30         ` Ingo Molnar
@ 2024-03-31  3:55           ` Eric W. Biederman
  2024-04-01 13:30             ` Pavin Joseph
  0 siblings, 1 reply; 10+ messages in thread
From: Eric W. Biederman @ 2024-03-31  3:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Pavin Joseph, Steve Wahl, Dave Hansen, Andy Lutomirski,
	Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, H. Peter Anvin, linux-kernel, Linux regressions mailing list,
	stable, Eric Hagberg, Simon Horman, Dave Young, Sarah Brofeldt,
	Russ Anderson, Dimitri Sivanich, Hou Wenlong, Andrew Morton,
	Baoquan He, Yuntao Wang, Bjorn Helgaas

Ingo Molnar <mingo@kernel.org> writes:

> * Pavin Joseph <me@pavinjoseph.com> wrote:
>
>> On 3/29/24 13:45, Ingo Molnar wrote:
>> > Just to clarify, we have the following 3 upstream (and soon to be upstream) versions:
>> > 
>> >   v1: pre-d794734c9bbf kernels
>> >   v2: d794734c9bbf x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
>> >   v3: c567f2948f57 Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
>> > 
>> > Where v1 and v3 ought to be the same in behavior.
>> > 
>> > So how does the failure matrix look like on your systems? Is my
>> > understanding accurate:
>
>> Slight correction:
>> 
>>    regular boot  | regular kexec | nogbpages boot | nogbpages kexec boot
>> -----------------|---------------|----------------|------------------
>> v1:       OK     | OK            | OK             | FAIL
>> v2:       OK     | FAIL          | OK             | FAIL
>
> Thanks!
>
> So the question is now: does anyone have a theory about in what fashion 
> the kexec nogbpages bootup differs from the regular nogbpages bootup to 
> break on your system?
>
> I'd have expected the described root cause of the firmware not properly 
> enumerating all memory areas that need to be mapped to cause trouble on 
> regular, non-kexec nogbpages bootups too. What makes the kexec bootup 
> special to trigger this crash?

My blind hunch would be something in the first 1MiB being different.
The first 1MiB is where all of the historical stuff is and where
I have seen historical memory maps be less than perfectly accurate.

If changing what is mapped makes the difference between success and
failure, it sounds like a page fault is being triggered somewhere dark
and hard to debug, and that in turn is becoming a triple fault.

Pavin Joseph, is there any chance you can provide your memory map?
Perhaps just cat /proc/iomem?

If I have something to go on other than works/doesn't work, I can
probably say something intelligent.

Eric


* Re: [PATCH v4] x86/mm/ident_map: On UV systems, use gbpages only where full GB page should be mapped.
  2024-03-31  3:55           ` Eric W. Biederman
@ 2024-04-01 13:30             ` Pavin Joseph
  0 siblings, 0 replies; 10+ messages in thread
From: Pavin Joseph @ 2024-04-01 13:30 UTC (permalink / raw)
  To: Eric W. Biederman, Ingo Molnar
  Cc: Steve Wahl, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin, linux-kernel, Linux regressions mailing list,
	stable, Eric Hagberg, Simon Horman, Dave Young, Sarah Brofeldt,
	Russ Anderson, Dimitri Sivanich, Hou Wenlong, Andrew Morton,
	Baoquan He, Yuntao Wang, Bjorn Helgaas

Hi Eric,

Here's the output of /proc/iomem:

suse-laptop:~ # cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009221f : System RAM
00092220-0009229f : System RAM
000922a0-0009828f : System RAM
00098290-0009829f : System RAM
000982a0-0009efff : System RAM
0009f000-0009ffff : Reserved
000e0000-000fffff : Reserved
   000a0000-000effff : PCI Bus 0000:00
   000f0000-000fffff : System ROM
00100000-09bfffff : System RAM
   06200000-071fffff : Kernel code
   07200000-07e6dfff : Kernel rodata
   08000000-082e3eff : Kernel data
   08ba8000-08ffffff : Kernel bss
09c00000-09d90fff : Reserved
09d91000-09efffff : System RAM
09f00000-09f0efff : ACPI Non-volatile Storage
09f0f000-bf5a2017 : System RAM
   ba000000-be7fffff : Crash kernel
bf5a2018-bf5af857 : System RAM
bf5af858-c3a60fff : System RAM
c3a61000-c3b54fff : Reserved
c3b55000-c443dfff : System RAM
c443e000-c443efff : Reserved
c443f000-c51adfff : System RAM
c51ae000-c51aefff : Reserved
c51af000-c747dfff : System RAM
c747e000-cb67dfff : Reserved
   cb669000-cb66cfff : MSFT0101:00
     cb669000-cb66cfff : MSFT0101:00
   cb66d000-cb670fff : MSFT0101:00
     cb66d000-cb670fff : MSFT0101:00
cb67e000-cd77dfff : ACPI Non-volatile Storage
cd77e000-cd7fdfff : ACPI Tables
cd7fe000-ce7fffff : System RAM
ce800000-cfffffff : Reserved
d0000000-f7ffffff : PCI Bus 0000:00
f8000000-fbffffff : PCI ECAM 0000 [bus 00-3f]
   f8000000-fbffffff : Reserved
     f8000000-fbffffff : pnp 00:00
fc000000-fdffffff : PCI Bus 0000:00
   fd000000-fd0fffff : PCI Bus 0000:05
     fd000000-fd0007ff : 0000:05:00.1
       fd000000-fd0007ff : ahci
     fd001000-fd0017ff : 0000:05:00.0
       fd001000-fd0017ff : ahci
   fd100000-fd4fffff : PCI Bus 0000:04
     fd100000-fd1fffff : 0000:04:00.3
       fd100000-fd1fffff : xhci-hcd
     fd200000-fd2fffff : 0000:04:00.4
       fd200000-fd2fffff : xhci-hcd
     fd300000-fd3fffff : 0000:04:00.2
       fd300000-fd3fffff : ccp
     fd400000-fd47ffff : 0000:04:00.0
     fd480000-fd4bffff : 0000:04:00.5
     fd4c0000-fd4c7fff : 0000:04:00.6
       fd4c0000-fd4c7fff : ICH HD audio
     fd4c8000-fd4cbfff : 0000:04:00.1
       fd4c8000-fd4cbfff : ICH HD audio
     fd4cc000-fd4cdfff : 0000:04:00.2
       fd4cc000-fd4cdfff : ccp
   fd500000-fd5fffff : PCI Bus 0000:03
     fd500000-fd503fff : 0000:03:00.0
       fd500000-fd503fff : nvme
   fd600000-fd6fffff : PCI Bus 0000:02
     fd600000-fd60ffff : 0000:02:00.0
       fd600000-fd60ffff : rtw88_pci
   fd700000-fd7fffff : PCI Bus 0000:01
     fd700000-fd703fff : 0000:01:00.0
     fd704000-fd704fff : 0000:01:00.0
       fd704000-fd704fff : r8169
   fde10510-fde1053f : MSFT0101:00
   fdf00000-fdf7ffff : amd_iommu
feb00000-feb00007 : SB800 TCO
fec00000-fec003ff : IOAPIC 0
fec01000-fec013ff : IOAPIC 1
fec10000-fec1001f : pnp 00:04
fed00000-fed003ff : HPET 2
   fed00000-fed003ff : PNP0103:00
     fed00000-fed003ff : pnp 00:04
fed61000-fed613ff : pnp 00:04
fed80000-fed80fff : Reserved
   fed80000-fed80fff : pnp 00:04
fed81200-fed812ff : AMDI0030:00
fed81500-fed818ff : AMDI0030:00
   fed81500-fed818ff : AMDI0030:00 AMDI0030:00
fedc2000-fedc2fff : AMDI0010:00
   fedc2000-fedc2fff : AMDI0010:00 AMDI0010:00
fedc3000-fedc3fff : AMDI0010:01
   fedc3000-fedc3fff : AMDI0010:01 AMDI0010:01
fedc4000-fedc4fff : AMDI0010:02
   fedc4000-fedc4fff : AMDI0010:02 AMDI0010:02
fee00000-fee00fff : pnp 00:00
ff000000-ffffffff : pnp 00:04
100000000-3af37ffff : System RAM
   399000000-3ae4fffff : Crash kernel
3af380000-42fffffff : Reserved
430000000-ffffffffff : PCI Bus 0000:00
   460000000-4701fffff : PCI Bus 0000:04
     460000000-46fffffff : 0000:04:00.0
     470000000-4701fffff : 0000:04:00.0
3fff80000000-3fffffffffff : 0000:04:00.0
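For checking the first-1MiB hunch against a dump like the one above, a small filter can pull out the top-level entries below 1 MiB that are not plain System RAM. This sketch assumes the "start-end : name" layout shown above, with nested entries indented; parse_iomem_line() and report_first_mib() are hypothetical helpers.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define FIRST_MIB_END UINT64_C(0x100000)	/* addresses below 1 MiB */

/*
 * Parse a top-level /proc/iomem line ("start-end : name"); indented
 * (nested) lines are skipped.  Returns 1 on success, 0 otherwise.
 */
int parse_iomem_line(const char *line, uint64_t *start, uint64_t *end,
		     char name[64])
{
	if (line[0] == ' ' || line[0] == '\t')
		return 0;
	return sscanf(line, "%" SCNx64 "-%" SCNx64 " : %63[^\n]",
		      start, end, name) == 3;
}

/*
 * Print every top-level resource that starts below 1 MiB and is not
 * "System RAM"; returns how many such entries were found.
 */
int report_first_mib(FILE *in, FILE *out)
{
	char line[256], name[64];
	uint64_t start, end;
	int found = 0;

	assert(in != NULL && out != NULL);
	while (fgets(line, sizeof(line), in)) {
		if (!parse_iomem_line(line, &start, &end, name))
			continue;
		if (start >= FIRST_MIB_END || strcmp(name, "System RAM") == 0)
			continue;
		fprintf(out, "%08" PRIx64 "-%08" PRIx64 " : %s\n", start, end, name);
		found++;
	}
	return found;
}
```

Fed the listing above, it would report the 00000000-00000fff, 0009f000-0009ffff and 000e0000-000fffff Reserved entries. /proc/iomem generally has to be read as root; unprivileged readers see zeroed addresses.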


Thanks for creating kexec btw, it's invaluable for systems with slow 
firmware and loader 🚀

Pavin.

