From: Christophe Leroy <christophe.leroy@csgroup.eu>
To: Erhard Furtner <erhard_f@mailbox.org>,
	Nicholas Piggin <npiggin@gmail.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	Rohan McLure <rmclure@linux.ibm.com>
Subject: Re: BUG: Bad page map in process init pte:c0ab684c pmd:01182000 (on a PowerMac G4 DP)
Date: Thu, 29 Feb 2024 17:11:28 +0000
Message-ID: <707f617f-45c8-4fa1-83aa-779f2b542871@csgroup.eu>
In-Reply-To: <20240229020941.2b30abe0@yea>



On 29/02/2024 at 02:09, Erhard Furtner wrote:
> On Mon, 12 Dec 2022 14:31:35 +1000
> "Nicholas Piggin" <npiggin@gmail.com> wrote:
> 
>> On Thu Dec 1, 2022 at 7:44 AM AEST, Erhard F. wrote:
>>> Getting this at boot sometimes, but not always (PowerMac G4 DP, kernel 6.0.9):
>>>
>>> [...]
>>> Freeing unused kernel image (initmem) memory: 1328K
>>> Checked W+X mappings: passed, no W+X pages found
>>> rodata_test: all tests were successful
>>> Run /sbin/init as init process
>>> _swap_info_get: Bad swap file entry 24c0ab68
>>> BUG: Bad page map in process init  pte:c0ab684c pmd:01182000
>>
>> Have you run memtest on the system? Are the messages related to a
>> kernel upgrade? This and your KASAN bugs look possibly like random
>> corruption.
>>
>> Although with that KASAN one it's strange that kernfs_node_cache
>> was involved both times, it's strange that page tables are pointing
>> to that same slab memory. It could be a page table page use-after
>> -free maybe? Maybe with the page table fragment code. I'm sure other
>> people would have hit that before though, so I don't know what to
>> suggest.
>>
>> Thanks,
>> Nick
> 
> Revisited the issue on kernel v6.8-rc6 and I can still reproduce it.
> 
> Short summary as my last post was over a year ago:
>   (x) I get this memory corruption only when both CONFIG_VMAP_STACK=y and CONFIG_SMP=y are enabled.
>   (x) I don't get this memory corruption when only one of the above is enabled. ^^
>   (x) memtester says the 2 GiB RAM in my G4 DP are fine.
>   (x) I don't get this issue on my G5 11,2 or Talos II.
>   (x) "stress -m 2 --vm-bytes 965M" provokes the issue in < 10 secs. (https://salsa.debian.org/debian/stress)
> 
> For the test I used CONFIG_KASAN_INLINE=y for v6.8-rc6 and debug_pagealloc=on, page_owner=on and got this dmesg:
> 
> [...]
> pagealloc: memory corruption
> f5fcfff0: 00 00 00 00                                      ....
> CPU: 1 PID: 1788 Comm: stress Tainted: G    B              6.8.0-rc6-PMacG4 #15
> Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
> Call Trace:
> [f3bfbac0] [c162a8e8] dump_stack_lvl+0x60/0x94 (unreliable)
> [f3bfbae0] [c04edf9c] __kernel_unpoison_pages+0x1e0/0x1f0
> [f3bfbb30] [c04a8aa0] post_alloc_hook+0xe0/0x174
> [f3bfbb60] [c04a8b58] prep_new_page+0x24/0xbc
> [f3bfbb80] [c04abcc4] get_page_from_freelist+0xcd0/0xf10
> [f3bfbc50] [c04aecd8] __alloc_pages+0x204/0xe2c
> [f3bfbda0] [c04b07a8] __folio_alloc+0x18/0x88
> [f3bfbdc0] [c0461a10] vma_alloc_zeroed_movable_folio.isra.0+0x2c/0x6c
> [f3bfbde0] [c046bb90] handle_mm_fault+0x91c/0x19ac
> [f3bfbec0] [c0047b8c] ___do_page_fault+0x93c/0xc14
> [f3bfbf10] [c0048278] do_page_fault+0x28/0x60
> [f3bfbf30] [c000433c] DataAccess_virt+0x124/0x17c
> --- interrupt: 300 at 0xbe30d8
> NIP:  00be30d8 LR: 00be30b4 CTR: 00000000
> REGS: f3bfbf40 TRAP: 0300   Tainted: G    B               (6.8.0-rc6-PMacG4)
> MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 20882464  XER: 00000000
> DAR: 88c7a010 DSISR: 42000000
> GPR00: 00be30b4 af8397d0 a78436c0 6b2ee010 3c500000 20224462 fe77f7e1 00b00264
> GPR08: 1d98d000 1d98c000 00000000 40ae256a 20882262 00bffff4 00000000 00000000
> GPR16: 00000000 00000002 00000000 0000005a 40802262 80002262 40002262 00c000a4
> GPR24: ffffffff ffffffff 3c500000 00000000 00000000 6b2ee010 00c07d64 00001000
> NIP [00be30d8] 0xbe30d8
> LR [00be30b4] 0xbe30b4
> --- interrupt: 300
> page:ef4bd92c refcount:1 mapcount:0 mapping:00000000 index:0x1 pfn:0x310b3
> flags: 0x80000000(zone=2)
> page_type: 0xffffffff()
> raw: 80000000 00000100 00000122 00000000 00000001 00000000 ffffffff 00000001
> raw: 00000000
> page dumped because: pagealloc: corrupted page details
> page_owner info is not present (never set?)
> swapper/1: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
> CPU: 1 PID: 0 Comm: swapper/1 Tainted: G    B              6.8.0-rc6-PMacG4 #15
> Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
> Call Trace:
> [f101b9d0] [c162a8e8] dump_stack_lvl+0x60/0x94 (unreliable)
> [f101b9f0] [c04ae948] warn_alloc+0x154/0x2e0
> [f101bab0] [c04af030] __alloc_pages+0x55c/0xe2c
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
>    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
>    node 0: slabs: 509, objs: 7126, free: 0
> [...]
> 
> New findings:
>   (x) The page corruption only shows up the 1st time I run "stress -m 2 --vm-bytes 965M". When I quit and restart stress no additional page corruption shows up.
>   (x) The page corruption shows up shortly after I run "stress -m 2 --vm-bytes 965M" but no additional page corruption shows up afterwards, even if left running for 30min.
> 
> 
> For additional testing I thought it would be a good idea to try "modprobe test_vmalloc" but this remained inconclusive. Sometimes a 'BUG: Unable to handle kernel data access on read at 0xe0000000' like this shows up but not always:
> 

Interesting.

I guess 0xe0000000 is where linear RAM starts to be mapped with pages
instead of BATs? Can you confirm with a dump of
/sys/kernel/debug/powerpc/block_address_translation?
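
As a rough aid for reading that dump, here is a minimal Python sketch (not
from the thread). It assumes the debugfs file lists each BAT as a
"0xSTART-0xEND" virtual range, which may not match every kernel version,
and that the linear mapping starts at 0xc0000000. That base is also an
assumption, although it is consistent with the kernel addresses in the
trace above. The function names are hypothetical.

```python
import re

# Assumption: each BAT entry in the dump contains a "0xSTART-0xEND" pair.
RANGE_RE = re.compile(r"0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)")


def bat_ranges(dump_text):
    """Return sorted, de-duplicated inclusive (start, end) ranges from the dump."""
    found = {(int(a, 16), int(b, 16)) for a, b in RANGE_RE.findall(dump_text)}
    return sorted(found)


def covered_by_bat(addr, ranges):
    """True if addr lies inside any of the inclusive ranges."""
    return any(start <= addr <= end for start, end in ranges)


def first_unmapped_after(base, ranges):
    """Follow contiguous ranges upward from base.

    Returns the first address that no range covers, i.e. where the
    linear mapping would have to fall back to page tables.
    """
    addr = base
    progressed = True
    while progressed:
        progressed = False
        for start, end in ranges:
            if start <= addr <= end:
                addr = end + 1
                progressed = True
    return addr


def read_dump(path="/sys/kernel/debug/powerpc/block_address_translation"):
    """Read the debugfs file (needs root and a mounted debugfs)."""
    with open(path) as f:
        return f.read()
```

If first_unmapped_after(0xc0000000, bat_ranges(read_dump())) returns
0xe0000000, the faulting address is the first one past the BAT-mapped
region, which would support the guess above.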

Could there be a race involving the hash table?

Would KCSAN help with that?

Christophe
