On Thu, 29 Feb 2024 17:11:28 +0000
Christophe Leroy <christophe.leroy@csgroup.eu> wrote:

> > Revisited the issue on kernel v6.8-rc6 and I can still reproduce it.
> > 
> > Short summary as my last post was over a year ago:
> >   (x) I get this memory corruption only when CONFIG_VMAP_STACK=y and CONFIG_SMP=y is enabled.
> >   (x) I don't get this memory corruption when only one of the above is enabled. ^^
> >   (x) memtester says the 2 GiB RAM in my G4 DP are fine.
> >   (x) I don't get this issue on my G5 11,2 or Talos II.
> >   (x) "stress -m 2 --vm-bytes 965M" provokes the issue in < 10 secs. (https://salsa.debian.org/debian/stress)
> > 
> > For the test I used CONFIG_KASAN_INLINE=y for v6.8-rc6 and debug_pagealloc=on, page_owner=on and got this dmesg:
> > 
> > [...]
> > pagealloc: memory corruption
> > f5fcfff0: 00 00 00 00                                      ....
> > CPU: 1 PID: 1788 Comm: stress Tainted: G    B              6.8.0-rc6-PMacG4 #15
> > Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
> > Call Trace:
> > [f3bfbac0] [c162a8e8] dump_stack_lvl+0x60/0x94 (unreliable)
> > [f3bfbae0] [c04edf9c] __kernel_unpoison_pages+0x1e0/0x1f0
> > [f3bfbb30] [c04a8aa0] post_alloc_hook+0xe0/0x174
> > [f3bfbb60] [c04a8b58] prep_new_page+0x24/0xbc
> > [f3bfbb80] [c04abcc4] get_page_from_freelist+0xcd0/0xf10
> > [f3bfbc50] [c04aecd8] __alloc_pages+0x204/0xe2c
> > [f3bfbda0] [c04b07a8] __folio_alloc+0x18/0x88
> > [f3bfbdc0] [c0461a10] vma_alloc_zeroed_movable_folio.isra.0+0x2c/0x6c
> > [f3bfbde0] [c046bb90] handle_mm_fault+0x91c/0x19ac
> > [f3bfbec0] [c0047b8c] ___do_page_fault+0x93c/0xc14
> > [f3bfbf10] [c0048278] do_page_fault+0x28/0x60
> > [f3bfbf30] [c000433c] DataAccess_virt+0x124/0x17c
> > --- interrupt: 300 at 0xbe30d8
> > NIP:  00be30d8 LR: 00be30b4 CTR: 00000000
> > REGS: f3bfbf40 TRAP: 0300   Tainted: G    B               (6.8.0-rc6-PMacG4)
> > MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 20882464  XER: 00000000
> > DAR: 88c7a010 DSISR: 42000000
> > GPR00: 00be30b4 af8397d0 a78436c0 6b2ee010 3c500000 20224462 fe77f7e1 00b00264
> > GPR08: 1d98d000 1d98c000 00000000 40ae256a 20882262 00bffff4 00000000 00000000
> > GPR16: 00000000 00000002 00000000 0000005a 40802262 80002262 40002262 00c000a4
> > GPR24: ffffffff ffffffff 3c500000 00000000 00000000 6b2ee010 00c07d64 00001000
> > NIP [00be30d8] 0xbe30d8
> > LR [00be30b4] 0xbe30b4
> > --- interrupt: 300
> > page:ef4bd92c refcount:1 mapcount:0 mapping:00000000 index:0x1 pfn:0x310b3
> > flags: 0x80000000(zone=2)
> > page_type: 0xffffffff()
> > raw: 80000000 00000100 00000122 00000000 00000001 00000000 ffffffff 00000001
> > raw: 00000000
> > page dumped because: pagealloc: corrupted page details
> > page_owner info is not present (never set?)
> > swapper/1: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
> > CPU: 1 PID: 0 Comm: swapper/1 Tainted: G    B              6.8.0-rc6-PMacG4 #15
> > Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
> > Call Trace:
> > [f101b9d0] [c162a8e8] dump_stack_lvl+0x60/0x94 (unreliable)
> > [f101b9f0] [c04ae948] warn_alloc+0x154/0x2e0
> > [f101bab0] [c04af030] __alloc_pages+0x55c/0xe2c
> > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
> >    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
> >    node 0: slabs: 509, objs: 7126, free: 0
> > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
> >    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
> >    node 0: slabs: 509, objs: 7126, free: 0
> > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
> >    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
> >    node 0: slabs: 509, objs: 7126, free: 0
> > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
> >    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
> >    node 0: slabs: 509, objs: 7126, free: 0
> > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
> >    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
> >    node 0: slabs: 509, objs: 7126, free: 0
> > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
> >    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
> >    node 0: slabs: 509, objs: 7126, free: 0
> > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
> >    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
> >    node 0: slabs: 509, objs: 7126, free: 0
> > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
> >    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
> >    node 0: slabs: 509, objs: 7126, free: 0
> > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
> >    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
> >    node 0: slabs: 509, objs: 7126, free: 0
> > SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)
> >    cache: skbuff_head_cache, object size: 176, buffer size: 288, default order: 0, min order: 0
> >    node 0: slabs: 509, objs: 7126, free: 0
> > [...]
> > 
> > New findings:
> >   (x) The page corruption only shows up the 1st time I run "stress -m 2 --vm-bytes 965M". When I quit and restart stress no additional page corruption shows up.
> >   (x) The page corruption shows up shortly after I run "stress -m 2 --vm-bytes 965M" but no additional page corruption shows up afterwards, even if left running for 30min.
> > 
> > 
> > For additional testing I thought it would be a good idea to try "modprobe test_vmalloc" but this remained inconclusive. Sometimes a 'BUG: Unable to handle kernel data access on read at 0xe0000000' like this shows up but not always:
> >   
> 
> Interesting.
> 
> I guess 0xe0000000 is where linear RAM starts to be mapped with pages ? 
> Can you confirm with a dump of 
> /sys/kernel/debug/powerpc/block_address_translation ?
> 
> Do we have a problem of race with hash table ?
> 
> Would KCSAN help with that ?

Revisited the issue on kernel v6.9-rc4 and I can still reproduce it. Did some runs now with KCSAN_EARLY_ENABLE=y (+ KCSAN_SKIP_WATCH=4000 + KCSAN_STRICT=y) which made KCSAN a lot more verbose.

On v6.9-rc4 I have not seen the "SLUB: Unable to allocate memory on node -1, gfp=0x820(GFP_ATOMIC)" I reported some time ago and no other KASAN hits at boot or afterwards so I disabled KASAN. The general memory corruption issue remains however.

At running "stress -m 2 --vm-bytes 965M" I get some "BUG: KCSAN: data-race in list_add / lru_gen_look_around" and "BUG: KCSAN: data-race in zswap_store / zswap_update_total_size" which I don't get otherwise:

[...]
BUG: KCSAN: data-race in list_add / lru_gen_look_around

read (marked) to 0xefa6fa40 of 4 bytes by task 1619 on cpu 0:
 lru_gen_look_around+0x320/0x634
 folio_referenced_one+0x32c/0x404
 rmap_walk_anon+0x1c4/0x24c
 rmap_walk+0x70/0x7c
 folio_referenced+0x194/0x1ec
 shrink_folio_list+0x6a8/0xd28
 evict_folios+0xcc0/0x1204
 try_to_shrink_lruvec+0x214/0x2f0
 shrink_one+0x104/0x1e8
 shrink_node+0x314/0xc3c
 do_try_to_free_pages+0x500/0x7e4
 try_to_free_pages+0x150/0x18c
 __alloc_pages+0x460/0x8dc
 folio_prealloc.isra.0+0x44/0xec
 handle_mm_fault+0x488/0xed0
 ___do_page_fault+0x4d8/0x630
 do_page_fault+0x28/0x40
 DataAccess_virt+0x124/0x17c

write to 0xefa6fa40 of 4 bytes by task 40 on cpu 1:
 list_add+0x58/0x94
 evict_folios+0xb04/0x1204
 try_to_shrink_lruvec+0x214/0x2f0
 shrink_one+0x104/0x1e8
 shrink_node+0x314/0xc3c
 balance_pgdat+0x498/0x914
 kswapd+0x304/0x398
 kthread+0x174/0x178
 start_kernel_thread+0x10/0x14

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 40 Comm: kswapd0 Not tainted 6.9.0-rc4-PMacG4-dirty #10
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
[...]
BUG: KCSAN: data-race in zswap_update_total_size / zswap_update_total_size

write to 0xc121b328 of 8 bytes by task 1619 on cpu 0:
 zswap_update_total_size+0x58/0xe8
 zswap_entry_free+0xdc/0x1c0
 zswap_load+0x190/0x19c
 swap_read_folio+0xbc/0x450
 swap_cluster_readahead+0x2f8/0x338
 swapin_readahead+0x430/0x438
 do_swap_page+0x1e0/0x9bc
 handle_mm_fault+0xecc/0xed0
 ___do_page_fault+0x4d8/0x630
 do_page_fault+0x28/0x40
 DataAccess_virt+0x124/0x17c

write to 0xc121b328 of 8 bytes by task 40 on cpu 1:
 zswap_update_total_size+0x58/0xe8
 zswap_store+0x5a8/0xa18
 swap_writepage+0x4c/0xe8
 pageout+0x1dc/0x304
 shrink_folio_list+0xa70/0xd28
 evict_folios+0xcc0/0x1204
 try_to_shrink_lruvec+0x214/0x2f0
 shrink_one+0x104/0x1e8
 shrink_node+0x314/0xc3c
 balance_pgdat+0x498/0x914
 kswapd+0x304/0x398
 kthread+0x174/0x178
 start_kernel_thread+0x10/0x14

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 40 Comm: kswapd0 Not tainted 6.9.0-rc4-PMacG4-dirty #10
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
[...]

One time I also got another page allocation failure:

[...]
==================================================================
kworker/u9:1: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0
CPU: 1 PID: 39 Comm: kworker/u9:1 Not tainted 6.9.0-rc4-PMacG4-dirty #10
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Workqueue: events_freezable_pwr_efficient disk_events_workfn (events_freezable_pwr_ef)
Call Trace:
[f100dc50] [c0be4e84] dump_stack_lvl+0x88/0xcc (unreliable)
[f100dc70] [c0be4ee8] dump_stack+0x20/0x34
[f100dc80] [c029de40] warn_alloc+0x100/0x178
[f100dce0] [c029e234] __alloc_pages+0x37c/0x8dc
[f100dda0] [c029e884] __page_frag_alloc_align+0x74/0x194
[f100ddd0] [c09bafc0] __netdev_alloc_skb+0x108/0x234
[f100de00] [bef1a5a8] setup_rx_descbuffer+0x5c/0x258 [b43legacy]
[f100de40] [bef1c43c] b43legacy_dma_rx+0x3e4/0x488 [b43legacy]
[f100deb0] [bef0b034] b43legacy_interrupt_tasklet+0x7bc/0x7f0 [b43legacy]
[f100df50] [c006f8c8] tasklet_action_common.isra.0+0xb0/0xe8
[f100df80] [c0c1fc8c] __do_softirq+0x1dc/0x218
[f100dff0] [c00091d8] do_softirq_own_stack+0x54/0x74
[f10dd760] [c00091c8] do_softirq_own_stack+0x44/0x74
[f10dd780] [c006f114] __irq_exit_rcu+0x6c/0xbc
[f10dd790] [c006f588] irq_exit+0x10/0x20
[f10dd7a0] [c0008b58] interrupt_async_exit_prepare.isra.0+0x18/0x2c
[f10dd7b0] [c000917c] do_IRQ+0x24/0x2c
[f10dd7d0] [c00045b4] HardwareInterrupt_virt+0x108/0x10c
--- interrupt: 500 at _raw_spin_unlock_irq+0x30/0x48
NIP:  c0c1f49c LR: c0c1f490 CTR: 00000000
REGS: f10dd7e0 TRAP: 0500   Not tainted  (6.9.0-rc4-PMacG4-dirty)
MSR:  00209032 <EE,ME,IR,DR,RI>  CR: 84882802  XER: 00000000

GPR00: c0c1f490 f10dd8a0 c1c28020 c49d6828 00016828 0001682b 00000003 c12399ec 
GPR08: 00000000 00009032 0000001d f10dd860 24882802 00000000 00000001 00000000 
GPR16: 00000800 00000800 00000000 00000000 00000002 00000004 00000004 00000000 
GPR24: c49d6850 00000004 00000000 00000007 00000001 c49d6850 f10ddbb4 c49d6828 
NIP [c0c1f49c] _raw_spin_unlock_irq+0x30/0x48
LR [c0c1f490] _raw_spin_unlock_irq+0x24/0x48
--- interrupt: 500
[f10dd8c0] [c0246150] evict_folios+0xc74/0x1204
[f10dd9d0] [c02468f4] try_to_shrink_lruvec+0x214/0x2f0
[f10dda50] [c0246ad4] shrink_one+0x104/0x1e8
[f10dda90] [c0248eb8] shrink_node+0x314/0xc3c
[f10ddb20] [c024a98c] do_try_to_free_pages+0x500/0x7e4
[f10ddba0] [c024b110] try_to_free_pages+0x150/0x18c
[f10ddc20] [c029e318] __alloc_pages+0x460/0x8dc
[f10ddce0] [c06088ac] alloc_pages.constprop.0+0x30/0x50
[f10ddd00] [c0608ad4] blk_rq_map_kern+0x208/0x404
[f10ddd50] [c089c048] scsi_execute_cmd+0x350/0x534
[f10dddc0] [c08b77cc] sr_check_events+0x108/0x4bc
[f10dde40] [c08fb620] cdrom_update_events+0x54/0xb8
[f10dde60] [c08fb6b4] cdrom_check_events+0x30/0x70
[f10dde80] [c08b7c44] sr_block_check_events+0x60/0x90
[f10ddea0] [c0630444] disk_check_events+0x68/0x168
[f10ddee0] [c063056c] disk_events_workfn+0x28/0x40
[f10ddf00] [c008df0c] process_scheduled_works+0x350/0x494
[f10ddf70] [c008ee2c] worker_thread+0x2a4/0x300
[f10ddfc0] [c009b87c] kthread+0x174/0x178
[f10ddff0] [c001c304] start_kernel_thread+0x10/0x14
Mem-Info:
active_anon:292700 inactive_anon:181968 isolated_anon:0
 active_file:6404 inactive_file:5560 isolated_file:0
 unevictable:0 dirty:11 writeback:0
 slab_reclaimable:1183 slab_unreclaimable:6185
 mapped:7898 shmem:133 pagetables:675
 sec_pagetables:0 bounce:0
 kernel_misc_reclaimable:0
 free:1193 free_pcp:778 free_cma:0
Node 0 active_anon:1170800kB inactive_anon:727872kB active_file:25616kB inactive_file:22240kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:31592kB dirty:44kB writeback:0kB shmem:532kB writeback_tmp:0kB kernel_stack:952kB pagetables:2700kB sec_pagetables:0kB all_unreclaimable? no
DMA free:0kB boost:7564kB min:10928kB low:11768kB high:12608kB reserved_highatomic:0KB active_anon:568836kB inactive_anon:92340kB active_file:12kB inactive_file:1248kB unevictable:0kB writepending:40kB present:786432kB managed:709428kB mlocked:0kB bounce:0kB free_pcp:3112kB local_pcp:1844kB free_cma:0kB
lowmem_reserve[]: 0 0 1280 1280
DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
39962 total pagecache pages
27865 pages in swap cache
Free swap  = 8240252kB
Total swap = 8388604kB
524288 pages RAM
327680 pages HighMem/MovableOnly
19251 pages reserved
b43legacy-phy0 debug: DMA RX: setup_rx_descbuffer() failed
[...]

To fix a "refcount_t: decrement hit 0; leaking memory." issue which showed up otherwise I applied following patchset on top of v6.9-rc4: https://lore.kernel.org/all/mhng-4caed5c9-bc46-42fe-90d4-9d845376578f@palmer-ri-x1c9a/

Kernel .config attached. For more details on the KCSAN hits dmesg of 2 runs attached.

Regards,
Erhard