From: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
To: "Ma, Jun" <majun@amd.com>,
	linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org
Cc: "Sathishkumar S" <sathishkumar.sundararaju@amd.com>,
	"Lijo Lazar" <lijo.lazar@amd.com>,
	"Srinivasan Shanmugam" <srinivasan.shanmugam@amd.com>,
	"Guchun Chen" <guchun.chen@amd.com>, "Lang Yu" <Lang.Yu@amd.com>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Pan, Xinhui" <Xinhui.Pan@amd.com>,
	dri-devel@lists.freedesktop.org,
	"Marek Olšák" <marek.olsak@amd.com>,
	"Boyuan Zhang" <boyuan.zhang@amd.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"David Francis" <David.Francis@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"David Airlie" <airlied@gmail.com>,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: BUG [RESEND]: kernel NULL pointer dereference, address: 0000000000000008
Date: Mon, 22 Jan 2024 23:39:27 +0100
Message-ID: <bb91dc43-d331-4999-b43f-a741c865f7f2@alu.unizg.hr>
In-Reply-To: <1bc1a054-2aa8-4229-9a05-df7bac1ec0d8@amd.com>

On 22. 01. 2024. 09:34, Ma, Jun wrote:
> Perhaps similar to the problem I encountered earlier, you can
> try the following patch
> 
> https://lists.freedesktop.org/archives/amd-gfx/2024-January/103259.html

Apparently, this patch prevented the NULL pointer dereference: it no longer
appears in the log.

However, there is another hang in the XWayland password entry dialog, and I have
not yet figured out what is wrong there.
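
For reference, the numbers in the quoted trace fit a NULL struct pointer whose
member at offset 8 is read: RAX is 0 at `mov 0x8(%rax),%rax` and CR2 is 0x8.
R10 holds 0xffffffea, which read as a 32-bit signed value is -22 (-EINVAL on
Linux), and the preceding `cmp $0xffffffed,%eax` compares against -19 (-ENODEV).
Below is a minimal userspace sketch of that arithmetic. The struct and function
names are hypothetical and are not the amdgpu definitions; I have not checked
which field actually lives at offset 0x18708.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in, NOT an amdgpu or kernel definition: any struct
 * with a pointer member at offset 8 on x86-64 reproduces the pattern. */
struct fake_fw {
	size_t size;          /* offset 0 */
	const uint8_t *data;  /* offset 8 on LP64 targets */
};

/* This sketch assumes an LP64 target such as x86-64 Linux. */
static_assert(offsetof(struct fake_fw, data) == 8, "expects LP64 layout");

/* Address the CPU computes for fw->data when fw is NULL: base 0 + offset. */
uintptr_t null_member_fault_address(void)
{
	return (uintptr_t)0 + offsetof(struct fake_fw, data);
}

/* Interpret the low 32 bits of a dumped register as a signed int,
 * the way an int return value sits in EAX/R10D. */
int reg_to_errno(uint64_t reg)
{
	uint32_t low = (uint32_t)reg;

	/* Subtract 2^32 by hand to avoid an implementation-defined cast. */
	return low > INT32_MAX ? (int)((int64_t)low - 4294967296LL) : (int)low;
}
```

With these helpers, `null_member_fault_address()` gives 0x8, matching CR2, and
`reg_to_errno(0xffffffea)` gives -22. This only illustrates how to read the
register dump; it does not by itself show which pointer was NULL.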

Best regards,
Mirsad

> Regards,
> Ma Jun
> 
> On 1/21/2024 3:54 AM, Mirsad Todorovac wrote:
>> Hi,
>>
>> The last email did not reach most of the recipients because of the banned .xz attachment.
>>
>> As the .config is too big to send inline or uncompressed, I will omit it in this
>> attempt. In the meantime, I had some success in decoding the stack trace, but sadly the
>> decode is not complete.
>>
>> I don't think this Oops is deterministic, but I am working on a reproducer.
>>
>> The platform is Ubuntu 22.04 LTS.
>>
>> Complete list of hardware and .config is available here:
>>
>> https://domac.alu.unizg.hr/~mtodorov/linux/bugreports/amdgpu/6.7.0-rtl-v02-nokcsan-09928-g052d534373b7/
>>
>> Best regards,
>> Mirsad
>>
>> -------------------------------------------------------------------------------------------
>> kernel: [    5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>> kernel: [    5.576707] #PF: supervisor read access in kernel mode
>> kernel: [    5.576710] #PF: error_code(0x0000) - not-present page
>> kernel: [    5.576712] PGD 0 P4D 0
>> kernel: [    5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
>> kernel: [    5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
>> kernel: [    5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>> kernel: [    5.576726] RIP: 0010:gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>> kernel: [ 5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>> All code
>> ========
>>     0:	8d 55 a8             	lea    -0x58(%rbp),%edx
>>     3:	4c 89 ff             	mov    %r15,%rdi
>>     6:	e8 e4 83 ec ff       	call   0xffffffffffec83ef
>>     b:	41 89 c2             	mov    %eax,%r10d
>>     e:	83 f8 ed             	cmp    $0xffffffed,%eax
>>    11:	0f 84 b3 fd ff ff    	je     0xfffffffffffffdca
>>    17:	85 c0                	test   %eax,%eax
>>    19:	74 05                	je     0x20
>>    1b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>>    20:	49 8b 87 08 87 01 00 	mov    0x18708(%r15),%rax
>>    27:	4c 89 ff             	mov    %r15,%rdi
>>    2a:*	48 8b 40 08          	mov    0x8(%rax),%rax		<-- trapping instruction
>>    2e:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>    32:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>    36:	e8 e4 42 fb ff       	call   0xfffffffffffb431f
>>    3b:	41 89 c2             	mov    %eax,%r10d
>>    3e:	85 c0                	test   %eax,%eax
>>
>> Code starting with the faulting instruction
>> ===========================================
>>     0:	48 8b 40 08          	mov    0x8(%rax),%rax
>>     4:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>     8:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>     c:	e8 e4 42 fb ff       	call   0xfffffffffffb42f5
>>    11:	41 89 c2             	mov    %eax,%r10d
>>    14:	85 c0                	test   %eax,%eax
>> kernel: [    5.576878] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>> kernel: [    5.576881] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>> kernel: [    5.576884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>> kernel: [    5.576886] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>> kernel: [    5.576889] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>> kernel: [    5.576892] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>> kernel: [    5.576895] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>> kernel: [    5.576898] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> kernel: [    5.576900] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>> kernel: [    5.576903] PKRU: 55555554
>> kernel: [    5.576905] Call Trace:
>> kernel: [    5.576907]  <TASK>
>> kernel: [    5.576909] ? show_regs (arch/x86/kernel/dumpstack.c:479)
>> kernel: [    5.576914] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
>> kernel: [    5.576917] ? page_fault_oops (arch/x86/mm/fault.c:707)
>> kernel: [    5.576921] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.576925] ? crypto_alloc_tfmmem.isra.0 (crypto/api.c:497)
>> kernel: [    5.576930] ? do_user_addr_fault (arch/x86/mm/fault.c:1264)
>> kernel: [    5.576934] ? exc_page_fault (./arch/x86/include/asm/paravirt.h:693 arch/x86/mm/fault.c:1515 arch/x86/mm/fault.c:1563)
>> kernel: [    5.576937] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570)
>> kernel: [    5.576942] ? gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>> kernel: [    5.577056] amdgpu_device_init (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:2465 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4042) amdgpu
>> kernel: [    5.577158] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577161] ? pci_bus_read_config_word (drivers/pci/access.c:67 (discriminator 2))
>> kernel: [    5.577166] ? pci_read_config_word (drivers/pci/access.c:563)
>> kernel: [    5.577168] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577171] ? do_pci_enable_device (drivers/pci/pci.c:1975 drivers/pci/pci.c:1949)
>> kernel: [    5.577176] amdgpu_driver_load_kms (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:146) amdgpu
>> kernel: [    5.577275] amdgpu_pci_probe (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2237) amdgpu
>> kernel: [    5.577373] local_pci_probe (drivers/pci/pci-driver.c:324)
>> kernel: [    5.577377] pci_device_probe (drivers/pci/pci-driver.c:392 drivers/pci/pci-driver.c:417 drivers/pci/pci-driver.c:460)
>> kernel: [    5.577381] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658)
>> kernel: [    5.577386] __driver_probe_device (drivers/base/dd.c:800)
>> kernel: [    5.577389] driver_probe_device (drivers/base/dd.c:830)
>> kernel: [    5.577392] __driver_attach (drivers/base/dd.c:1217)
>> kernel: [    5.577396] ? __pfx___driver_attach (drivers/base/dd.c:1157)
>> kernel: [    5.577399] bus_for_each_dev (drivers/base/bus.c:368)
>> kernel: [    5.577402] driver_attach (drivers/base/dd.c:1234)
>> kernel: [    5.577405] bus_add_driver (drivers/base/bus.c:674)
>> kernel: [    5.577409] driver_register (drivers/base/driver.c:246)
>> kernel: [    5.577411] ? __pfx_amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2497) amdgpu
>> kernel: [    5.577521] __pci_register_driver (drivers/pci/pci-driver.c:1456)
>> kernel: [    5.577524] amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2805) amdgpu
>> kernel: [    5.577628] do_one_initcall (init/main.c:1236)
>> kernel: [    5.577632] ? kmalloc_trace (mm/slub.c:3816 mm/slub.c:3860 mm/slub.c:4007)
>> kernel: [    5.577637] do_init_module (kernel/module/main.c:2533)
>> kernel: [    5.577640] load_module (kernel/module/main.c:2984)
>> kernel: [    5.577647] init_module_from_file (kernel/module/main.c:3151)
>> kernel: [    5.577649] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577652] ? init_module_from_file (kernel/module/main.c:3151)
>> kernel: [    5.577657] idempotent_init_module (kernel/module/main.c:3168)
>> kernel: [    5.577661] __x64_sys_finit_module (./include/linux/file.h:45 kernel/module/main.c:3190 kernel/module/main.c:3172 kernel/module/main.c:3172)
>> kernel: [    5.577664] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
>> kernel: [    5.577668] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577671] ? ksys_mmap_pgoff (mm/mmap.c:1428)
>> kernel: [    5.577675] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577678] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577681] ? syscall_exit_to_user_mode (kernel/entry/common.c:215)
>> kernel: [    5.577684] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577687] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>> kernel: [    5.577689] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577692] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>> kernel: [    5.577695] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577698] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>> kernel: [    5.577700] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577703] ? sysvec_call_function (arch/x86/kernel/smp.c:253 (discriminator 69))
>> kernel: [    5.577707] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
>> kernel: [    5.577709] RIP: 0033:0x7fdaa331e88d
>> kernel: [ 5.577724] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
>> All code
>> ========
>>     0:	5b                   	pop    %rbx
>>     1:	41 5c                	pop    %r12
>>     3:	c3                   	ret
>>     4:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
>>     b:	00 00
>>     d:	f3 0f 1e fa          	endbr64
>>    11:	48 89 f8             	mov    %rdi,%rax
>>    14:	48 89 f7             	mov    %rsi,%rdi
>>    17:	48 89 d6             	mov    %rdx,%rsi
>>    1a:	48 89 ca             	mov    %rcx,%rdx
>>    1d:	4d 89 c2             	mov    %r8,%r10
>>    20:	4d 89 c8             	mov    %r9,%r8
>>    23:	4c 8b 4c 24 08       	mov    0x8(%rsp),%r9
>>    28:	0f 05                	syscall
>>    2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
>>    30:	73 01                	jae    0x33
>>    32:	c3                   	ret
>>    33:	48 8b 0d 73 b5 0f 00 	mov    0xfb573(%rip),%rcx        # 0xfb5ad
>>    3a:	f7 d8                	neg    %eax
>>    3c:	64 89 01             	mov    %eax,%fs:(%rcx)
>>    3f:	48                   	rex.W
>>
>> Code starting with the faulting instruction
>> ===========================================
>>     0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
>>     6:	73 01                	jae    0x9
>>     8:	c3                   	ret
>>     9:	48 8b 0d 73 b5 0f 00 	mov    0xfb573(%rip),%rcx        # 0xfb583
>>    10:	f7 d8                	neg    %eax
>>    12:	64 89 01             	mov    %eax,%fs:(%rcx)
>>    15:	48                   	rex.W
>> kernel: [    5.577729] RSP: 002b:00007ffeb4f87d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>> kernel: [    5.577733] RAX: ffffffffffffffda RBX: 000055aedf3eeeb0 RCX: 00007fdaa331e88d
>> kernel: [    5.577736] RDX: 0000000000000000 RSI: 000055aedf3efb80 RDI: 000000000000001a
>> kernel: [    5.577738] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
>> kernel: [    5.577741] R10: 000000000000001a R11: 0000000000000246 R12: 000055aedf3efb80
>> kernel: [    5.577744] R13: 000055aedf3f2060 R14: 0000000000000000 R15: 000055aedf2b1220
>> kernel: [    5.577748]  </TASK>
>> kernel: [    5.577750] Modules linked in: intel_rapl_msr intel_rapl_common amdgpu(+) edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_codec_hdmi ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp snd_hda_intel aesni_intel drm_exec snd_intel_dspcfg crypto_simd gpu_sched snd_intel_sdw_acpi cryptd nls_iso8859_1 drm_buddy snd_hda_codec snd_seq_midi drm_suballoc_helper snd_seq_midi_event drm_ttm_helper joydev snd_hda_core input_leds ttm rapl snd_rawmidi snd_hwdep drm_display_helper snd_seq snd_pcm wmi_bmof cec k10temp snd_seq_device ccp rc_core snd_timer snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid crc32_pclmul nvme r8169 ahci nvme_core i2c_piix4 xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt
>> kernel: [    5.577817] CR2: 0000000000000008
>> kernel: [    5.577820] ---[ end trace 0000000000000000 ]---
>> kernel: [    5.914230] RIP: 0010:gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>> kernel: [ 5.914388] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>> All code
>> ========
>>     0:	8d 55 a8             	lea    -0x58(%rbp),%edx
>>     3:	4c 89 ff             	mov    %r15,%rdi
>>     6:	e8 e4 83 ec ff       	call   0xffffffffffec83ef
>>     b:	41 89 c2             	mov    %eax,%r10d
>>     e:	83 f8 ed             	cmp    $0xffffffed,%eax
>>    11:	0f 84 b3 fd ff ff    	je     0xfffffffffffffdca
>>    17:	85 c0                	test   %eax,%eax
>>    19:	74 05                	je     0x20
>>    1b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>>    20:	49 8b 87 08 87 01 00 	mov    0x18708(%r15),%rax
>>    27:	4c 89 ff             	mov    %r15,%rdi
>>    2a:*	48 8b 40 08          	mov    0x8(%rax),%rax		<-- trapping instruction
>>    2e:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>    32:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>    36:	e8 e4 42 fb ff       	call   0xfffffffffffb431f
>>    3b:	41 89 c2             	mov    %eax,%r10d
>>    3e:	85 c0                	test   %eax,%eax
>>
>> Code starting with the faulting instruction
>> ===========================================
>>     0:	48 8b 40 08          	mov    0x8(%rax),%rax
>>     4:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>     8:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>     c:	e8 e4 42 fb ff       	call   0xfffffffffffb42f5
>>    11:	41 89 c2             	mov    %eax,%r10d
>>    14:	85 c0                	test   %eax,%eax
>> rsyslogd: rsyslogd's groupid changed to 111
>> kernel: [    5.914394] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>> kernel: [    5.914397] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>> kernel: [    5.914399] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>> kernel: [    5.914402] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>> kernel: [    5.914405] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>> kernel: [    5.914408] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>> kernel: [    5.914410] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>> kernel: [    5.914414] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> kernel: [    5.914416] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>> kernel: [    5.914419] PKRU: 55555554
>>
>> Best regards,
>> Mirsad
>>
>> On 1/18/24 18:23, Mirsad Todorovac wrote:
>>> Hi,
>>>
>>> Unfortunately, I was not able to reboot into this kernel again to do the stack decode, but I thought
>>> that any information about the NULL pointer dereference is better than no info.
>>>
>>> The system is Ubuntu 23.10 Mantic with AMD product: Navi 23 [Radeon RX 6600/6600 XT/6600M]
>>> graphics card.
>>>
>>> Please find the config and the hw listing attached.
>>>
>>> Best regards,
>>> Mirsad
>>
>>
>>
>>> kernel: [    5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>> kernel: [    5.576707] #PF: supervisor read access in kernel mode
>>> kernel: [    5.576710] #PF: error_code(0x0000) - not-present page
>>> kernel: [    5.576712] PGD 0 P4D 0
>>> kernel: [    5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>> kernel: [    5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
>>> kernel: [    5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>>> kernel: [    5.576726] RIP: 0010:gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>> kernel: [    5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>> kernel: [    5.576878] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>> kernel: [    5.576881] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>> kernel: [    5.576884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>> kernel: [    5.576886] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>> kernel: [    5.576889] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>> kernel: [    5.576892] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>> kernel: [    5.576895] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>> kernel: [    5.576898] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> kernel: [    5.576900] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>> kernel: [    5.576903] PKRU: 55555554
>>> kernel: [    5.576905] Call Trace:
>>> kernel: [    5.576907]  <TASK>
>>> kernel: [    5.576909]  ? show_regs+0x72/0x90
>>> kernel: [    5.576914]  ? __die+0x25/0x80
>>> kernel: [    5.576917]  ? page_fault_oops+0x154/0x4c0
>>> kernel: [    5.576921]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.576925]  ? crypto_alloc_tfmmem.isra.0+0x35/0x70
>>> kernel: [    5.576930]  ? do_user_addr_fault+0x30e/0x6e0
>>> kernel: [    5.576934]  ? exc_page_fault+0x84/0x1b0
>>> kernel: [    5.576937]  ? asm_exc_page_fault+0x27/0x30
>>> kernel: [    5.576942]  ? gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>> kernel: [    5.577056]  amdgpu_device_init+0xefa/0x2de0 [amdgpu]
>>> kernel: [    5.577158]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577161]  ? pci_bus_read_config_word+0x47/0x90
>>> kernel: [    5.577166]  ? pci_read_config_word+0x27/0x60
>>> kernel: [    5.577168]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577171]  ? do_pci_enable_device+0xe1/0x110
>>> kernel: [    5.577176]  amdgpu_driver_load_kms+0x1a/0x1c0 [amdgpu]
>>> kernel: [    5.577275]  amdgpu_pci_probe+0x1a8/0x5e0 [amdgpu]
>>> kernel: [    5.577373]  local_pci_probe+0x48/0xb0
>>> kernel: [    5.577377]  pci_device_probe+0xc8/0x290
>>> kernel: [    5.577381]  really_probe+0x1d2/0x440
>>> kernel: [    5.577386]  __driver_probe_device+0x8a/0x190
>>> kernel: [    5.577389]  driver_probe_device+0x23/0xd0
>>> kernel: [    5.577392]  __driver_attach+0x10f/0x220
>>> kernel: [    5.577396]  ? __pfx___driver_attach+0x10/0x10
>>> kernel: [    5.577399]  bus_for_each_dev+0x7a/0xe0
>>> kernel: [    5.577402]  driver_attach+0x1e/0x30
>>> kernel: [    5.577405]  bus_add_driver+0x127/0x240
>>> kernel: [    5.577409]  driver_register+0x64/0x140
>>> kernel: [    5.577411]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
>>> kernel: [    5.577521]  __pci_register_driver+0x68/0x80
>>> kernel: [    5.577524]  amdgpu_init+0x69/0xff0 [amdgpu]
>>> kernel: [    5.577628]  do_one_initcall+0x46/0x330
>>> kernel: [    5.577632]  ? kmalloc_trace+0x136/0x370
>>> kernel: [    5.577637]  do_init_module+0x6a/0x280
>>> kernel: [    5.577640]  load_module+0x2419/0x2500
>>> kernel: [    5.577647]  init_module_from_file+0x9c/0xf0
>>> kernel: [    5.577649]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577652]  ? init_module_from_file+0x9c/0xf0
>>> kernel: [    5.577657]  idempotent_init_module+0x184/0x240
>>> kernel: [    5.577661]  __x64_sys_finit_module+0x64/0xd0
>>> kernel: [    5.577664]  do_syscall_64+0x76/0x140
>>> kernel: [    5.577668]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577671]  ? ksys_mmap_pgoff+0x123/0x270
>>> kernel: [    5.577675]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577678]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577681]  ? syscall_exit_to_user_mode+0x97/0x1e0
>>> kernel: [    5.577684]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577687]  ? do_syscall_64+0x85/0x140
>>> kernel: [    5.577689]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577692]  ? do_syscall_64+0x85/0x140
>>> kernel: [    5.577695]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577698]  ? do_syscall_64+0x85/0x140
>>> kernel: [    5.577700]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577703]  ? sysvec_call_function+0x4e/0xb0
>>> kernel: [    5.577707]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
>>> kernel: [    5.577709] RIP: 0033:0x7fdaa331e88d
>>> kernel: [    5.577724] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
>>> kernel: [    5.577729] RSP: 002b:00007ffeb4f87d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>> kernel: [    5.577733] RAX: ffffffffffffffda RBX: 000055aedf3eeeb0 RCX: 00007fdaa331e88d
>>> kernel: [    5.577736] RDX: 0000000000000000 RSI: 000055aedf3efb80 RDI: 000000000000001a
>>> kernel: [    5.577738] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
>>> kernel: [    5.577741] R10: 000000000000001a R11: 0000000000000246 R12: 000055aedf3efb80
>>> kernel: [    5.577744] R13: 000055aedf3f2060 R14: 0000000000000000 R15: 000055aedf2b1220
>>> kernel: [    5.577748]  </TASK>
>>> kernel: [    5.577750] Modules linked in: intel_rapl_msr intel_rapl_common amdgpu(+) edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_codec_hdmi ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp snd_hda_intel aesni_intel drm_exec snd_intel_dspcfg crypto_simd gpu_sched snd_intel_sdw_acpi cryptd nls_iso8859_1 drm_buddy snd_hda_codec snd_seq_midi drm_suballoc_helper snd_seq_midi_event drm_ttm_helper joydev snd_hda_core input_leds ttm rapl snd_rawmidi snd_hwdep drm_display_helper snd_seq snd_pcm wmi_bmof cec k10temp snd_seq_device ccp rc_core snd_timer snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid crc32_pclmul nvme r8169 ahci nvme_core i2c_piix4 xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt
>>> kernel: [    5.577817] CR2: 0000000000000008
>>> kernel: [    5.577820] ---[ end trace 0000000000000000 ]---
>>> kernel: [    5.914230] RIP: 0010:gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>> kernel: [    5.914388] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>> rsyslogd: rsyslogd's groupid changed to 111
>>> kernel: [    5.914394] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>> kernel: [    5.914397] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>> kernel: [    5.914399] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>> kernel: [    5.914402] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>> kernel: [    5.914405] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>> kernel: [    5.914408] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>> kernel: [    5.914410] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>> kernel: [    5.914414] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> kernel: [    5.914416] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>> kernel: [    5.914419] PKRU: 55555554

-- 
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union

"I see something approaching fast ... Will it be friends with me?"


WARNING: multiple messages have this Message-ID (diff)
From: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
To: "Ma, Jun" <majun@amd.com>,
	linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org
Cc: "Sathishkumar S" <sathishkumar.sundararaju@amd.com>,
	"Pan, Xinhui" <Xinhui.Pan@amd.com>,
	"Srinivasan Shanmugam" <srinivasan.shanmugam@amd.com>,
	"Guchun Chen" <guchun.chen@amd.com>,
	"David Airlie" <airlied@gmail.com>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Lijo Lazar" <lijo.lazar@amd.com>,
	dri-devel@lists.freedesktop.org,
	"Christian König" <christian.koenig@amd.com>,
	"Boyuan Zhang" <boyuan.zhang@amd.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"David Francis" <David.Francis@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Lang Yu" <Lang.Yu@amd.com>, "Marek Olšák" <marek.olsak@amd.com>
Subject: Re: BUG [RESEND]: kernel NULL pointer dereference, address: 0000000000000008
Date: Mon, 22 Jan 2024 23:39:27 +0100	[thread overview]
Message-ID: <bb91dc43-d331-4999-b43f-a741c865f7f2@alu.unizg.hr> (raw)
In-Reply-To: <1bc1a054-2aa8-4229-9a05-df7bac1ec0d8@amd.com>

On 22. 01. 2024. 09:34, Ma, Jun wrote:
> Perhaps similar to the problem I encountered earlier, you can
> try the following patch
> 
> https://lists.freedesktop.org/archives/amd-gfx/2024-January/103259.html

Appaarently, this patch prevented NULL dereference, it was no longer in the log.

However, there is another hang in XWayland password entry dialog, but I do not
think that I figured out what is wrong.

Best regards,
Mirsad

> Regards,
> Ma Jun
> 
> On 1/21/2024 3:54 AM, Mirsad Todorovac wrote:
>> Hi,
>>
>> The last email did not pass to the most of the recipients due to banned .xz attachment.
>>
>> As the .config is too big to send inline or uncompressed either, I will omit it in this
>> attempt. In the meantime, I had some success in decoding the stack trace, but sadly not
>> complete.
>>
>> I don't think this Oops is deterministic, but I am working on a reproducer.
>>
>> The platform is Ubuntu 22.04 LTS.
>>
>> Complete list of hardware and .config is available here:
>>
>> https://domac.alu.unizg.hr/~mtodorov/linux/bugreports/amdgpu/6.7.0-rtl-v02-nokcsan-09928-g052d534373b7/
>>
>> Best regards,
>> Mirsad
>>
>> -------------------------------------------------------------------------------------------
>> kernel: [    5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>> kernel: [    5.576707] #PF: supervisor read access in kernel mode
>> kernel: [    5.576710] #PF: error_code(0x0000) - not-present page
>> kernel: [    5.576712] PGD 0 P4D 0
>> kernel: [    5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
>> kernel: [    5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
>> kernel: [    5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>> kernel: [    5.576726] RIP: 0010:gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>> kernel: [ 5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>> All code
>> ========
>>     0:	8d 55 a8             	lea    -0x58(%rbp),%edx
>>     3:	4c 89 ff             	mov    %r15,%rdi
>>     6:	e8 e4 83 ec ff       	call   0xffffffffffec83ef
>>     b:	41 89 c2             	mov    %eax,%r10d
>>     e:	83 f8 ed             	cmp    $0xffffffed,%eax
>>    11:	0f 84 b3 fd ff ff    	je     0xfffffffffffffdca
>>    17:	85 c0                	test   %eax,%eax
>>    19:	74 05                	je     0x20
>>    1b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>>    20:	49 8b 87 08 87 01 00 	mov    0x18708(%r15),%rax
>>    27:	4c 89 ff             	mov    %r15,%rdi
>>    2a:*	48 8b 40 08          	mov    0x8(%rax),%rax		<-- trapping instruction
>>    2e:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>    32:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>    36:	e8 e4 42 fb ff       	call   0xfffffffffffb431f
>>    3b:	41 89 c2             	mov    %eax,%r10d
>>    3e:	85 c0                	test   %eax,%eax
>>
>> Code starting with the faulting instruction
>> ===========================================
>>     0:	48 8b 40 08          	mov    0x8(%rax),%rax
>>     4:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>     8:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>     c:	e8 e4 42 fb ff       	call   0xfffffffffffb42f5
>>    11:	41 89 c2             	mov    %eax,%r10d
>>    14:	85 c0                	test   %eax,%eax
>> kernel: [    5.576878] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>> kernel: [    5.576881] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>> kernel: [    5.576884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>> kernel: [    5.576886] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>> kernel: [    5.576889] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>> kernel: [    5.576892] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>> kernel: [    5.576895] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>> kernel: [    5.576898] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> kernel: [    5.576900] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>> kernel: [    5.576903] PKRU: 55555554
>> kernel: [    5.576905] Call Trace:
>> kernel: [    5.576907]  <TASK>
>> kernel: [    5.576909] ? show_regs (arch/x86/kernel/dumpstack.c:479)
>> kernel: [    5.576914] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
>> kernel: [    5.576917] ? page_fault_oops (arch/x86/mm/fault.c:707)
>> kernel: [    5.576921] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.576925] ? crypto_alloc_tfmmem.isra.0 (crypto/api.c:497)
>> kernel: [    5.576930] ? do_user_addr_fault (arch/x86/mm/fault.c:1264)
>> kernel: [    5.576934] ? exc_page_fault (./arch/x86/include/asm/paravirt.h:693 arch/x86/mm/fault.c:1515 arch/x86/mm/fault.c:1563)
>> kernel: [    5.576937] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:570)
>> kernel: [    5.576942] ? gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>> kernel: [    5.577056] amdgpu_device_init (drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:2465 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c:4042) amdgpu
>> kernel: [    5.577158] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577161] ? pci_bus_read_config_word (drivers/pci/access.c:67 (discriminator 2))
>> kernel: [    5.577166] ? pci_read_config_word (drivers/pci/access.c:563)
>> kernel: [    5.577168] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577171] ? do_pci_enable_device (drivers/pci/pci.c:1975 drivers/pci/pci.c:1949)
>> kernel: [    5.577176] amdgpu_driver_load_kms (drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c:146) amdgpu
>> kernel: [    5.577275] amdgpu_pci_probe (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2237) amdgpu
>> kernel: [    5.577373] local_pci_probe (drivers/pci/pci-driver.c:324)
>> kernel: [    5.577377] pci_device_probe (drivers/pci/pci-driver.c:392 drivers/pci/pci-driver.c:417 drivers/pci/pci-driver.c:460)
>> kernel: [    5.577381] really_probe (drivers/base/dd.c:579 drivers/base/dd.c:658)
>> kernel: [    5.577386] __driver_probe_device (drivers/base/dd.c:800)
>> kernel: [    5.577389] driver_probe_device (drivers/base/dd.c:830)
>> kernel: [    5.577392] __driver_attach (drivers/base/dd.c:1217)
>> kernel: [    5.577396] ? __pfx___driver_attach (drivers/base/dd.c:1157)
>> kernel: [    5.577399] bus_for_each_dev (drivers/base/bus.c:368)
>> kernel: [    5.577402] driver_attach (drivers/base/dd.c:1234)
>> kernel: [    5.577405] bus_add_driver (drivers/base/bus.c:674)
>> kernel: [    5.577409] driver_register (drivers/base/driver.c:246)
>> kernel: [    5.577411] ? __pfx_amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2497) amdgpu
>> kernel: [    5.577521] __pci_register_driver (drivers/pci/pci-driver.c:1456)
>> kernel: [    5.577524] amdgpu_init (drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c:2805) amdgpu
>> kernel: [    5.577628] do_one_initcall (init/main.c:1236)
>> kernel: [    5.577632] ? kmalloc_trace (mm/slub.c:3816 mm/slub.c:3860 mm/slub.c:4007)
>> kernel: [    5.577637] do_init_module (kernel/module/main.c:2533)
>> kernel: [    5.577640] load_module (kernel/module/main.c:2984)
>> kernel: [    5.577647] init_module_from_file (kernel/module/main.c:3151)
>> kernel: [    5.577649] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577652] ? init_module_from_file (kernel/module/main.c:3151)
>> kernel: [    5.577657] idempotent_init_module (kernel/module/main.c:3168)
>> kernel: [    5.577661] __x64_sys_finit_module (./include/linux/file.h:45 kernel/module/main.c:3190 kernel/module/main.c:3172 kernel/module/main.c:3172)
>> kernel: [    5.577664] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
>> kernel: [    5.577668] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577671] ? ksys_mmap_pgoff (mm/mmap.c:1428)
>> kernel: [    5.577675] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577678] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577681] ? syscall_exit_to_user_mode (kernel/entry/common.c:215)
>> kernel: [    5.577684] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577687] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>> kernel: [    5.577689] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577692] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>> kernel: [    5.577695] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577698] ? do_syscall_64 (./arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:98)
>> kernel: [    5.577700] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:181)
>> kernel: [    5.577703] ? sysvec_call_function (arch/x86/kernel/smp.c:253 (discriminator 69))
>> kernel: [    5.577707] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)
>> kernel: [    5.577709] RIP: 0033:0x7fdaa331e88d
>> kernel: [    5.577724] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
>> All code
>> ========
>>     0:	5b                   	pop    %rbx
>>     1:	41 5c                	pop    %r12
>>     3:	c3                   	ret
>>     4:	66 0f 1f 84 00 00 00 	nopw   0x0(%rax,%rax,1)
>>     b:	00 00
>>     d:	f3 0f 1e fa          	endbr64
>>    11:	48 89 f8             	mov    %rdi,%rax
>>    14:	48 89 f7             	mov    %rsi,%rdi
>>    17:	48 89 d6             	mov    %rdx,%rsi
>>    1a:	48 89 ca             	mov    %rcx,%rdx
>>    1d:	4d 89 c2             	mov    %r8,%r10
>>    20:	4d 89 c8             	mov    %r9,%r8
>>    23:	4c 8b 4c 24 08       	mov    0x8(%rsp),%r9
>>    28:	0f 05                	syscall
>>    2a:*	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax		<-- trapping instruction
>>    30:	73 01                	jae    0x33
>>    32:	c3                   	ret
>>    33:	48 8b 0d 73 b5 0f 00 	mov    0xfb573(%rip),%rcx        # 0xfb5ad
>>    3a:	f7 d8                	neg    %eax
>>    3c:	64 89 01             	mov    %eax,%fs:(%rcx)
>>    3f:	48                   	rex.W
>>
>> Code starting with the faulting instruction
>> ===========================================
>>     0:	48 3d 01 f0 ff ff    	cmp    $0xfffffffffffff001,%rax
>>     6:	73 01                	jae    0x9
>>     8:	c3                   	ret
>>     9:	48 8b 0d 73 b5 0f 00 	mov    0xfb573(%rip),%rcx        # 0xfb583
>>    10:	f7 d8                	neg    %eax
>>    12:	64 89 01             	mov    %eax,%fs:(%rcx)
>>    15:	48                   	rex.W
>> kernel: [    5.577729] RSP: 002b:00007ffeb4f87d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>> kernel: [    5.577733] RAX: ffffffffffffffda RBX: 000055aedf3eeeb0 RCX: 00007fdaa331e88d
>> kernel: [    5.577736] RDX: 0000000000000000 RSI: 000055aedf3efb80 RDI: 000000000000001a
>> kernel: [    5.577738] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
>> kernel: [    5.577741] R10: 000000000000001a R11: 0000000000000246 R12: 000055aedf3efb80
>> kernel: [    5.577744] R13: 000055aedf3f2060 R14: 0000000000000000 R15: 000055aedf2b1220
>> kernel: [    5.577748]  </TASK>
>> kernel: [    5.577750] Modules linked in: intel_rapl_msr intel_rapl_common amdgpu(+) edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_codec_hdmi ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp snd_hda_intel aesni_intel drm_exec snd_intel_dspcfg crypto_simd gpu_sched snd_intel_sdw_acpi cryptd nls_iso8859_1 drm_buddy snd_hda_codec snd_seq_midi drm_suballoc_helper snd_seq_midi_event drm_ttm_helper joydev snd_hda_core input_leds ttm rapl snd_rawmidi snd_hwdep drm_display_helper snd_seq snd_pcm wmi_bmof cec k10temp snd_seq_device ccp rc_core snd_timer snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid crc32_pclmul nvme r8169 ahci nvme_core i2c_piix4 xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt
>> kernel: [    5.577817] CR2: 0000000000000008
>> kernel: [    5.577820] ---[ end trace 0000000000000000 ]---
>> kernel: [    5.914230] RIP: 0010:gfx_v10_0_early_init (drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:4009 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:7478) amdgpu
>> kernel: [    5.914388] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>> All code
>> ========
>>     0:	8d 55 a8             	lea    -0x58(%rbp),%edx
>>     3:	4c 89 ff             	mov    %r15,%rdi
>>     6:	e8 e4 83 ec ff       	call   0xffffffffffec83ef
>>     b:	41 89 c2             	mov    %eax,%r10d
>>     e:	83 f8 ed             	cmp    $0xffffffed,%eax
>>    11:	0f 84 b3 fd ff ff    	je     0xfffffffffffffdca
>>    17:	85 c0                	test   %eax,%eax
>>    19:	74 05                	je     0x20
>>    1b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>>    20:	49 8b 87 08 87 01 00 	mov    0x18708(%r15),%rax
>>    27:	4c 89 ff             	mov    %r15,%rdi
>>    2a:*	48 8b 40 08          	mov    0x8(%rax),%rax		<-- trapping instruction
>>    2e:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>    32:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>    36:	e8 e4 42 fb ff       	call   0xfffffffffffb431f
>>    3b:	41 89 c2             	mov    %eax,%r10d
>>    3e:	85 c0                	test   %eax,%eax
>>
>> Code starting with the faulting instruction
>> ===========================================
>>     0:	48 8b 40 08          	mov    0x8(%rax),%rax
>>     4:	0f b7 50 0a          	movzwl 0xa(%rax),%edx
>>     8:	0f b7 70 08          	movzwl 0x8(%rax),%esi
>>     c:	e8 e4 42 fb ff       	call   0xfffffffffffb42f5
>>    11:	41 89 c2             	mov    %eax,%r10d
>>    14:	85 c0                	test   %eax,%eax
>> rsyslogd: rsyslogd's groupid changed to 111
>> kernel: [    5.914394] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>> kernel: [    5.914397] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>> kernel: [    5.914399] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>> kernel: [    5.914402] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>> kernel: [    5.914405] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>> kernel: [    5.914408] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>> kernel: [    5.914410] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>> kernel: [    5.914414] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> kernel: [    5.914416] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>> kernel: [    5.914419] PKRU: 55555554
>>
>> Best regards,
>> Mirsad
>>
>> On 1/18/24 18:23, Mirsad Todorovac wrote:
>>> Hi,
>>>
>>> Unfortunately, I was not able to reboot into this kernel again to decode the stack trace, but I thought
>>> that any information about the NULL pointer dereference was better than none.
>>>
>>> The system is Ubuntu 23.10 Mantic with an AMD Navi 23 [Radeon RX 6600/6600 XT/6600M]
>>> graphics card.
>>>
>>> Please find the config and the hw listing attached.
>>>
>>> Best regards,
>>> Mirsad
>>
>>
>>
>>> kernel: [    5.576702] BUG: kernel NULL pointer dereference, address: 0000000000000008
>>> kernel: [    5.576707] #PF: supervisor read access in kernel mode
>>> kernel: [    5.576710] #PF: error_code(0x0000) - not-present page
>>> kernel: [    5.576712] PGD 0 P4D 0
>>> kernel: [    5.576715] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>> kernel: [    5.576718] CPU: 9 PID: 650 Comm: systemd-udevd Not tainted 6.7.0-rtl-v0.2-nokcsan-09928-g052d534373b7 #2
>>> kernel: [    5.576723] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
>>> kernel: [    5.576726] RIP: 0010:gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>> kernel: [    5.576872] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>> kernel: [    5.576878] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>> kernel: [    5.576881] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>> kernel: [    5.576884] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>> kernel: [    5.576886] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>> kernel: [    5.576889] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>> kernel: [    5.576892] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>> kernel: [    5.576895] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>> kernel: [    5.576898] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> kernel: [    5.576900] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>> kernel: [    5.576903] PKRU: 55555554
>>> kernel: [    5.576905] Call Trace:
>>> kernel: [    5.576907]  <TASK>
>>> kernel: [    5.576909]  ? show_regs+0x72/0x90
>>> kernel: [    5.576914]  ? __die+0x25/0x80
>>> kernel: [    5.576917]  ? page_fault_oops+0x154/0x4c0
>>> kernel: [    5.576921]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.576925]  ? crypto_alloc_tfmmem.isra.0+0x35/0x70
>>> kernel: [    5.576930]  ? do_user_addr_fault+0x30e/0x6e0
>>> kernel: [    5.576934]  ? exc_page_fault+0x84/0x1b0
>>> kernel: [    5.576937]  ? asm_exc_page_fault+0x27/0x30
>>> kernel: [    5.576942]  ? gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>> kernel: [    5.577056]  amdgpu_device_init+0xefa/0x2de0 [amdgpu]
>>> kernel: [    5.577158]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577161]  ? pci_bus_read_config_word+0x47/0x90
>>> kernel: [    5.577166]  ? pci_read_config_word+0x27/0x60
>>> kernel: [    5.577168]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577171]  ? do_pci_enable_device+0xe1/0x110
>>> kernel: [    5.577176]  amdgpu_driver_load_kms+0x1a/0x1c0 [amdgpu]
>>> kernel: [    5.577275]  amdgpu_pci_probe+0x1a8/0x5e0 [amdgpu]
>>> kernel: [    5.577373]  local_pci_probe+0x48/0xb0
>>> kernel: [    5.577377]  pci_device_probe+0xc8/0x290
>>> kernel: [    5.577381]  really_probe+0x1d2/0x440
>>> kernel: [    5.577386]  __driver_probe_device+0x8a/0x190
>>> kernel: [    5.577389]  driver_probe_device+0x23/0xd0
>>> kernel: [    5.577392]  __driver_attach+0x10f/0x220
>>> kernel: [    5.577396]  ? __pfx___driver_attach+0x10/0x10
>>> kernel: [    5.577399]  bus_for_each_dev+0x7a/0xe0
>>> kernel: [    5.577402]  driver_attach+0x1e/0x30
>>> kernel: [    5.577405]  bus_add_driver+0x127/0x240
>>> kernel: [    5.577409]  driver_register+0x64/0x140
>>> kernel: [    5.577411]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
>>> kernel: [    5.577521]  __pci_register_driver+0x68/0x80
>>> kernel: [    5.577524]  amdgpu_init+0x69/0xff0 [amdgpu]
>>> kernel: [    5.577628]  do_one_initcall+0x46/0x330
>>> kernel: [    5.577632]  ? kmalloc_trace+0x136/0x370
>>> kernel: [    5.577637]  do_init_module+0x6a/0x280
>>> kernel: [    5.577640]  load_module+0x2419/0x2500
>>> kernel: [    5.577647]  init_module_from_file+0x9c/0xf0
>>> kernel: [    5.577649]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577652]  ? init_module_from_file+0x9c/0xf0
>>> kernel: [    5.577657]  idempotent_init_module+0x184/0x240
>>> kernel: [    5.577661]  __x64_sys_finit_module+0x64/0xd0
>>> kernel: [    5.577664]  do_syscall_64+0x76/0x140
>>> kernel: [    5.577668]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577671]  ? ksys_mmap_pgoff+0x123/0x270
>>> kernel: [    5.577675]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577678]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577681]  ? syscall_exit_to_user_mode+0x97/0x1e0
>>> kernel: [    5.577684]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577687]  ? do_syscall_64+0x85/0x140
>>> kernel: [    5.577689]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577692]  ? do_syscall_64+0x85/0x140
>>> kernel: [    5.577695]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577698]  ? do_syscall_64+0x85/0x140
>>> kernel: [    5.577700]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> kernel: [    5.577703]  ? sysvec_call_function+0x4e/0xb0
>>> kernel: [    5.577707]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
>>> kernel: [    5.577709] RIP: 0033:0x7fdaa331e88d
>>> kernel: [    5.577724] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
>>> kernel: [    5.577729] RSP: 002b:00007ffeb4f87d28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>> kernel: [    5.577733] RAX: ffffffffffffffda RBX: 000055aedf3eeeb0 RCX: 00007fdaa331e88d
>>> kernel: [    5.577736] RDX: 0000000000000000 RSI: 000055aedf3efb80 RDI: 000000000000001a
>>> kernel: [    5.577738] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
>>> kernel: [    5.577741] R10: 000000000000001a R11: 0000000000000246 R12: 000055aedf3efb80
>>> kernel: [    5.577744] R13: 000055aedf3f2060 R14: 0000000000000000 R15: 000055aedf2b1220
>>> kernel: [    5.577748]  </TASK>
>>> kernel: [    5.577750] Modules linked in: intel_rapl_msr intel_rapl_common amdgpu(+) edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass ledtrig_audio crct10dif_pclmul polyval_clmulni polyval_generic snd_hda_codec_hdmi ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 amdxcp snd_hda_intel aesni_intel drm_exec snd_intel_dspcfg crypto_simd gpu_sched snd_intel_sdw_acpi cryptd nls_iso8859_1 drm_buddy snd_hda_codec snd_seq_midi drm_suballoc_helper snd_seq_midi_event drm_ttm_helper joydev snd_hda_core input_leds ttm rapl snd_rawmidi snd_hwdep drm_display_helper snd_seq snd_pcm wmi_bmof cec k10temp snd_seq_device ccp rc_core snd_timer snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp drm parport efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid crc32_pclmul nvme r8169 ahci nvme_core i2c_piix4 xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt
>>> kernel: [    5.577817] CR2: 0000000000000008
>>> kernel: [    5.577820] ---[ end trace 0000000000000000 ]---
>>> kernel: [    5.914230] RIP: 0010:gfx_v10_0_early_init+0x5ab/0x8d0 [amdgpu]
>>> kernel: [    5.914388] Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff 85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0
>>> rsyslogd: rsyslogd's groupid changed to 111
>>> kernel: [    5.914394] RSP: 0018:ffffa5b3c103f720 EFLAGS: 00010282
>>> kernel: [    5.914397] RAX: 0000000000000000 RBX: ffffffffc1d73489 RCX: 0000000000000000
>>> kernel: [    5.914399] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff91ae4fa80000
>>> kernel: [    5.914402] RBP: ffffa5b3c103f7b0 R08: 0000000000000000 R09: 0000000000000000
>>> kernel: [    5.914405] R10: 00000000ffffffea R11: 0000000000000000 R12: ffff91ae4fa986e8
>>> kernel: [    5.914408] R13: ffff91ae4fa986d8 R14: ffff91ae4fa986f8 R15: ffff91ae4fa80000
>>> kernel: [    5.914410] FS:  00007fdaa343c8c0(0000) GS:ffff91bd58440000(0000) knlGS:0000000000000000
>>> kernel: [    5.914414] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> kernel: [    5.914416] CR2: 0000000000000008 CR3: 00000001222d0000 CR4: 0000000000750ef0
>>> kernel: [    5.914419] PKRU: 55555554
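
As a rough cross-check of the decoded output quoted above, the sketch below
recomputes two values from the raw log: the byte offset of the trapping
instruction (the byte wrapped in <..> on the "Code:" line) and the address
that "mov 0x8(%rax),%rax" reads when RAX is 0. The helper name
trapping_offset is hypothetical and is not part of any kernel script; the
byte string and register values are copied from the oops.

```python
def trapping_offset(code_line: str) -> int:
    """Return the index of the byte marked <..> in an oops 'Code:' line."""
    hex_bytes = code_line.split("Code:", 1)[1].split()
    for offset, token in enumerate(hex_bytes):
        if token.startswith("<") and token.endswith(">"):
            return offset
    raise ValueError("no <..> marker found in Code: line")


# 'Code:' line copied from the oops at timestamp 5.576872.
code = (
    "Code: 8d 55 a8 4c 89 ff e8 e4 83 ec ff 41 89 c2 83 f8 ed 0f 84 b3 fd ff ff "
    "85 c0 74 05 0f 1f 44 00 00 49 8b 87 08 87 01 00 4c 89 ff <48> 8b 40 08 "
    "0f b7 50 0a 0f b7 70 08 e8 e4 42 fb ff 41 89 c2 85 c0"
)

offset = trapping_offset(code)

# Register values from the oops: RAX held the base pointer, CR2 the fault address.
rax = 0x0000000000000000
cr2 = 0x0000000000000008
displacement = 0x8  # from 'mov 0x8(%rax),%rax'
fault_address = rax + displacement

print(hex(offset), hex(fault_address), fault_address == cr2)
```

The offset comes out as 0x2a, which agrees with the "2a:*" line in the
disassembly, and the computed address equals CR2. That is consistent with a
NULL base pointer whose field at offset 8 is being read, although the sketch
by itself does not say which pointer in gfx_v10_0_early_init was NULL.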

-- 
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union

"I see something approaching fast ... Will it be friends with me?"


Thread overview: 18+ messages
2024-01-18 17:23 BUG: kernel NULL pointer dereference, address: 0000000000000008 Mirsad Todorovac
2024-01-18 17:23 ` Mirsad Todorovac
2024-01-20 19:54 ` BUG [RESEND]: " Mirsad Todorovac
2024-01-20 19:54   ` Mirsad Todorovac
2024-01-22  8:34   ` Ma, Jun
2024-01-22  8:34     ` Ma, Jun
2024-01-22  8:34     ` Ma, Jun
2024-01-22 22:39     ` Mirsad Todorovac [this message]
2024-01-22 22:39       ` Mirsad Todorovac
2024-01-24 17:48     ` BUG [RESEND][NEW BUG]: " Mirsad Todorovac
2024-01-24 17:48       ` Mirsad Todorovac
2024-01-25  7:38       ` Ma, Jun
2024-01-25  7:38         ` Ma, Jun
2024-01-25  7:38         ` Ma, Jun
2024-01-25  9:29         ` Mirsad Todorovac
2024-01-25  9:29           ` Mirsad Todorovac
2024-01-25 18:02           ` Mirsad Todorovac
2024-01-25 18:02             ` Mirsad Todorovac
