All the mail mirrored from lore.kernel.org
* NULL pointer dereference in xenbus_thread->...
@ 2023-08-28 21:50 Marek Marczykowski-Górecki
  2023-10-22 14:14 ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 5+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-08-28 21:50 UTC
  To: xen-devel

Hi,

I've noticed a failure like this in Qubes's CI:

[  871.271292] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  871.275290] #PF: supervisor read access in kernel mode
[  871.277282] #PF: error_code(0x0000) - not-present page
[  871.279182] PGD 106fdb067 P4D 106fdb067 PUD 106fdc067 PMD 0 
[  871.281071] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  871.282698] CPU: 1 PID: 28 Comm: xenbus Not tainted 6.1.43-1.qubes.fc37.x86_64 #1
[  871.285222] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
[  871.288883] RIP: e030:__wake_up_common+0x4c/0x180
[  871.292838] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
[  871.299776] RSP: e02b:ffffc900400f7e10 EFLAGS: 00010082
[  871.301656] RAX: 0000000000000000 RBX: ffff88810541ce98 RCX: 0000000000000000
[  871.304255] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88810541ce90
[  871.306714] RBP: ffffc900400f0280 R08: ffffffffffffffe8 R09: ffffc900400f7e68
[  871.309937] R10: 0000000000007ff0 R11: ffff888100ad3000 R12: ffffc900400f7e68
[  871.312326] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  871.314647] FS:  0000000000000000(0000) GS:ffff88813ff00000(0000) knlGS:0000000000000000
[  871.317677] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  871.319644] CR2: 0000000000000000 CR3: 00000001067fe000 CR4: 0000000000040660
[  871.321973] Call Trace:
[  871.322782]  <TASK>
[  871.323494]  ? show_trace_log_lvl+0x1d3/0x2ef
[  871.324901]  ? show_trace_log_lvl+0x1d3/0x2ef
[  871.326310]  ? show_trace_log_lvl+0x1d3/0x2ef
[  871.327721]  ? __wake_up_common_lock+0x82/0xd0
[  871.329147]  ? __die_body.cold+0x8/0xd
[  871.330378]  ? page_fault_oops+0x163/0x1a0
[  871.331691]  ? exc_page_fault+0x70/0x170
[  871.332946]  ? asm_exc_page_fault+0x22/0x30
[  871.334454]  ? __wake_up_common+0x4c/0x180
[  871.335777]  __wake_up_common_lock+0x82/0xd0
[  871.337183]  ? process_writes+0x240/0x240
[  871.338461]  process_msg+0x18e/0x2f0
[  871.339627]  xenbus_thread+0x165/0x1c0
[  871.340830]  ? cpuusage_read+0x10/0x10
[  871.342032]  kthread+0xe9/0x110
[  871.343317]  ? kthread_complete_and_exit+0x20/0x20
[  871.345020]  ret_from_fork+0x22/0x30
[  871.346239]  </TASK>
[  871.347060] Modules linked in: snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device joydev snd_pcm intel_rapl_msr ppdev intel_rapl_common snd_timer pcspkr e1000e snd soundcore i2c_piix4 parport_pc parport loop fuse xenfs dm_crypt crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic floppy ghash_clmulni_intel sha512_ssse3 serio_raw virtio_scsi virtio_console bochs xhci_pci xhci_pci_renesas xhci_hcd qemu_fw_cfg drm_vram_helper drm_ttm_helper ttm ata_generic pata_acpi xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac scsi_dh_emc scsi_dh_alua uinput dm_multipath
[  871.368892] CR2: 0000000000000000
[  871.370160] ---[ end trace 0000000000000000 ]---
[  871.371719] RIP: e030:__wake_up_common+0x4c/0x180
[  871.373273] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
[  871.379866] RSP: e02b:ffffc900400f7e10 EFLAGS: 00010082
[  871.381689] RAX: 0000000000000000 RBX: ffff88810541ce98 RCX: 0000000000000000
[  871.383971] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88810541ce90
[  871.386235] RBP: ffffc900400f0280 R08: ffffffffffffffe8 R09: ffffc900400f7e68
[  871.388521] R10: 0000000000007ff0 R11: ffff888100ad3000 R12: ffffc900400f7e68
[  871.390789] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  871.393101] FS:  0000000000000000(0000) GS:ffff88813ff00000(0000) knlGS:0000000000000000
[  871.395671] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[  871.397863] CR2: 0000000000000000 CR3: 00000001067fe000 CR4: 0000000000040660
[  871.400441] Kernel panic - not syncing: Fatal exception
[  871.402171] Kernel Offset: disabled
(XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.

This isn't the first time I've seen a similar crash, but I can't
reproduce it reliably. A restarted test usually passes.
Note this is Xen nested in KVM, so it could well be some oddity of
nested virt, although judging by the stack trace that seems unlikely;
more likely it is a race condition hit only on slower systems.
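
As a reading aid for the call trace above, here is a minimal Python sketch
that separates the frames the unwinder confirmed from the ones prefixed with
"?" (addresses found on the stack but not confirmed as part of the call
chain). The helper name `parse_frames` and the dict layout are illustrative
assumptions; only the frame lines are taken from the report.

```python
import re

# Frames copied from the call trace in the report. A leading "?" marks an
# address that was found on the stack but not confirmed by the unwinder.
TRACE = """\
 ? __wake_up_common+0x4c/0x180
 __wake_up_common_lock+0x82/0xd0
 ? process_writes+0x240/0x240
 process_msg+0x18e/0x2f0
 xenbus_thread+0x165/0x1c0
 ? cpuusage_read+0x10/0x10
 kthread+0xe9/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x22/0x30
"""

# symbol+0xoffset/0xsize, optionally prefixed with "? "
FRAME_RE = re.compile(r"^\s*(\?\s+)?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)\s*$")


def parse_frames(text):
    """Parse trace lines into dicts; lines that do not match are skipped."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match is None:
            continue
        offset = int(match.group(3), 16)
        size = int(match.group(4), 16)
        frames.append({
            "symbol": match.group(2),
            "reliable": match.group(1) is None,
            "offset": offset,
            "size": size,
            # offset == size: the address is the first byte past the symbol
            "past_end": offset == size,
        })
    return frames


frames = parse_frames(TRACE)
reliable_chain = [f["symbol"] for f in frames if f["reliable"]]
past_end = [f["symbol"] for f in frames if f["past_end"]]

print(" <- ".join(reliable_chain))
print("past end of symbol:", past_end)
```

The confirmed chain is __wake_up_common_lock <- process_msg <- xenbus_thread
<- kthread <- ret_from_fork. Entries such as process_writes+0x240/0x240,
where the offset equals the symbol size, probably name whatever function
starts right after that symbol rather than a real call from it.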

Unfortunately I don't have symbols for this kernel handy, but there is a
single wake_up() call in process_writes(), so the missing symbols
shouldn't be an issue.
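
Even without symbols, the "Code:" bytes and the register dump can be
cross-checked by hand. The sketch below replays the five instructions up to
and including the faulting one (decoded from the bytes around the `<49>`
marker) and checks that a NULL pointer loaded from RDI+8 reproduces the
dumped RAX, RBX, R08 and CR2. Two things are assumptions, not facts from the
report: that RBX held the same value as RDI when this sequence started, and
the reading that the NULL is the `next` pointer of the wait queue's list
head.

```python
MASK64 = (1 << 64) - 1

# Register values copied from the oops in the report.
RAX = 0x0000000000000000
RBX = 0xffff88810541ce98
RDI = 0xffff88810541ce90
R08 = 0xffffffffffffffe8
CR2 = 0x0000000000000000


def effective_address(base, displacement):
    """base + displacement with 64-bit wraparound, as the CPU computes it."""
    return (base + displacement) & MASK64


def replay(rdi, loaded_pointer):
    """Replay the instructions before the fault for a given loaded pointer."""
    rbx = rdi                              # assumption: RBX == RDI at this point
    rax = loaded_pointer                   # 48 8b 43 08  mov 0x8(%rbx),%rax
    r8 = effective_address(rax, -0x18)     # 4c 8d 40 e8  lea -0x18(%rax),%r8
    rbx = effective_address(rbx, 0x8)      # 48 83 c3 08  add $0x8,%rbx
    rax = effective_address(r8, 0x18)      # 49 8d 40 18  lea 0x18(%r8),%rax
    fault = effective_address(r8, 0x18)    # 49 8b 40 18  mov 0x18(%r8),%rax (faults)
    return {"rax": rax, "rbx": rbx, "r8": r8, "fault": fault}


# Hypothesis: the pointer read from RDI+8 was NULL.
state = replay(RDI, 0)
dump = {"rax": RAX, "rbx": RBX, "r8": R08, "fault": CR2}

print({name: hex(value) for name, value in state.items()})
print("matches register dump:", state == dump)
```

If the hypothesis holds, R08 is just NULL minus 0x18 (a container_of-style
adjustment), and the faulting read of 0x18(%r8) lands on address 0, which is
what CR2 reports.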

Any ideas?

Full log at https://openqa.qubes-os.org/tests/80779/logfile?filename=serial0.txt

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: NULL pointer dereference in xenbus_thread->...
  2023-08-28 21:50 NULL pointer dereference in xenbus_thread-> Marek Marczykowski-Górecki
@ 2023-10-22 14:14 ` Marek Marczykowski-Górecki
  2024-03-25 16:17   ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 5+ messages in thread
From: Marek Marczykowski-Górecki @ 2023-10-22 14:14 UTC
  To: xen-devel

On Mon, Aug 28, 2023 at 11:50:36PM +0200, Marek Marczykowski-Górecki wrote:
> Hi,
> 
> I've noticed a failure like this in Qubes's CI:
> 
> [  871.271292] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  871.275290] #PF: supervisor read access in kernel mode
> [  871.277282] #PF: error_code(0x0000) - not-present page
> [  871.279182] PGD 106fdb067 P4D 106fdb067 PUD 106fdc067 PMD 0 
> [  871.281071] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [  871.282698] CPU: 1 PID: 28 Comm: xenbus Not tainted 6.1.43-1.qubes.fc37.x86_64 #1
> [  871.285222] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
> [  871.288883] RIP: e030:__wake_up_common+0x4c/0x180
> [  871.292838] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
> [  871.299776] RSP: e02b:ffffc900400f7e10 EFLAGS: 00010082
> [  871.301656] RAX: 0000000000000000 RBX: ffff88810541ce98 RCX: 0000000000000000
> [  871.304255] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88810541ce90
> [  871.306714] RBP: ffffc900400f0280 R08: ffffffffffffffe8 R09: ffffc900400f7e68
> [  871.309937] R10: 0000000000007ff0 R11: ffff888100ad3000 R12: ffffc900400f7e68
> [  871.312326] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [  871.314647] FS:  0000000000000000(0000) GS:ffff88813ff00000(0000) knlGS:0000000000000000
> [  871.317677] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  871.319644] CR2: 0000000000000000 CR3: 00000001067fe000 CR4: 0000000000040660
> [  871.321973] Call Trace:
> [  871.322782]  <TASK>
> [  871.323494]  ? show_trace_log_lvl+0x1d3/0x2ef
> [  871.324901]  ? show_trace_log_lvl+0x1d3/0x2ef
> [  871.326310]  ? show_trace_log_lvl+0x1d3/0x2ef
> [  871.327721]  ? __wake_up_common_lock+0x82/0xd0
> [  871.329147]  ? __die_body.cold+0x8/0xd
> [  871.330378]  ? page_fault_oops+0x163/0x1a0
> [  871.331691]  ? exc_page_fault+0x70/0x170
> [  871.332946]  ? asm_exc_page_fault+0x22/0x30
> [  871.334454]  ? __wake_up_common+0x4c/0x180
> [  871.335777]  __wake_up_common_lock+0x82/0xd0
> [  871.337183]  ? process_writes+0x240/0x240
> [  871.338461]  process_msg+0x18e/0x2f0
> [  871.339627]  xenbus_thread+0x165/0x1c0
> [  871.340830]  ? cpuusage_read+0x10/0x10
> [  871.342032]  kthread+0xe9/0x110
> [  871.343317]  ? kthread_complete_and_exit+0x20/0x20
> [  871.345020]  ret_from_fork+0x22/0x30
> [  871.346239]  </TASK>
> [  871.347060] Modules linked in: snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device joydev snd_pcm intel_rapl_msr ppdev intel_rapl_common snd_timer pcspkr e1000e snd soundcore i2c_piix4 parport_pc parport loop fuse xenfs dm_crypt crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic floppy ghash_clmulni_intel sha512_ssse3 serio_raw virtio_scsi virtio_console bochs xhci_pci xhci_pci_renesas xhci_hcd qemu_fw_cfg drm_vram_helper drm_ttm_helper ttm ata_generic pata_acpi xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac scsi_dh_emc scsi_dh_alua uinput dm_multipath
> [  871.368892] CR2: 0000000000000000
> [  871.370160] ---[ end trace 0000000000000000 ]---
> [  871.371719] RIP: e030:__wake_up_common+0x4c/0x180
> [  871.373273] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
> [  871.379866] RSP: e02b:ffffc900400f7e10 EFLAGS: 00010082
> [  871.381689] RAX: 0000000000000000 RBX: ffff88810541ce98 RCX: 0000000000000000
> [  871.383971] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88810541ce90
> [  871.386235] RBP: ffffc900400f0280 R08: ffffffffffffffe8 R09: ffffc900400f7e68
> [  871.388521] R10: 0000000000007ff0 R11: ffff888100ad3000 R12: ffffc900400f7e68
> [  871.390789] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [  871.393101] FS:  0000000000000000(0000) GS:ffff88813ff00000(0000) knlGS:0000000000000000
> [  871.395671] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  871.397863] CR2: 0000000000000000 CR3: 00000001067fe000 CR4: 0000000000040660
> [  871.400441] Kernel panic - not syncing: Fatal exception
> [  871.402171] Kernel Offset: disabled
> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
> 
> This isn't the first time I've seen a similar crash, but I can't
> reproduce it reliably. A restarted test usually passes.
> Note this is Xen nested in KVM, so it could well be some oddity of
> nested virt, although judging by the stack trace that seems unlikely;
> more likely it is a race condition hit only on slower systems.

Recently I got the same crash in a domU on a real system too, and also
in a nested setup on a newer kernel, 6.1.57 (there it happened in dom0).
So this is still an issue, and it doesn't affect only the nested case :/

> Unfortunately I don't have symbols for this kernel handy, but there is a
> single wake_up() call in process_writes(), so the missing symbols
> shouldn't be an issue.
> 
> Any ideas?
> 
> Full log at https://openqa.qubes-os.org/tests/80779/logfile?filename=serial0.txt

More links at https://github.com/QubesOS/qubes-issues/issues/8638,
including a more recent stack trace.


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: NULL pointer dereference in xenbus_thread->...
  2023-10-22 14:14 ` Marek Marczykowski-Górecki
@ 2024-03-25 16:17   ` Marek Marczykowski-Górecki
  2024-03-26 11:00     ` Julien Grall
  0 siblings, 1 reply; 5+ messages in thread
From: Marek Marczykowski-Górecki @ 2024-03-25 16:17 UTC
  To: xen-devel

On Sun, Oct 22, 2023 at 04:14:30PM +0200, Marek Marczykowski-Górecki wrote:
> On Mon, Aug 28, 2023 at 11:50:36PM +0200, Marek Marczykowski-Górecki wrote:
> > Hi,
> > 
> > I've noticed a failure like this in Qubes's CI:
> > 
> > [  871.271292] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > [  871.275290] #PF: supervisor read access in kernel mode
> > [  871.277282] #PF: error_code(0x0000) - not-present page
> > [  871.279182] PGD 106fdb067 P4D 106fdb067 PUD 106fdc067 PMD 0 
> > [  871.281071] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [  871.282698] CPU: 1 PID: 28 Comm: xenbus Not tainted 6.1.43-1.qubes.fc37.x86_64 #1
> > [  871.285222] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
> > [  871.288883] RIP: e030:__wake_up_common+0x4c/0x180
> > [  871.292838] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
> > [  871.299776] RSP: e02b:ffffc900400f7e10 EFLAGS: 00010082
> > [  871.301656] RAX: 0000000000000000 RBX: ffff88810541ce98 RCX: 0000000000000000
> > [  871.304255] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88810541ce90
> > [  871.306714] RBP: ffffc900400f0280 R08: ffffffffffffffe8 R09: ffffc900400f7e68
> > [  871.309937] R10: 0000000000007ff0 R11: ffff888100ad3000 R12: ffffc900400f7e68
> > [  871.312326] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > [  871.314647] FS:  0000000000000000(0000) GS:ffff88813ff00000(0000) knlGS:0000000000000000
> > [  871.317677] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  871.319644] CR2: 0000000000000000 CR3: 00000001067fe000 CR4: 0000000000040660
> > [  871.321973] Call Trace:
> > [  871.322782]  <TASK>
> > [  871.323494]  ? show_trace_log_lvl+0x1d3/0x2ef
> > [  871.324901]  ? show_trace_log_lvl+0x1d3/0x2ef
> > [  871.326310]  ? show_trace_log_lvl+0x1d3/0x2ef
> > [  871.327721]  ? __wake_up_common_lock+0x82/0xd0
> > [  871.329147]  ? __die_body.cold+0x8/0xd
> > [  871.330378]  ? page_fault_oops+0x163/0x1a0
> > [  871.331691]  ? exc_page_fault+0x70/0x170
> > [  871.332946]  ? asm_exc_page_fault+0x22/0x30
> > [  871.334454]  ? __wake_up_common+0x4c/0x180
> > [  871.335777]  __wake_up_common_lock+0x82/0xd0
> > [  871.337183]  ? process_writes+0x240/0x240
> > [  871.338461]  process_msg+0x18e/0x2f0
> > [  871.339627]  xenbus_thread+0x165/0x1c0
> > [  871.340830]  ? cpuusage_read+0x10/0x10
> > [  871.342032]  kthread+0xe9/0x110
> > [  871.343317]  ? kthread_complete_and_exit+0x20/0x20
> > [  871.345020]  ret_from_fork+0x22/0x30
> > [  871.346239]  </TASK>
> > [  871.347060] Modules linked in: snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device joydev snd_pcm intel_rapl_msr ppdev intel_rapl_common snd_timer pcspkr e1000e snd soundcore i2c_piix4 parport_pc parport loop fuse xenfs dm_crypt crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic floppy ghash_clmulni_intel sha512_ssse3 serio_raw virtio_scsi virtio_console bochs xhci_pci xhci_pci_renesas xhci_hcd qemu_fw_cfg drm_vram_helper drm_ttm_helper ttm ata_generic pata_acpi xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac scsi_dh_emc scsi_dh_alua uinput dm_multipath
> > [  871.368892] CR2: 0000000000000000
> > [  871.370160] ---[ end trace 0000000000000000 ]---
> > [  871.371719] RIP: e030:__wake_up_common+0x4c/0x180
> > [  871.373273] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
> > [  871.379866] RSP: e02b:ffffc900400f7e10 EFLAGS: 00010082
> > [  871.381689] RAX: 0000000000000000 RBX: ffff88810541ce98 RCX: 0000000000000000
> > [  871.383971] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88810541ce90
> > [  871.386235] RBP: ffffc900400f0280 R08: ffffffffffffffe8 R09: ffffc900400f7e68
> > [  871.388521] R10: 0000000000007ff0 R11: ffff888100ad3000 R12: ffffc900400f7e68
> > [  871.390789] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > [  871.393101] FS:  0000000000000000(0000) GS:ffff88813ff00000(0000) knlGS:0000000000000000
> > [  871.395671] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  871.397863] CR2: 0000000000000000 CR3: 00000001067fe000 CR4: 0000000000040660
> > [  871.400441] Kernel panic - not syncing: Fatal exception
> > [  871.402171] Kernel Offset: disabled
> > (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
> > 
> > This isn't the first time I've seen a similar crash, but I can't
> > reproduce it reliably. A restarted test usually passes.
> > Note this is Xen nested in KVM, so it could well be some oddity of
> > nested virt, although judging by the stack trace that seems unlikely;
> > more likely it is a race condition hit only on slower systems.
> 
> Recently I got the same crash in a domU on a real system too, and also
> in a nested setup on a newer kernel, 6.1.57 (there it happened in dom0).
> So this is still an issue, and it doesn't affect only the nested case :/
> 
> > Unfortunately I don't have symbols for this kernel handy, but there is a
> > single wake_up() call in process_writes(), so the missing symbols
> > shouldn't be an issue.
> > 
> > Any ideas?
> > 
> > Full log at https://openqa.qubes-os.org/tests/80779/logfile?filename=serial0.txt
> 
> More links at https://github.com/QubesOS/qubes-issues/issues/8638,
> including a more recent stack trace.

This happens on 6.1.75 too (I've added the new stack trace to the issue
above, but it's pretty similar).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: NULL pointer dereference in xenbus_thread->...
  2024-03-25 16:17   ` Marek Marczykowski-Górecki
@ 2024-03-26 11:00     ` Julien Grall
  2024-05-31 22:48       ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 5+ messages in thread
From: Julien Grall @ 2024-03-26 11:00 UTC
  To: Marek Marczykowski-Górecki, xen-devel; +Cc: Juergen Gross

Hi Marek,

+Juergen for visibility

When sending a bug report, I would suggest CCing the relevant people, as
otherwise it can get lost (not many people monitor xen-devel if they are
not CCed).

Cheers,

On 25/03/2024 16:17, Marek Marczykowski-Górecki wrote:
> On Sun, Oct 22, 2023 at 04:14:30PM +0200, Marek Marczykowski-Górecki wrote:
>> On Mon, Aug 28, 2023 at 11:50:36PM +0200, Marek Marczykowski-Górecki wrote:
>>> Hi,
>>>
>>> I've noticed a failure like this in Qubes's CI:
>>>
>>> [  871.271292] BUG: kernel NULL pointer dereference, address: 0000000000000000
>>> [  871.275290] #PF: supervisor read access in kernel mode
>>> [  871.277282] #PF: error_code(0x0000) - not-present page
>>> [  871.279182] PGD 106fdb067 P4D 106fdb067 PUD 106fdc067 PMD 0
>>> [  871.281071] Oops: 0000 [#1] PREEMPT SMP NOPTI
>>> [  871.282698] CPU: 1 PID: 28 Comm: xenbus Not tainted 6.1.43-1.qubes.fc37.x86_64 #1
>>> [  871.285222] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
>>> [  871.288883] RIP: e030:__wake_up_common+0x4c/0x180
>>> [  871.292838] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
>>> [  871.299776] RSP: e02b:ffffc900400f7e10 EFLAGS: 00010082
>>> [  871.301656] RAX: 0000000000000000 RBX: ffff88810541ce98 RCX: 0000000000000000
>>> [  871.304255] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88810541ce90
>>> [  871.306714] RBP: ffffc900400f0280 R08: ffffffffffffffe8 R09: ffffc900400f7e68
>>> [  871.309937] R10: 0000000000007ff0 R11: ffff888100ad3000 R12: ffffc900400f7e68
>>> [  871.312326] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>> [  871.314647] FS:  0000000000000000(0000) GS:ffff88813ff00000(0000) knlGS:0000000000000000
>>> [  871.317677] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  871.319644] CR2: 0000000000000000 CR3: 00000001067fe000 CR4: 0000000000040660
>>> [  871.321973] Call Trace:
>>> [  871.322782]  <TASK>
>>> [  871.323494]  ? show_trace_log_lvl+0x1d3/0x2ef
>>> [  871.324901]  ? show_trace_log_lvl+0x1d3/0x2ef
>>> [  871.326310]  ? show_trace_log_lvl+0x1d3/0x2ef
>>> [  871.327721]  ? __wake_up_common_lock+0x82/0xd0
>>> [  871.329147]  ? __die_body.cold+0x8/0xd
>>> [  871.330378]  ? page_fault_oops+0x163/0x1a0
>>> [  871.331691]  ? exc_page_fault+0x70/0x170
>>> [  871.332946]  ? asm_exc_page_fault+0x22/0x30
>>> [  871.334454]  ? __wake_up_common+0x4c/0x180
>>> [  871.335777]  __wake_up_common_lock+0x82/0xd0
>>> [  871.337183]  ? process_writes+0x240/0x240
>>> [  871.338461]  process_msg+0x18e/0x2f0
>>> [  871.339627]  xenbus_thread+0x165/0x1c0
>>> [  871.340830]  ? cpuusage_read+0x10/0x10
>>> [  871.342032]  kthread+0xe9/0x110
>>> [  871.343317]  ? kthread_complete_and_exit+0x20/0x20
>>> [  871.345020]  ret_from_fork+0x22/0x30
>>> [  871.346239]  </TASK>
>>> [  871.347060] Modules linked in: snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device joydev snd_pcm intel_rapl_msr ppdev intel_rapl_common snd_timer pcspkr e1000e snd soundcore i2c_piix4 parport_pc parport loop fuse xenfs dm_crypt crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic floppy ghash_clmulni_intel sha512_ssse3 serio_raw virtio_scsi virtio_console bochs xhci_pci xhci_pci_renesas xhci_hcd qemu_fw_cfg drm_vram_helper drm_ttm_helper ttm ata_generic pata_acpi xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac scsi_dh_emc scsi_dh_alua uinput dm_multipath
>>> [  871.368892] CR2: 0000000000000000
>>> [  871.370160] ---[ end trace 0000000000000000 ]---
>>> [  871.371719] RIP: e030:__wake_up_common+0x4c/0x180
>>> [  871.373273] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
>>> [  871.379866] RSP: e02b:ffffc900400f7e10 EFLAGS: 00010082
>>> [  871.381689] RAX: 0000000000000000 RBX: ffff88810541ce98 RCX: 0000000000000000
>>> [  871.383971] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88810541ce90
>>> [  871.386235] RBP: ffffc900400f0280 R08: ffffffffffffffe8 R09: ffffc900400f7e68
>>> [  871.388521] R10: 0000000000007ff0 R11: ffff888100ad3000 R12: ffffc900400f7e68
>>> [  871.390789] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>> [  871.393101] FS:  0000000000000000(0000) GS:ffff88813ff00000(0000) knlGS:0000000000000000
>>> [  871.395671] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  871.397863] CR2: 0000000000000000 CR3: 00000001067fe000 CR4: 0000000000040660
>>> [  871.400441] Kernel panic - not syncing: Fatal exception
>>> [  871.402171] Kernel Offset: disabled
>>> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
>>>
>>> This isn't the first time I've seen a similar crash, but I can't
>>> reproduce it reliably. A restarted test usually passes.
>>> Note this is Xen nested in KVM, so it could well be some oddity of
>>> nested virt, although judging by the stack trace that seems unlikely;
>>> more likely it is a race condition hit only on slower systems.
>>
>> Recently I got the same crash in a domU on a real system too, and also
>> in a nested setup on a newer kernel, 6.1.57 (there it happened in dom0).
>> So this is still an issue, and it doesn't affect only the nested case :/
>>
>>> Unfortunately I don't have symbols for this kernel handy, but there is a
>>> single wake_up() call in process_writes(), so the missing symbols
>>> shouldn't be an issue.
>>>
>>> Any ideas?
>>>
>>> Full log at https://openqa.qubes-os.org/tests/80779/logfile?filename=serial0.txt
>>
>> More links at https://github.com/QubesOS/qubes-issues/issues/8638,
>> including a more recent stack trace.
> 
> This happens on 6.1.75 too (I've added the new stack trace to the issue
> above, but it's pretty similar).
> 

-- 
Julien Grall



* Re: NULL pointer dereference in xenbus_thread->...
  2024-03-26 11:00     ` Julien Grall
@ 2024-05-31 22:48       ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 5+ messages in thread
From: Marek Marczykowski-Górecki @ 2024-05-31 22:48 UTC
  To: Julien Grall; +Cc: xen-devel, Juergen Gross

On Tue, Mar 26, 2024 at 11:00:50AM +0000, Julien Grall wrote:
> Hi Marek,
> 
> +Juergen for visibility
> 
> When sending a bug report, I would suggest CCing the relevant people, as
> otherwise it can get lost (not many people monitor xen-devel if they are
> not CCed).
> 
> Cheers,
> 
> On 25/03/2024 16:17, Marek Marczykowski-Górecki wrote:
> > On Sun, Oct 22, 2023 at 04:14:30PM +0200, Marek Marczykowski-Górecki wrote:
> > > On Mon, Aug 28, 2023 at 11:50:36PM +0200, Marek Marczykowski-Górecki wrote:
> > > > Hi,
> > > > 
> > > > I've noticed a failure like this in Qubes's CI:
> > > > 
> > > > [  871.271292] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > > > [  871.275290] #PF: supervisor read access in kernel mode
> > > > [  871.277282] #PF: error_code(0x0000) - not-present page
> > > > [  871.279182] PGD 106fdb067 P4D 106fdb067 PUD 106fdc067 PMD 0
> > > > [  871.281071] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > > > [  871.282698] CPU: 1 PID: 28 Comm: xenbus Not tainted 6.1.43-1.qubes.fc37.x86_64 #1
> > > > [  871.285222] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
> > > > [  871.288883] RIP: e030:__wake_up_common+0x4c/0x180
> > > > [  871.292838] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
> > > > [  871.299776] RSP: e02b:ffffc900400f7e10 EFLAGS: 00010082
> > > > [  871.301656] RAX: 0000000000000000 RBX: ffff88810541ce98 RCX: 0000000000000000
> > > > [  871.304255] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88810541ce90
> > > > [  871.306714] RBP: ffffc900400f0280 R08: ffffffffffffffe8 R09: ffffc900400f7e68
> > > > [  871.309937] R10: 0000000000007ff0 R11: ffff888100ad3000 R12: ffffc900400f7e68
> > > > [  871.312326] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > > > [  871.314647] FS:  0000000000000000(0000) GS:ffff88813ff00000(0000) knlGS:0000000000000000
> > > > [  871.317677] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  871.319644] CR2: 0000000000000000 CR3: 00000001067fe000 CR4: 0000000000040660
> > > > [  871.321973] Call Trace:
> > > > [  871.322782]  <TASK>
> > > > [  871.323494]  ? show_trace_log_lvl+0x1d3/0x2ef
> > > > [  871.324901]  ? show_trace_log_lvl+0x1d3/0x2ef
> > > > [  871.326310]  ? show_trace_log_lvl+0x1d3/0x2ef
> > > > [  871.327721]  ? __wake_up_common_lock+0x82/0xd0
> > > > [  871.329147]  ? __die_body.cold+0x8/0xd
> > > > [  871.330378]  ? page_fault_oops+0x163/0x1a0
> > > > [  871.331691]  ? exc_page_fault+0x70/0x170
> > > > [  871.332946]  ? asm_exc_page_fault+0x22/0x30
> > > > [  871.334454]  ? __wake_up_common+0x4c/0x180
> > > > [  871.335777]  __wake_up_common_lock+0x82/0xd0
> > > > [  871.337183]  ? process_writes+0x240/0x240
> > > > [  871.338461]  process_msg+0x18e/0x2f0
> > > > [  871.339627]  xenbus_thread+0x165/0x1c0
> > > > [  871.340830]  ? cpuusage_read+0x10/0x10
> > > > [  871.342032]  kthread+0xe9/0x110
> > > > [  871.343317]  ? kthread_complete_and_exit+0x20/0x20
> > > > [  871.345020]  ret_from_fork+0x22/0x30
> > > > [  871.346239]  </TASK>
> > > > [  871.347060] Modules linked in: snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device joydev snd_pcm intel_rapl_msr ppdev intel_rapl_common snd_timer pcspkr e1000e snd soundcore i2c_piix4 parport_pc parport loop fuse xenfs dm_crypt crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic floppy ghash_clmulni_intel sha512_ssse3 serio_raw virtio_scsi virtio_console bochs xhci_pci xhci_pci_renesas xhci_hcd qemu_fw_cfg drm_vram_helper drm_ttm_helper ttm ata_generic pata_acpi xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac scsi_dh_emc scsi_dh_alua uinput dm_multipath
> > > > [  871.368892] CR2: 0000000000000000
> > > > [  871.370160] ---[ end trace 0000000000000000 ]---
> > > > [  871.371719] RIP: e030:__wake_up_common+0x4c/0x180
> > > > [  871.373273] Code: 24 0c 89 4c 24 08 4d 85 c9 74 0a 41 f6 01 04 0f 85 a3 00 00 00 48 8b 43 08 4c 8d 40 e8 48 83 c3 08 49 8d 40 18 48 39 c3 74 5b <49> 8b 40 18 31 ed 4c 8d 70 e8 45 8b 28 41 f6 c5 04 75 5f 49 8b 40
> > > > [  871.379866] RSP: e02b:ffffc900400f7e10 EFLAGS: 00010082
> > > > [  871.381689] RAX: 0000000000000000 RBX: ffff88810541ce98 RCX: 0000000000000000
> > > > [  871.383971] RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff88810541ce90
> > > > [  871.386235] RBP: ffffc900400f0280 R08: ffffffffffffffe8 R09: ffffc900400f7e68
> > > > [  871.388521] R10: 0000000000007ff0 R11: ffff888100ad3000 R12: ffffc900400f7e68
> > > > [  871.390789] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > > > [  871.393101] FS:  0000000000000000(0000) GS:ffff88813ff00000(0000) knlGS:0000000000000000
> > > > [  871.395671] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  871.397863] CR2: 0000000000000000 CR3: 00000001067fe000 CR4: 0000000000040660
> > > > [  871.400441] Kernel panic - not syncing: Fatal exception
> > > > [  871.402171] Kernel Offset: disabled
> > > > (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
> > > > 
> > > > This isn't the first time I've seen a similar crash, but I can't
> > > > reproduce it reliably. A restarted test usually passes.
> > > > Note this is Xen nested in KVM, so it could well be some oddity of
> > > > nested virt, although judging by the stack trace that seems unlikely;
> > > > more likely it is a race condition hit only on slower systems.
> > > 
> > > Recently I got the same crash in a domU on a real system too, and also
> > > in a nested setup on a newer kernel, 6.1.57 (there it happened in dom0).
> > > So this is still an issue, and it doesn't affect only the nested case :/
> > > 
> > > > Unfortunately I don't have symbols for this kernel handy, but there is a
> > > > single wake_up() call in process_writes(), so the missing symbols
> > > > shouldn't be an issue.
> > > > 
> > > > Any ideas?
> > > > 
> > > > Full log at https://openqa.qubes-os.org/tests/80779/logfile?filename=serial0.txt
> > > 
> > > More links at https://github.com/QubesOS/qubes-issues/issues/8638,
> > > including a more recent stack trace.
> > 
> > This happens on 6.1.75 too (I've added the new stack trace to the issue
> > above, but it's pretty similar).

Recently I got a report from another user about a similar issue, this
time on 6.6.29. I still encounter this issue once a month or so myself,
but the user says they hit it much more often:
https://github.com/QubesOS/qubes-issues/issues/8638#issuecomment-2135419896
The extra conditions reported by the user are:
- an old AMD system (KGPE-D16 with Opteron 6282 SE) requiring
  `spec-ctrl=ibpb-entry=no-pv` to remain usable
- Whonix domU, which has a bunch of sysctl parameters changed, listed
  at:
  - https://github.com/Kicksecure/security-misc
  - https://github.com/Kicksecure/security-misc/blob/master/usr/lib/sysctl.d/990-security-misc.conf
  (unsure which are relevant, maybe `vm.swappiness=1`?)
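
One way to narrow down which sysctl changes matter would be to diff the
hardened settings against a stock domU. The following is only a minimal
sketch: the function names are made up for illustration, the sample fragment
is not the real 990-security-misc.conf, and the baseline values are
placeholders (60 is the usual kernel default for vm.swappiness; the
kernel.sysrq entry is invented).

```python
def parse_sysctl_conf(text):
    """Parse sysctl.d-style text into {key: value}; later lines win."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines and lines starting with "#" or ";" are ignored.
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        # A leading "-" only means "ignore errors when applying"; drop it.
        key = key.strip().lstrip("-").strip()
        settings[key] = value.strip()
    return settings


def changed_settings(candidate, baseline):
    """Return {key: (baseline_value, candidate_value)} for differing keys."""
    return {
        key: (baseline.get(key), value)
        for key, value in candidate.items()
        if baseline.get(key) != value
    }


# Hypothetical inputs: only vm.swappiness=1 comes from the thread.
HARDENED = """\
# illustrative fragment, not the real 990-security-misc.conf
vm.swappiness=1
-kernel.sysrq = 0
"""
BASELINE = {"vm.swappiness": "60", "kernel.sysrq": "0"}

diff = changed_settings(parse_sysctl_conf(HARDENED), BASELINE)
print(diff)
```

Applying the differing keys one at a time to an otherwise stock domU would
then show whether any single setting makes the crash more frequent.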


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
