Nouveau Archive mirror
* [Nouveau] nouveau bug in linux/6.1.38-2
@ 2023-08-02 21:28 Olaf Skibbe
  2023-08-04 12:02 ` Thorsten Leemhuis
  0 siblings, 1 reply; 11+ messages in thread
From: Olaf Skibbe @ 2023-08-02 21:28 UTC
  To: dri-devel, nouveau; +Cc: 1042753

Dear Maintainers,

Hereby I would like to report an apparent bug in the nouveau driver in
linux/6.1.38-2.

Running current Debian stable on a Dell Latitude E6510 with an
"NVIDIA Corporation GT218M" graphics card, the monitor turns black
after the GRUB screen. Switching to a console (Ctrl-Alt-F2) also shows
just a black screen. Access via ssh is possible.

~# uname -r
6.1.0-10-amd64

dmesg shows the following error message:

[    3.560153] WARNING: CPU: 0 PID: 176 at drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460 nvkm_dp_acquire+0x26a/0x490 [nouveau]
[    3.560287] Modules linked in: sd_mod t10_pi sr_mod crc64_rocksoft cdrom crc64 crc_t10dif crct10dif_generic nouveau(+) ahci libahci mxm_wmi i2c_algo_bit drm_display_helper libata cec rc_core drm_ttm_helper ttm scsi_mod e1000e drm_kms_helper ptp firewire_ohci sdhci_pci cqhci ehci_pci sdhci ehci_hcd firewire_core i2c_i801 crct10dif_pclmul crct10dif_common drm crc32_pclmul crc32c_intel psmouse usbcore mmc_core crc_itu_t pps_core scsi_common i2c_smbus lpc_ich usb_common battery video wmi button
[    3.560322] CPU: 0 PID: 176 Comm: kworker/u16:5 Not tainted 6.1.0-10-amd64 #1  Debian 6.1.38-2
[    3.560325] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17 05/12/2017
[    3.560327] Workqueue: nvkm-disp nv50_disp_super [nouveau]
[    3.560433] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
[    3.560538] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
[    3.560541] RSP: 0018:ffff9899c048bd60 EFLAGS: 00010246
[    3.560542] RAX: 0000000000041eb0 RBX: ffff88e0209d2600 RCX: 0000000000041eb0
[    3.560544] RDX: ffffffffc079f760 RSI: 0000000000000000 RDI: ffff9899c048bcf0
[    3.560545] RBP: 0000000000000001 R08: ffff9899c048bc64 R09: 0000000000005b76
[    3.560546] R10: 000000000000000d R11: ffff9899c048bde0 R12: 00000000ffffffea
[    3.560548] R13: ffff88e00b39e480 R14: 0000000000044d45 R15: 0000000000000000
[    3.560549] FS:  0000000000000000(0000) GS:ffff88e123c00000(0000) knlGS:0000000000000000
[    3.560551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.560552] CR2: 00007f57f4e90451 CR3: 0000000181410000 CR4: 00000000000006f0
[    3.560554] Call Trace:
[    3.560558]  <TASK>
[    3.560560]  ? __warn+0x7d/0xc0
[    3.560566]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
[    3.560671]  ? report_bug+0xe6/0x170
[    3.560675]  ? handle_bug+0x41/0x70
[    3.560679]  ? exc_invalid_op+0x13/0x60
[    3.560681]  ? asm_exc_invalid_op+0x16/0x20
[    3.560685]  ? init_reset_begun+0x20/0x20 [nouveau]
[    3.560769]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
[    3.560888]  nv50_disp_super_2_2+0x70/0x430 [nouveau]
[    3.560997]  nv50_disp_super+0x113/0x210 [nouveau]
[    3.561103]  process_one_work+0x1c7/0x380
[    3.561109]  worker_thread+0x4d/0x380
[    3.561113]  ? rescuer_thread+0x3a0/0x3a0
[    3.561116]  kthread+0xe9/0x110
[    3.561120]  ? kthread_complete_and_exit+0x20/0x20
[    3.561122]  ret_from_fork+0x22/0x30
[    3.561130]  </TASK>

Further information:

$ lspci -v -s $(lspci | grep -i vga | awk '{ print $1 }')
01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M] (rev a2) (prog-if 00 [VGA controller])
 	Subsystem: Dell Latitude E6510
 	Flags: bus master, fast devsel, latency 0, IRQ 27
 	Memory at e2000000 (32-bit, non-prefetchable) [size=16M]
 	Memory at d0000000 (64-bit, prefetchable) [size=256M]
 	Memory at e0000000 (64-bit, prefetchable) [size=32M]
 	I/O ports at 7000 [size=128]
 	Expansion ROM at 000c0000 [disabled] [size=128K]
 	Capabilities: <access denied>
 	Kernel driver in use: nouveau
 	Kernel modules: nouveau

I have already reported this bug to Debian, see
https://bugs.debian.org/1042753 for context.

With support (thanks Diederik!) I managed to figure out that the cause
was a regression between upstream kernel version 6.1.27 and 6.1.38.

I built a new 6.1.38 kernel with these commits reverted:

62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA device
5a144bad3e75 nouveau: fix client work fence deletion race

With that kernel the graphics work again.

Please inform me if further tests are required.

Cheers,
Olaf



* Re: [Nouveau] nouveau bug in linux/6.1.38-2
  2023-08-02 21:28 [Nouveau] nouveau bug in linux/6.1.38-2 Olaf Skibbe
@ 2023-08-04 12:02 ` Thorsten Leemhuis
  2023-08-04 12:15   ` Karol Herbst
  2023-08-31  9:40   ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 2 replies; 11+ messages in thread
From: Thorsten Leemhuis @ 2023-08-04 12:02 UTC
  To: Olaf Skibbe, dri-devel, nouveau
  Cc: Linux kernel regressions list, 1042753, stable@vger.kernel.org,
	Ben Skeggs

Hi!

On 02.08.23 23:28, Olaf Skibbe wrote:
> Dear Maintainers,
> 
> Hereby I would like to report an apparent bug in the nouveau driver in
> linux/6.1.38-2.

Thanks for your report. Maybe your problem is caused by an incomplete
backport. I CCed the maintainers of the driver (and the regressions
and stable lists); maybe one of them has an idea, as they know the
driver.

If they don't reply in the next few days, please check if the problem is
also present in mainline. If not, check if the latest 6.1.y release
already fixes this. If not, try to find out which of the four patches you
reverted to get things going is actually causing this (e.g. first revert
only the one that was applied last; then the last two; ...).
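
A minimal sketch of that stepwise approach, assuming a git checkout of
the stable tree and assuming that the order in which the commits are
listed in the report matches the order in which they were applied (this
is not confirmed here). The function name revert_plan is made up for
illustration. It only prints the revert command for each test build;
nothing is executed:

```shell
#!/bin/sh
# Commit IDs from the report, in the order listed there. Whether this is
# the order in which they landed in 6.1.y is an assumption.
COMMITS="62aecf23f3d1 fb725beca62d 90748be0f4f3 5a144bad3e75"

# Print one line per test build: build N reverts the last N commits.
revert_plan() {
    # Reverse the list so the last-listed commit is reverted first.
    rev=""
    for c in $COMMITS; do
        rev="$c $rev"
    done

    n=0
    reverted=""
    for c in $rev; do
        n=$((n + 1))
        reverted="$reverted $c"
        # Dry run: show the command instead of running it.
        echo "build $n: git revert --no-edit$reverted"
    done
}

revert_plan
```

The first build whose kernel shows a working display points at the
commit added to the revert list in that step.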

> Running current Debian stable on a Dell Latitude E6510 with an
> "NVIDIA Corporation GT218M" graphics card, the monitor turns black
> after the GRUB screen. Switching to a console (Ctrl-Alt-F2) also shows
> just a black screen. Access via ssh is possible.
> 
> ~# uname -r
> 6.1.0-10-amd64
> 
> dmesg shows the following error message:
> 
> [    3.560153] WARNING: CPU: 0 PID: 176 at
> drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460
> nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560287] Modules linked in: sd_mod t10_pi sr_mod crc64_rocksoft
> cdrom crc64 crc_t10dif crct10dif_generic nouveau(+) ahci libahci mxm_wmi
> i2c_algo_bit drm_display_helper libata cec rc_core drm_ttm_helper ttm
> scsi_mod e1000e drm_kms_helper ptp firewire_ohci sdhci_pci cqhci
> ehci_pci sdhci ehci_hcd firewire_core i2c_i801 crct10dif_pclmul
> crct10dif_common drm crc32_pclmul crc32c_intel psmouse usbcore mmc_core
> crc_itu_t pps_core scsi_common i2c_smbus lpc_ich usb_common battery
> video wmi button
> [    3.560322] CPU: 0 PID: 176 Comm: kworker/u16:5 Not tainted
> 6.1.0-10-amd64 #1  Debian 6.1.38-2
> [    3.560325] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17
> 05/12/2017
> [    3.560327] Workqueue: nvkm-disp nv50_disp_super [nouveau]
> [    3.560433] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560538] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37
> 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc
> cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
> [    3.560541] RSP: 0018:ffff9899c048bd60 EFLAGS: 00010246
> [    3.560542] RAX: 0000000000041eb0 RBX: ffff88e0209d2600 RCX:
> 0000000000041eb0
> [    3.560544] RDX: ffffffffc079f760 RSI: 0000000000000000 RDI:
> ffff9899c048bcf0
> [    3.560545] RBP: 0000000000000001 R08: ffff9899c048bc64 R09:
> 0000000000005b76
> [    3.560546] R10: 000000000000000d R11: ffff9899c048bde0 R12:
> 00000000ffffffea
> [    3.560548] R13: ffff88e00b39e480 R14: 0000000000044d45 R15:
> 0000000000000000
> [    3.560549] FS:  0000000000000000(0000) GS:ffff88e123c00000(0000)
> knlGS:0000000000000000
> [    3.560551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    3.560552] CR2: 00007f57f4e90451 CR3: 0000000181410000 CR4:
> 00000000000006f0
> [    3.560554] Call Trace:
> [    3.560558]  <TASK>
> [    3.560560]  ? __warn+0x7d/0xc0
> [    3.560566]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560671]  ? report_bug+0xe6/0x170
> [    3.560675]  ? handle_bug+0x41/0x70
> [    3.560679]  ? exc_invalid_op+0x13/0x60
> [    3.560681]  ? asm_exc_invalid_op+0x16/0x20
> [    3.560685]  ? init_reset_begun+0x20/0x20 [nouveau]
> [    3.560769]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    3.560888]  nv50_disp_super_2_2+0x70/0x430 [nouveau]
> [    3.560997]  nv50_disp_super+0x113/0x210 [nouveau]
> [    3.561103]  process_one_work+0x1c7/0x380
> [    3.561109]  worker_thread+0x4d/0x380
> [    3.561113]  ? rescuer_thread+0x3a0/0x3a0
> [    3.561116]  kthread+0xe9/0x110
> [    3.561120]  ? kthread_complete_and_exit+0x20/0x20
> [    3.561122]  ret_from_fork+0x22/0x30
> [    3.561130]  </TASK>
> 
> Further information:
> 
> $ lspci -v -s $(lspci | grep -i vga | awk '{ print $1 }')
> 01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M]
> (rev a2) (prog-if 00 [VGA controller])
>     Subsystem: Dell Latitude E6510
>     Flags: bus master, fast devsel, latency 0, IRQ 27
>     Memory at e2000000 (32-bit, non-prefetchable) [size=16M]
>     Memory at d0000000 (64-bit, prefetchable) [size=256M]
>     Memory at e0000000 (64-bit, prefetchable) [size=32M]
>     I/O ports at 7000 [size=128]
>     Expansion ROM at 000c0000 [disabled] [size=128K]
>     Capabilities: <access denied>
>     Kernel driver in use: nouveau
>     Kernel modules: nouveau
> 
> I have already reported this bug to Debian, see
> https://bugs.debian.org/1042753 for context.
> 
> With support (thanks Diederik!) I managed to figure out that the cause
> was a regression between upstream kernel version 6.1.27 and 6.1.38.
> 
> I built a new 6.1.38 kernel with these commits reverted:
> 
> 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
> fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
> 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA device
> 5a144bad3e75 nouveau: fix client work fence deletion race
> 
> With that kernel the graphics work again.
> 
> Please inform me if further tests are required.

FWIW, to be sure the issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, the Linux kernel regression tracking bot:

#regzbot ^introduced v6.1.27..v6.1.38
#regzbot title drm/nouveau: display stays black
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


* Re: [Nouveau] nouveau bug in linux/6.1.38-2
  2023-08-04 12:02 ` Thorsten Leemhuis
@ 2023-08-04 12:15   ` Karol Herbst
  2023-08-04 12:46     ` Olaf Skibbe
  2023-08-04 18:08     ` [Nouveau] " Olaf Skibbe
  2023-08-31  9:40   ` Linux regression tracking #update (Thorsten Leemhuis)
  1 sibling, 2 replies; 11+ messages in thread
From: Karol Herbst @ 2023-08-04 12:15 UTC
  To: Thorsten Leemhuis
  Cc: Linux kernel regressions list, Olaf Skibbe, nouveau, 1042753,
	dri-devel, Ben Skeggs, stable@vger.kernel.org

On Fri, Aug 4, 2023 at 2:02 PM Thorsten Leemhuis
<regressions@leemhuis.info> wrote:
>
> Hi!
>
> On 02.08.23 23:28, Olaf Skibbe wrote:
> > Dear Maintainers,
> >
> > Hereby I would like to report an apparent bug in the nouveau driver in
> > linux/6.1.38-2.
>
> Thanks for your report. Maybe your problem is caused by an incomplete
> backport. I CCed the maintainers of the driver (and the regressions
> and stable lists); maybe one of them has an idea, as they know the
> driver.
>
> If they don't reply in the next few days, please check if the problem is
> also present in mainline. If not, check if the latest 6.1.y release
> already fixes this. If not, try to find out which of the four patches you
> reverted to get things going is actually causing this (e.g. first revert
> only the one that was applied last; then the last two; ...).
>
> > Running current Debian stable on a Dell Latitude E6510 with an
> > "NVIDIA Corporation GT218M" graphics card, the monitor turns black
> > after the GRUB screen. Switching to a console (Ctrl-Alt-F2) also shows
> > just a black screen. Access via ssh is possible.
> >
> > ~# uname -r
> > 6.1.0-10-amd64
> >
> > dmesg shows the following error message:
> >
> > [    3.560153] WARNING: CPU: 0 PID: 176 at
> > drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460
> > nvkm_dp_acquire+0x26a/0x490 [nouveau]
> > [    3.560287] Modules linked in: sd_mod t10_pi sr_mod crc64_rocksoft
> > cdrom crc64 crc_t10dif crct10dif_generic nouveau(+) ahci libahci mxm_wmi
> > i2c_algo_bit drm_display_helper libata cec rc_core drm_ttm_helper ttm
> > scsi_mod e1000e drm_kms_helper ptp firewire_ohci sdhci_pci cqhci
> > ehci_pci sdhci ehci_hcd firewire_core i2c_i801 crct10dif_pclmul
> > crct10dif_common drm crc32_pclmul crc32c_intel psmouse usbcore mmc_core
> > crc_itu_t pps_core scsi_common i2c_smbus lpc_ich usb_common battery
> > video wmi button
> > [    3.560322] CPU: 0 PID: 176 Comm: kworker/u16:5 Not tainted
> > 6.1.0-10-amd64 #1  Debian 6.1.38-2
> > [    3.560325] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17
> > 05/12/2017
> > [    3.560327] Workqueue: nvkm-disp nv50_disp_super [nouveau]
> > [    3.560433] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
> > [    3.560538] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37
> > 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc
> > cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
> > [    3.560541] RSP: 0018:ffff9899c048bd60 EFLAGS: 00010246
> > [    3.560542] RAX: 0000000000041eb0 RBX: ffff88e0209d2600 RCX:
> > 0000000000041eb0
> > [    3.560544] RDX: ffffffffc079f760 RSI: 0000000000000000 RDI:
> > ffff9899c048bcf0
> > [    3.560545] RBP: 0000000000000001 R08: ffff9899c048bc64 R09:
> > 0000000000005b76
> > [    3.560546] R10: 000000000000000d R11: ffff9899c048bde0 R12:
> > 00000000ffffffea
> > [    3.560548] R13: ffff88e00b39e480 R14: 0000000000044d45 R15:
> > 0000000000000000
> > [    3.560549] FS:  0000000000000000(0000) GS:ffff88e123c00000(0000)
> > knlGS:0000000000000000
> > [    3.560551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    3.560552] CR2: 00007f57f4e90451 CR3: 0000000181410000 CR4:
> > 00000000000006f0
> > [    3.560554] Call Trace:
> > [    3.560558]  <TASK>
> > [    3.560560]  ? __warn+0x7d/0xc0
> > [    3.560566]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> > [    3.560671]  ? report_bug+0xe6/0x170
> > [    3.560675]  ? handle_bug+0x41/0x70
> > [    3.560679]  ? exc_invalid_op+0x13/0x60
> > [    3.560681]  ? asm_exc_invalid_op+0x16/0x20
> > [    3.560685]  ? init_reset_begun+0x20/0x20 [nouveau]
> > [    3.560769]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> > [    3.560888]  nv50_disp_super_2_2+0x70/0x430 [nouveau]
> > [    3.560997]  nv50_disp_super+0x113/0x210 [nouveau]
> > [    3.561103]  process_one_work+0x1c7/0x380
> > [    3.561109]  worker_thread+0x4d/0x380
> > [    3.561113]  ? rescuer_thread+0x3a0/0x3a0
> > [    3.561116]  kthread+0xe9/0x110
> > [    3.561120]  ? kthread_complete_and_exit+0x20/0x20
> > [    3.561122]  ret_from_fork+0x22/0x30
> > [    3.561130]  </TASK>
> >
> > Further information:
> >
> > $ lspci -v -s $(lspci | grep -i vga | awk '{ print $1 }')
> > 01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M]
> > (rev a2) (prog-if 00 [VGA controller])
> >     Subsystem: Dell Latitude E6510
> >     Flags: bus master, fast devsel, latency 0, IRQ 27
> >     Memory at e2000000 (32-bit, non-prefetchable) [size=16M]
> >     Memory at d0000000 (64-bit, prefetchable) [size=256M]
> >     Memory at e0000000 (64-bit, prefetchable) [size=32M]
> >     I/O ports at 7000 [size=128]
> >     Expansion ROM at 000c0000 [disabled] [size=128K]
> >     Capabilities: <access denied>
> >     Kernel driver in use: nouveau
> >     Kernel modules: nouveau
> >
> > I have already reported this bug to Debian, see
> > https://bugs.debian.org/1042753 for context.
> >
> > With support (thanks Diederik!) I managed to figure out that the cause
> > was a regression between upstream kernel version 6.1.27 and 6.1.38.
> >
> > I built a new 6.1.38 kernel with these commits reverted:
> >
> > 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
> > fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
> > 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA device
> > 5a144bad3e75 nouveau: fix client work fence deletion race
> >

Mind retrying with only fb725beca62d and 62aecf23f3d1 reverted? Would
be weird if the other two commits are causing it. If that's the case,
it's a bit worrying that reverting either of those causes issues,
but maybe there is a good reason for it. Anyway, mind figuring out
which of the two you need reverted to fix your issue? Thanks!

> > With that kernel the graphics work again.
> >
> > Please inform me if further tests are required.
>
> FWIW, to be sure the issue doesn't fall through the cracks unnoticed,
> I'm adding it to regzbot, the Linux kernel regression tracking bot:
>
> #regzbot ^introduced v6.1.27..v6.1.38
> #regzbot title drm/nouveau: display stays black
> #regzbot ignore-activity
>
> This isn't a regression? This issue or a fix for it are already
> discussed somewhere else? It was fixed already? You want to clarify when
> the regression started to happen? Or point out I got the title or
> something else totally wrong? Then just reply and tell me -- ideally
> while also telling regzbot about it, as explained by the page listed in
> the footer of this mail.
>
> Developers: When fixing the issue, remember to add 'Link:' tags pointing
> to the report (the parent of this mail). See page linked in footer for
> details.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> That page also explains what to do if mails like this annoy you.
>



* Re: [Nouveau] nouveau bug in linux/6.1.38-2
  2023-08-04 12:15   ` Karol Herbst
@ 2023-08-04 12:46     ` Olaf Skibbe
  2023-08-04 12:51       ` Karol Herbst
  2023-08-04 18:08     ` [Nouveau] " Olaf Skibbe
  1 sibling, 1 reply; 11+ messages in thread
From: Olaf Skibbe @ 2023-08-04 12:46 UTC
  To: Karol Herbst
  Cc: Linux kernel regressions list, nouveau, 1042753, dri-devel,
	Thorsten Leemhuis, Ben Skeggs, stable@vger.kernel.org

On Fri, 4 Aug 2023 at 14:15, Karol Herbst wrote:

> mind retrying with only fb725beca62d and 62aecf23f3d1 reverted?

I will do this later today (it takes some time; it is a slow machine).

> Would be weird if the other two commits are causing it. If that's the 
> case, it's a bit worrying that reverting either of those causes
> issues, but maybe there is a good reason for it. Anyway, mind figuring 
> out which of the two you need reverted to fix your issue? Thanks!

I can do this. But if I build two kernels anyway, isn't it faster to 
build each with only one of the patches applied? Or do you expect the 
patches to interact (so that the bug would only be present when both are 
applied)?

Cheers,
Olaf


* Re: [Nouveau] nouveau bug in linux/6.1.38-2
  2023-08-04 12:46     ` Olaf Skibbe
@ 2023-08-04 12:51       ` Karol Herbst
  2023-08-04 13:11         ` Olaf Skibbe
  0 siblings, 1 reply; 11+ messages in thread
From: Karol Herbst @ 2023-08-04 12:51 UTC
  To: Olaf Skibbe
  Cc: Linux kernel regressions list, nouveau, 1042753, dri-devel,
	Thorsten Leemhuis, Ben Skeggs, stable@vger.kernel.org

On Fri, Aug 4, 2023 at 2:48 PM Olaf Skibbe <news@kravcenko.com> wrote:
>
> On Fri, 4 Aug 2023 at 14:15, Karol Herbst wrote:
>
> > mind retrying with only fb725beca62d and 62aecf23f3d1 reverted?
>
> I will do this later today (it takes some time; it is a slow machine).
>
> > Would be weird if the other two commits are causing it. If that's the
> > case, it's a bit worrying that reverting either of those causes
> > issues, but maybe there is a good reason for it. Anyway, mind figuring
> > out which of the two you need reverted to fix your issue? Thanks!
>
> I can do this. But if I build two kernels anyway, isn't it faster to
> build each with only one of the patches applied? Or do you expect the
> patches to interact (so that the bug would only be present when both are
> applied)?
>

How are you building the kernel? Normally, when building from git,
reverting one of those shouldn't take long, because it doesn't
recompile the entire kernel. But yeah, you can potentially just revert
them one by one for now and it should be fine.

> Cheers,
> Olaf
>



* Re: [Nouveau] nouveau bug in linux/6.1.38-2
  2023-08-04 12:51       ` Karol Herbst
@ 2023-08-04 13:11         ` Olaf Skibbe
  2023-08-04 13:38           ` [Nouveau] Bug#1042753: " Diederik de Haas
  0 siblings, 1 reply; 11+ messages in thread
From: Olaf Skibbe @ 2023-08-04 13:11 UTC
  To: Karol Herbst
  Cc: Linux kernel regressions list, nouveau, 1042753, dri-devel,
	Thorsten Leemhuis, Ben Skeggs, stable@vger.kernel.org

On Fri, 4 Aug 2023 at 14:51, Karol Herbst wrote:

> How are you building the kernel? Normally, when building from git,
> reverting one of those shouldn't take long, because it doesn't
> recompile the entire kernel. But yeah, you can potentially just revert
> them one by one for now and it should be fine.

I am using the `test-patches` script described here: 
https://kernel-team.pages.debian.net/kernel-handbook/ch-common-tasks.html#id-1.6.6.4 
This worked even with my limited knowledge (it is the first kernel I have
ever compiled).

(While I'm at it, a maybe silly question: am I right in assuming that the
kernel has to be built on the machine we want to reproduce the bug on?
Otherwise I could use much faster hardware (also running bookworm).)

Cheers,
Olaf


* Re: [Nouveau] Bug#1042753: nouveau bug in linux/6.1.38-2
  2023-08-04 13:11         ` Olaf Skibbe
@ 2023-08-04 13:38           ` Diederik de Haas
  0 siblings, 0 replies; 11+ messages in thread
From: Diederik de Haas @ 2023-08-04 13:38 UTC
  To: Karol Herbst, Thorsten Leemhuis, Olaf Skibbe
  Cc: Linux kernel regressions list, nouveau, 1042753, dri-devel,
	Ben Skeggs, stable@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 402 bytes --]

On Friday, 4 August 2023 15:11:46 CEST Olaf Skibbe wrote:
> (While I'm at it, a maybe silly question: am I right in assuming that
> the kernel has to be built on the machine we want to reproduce the bug
> on? Otherwise I could use much faster hardware (also running bookworm).)

If that is also an amd64 machine running Debian kernel 6.1.38-2, it should be 
fine to build the kernel on the faster machine.

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: [Nouveau] nouveau bug in linux/6.1.38-2
  2023-08-04 12:15   ` Karol Herbst
  2023-08-04 12:46     ` Olaf Skibbe
@ 2023-08-04 18:08     ` Olaf Skibbe
  2023-08-04 23:09       ` Karol Herbst
  1 sibling, 1 reply; 11+ messages in thread
From: Olaf Skibbe @ 2023-08-04 18:08 UTC
  To: Karol Herbst
  Cc: Linux kernel regressions list, nouveau, 1042753, dri-devel,
	Thorsten Leemhuis, Ben Skeggs, stable@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 4255 bytes --]

Dear all,

On Fri, 4 Aug 2023 at 14:15, Karol Herbst wrote:

>>> 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
>>> fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
>>> 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA device
>>> 5a144bad3e75 nouveau: fix client work fence deletion race
>
> mind retrying with only fb725beca62d and 62aecf23f3d1 reverted? Would 
> be weird if the other two commits are causing it. If that's the case, 
> it's a bit worrying that reverting either of those causes issues,
> but maybe there is a good reason for it. Anyway, mind figuring out 
> which of the two you need reverted to fix your issue? Thanks!

The result is:

With commit fb725beca62d reverted: graphics work. I have attached the
respective patch to this mail again.

Patch with commit 62aecf23f3d1 reverted: Screen remains black, error 
message:

# dmesg | grep -A 36 "cut here"
[    2.921358] ------------[ cut here ]------------
[    2.921361] WARNING: CPU: 1 PID: 176 at drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460 nvkm_dp_acquire+0x26a/0x490 [nouveau]
[    2.921627] Modules linked in: sd_mod(E) t10_pi(E) crc64_rocksoft(E) sr_mod(E) crc64(E) crc_t10dif(E) crct10dif_generic(E) cdrom(E) nouveau(E+) mxm_wmi(E) i2c_algo_bit(E) drm_display_helper(E) cec(E) ahci(E) rc_core(E) drm_ttm_helper(E) libahci(E) ttm(E) ehci_pci(E) crct10dif_pclmul(E) crct10dif_common(E) ehci_hcd(E) drm_kms_helper(E) crc32_pclmul(E) firewire_ohci(E) sdhci_pci(E) cqhci(E) libata(E) e1000e(E) sdhci(E) psmouse(E) crc32c_intel(E) lpc_ich(E) ptp(E) i2c_i801(E) scsi_mod(E) i2c_smbus(E) firewire_core(E) scsi_common(E) usbcore(E) crc_itu_t(E) mmc_core(E) drm(E) pps_core(E) usb_common(E) battery(E) video(E) wmi(E) button(E)
[    2.921695] CPU: 1 PID: 176 Comm: kworker/u16:5 Tainted: G            E      6.1.0-0.a.test-amd64 #1  Debian 6.1.38-2a~test
[    2.921701] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17 05/12/2017
[    2.921705] Workqueue: nvkm-disp nv50_disp_super [nouveau]
[    2.921948] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
[    2.922192] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
[    2.922196] RSP: 0018:ffffc077c04dfd60 EFLAGS: 00010246
[    2.922201] RAX: 0000000000041eb0 RBX: ffff9a8482624c00 RCX: 0000000000041eb0
[    2.922204] RDX: ffffffffc0b47760 RSI: 0000000000000000 RDI: ffffc077c04dfcf0
[    2.922206] RBP: 0000000000000001 R08: ffffc077c04dfc64 R09: 0000000000005b76
[    2.922209] R10: 000000000000000d R11: ffffc077c04dfde0 R12: 00000000ffffffea
[    2.922212] R13: ffff9a8517541e00 R14: 0000000000044d45 R15: 0000000000000000
[    2.922215] FS:  0000000000000000(0000) GS:ffff9a85a3c40000(0000) knlGS:0000000000000000
[    2.922219] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.922222] CR2: 000055f660bcb3a8 CR3: 0000000197610000 CR4: 00000000000006e0
[    2.922226] Call Trace:
[    2.922231]  <TASK>
[    2.922235]  ? __warn+0x7d/0xc0
[    2.922244]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
[    2.922487]  ? report_bug+0xe6/0x170
[    2.922494]  ? handle_bug+0x41/0x70
[    2.922501]  ? exc_invalid_op+0x13/0x60
[    2.922505]  ? asm_exc_invalid_op+0x16/0x20
[    2.922512]  ? init_reset_begun+0x20/0x20 [nouveau]
[    2.922708]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
[    2.922954]  nv50_disp_super_2_2+0x70/0x430 [nouveau]
[    2.923200]  nv50_disp_super+0x113/0x210 [nouveau]
[    2.923445]  process_one_work+0x1c7/0x380
[    2.923456]  worker_thread+0x4d/0x380
[    2.923463]  ? rescuer_thread+0x3a0/0x3a0
[    2.923469]  kthread+0xe9/0x110
[    2.923476]  ? kthread_complete_and_exit+0x20/0x20
[    2.923482]  ret_from_fork+0x22/0x30
[    2.923493]  </TASK>
[    2.923494] ---[ end trace 0000000000000000 ]---

(It may be worth mentioning that the LED backlight is on while the
screen appears black.)
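
To pull just the warning site out of a long log, a small sed filter
like the following may help. This is only a sketch: the function name
warn_sites is made up for illustration, and the pattern assumes the
"WARNING: CPU: N PID: N at file:line function+offset/length" layout
seen in the log above.

```shell
#!/bin/sh
# Print "file:line function" for each kernel WARNING line read from stdin.
# Lines that do not match the assumed layout are dropped.
warn_sites() {
    sed -n 's/.*WARNING: CPU: [0-9]* PID: [0-9]* at \([^ ]*\) \([^ +]*\)+.*/\1 \2/p'
}

# Sample line copied from the log above; on a live system one would
# run something like:  dmesg | warn_sites
printf '%s\n' '[    2.921361] WARNING: CPU: 1 PID: 176 at drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460 nvkm_dp_acquire+0x26a/0x490 [nouveau]' | warn_sites
```

This makes it easy to compare whether two test kernels hit the same
WARN_ON location.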

Cheers,
Olaf

P.S.: By the way, as a Linux user for more than 20 years, I am very
pleased to have the opportunity to contribute at least a little bit to
its improvement. I'd like to take the chance to thank you all very much
for building and developing this great operating system.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-diff; name=0002-Revert-drm-nouveau-dp-check-for-NULL-nv_connector-na.patch, Size: 1612 bytes --]

From 47c0e938beef7335ffa179f1006754f9664c6c4d Mon Sep 17 00:00:00 2001
From: Diederik de Haas <didi.debian@cknow.org>
Date: Mon, 31 Jul 2023 19:55:54 +0200
Subject: [PATCH 2/4] Revert "drm/nouveau/dp: check for NULL
 nv_connector->native_mode"

This reverts commit fb725beca62d175c02ca619c27037c14f7ab8e7c.
---
 drivers/gpu/drm/nouveau/nouveau_connector.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
index fd984733b8e6..1991bbb1d05c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -966,7 +966,7 @@ nouveau_connector_get_modes(struct drm_connector *connector)
 	/* Determine display colour depth for everything except LVDS now,
 	 * DP requires this before mode_valid() is called.
 	 */
-	if (connector->connector_type != DRM_MODE_CONNECTOR_LVDS && nv_connector->native_mode)
+	if (connector->connector_type != DRM_MODE_CONNECTOR_LVDS)
 		nouveau_connector_detect_depth(connector);
 
 	/* Find the native mode if this is a digital panel, if we didn't
@@ -987,7 +987,7 @@ nouveau_connector_get_modes(struct drm_connector *connector)
 	 * "native" mode as some VBIOS tables require us to use the
 	 * pixel clock as part of the lookup...
 	 */
-	if (connector->connector_type == DRM_MODE_CONNECTOR_LVDS && nv_connector->native_mode)
+	if (connector->connector_type == DRM_MODE_CONNECTOR_LVDS)
 		nouveau_connector_detect_depth(connector);
 
 	if (nv_encoder->dcb->type == DCB_OUTPUT_TV)
-- 
2.40.1



* Re: [Nouveau] nouveau bug in linux/6.1.38-2
  2023-08-04 18:08     ` [Nouveau] " Olaf Skibbe
@ 2023-08-04 23:09       ` Karol Herbst
  2023-08-05  9:44         ` Olaf Skibbe
  0 siblings, 1 reply; 11+ messages in thread
From: Karol Herbst @ 2023-08-04 23:09 UTC
  To: Olaf Skibbe
  Cc: Linux kernel regressions list, nouveau, 1042753, dri-devel,
	Thorsten Leemhuis, Ben Skeggs, stable@vger.kernel.org

On Fri, Aug 4, 2023 at 8:10 PM Olaf Skibbe <news@kravcenko.com> wrote:
>
> Dear all,
>
> On Fri, 4 Aug 2023 at 14:15, Karol Herbst wrote:
>
> >>> 62aecf23f3d1 drm/nouveau: add nv_encoder pointer check for NULL
> >>> fb725beca62d drm/nouveau/dp: check for NULL nv_connector->native_mode
> >>> 90748be0f4f3 drm/nouveau: don't detect DSM for non-NVIDIA device
> >>> 5a144bad3e75 nouveau: fix client work fence deletion race
> >
> > mind retrying with only fb725beca62d and 62aecf23f3d1 reverted? Would
> > be weird if the other two commits are causing it. If that's the case,
> > it's a bit worrying that reverting either of those causes issues,
> > but maybe there is a good reason for it. Anyway, mind figuring out
> > which of the two you need reverted to fix your issue? Thanks!
>
> The result is:
>
> With commit fb725beca62d reverted: graphics work. I have attached the
> respective patch to this mail again.
>

Mind checking whether this patch alone, instead of reverting the entire
commit, is enough to fix it as well?

https://gitlab.freedesktop.org/karolherbst/nouveau/-/commit/f99ae069876f7ffeb6368da0381485e8c3adda43.patch


> Patch with commit 62aecf23f3d1 reverted: Screen remains black, error
> message:
>
> # dmesg | grep -A 36 "cut here"
> [    2.921358] ------------[ cut here ]------------
> [    2.921361] WARNING: CPU: 1 PID: 176 at drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460 nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    2.921627] Modules linked in: sd_mod(E) t10_pi(E) crc64_rocksoft(E) sr_mod(E) crc64(E) crc_t10dif(E) crct10dif_generic(E) cdrom(E) nouveau(E+) mxm_wmi(E) i2c_algo_bit(E) drm_display_helper(E) cec(E) ahci(E) rc_core(E) drm_ttm_helper(E) libahci(E) ttm(E) ehci_pci(E) crct10dif_pclmul(E) crct10dif_common(E) ehci_hcd(E) drm_kms_helper(E) crc32_pclmul(E) firewire_ohci(E) sdhci_pci(E) cqhci(E) libata(E) e1000e(E) sdhci(E) psmouse(E) crc32c_intel(E) lpc_ich(E) ptp(E) i2c_i801(E) scsi_mod(E) i2c_smbus(E) firewire_core(E) scsi_common(E) usbcore(E) crc_itu_t(E) mmc_core(E) drm(E) pps_core(E) usb_common(E) battery(E) video(E) wmi(E) button(E)
> [    2.921695] CPU: 1 PID: 176 Comm: kworker/u16:5 Tainted: G            E      6.1.0-0.a.test-amd64 #1  Debian 6.1.38-2a~test
> [    2.921701] Hardware name: Dell Inc. Latitude E6510/0N5KHN, BIOS A17 05/12/2017
> [    2.921705] Workqueue: nvkm-disp nv50_disp_super [nouveau]
> [    2.921948] RIP: 0010:nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    2.922192] Code: 48 8b 44 24 58 65 48 2b 04 25 28 00 00 00 0f 85 37 02 00 00 48 83 c4 60 44 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b c1 e8 03 41 88 6d 62 44 89 fe 48 89 df 48 69 c0 cf 0d d6 26
> [    2.922196] RSP: 0018:ffffc077c04dfd60 EFLAGS: 00010246
> [    2.922201] RAX: 0000000000041eb0 RBX: ffff9a8482624c00 RCX: 0000000000041eb0
> [    2.922204] RDX: ffffffffc0b47760 RSI: 0000000000000000 RDI: ffffc077c04dfcf0
> [    2.922206] RBP: 0000000000000001 R08: ffffc077c04dfc64 R09: 0000000000005b76
> [    2.922209] R10: 000000000000000d R11: ffffc077c04dfde0 R12: 00000000ffffffea
> [    2.922212] R13: ffff9a8517541e00 R14: 0000000000044d45 R15: 0000000000000000
> [    2.922215] FS:  0000000000000000(0000) GS:ffff9a85a3c40000(0000) knlGS:0000000000000000
> [    2.922219] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.922222] CR2: 000055f660bcb3a8 CR3: 0000000197610000 CR4: 00000000000006e0
> [    2.922226] Call Trace:
> [    2.922231]  <TASK>
> [    2.922235]  ? __warn+0x7d/0xc0
> [    2.922244]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    2.922487]  ? report_bug+0xe6/0x170
> [    2.922494]  ? handle_bug+0x41/0x70
> [    2.922501]  ? exc_invalid_op+0x13/0x60
> [    2.922505]  ? asm_exc_invalid_op+0x16/0x20
> [    2.922512]  ? init_reset_begun+0x20/0x20 [nouveau]
> [    2.922708]  ? nvkm_dp_acquire+0x26a/0x490 [nouveau]
> [    2.922954]  nv50_disp_super_2_2+0x70/0x430 [nouveau]
> [    2.923200]  nv50_disp_super+0x113/0x210 [nouveau]
> [    2.923445]  process_one_work+0x1c7/0x380
> [    2.923456]  worker_thread+0x4d/0x380
> [    2.923463]  ? rescuer_thread+0x3a0/0x3a0
> [    2.923469]  kthread+0xe9/0x110
> [    2.923476]  ? kthread_complete_and_exit+0x20/0x20
> [    2.923482]  ret_from_fork+0x22/0x30
> [    2.923493]  </TASK>
> [    2.923494] ---[ end trace 0000000000000000 ]---
>
> (It may be worth mentioning that the LED backlight is on while the
> screen appears black.)
>
> Cheers,
> Olaf
>
> P.S.: By the way: as a Linux user for more than 20 years, I am very
> pleased to have the opportunity to contribute at least a little bit to
> the improvement. I'd like to use the chance to thank you all very much
> for building and developing this great operating system.
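
The quoted `dmesg | grep -A 36 "cut here"` relies on a fixed line count. As a hedged alternative, the following sketch collects each block from the "cut here" marker through the matching "end trace" line and pulls out the warning site. The function names and the regular expressions are assumptions made for this illustration, and the sample is a trimmed copy of the trace quoted above.

```python
import re

CUT_HERE = "[ cut here ]"
END_TRACE = re.compile(r"---\[ end trace [0-9a-f]+ \]---")
WARNING_AT = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<symbol>\S+)"
)


def extract_warning_blocks(dmesg_text):
    """Return each block from a 'cut here' line through its 'end trace' line."""
    blocks, current = [], None
    for line in dmesg_text.splitlines():
        if CUT_HERE in line:
            current = [line]          # start a new block
        elif current is not None:
            current.append(line)
            if END_TRACE.search(line):
                blocks.append("\n".join(current))
                current = None        # block is complete
    return blocks


def warning_site(block):
    """Return (source file, line number, symbol+offset) or None."""
    match = WARNING_AT.search(block)
    if match is None:
        return None
    return match.group("file"), int(match.group("line")), match.group("symbol")


# Trimmed sample based on the trace quoted above.
sample = """\
[    2.921358] ------------[ cut here ]------------
[    2.921361] WARNING: CPU: 1 PID: 176 at drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c:460 nvkm_dp_acquire+0x26a/0x490 [nouveau]
[    2.922226] Call Trace:
[    2.923494] ---[ end trace 0000000000000000 ]---
[    2.930000] unrelated line
"""

blocks = extract_warning_blocks(sample)
print(len(blocks))
print(warning_site(blocks[0]))
```

On a live system the text could come from saved `dmesg` output; comparing the extracted site between two boots is one way to check that both show the same warning in nvkm_dp_acquire.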



* Re: [Nouveau] nouveau bug in linux/6.1.38-2
  2023-08-04 23:09       ` Karol Herbst
@ 2023-08-05  9:44         ` Olaf Skibbe
  0 siblings, 0 replies; 11+ messages in thread
From: Olaf Skibbe @ 2023-08-05  9:44 UTC (permalink / raw
  To: Karol Herbst
  Cc: Linux kernel regressions list, nouveau, 1042753, dri-devel,
	Thorsten Leemhuis, Ben Skeggs, stable@vger.kernel.org

On Sat, 5 Aug 2023 at 01:09, Karol Herbst wrote:

> Mind checking whether this patch alone, instead of reverting the entire
> commit, is enough to fix it as well?
>
> https://gitlab.freedesktop.org/karolherbst/nouveau/-/commit/f99ae069876f7ffeb6368da0381485e8c3adda43.patch

This patch does fix the problem as well: Screen works.

Cheers,
Olaf


* Re: [Nouveau] nouveau bug in linux/6.1.38-2
  2023-08-04 12:02 ` Thorsten Leemhuis
  2023-08-04 12:15   ` Karol Herbst
@ 2023-08-31  9:40   ` Linux regression tracking #update (Thorsten Leemhuis)
  1 sibling, 0 replies; 11+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-08-31  9:40 UTC (permalink / raw
  To: dri-devel, nouveau; +Cc: Linux kernel regressions list, stable@vger.kernel.org

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 04.08.23 14:02, Thorsten Leemhuis wrote:
> On 02.08.23 23:28, Olaf Skibbe wrote:
>> Dear Maintainers,
>>
>> Hereby I would like to report an apparent bug in the nouveau driver in
>> linux/6.1.38-2.
> 
> Thx for your report. Maybe your problem is caused by an incomplete
> backport. I CCed the maintainers for the drivers (and the regressions
> and the stable list), maybe one of them has an idea, as they know the
> driver.

#regzbot fix: 98e470dc73a9b3539e5a7a3c72f6b7c01c98
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.



Thread overview: 11+ messages
2023-08-02 21:28 [Nouveau] nouveau bug in linux/6.1.38-2 Olaf Skibbe
2023-08-04 12:02 ` Thorsten Leemhuis
2023-08-04 12:15   ` Karol Herbst
2023-08-04 12:46     ` Olaf Skibbe
2023-08-04 12:51       ` Karol Herbst
2023-08-04 13:11         ` Olaf Skibbe
2023-08-04 13:38           ` [Nouveau] Bug#1042753: " Diederik de Haas
2023-08-04 18:08     ` [Nouveau] " Olaf Skibbe
2023-08-04 23:09       ` Karol Herbst
2023-08-05  9:44         ` Olaf Skibbe
2023-08-31  9:40   ` Linux regression tracking #update (Thorsten Leemhuis)
