AMD-GFX Archive mirror
* regression/bisected/6.8 commit f7fe64ad0f22ff034f8ebcfbd7299ee9cc9b57d7 leads to GPU hang when I open GNOME activities
From: Mikhail Gavrilov @ 2024-01-24  2:19 UTC
  To: matthew.brost, ltuikov89, Alex Deucher, Christian König,
	dri-devel, amd-gfx list, Linux List Kernel Mailing

[-- Attachment #1: Type: text/plain, Size: 2330 bytes --]

Hi,
I noticed that somewhere between commits 70d201a40823 and 052d534373b7 my
GPU started hanging randomly when I open the GNOME Shell Activities overview.
I found a reliable way to reproduce it:
- Launch the game Elden Ring
- Continue the game (the game world should be loaded)
- Press the Start (Windows) key
At this point the GPU hangs with 99% probability. If it does not hang,
press the key a few more times to make sure.

The bisect identified the following commit as the first bad one:
f7fe64ad0f22ff034f8ebcfbd7299ee9cc9b57d7 is the first bad commit
commit f7fe64ad0f22ff034f8ebcfbd7299ee9cc9b57d7
Author: Matthew Brost <matthew.brost@intel.com>
Date:   Mon Oct 30 20:24:37 2023 -0700

    drm/sched: Split free_job into own work item

    Rather than call free_job and run_job in same work item have a dedicated
    work item for each. This aligns with the design and intended use of work
    queues.

    v2:
       - Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
         timestamp in free_job() work item (Danilo)
    v3:
      - Drop forward dec of drm_sched_select_entity (Boris)
      - Return in drm_sched_run_job_work if entity NULL (Boris)
    v4:
      - Replace dequeue with peek and invert logic (Luben)
      - Wrap to 100 lines (Luben)
      - Update comments for *_queue / *_queue_if_ready functions (Luben)
    v5:
      - Drop peek argument, blindly reinit idle (Luben)
      - s/drm_sched_free_job_queue_if_ready/drm_sched_free_job_queue_if_done (Luben)
      - Update work_run_job & work_free_job kernel doc (Luben)
    v6:
      - Do not move drm_sched_select_entity in file (Luben)

    Signed-off-by: Matthew Brost <matthew.brost@intel.com>
    Link: https://lore.kernel.org/r/20231031032439.1558703-4-matthew.brost@intel.com
    Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
    Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>

 drivers/gpu/drm/scheduler/sched_main.c | 146 ++++++++++++++++++++++-----------
 include/drm/gpu_scheduler.h            |   4 +-
 2 files changed, 101 insertions(+), 49 deletions(-)
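
For readers unfamiliar with how a "first bad commit" is found: `git bisect` performs a binary search over the commit range, asking at each step whether the checked-out kernel shows the bug. The following is only a minimal sketch of that search, assuming a linear history in which every commit after the first bad one is also bad. The commit names and the `hangs()` test are hypothetical placeholders and are not taken from the attached bisect log.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the last one is bad,
    and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # hi always points at a known-bad commit
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # the first bad commit is mid or earlier
        else:
            lo = mid + 1    # the first bad commit is after mid
    return commits[lo]


# Hypothetical history: placeholder names, not real kernel commits.
history = [f"commit-{i}" for i in range(16)]
culprit = "commit-11"
tested = []


def hangs(commit: str) -> bool:
    """Stand-in for 'build, boot, and run the reproduction steps'."""
    tested.append(commit)
    return history.index(commit) >= history.index(culprit)


found = first_bad_commit(history, hangs)
print(found, "found after", len(tested), "test builds")
```

Each call to `hangs()` corresponds to one build-and-test cycle, which is why the number of kernels to test grows only logarithmically with the size of the commit range.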

Unfortunately, the GPU hang still occurs on 6.8-rc1, which is why I am
reporting the bug here.

GPU: Radeon 7900XTX
CPU: Ryzen 7950X
Full hardware specs are here: https://linux-hardware.org/?probe=9e5edb123e
I have also attached archives with the full bisect log and the kernel
logs from each bisect step.

Who could dig into it, please?

-- 
Best Regards,
Mike Gavrilov.

[-- Attachment #2: bisect-GPU-hang-issue-log.zip --]
[-- Type: application/zip, Size: 1278 bytes --]

[-- Attachment #3: kernel-logs.zip --]
[-- Type: application/zip, Size: 696348 bytes --]


* Re: regression/bisected/6.8 commit f7fe64ad0f22ff034f8ebcfbd7299ee9cc9b57d7 leads to GPU hang when I open GNOME activities
From: Mikhail Gavrilov @ 2024-01-24 14:37 UTC
  To: matthew.brost, ltuikov89, Alex Deucher, Christian König,
	dri-devel, amd-gfx list, Linux List Kernel Mailing,
	Limonciello, Mario

[-- Attachment #1: Type: text/plain, Size: 386 bytes --]

On Wed, Jan 24, 2024 at 7:19 AM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> Who could dig into it, please?

You decided to revert it?
https://lkml.org/lkml/2024/1/22/1866

Also, I forgot to attach the kernel build .config to the previous
message, so I am attaching it here.
It may be useful for reproducing the bug with my steps.

-- 
Best Regards,
Mike Gavrilov.

[-- Attachment #2: .config.zip --]
[-- Type: application/zip, Size: 64720 bytes --]


* Re: regression/bisected/6.8 commit f7fe64ad0f22ff034f8ebcfbd7299ee9cc9b57d7 leads to GPU hang when I open GNOME activities
From: Mario Limonciello @ 2024-01-24 15:43 UTC
  To: Mikhail Gavrilov, matthew.brost, ltuikov89, Alex Deucher,
	Christian König, dri-devel, amd-gfx list,
	Linux List Kernel Mailing

On 1/24/2024 08:37, Mikhail Gavrilov wrote:
> On Wed, Jan 24, 2024 at 7:19 AM Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:
>>
>> Who could dig into it, please?
> 
> You decided to revert it?
> https://lkml.org/lkml/2024/1/22/1866

It's not a straight "git revert" on 6.8-rc1 because of some other 
contextual changes.

I posted that as an RFC specifically "in case" that's the direction we 
go and don't get a proper solution together.

Matthew also posted a debugging patch here for use with ftrace and the 
GPU scheduler events: https://gitlab.freedesktop.org/drm/amd/-/issues/3124

I reproduced it with that as well and posted my ftrace results.
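
For anyone who wants to collect similar data, here is a minimal sketch of switching on the stock `gpu_scheduler` trace event group through tracefs. It assumes tracefs is mounted at `/sys/kernel/tracing` and that the script runs as root. It does not reproduce the debugging patch from the issue above, whose contents are not shown in this thread, and the function name is made up for illustration.

```python
from pathlib import Path

# Assumption: tracefs is mounted at the usual location; some setups expose it
# under /sys/kernel/debug/tracing instead.
DEFAULT_TRACEFS = Path("/sys/kernel/tracing")


def enable_gpu_sched_events(tracefs: Path = DEFAULT_TRACEFS) -> Path:
    """Enable every event in the gpu_scheduler group and turn tracing on.

    Returns the path of the trace buffer to read after reproducing the hang.
    """
    # Writing "1" to a group's enable file switches on all events in that group.
    (tracefs / "events" / "gpu_scheduler" / "enable").write_text("1")
    # Make sure the ring buffer is actually recording.
    (tracefs / "tracing_on").write_text("1")
    return tracefs / "trace"
```

A typical use would be to call `enable_gpu_sched_events()` as root, run the reproduction steps, and then save the contents of the returned file before rebooting.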

> 
> Also, I forgot to attach the kernel build .config to the previous
> message, so I am attaching it here.
> It may be useful for reproducing the bug with my steps.
> 

