From: Steven Rostedt <rostedt@goodmis.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Alex Constantino <dreaming.about.electric.sheep@gmail.com>,
	Maxime Ripard <mripard@kernel.org>,
	Timo Lindfors <timo.lindfors@iki.fi>,
	Dave Airlie <airlied@redhat.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	Daniel Vetter <daniel@ffwll.ch>
Subject: [BUG][v6.9-rc6] Deadlock with: Revert "drm/qxl: simplify qxl_fence_wait"
Date: Thu, 2 May 2024 08:16:41 -0400
Message-ID: <20240502081641.457aa25f@gandalf.local.home>

I went to run my tests on my VMs and the tests hung during boot.
Unfortunately, the most output I ever got was:

[   93.607888] Testing event system initcall: OK
[   93.667730] Running tests on all trace events:
[   93.669757] Testing all events: OK
[   95.631064] ------------[ cut here ]------------
Timed out after 60 seconds

I ran a bisect and it came up with:

 # first bad commit: [07ed11afb68d94eadd4ffc082b97c2331307c5ea] Revert "drm/qxl: simplify qxl_fence_wait"

I checked out 07ed11afb68d94eadd4ffc082b97c2331307c5ea~1 and it booted
fine. Adding that commit back, it failed to boot. I did this twice and got
the same results.
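
In other words, the double-check amounted to something like:

  git checkout 07ed11afb68d94eadd4ffc082b97c2331307c5ea~1
  # build + boot the VM: comes up fine
  git checkout 07ed11afb68d94eadd4ffc082b97c2331307c5ea
  # build + boot the VM: hangs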

But the last time I ran it, it did trigger this:

 ------------[ cut here ]------------
 
 ======================================================
 WARNING: possible circular locking dependency detected
 6.9.0-rc1-test-00021-g07ed11afb68d #5 Not tainted
 ------------------------------------------------------
 kworker/u24:3/119 is trying to acquire lock:
 ffffffff95aa4600 (console_owner){....}-{0:0}, at: console_flush_all+0x1f5/0x530
 
 but task is already holding lock:
 ffff93c4bbd37218 (&pool->lock){-.-.}-{2:2}, at: __flush_work+0xc1/0x440
 
 which lock already depends on the new lock.
 
 
 the existing dependency chain (in reverse order) is:
 
 -> #1 (&pool->lock){-.-.}-{2:2}:
        _raw_spin_lock+0x33/0x40
        __queue_work+0xd6/0x610
        queue_work_on+0x8a/0x90
        soft_cursor+0x1a0/0x230
        bit_cursor+0x386/0x5f0
        hide_cursor+0x27/0xb0
        vt_console_print+0x474/0x490
        console_flush_all+0x22e/0x530
        console_unlock+0x56/0x160
        vprintk_emit+0x160/0x390
        dev_printk_emit+0xa5/0xd0
        _dev_info+0x79/0xa0
        __drm_fb_helper_initial_config_and_unlock+0x3a9/0x5f0
        drm_fbdev_generic_client_hotplug+0x69/0xc0
        drm_client_register+0x7b/0xc0
        qxl_pci_probe+0x107/0x1a0
        local_pci_probe+0x45/0xa0
        pci_device_probe+0xc7/0x240
        really_probe+0xd6/0x390
        __driver_probe_device+0x78/0x150
        driver_probe_device+0x1f/0x90
        __driver_attach+0xd6/0x1d0
        bus_for_each_dev+0x8f/0xe0
        bus_add_driver+0x119/0x220
        driver_register+0x59/0x100
        do_one_initcall+0x76/0x3c0
        kernel_init_freeable+0x3a5/0x5b0
        kernel_init+0x1a/0x1c0
        ret_from_fork+0x34/0x50
        ret_from_fork_asm+0x1a/0x30
 
 -> #0 (console_owner){....}-{0:0}:
        __lock_acquire+0x13e7/0x2180
        lock_acquire+0xd9/0x300
        console_flush_all+0x212/0x530
        console_unlock+0x56/0x160
        vprintk_emit+0x160/0x390
        _printk+0x64/0x80
        __warn_printk+0x8e/0x180
        check_flush_dependency+0xfd/0x120
        __flush_work+0xfa/0x440
        qxl_queue_garbage_collect+0x83/0x90
        qxl_fence_wait+0xa4/0x1a0
        dma_fence_wait_timeout+0x98/0x1e0
        dma_resv_wait_timeout+0x7f/0xe0
        ttm_bo_delayed_delete+0x2b/0x90
        process_one_work+0x228/0x740
        worker_thread+0x1dc/0x3c0
        kthread+0xf2/0x120
        ret_from_fork+0x34/0x50
        ret_from_fork_asm+0x1a/0x30
 
 other info that might help us debug this:
 
  Possible unsafe locking scenario:
 
        CPU0                    CPU1
        ----                    ----
   lock(&pool->lock);
                                lock(console_owner);
                                lock(&pool->lock);
   lock(console_owner);
 
  *** DEADLOCK ***
 
 6 locks held by kworker/u24:3/119:
  #0: ffff93c440245948 ((wq_completion)ttm){+.+.}-{0:0}, at: process_one_work+0x43a/0x740
  #1: ffffa01380d83e60 ((work_completion)(&bo->delayed_delete)){+.+.}-{0:0}, at: process_one_work+0x1e2/0x740
  #2: ffffffff95b17880 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x86/0x440
  #3: ffff93c4bbd37218 (&pool->lock){-.-.}-{2:2}, at: __flush_work+0xc1/0x440
  #4: ffffffff95b149c0 (console_lock){+.+.}-{0:0}, at: _printk+0x64/0x80
  #5: ffffffff95b14a10 (console_srcu){....}-{0:0}, at: console_flush_all+0x7b/0x530
 
 stack backtrace:
 CPU: 2 PID: 119 Comm: kworker/u24:3 Not tainted 6.9.0-rc1-test-00021-g07ed11afb68d #5
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
 Workqueue: ttm ttm_bo_delayed_delete
 Call Trace:
  <TASK>
  dump_stack_lvl+0x77/0xb0
  check_noncircular+0x148/0x160
  __lock_acquire+0x13e7/0x2180
  lock_acquire+0xd9/0x300
  ? console_flush_all+0x1f5/0x530
  ? lock_release+0x147/0x2c0
  ? console_flush_all+0x1f5/0x530
  console_flush_all+0x212/0x530
  ? console_flush_all+0x1f5/0x530
  console_unlock+0x56/0x160
  vprintk_emit+0x160/0x390
  _printk+0x64/0x80
  ? __pfx_ttm_bo_delayed_delete+0x10/0x10
  ? __pfx_qxl_gc_work+0x10/0x10
  __warn_printk+0x8e/0x180
  ? __pfx_ttm_bo_delayed_delete+0x10/0x10
  ? __pfx_qxl_gc_work+0x10/0x10
  ? __pfx_qxl_gc_work+0x10/0x10
  check_flush_dependency+0xfd/0x120
  __flush_work+0xfa/0x440
  qxl_queue_garbage_collect+0x83/0x90
  qxl_fence_wait+0xa4/0x1a0
  dma_fence_wait_timeout+0x98/0x1e0
  dma_resv_wait_timeout+0x7f/0xe0
  ttm_bo_delayed_delete+0x2b/0x90
  process_one_work+0x228/0x740
  worker_thread+0x1dc/0x3c0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xf2/0x120
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x34/0x50
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>
 workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete is flushing !WQ_MEM_RECLAIM events:qxl_gc_work
 WARNING: CPU: 2 PID: 119 at kernel/workqueue.c:3728 check_flush_dependency+0xfd/0x120
 Modules linked in:
 CPU: 2 PID: 119 Comm: kworker/u24:3 Not tainted 6.9.0-rc1-test-00021-g07ed11afb68d #5
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
 Workqueue: ttm ttm_bo_delayed_delete
 RIP: 0010:check_flush_dependency+0xfd/0x120
 Code: 8b 45 18 48 8d b2 70 01 00 00 49 89 e8 48 8d 8b 70 01 00 00 48 c7 c7 60 46 7b 95 c6 05 48 67 d2 01 01 48 89 c2 e8 63 40 fd ff <0f> 0b e9 1e ff ff ff 80 3d 33 67 d2 01 00 75 93 e9 4a ff ff ff 66
 RSP: 0000:ffffa01380d83c28 EFLAGS: 00010086
 RAX: 0000000000000000 RBX: ffff93c44004ee00 RCX: 0000000000000000
 RDX: 0000000080000003 RSI: 00000000ffffefff RDI: 0000000000000001
 RBP: ffffffff9497b100 R08: 0000000000000000 R09: 0000000000000003
 R10: ffffa01380d83ab8 R11: ffffffff95b14828 R12: ffff93c443980000
 R13: ffff93c440fbe300 R14: 0000000000000001 R15: ffff93c44004ee00
 FS:  0000000000000000(0000) GS:ffff93c4bbd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 000000007c864001 CR4: 0000000000170ef0
 Call Trace:
  <TASK>
  ? __warn+0x8c/0x180
  ? check_flush_dependency+0xfd/0x120
  ? report_bug+0x191/0x1c0
  ? prb_read_valid+0x1b/0x30
  ? handle_bug+0x3c/0x80
  ? exc_invalid_op+0x17/0x70
  ? asm_exc_invalid_op+0x1a/0x20
  ? __pfx_qxl_gc_work+0x10/0x10
  ? check_flush_dependency+0xfd/0x120
  ? check_flush_dependency+0xfd/0x120
  __flush_work+0xfa/0x440
  qxl_queue_garbage_collect+0x83/0x90
  qxl_fence_wait+0xa4/0x1a0
  dma_fence_wait_timeout+0x98/0x1e0
  dma_resv_wait_timeout+0x7f/0xe0
  ttm_bo_delayed_delete+0x2b/0x90
  process_one_work+0x228/0x740
  worker_thread+0x1dc/0x3c0
  ? __pfx_worker_thread+0x10/0x10
  kthread+0xf2/0x120
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x34/0x50
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1a/0x30
  </TASK>
 irq event stamp: 58
 hardirqs last  enabled at (57): [<ffffffff93fede30>] queue_work_on+0x60/0x90
 hardirqs last disabled at (58): [<ffffffff94ea7f66>] _raw_spin_lock_irq+0x56/0x60
 softirqs last  enabled at (0): [<ffffffff93fbae27>] copy_process+0xc07/0x2c60
 softirqs last disabled at (0): [<0000000000000000>] 0x0
 ---[ end trace 0000000000000000 ]---

So there's an issue with waiting on a dma_fence from within a workqueue:
the WQ_MEM_RECLAIM ttm workqueue ends up flushing qxl_gc_work on the
!WQ_MEM_RECLAIM events workqueue (the warning at the end), and printing
that warning while holding the pool lock is what trips the circular
dependency report against console_owner.
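
To illustrate, here is a minimal, self-contained sketch (a hypothetical
module, not the actual qxl code) of the pattern check_flush_dependency()
in kernel/workqueue.c warns about: a work item running on a WQ_MEM_RECLAIM
workqueue flushing a work item queued on a !WQ_MEM_RECLAIM workqueue (here
system_wq, which is the "events" workqueue named in the warning):

  #include <linux/module.h>
  #include <linux/workqueue.h>

  static struct workqueue_struct *reclaim_wq;	/* WQ_MEM_RECLAIM, like "ttm" */
  static struct work_struct gc_work;		/* on system_wq, like qxl_gc_work */
  static struct work_struct delete_work;	/* like ttm_bo_delayed_delete */

  static void gc_fn(struct work_struct *work) { }

  static void delete_fn(struct work_struct *work)
  {
  	queue_work(system_wq, &gc_work);
  	/* Flushing across the reclaim boundary: under memory pressure the
  	 * reclaim workqueue's rescuer would be stuck waiting on a workqueue
  	 * with no forward-progress guarantee. This fires the WARN in
  	 * check_flush_dependency(). */
  	flush_work(&gc_work);
  }

  static int __init demo_init(void)
  {
  	reclaim_wq = alloc_workqueue("demo_reclaim", WQ_MEM_RECLAIM, 0);
  	if (!reclaim_wq)
  		return -ENOMEM;
  	INIT_WORK(&gc_work, gc_fn);
  	INIT_WORK(&delete_work, delete_fn);
  	queue_work(reclaim_wq, &delete_work);
  	return 0;
  }

  static void __exit demo_exit(void)
  {
  	destroy_workqueue(reclaim_wq);
  }

  module_init(demo_init);
  module_exit(demo_exit);
  MODULE_LICENSE("GPL");

The point of WQ_MEM_RECLAIM is that the workqueue gets a rescuer thread
that can always make forward progress under memory pressure; waiting on a
non-reclaim workqueue from inside it throws that guarantee away, which is
what the restored qxl_fence_wait() path does via
qxl_queue_garbage_collect() in the backtrace above.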

-- Steve

