Date: Mon, 6 May 2024 14:45:58 +0200
From: Maxime Ripard
To: Steven Rostedt
Cc: LKML, Linus Torvalds, Alex Constantino, Timo Lindfors, Dave Airlie,
 Gerd Hoffmann, Maarten Lankhorst, Thomas Zimmermann, Daniel Vetter
Subject: Re: [BUG][v6.9-rc6] Deadlock with: Revert "drm/qxl: simplify qxl_fence_wait"
Message-ID: <20240506-cuddly-elated-agouti-be981d@houat>
References: <20240502081641.457aa25f@gandalf.local.home> <20240504043957.417aa98c@rorschach.local.home>
In-Reply-To: <20240504043957.417aa98c@rorschach.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Steven,

It looks like qxl is not well maintained. Please send the revert, we'll
merge it.

Maxime

On Sat, May 04, 2024 at 04:39:57AM GMT, Steven Rostedt wrote:
>
> Did anyone see this?
>
> -- Steve
>
>
> On Thu, 2 May 2024 08:16:41 -0400
> Steven Rostedt wrote:
>
> > I went to run my tests on my VMs and the tests hung on boot up.
> > Unfortunately, the most I ever got out was:
> >
> > [ 93.607888] Testing event system initcall: OK
> > [ 93.667730] Running tests on all trace events:
> > [ 93.669757] Testing all events: OK
> > [ 95.631064] ------------[ cut here ]------------
> > Timed out after 60 seconds
> >
> > I ran a bisect and it came up with:
> >
> > # first bad commit: [07ed11afb68d94eadd4ffc082b97c2331307c5ea] Revert "drm/qxl: simplify qxl_fence_wait"
> >
> > I checked out 07ed11afb68d94eadd4ffc082b97c2331307c5ea~1 and it booted
> > fine. Added back that commit, it failed to boot. I did this twice, and got
> > the same results.
> >
> > But the last time I ran it, it did trigger this:
> >
> > ------------[ cut here ]------------
> >
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 6.9.0-rc1-test-00021-g07ed11afb68d #5 Not tainted
> > ------------------------------------------------------
> > kworker/u24:3/119 is trying to acquire lock:
> > ffffffff95aa4600 (console_owner){....}-{0:0}, at: console_flush_all+0x1f5/0x530
> >
> > but task is already holding lock:
> > ffff93c4bbd37218 (&pool->lock){-.-.}-{2:2}, at: __flush_work+0xc1/0x440
> >
> > which lock already depends on the new lock.
> >
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #1 (&pool->lock){-.-.}-{2:2}:
> >        _raw_spin_lock+0x33/0x40
> >        __queue_work+0xd6/0x610
> >        queue_work_on+0x8a/0x90
> >        soft_cursor+0x1a0/0x230
> >        bit_cursor+0x386/0x5f0
> >        hide_cursor+0x27/0xb0
> >        vt_console_print+0x474/0x490
> >        console_flush_all+0x22e/0x530
> >        console_unlock+0x56/0x160
> >        vprintk_emit+0x160/0x390
> >        dev_printk_emit+0xa5/0xd0
> >        _dev_info+0x79/0xa0
> >        __drm_fb_helper_initial_config_and_unlock+0x3a9/0x5f0
> >        drm_fbdev_generic_client_hotplug+0x69/0xc0
> >        drm_client_register+0x7b/0xc0
> >        qxl_pci_probe+0x107/0x1a0
> >        local_pci_probe+0x45/0xa0
> >        pci_device_probe+0xc7/0x240
> >        really_probe+0xd6/0x390
> >        __driver_probe_device+0x78/0x150
> >        driver_probe_device+0x1f/0x90
> >        __driver_attach+0xd6/0x1d0
> >        bus_for_each_dev+0x8f/0xe0
> >        bus_add_driver+0x119/0x220
> >        driver_register+0x59/0x100
> >        do_one_initcall+0x76/0x3c0
> >        kernel_init_freeable+0x3a5/0x5b0
> >        kernel_init+0x1a/0x1c0
> >        ret_from_fork+0x34/0x50
> >        ret_from_fork_asm+0x1a/0x30
> >
> > -> #0 (console_owner){....}-{0:0}:
> >        __lock_acquire+0x13e7/0x2180
> >        lock_acquire+0xd9/0x300
> >        console_flush_all+0x212/0x530
> >        console_unlock+0x56/0x160
> >        vprintk_emit+0x160/0x390
> >        _printk+0x64/0x80
> >        __warn_printk+0x8e/0x180
> >        check_flush_dependency+0xfd/0x120
> >        __flush_work+0xfa/0x440
> >        qxl_queue_garbage_collect+0x83/0x90
> >        qxl_fence_wait+0xa4/0x1a0
> >        dma_fence_wait_timeout+0x98/0x1e0
> >        dma_resv_wait_timeout+0x7f/0xe0
> >        ttm_bo_delayed_delete+0x2b/0x90
> >        process_one_work+0x228/0x740
> >        worker_thread+0x1dc/0x3c0
> >        kthread+0xf2/0x120
> >        ret_from_fork+0x34/0x50
> >        ret_from_fork_asm+0x1a/0x30
> >
> > other info that might help us debug this:
> >
> >  Possible unsafe locking scenario:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&pool->lock);
> >                                lock(console_owner);
> >                                lock(&pool->lock);
> >   lock(console_owner);
> >
> >  *** DEADLOCK ***
> >
> > 6 locks held by kworker/u24:3/119:
> >  #0: ffff93c440245948 ((wq_completion)ttm){+.+.}-{0:0}, at: process_one_work+0x43a/0x740
> >  #1: ffffa01380d83e60 ((work_completion)(&bo->delayed_delete)){+.+.}-{0:0}, at: process_one_work+0x1e2/0x740
> >  #2: ffffffff95b17880 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x86/0x440
> >  #3: ffff93c4bbd37218 (&pool->lock){-.-.}-{2:2}, at: __flush_work+0xc1/0x440
> >  #4: ffffffff95b149c0 (console_lock){+.+.}-{0:0}, at: _printk+0x64/0x80
> >  #5: ffffffff95b14a10 (console_srcu){....}-{0:0}, at: console_flush_all+0x7b/0x530
> >
> > stack backtrace:
> > CPU: 2 PID: 119 Comm: kworker/u24:3 Not tainted 6.9.0-rc1-test-00021-g07ed11afb68d #5
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > Workqueue: ttm ttm_bo_delayed_delete
> > Call Trace:
> >  <TASK>
> >  dump_stack_lvl+0x77/0xb0
> >  check_noncircular+0x148/0x160
> >  __lock_acquire+0x13e7/0x2180
> >  lock_acquire+0xd9/0x300
> >  ? console_flush_all+0x1f5/0x530
> >  ? lock_release+0x147/0x2c0
> >  ? console_flush_all+0x1f5/0x530
> >  console_flush_all+0x212/0x530
> >  ? console_flush_all+0x1f5/0x530
> >  console_unlock+0x56/0x160
> >  vprintk_emit+0x160/0x390
> >  _printk+0x64/0x80
> >  ? __pfx_ttm_bo_delayed_delete+0x10/0x10
> >  ? __pfx_qxl_gc_work+0x10/0x10
> >  __warn_printk+0x8e/0x180
> >  ? __pfx_ttm_bo_delayed_delete+0x10/0x10
> >  ? __pfx_qxl_gc_work+0x10/0x10
> >  ? __pfx_qxl_gc_work+0x10/0x10
> >  check_flush_dependency+0xfd/0x120
> >  __flush_work+0xfa/0x440
> >  qxl_queue_garbage_collect+0x83/0x90
> >  qxl_fence_wait+0xa4/0x1a0
> >  dma_fence_wait_timeout+0x98/0x1e0
> >  dma_resv_wait_timeout+0x7f/0xe0
> >  ttm_bo_delayed_delete+0x2b/0x90
> >  process_one_work+0x228/0x740
> >  worker_thread+0x1dc/0x3c0
> >  ? __pfx_worker_thread+0x10/0x10
> >  kthread+0xf2/0x120
> >  ? __pfx_kthread+0x10/0x10
> >  ret_from_fork+0x34/0x50
> >  ? __pfx_kthread+0x10/0x10
> >  ret_from_fork_asm+0x1a/0x30
> >  </TASK>
> > workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete is flushing !WQ_MEM_RECLAIM events:qxl_gc_work
> > WARNING: CPU: 2 PID: 119 at kernel/workqueue.c:3728 check_flush_dependency+0xfd/0x120
> > Modules linked in:
> > CPU: 2 PID: 119 Comm: kworker/u24:3 Not tainted 6.9.0-rc1-test-00021-g07ed11afb68d #5
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> > Workqueue: ttm ttm_bo_delayed_delete
> > RIP: 0010:check_flush_dependency+0xfd/0x120
> > Code: 8b 45 18 48 8d b2 70 01 00 00 49 89 e8 48 8d 8b 70 01 00 00 48 c7 c7 60 46 7b 95 c6 05 48 67 d2 01 01 48 89 c2 e8 63 40 fd ff <0f> 0b e9 1e ff ff ff 80 3d 33 67 d2 01 00 75 93 e9 4a ff ff ff 66
> > RSP: 0000:ffffa01380d83c28 EFLAGS: 00010086
> > RAX: 0000000000000000 RBX: ffff93c44004ee00 RCX: 0000000000000000
> > RDX: 0000000080000003 RSI: 00000000ffffefff RDI: 0000000000000001
> > RBP: ffffffff9497b100 R08: 0000000000000000 R09: 0000000000000003
> > R10: ffffa01380d83ab8 R11: ffffffff95b14828 R12: ffff93c443980000
> > R13: ffff93c440fbe300 R14: 0000000000000001 R15: ffff93c44004ee00
> > FS: 0000000000000000(0000) GS:ffff93c4bbd00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000000000 CR3: 000000007c864001 CR4: 0000000000170ef0
> > Call Trace:
> >  <TASK>
> >  ? __warn+0x8c/0x180
> >  ? check_flush_dependency+0xfd/0x120
> >  ? report_bug+0x191/0x1c0
> >  ? prb_read_valid+0x1b/0x30
> >  ? handle_bug+0x3c/0x80
> >  ? exc_invalid_op+0x17/0x70
> >  ? asm_exc_invalid_op+0x1a/0x20
> >  ? __pfx_qxl_gc_work+0x10/0x10
> >  ? check_flush_dependency+0xfd/0x120
> >  ? check_flush_dependency+0xfd/0x120
> >  __flush_work+0xfa/0x440
> >  qxl_queue_garbage_collect+0x83/0x90
> >  qxl_fence_wait+0xa4/0x1a0
> >  dma_fence_wait_timeout+0x98/0x1e0
> >  dma_resv_wait_timeout+0x7f/0xe0
> >  ttm_bo_delayed_delete+0x2b/0x90
> >  process_one_work+0x228/0x740
> >  worker_thread+0x1dc/0x3c0
> >  ? __pfx_worker_thread+0x10/0x10
> >  kthread+0xf2/0x120
> >  ? __pfx_kthread+0x10/0x10
> >  ret_from_fork+0x34/0x50
> >  ? __pfx_kthread+0x10/0x10
> >  ret_from_fork_asm+0x1a/0x30
> >  </TASK>
> > irq event stamp: 58
> > hardirqs last enabled at (57): [] queue_work_on+0x60/0x90
> > hardirqs last disabled at (58): [] _raw_spin_lock_irq+0x56/0x60
> > softirqs last enabled at (0): [] copy_process+0xc07/0x2c60
> > softirqs last disabled at (0): [<0000000000000000>] 0x0
> > ---[ end trace 0000000000000000 ]---
> >
> > So there's an issue with dma_fence and a workqueue.
> >
> > -- Steve
> >
>