asahi.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: Asahi Lina <lina@asahilina.net>
To: "Luben Tuikov" <luben.tuikov@amd.com>,
	"David Airlie" <airlied@gmail.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Sumit Semwal" <sumit.semwal@linaro.org>,
	"Christian König" <christian.koenig@amd.com>
Cc: Faith Ekstrand <faith.ekstrand@collabora.com>,
	 Alyssa Rosenzweig <alyssa@rosenzweig.io>,
	dri-devel@lists.freedesktop.org,  linux-kernel@vger.kernel.org,
	linux-media@vger.kernel.org,  asahi@lists.linux.dev,
	Asahi Lina <lina@asahilina.net>
Subject: [PATCH 0/3] DRM scheduler documentation & bug fixes
Date: Fri, 14 Jul 2023 17:21:28 +0900	[thread overview]
Message-ID: <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net> (raw)

Based on the previous discussion while I was writing the Rust
abstractions for the DRM scheduler, it looks like we're overdue for some
documentation.

This series first attempts to document what I've learned about the
scheduler and what I believe should be the *intended* lifetime
semantics, and then fixes a few bugs that result from that:

1. The DRM scheduler fences cannot be required to be outlived by the
   scheduler. This is non-negotiable. The whole point of these fences is
   to decouple the underlying hardware/driver from consumers, such as
   dma-bufs with an attached fence. If this requirement were not met,
   then we'd have to somehow keep the scheduler and all the driver
   components associated with it alive as long as a dma-buf with an
   attached drm_sched fence is alive, which could be indefinitely even
   after the hardware that produced that dma-buf is long gone. Consider,
   for example, using a hot-pluggable GPU to write to a dma-buf in main
   memory, which gets presented on an integrated display controller, and
   then the GPU is unplugged. That buffer could potentially live
   forever, we can't block GPU driver cleanup on that.

2. Make the DRM scheduler properly clean up jobs on shutdown, such that
   we can support the use case of tearing down the scheduler with
   in-flight jobs. This is important to cleanly support the firmware
   scheduling use case, where the DRM scheduler is attached to a file
   (which we want to be able to tear down quickly when userspace closes
   it) while firmware could continue to (attempt to) run in-flight jobs
   after that point. The major missing codepath to make this work is
   detaching jobs from their HW fences on scheduler shutdown, so
   implement that. This also makes writing a safe Rust abstraction
   plausible, since otherwise we'd have to add a huge pile of complexity
   to that side in order to enforce the invariant that the scheduler
   outlives its jobs (including setting up a workqueue to handle
   scheduler teardown and other craziness, which is an unacceptable
   level of complexity for what should be a lightweight abstraction).

I believe there *may* still be at least one UAF-type bug related to case
2 above, but it's very hard to trigger and I wasn't able to figure out
what causes it the one time I saw it recently. Other than that, things
look quite robust on the Asahi driver with these patches, even when
trying to break things by killing GPU consumers in a tight loop and
things like that. If we agree this is a good way forward, I think this
is a good start even if there's still a bug lurking somewhere.

Aside (but related to the previous discussion): the can_run_job thing
is gone, I'm using fences returned from prepare() now and that works
well (and actually fixes one corner case related to wait contexts I'd
missed), so hopefully that's OK with everyone ^^

Changes from the previous version of patch #2: explicitly signal
detached job fences with an error. I'd missed that and I think it's what
was causing us some rare lockups due to fences never getting signaled.

Signed-off-by: Asahi Lina <lina@asahilina.net>
---
Asahi Lina (3):
      drm/scheduler: Add more documentation
      drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name
      drm/scheduler: Clean up jobs when the scheduler is torn down.
 drivers/gpu/drm/scheduler/sched_entity.c |  7 ++-
 drivers/gpu/drm/scheduler/sched_fence.c  |  4 +-
 drivers/gpu/drm/scheduler/sched_main.c   | 90 ++++++++++++++++++++++++++++++--
 include/drm/gpu_scheduler.h              |  5 ++
 4 files changed, 99 insertions(+), 7 deletions(-)
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230714-drm-sched-fixes-94bea043bbe7

Thank you,
~~ Lina


             reply	other threads:[~2023-07-14  8:21 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-14  8:21 Asahi Lina [this message]
2023-07-14  8:21 ` [PATCH 1/3] drm/scheduler: Add more documentation Asahi Lina
2023-07-14  8:40   ` Christian König
2023-07-14  9:39     ` Asahi Lina
2023-07-14  9:47       ` Christian König
2023-07-14  9:51         ` Asahi Lina
2023-07-14  8:21 ` [PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name Asahi Lina
2023-07-14  8:43   ` Christian König
2023-07-14  9:44     ` Asahi Lina
2023-07-14  9:51       ` Christian König
2023-07-14 10:07         ` Asahi Lina
2023-07-14 10:29           ` Christian König
2023-07-14  9:49     ` Asahi Lina
2023-07-14  9:57       ` Christian König
2023-07-14 10:06         ` Asahi Lina
2023-07-14 10:18           ` Christian König
2023-07-14 12:13             ` Asahi Lina
2023-07-15  4:03         ` Luben Tuikov
2023-07-15 14:14         ` alyssa
2023-07-17 15:55           ` Christian König
2023-07-18  2:35             ` Asahi Lina
2023-07-18  5:45               ` Luben Tuikov
2023-07-21 10:33                 ` Asahi Lina
2023-07-31  8:09                   ` Christian König
2023-11-01  6:59                     ` Dave Airlie
2023-11-01  8:13                       ` Daniel Vetter
2023-11-02 10:48                         ` Christian König
2023-11-02 11:19                           ` Lucas Stach
2023-11-02 12:39                             ` Christian König
2023-07-28  7:48               ` Christian König
2023-07-18  8:21             ` Pekka Paalanen
2023-07-14  8:21 ` [PATCH 3/3] drm/scheduler: Clean up jobs when the scheduler is torn down Asahi Lina
2023-07-15  7:14   ` Luben Tuikov
2023-07-16  7:51     ` Asahi Lina
2023-07-17 17:40       ` Luben Tuikov
2023-07-17 22:45         ` Asahi Lina
2023-07-18  5:14           ` Luben Tuikov
2023-07-19 18:16           ` Konstantin Ryabitsev
2023-07-19 18:58             ` Luben Tuikov
2023-08-02  4:06         ` Matthew Brost
2023-08-02 14:12           ` Luben Tuikov
2023-07-19  8:45       ` Christian König
2023-07-19 15:05         ` Luben Tuikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net \
    --to=lina@asahilina.net \
    --cc=airlied@gmail.com \
    --cc=alyssa@rosenzweig.io \
    --cc=asahi@lists.linux.dev \
    --cc=christian.koenig@amd.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=faith.ekstrand@collabora.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-media@vger.kernel.org \
    --cc=luben.tuikov@amd.com \
    --cc=sumit.semwal@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).