All the mail mirrored from lore.kernel.org
From: Jason Ekstrand <jason@jlekstrand.net>
To: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: intel-gfx <intel-gfx@lists.freedesktop.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	"Matthew Auld" <matthew.auld@intel.com>,
	"Dave Airlie" <airlied@redhat.com>,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: [Intel-gfx] [PATCH 0/5] dma-fence, i915: Stop allowing SLAB_TYPESAFE_BY_RCU for dma_fence
Date: Thu, 10 Jun 2021 15:09:47 -0500
Message-ID: <CAOFGe96KrBfvBKxcUNwths5Sigk7fk7ycLeYbgxutL3msEgfyA@mail.gmail.com>
In-Reply-To: <CAOFGe95BhZ7jXLxarL=2_zNYDydEoPJWnDWAG3aaeEJsDzR5dA@mail.gmail.com>

On Thu, Jun 10, 2021 at 8:35 AM Jason Ekstrand <jason@jlekstrand.net> wrote:
> On Thu, Jun 10, 2021 at 6:30 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > On Thu, Jun 10, 2021 at 11:39 AM Christian König
> > <christian.koenig@amd.com> wrote:
> > > Am 10.06.21 um 11:29 schrieb Tvrtko Ursulin:
> > > > On 09/06/2021 22:29, Jason Ekstrand wrote:
> > > >>
> > > >> We've tried to keep it somewhat contained by doing most of the hard work
> > > >> to prevent access of recycled objects via dma_fence_get_rcu_safe().
> > > >> However, a quick grep of kernel sources says that, of the 30 instances
> > > >> of dma_fence_get_rcu*, only 11 of them use dma_fence_get_rcu_safe().
> > > >> It's likely there are bear traps in DRM and related subsystems just waiting
> > > >> for someone to accidentally step in them.
> > > >
> > > > ...because dma_fence_get_rcu_safe appears to be about whether the
> > > > *pointer* to the fence itself is rcu protected, not about the fence
> > > > object itself.
> > >
> > > Yes, exactly that.
>
> The fact that both of you think this either means that I've completely
> missed what's going on with RCUs here (possible but, in this case, I
> think unlikely) or RCUs on dma fences should scare us all.

Taking a step back for a second and ignoring SLAB_TYPESAFE_BY_RCU as
such,  I'd like to ask a slightly different question:  What are the
rules about what is allowed to be done under the RCU read lock and
what guarantees does a driver need to provide?

I think so far that we've all agreed on the following:

 1. Freeing an unsignaled fence is ok as long as it doesn't have any
pending callbacks.  (Callbacks should hold a reference anyway).

 2. The pointer race solved by dma_fence_get_rcu_safe() is real and
requires its retry loop to sort out.
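To make rule 2 concrete, here is a minimal userspace model of the retry loop that dma_fence_get_rcu_safe() performs.  It is a sketch, not the kernel code.  The names struct model_fence, model_get_unless_zero(), model_put() and model_get_rcu_safe() are hypothetical.  C11 atomics stand in for the kernel's kref and rcu_dereference(), and the RCU read-side section is assumed rather than modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct dma_fence: just a refcount. */
struct model_fence {
	atomic_int refcount;
};

/* Models a get-unless-zero: take a reference only while the object is
 * still live (refcount > 0).  Returns 1 on success, 0 otherwise. */
static int model_get_unless_zero(struct model_fence *f)
{
	int old = atomic_load(&f->refcount);

	while (old != 0) {
		/* On failure (including a spurious failure of the weak
		 * form), 'old' holds the current value and we retry. */
		if (atomic_compare_exchange_weak(&f->refcount, &old, old + 1))
			return 1;
	}
	return 0;
}

/* Drops a reference; a real implementation would free at zero. */
static void model_put(struct model_fence *f)
{
	atomic_fetch_sub(&f->refcount, 1);
}

/* Models the retry loop: the caller is assumed to be inside an RCU
 * read-side section while a writer may update *slot concurrently. */
static struct model_fence *
model_get_rcu_safe(_Atomic(struct model_fence *) *slot)
{
	for (;;) {
		struct model_fence *f = atomic_load(slot);

		if (!f)
			return NULL;
		/* Being freed: the writer is expected to update the slot,
		 * so reload it and try again. */
		if (!model_get_unless_zero(f))
			continue;
		/* With a type-stable allocator, the memory may have been
		 * recycled into a different fence between the load and the
		 * get.  Only keep the reference if the slot still points at
		 * the object we pinned. */
		if (f == atomic_load(slot))
			return f;
		model_put(f);
	}
}
```

The final re-check of the slot is the point of the loop.  A successful get only proves that the memory holds *some* live fence.  It does not prove that this is still the fence the slot pointed at.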

But let's say I have a dma_fence pointer that I got from, say, calling
dma_resv_excl_fence() under rcu_read_lock().  What am I allowed to do
with it under the RCU lock?  What assumptions can I make?  Is this
code, for instance, ok?

rcu_read_lock();
fence = dma_resv_excl_fence(obj);
idle = !fence || test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
rcu_read_unlock();

This code very much looks correct under the following assumptions:

 1. A valid fence pointer stays alive under the RCU read lock
 2. SIGNALED_BIT is set-once (it's never unset after being set).

However, if it were correct, we wouldn't have dma_resv_test_signaled(),
now would we? :-)

The moment you introduce ANY dma_fence recycling that recycles a
dma_fence within a single RCU grace period, all your assumptions break
down.  SLAB_TYPESAFE_BY_RCU is just one way that i915 does this.  We
also have a little i915_request recycler, meant to help with memory
pressure scenarios in certain critical sections, which also doesn't
respect RCU grace periods.  And, as mentioned multiple times, our
recycling leaks into every other driver because, thanks to i915's
choice, the above 4-line code snippet isn't valid ANYWHERE in the
kernel.
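Here is a single-threaded model of the interleaving described above.  It assumes a hypothetical one-object allocator that reuses freed memory immediately.  The names model_alloc(), model_free() and MODEL_SIGNALED_BIT are illustrative, not kernel APIs.  A reader holds the same pointer throughout, yet it sees the signaled bit first as set and then as clear.  That is exactly the set-once assumption failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SIGNALED_BIT 0x1ul

/* Hypothetical stand-in for struct dma_fence. */
struct model_fence {
	unsigned long flags;
	unsigned long long seqno;
};

/* One-object "allocator" that hands a freed object straight back out,
 * like a free list that ignores RCU grace periods. */
static struct model_fence pool_obj;
static bool pool_in_use;

static struct model_fence *model_alloc(unsigned long long seqno)
{
	if (pool_in_use)
		return NULL;
	pool_in_use = true;
	pool_obj.flags = 0;	/* a brand-new fence starts unsignaled */
	pool_obj.seqno = seqno;
	return &pool_obj;
}

static void model_free(struct model_fence *f)
{
	(void)f;
	pool_in_use = false;	/* reusable at once: no grace period */
}

static bool model_is_signaled(const struct model_fence *f)
{
	return f->flags & MODEL_SIGNALED_BIT;
}

/* Replays one interleaving on a single thread and reports what the
 * reader observes through the same pointer before and after reuse. */
static void model_replay(bool *signaled_before, bool *signaled_after)
{
	/* Writer publishes fence #1. */
	struct model_fence *fence = model_alloc(1);

	/* Reader: rcu_read_lock(); loads the pointer. */
	struct model_fence *reader_ptr = fence;

	/* Writer: fence #1 signals. */
	fence->flags |= MODEL_SIGNALED_BIT;
	*signaled_before = model_is_signaled(reader_ptr);

	/* Writer: last reference dropped, memory recycled as fence #2
	 * while the reader is still inside its read-side section. */
	model_free(fence);
	(void)model_alloc(2);

	/* Reader: the "set-once" bit now reads as clear. */
	*signaled_after = model_is_signaled(reader_ptr);
	/* Reader: rcu_read_unlock(); */
}
```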

So the question I'm raising isn't so much about the rules today.
Today, we live in the wild wild west where everything is YOLO.  But
where do we want to go?  Do we like this wild west world?  Do we want
more consistency under the RCU read lock?  If so, what do we want the
rules to be?

One option would be to accept the wild-west world we live in and say
"The RCU read lock gains you nothing.  If you want to touch the guts
of a dma_fence, take a reference".  But, at that point, we're eating
two atomics every time someone wants to look at a dma_fence.  Do
we want that?
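Here is a rough userspace sketch of what the "take a reference" rule costs compared with a plain load.  It assumes C11 atomics as a stand-in for the kref inside struct dma_fence, and all names are hypothetical.  model_rmw_count tallies the atomic read-modify-write operations on the refcount.  In the uncontended case, the reference-taking check does two of them (get and put), while the RCU-only check does none.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_SIGNALED_BIT 0x1ul

/* Hypothetical stand-in for struct dma_fence. */
struct model_fence {
	atomic_int refcount;
	atomic_ulong flags;
};

/* Counts atomic read-modify-write attempts on fence refcounts. */
static int model_rmw_count;

static bool model_get_unless_zero(struct model_fence *f)
{
	int old = atomic_load(&f->refcount);

	while (old != 0) {
		model_rmw_count++;
		if (atomic_compare_exchange_strong(&f->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void model_put(struct model_fence *f)
{
	model_rmw_count++;
	atomic_fetch_sub(&f->refcount, 1);
}

static bool model_signaled(struct model_fence *f)
{
	return atomic_load(&f->flags) & MODEL_SIGNALED_BIT;
}

/* "Take a reference" rule: pin the fence before looking at its flags. */
static bool model_idle_with_ref(_Atomic(struct model_fence *) *slot)
{
	for (;;) {
		struct model_fence *f = atomic_load(slot);
		bool idle;

		if (!f)
			return true;
		if (!model_get_unless_zero(f))
			continue;	/* being freed: reload the slot */
		if (f != atomic_load(slot)) {
			model_put(f);	/* recycled under us: retry */
			continue;
		}
		idle = model_signaled(f);
		model_put(f);
		return idle;
	}
}

/* Proposed rule: a plain load is enough, which is only sound if the
 * fence cannot turn into a different fence during the read section. */
static bool model_idle_rcu_only(_Atomic(struct model_fence *) *slot)
{
	struct model_fence *f = atomic_load(slot);

	return !f || model_signaled(f);
}
```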

Alternatively, and this is what I think Daniel and I were trying to
propose here, is that we place some constraints on dma_fence
recycling.  Specifically that, under the RCU read lock, the fence
doesn't suddenly become a new fence.  All of the immutability and
once-mutability guarantees of various bits of dma_fence hold as long
as you have the RCU read lock.
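One possible way for a recycler to honor that constraint is to queue freed fences and only hand them out again after a grace period.  In the kernel, that is the kind of deferral call_rcu() provides.  The sketch below is a userspace approximation with hypothetical names.  It treats "no active readers at all" as the end of a grace period, which is stricter than real RCU: RCU only waits for readers that were already running when the object was freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_fence with a free-list link. */
struct model_fence {
	unsigned long flags;
	struct model_fence *next;
};

static int model_readers;			/* readers inside a read section */
static struct model_fence *model_pending;	/* freed, not yet reusable */
static struct model_fence *model_free_list;	/* safe to hand out again */

static void model_read_lock(void) { model_readers++; }
static void model_read_unlock(void) { model_readers--; }

/* Freeing only queues the object; it is not reusable yet. */
static void model_fence_free(struct model_fence *f)
{
	f->next = model_pending;
	model_pending = f;
}

/* Crude grace period: once no reader is active, nobody can still hold a
 * pointer loaded before the free, so pending objects may be recycled.
 * Returns false if the grace period cannot end yet. */
static bool model_try_end_grace_period(void)
{
	if (model_readers != 0)
		return false;
	while (model_pending) {
		struct model_fence *f = model_pending;

		model_pending = f->next;
		f->next = model_free_list;
		model_free_list = f;
	}
	return true;
}

static struct model_fence *model_fence_alloc(void)
{
	struct model_fence *f = model_free_list;

	if (!f)
		return NULL;	/* a real cache would fall back to a fresh allocation */
	model_free_list = f->next;
	f->flags = 0;
	f->next = NULL;
	return f;
}
```

With this scheme, a reader that loaded a fence pointer before the free keeps seeing that same fence until it leaves its read-side section.  This holds because the memory cannot be reinitialized as a new fence until then.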

--Jason
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

