From: "Christian König" <christian.koenig@amd.com>
To: "Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>,
	intel-gfx <intel-gfx@lists.freedesktop.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Matthew Auld <matthew.auld@intel.com>,
	Jason Ekstrand <jason@jlekstrand.net>,
	Dave Airlie <airlied@redhat.com>
Subject: Re: [Intel-gfx] [PATCH 0/5] dma-fence, i915: Stop allowing SLAB_TYPESAFE_BY_RCU for dma_fence
Date: Fri, 11 Jun 2021 09:42:07 +0200
Message-ID: <b475e546-02d5-bacf-8764-233efd784ba0@amd.com>
In-Reply-To: <CAKMK7uHhL3kepoaznCvAsx8H20sBjWQZgsnWY+zm63KgfCA4CQ@mail.gmail.com>

On 11.06.21 at 09:20, Daniel Vetter wrote:
> On Fri, Jun 11, 2021 at 8:55 AM Christian König
> <christian.koenig@amd.com> wrote:
>> On 10.06.21 at 22:42, Daniel Vetter wrote:
>>> On Thu, Jun 10, 2021 at 10:10 PM Jason Ekstrand <jason@jlekstrand.net> wrote:
>>>> On Thu, Jun 10, 2021 at 8:35 AM Jason Ekstrand <jason@jlekstrand.net> wrote:
>>>>> On Thu, Jun 10, 2021 at 6:30 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>>>>>> On Thu, Jun 10, 2021 at 11:39 AM Christian König
>>>>>> <christian.koenig@amd.com> wrote:
>>>>>>> On 10.06.21 at 11:29, Tvrtko Ursulin wrote:
>>>>>>>> On 09/06/2021 22:29, Jason Ekstrand wrote:
>>>>>>>>> We've tried to keep it somewhat contained by doing most of the hard work
>>>>>>>>> to prevent access of recycled objects via dma_fence_get_rcu_safe().
>>>>>>>>> However, a quick grep of kernel sources says that, of the 30 instances
>>>>>>>>> of dma_fence_get_rcu*, only 11 of them use dma_fence_get_rcu_safe().
>>>>>>>>> It's likely there are bear traps in DRM and related subsystems just waiting
>>>>>>>>> for someone to accidentally step in them.
>>>>>>>> ...because dma_fence_get_rcu_safe appears to be about whether the
>>>>>>>> *pointer* to the fence itself is rcu protected, not about the fence
>>>>>>>> object itself.
>>>>>>> Yes, exactly that.
>>>>> The fact that both of you think this either means that I've completely
>>>>> missed what's going on with RCUs here (possible but, in this case, I
>>>>> think unlikely) or RCUs on dma fences should scare us all.
>>>> Taking a step back for a second and ignoring SLAB_TYPESAFE_BY_RCU as
>>>> such,  I'd like to ask a slightly different question:  What are the
>>>> rules about what is allowed to be done under the RCU read lock and
>>>> what guarantees does a driver need to provide?
>>>>
>>>> I think so far that we've all agreed on the following:
>>>>
>>>>    1. Freeing an unsignaled fence is ok as long as it doesn't have any
>>>> pending callbacks.  (Callbacks should hold a reference anyway).
>>>>
>>>>    2. The pointer race solved by dma_fence_get_rcu_safe is real and
>>>> requires the loop to sort out.
>>>>
>>>> But let's say I have a dma_fence pointer that I got from, say, calling
>>>> dma_resv_excl_fence() under rcu_read_lock().  What am I allowed to do
>>>> with it under the RCU lock?  What assumptions can I make?  Is this
>>>> code, for instance, ok?
>>>>
>>>> rcu_read_lock();
>>>> fence = dma_resv_excl_fence(obj);
>>>> idle = !fence || test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
>>>> rcu_read_unlock();
>>>>
>>>> This code very much looks correct under the following assumptions:
>>>>
>>>>    1. A valid fence pointer stays alive under the RCU read lock
>>>>    2. SIGNALED_BIT is set-once (it's never unset after being set).
>>>>
>>>> However, if it were, we wouldn't have dma_resv_test_signaled(), now
>>>> would we? :-)
>>>>
>>>> The moment you introduce ANY dma_fence recycling that recycles a
>>>> dma_fence within a single RCU grace period, all your assumptions break
>>>> down.  SLAB_TYPESAFE_BY_RCU is just one way that i915 does this.  We
>>>> also have a little i915_request recycler to try and help with memory
>>>> pressure scenarios in certain critical sections that also doesn't
>>>> respect RCU grace periods.  And, as mentioned multiple times, our
>>>> recycling leaks into every other driver because, thanks to i915's
>>>> choice, the above 4-line code snippet isn't valid ANYWHERE in the
>>>> kernel.
>>>>
>>>> So the question I'm raising isn't so much about the rules today.
>>>> Today, we live in the wild wild west where everything is YOLO.  But
>>>> where do we want to go?  Do we like this wild west world?  Do we want
>>>> more consistency under the RCU read lock?  If so, what do we want the
>>>> rules to be?
>>>>
>>>> One option would be to accept the wild-west world we live in and say
>>>> "The RCU read lock gains you nothing.  If you want to touch the guts
>>>> of a dma_fence, take a reference".  But, at that point, we're eating
>>>> two atomics for every time someone wants to look at a dma_fence.  Do
>>>> we want that?
>>>>
>>>> Alternatively, and this is what I think Daniel and I were trying to
>>>> propose here, is that we place some constraints on dma_fence
>>>> recycling.  Specifically that, under the RCU read lock, the fence
>>>> doesn't suddenly become a new fence.  All of the immutability and
>>>> once-mutability guarantees of various bits of dma_fence hold as long
>>>> as you have the RCU read lock.
>>> Yeah this is suboptimal. Too many potential bugs, not enough benefits.
>>>
>>> This entire __rcu business started so that there would be a lockless
>>> way to get at fences, or at least the exclusive one. That did not
>>> really pan out. I think we have a few options:
>>>
>>> - drop the idea of rcu/lockless dma-fence access outright. A quick
>>> sequence of grabbing the lock, acquiring the dma_fence and then
>>> dropping your lock again is probably plenty good. There's a lot of
>>> call_rcu and other stuff we could probably delete. I have no idea what
>>> the perf impact across all the drivers would be.
>> The question is maybe not the perf impact, but rather whether that is
>> possible at all.
>>
>> IIRC we now have some cases in TTM where RCU is mandatory and we simply
>> don't have any other choice than using it.
> Adding Thomas Hellstrom.
>
> Where is that stuff? If we end up with all the dma_resv locking
> complexity just for an oddball, then I think that would be a rather big
> bummer.

This is during buffer destruction. See the call to dma_resv_copy_fences().

But that is basically just using a dma_resv function which accesses the 
object without taking a lock.
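
Roughly this pattern, as a minimal sketch (hypothetical helper name, not the
actual TTM code; the point is that only the destination is locked while the
shared source is read without its lock, relying on RCU inside
dma_resv_copy_fences()):

#include <linux/dma-resv.h>

/*
 * Sketch only: during object teardown, copy the fences of a possibly
 * shared reservation object into a private one.  Only the destination
 * is locked; dma_resv_copy_fences() reads the source under RCU, i.e.
 * without holding its dma_resv lock.
 */
static int example_individualize_fences(struct dma_resv *private_resv,
					struct dma_resv *shared_resv)
{
	int ret;

	ret = dma_resv_lock(private_resv, NULL);
	if (ret)
		return ret;

	ret = dma_resv_copy_fences(private_resv, shared_resv);
	dma_resv_unlock(private_resv);
	return ret;
}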

>>> - try to make all drivers follow some stricter rules. The trouble is
>>> that at least with radeon dma_fence callbacks aren't even very
>>> reliable (that's why it has its own dma_fence_wait implementation), so
>>> things are wobbly anyway.
>>>
>>> - live with the current situation, but radically delete all unsafe
>>> interfaces. I.e. nothing is allowed to directly deref an rcu fence
>>> pointer, everything goes through dma_fence_get_rcu_safe. The
>>> kref_get_unless_zero would become an internal implementation detail.
>>> Our "fast" and "lockless" dma_resv fence access stays a pile of
>>> seqlock, retry loop and a conditional atomic inc + atomic dec. The
>>> only thing that's slightly faster would be dma_resv_test_signaled().
>>>
>>> - I guess minimally we should rename dma_fence_get_rcu to
>>> dma_fence_tryget. It has nothing to do with rcu really, and the use is
>>> very, very limited.
>> I think what we should do is to use RCU internally in the dma_resv
>> object but disallow drivers/frameworks from messing with that directly.
>>
>> In other words drivers should use one of the following:
>> 1. dma_resv_wait_timeout()
>> 2. dma_resv_test_signaled()
>> 3. dma_resv_copy_fences()
>> 4. dma_resv_get_fences()
>> 5. dma_resv_for_each_fence() <- to be implemented
>> 6. dma_resv_for_each_fence_unlocked() <- to be implemented
>>
>> Inside those functions we then make sure that we only use safe ways of
>> accessing the RCU protected data structures.
>>
>> This way we only need to make sure that those accessor functions are
>> sane and don't need to audit every driver individually.
> Yeah better encapsulation for dma_resv sounds like a good thing, not least
> for all the other issues we've been discussing recently. I guess your
> list is also missing the various "add/replace some more fences"
> functions, but we have them already.
>
>> I can tackle implementing the dma_resv_for_each_fence()/_unlocked().
>> Already got a large bunch of that coded out anyway.
> When/where do we need to iterate over fences unlocked? Given how much
> pain it is to get a consistent snapshot of the fences or fence state
> (I've read the dma-buf poll implementation, and it looks a bit buggy
> in that regard, but not sure, just as an example) an unlocked
> iterator sounds very dangerous to me.

This is to make the implementation of the other functions easier. Currently
they each basically roll their own loop implementation, which at least for
dma_resv_test_signaled() looks a bit questionable to me.

In addition to those we have one more case in i915, plus the unlocked
polling implementation, which I agree is a bit questionable as well.

My idea is to have the problematic logic in the iterator and to only hand
back fences which we hold a reference on and which are 100% certain to be
the right ones.

Probably best if I show some code to explain what I mean.
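
As a very rough first sketch of the direction (simplified; the field names
are assumptions about the current dma_resv layout, the helper names are made
up, and a real iterator would also have to report restarts back to the
caller), the unlocked walk could look something like this:

#include <linux/dma-resv.h>
#include <linux/dma-fence.h>
#include <linux/rcupdate.h>

/* Must be zero-initialized with ->obj set before the first call. */
struct example_fence_cursor {
	struct dma_resv *obj;
	unsigned int seq;	/* seqcount snapshot for restarts */
	unsigned int index;	/* shared fences first, exclusive fence last */
};

/* Return the next referenced fence, or NULL when the walk is done. */
static struct dma_fence *
example_next_fence_unlocked(struct example_fence_cursor *cursor)
{
	struct dma_fence *fence;
	struct dma_resv_list *list;

	rcu_read_lock();
retry:
	fence = NULL;
	cursor->seq = read_seqcount_begin(&cursor->obj->seq);
	list = rcu_dereference(cursor->obj->fence);

	while (!fence) {
		unsigned int shared_count = list ? list->shared_count : 0;

		if (cursor->index < shared_count)
			fence = rcu_dereference(list->shared[cursor->index]);
		else if (cursor->index == shared_count)
			fence = rcu_dereference(cursor->obj->fence_excl);
		else
			break;
		++cursor->index;

		/* Skip fences we can't grab a reference on any more. */
		if (fence && !dma_fence_get_rcu(fence))
			fence = NULL;
	}

	/* The object changed under us: drop the fence and start over. */
	if (fence && read_seqcount_retry(&cursor->obj->seq, cursor->seq)) {
		dma_fence_put(fence);
		cursor->index = 0;
		goto retry;
	}

	rcu_read_unlock();
	return fence;
}

The callers would then never see a fence without holding a reference on it,
and never one which was already replaced in the reservation object when it
was handed out.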

Regards,
Christian.

> -Daniel

