Re: [PATCH] Revert "drm/i915: Remove extra multi-gt pm-references"

dri-devel Archive mirror
 help / color / mirror / Atom feed

From: Nirmoy Das <nirmoy.das@linux.intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>,
	Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	Jani Nikula <jani.nikula@linux.intel.com>,
	Joonas Lahtinen <joonas.lahtinen@linux.intel.com>,
	Tvrtko Ursulin <tursulin@ursulin.net>,
	Andi Shyti <andi.shyti@linux.intel.com>,
	Nirmoy Das <nirmoy.das@intel.com>,
	Chris Wilson <chris.p.wilson@linux.intel.com>
Subject: Re: [PATCH] Revert "drm/i915: Remove extra multi-gt pm-references"
Date: Tue, 7 May 2024 23:27:15 +0200	[thread overview]
Message-ID: <a95d0897-7e23-447e-9d97-4db97f82af06@linux.intel.com> (raw)
In-Reply-To: <Zjpgga6ODnpmzeB9@intel.com>


On 5/7/2024 7:10 PM, Rodrigo Vivi wrote:
> On Tue, May 07, 2024 at 10:54:11AM +0200, Janusz Krzysztofik wrote:
>> On Tuesday, 7 May 2024 09:30:15 GMT+2 Nirmoy Das wrote:
>>> Hi Janusz,
>>>
>>>
>>> Just realized we need Fixes tag for this.
>>>
>>> Fixes: 1f33dc0c1189 ("drm/i915: Remove extra multi-gt pm-references")
>> Whoever is going to push this patch, please feel free to add this tag.
> dim b4-shazam gets that automagically, now it was sent in reply ;)
Nice!
>
> I just pushed the patch. thanks for the patch and reviews.


Thanks,

Nirmoy

>
>> Thanks,
>> Janusz
>>
>>>
>>> Regards,
>>>
>>> Nirmoy
>>>
>>> On 5/6/2024 8:02 PM, Janusz Krzysztofik wrote:
>>>> This reverts commit 1f33dc0c1189efb9ae19c6fc22b64dd3e26261fb.
>>>>
>>>> There was a patch supposed to fix an issue of illegal attempts to free a
>>>> still active i915 VMA object when parking a GT believed to be idle,
>>>> reported by CI on 2-GT Meteor Lake.  As a solution, an extra wakeref for
>>>> a Primary GT was acquired from i915_gem_do_execbuffer() -- see commit
>>>> f56fe3e91787 ("drm/i915: Fix a VMA UAF for multi-gt platform").
>>>>
>>>> However, that fix occurred insufficient -- the issue was still reported by
>>>> CI.  That wakeref was released on exit from i915_gem_do_execbuffer(), then
>>>> potentially before completion of the request and deactivation of its
>>>> associated VMAs.  Moreover, CI reports indicated that single-GT platforms
>>>> also suffered sporadically from the same race.
>>>>
>>>> Since that issue was fixed by another commit f3c71b2ded5c ("drm/i915/vma:
>>>> Fix UAF on destroy against retire race"), the changes introduced by that
>>>> insufficient fix were dropped as no longer useful.  However, that series
>>>> resulted in another VMA UAF scenario now being triggered in CI.
>>>>
>>>> <4> [260.290809] ------------[ cut here ]------------
>>>> <4> [260.290988] list_del corruption. prev->next should be ffff888118c5d990, but was ffff888118c5a510. (prev=ffff888118c5a510)
>>>> <4> [260.291004] WARNING: CPU: 2 PID: 1143 at lib/list_debug.c:62 __list_del_entry_valid_or_report+0xb7/0xe0
>>>> ..
>>>> <4> [260.291055] CPU: 2 PID: 1143 Comm: kms_plane Not tainted 6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1
>>>> <4> [260.291058] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024
>>>> <4> [260.291060] RIP: 0010:__list_del_entry_valid_or_report+0xb7/0xe0
>>>> ...
>>>> <4> [260.291087] Call Trace:
>>>> <4> [260.291089]  <TASK>
>>>> <4> [260.291124]  i915_vma_reopen+0x43/0x80 [i915]
>>>> <4> [260.291298]  eb_lookup_vmas+0x9cb/0xcc0 [i915]
>>>> <4> [260.291579]  i915_gem_do_execbuffer+0xc9a/0x26d0 [i915]
>>>> <4> [260.291883]  i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915]
>>>> ...
>>>> <4> [260.292301]  </TASK>
>>>> ...
>>>> <4> [260.292506] ---[ end trace 0000000000000000 ]---
>>>> <4> [260.292782] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6ca3: 0000 [#1] PREEMPT SMP NOPTI
>>>> <4> [260.303575] CPU: 2 PID: 1143 Comm: kms_plane Tainted: G        W          6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1
>>>> <4> [260.313851] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024
>>>> <4> [260.326359] RIP: 0010:eb_validate_vmas+0x114/0xd80 [i915]
>>>> ...
>>>> <4> [260.428756] Call Trace:
>>>> <4> [260.431192]  <TASK>
>>>> <4> [639.283393]  i915_gem_do_execbuffer+0xd05/0x26d0 [i915]
>>>> <4> [639.305245]  i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915]
>>>> ...
>>>> <4> [639.411134]  </TASK>
>>>> ...
>>>> <4> [639.449979] ---[ end trace 0000000000000000 ]---
>>>>
>>>> We defer actually closing, unbinding and destroying a VMA until next idle
>>>> point, or until the object is freed in the meantime.  By postponing the
>>>> unbind, we allow for the VMA to be reopened by the client, avoiding the
>>>> work required to rebind the VMA.
>>>>
>>>> Starting from commit b0647a5e79b1 ("drm/i915: Avoid live-lock with
>>>> i915_vma_parked()"), we assume that as long as a GT is held idle, no VMA
>>>> would be reopened while we destroy them.  That assumption is no longer
>>>> true in multi-GT configurations, where a VMA we reopen may be handled by a
>>>> GT different from the one that we already keep active via its engine while
>>>> we set up an execbuf request.
>>>>
>>>> Restoring the extra GT0 PM wakeref removed from i915_gem_do_execbuffer()
>>>> processing path seems to fix this issue.
>>>>
>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10608
>>>> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
>>>> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
>>>> Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 18 ++++++++++++++++++
>>>>    1 file changed, 18 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>> index 42619fc05de48..090724fa766c9 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>> @@ -255,6 +255,7 @@ struct i915_execbuffer {
>>>>    	struct intel_context *context; /* logical state for the request */
>>>>    	struct i915_gem_context *gem_context; /** caller's context */
>>>>    	intel_wakeref_t wakeref;
>>>> +	intel_wakeref_t wakeref_gt0;
>>>>    
>>>>    	/** our requests to build */
>>>>    	struct i915_request *requests[MAX_ENGINE_INSTANCE + 1];
>>>> @@ -2685,6 +2686,7 @@ static int
>>>>    eb_select_engine(struct i915_execbuffer *eb)
>>>>    {
>>>>    	struct intel_context *ce, *child;
>>>> +	struct intel_gt *gt;
>>>>    	unsigned int idx;
>>>>    	int err;
>>>>    
>>>> @@ -2708,10 +2710,17 @@ eb_select_engine(struct i915_execbuffer *eb)
>>>>    		}
>>>>    	}
>>>>    	eb->num_batches = ce->parallel.number_children + 1;
>>>> +	gt = ce->engine->gt;
>>>>    
>>>>    	for_each_child(ce, child)
>>>>    		intel_context_get(child);
>>>>    	eb->wakeref = intel_gt_pm_get(ce->engine->gt);
>>>> +	/*
>>>> +	 * Keep GT0 active on MTL so that i915_vma_parked() doesn't
>>>> +	 * free VMAs while execbuf ioctl is validating VMAs.
>>>> +	 */
>>>> +	if (gt->info.id)
>>>> +		eb->wakeref_gt0 = intel_gt_pm_get(to_gt(gt->i915));
>>>>    
>>>>    	if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags)) {
>>>>    		err = intel_context_alloc_state(ce);
>>>> @@ -2750,6 +2759,9 @@ eb_select_engine(struct i915_execbuffer *eb)
>>>>    	return err;
>>>>    
>>>>    err:
>>>> +	if (gt->info.id)
>>>> +		intel_gt_pm_put(to_gt(gt->i915), eb->wakeref_gt0);
>>>> +
>>>>    	intel_gt_pm_put(ce->engine->gt, eb->wakeref);
>>>>    	for_each_child(ce, child)
>>>>    		intel_context_put(child);
>>>> @@ -2763,6 +2775,12 @@ eb_put_engine(struct i915_execbuffer *eb)
>>>>    	struct intel_context *child;
>>>>    
>>>>    	i915_vm_put(eb->context->vm);
>>>> +	/*
>>>> +	 * This works in conjunction with eb_select_engine() to prevent
>>>> +	 * i915_vma_parked() from interfering while execbuf validates vmas.
>>>> +	 */
>>>> +	if (eb->gt->info.id)
>>>> +		intel_gt_pm_put(to_gt(eb->gt->i915), eb->wakeref_gt0);
>>>>    	intel_gt_pm_put(eb->context->engine->gt, eb->wakeref);
>>>>    	for_each_child(eb->context, child)
>>>>    		intel_context_put(child);
>>
>>
>>

     prev parent reply	other threads:[~2024-05-07 21:27 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-06 18:02 [PATCH] Revert "drm/i915: Remove extra multi-gt pm-references" Janusz Krzysztofik
2024-05-06 18:32 ` Nirmoy Das
2024-05-07  7:30 ` Nirmoy Das
2024-05-07  8:54   ` Janusz Krzysztofik
2024-05-07 17:10     ` Rodrigo Vivi
2024-05-07 21:27       ` Nirmoy Das [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a95d0897-7e23-447e-9d97-4db97f82af06@linux.intel.com \
    --to=nirmoy.das@linux.intel.com \
    --cc=andi.shyti@linux.intel.com \
    --cc=chris.p.wilson@linux.intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=jani.nikula@linux.intel.com \
    --cc=janusz.krzysztofik@linux.intel.com \
    --cc=joonas.lahtinen@linux.intel.com \
    --cc=nirmoy.das@intel.com \
    --cc=rodrigo.vivi@intel.com \
    --cc=tursulin@ursulin.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).