Re: [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget

dri-devel Archive mirror
 help / color / mirror / Atom feed

From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
To: "Christian König" <christian.koenig@amd.com>,
	"Tvrtko Ursulin" <tursulin@igalia.com>,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: kernel-dev@igalia.com, Friedrich Vock <friedrich.vock@gmx.de>
Subject: Re: [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget
Date: Wed, 15 May 2024 16:13:46 +0100	[thread overview]
Message-ID: <c66a41bd-4511-414e-93fc-d4aa7c36c5b5@igalia.com> (raw)
In-Reply-To: <7a9a2819-a3f0-4a51-8ae8-bde6ff7b0aaa@amd.com>


On 15/05/2024 15:31, Christian König wrote:
> Am 15.05.24 um 12:59 schrieb Tvrtko Ursulin:
>>
>> On 15/05/2024 08:20, Christian König wrote:
>>> Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>>
>>>> Current code appears to live in a misconception that playing with 
>>>> buffer
>>>> allowed and preferred placements can control the decision on whether
>>>> backing store migration will be attempted or not.
>>>>
>>>> Both from code inspection and from empirical experiments I see that not
>>>> being true, and that both allowed and preferred placement are typically
>>>> set to the same bitmask.
>>>
>>> That's not correct for the use case handled here, but see below.
>>
>> Which part is not correct, that bo->preferred_domains and 
>> bo->allower_domains are the same bitmask?
> 
> Sorry totally forgot to explain that.
> 
> This rate limit here was specially made for OpenGL applications which 
> over commit VRAM. In those case preferred_domains will be VRAM only and 
> allowed_domains will be VRAM|GTT.
> 
> RADV always uses VRAM|GTT for both (which is correct).

Got it, thanks!

>>>> As such, when the code decides to throttle the migration for a 
>>>> client, it
>>>> is in fact not achieving anything. Buffers can still be either 
>>>> migrated or
>>>> not migrated based on the external (to this function and facility) 
>>>> logic.
>>>>
>>>> Fix it by not changing the buffer object placements if the migration
>>>> budget has been spent.
>>>>
>>>> FIXME:
>>>> Is it still required to call validate is the question..
>>>>
>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 +++++++++---
>>>>   1 file changed, 9 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> index 22708954ae68..d07a1dd7c880 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>>> @@ -784,6 +784,7 @@ static int amdgpu_cs_bo_validate(void *param, 
>>>> struct amdgpu_bo *bo)
>>>>           .no_wait_gpu = false,
>>>>           .resv = bo->tbo.base.resv
>>>>       };
>>>> +    bool migration_allowed = true;
>>>>       struct ttm_resource *old_res;
>>>>       uint32_t domain;
>>>>       int r;
>>>> @@ -805,19 +806,24 @@ static int amdgpu_cs_bo_validate(void *param, 
>>>> struct amdgpu_bo *bo)
>>>>                * visible VRAM if we've depleted our allowance to do
>>>>                * that.
>>>>                */
>>>> -            if (p->bytes_moved_vis < p->bytes_moved_vis_threshold)
>>>> +            if (p->bytes_moved_vis < p->bytes_moved_vis_threshold) {
>>>>                   domain = bo->preferred_domains;
>>>> -            else
>>>> +            } else {
>>>>                   domain = bo->allowed_domains;
>>>> +                migration_allowed = false;
>>>> +            }
>>>>           } else {
>>>>               domain = bo->preferred_domains;
>>>>           }
>>>>       } else {
>>>>           domain = bo->allowed_domains;
>>>> +        migration_allowed = false;
>>>>       }
>>>>   retry:
>>>> -    amdgpu_bo_placement_from_domain(bo, domain);
>>>> +    if (migration_allowed)
>>>> +        amdgpu_bo_placement_from_domain(bo, domain);
>>>
>>> That's completely invalid. Calling amdgpu_bo_placement_from_domain() 
>>> is a mandatory prerequisite for calling ttm_bo_validate();
>>>
>>> E.g. the usually code fow is:
>>>
>>> /* This initializes bo->placement */
>>> amdgpu_bo_placement_from_domain()
>>>
>>> /* Eventually modify bo->placement to fit special requirements */
>>> ....
>>>
>>> /* Apply the placement to the BO */
>>> ttm_bo_validate(&bo->tbo, &bo->placement, &ctx)
>>>
>>> To sum it up bo->placement should be a variable on the stack instead, 
>>> but we never bothered to clean that up.
>>
>> I am not clear if you agree or not that the current method of trying 
>> to avoid migration doesn't really do anything?
> 
> I totally agree, but the approach you taken to fix it is just quite 
> broken. You can't leave bo->placement uninitialized and expect that 
> ttm_bo_validate() won't move the BO.

Yep, that much was clear, sorry that I did not explicitly acknowledge 
but just moved on to discussing how to fix it properly.

>> On stack placements sounds plausible to force migration avoidance by 
>> putting a single current object placement in that list, if that is 
>> what you have in mind? Or a specialized flag/version of 
>> amdgpu_bo_placement_from_domain with an bool input like 
>> "allow_placement_change"?
> 
> A very rough idea with no guarantee that it actually works:
> 
> Add a TTM_PL_FLAG_RATE_LIMITED with all the TTM code to actually figure 
> out how many bytes have been moved and how many bytes the current 
> operation can move etc...
> 
> Friedrich's patches actually looked like quite a step into the right 
> direction for that already, so I would start from there.
> 
> Then always feed amdgpu_bo_placement_from_domain() with the 
> allowed_domains in the CS path and VM validation.
> 
> Finally extend amdgpu_bo_placement_from_domain() to take a closer look 
> at bo->preferred_domains, similar to how we do for the 
> TTM_PL_FLAG_FALLBACK already and set the TTM_PL_FLAG_RATE_LIMITED flag 
> as appropriate.

Two things which I kind of don't like with the placement flag idea is 
that a) typically two placements are involved in a move so semantics of 
an individual placement being rate limited does not fully fit. Unless we 
view it is a hack to temporarily "de-prioritise" placement. In which 
case I come to b), where the dynamic games with domains and placements 
perhaps feel a bit hacky. Certainly interactions between placement 
selection in ttm_bo_validate (via ttm_resource_compatible) and 
ttm_bo_alloc_resource could be tricky.

Maybe... I will think about it a bit more.

In the meantime what I played with today is the "stack local" single 
placement in amdgpu_cs_bo_validate. If over the migration budget, I find 
the abo->placement matching the current bo->tbo.resource, and pass it 
into ttm_bo_validate instead of the full placement list.

With that patch and also "drm/amdgpu: Re-validate evicted buffers" from 
this series I did three of Assasin's Creed Valhalla on each kernel and 
it does appear to be keeping the migrations lower. It has no effect on 
average fps, but does appear to improve the minimum, 1% low and 0.1% low.

This is on an APU mind you. No idea how that would affect discrete. But 
it is interesting that marking _more_ buffers for re-validations 
combined with throttling migrations correctly, for this use case, it 
seems to do what I expected.

Regards,

Tvrtko

>> Regards,
>>
>> Tvrtko
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>> +
>>>>       r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>>>>       if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
>>>
>

next prev parent reply	other threads:[~2024-05-15 15:13 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-08 18:09 [RFC 0/5] Discussion around eviction improvements Tvrtko Ursulin
2024-05-08 18:09 ` [RFC 1/5] drm/amdgpu: Fix migration rate limiting accounting Tvrtko Ursulin
2024-05-08 19:08   ` Friedrich Vock
2024-05-09  9:19     ` Tvrtko Ursulin
2024-05-13 14:36       ` Friedrich Vock
2024-05-15  7:14   ` Christian König
2024-05-15 10:51     ` Tvrtko Ursulin
2024-05-08 18:09 ` [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget Tvrtko Ursulin
2024-05-15  7:20   ` Christian König
2024-05-15 10:59     ` Tvrtko Ursulin
2024-05-15 14:31       ` Christian König
2024-05-15 15:13         ` Tvrtko Ursulin [this message]
2024-05-08 18:09 ` [RFC 3/5] drm/ttm: Add preferred placement flag Tvrtko Ursulin
2024-05-08 18:09 ` [RFC 4/5] drm/amdgpu: Use preferred placement for VRAM+GTT Tvrtko Ursulin
2024-05-08 18:09 ` [RFC 5/5] drm/amdgpu: Re-validate evicted buffers Tvrtko Ursulin
2024-05-09 12:40 ` [RFC 0/5] Discussion around eviction improvements Tvrtko Ursulin
2024-05-13 13:49   ` Tvrtko Ursulin
2024-05-14 15:14     ` Tvrtko Ursulin
2024-05-14 15:47       ` Christian König
2024-05-13  6:50 ` Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c66a41bd-4511-414e-93fc-d4aa7c36c5b5@igalia.com \
    --to=tvrtko.ursulin@igalia.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=friedrich.vock@gmx.de \
    --cc=kernel-dev@igalia.com \
    --cc=tursulin@igalia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).