dri-devel Archive mirror
 help / color / mirror / Atom feed
From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
To: Friedrich Vock <friedrich.vock@gmx.de>,
	Tvrtko Ursulin <tursulin@igalia.com>,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: kernel-dev@igalia.com, "Christian König" <christian.koenig@amd.com>
Subject: Re: [RFC 1/5] drm/amdgpu: Fix migration rate limiting accounting
Date: Thu, 9 May 2024 10:19:53 +0100	[thread overview]
Message-ID: <1a2788ef-6969-4f4c-95e9-cf5f2c7e0872@igalia.com> (raw)
In-Reply-To: <146d615c-6eb1-491a-9494-cacb9337f13e@gmx.de>


On 08/05/2024 20:08, Friedrich Vock wrote:
> On 08.05.24 20:09, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>
>> The logic assumed any migration attempt worked and therefore would over-
>> account the amount of data migrated during buffer re-validation. As a
>> consequence client can be unfairly penalised by incorrectly considering
>> its migration budget spent.
> 
> If the migration failed but data was still moved (which I think could be
> the case when we try evicting everything but it still doesn't work?),
> shouldn't the eviction movements count towards the ratelimit too?

Possibly, which path would that be?

I mean there are definitely more migration which *should not* be counted 
which I think your mini-series approaches more accurately. What this 
patch achieves, in its current RFC form, is reduces the "false-positive" 
migration budget depletions.

So larger improvements aside, point of the series was to illustrate that 
even the things which were said to be working do not seem to. See cover 
letter to see what I thought does not work either well or at all.
>> Fix it by looking at the before and after buffer object backing store and
>> only account if there was a change.
>>
>> FIXME:
>> I think this needs a better solution to account for migrations between
>> VRAM visible and non-visible portions.
> 
> FWIW, I have some WIP patches (not posted on any MLs yet though) that
> attempt to solve this issue (+actually enforcing ratelimits) by moving
> the ratelimit accounting/enforcement to TTM entirely.
> 
> By moving the accounting to TTM we can count moved bytes when we move
> them, and don't have to rely on comparing resources to determine whether
> moving actually happened. This should address your FIXME as well.

Yep, I've seen them. They are not necessarily conflicting with this 
series, potentialy TTM placement flag aside. *If* something like this 
can be kept small and still manage to fix up a few simple things which 
do not appear to work at all at the moment.

For the larger re-work it is quite, well, large and it is not easy to be 
certain the end result would work as expected. IMO it would be best to 
sketch out a larger series which brings some practical and masurable 
change in behaviour before commiting to merge things piecemeal.

For instance I have a niggling feeling the runtime games driver plays 
with placements and domains are not great and wonder if things could be 
cleaner if simplified by letting TTM manage things more, more 
explicitly, and having the list of placements more static. Thinking 
about it seems a step too far for now though.

Regards,

Tvrtko

> 
> Regards,
> Friedrich
> 
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26 +++++++++++++++++++++-----
>>   1 file changed, 21 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index ec888fc6ead8..22708954ae68 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -784,12 +784,15 @@ static int amdgpu_cs_bo_validate(void *param, 
>> struct amdgpu_bo *bo)
>>           .no_wait_gpu = false,
>>           .resv = bo->tbo.base.resv
>>       };
>> +    struct ttm_resource *old_res;
>>       uint32_t domain;
>>       int r;
>>
>>       if (bo->tbo.pin_count)
>>           return 0;
>>
>> +    old_res = bo->tbo.resource;
>> +
>>       /* Don't move this buffer if we have depleted our allowance
>>        * to move it. Don't move anything if the threshold is zero.
>>        */
>> @@ -817,16 +820,29 @@ static int amdgpu_cs_bo_validate(void *param, 
>> struct amdgpu_bo *bo)
>>       amdgpu_bo_placement_from_domain(bo, domain);
>>       r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
>>
>> -    p->bytes_moved += ctx.bytes_moved;
>> -    if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
>> -        amdgpu_res_cpu_visible(adev, bo->tbo.resource))
>> -        p->bytes_moved_vis += ctx.bytes_moved;
>> -
>>       if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
>>           domain = bo->allowed_domains;
>>           goto retry;
>>       }
>>
>> +    if (!r) {
>> +        struct ttm_resource *new_res = bo->tbo.resource;
>> +        bool moved = true;
>> +
>> +        if (old_res == new_res)
>> +            moved = false;
>> +        else if (old_res && new_res &&
>> +             old_res->mem_type == new_res->mem_type)
>> +            moved = false;
>> +
>> +        if (moved) {
>> +            p->bytes_moved += ctx.bytes_moved;
>> +            if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
>> +                amdgpu_res_cpu_visible(adev, bo->tbo.resource))
>> +                p->bytes_moved_vis += ctx.bytes_moved;
>> +        }
>> +    }
>> +
>>       return r;
>>   }
>>

  reply	other threads:[~2024-05-09  9:20 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-08 18:09 [RFC 0/5] Discussion around eviction improvements Tvrtko Ursulin
2024-05-08 18:09 ` [RFC 1/5] drm/amdgpu: Fix migration rate limiting accounting Tvrtko Ursulin
2024-05-08 19:08   ` Friedrich Vock
2024-05-09  9:19     ` Tvrtko Ursulin [this message]
2024-05-13 14:36       ` Friedrich Vock
2024-05-15  7:14   ` Christian König
2024-05-15 10:51     ` Tvrtko Ursulin
2024-05-08 18:09 ` [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget Tvrtko Ursulin
2024-05-15  7:20   ` Christian König
2024-05-15 10:59     ` Tvrtko Ursulin
2024-05-15 14:31       ` Christian König
2024-05-15 15:13         ` Tvrtko Ursulin
2024-05-08 18:09 ` [RFC 3/5] drm/ttm: Add preferred placement flag Tvrtko Ursulin
2024-05-08 18:09 ` [RFC 4/5] drm/amdgpu: Use preferred placement for VRAM+GTT Tvrtko Ursulin
2024-05-08 18:09 ` [RFC 5/5] drm/amdgpu: Re-validate evicted buffers Tvrtko Ursulin
2024-05-09 12:40 ` [RFC 0/5] Discussion around eviction improvements Tvrtko Ursulin
2024-05-13 13:49   ` Tvrtko Ursulin
2024-05-14 15:14     ` Tvrtko Ursulin
2024-05-14 15:47       ` Christian König
2024-05-13  6:50 ` Christian König

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1a2788ef-6969-4f4c-95e9-cf5f2c7e0872@igalia.com \
    --to=tvrtko.ursulin@igalia.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=friedrich.vock@gmx.de \
    --cc=kernel-dev@igalia.com \
    --cc=tursulin@igalia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).