From: "Christian König" <christian.koenig@amd.com>
To: Tvrtko Ursulin <tursulin@ursulin.net>,
Tvrtko Ursulin <tursulin@igalia.com>,
amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: kernel-dev@igalia.com, Tvrtko Ursulin <tvrtko.ursulin@igalia.com>,
Friedrich Vock <friedrich.vock@gmx.de>
Subject: Re: [RFC 0/5] Discussion around eviction improvements
Date: Tue, 14 May 2024 17:47:32 +0200 [thread overview]
Message-ID: <fc645b96-08cf-44bb-9fed-855ce537b8e7@amd.com> (raw)
In-Reply-To: <d48740a3-ea97-447f-9103-c4bb30194971@ursulin.net>
Am 14.05.24 um 17:14 schrieb Tvrtko Ursulin:
>
> On 13/05/2024 14:49, Tvrtko Ursulin wrote:
>>
>> On 09/05/2024 13:40, Tvrtko Ursulin wrote:
>>>
>>> On 08/05/2024 19:09, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
>>>>
>>>> Last few days I was looking at the situation with VRAM over
>>>> subscription, what
>>>> happens versus what perhaps should happen. Browsing through the
>>>> driver and
>>>> running some simple experiments.
>>>>
>>>> I ended up with this patch series which, as a disclaimer, may be
>>>> completely
>>>> wrong but as I found some suspicious things, to me at least, I
>>>> thought it was a
>>>> good point to stop and request some comments.
>>>>
>>>> To perhaps summarise what are the main issues I think I found:
>>>>
>>>> * Migration rate limiting does not bother knowing if actual
>>>> migration happened
>>>> and so can over-account and unfairly penalise.
>>>>
>>>> * Migration rate limiting does not even work, at least not for
>>>> the common case
>>>> where userspace configures VRAM+GTT. It thinks it can stop
>>>> migration attempts
>>>> by playing with bo->allowed_domains vs bo->preferred domains
>>>> but, both from
>>>> the code, and from empirical experiments, I see that not
>>>> working at all. Both
>>>> masks are identical so fiddling with them achieves nothing.
>>>>
>>>> * Idea of the fallback placement only works when VRAM has free
>>>> space. As soon
>>>> as it does not, ttm_resource_compatible is happy to leave the
>>>> buffers in the
>>>> secondary placement forever.
>>>>
>>>> * Driver thinks it will be re-validating evicted buffers on the
>>>> next submission
>>>> but it does not for the very common case of VRAM+GTT because it
>>>> only checks
>>>> if current placement is *none* of the preferred placements.
>>>>
>>>> All those problems are addressed in individual patches.
>>>>
>>>> End result of this series appears to be driver which will try
>>>> harder to move
>>>> buffers back into VRAM, but will be (more) correctly throttled in
>>>> doing so by
>>>> the existing rate limiting logic.
>>>>
>>>> I have run a quick benchmark of Cyberpunk 2077 and cannot say that
>>>> I saw a
>>>> change but that could be a good thing too. At least I did not break
>>>> anything,
>>>> perhaps.. On one occassion I did see the rate limiting logic get
>>>> confused while
>>>> for a period of few minutes it went to a mode where it was
>>>> constantly giving a
>>>> high migration budget. But that recovered itself when I switched
>>>> clients and did
>>>> not come back so I don't know. If there is something wrong there I
>>>> don't think
>>>> it would be caused by any patches in this series.
>>>
>>> Since yesterday I also briefly tested with Far Cry New Dawn. One run
>>> each so possibly doesn't mean anything apart that there isn't a
>>> regression aka migration throttling is keeping things at bay even
>>> with increased requests to migrate things back to VRAM:
>>>
>>> before after
>>> min/avg/max fps 36/44/54 37/45/55
>>>
>>> Cyberpunk 2077 from yesterday was similarly close:
>>>
>>> 26.96/29.59/30.40 29.70/30.00/30.32
>>>
>>> I guess the real story is proper DGPU where misplaced buffers have a
>>> real cost.
>>
>> I found one game which regresses spectacularly badly with this series
>> - Assasin's Creed Valhalla. The built-in benchmark at least. The game
>> appears to have a working set much larger than the other games I
>> tested, around 5GiB total during the benchmark. And for some reason
>> migration throttling totally fails to put it in check. I will be
>> investigating this shortly.
>
> I think that the conclusion is everything I attempted to add relating
> to TTM_PL_PREFERRED does not really work as I initially thought it
> did. Therefore please imagine this series as only containing patches
> 1, 2 and 5.
Noted (and I had just started to wrap my head around that idea).
>
> (And FWIW it was quite annoying to get to the bottom of since for some
> reason the system exibits some sort of a latching behaviour, where on
> some boots and/or some minutes of runtime things were fine, and then
> it would latch onto a mode where the TTM_PL_PREFERRED induced breakage
> would show. And sometimes this breakage would appear straight away. Odd.)
Welcome to my world. You improve one use case and four other get a
penalty. Even when you know the code and potential use cases inside out
it's really hard to predict how some applications and the core memory
management behave sometimes.
>
> I still need to test though if the subset of patches manage to achieve
> some positive improvement on their own. It is possible, as patch 5
> marks more buffers for re-validation so once overcommit subsides they
> would get promoted to preferred placement straight away. And 1&2 are
> notionally fixes for migration throttling so at least in broad sense
> should be still valid as discussion points.
Yeah, especially 5 kind of makes sense but could potentially lead to
higher overhead. Need to see how we can better handle that.
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
>>>> Series is probably rough but should be good enough for dicsussion.
>>>> I am curious
>>>> to hear if I identified at least something correctly as a real
>>>> problem.
>>>>
>>>> It would also be good to hear what are the suggested games to check
>>>> and see
>>>> whether there is any improvement.
>>>>
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Cc: Friedrich Vock <friedrich.vock@gmx.de>
>>>>
>>>> Tvrtko Ursulin (5):
>>>> drm/amdgpu: Fix migration rate limiting accounting
>>>> drm/amdgpu: Actually respect buffer migration budget
>>>> drm/ttm: Add preferred placement flag
>>>> drm/amdgpu: Use preferred placement for VRAM+GTT
>>>> drm/amdgpu: Re-validate evicted buffers
>>>>
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 38
>>>> +++++++++++++++++-----
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 +++--
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 21 ++++++++++--
>>>> drivers/gpu/drm/ttm/ttm_resource.c | 13 +++++---
>>>> include/drm/ttm/ttm_placement.h | 3 ++
>>>> 5 files changed, 65 insertions(+), 18 deletions(-)
>>>>
next prev parent reply other threads:[~2024-05-14 15:47 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-08 18:09 [RFC 0/5] Discussion around eviction improvements Tvrtko Ursulin
2024-05-08 18:09 ` [RFC 1/5] drm/amdgpu: Fix migration rate limiting accounting Tvrtko Ursulin
2024-05-08 19:08 ` Friedrich Vock
2024-05-09 9:19 ` Tvrtko Ursulin
2024-05-13 14:36 ` Friedrich Vock
2024-05-15 7:14 ` Christian König
2024-05-15 10:51 ` Tvrtko Ursulin
2024-05-08 18:09 ` [RFC 2/5] drm/amdgpu: Actually respect buffer migration budget Tvrtko Ursulin
2024-05-15 7:20 ` Christian König
2024-05-15 10:59 ` Tvrtko Ursulin
2024-05-15 14:31 ` Christian König
2024-05-15 15:13 ` Tvrtko Ursulin
2024-05-08 18:09 ` [RFC 3/5] drm/ttm: Add preferred placement flag Tvrtko Ursulin
2024-05-08 18:09 ` [RFC 4/5] drm/amdgpu: Use preferred placement for VRAM+GTT Tvrtko Ursulin
2024-05-08 18:09 ` [RFC 5/5] drm/amdgpu: Re-validate evicted buffers Tvrtko Ursulin
2024-05-09 12:40 ` [RFC 0/5] Discussion around eviction improvements Tvrtko Ursulin
2024-05-13 13:49 ` Tvrtko Ursulin
2024-05-14 15:14 ` Tvrtko Ursulin
2024-05-14 15:47 ` Christian König [this message]
2024-05-13 6:50 ` Christian König
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fc645b96-08cf-44bb-9fed-855ce537b8e7@amd.com \
--to=christian.koenig@amd.com \
--cc=amd-gfx@lists.freedesktop.org \
--cc=dri-devel@lists.freedesktop.org \
--cc=friedrich.vock@gmx.de \
--cc=kernel-dev@igalia.com \
--cc=tursulin@igalia.com \
--cc=tursulin@ursulin.net \
--cc=tvrtko.ursulin@igalia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).