From: "Marek Olšák" <maraeo@gmail.com>
To: Friedrich Vock <friedrich.vock@gmx.de>
Cc: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
"Pierre-Loup Griffais" <pgriffais@valvesoftware.com>,
"Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
"Bas Nieuwenhuizen" <bas@basnieuwenhuizen.nl>,
"Joshua Ashton" <joshua@froggi.es>,
"Christian König" <christian.koenig@amd.com>,
"Alex Deucher" <alexander.deucher@amd.com>
Subject: Re: [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription
Date: Thu, 25 Apr 2024 09:22:55 -0400 [thread overview]
Message-ID: <CAAxE2A6u1sf=dzhLcp3W6Pb=pve9RP3UpEsAYfqMuyk0Mc053A@mail.gmail.com> (raw)
In-Reply-To: <20240424165937.54759-1-friedrich.vock@gmx.de>
The most extreme ping-ponging is mitigated by throttling buffer moves
in the kernel, but that only works without VM_ALWAYS_VALID, and you can
set BO priorities in the BO list. A better approach that works with
VM_ALWAYS_VALID would be nice.
Marek
On Wed, Apr 24, 2024 at 1:12 PM Friedrich Vock <friedrich.vock@gmx.de> wrote:
>
> Hi everyone,
>
> recently I've been looking into remedies for apps (in particular, newer
> games) that experience significant performance loss when they start to
> hit VRAM limits, especially on older or lower-end cards that struggle
> to fit both desktop apps and all the game data into VRAM at once.
>
> The root of the problem lies in the fact that from userspace's POV,
> buffer eviction is very opaque: Userspace applications/drivers cannot
> tell how oversubscribed VRAM is, nor do they have fine-grained control
> over which buffers get evicted. At the same time, with GPU APIs becoming
> increasingly lower-level and GPU-driven, only the application itself
> can know which buffers are used within a particular submission, and
> how important each buffer is. For this, GPU APIs include interfaces
> to query oversubscription and specify memory priorities: In Vulkan,
> oversubscription can be queried through the VK_EXT_memory_budget
> extension. Different buffers can also be assigned priorities via the
> VK_EXT_pageable_device_local_memory extension. Modern games, especially
> D3D12 games via vkd3d-proton, rely on oversubscription being reported and
> priorities being respected in order to perform their memory management.
>
> However, relaying this information to the kernel via the current KMD uAPIs
> is not possible. On AMDGPU for example, all work submissions include a
> "bo list" that contains any buffer object that is accessed during the
> course of the submission. If VRAM is oversubscribed and a buffer in the
> list was evicted to system memory, that buffer is moved back to VRAM
> (potentially evicting other unused buffers).
>
> Since the usermode driver doesn't know what buffers are used by the
> application, its only choice is to submit a bo list that contains every
> buffer the application has allocated. In case of VRAM oversubscription,
> it is highly likely that some of the application's buffers were evicted,
> which almost guarantees that some buffers will get moved around. Since
> the bo list is only known at submit time, this also means the buffers
> will get moved right before submitting application work, which is the
> worst possible time to move buffers from a latency perspective. Another
> consequence of the large bo list is that nearly all memory from other
> applications will be evicted, too. When different applications (e.g. game
> and compositor) submit work one after the other, this causes a ping-pong
> effect where each app's submission evicts the other app's memory,
> resulting in a large amount of unnecessary moves.
>
> This overly aggressive eviction behavior led to RADV adopting a change
> that effectively allows all VRAM allocations to also reside in system
> memory [1]. This worked around the ping-ponging/excessive buffer moving
> problem,
> but also meant that any memory evicted to system memory would forever
> stay there, regardless of how VRAM is used.
>
> My proposal aims at providing a middle ground between these extremes.
> The goals I want to meet are:
> - Userspace is accurately informed about VRAM oversubscription/how much
> VRAM has been evicted
> - Buffer eviction respects priorities set by userspace
> - Wasteful ping-ponging is avoided to the extent possible
>
> I have been testing out some prototypes, and came up with this rough
> sketch of an API:
>
> - For each ttm_resource_manager, the amount of evicted memory is tracked
> (similarly to how "usage" tracks the memory usage). When memory is
> evicted via ttm_bo_evict, the size of the evicted memory is added, when
> memory is un-evicted (see below), its size is subtracted. The amount of
> evicted memory for e.g. VRAM can be queried by userspace via an ioctl.
>
> - Each ttm_resource_manager maintains a list of evicted buffer objects.
>
> - ttm_mem_unevict walks the list of evicted bos for a given
> ttm_resource_manager and tries moving evicted resources back. When a
> buffer is freed, this function is called to immediately restore some
> evicted memory.
>
> - Each ttm_buffer_object independently tracks the mem_type it wants
> to reside in.
>
> - ttm_bo_try_unevict is added as a helper function which attempts to
> move the buffer to its preferred mem_type. If no space is available
> there, it fails with -ENOSPC/-ENOMEM.
>
> - Similar to how ttm_bo_evict works, each driver can implement
> uneviction_valuable/unevict_flags callbacks to control buffer
> un-eviction.
>
> This is what patches 1-10 accomplish (together with an amdgpu
> implementation utilizing the new API).
>
> Userspace priorities could then be implemented as follows:
>
> - TTM already manages priorities for each buffer object. These priorities
> can be updated by userspace via a GEM_OP ioctl to inform the kernel
> which buffers should be evicted before others. If an ioctl increases
> the priority of a buffer, ttm_bo_try_unevict is called on that buffer to
> try and move it back (potentially evicting buffers with a lower
> priority)
>
> - Buffers should never be evicted by other buffers with equal/lower
> priority, but if there is a buffer with lower priority occupying VRAM,
> it should be evicted in favor of the higher-priority one. This prevents
> ping-ponging between buffers that try evicting each other and is
> trivially implementable with an early-exit in ttm_mem_evict_first.
>
> This is covered in patches 11-15, with the new features exposed to
> userspace in patches 16-18.
>
> I also have a RADV branch utilizing this API at [2], which I use for
> testing.
>
> This implementation is still very much WIP, although the D3D12 games I
> tested already seemed to benefit from it. Nevertheless, there are still
> quite a few TODOs and unresolved questions/problems.
>
> Some kernel drivers (e.g. i915) already use TTM priorities for
> kernel-internal purposes. Of course, some of the highest priorities
> should stay reserved for these purposes (with userspace being able to
> use the lower priorities).
>
> Another problem with priorities is the possibility of apps starving other
> apps by occupying all of VRAM with high-priority allocations. A possible
> solution could be to restrict the highest priority/priorities to
> important apps like compositors.
>
> Tying into this problem, only apps that are actively cooperating
> to reduce memory pressure can benefit from the current memory priority
> implementation. Eventually the priority system could also be utilized
> to benefit all applications, for example with the desktop environment
> boosting the priority of the currently-focused app/its cgroup (to
> provide the best QoS to the apps the user is actively using). A full
> implementation of this is probably out-of-scope for this initial proposal,
> but it's probably a good idea to consider this as a possible future use
> of the priority API.
>
> I'm primarily looking to integrate this into amdgpu to solve the
> issues I've seen there, but I'm also interested in feedback from
> other drivers. Is this something you'd be interested in? Do you
> have any objections/comments/questions about my proposed design?
>
> Thanks,
> Friedrich
>
> [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6833
> [2] https://gitlab.freedesktop.org/pixelcluster/mesa/-/tree/spilling
>
> Friedrich Vock (18):
> drm/ttm: Add tracking for evicted memory
> drm/ttm: Add per-BO eviction tracking
> drm/ttm: Implement BO eviction tracking
> drm/ttm: Add driver funcs for uneviction control
> drm/ttm: Add option to evict no BOs in operation
> drm/ttm: Add public buffer eviction/uneviction functions
> drm/amdgpu: Add TTM uneviction control functions
> drm/amdgpu: Don't try moving BOs to preferred domain before submit
> drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources
> drm/amdgpu: Don't add GTT to initial domains after failing to allocate
> VRAM
> drm/ttm: Bump BO priority count
> drm/ttm: Do not evict BOs with higher priority
> drm/ttm: Implement ttm_bo_update_priority
> drm/ttm: Consider BOs placed in non-favorite locations evicted
> drm/amdgpu: Set a default priority for user/kernel BOs
> drm/amdgpu: Implement SET_PRIORITY GEM op
> drm/amdgpu: Implement EVICTED_VRAM query
> drm/amdgpu: Bump minor version
>
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 -
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 191 +---------------
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h | 4 -
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 25 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 3 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 26 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 4 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 50 ++++
> drivers/gpu/drm/ttm/ttm_bo.c | 253 ++++++++++++++++++++-
> drivers/gpu/drm/ttm/ttm_bo_util.c | 3 +
> drivers/gpu/drm/ttm/ttm_device.c | 1 +
> drivers/gpu/drm/ttm/ttm_resource.c | 19 +-
> include/drm/ttm/ttm_bo.h | 22 ++
> include/drm/ttm/ttm_device.h | 28 +++
> include/drm/ttm/ttm_resource.h | 11 +-
> include/uapi/drm/amdgpu_drm.h | 3 +
> 17 files changed, 430 insertions(+), 218 deletions(-)
>
> --
> 2.44.0
>
2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 01/18] drm/ttm: Add tracking for evicted memory Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 02/18] drm/ttm: Add per-BO eviction tracking Friedrich Vock
2024-04-25 6:18 ` Christian König
2024-04-25 19:02 ` Matthew Brost
2024-04-26 6:27 ` Christian König
2024-04-24 16:56 ` [RFC PATCH 03/18] drm/ttm: Implement BO " Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 04/18] drm/ttm: Add driver funcs for uneviction control Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 05/18] drm/ttm: Add option to evict no BOs in operation Friedrich Vock
2024-04-25 6:20 ` Christian König
2024-04-24 16:56 ` [RFC PATCH 06/18] drm/ttm: Add public buffer eviction/uneviction functions Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 07/18] drm/amdgpu: Add TTM uneviction control functions Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 08/18] drm/amdgpu: Don't try moving BOs to preferred domain before submit Friedrich Vock
2024-04-25 6:36 ` Christian König
2024-04-24 16:56 ` [RFC PATCH 09/18] drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources Friedrich Vock
2024-04-25 6:24 ` Christian König
2024-04-24 16:57 ` [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM Friedrich Vock
2024-04-25 6:25 ` Christian König
2024-04-25 7:39 ` Friedrich Vock
2024-04-25 7:54 ` Christian König
2024-04-24 16:57 ` [RFC PATCH 11/18] drm/ttm: Bump BO priority count Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 12/18] drm/ttm: Do not evict BOs with higher priority Friedrich Vock
2024-04-25 6:26 ` Christian König
2024-04-24 16:57 ` [RFC PATCH 13/18] drm/ttm: Implement ttm_bo_update_priority Friedrich Vock
2024-04-25 6:29 ` Christian König
2024-04-24 16:57 ` [RFC PATCH 14/18] drm/ttm: Consider BOs placed in non-favorite locations evicted Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 15/18] drm/amdgpu: Set a default priority for user/kernel BOs Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op Friedrich Vock
2024-04-25 6:32 ` Christian König
2024-04-25 6:46 ` Friedrich Vock
2024-04-25 6:58 ` Christian König
2024-04-25 7:06 ` Friedrich Vock
2024-04-25 7:15 ` Christian König
2024-04-25 7:39 ` Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 17/18] drm/amdgpu: Implement EVICTED_VRAM query Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 18/18] drm/amdgpu: Bump minor version Friedrich Vock
2024-04-25 6:54 ` [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Christian König
2024-04-25 13:22 ` Marek Olšák [this message]
2024-04-25 13:33 ` Christian König
2024-05-02 14:23 ` Maarten Lankhorst
2024-05-13 13:44 ` Friedrich Vock