dri-devel Archive mirror
 help / color / mirror / Atom feed
From: "Marek Olšák" <maraeo@gmail.com>
To: Friedrich Vock <friedrich.vock@gmx.de>
Cc: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
	"Pierre-Loup Griffais" <pgriffais@valvesoftware.com>,
	"Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
	"Bas Nieuwenhuizen" <bas@basnieuwenhuizen.nl>,
	"Joshua Ashton" <joshua@froggi.es>,
	"Christian König" <christian.koenig@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>
Subject: Re: [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription
Date: Thu, 25 Apr 2024 09:22:55 -0400	[thread overview]
Message-ID: <CAAxE2A6u1sf=dzhLcp3W6Pb=pve9RP3UpEsAYfqMuyk0Mc053A@mail.gmail.com> (raw)
In-Reply-To: <20240424165937.54759-1-friedrich.vock@gmx.de>

The most extreme ping-ponging is mitigated by throttling buffer moves
in the kernel, but it only works without VM_ALWAYS_VALID and you can
set BO priorities in the BO list. A better approach that works with
VM_ALWAYS_VALID would be nice.

Marek

On Wed, Apr 24, 2024 at 1:12 PM Friedrich Vock <friedrich.vock@gmx.de> wrote:
>
> Hi everyone,
>
> recently I've been looking into remedies for apps (in particular, newer
> games) that experience significant performance loss when they start to
> hit VRAM limits, especially on older or lower-end cards that struggle
> to fit both desktop apps and all the game data into VRAM at once.
>
> The root of the problem lies in the fact that from userspace's POV,
> buffer eviction is very opaque: Userspace applications/drivers cannot
> tell how oversubscribed VRAM is, nor do they have fine-grained control
> over which buffers get evicted.  At the same time, with GPU APIs becoming
> increasingly lower-level and GPU-driven, only the application itself
> can know which buffers are used within a particular submission, and
> how important each buffer is. For this, GPU APIs include interfaces
> to query oversubscription and specify memory priorities: In Vulkan,
> oversubscription can be queried through the VK_EXT_memory_budget
> extension. Different buffers can also be assigned priorities via the
> VK_EXT_pageable_device_local_memory extension. Modern games, especially
> D3D12 games via vkd3d-proton, rely on oversubscription being reported and
> priorities being respected in order to perform their memory management.
>
> However, relaying this information to the kernel via the current KMD uAPIs
> is not possible. On AMDGPU for example, all work submissions include a
> "bo list" that contains any buffer object that is accessed during the
> course of the submission. If VRAM is oversubscribed and a buffer in the
> list was evicted to system memory, that buffer is moved back to VRAM
> (potentially evicting other unused buffers).
>
> Since the usermode driver doesn't know what buffers are used by the
> application, its only choice is to submit a bo list that contains every
> buffer the application has allocated. In case of VRAM oversubscription,
> it is highly likely that some of the application's buffers were evicted,
> which almost guarantees that some buffers will get moved around. Since
> the bo list is only known at submit time, this also means the buffers
> will get moved right before submitting application work, which is the
> worst possible time to move buffers from a latency perspective. Another
> consequence of the large bo list is that nearly all memory from other
> applications will be evicted, too. When different applications (e.g. game
> and compositor) submit work one after the other, this causes a ping-pong
> effect where each app's submission evicts the other app's memory,
> resulting in a large amount of unnecessary moves.
>
> This overly aggressive eviction behavior led to RADV adopting a change
> that effectively allows all VRAM applications to reside in system memory
> [1].  This worked around the ping-ponging/excessive buffer moving problem,
> but also meant that any memory evicted to system memory would forever
> stay there, regardless of how VRAM is used.
>
> My proposal aims at providing a middle ground between these extremes.
> The goals I want to meet are:
> - Userspace is accurately informed about VRAM oversubscription/how much
>   VRAM has been evicted
> - Buffer eviction respects priorities set by userspace - Wasteful
>   ping-ponging is avoided to the extent possible
>
> I have been testing out some prototypes, and came up with this rough
> sketch of an API:
>
> - For each ttm_resource_manager, the amount of evicted memory is tracked
>   (similarly to how "usage" tracks the memory usage). When memory is
>   evicted via ttm_bo_evict, the size of the evicted memory is added, when
>   memory is un-evicted (see below), its size is subtracted. The amount of
>   evicted memory for e.g. VRAM can be queried by userspace via an ioctl.
>
> - Each ttm_resource_manager maintains a list of evicted buffer objects.
>
> - ttm_mem_unevict walks the list of evicted bos for a given
>   ttm_resource_manager and tries moving evicted resources back. When a
>   buffer is freed, this function is called to immediately restore some
>   evicted memory.
>
> - Each ttm_buffer_object independently tracks the mem_type it wants
>   to reside in.
>
> - ttm_bo_try_unevict is added as a helper function which attempts to
>   move the buffer to its preferred mem_type. If no space is available
>   there, it fails with -ENOSPC/-ENOMEM.
>
> - Similar to how ttm_bo_evict works, each driver can implement
>   uneviction_valuable/unevict_flags callbacks to control buffer
>   un-eviction.
>
> This is what patches 1-10 accomplish (together with an amdgpu
> implementation utilizing the new API).
>
> Userspace priorities could then be implemented as follows:
>
> - TTM already manages priorities for each buffer object. These priorities
>   can be updated by userspace via a GEM_OP ioctl to inform the kernel
>   which buffers should be evicted before others. If an ioctl increases
>   the priority of a buffer, ttm_bo_try_unevict is called on that buffer to
>   try and move it back (potentially evicting buffers with a lower
>   priority)
>
> - Buffers should never be evicted by other buffers with equal/lower
>   priority, but if there is a buffer with lower priority occupying VRAM,
>   it should be evicted in favor of the higher-priority one. This prevents
>   ping-ponging between buffers that try evicting each other and is
>   trivially implementable with an early-exit in ttm_mem_evict_first.
>
> This is covered in patches 11-15, with the new features exposed to
> userspace in patches 16-18.
>
> I also have a RADV branch utilizing this API at [2], which I use for
> testing.
>
> This implementation is stil very much WIP, although the D3D12 games I
> tested already seemed to benefit from it. Nevertheless, are still quite
> a few TODOs and unresolved questions/problems.
>
> Some kernel drivers (e.g i915) already use TTM priorities for
> kernel-internal purposes. Of course, some of the highest priorities
> should stay reserved for these purposes (with userspace being able to
> use the lower priorities).
>
> Another problem with priorities is the possibility of apps starving other
> apps by occupying all of VRAM with high-priority allocations. A possible
> solution could be include restricting the highest priority/priorities
> to important apps like compositors.
>
> Tying into this problem, only apps that are actively cooperating
> to reduce memory pressure can benefit from the current memory priority
> implementation. Eventually the priority system could also be utilized
> to benefit all applications, for example with the desktop environment
> boosting the priority of the currently-focused app/its cgroup (to
> provide the best QoS to the apps the user is actively using). A full
> implementation of this is probably out-of-scope for this initial proposal,
> but it's probably a good idea to consider this as a possible future use
> of the priority API.
>
> I'm primarily looking to integrate this into amdgpu to solve the
> issues I've seen there, but I'm also interested in feedback from
> other drivers. Is this something you'd be interested in? Do you
> have any objections/comments/questions about my proposed design?
>
> Thanks,
> Friedrich
>
> [1] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6833
> [2] https://gitlab.freedesktop.org/pixelcluster/mesa/-/tree/spilling
>
> Friedrich Vock (18):
>   drm/ttm: Add tracking for evicted memory
>   drm/ttm: Add per-BO eviction tracking
>   drm/ttm: Implement BO eviction tracking
>   drm/ttm: Add driver funcs for uneviction control
>   drm/ttm: Add option to evict no BOs in operation
>   drm/ttm: Add public buffer eviction/uneviction functions
>   drm/amdgpu: Add TTM uneviction control functions
>   drm/amdgpu: Don't try moving BOs to preferred domain before submit
>   drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources
>   drm/amdgpu: Don't add GTT to initial domains after failing to allocate
>     VRAM
>   drm/ttm: Bump BO priority count
>   drm/ttm: Do not evict BOs with higher priority
>   drm/ttm: Implement ttm_bo_update_priority
>   drm/ttm: Consider BOs placed in non-favorite locations evicted
>   drm/amdgpu: Set a default priority for user/kernel BOs
>   drm/amdgpu: Implement SET_PRIORITY GEM op
>   drm/amdgpu: Implement EVICTED_VRAM query
>   drm/amdgpu: Bump minor version
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   2 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c     | 191 +---------------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h     |   4 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |   3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    |  25 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |   3 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  26 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |   4 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |  50 ++++
>  drivers/gpu/drm/ttm/ttm_bo.c               | 253 ++++++++++++++++++++-
>  drivers/gpu/drm/ttm/ttm_bo_util.c          |   3 +
>  drivers/gpu/drm/ttm/ttm_device.c           |   1 +
>  drivers/gpu/drm/ttm/ttm_resource.c         |  19 +-
>  include/drm/ttm/ttm_bo.h                   |  22 ++
>  include/drm/ttm/ttm_device.h               |  28 +++
>  include/drm/ttm/ttm_resource.h             |  11 +-
>  include/uapi/drm/amdgpu_drm.h              |   3 +
>  17 files changed, 430 insertions(+), 218 deletions(-)
>
> --
> 2.44.0
>

  parent reply	other threads:[~2024-04-25 13:23 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-24 16:56 [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 01/18] drm/ttm: Add tracking for evicted memory Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 02/18] drm/ttm: Add per-BO eviction tracking Friedrich Vock
2024-04-25  6:18   ` Christian König
2024-04-25 19:02     ` Matthew Brost
2024-04-26  6:27       ` Christian König
2024-04-24 16:56 ` [RFC PATCH 03/18] drm/ttm: Implement BO " Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 04/18] drm/ttm: Add driver funcs for uneviction control Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 05/18] drm/ttm: Add option to evict no BOs in operation Friedrich Vock
2024-04-25  6:20   ` Christian König
2024-04-24 16:56 ` [RFC PATCH 06/18] drm/ttm: Add public buffer eviction/uneviction functions Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 07/18] drm/amdgpu: Add TTM uneviction control functions Friedrich Vock
2024-04-24 16:56 ` [RFC PATCH 08/18] drm/amdgpu: Don't try moving BOs to preferred domain before submit Friedrich Vock
2024-04-25  6:36   ` Christian König
2024-04-24 16:56 ` [RFC PATCH 09/18] drm/amdgpu: Don't mark VRAM as a busy placement for VRAM|GTT resources Friedrich Vock
2024-04-25  6:24   ` Christian König
2024-04-24 16:57 ` [RFC PATCH 10/18] drm/amdgpu: Don't add GTT to initial domains after failing to allocate VRAM Friedrich Vock
2024-04-25  6:25   ` Christian König
2024-04-25  7:39     ` Friedrich Vock
2024-04-25  7:54       ` Christian König
2024-04-24 16:57 ` [RFC PATCH 11/18] drm/ttm: Bump BO priority count Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 12/18] drm/ttm: Do not evict BOs with higher priority Friedrich Vock
2024-04-25  6:26   ` Christian König
2024-04-24 16:57 ` [RFC PATCH 13/18] drm/ttm: Implement ttm_bo_update_priority Friedrich Vock
2024-04-25  6:29   ` Christian König
2024-04-24 16:57 ` [RFC PATCH 14/18] drm/ttm: Consider BOs placed in non-favorite locations evicted Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 15/18] drm/amdgpu: Set a default priority for user/kernel BOs Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 16/18] drm/amdgpu: Implement SET_PRIORITY GEM op Friedrich Vock
2024-04-25  6:32   ` Christian König
2024-04-25  6:46     ` Friedrich Vock
2024-04-25  6:58       ` Christian König
2024-04-25  7:06         ` Friedrich Vock
2024-04-25  7:15           ` Christian König
2024-04-25  7:39             ` Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 17/18] drm/amdgpu: Implement EVICTED_VRAM query Friedrich Vock
2024-04-24 16:57 ` [RFC PATCH 18/18] drm/amdgpu: Bump minor version Friedrich Vock
2024-04-25  6:54 ` [RFC PATCH 00/18] TTM interface for managing VRAM oversubscription Christian König
2024-04-25 13:22 ` Marek Olšák [this message]
2024-04-25 13:33   ` Christian König
2024-05-02 14:23 ` Maarten Lankhorst
2024-05-13 13:44   ` Friedrich Vock

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAAxE2A6u1sf=dzhLcp3W6Pb=pve9RP3UpEsAYfqMuyk0Mc053A@mail.gmail.com' \
    --to=maraeo@gmail.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=bas@basnieuwenhuizen.nl \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=friedrich.vock@gmx.de \
    --cc=joshua@froggi.es \
    --cc=pgriffais@valvesoftware.com \
    --cc=tvrtko.ursulin@igalia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).