From: "Zeng, Oak" <oak.zeng@intel.com>
To: Daniel Vetter <daniel@ffwll.ch>, David Airlie <airlied@redhat.com>
Cc: "Brost, Matthew" <matthew.brost@intel.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"rcampbell@nvidia.com" <rcampbell@nvidia.com>,
	"apopple@nvidia.com" <apopple@nvidia.com>,
	"Felix Kuehling" <felix.kuehling@amd.com>,
	"Welty, Brian" <brian.welty@intel.com>,
	"Shah, Ankur N" <ankur.n.shah@intel.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>,
	"jglisse@redhat.com" <jglisse@redhat.com>,
	"Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>,
	"Gupta, saurabhg" <saurabhg.gupta@intel.com>,
	"Bommu,  Krishnaiah" <krishnaiah.bommu@intel.com>,
	"Vishwanathapura,
	Niranjana" <niranjana.vishwanathapura@intel.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Danilo Krummrich" <dakr@redhat.com>
Subject: RE: Making drm_gpuvm work across gpu devices
Date: Wed, 31 Jan 2024 20:59:32 +0000	[thread overview]
Message-ID: <SA1PR11MB69915EAFE5D1278FDD3E2760927C2@SA1PR11MB6991.namprd11.prod.outlook.com> (raw)
In-Reply-To: <SA1PR11MB69910A237BF3666C8D798AFA927C2@SA1PR11MB6991.namprd11.prod.outlook.com>

Fixed one typo: "GPU VA != GPU VA" should read "GPU VA can != CPU VA".

> -----Original Message-----
> From: Zeng, Oak
> Sent: Wednesday, January 31, 2024 3:17 PM
> To: Daniel Vetter <daniel@ffwll.ch>; David Airlie <airlied@redhat.com>
> Cc: Christian König <christian.koenig@amd.com>; Thomas Hellström
> <thomas.hellstrom@linux.intel.com>; Brost, Matthew
> <matthew.brost@intel.com>; Felix Kuehling <felix.kuehling@amd.com>; Welty,
> Brian <brian.welty@intel.com>; dri-devel@lists.freedesktop.org; Ghimiray, Himal
> Prasad <himal.prasad.ghimiray@intel.com>; Bommu, Krishnaiah
> <krishnaiah.bommu@intel.com>; Gupta, saurabhg <saurabhg.gupta@intel.com>;
> Vishwanathapura, Niranjana <niranjana.vishwanathapura@intel.com>; intel-
> xe@lists.freedesktop.org; Danilo Krummrich <dakr@redhat.com>; Shah, Ankur N
> <ankur.n.shah@intel.com>; jglisse@redhat.com; rcampbell@nvidia.com;
> apopple@nvidia.com
> Subject: RE: Making drm_gpuvm work across gpu devices
> 
> Hi Sima, Dave,
> 
> I am well aware the nouveau driver is not what Nvidia ships to its
> customers. The key question is: can we move forward with the concept of a
> shared virtual address space between CPU and GPU? This is the foundation of
> HMM. We already have split-address-space support through other driver APIs.
> SVM, as its name says, means a shared address space. Are we allowed to
> implement another driver model that makes SVM work, alongside the APIs that
> support a split address space? The two schemes can co-exist in harmony, and
> we have real use cases that need both models in one application.
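> 
> To make the contrast concrete, a minimal user-space sketch (gpu_launch()
> and gpu_map() are hypothetical helpers, not any real API):
> 
>     void *buf = malloc(len);        /* ordinary CPU allocation */
> 
>     /* SVM/HMM model: the same CPU pointer is valid on the GPU */
>     gpu_launch(kernel, buf);
> 
>     /* split-address-space model: an explicit map step yields a
>      * separate GPU VA that the application has to track */
>     uint64_t gpu_va = gpu_map(buf, len);
>     gpu_launch(kernel, gpu_va);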
> 
> Hi Christian, Thomas,
> 
> In your scheme, GPU VA can != CPU VA. This does introduce some flexibility,
> but this scheme alone doesn't solve the proxy-process/para-virtualization
> problem. You still need a second mechanism to partition the GPU VA space
> between guest process1 and guest process2, because the proxy process (or
> the host hypervisor, whatever you call it) uses one single GPU page table
> for all the guest/client processes, so the GPU VAs of different guest
> processes can't overlap. If this second mechanism exists, we can of course
> use the same mechanism to partition the CPU VA space between guest
> processes as well; then we can still use a shared VA between CPU and GPU
> inside one process, while process1's and process2's address spaces (for
> both CPU and GPU) don't overlap. This second mechanism is the key to
> solving the proxy-process problem, not the flexibility you introduced; a
> sketch of such a partition follows below.
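> 
> A minimal sketch of such a partition (the window size and helper name are
> made up for illustration): with one shared GPU page table, each guest
> process is confined to a disjoint carve-out of the VA space:
> 
>     /* one global GPU page table, carved into per-guest VA windows */
>     #define GUEST_VA_WINDOW_SIZE   (1ull << 40)    /* 1 TiB per guest */
> 
>     static u64 guest_va_base(u32 guest_id)
>     {
>             return (u64)guest_id * GUEST_VA_WINDOW_SIZE;
>     }
> 
>     /* guest N may only map GPU VAs inside
>      * [guest_va_base(N), guest_va_base(N) + GUEST_VA_WINDOW_SIZE) */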
> 
> In practice, your scheme also risks running out of address space, because
> the whole address space has to be partitioned between processes. Letting
> each guest process own the whole address space, with separate GPU/CPU page
> tables for different processes, is clearly a better solution than a single
> page table with the address space partitioned between processes.
> 
> For Intel GPUs, para-virtualization (XenGT, see
> https://github.com/intel/XenGT-Preview-kernel; it is the same idea as the
> proxy process in Felix's email, both being SW-based GPU virtualization
> technologies) is an old project. It has since been replaced by
> HW-accelerated SR-IOV/system virtualization, and XenGT was abandoned a long
> time ago. So agreed, your scheme adds some flexibility. The question is: do
> we have a valid use case for that flexibility? I don't see a single one ATM.
> 
> I also looked into how to implement your scheme. It basically rejects the
> very foundation of the HMM design, which is a shared address space between
> CPU and GPU. In your scheme, GPU VA = CPU VA + offset. In every single
> place where the driver needs to call HMM facilities such as
> hmm_range_fault() or migrate_vma_setup(), and in the mmu notifier callback,
> you need to subtract that offset from the GPU VA to get a CPU VA. From the
> application writer's perspective, whenever they want to use a CPU pointer
> in their GPU program, they have to add that offset. Do you think this is
> awkward?
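> 
> A minimal sketch of what every such call site would look like; the
> hmm_range_fault() usage is real, while uvma, vm->cpu_va_offset and
> NPAGES are made-up names standing in for the driver's bookkeeping:
> 
>     unsigned long pfns[NPAGES];     /* one entry per page in the range */
>     struct hmm_range range = {
>             .notifier = &uvma->notifier,
>             /* GPU VA -> CPU VA before calling into hmm */
>             .start    = gpu_va_start - vm->cpu_va_offset,
>             .end      = gpu_va_end - vm->cpu_va_offset,
>             .hmm_pfns = pfns,
>     };
>     int ret = hmm_range_fault(&range);
> 
> With GPU VA == CPU VA the subtraction simply disappears, here and in the
> migrate_vma_setup() and mmu notifier paths.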
> 
> Finally, to implement SVM, we need a memory hint API that applies to a
> virtual address range across all GPU devices. For example, the user would
> say: for this virtual address range, I prefer the backing-store memory to
> be on GPU deviceX (because the user knows deviceX will use this address
> range much more than the other GPU devices or the CPU). It doesn't make
> sense to me to make such an API per-device. For example, if you tell device
> A that the preferred memory location is device B's memory, that doesn't
> sound correct to me, because in your scheme device A is not even aware of
> the existence of device B, right?
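> 
> A hypothetical shape for such a range-based hint, purely for illustration
> (struct and field names are invented):
> 
>     struct drm_svm_mem_hint {
>             __u64 start;            /* start of the virtual address range */
>             __u64 size;             /* length of the range in bytes */
>             __u32 preferred_device; /* device whose memory should back it */
>             __u32 pad;
>     };
> 
> The key property is that the hint is keyed by VA range and handled by
> something that knows about all devices, rather than being issued to each
> device separately.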
> 
> Regards,
> Oak
> > -----Original Message-----
> > From: Daniel Vetter <daniel@ffwll.ch>
> > Sent: Wednesday, January 31, 2024 4:15 AM
> > To: David Airlie <airlied@redhat.com>
> > Cc: Zeng, Oak <oak.zeng@intel.com>; Christian König
> > <christian.koenig@amd.com>; Thomas Hellström
> > <thomas.hellstrom@linux.intel.com>; Daniel Vetter <daniel@ffwll.ch>; Brost,
> > Matthew <matthew.brost@intel.com>; Felix Kuehling
> > <felix.kuehling@amd.com>; Welty, Brian <brian.welty@intel.com>; dri-
> > devel@lists.freedesktop.org; Ghimiray, Himal Prasad
> > <himal.prasad.ghimiray@intel.com>; Bommu, Krishnaiah
> > <krishnaiah.bommu@intel.com>; Gupta, saurabhg
> <saurabhg.gupta@intel.com>;
> > Vishwanathapura, Niranjana <niranjana.vishwanathapura@intel.com>; intel-
> > xe@lists.freedesktop.org; Danilo Krummrich <dakr@redhat.com>; Shah, Ankur
> N
> > <ankur.n.shah@intel.com>; jglisse@redhat.com; rcampbell@nvidia.com;
> > apopple@nvidia.com
> > Subject: Re: Making drm_gpuvm work across gpu devices
> >
> > On Wed, Jan 31, 2024 at 09:12:39AM +1000, David Airlie wrote:
> > > On Wed, Jan 31, 2024 at 8:29 AM Zeng, Oak <oak.zeng@intel.com> wrote:
> > > >
> > > > Hi Christian,
> > > >
> > > >
> > > >
> > > > Nvidia's Nouveau driver uses exactly the same concept of SVM with
> > > > HMM: the GPU address in a process is exactly the same as the CPU
> > > > virtual address. It is already in the upstream Linux kernel. We at
> > > > Intel just follow the same direction for our customers. Why are we
> > > > not allowed to?
> > >
> > >
> > > Oak, this isn't how upstream works; you don't get to appeal to
> > > customers or internal design. nouveau isn't NVIDIA's, and it certainly
> > > isn't something NVIDIA would ever suggest to their customers. We also
> > > likely wouldn't just accept NVIDIA's current solution upstream without
> > > some serious discussion. The implementation in nouveau was more of a
> > > sample HMM use case than a serious implementation. I suspect that if
> > > we do go down the road of making nouveau an actual compute driver for
> > > SVM etc., it would have to change severely.
> >
> > Yeah, on the nouveau hmm code specifically, my gut feeling is that we
> > didn't really make friends with it among core kernel maintainers. It's a
> > bit too much of a tech demo, just enough to get the hmm core APIs merged
> > for nvidia's out-of-tree driver.
> >
> > Also, a few years of learning and experience gaining happened in the
> > meantime - you always have to look at an API design in the context of
> > when it was designed, and that context changes all the time.
> >
> > Cheers, Sima
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

Thread overview: 123+ messages
2024-01-17 22:12 [PATCH 00/23] XeKmd basic SVM support Oak Zeng
2024-01-17 22:12 ` [PATCH 01/23] drm/xe/svm: Add SVM document Oak Zeng
2024-01-17 22:12 ` [PATCH 02/23] drm/xe/svm: Add svm key data structures Oak Zeng
2024-01-17 22:12 ` [PATCH 03/23] drm/xe/svm: create xe svm during vm creation Oak Zeng
2024-01-17 22:12 ` [PATCH 04/23] drm/xe/svm: Trace svm creation Oak Zeng
2024-01-17 22:12 ` [PATCH 05/23] drm/xe/svm: add helper to retrieve svm range from address Oak Zeng
2024-01-17 22:12 ` [PATCH 06/23] drm/xe/svm: Introduce a helper to build sg table from hmm range Oak Zeng
2024-04-05  0:39   ` Jason Gunthorpe
2024-04-05  3:33     ` Zeng, Oak
2024-04-05 12:37       ` Jason Gunthorpe
2024-04-05 16:42         ` Zeng, Oak
2024-04-05 18:02           ` Jason Gunthorpe
2024-04-09 16:45             ` Zeng, Oak
2024-04-09 17:24               ` Jason Gunthorpe
2024-04-23 21:17                 ` Zeng, Oak
2024-04-24  2:31                   ` Matthew Brost
2024-04-24 13:57                     ` Jason Gunthorpe
2024-04-24 16:35                       ` Matthew Brost
2024-04-24 16:44                         ` Jason Gunthorpe
2024-04-24 16:56                           ` Matthew Brost
2024-04-24 17:48                             ` Jason Gunthorpe
2024-04-24 13:48                   ` Jason Gunthorpe
2024-04-24 23:59                     ` Zeng, Oak
2024-04-25  1:05                       ` Jason Gunthorpe
2024-04-26  9:55                         ` Thomas Hellström
2024-04-26 12:00                           ` Jason Gunthorpe
2024-04-26 14:49                             ` Thomas Hellström
2024-04-26 16:35                               ` Jason Gunthorpe
2024-04-29  8:25                                 ` Thomas Hellström
2024-04-30 17:30                                   ` Jason Gunthorpe
2024-04-30 18:57                                     ` Daniel Vetter
2024-05-01  0:09                                       ` Jason Gunthorpe
2024-05-02  8:04                                         ` Daniel Vetter
2024-05-02  9:11                                           ` Thomas Hellström
2024-05-02 12:46                                             ` Jason Gunthorpe
2024-05-02 15:01                                               ` Thomas Hellström
2024-05-02 19:25                                                 ` Zeng, Oak
2024-05-03 13:37                                                   ` Jason Gunthorpe
2024-05-03 14:43                                                     ` Zeng, Oak
2024-05-03 16:28                                                       ` Jason Gunthorpe
2024-05-03 20:29                                                         ` Zeng, Oak
2024-05-04  1:03                                                           ` Dave Airlie
2024-05-06 13:04                                                             ` Daniel Vetter
2024-05-06 23:50                                                               ` Matthew Brost
2024-05-07 11:56                                                                 ` Jason Gunthorpe
2024-05-06 13:33                                                           ` Jason Gunthorpe
2024-04-09 17:33               ` Matthew Brost
2024-01-17 22:12 ` [PATCH 07/23] drm/xe/svm: Add helper for binding hmm range to gpu Oak Zeng
2024-01-17 22:12 ` [PATCH 08/23] drm/xe/svm: Add helper to invalidate svm range from GPU Oak Zeng
2024-01-17 22:12 ` [PATCH 09/23] drm/xe/svm: Remap and provide memmap backing for GPU vram Oak Zeng
2024-01-17 22:12 ` [PATCH 10/23] drm/xe/svm: Introduce svm migration function Oak Zeng
2024-01-17 22:12 ` [PATCH 11/23] drm/xe/svm: implement functions to allocate and free device memory Oak Zeng
2024-01-17 22:12 ` [PATCH 12/23] drm/xe/svm: Trace buddy block allocation and free Oak Zeng
2024-01-17 22:12 ` [PATCH 13/23] drm/xe/svm: Handle CPU page fault Oak Zeng
2024-01-17 22:12 ` [PATCH 14/23] drm/xe/svm: trace svm range migration Oak Zeng
2024-01-17 22:12 ` [PATCH 15/23] drm/xe/svm: Implement functions to register and unregister mmu notifier Oak Zeng
2024-01-17 22:12 ` [PATCH 16/23] drm/xe/svm: Implement the mmu notifier range invalidate callback Oak Zeng
2024-01-17 22:12 ` [PATCH 17/23] drm/xe/svm: clean up svm range during process exit Oak Zeng
2024-01-17 22:12 ` [PATCH 18/23] drm/xe/svm: Move a few structures to xe_gt.h Oak Zeng
2024-01-17 22:12 ` [PATCH 19/23] drm/xe/svm: migrate svm range to vram Oak Zeng
2024-01-17 22:12 ` [PATCH 20/23] drm/xe/svm: Populate svm range Oak Zeng
2024-01-17 22:12 ` [PATCH 21/23] drm/xe/svm: GPU page fault support Oak Zeng
2024-01-23  2:06   ` Welty, Brian
2024-01-23  3:09     ` Zeng, Oak
2024-01-23  3:21       ` Making drm_gpuvm work across gpu devices Zeng, Oak
2024-01-23 11:13         ` Christian König
2024-01-23 19:37           ` Zeng, Oak
2024-01-23 20:17             ` Felix Kuehling
2024-01-25  1:39               ` Zeng, Oak
2024-01-23 23:56             ` Danilo Krummrich
2024-01-24  3:57               ` Zeng, Oak
2024-01-24  4:14                 ` Zeng, Oak
2024-01-24  6:48                   ` Christian König
2024-01-25 22:13                 ` Danilo Krummrich
2024-01-24  8:33             ` Christian König
2024-01-25  1:17               ` Zeng, Oak
2024-01-25  1:25                 ` David Airlie
2024-01-25  5:25                   ` Zeng, Oak
2024-01-26 10:09                     ` Christian König
2024-01-26 20:13                       ` Zeng, Oak
2024-01-29 10:10                         ` Christian König
2024-01-29 20:09                           ` Zeng, Oak
2024-01-25 11:00                 ` Re: Making " 周春明(日月)
2024-01-25 17:00                   ` Zeng, Oak
2024-01-25 17:15                 ` Making " Felix Kuehling
2024-01-25 18:37                   ` Zeng, Oak
2024-01-26 13:23                     ` Christian König
2024-01-25 16:42               ` Zeng, Oak
2024-01-25 18:32               ` Daniel Vetter
2024-01-25 21:02                 ` Zeng, Oak
2024-01-26  8:21                 ` Thomas Hellström
2024-01-26 12:52                   ` Christian König
2024-01-27  2:21                     ` Zeng, Oak
2024-01-29 10:19                       ` Christian König
2024-01-30  0:21                         ` Zeng, Oak
2024-01-30  8:39                           ` Christian König
2024-01-30 22:29                             ` Zeng, Oak
2024-01-30 23:12                               ` David Airlie
2024-01-31  9:15                                 ` Daniel Vetter
2024-01-31 20:17                                   ` Zeng, Oak
2024-01-31 20:59                                     ` Zeng, Oak [this message]
2024-02-01  8:52                                     ` Christian König
2024-02-29 18:22                                       ` Zeng, Oak
2024-03-08  4:43                                         ` Zeng, Oak
2024-03-08 10:07                                           ` Christian König
2024-01-30  8:43                           ` Thomas Hellström
2024-01-29 15:03                 ` Felix Kuehling
2024-01-29 15:33                   ` Christian König
2024-01-29 16:24                     ` Felix Kuehling
2024-01-29 16:28                       ` Christian König
2024-01-29 17:52                         ` Felix Kuehling
2024-01-29 19:03                           ` Christian König
2024-01-29 20:24                             ` Felix Kuehling
2024-02-23 20:12               ` Zeng, Oak
2024-02-27  6:54                 ` Christian König
2024-02-27 15:58                   ` Zeng, Oak
2024-02-28 19:51                     ` Zeng, Oak
2024-02-29  9:41                       ` Christian König
2024-02-29 16:05                         ` Zeng, Oak
2024-02-29 17:12                         ` Thomas Hellström
2024-03-01  7:01                           ` Christian König
2024-01-17 22:12 ` [PATCH 22/23] drm/xe/svm: Add DRM_XE_SVM kernel config entry Oak Zeng
2024-01-17 22:12 ` [PATCH 23/23] drm/xe/svm: Add svm memory hints interface Oak Zeng
