From: "Zeng, Oak" <oak.zeng@intel.com>
To: "Zeng, Oak" <oak.zeng@intel.com>,
	"Danilo Krummrich" <dakr@redhat.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Dave Airlie" <airlied@redhat.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Felix Kuehling" <felix.kuehling@amd.com>,
	"Welty, Brian" <brian.welty@intel.com>
Cc: "Brost, Matthew" <matthew.brost@intel.com>,
	"Thomas.Hellstrom@linux.intel.com"
	<Thomas.Hellstrom@linux.intel.com>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>,
	"Gupta, saurabhg" <saurabhg.gupta@intel.com>,
	"Bommu, Krishnaiah" <krishnaiah.bommu@intel.com>,
	"Vishwanathapura,
	Niranjana" <niranjana.vishwanathapura@intel.com>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Subject: RE: Making drm_gpuvm work across gpu devices
Date: Wed, 24 Jan 2024 04:14:02 +0000	[thread overview]
Message-ID: <SA1PR11MB6991C144358812EAA2EE67C2927B2@SA1PR11MB6991.namprd11.prod.outlook.com> (raw)
In-Reply-To: <SA1PR11MB69915590D8D282DA41B8783E927B2@SA1PR11MB6991.namprd11.prod.outlook.com>

Danilo,

Maybe before I give up, I should also ask: currently drm_gpuvm is designed for a BO-centric world. Is it easy to make the va range split/merge work simply on a va range, without a BO? Conceptually this should work, since we are merging/splitting a virtual address range, which can be decoupled completely from the BO.
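
To make the question concrete, here is a minimal sketch of the split arithmetic I have in mind, with made-up names (va_range, va_range_split -- not existing drm_gpuvm API), just to illustrate that remapping a sub-range needs nothing but start/end math and no drm_gem_object:

/* kernel-style sketch; assumes <linux/types.h> for u64 */
struct va_range {
	u64 start;	/* inclusive */
	u64 end;	/* exclusive */
};

/*
 * Illustration only: when [req_start, req_end) is mapped over an
 * existing range, the old range is split into up to two remainders.
 * This is the kind of arithmetic the sm_map path performs; no buffer
 * object is involved in it.
 */
static void va_range_split(const struct va_range *old,
			   u64 req_start, u64 req_end,
			   struct va_range *prev, struct va_range *next)
{
	/* an empty range means "no remainder on that side" */
	prev->start = prev->end = 0;
	next->start = next->end = 0;

	if (req_start > old->start) {	/* left piece survives */
		prev->start = old->start;
		prev->end = req_start;
	}
	if (req_end < old->end) {	/* right piece survives */
		next->start = req_end;
		next->end = old->end;
	}
}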

> -----Original Message-----
> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of Zeng,
> Oak
> Sent: Tuesday, January 23, 2024 10:57 PM
> To: Danilo Krummrich <dakr@redhat.com>; Christian König
> <christian.koenig@amd.com>; Dave Airlie <airlied@redhat.com>; Daniel Vetter
> <daniel@ffwll.ch>; Felix Kuehling <felix.kuehling@amd.com>; Welty, Brian
> <brian.welty@intel.com>
> Cc: Brost, Matthew <matthew.brost@intel.com>;
> Thomas.Hellstrom@linux.intel.com; dri-devel@lists.freedesktop.org; Ghimiray,
> Himal Prasad <himal.prasad.ghimiray@intel.com>; Gupta, saurabhg
> <saurabhg.gupta@intel.com>; Bommu, Krishnaiah
> <krishnaiah.bommu@intel.com>; Vishwanathapura, Niranjana
> <niranjana.vishwanathapura@intel.com>; intel-xe@lists.freedesktop.org
> Subject: RE: Making drm_gpuvm work across gpu devices
> 
> Thanks a lot Danilo.
> 
> Maybe I wasn't clear enough. In the solution I proposed, each device still has
> separate vm/page tables. Each device still needs to manage the mappings, page
> table flags etc. It is just that in the svm use case, all devices share one drm_gpuvm
> instance. As I understand it, drm_gpuvm's main function is va range splitting and
> merging. I don't see why it wouldn't work across gpu devices.
> 
> But I read more about drm_gpuvm. Its split/merge functions take a
> drm_gem_object parameter, see drm_gpuvm_sm_map_ops_create and
> drm_gpuvm_sm_map. Actually the whole drm_gpuvm is designed for a BO-centric
> driver; for example, it has a drm_gpuvm_bo concept to keep track of the
> 1 BO : N gpuva mapping. The whole purpose of leveraging drm_gpuvm was to re-use
> the va split/merge functions for SVM. But in our SVM implementation, there is no
> buffer object at all. So I don't think our SVM code can leverage drm_gpuvm.
> 
> I will give up this approach, unless Matt or Brian can see a way.
> 
> A few replies inline... @Welty, Brian, I had more thoughts inline on one of your
> original questions...
> 
> > -----Original Message-----
> > From: Danilo Krummrich <dakr@redhat.com>
> > Sent: Tuesday, January 23, 2024 6:57 PM
> > To: Zeng, Oak <oak.zeng@intel.com>; Christian König
> > <christian.koenig@amd.com>; Dave Airlie <airlied@redhat.com>; Daniel Vetter
> > <daniel@ffwll.ch>; Felix Kuehling <felix.kuehling@amd.com>
> > Cc: Welty, Brian <brian.welty@intel.com>; dri-devel@lists.freedesktop.org;
> intel-
> > xe@lists.freedesktop.org; Bommu, Krishnaiah <krishnaiah.bommu@intel.com>;
> > Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>;
> > Thomas.Hellstrom@linux.intel.com; Vishwanathapura, Niranjana
> > <niranjana.vishwanathapura@intel.com>; Brost, Matthew
> > <matthew.brost@intel.com>; Gupta, saurabhg <saurabhg.gupta@intel.com>
> > Subject: Re: Making drm_gpuvm work across gpu devices
> >
> > Hi Oak,
> >
> > On 1/23/24 20:37, Zeng, Oak wrote:
> > > Thanks Christian. I have some comments inline below.
> > >
> > > Danilo, can you also take a look and give your feedback? Thanks.
> >
> > I agree with everything Christian already wrote. Except for the KFD parts, which
> > I'm simply not familiar with, I had exactly the same thoughts after reading your
> > initial mail.
> >
> > Please find some more comments below.
> >
> > >
> > >> -----Original Message-----
> > >> From: Christian König <christian.koenig@amd.com>
> > >> Sent: Tuesday, January 23, 2024 6:13 AM
> > >> To: Zeng, Oak <oak.zeng@intel.com>; Danilo Krummrich
> <dakr@redhat.com>;
> > >> Dave Airlie <airlied@redhat.com>; Daniel Vetter <daniel@ffwll.ch>
> > >> Cc: Welty, Brian <brian.welty@intel.com>; dri-devel@lists.freedesktop.org;
> > intel-
> > >> xe@lists.freedesktop.org; Bommu, Krishnaiah
> > <krishnaiah.bommu@intel.com>;
> > >> Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>;
> > >> Thomas.Hellstrom@linux.intel.com; Vishwanathapura, Niranjana
> > >> <niranjana.vishwanathapura@intel.com>; Brost, Matthew
> > >> <matthew.brost@intel.com>
> > >> Subject: Re: Making drm_gpuvm work across gpu devices
> > >>
> > >> Hi Oak,
> > >>
> > >> Am 23.01.24 um 04:21 schrieb Zeng, Oak:
> > >>> Hi Danilo and all,
> > >>>
> > >>> During the work on Intel's SVM code, we came up with the idea of making
> > >>> drm_gpuvm work across multiple gpu devices. See some discussion here:
> > >>> https://lore.kernel.org/dri-devel/PH7PR11MB70049E7E6A2F40BF6282ECC292742@PH7PR11MB7004.namprd11.prod.outlook.com/
> > >>>
> > >>> The reason we try to do this is that for an SVM (shared virtual memory across the
> > >>> cpu program and all gpu programs on all gpu devices) process, the address space
> > >>> has to span all gpu devices. So if we make drm_gpuvm work across devices,
> > >>> then our SVM code can leverage drm_gpuvm as well.
> > >>>
> > >>> At first look, it seems feasible because drm_gpuvm doesn't really use the
> > >>> drm_device *drm pointer a lot. This param is used only for printing/warning. So I
> > >>> think maybe we can delete this drm field from drm_gpuvm.
> > >>>
> > >>> This way, on a system with multiple gpu devices, for one process we can have only
> > >>> one drm_gpuvm instance, instead of multiple drm_gpuvm instances (one for
> > >>> each gpu device).
> > >>>
> > >>> What do you think?
> > >>
> > >> Well from the GPUVM side I don't think it would make much difference if
> > >> we have the drm device or not.
> > >>
> > >> But from the experience we had with KFD, I think I should mention that we
> > >> should absolutely *not* deal with multiple devices at the same time in
> > >> the UAPI or VM objects inside the driver.
> > >>
> > >> The background is that all the APIs inside the Linux kernel are built
> > >> around the idea that they work with only one device at a time. This
> > >> accounts for both low level APIs like the DMA API as well as pretty high
> > >> level things like for example file system address space etc...
> > >
> > > Yes, most APIs are per-device based.
> > >
> > > One exception I know of is actually the kfd SVM API. If you look at the svm_ioctl
> > > function, it is per-process based. Each kfd_process represents a process across N
> > > gpu devices. Cc Felix.
> > >
> > > Needless to say, kfd SVM represents a shared virtual address space across the CPU and
> > > all GPU devices on the system. This is by the definition of SVM (shared virtual
> > > memory). This is very different from our legacy gpu *device* driver, which works
> > > for only one device (i.e., if you want one device to access another device's
> > > memory, you will have to use dma-buf export/import etc).
> > >
> > > We have the same design requirement for SVM. For anyone who wants to
> > > implement the SVM concept, this is a hard requirement. Since drm now has the
> > > drm_gpuvm concept, which strictly speaking is designed for one device, I want to
> > > see whether we can extend drm_gpuvm to make it work for both a single device
> > > (as used in xe) and multiple devices (as will be used in the SVM code). That is why I
> > > brought up this topic.
> > >
> > >>
> > >> So when you have multiple GPUs you either have an inseparable cluster of
> > >> them, in which case you would also only have one drm_device. Or you have
> > >> separate drm_devices, which also results in separate drm render nodes and
> > >> separate virtual address spaces and also eventually separate IOMMU
> > >> domains which gives you separate dma_addresses for the same page and so
> > >> separate GPUVM page tables....
> > >
> > > I am thinking we can still have each device keep its separate drm_device/render
> > > node/iommu domain/gpu page table, just as we have today. I do not plan
> > > to change this picture.
> > >
> > > But the virtual address space will support two modes of operation:
> > > 1. one drm_gpuvm per device. This is when svm is not in the picture.
> > > 2. all devices in the process share one single drm_gpuvm, when svm is in the
> > > picture. In the xe driver design, we have to support a mixed use of legacy mode
> > > (such as gem_create and vm_bind) and svm (such as malloc'ed memory for gpu
> > > submission). So whenever SVM is in the picture, we want one single process
> > > address space across all devices. drm_gpuvm doesn't need to be aware of those
> > > two operation modes. It is the driver's responsibility to pick the mode.
> > >
> > > For example, in mode #1, a driver's vm structure (such as xe_vm) can inherit
> > > from drm_gpuvm. In mode #2, a driver's svm structure (xe_svm in this series:
> > > https://lore.kernel.org/dri-devel/20240117221223.18540-1-oak.zeng@intel.com/)
> > > can inherit from drm_gpuvm while each xe_vm (still a per-device struct)
> > > will just have a pointer to the drm_gpuvm structure. This way, when svm is in play,
> > > we build a 1 process : 1 mm_struct : 1 xe_svm : N xe_vm correlation, which means a
> > > shared address space across gpu devices.
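> > >
> > > To illustrate the structure I mean, a rough sketch (field names made up
> > > here, not the actual xe structs):
> > >
> > > struct xe_svm {
> > > 	struct drm_gpuvm base;   /* mode #2: one instance per process */
> > > 	struct mm_struct *mm;    /* 1:1 with the CPU process */
> > > };
> > >
> > > struct xe_vm {
> > > 	/* per-device state (page tables, PTE flags, ...) stays here */
> > > 	struct drm_gpuvm *gpuvm; /* points to &xe_svm.base when svm is in play */
> > > };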
> >
> > With a shared GPUVM structure, how do you track actual per-device resources
> > such as page tables? You also need to consider that the page table layout and memory
> > mapping flags may vary from device to device due to different GPU chipsets or revisions.
> 
> The per-device page tables, flags etc. are still managed per device, namely via
> the xe_vm in the xekmd driver.
> 
> >
> > Also, if you replace the shared GPUVM structure with a pointer to a shared one,
> > you may run into all kinds of difficulties due to increasing complexity in terms
> > of locking, synchronization, lifetime and potential unwind operations in error
> > paths.
> > I haven't thought it through yet, but I wouldn't be surprised entirely if there are
> > cases where you simply run into circular dependencies.
> 
> Makes sense; I can't see through this without proof-of-concept code either.
> 
> >
> > Also, looking at the conversation in the linked patch series:
> >
> > <snip>
> >
> > >> For example, as hmm_range_fault brings a range from host into GPU address
> > >> space, what if it was already allocated and in use by VM_BIND for
> > >> a GEM_CREATE allocated buffer? That is of course an application error,
> > >> but the KMD needs to detect it, and provide one single managed address
> > >> space across all allocations from the application....
> >
> > > This is a very good question. Yes, agreed, we should check for this application error.
> > > Fortunately this is doable. All vm_bind virtual address ranges are tracked in the
> > > xe_vm/drm_gpuvm struct. In this case, we should iterate the drm_gpuvm's rb
> > > tree of *all* gpu devices (as xe_vm is for one device only) to see whether there
> > > is a conflict. Will make the change soon.
> >
> > <snip>
> >
> > How do you do that if xe_vm->gpuvm is just a pointer to the GPUVM structure
> > within xe_svm?
> 
> In the proposed approach, we have a single drm_gpuvm instance for one process.
> All devices' xe_vm structures point to this drm_gpuvm instance. This drm_gpuvm's rb
> tree maintains all the va ranges we have in this process. We can just walk this rb
> tree to see if there is a conflict.
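>
> As a rough sketch of that check -- assuming the shared drm_gpuvm proposed
> above and re-using drm_gpuva_find_first() from the existing gpuvm code
> (the helper name below is made up):
>
> static bool xe_svm_range_conflicts(struct drm_gpuvm *gpuvm,
> 				   u64 addr, u64 range)
> {
> 	/* first gpuva overlapping [addr, addr + range), or NULL if free */
> 	return drm_gpuva_find_first(gpuvm, addr, range) != NULL;
> }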
> 
> But I didn't answer Brian's question completely... In a mixed use of vm_bind and
> malloc/mmap, the virtual address used by vm_bind should first be reserved in
> user space using mmap. So all valid virtual addresses should be tracked by a Linux
> kernel vm_area_struct.
> 
> Both vm_bind'ed and malloc'ed virtual addresses can cause a gpu page fault. Our fault
> handler should first see whether this is a vm_bind va and service the fault
> accordingly; if not, serve the fault through the SVM path; if the SVM path also fails, it
> is an invalid address. So from the user's perspective, the user can do:
> ptr = mmap()
> vm_bind(ptr, bo)
> submit gpu kernel using ptr
> or:
> ptr = mmap()
> submit gpu kernel using ptr
> Whether vm_bind is called or not decides the gpu fault handler code path.
> Hopefully this answers @Welty, Brian's original question.
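>
> A rough sketch of that fault-handler ordering, with hypothetical names
> (none of these helpers or fields are existing xe code):
>
> static int xe_handle_gpu_fault(struct xe_vm *vm, u64 fault_addr)
> {
> 	/* 1. vm_bind'ed address: service it from the BO/vm_bind path */
> 	if (xe_vm_addr_is_bound(vm, fault_addr))
> 		return xe_vm_fault_bound(vm, fault_addr);
>
> 	/* 2. otherwise try the SVM path (hmm_range_fault based) */
> 	if (!xe_svm_fault(vm->svm, fault_addr))
> 		return 0;
>
> 	/* 3. neither path knows this address: application error */
> 	return -EFAULT;
> }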
> 
> 
> >
> > >
> > > This requires some changes to the drm_gpuvm design:
> > > 1. The drm_device *drm pointer: in mode #2 operation this can be NULL,
> > > meaning this drm_gpuvm is not for a specific gpu device.
> > > 2. The common dma_resv object (drm_gem_object *r_obj): *does one
> > > dma_resv object allocated/initialized for one device work for all devices*? At
> > > first look, dma_resv is just some CPU structure maintaining dma-fences, so I
> > > guess it should work for all devices? I definitely need you to comment.
> >
> > The general rule is that drivers can share the common dma_resv across GEM objects
> > that are only mapped within the VM owning the dma_resv, but never within another VM.
> >
> > Now, your question is whether multiple VMs can share the same common dma_resv.
> > I think that calls for trouble, since it would create dependencies that simply aren't
> > needed and might even introduce locking issues.
> >
> > However, that's optional; you can simply decide not to make use of the common
> > dma_resv and all the optimizations based on it.
> 
> Ok, got it.
> >
> > >
> > >
> > >>
> > >> It's up to you how to implement it, but I think it's pretty clear that
> > >> you need separate drm_gpuvm objects to manage those.
> > >
> > > As explained above, I am thinking of one drm_gpuvm object across all devices
> > when SVM is in the picture...
> > >
> > >>
> > >> That you map the same thing in all those virtual address spaces at the
> > >> same address is a completely different optimization problem I think.
> > >
> > > Not sure I follow here... the requirement from SVM is that one virtual address
> > > points to the same physical backing store. For example, whenever the CPU or any GPU
> > > device accesses this virtual address, it refers to the same physical content. Of
> > > course the physical backing store can be migrated between host memory and any of
> > > the GPUs' device memory, but the content should be consistent.
> >
> > Technically, multiple different GPUs will have separate virtual address spaces, it's
> > just that you create mappings within all of them such that the same virtual address
> > resolves to the same physical content on all of them.
> >
> > So, having a single GPUVM instance representing all of them might give the illusion of
> > a single unified address space, but you still need to maintain each device's address
> > space backing resources, such as page tables, separately.
> 
> Yes agreed.
> 
> Regards,
> Oak
> >
> > - Danilo
> >
> > >
> > > So we are mapping the same physical content to the same virtual address in either
> > > the cpu page table or any gpu device's page table...
> > >
> > >> What we could certainly do is to optimize hmm_range_fault by making
> > >> hmm_range a reference counted object and using it for multiple devices
> > >> at the same time if those devices request the same range of an mm_struct.
> > >>
> > >
> > > I don't quite follow. If you are trying to resolve a multiple-device concurrent access
> > > problem, I think we should serialize concurrent device faults to one address range.
> > > The reason is that, during device fault handling, we might migrate the backing store, so
> > > hmm_range->hmm_pfns[] might have changed after one device accesses it.
> > >
> > >> I think if you start using the same drm_gpuvm for multiple devices you
> > >> will sooner or later start to run into the same mess we have seen with
> > >> KFD, where we moved more and more functionality from the KFD to the DRM
> > >> render node because we found that a lot of the stuff simply doesn't work
> > >> correctly with a single object to maintain the state.
> > >
> > > As I understand it, KFD is designed to work across devices. A single pseudo
> > > /dev/kfd device represents all hardware gpu devices. That is why during kfd open,
> > > many pdds (process device data) are created, one for each hardware device of this
> > > process. Yes, the code is a little complicated.
> > >
> > > KFD manages the shared virtual address space in the kfd driver code, i.e. the
> > > splitting, merging etc. Here I am looking at whether we can leverage the drm_gpuvm
> > > code for those functions.
> > >
> > > As for the shared virtual address space across gpu devices, it is a hard
> > > requirement for the svm/system allocator (aka malloc for gpu programs). We need to
> > > make it work either at the driver level or the drm_gpuvm level. drm_gpuvm is better
> > > because the work can be shared between drivers.
> > >
> > > Thanks a lot,
> > > Oak
> > >
> > >>
> > >> Just one more point on your original discussion on the xe list: I think
> > >> it's perfectly valid for an application to map something at the same
> > >> address where you already have something else mapped.
> > >>
> > >> Cheers,
> > >> Christian.
> > >>
> > >>>
> > >>> Thanks,
> > >>> Oak
> > >



