[PATCH 0/8] Use hmm_range_fault to populate user page

All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/8] Use hmm_range_fault to populate user page
@ 2024-03-19  2:55 Oak Zeng
  2024-03-19  2:55 ` [PATCH 1/8] drm/xe/svm: Remap and provide memmap backing for GPU vram Oak Zeng
                   ` (10 more replies)
  0 siblings, 11 replies; 21+ messages in thread
From: Oak Zeng @ 2024-03-19  2:55 UTC (permalink / raw
  To: intel-xe
  Cc: thomas.hellstrom, matthew.brost, brian.welty,
	himal.prasad.ghimiray

This is an effort to unify hmmptr (system allocator) and userptr.
A helper xe_hmm_populate_range is created to populate a user
page using hmm_range_fault, instead of using get_user_pages_fast.
This helper is then used to replace some userptr codes.

The same help will be used later for hmmptr.

This is part of the hmmptr (system allocator) codes. Since this
part can be merged separately, send it out for CI and review first.
It will be followed by other hmmptr codes.

Oak Zeng (8):
  drm/xe/svm: Remap and provide memmap backing for GPU vram
  drm/xe/svm: Add DRM_XE_SVM kernel config entry
  drm/xe: Helper to get tile from memory region
  drm/xe: Introduce a helper to get dpa from pfn
  drm/xe/svm: Get xe memory region from page
  drm/xe: Helper to populate a userptr or hmmptr
  drm/xe: Introduce a helper to free sg table
  drm/xe: Use hmm_range_fault to populate user pages

 drivers/gpu/drm/xe/Kconfig           |  21 ++
 drivers/gpu/drm/xe/Makefile          |   2 +
 drivers/gpu/drm/xe/xe_device.h       |   5 +
 drivers/gpu/drm/xe/xe_device_types.h |   8 +
 drivers/gpu/drm/xe/xe_hmm.c          | 282 +++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_hmm.h          |  25 +++
 drivers/gpu/drm/xe/xe_mmio.c         |   6 +
 drivers/gpu/drm/xe/xe_svm.h          |  49 +++++
 drivers/gpu/drm/xe/xe_svm_devmem.c   |  89 +++++++++
 drivers/gpu/drm/xe/xe_tile.c         |   8 +
 drivers/gpu/drm/xe/xe_vm.c           | 122 ++----------
 11 files changed, 507 insertions(+), 110 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_hmm.c
 create mode 100644 drivers/gpu/drm/xe/xe_hmm.h
 create mode 100644 drivers/gpu/drm/xe/xe_svm.h
 create mode 100644 drivers/gpu/drm/xe/xe_svm_devmem.c

-- 
2.26.3


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/8] drm/xe/svm: Remap and provide memmap backing for GPU vram
  2024-03-19  2:55 [PATCH 0/8] Use hmm_range_fault to populate user page Oak Zeng
@ 2024-03-19  2:55 ` Oak Zeng
  2024-03-19  2:55 ` [PATCH 2/8] drm/xe/svm: Add DRM_XE_SVM kernel config entry Oak Zeng
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Oak Zeng @ 2024-03-19  2:55 UTC (permalink / raw
  To: intel-xe
  Cc: thomas.hellstrom, matthew.brost, brian.welty,
	himal.prasad.ghimiray

Memory remap GPU vram using devm_memremap_pages, so each GPU vram
page is backed by a struct page.

Those struct pages are created to allow hmm migrate buffer b/t
GPU vram and CPU system memory using existing Linux migration
mechanism (i.e., migrating b/t CPU system memory and hard disk).

This is prepare work to enable svm (shared virtual memory) through
Linux kernel hmm framework. The memory remap's page map type is set
to MEMORY_DEVICE_PRIVATE for now. This means even though each GPU
vram page get a struct page and can be mapped in CPU page table,
but such pages are treated as GPU's private resource, so CPU can't
access them. If CPU access such page, a page fault is triggered
and page will be migrate to system memory.

For GPU device which supports coherent memory protocol b/t CPU and
GPU (such as CXL and CAPI protocol), we can remap device memory as
MEMORY_DEVICE_COHERENT. This is TBD.

v1:
Changes per code review feedback from Matt:
    change .o order in Makefile
    fix indentation
    change code order in mmio_fini
    remove unnecessary header file
    uniform xe_svm_devm_add/_remove parameter
    use tile (vs dev) as pagemap.owner during memremap
    only remap vram for platform that support usm
Changes per review feedback from Brian:
    s/xe_svm_devm_add/xe_devm_add
    s/xe_svm_devm_remove/xe_devm_remove
    move calling of xe_devm_add to xe_tile.c

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/Makefile          |  1 +
 drivers/gpu/drm/xe/xe_device_types.h |  8 +++
 drivers/gpu/drm/xe/xe_mmio.c         |  6 ++
 drivers/gpu/drm/xe/xe_svm.h          | 15 +++++
 drivers/gpu/drm/xe/xe_svm_devmem.c   | 89 ++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_tile.c         |  4 ++
 6 files changed, 123 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_svm.h
 create mode 100644 drivers/gpu/drm/xe/xe_svm_devmem.c

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 3c3e67885559..e2ec6d1375c0 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -128,6 +128,7 @@ xe-y += xe_bb.o \
 	xe_sa.o \
 	xe_sched_job.o \
 	xe_step.o \
+	xe_svm_devmem.o \
 	xe_sync.o \
 	xe_tile.o \
 	xe_tile_sysfs.o \
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 9785eef2e5a4..607b61326c9a 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -99,6 +99,14 @@ struct xe_mem_region {
 	resource_size_t actual_physical_size;
 	/** @mapping: pointer to VRAM mappable space */
 	void __iomem *mapping;
+	/** @pagemap: Used to remap device memory as ZONE_DEVICE */
+	struct dev_pagemap pagemap;
+	/**
+	 * @hpa_base: base host physical address
+	 *
+	 * This is generated when remap device memory as ZONE_DEVICE
+	 */
+	resource_size_t hpa_base;
 };
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
index 7ba2477452d7..525254d9369e 100644
--- a/drivers/gpu/drm/xe/xe_mmio.c
+++ b/drivers/gpu/drm/xe/xe_mmio.c
@@ -22,6 +22,7 @@
 #include "xe_module.h"
 #include "xe_sriov.h"
 #include "xe_tile.h"
+#include "xe_svm.h"
 
 #define XEHP_MTCFG_ADDR		XE_REG(0x101800)
 #define TILE_COUNT		REG_GENMASK(15, 8)
@@ -354,6 +355,11 @@ void xe_mmio_probe_tiles(struct xe_device *xe)
 static void mmio_fini(struct drm_device *drm, void *arg)
 {
 	struct xe_device *xe = arg;
+    struct xe_tile *tile;
+    u8 id;
+
+	for_each_tile(tile, xe, id)
+		xe_devm_remove(tile, &tile->mem.vram);
 
 	pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs);
 	if (xe->mem.vram.mapping)
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
new file mode 100644
index 000000000000..e944971cfc6d
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef __XE_SVM_H
+#define __XE_SVM_H
+
+struct xe_tile;
+struct xe_mem_region;
+
+int xe_devm_add(struct xe_tile *tile, struct xe_mem_region *mr);
+void xe_devm_remove(struct xe_tile *tile, struct xe_mem_region *mr);
+
+#endif
diff --git a/drivers/gpu/drm/xe/xe_svm_devmem.c b/drivers/gpu/drm/xe/xe_svm_devmem.c
new file mode 100644
index 000000000000..f5fa07150874
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_svm_devmem.c
@@ -0,0 +1,89 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include <linux/mm_types.h>
+#include <linux/sched/mm.h>
+
+#include "xe_device_types.h"
+#include "xe_svm.h"
+
+
+static vm_fault_t xe_devm_migrate_to_ram(struct vm_fault *vmf)
+{
+	return 0;
+}
+
+static void xe_devm_page_free(struct page *page)
+{
+}
+
+static const struct dev_pagemap_ops xe_devm_pagemap_ops = {
+	.page_free = xe_devm_page_free,
+	.migrate_to_ram = xe_devm_migrate_to_ram,
+};
+
+/**
+ * xe_devm_add: Remap and provide memmap backing for device memory
+ * @tile: tile that the memory region blongs to
+ * @mr: memory region to remap
+ *
+ * This remap device memory to host physical address space and create
+ * struct page to back device memory
+ *
+ * Return: 0 on success standard error code otherwise
+ */
+int xe_devm_add(struct xe_tile *tile, struct xe_mem_region *mr)
+{
+	struct xe_device *xe = tile_to_xe(tile);
+	struct device *dev = &to_pci_dev(xe->drm.dev)->dev;
+	struct resource *res;
+	void *addr;
+	int ret;
+
+	res = devm_request_free_mem_region(dev, &iomem_resource,
+					   mr->usable_size);
+	if (IS_ERR(res)) {
+		ret = PTR_ERR(res);
+		return ret;
+	}
+
+	mr->pagemap.type = MEMORY_DEVICE_PRIVATE;
+	mr->pagemap.range.start = res->start;
+	mr->pagemap.range.end = res->end;
+	mr->pagemap.nr_range = 1;
+	mr->pagemap.ops = &xe_devm_pagemap_ops;
+	mr->pagemap.owner = xe;
+	addr = devm_memremap_pages(dev, &mr->pagemap);
+	if (IS_ERR(addr)) {
+		devm_release_mem_region(dev, res->start, resource_size(res));
+		ret = PTR_ERR(addr);
+		drm_err(&xe->drm, "Failed to remap tile %d memory, errno %d\n",
+				tile->id, ret);
+		return ret;
+	}
+	mr->hpa_base = res->start;
+
+	drm_info(&xe->drm, "Added tile %d memory [%llx-%llx] to devm, remapped to %pr\n",
+			tile->id, mr->io_start, mr->io_start + mr->usable_size, res);
+	return 0;
+}
+
+/**
+ * xe_devm_remove: Unmap device memory and free resources
+ * @tile: xe tile
+ * @mr: memory region to remove
+ */
+void xe_devm_remove(struct xe_tile *tile, struct xe_mem_region *mr)
+{
+	struct device *dev = &to_pci_dev(tile->xe->drm.dev)->dev;
+
+	/*FIXME: Does below cause a kernel hange during moduel remove?*/
+	if (mr->hpa_base) {
+		devm_memunmap_pages(dev, &mr->pagemap);
+		devm_release_mem_region(dev, mr->pagemap.range.start,
+			mr->pagemap.range.end - mr->pagemap.range.start +1);
+	}
+}
+
diff --git a/drivers/gpu/drm/xe/xe_tile.c b/drivers/gpu/drm/xe/xe_tile.c
index 0650b2fa75ef..f1c4f9de51df 100644
--- a/drivers/gpu/drm/xe/xe_tile.c
+++ b/drivers/gpu/drm/xe/xe_tile.c
@@ -14,6 +14,7 @@
 #include "xe_tile_sysfs.h"
 #include "xe_ttm_vram_mgr.h"
 #include "xe_wa.h"
+#include "xe_svm.h"
 
 /**
  * DOC: Multi-tile Design
@@ -158,6 +159,7 @@ static int tile_ttm_mgr_init(struct xe_tile *tile)
  */
 int xe_tile_init_noalloc(struct xe_tile *tile)
 {
+	struct xe_device *xe = tile_to_xe(tile);
 	int err;
 
 	xe_device_mem_access_get(tile_to_xe(tile));
@@ -175,6 +177,8 @@ int xe_tile_init_noalloc(struct xe_tile *tile)
 
 	xe_tile_sysfs_init(tile);
 
+	if (xe->info.has_usm)
+		xe_devm_add(tile, &tile->mem.vram);
 err_mem_access:
 	xe_device_mem_access_put(tile_to_xe(tile));
 	return err;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/8] drm/xe/svm: Add DRM_XE_SVM kernel config entry
  2024-03-19  2:55 [PATCH 0/8] Use hmm_range_fault to populate user page Oak Zeng
  2024-03-19  2:55 ` [PATCH 1/8] drm/xe/svm: Remap and provide memmap backing for GPU vram Oak Zeng
@ 2024-03-19  2:55 ` Oak Zeng
  2024-03-19  9:25   ` Hellstrom, Thomas
  2024-03-19  2:55 ` [PATCH 3/8] drm/xe: Helper to get tile from memory region Oak Zeng
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Oak Zeng @ 2024-03-19  2:55 UTC (permalink / raw
  To: intel-xe
  Cc: thomas.hellstrom, matthew.brost, brian.welty,
	himal.prasad.ghimiray

DRM_XE_SVM kernel config entry is added so
xe svm feature can be configured before kernel
compilation.

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/Kconfig   | 21 +++++++++++++++++++++
 drivers/gpu/drm/xe/xe_tile.c |  4 ++++
 2 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
index 1a556d087e63..e244165459c5 100644
--- a/drivers/gpu/drm/xe/Kconfig
+++ b/drivers/gpu/drm/xe/Kconfig
@@ -83,6 +83,27 @@ config DRM_XE_FORCE_PROBE
 	  4571.
 
 	  Use "!*" to block the probe of the driver for all known devices.
+config DRM_XE_SVM
+	bool "Enable Shared Virtual Memory support in xe"
+	depends on DRM_XE
+	depends on ARCH_ENABLE_MEMORY_HOTPLUG
+	depends on ARCH_ENABLE_MEMORY_HOTREMOVE
+	depends on MEMORY_HOTPLUG
+	depends on MEMORY_HOTREMOVE
+	depends on ARCH_HAS_PTE_DEVMAP
+	depends on SPARSEMEM_VMEMMAP
+	depends on ZONE_DEVICE
+	depends on DEVICE_PRIVATE
+	depends on MMU
+	select HMM_MIRROR
+	select MMU_NOTIFIER
+	default y
+	help
+	  Choose this option if you want Shared Virtual Memory (SVM)
+	  support in xe. With SVM, virtual address space is shared
+	  between CPU and GPU. This means any virtual address such
+	  as malloc or mmap returns, variables on stack, or global
+	  memory pointers, can be used for GPU transparently.
 
 menu "drm/Xe Debugging"
 depends on DRM_XE
diff --git a/drivers/gpu/drm/xe/xe_tile.c b/drivers/gpu/drm/xe/xe_tile.c
index f1c4f9de51df..b52a00a6b5d5 100644
--- a/drivers/gpu/drm/xe/xe_tile.c
+++ b/drivers/gpu/drm/xe/xe_tile.c
@@ -159,7 +159,9 @@ static int tile_ttm_mgr_init(struct xe_tile *tile)
  */
 int xe_tile_init_noalloc(struct xe_tile *tile)
 {
+#if IS_ENABLED(CONFIG_DRM_XE_SVM)
 	struct xe_device *xe = tile_to_xe(tile);
+#endif
 	int err;
 
 	xe_device_mem_access_get(tile_to_xe(tile));
@@ -177,8 +179,10 @@ int xe_tile_init_noalloc(struct xe_tile *tile)
 
 	xe_tile_sysfs_init(tile);
 
+#if IS_ENABLED(CONFIG_DRM_XE_SVM)
 	if (xe->info.has_usm)
 		xe_devm_add(tile, &tile->mem.vram);
+#endif
 err_mem_access:
 	xe_device_mem_access_put(tile_to_xe(tile));
 	return err;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/8] drm/xe: Helper to get tile from memory region
  2024-03-19  2:55 [PATCH 0/8] Use hmm_range_fault to populate user page Oak Zeng
  2024-03-19  2:55 ` [PATCH 1/8] drm/xe/svm: Remap and provide memmap backing for GPU vram Oak Zeng
  2024-03-19  2:55 ` [PATCH 2/8] drm/xe/svm: Add DRM_XE_SVM kernel config entry Oak Zeng
@ 2024-03-19  2:55 ` Oak Zeng
  2024-03-19  9:28   ` Hellstrom, Thomas
  2024-03-19  2:55 ` [PATCH 4/8] drm/xe: Introduce a helper to get dpa from pfn Oak Zeng
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Oak Zeng @ 2024-03-19  2:55 UTC (permalink / raw
  To: intel-xe
  Cc: thomas.hellstrom, matthew.brost, brian.welty,
	himal.prasad.ghimiray

A simple helper to retrieve tile from memory region

v1: move the function to xe_device.h (Matt)

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
---
 drivers/gpu/drm/xe/xe_device.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 14be34d9f543..4735268cf07f 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -176,4 +176,9 @@ void xe_device_snapshot_print(struct xe_device *xe, struct drm_printer *p);
 u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address);
 u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64 address);
 
+static inline struct xe_tile *xe_mem_region_to_tile(struct xe_mem_region *mr)
+{
+	return container_of(mr, struct xe_tile, mem.vram);
+}
+
 #endif
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/8] drm/xe: Introduce a helper to get dpa from pfn
  2024-03-19  2:55 [PATCH 0/8] Use hmm_range_fault to populate user page Oak Zeng
                   ` (2 preceding siblings ...)
  2024-03-19  2:55 ` [PATCH 3/8] drm/xe: Helper to get tile from memory region Oak Zeng
@ 2024-03-19  2:55 ` Oak Zeng
  2024-03-19  2:55 ` [PATCH 5/8] drm/xe/svm: Get xe memory region from page Oak Zeng
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Oak Zeng @ 2024-03-19  2:55 UTC (permalink / raw
  To: intel-xe
  Cc: thomas.hellstrom, matthew.brost, brian.welty,
	himal.prasad.ghimiray

Since we now create struct page backing for each vram page,
each vram page now also has a pfn, just like system memory.
This allow us to calcuate device physical address from pfn.

v1: move the function to xe_svm.h (Matt)
    s/vram_pfn_to_dpa/xe_mem_region_pfn_to_dpa (Matt)
    add kernel document for the helper (Thomas)

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.h | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index e944971cfc6d..8a34429eb674 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -6,8 +6,31 @@
 #ifndef __XE_SVM_H
 #define __XE_SVM_H
 
-struct xe_tile;
-struct xe_mem_region;
+#include "xe_device_types.h"
+#include "xe_device.h"
+#include "xe_assert.h"
+
+/**
+ * xe_mem_region_pfn_to_dpa() - Calculate page's dpa from pfn
+ *
+ * @mr: The memory region that page resides in
+ * @pfn: page frame number of the page
+ *
+ * Returns: the device physical address of the page
+ */
+static inline u64 xe_mem_region_pfn_to_dpa(struct xe_mem_region *mr, u64 pfn)
+{
+	u64 dpa;
+	struct xe_tile *tile = xe_mem_region_to_tile(mr);
+	struct xe_device *xe = tile_to_xe(tile);
+	u64 offset;
+
+	xe_assert(xe, (pfn << PAGE_SHIFT) >= mr->hpa_base);
+	offset = (pfn << PAGE_SHIFT) - mr->hpa_base;
+	dpa = mr->dpa_base + offset;
+
+	return dpa;
+}
 
 int xe_devm_add(struct xe_tile *tile, struct xe_mem_region *mr);
 void xe_devm_remove(struct xe_tile *tile, struct xe_mem_region *mr);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/8] drm/xe/svm: Get xe memory region from page
  2024-03-19  2:55 [PATCH 0/8] Use hmm_range_fault to populate user page Oak Zeng
                   ` (3 preceding siblings ...)
  2024-03-19  2:55 ` [PATCH 4/8] drm/xe: Introduce a helper to get dpa from pfn Oak Zeng
@ 2024-03-19  2:55 ` Oak Zeng
  2024-03-19  2:55 ` [PATCH 6/8] drm/xe: Helper to populate a userptr or hmmptr Oak Zeng
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Oak Zeng @ 2024-03-19  2:55 UTC (permalink / raw
  To: intel-xe
  Cc: thomas.hellstrom, matthew.brost, brian.welty,
	himal.prasad.ghimiray

For gpu vram page, we now have a struct page backing of
it. struct page's pgmap points to xe_memory_region's
pagemap. This allow us to retrieve xe_memory_region
from struct page.

v1: move the function to xe_svm.h

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 8a34429eb674..624c1581f8ba 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -6,6 +6,7 @@
 #ifndef __XE_SVM_H
 #define __XE_SVM_H
 
+#include <linux/mm_types.h>
 #include "xe_device_types.h"
 #include "xe_device.h"
 #include "xe_assert.h"
@@ -35,4 +36,14 @@ static inline u64 xe_mem_region_pfn_to_dpa(struct xe_mem_region *mr, u64 pfn)
 int xe_devm_add(struct xe_tile *tile, struct xe_mem_region *mr);
 void xe_devm_remove(struct xe_tile *tile, struct xe_mem_region *mr);
 
+/**
+ * xe_page_to_mem_region() - Get a page's memory region
+ *
+ * @page: a struct page pointer pointing to a page in vram memory region
+ */
+static inline struct xe_mem_region *xe_page_to_mem_region(struct page *page)
+{
+	return container_of(page->pgmap, struct xe_mem_region, pagemap);
+}
+
 #endif
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 6/8] drm/xe: Helper to populate a userptr or hmmptr
  2024-03-19  2:55 [PATCH 0/8] Use hmm_range_fault to populate user page Oak Zeng
                   ` (4 preceding siblings ...)
  2024-03-19  2:55 ` [PATCH 5/8] drm/xe/svm: Get xe memory region from page Oak Zeng
@ 2024-03-19  2:55 ` Oak Zeng
  2024-03-19 10:33   ` Hellstrom, Thomas
  2024-03-19  2:55 ` [PATCH 7/8] drm/xe: Introduce a helper to free sg table Oak Zeng
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Oak Zeng @ 2024-03-19  2:55 UTC (permalink / raw
  To: intel-xe
  Cc: thomas.hellstrom, matthew.brost, brian.welty,
	himal.prasad.ghimiray

Add a helper function xe_userptr_populate_range to populate
a a userptr or hmmptr range. This functions calls hmm_range_fault
to read CPU page tables and populate all pfns/pages of this
virtual address range.

If the populated page is system memory page, dma-mapping is performed
to get a dma-address which can be used later for GPU to access pages.

If the populated page is device private page, we calculate the dpa (
device physical address) of the page.

The dma-address or dpa is then saved in userptr's sg table. This is
prepare work to replace the get_user_pages_fast code in userptr code
path. The helper function will also be used to populate hmmptr later.

v1: Address review comments:
    separate a npage_in_range function (Matt)
    reparameterize function xe_userptr_populate_range function (Matt)
    move mmu_interval_read_begin() call into while loop (Thomas)
    s/mark_range_accessed/xe_mark_range_accessed (Thomas)
    use set_page_dirty_lock (vs set_page_dirty) (Thomas)
    move a few checking in xe_vma_userptr_pin_pages to hmm.c (Matt)

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/Makefile |   1 +
 drivers/gpu/drm/xe/xe_hmm.c | 231 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_hmm.h |  10 ++
 3 files changed, 242 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_hmm.c
 create mode 100644 drivers/gpu/drm/xe/xe_hmm.h

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index e2ec6d1375c0..52300de1e86f 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -101,6 +101,7 @@ xe-y += xe_bb.o \
 	xe_guc_pc.o \
 	xe_guc_submit.o \
 	xe_heci_gsc.o \
+	xe_hmm.o \
 	xe_hw_engine.o \
 	xe_hw_engine_class_sysfs.o \
 	xe_hw_fence.o \
diff --git a/drivers/gpu/drm/xe/xe_hmm.c b/drivers/gpu/drm/xe/xe_hmm.c
new file mode 100644
index 000000000000..305e3f2e659b
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_hmm.c
@@ -0,0 +1,231 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#include <linux/mmu_notifier.h>
+#include <linux/dma-mapping.h>
+#include <linux/memremap.h>
+#include <linux/swap.h>
+#include <linux/hmm.h>
+#include <linux/mm.h>
+#include "xe_hmm.h"
+#include "xe_svm.h"
+#include "xe_vm.h"
+
+static inline u64 npages_in_range(unsigned long start, unsigned long end)
+{
+	return ((end - 1) >> PAGE_SHIFT) - (start >> PAGE_SHIFT) + 1;
+}
+
+/**
+ * xe_mark_range_accessed() - mark a range is accessed, so core mm
+ * have such information for memory eviction or write back to
+ * hard disk
+ *
+ * @range: the range to mark
+ * @write: if write to this range, we mark pages in this range
+ * as dirty
+ */
+static void xe_mark_range_accessed(struct hmm_range *range, bool write)
+{
+	struct page *page;
+	u64 i, npages;
+
+	npages = npages_in_range(range->start, range->end);
+	for (i = 0; i < npages; i++) {
+		page = hmm_pfn_to_page(range->hmm_pfns[i]);
+		if (write)
+			set_page_dirty_lock(page);
+
+		mark_page_accessed(page);
+	}
+}
+
+/**
+ * build_sg() - build a scatter gather table for all the physical pages/pfn
+ * in a hmm_range. dma-address is save in sg table and will be used to program
+ * GPU page table later.
+ *
+ * @xe: the xe device who will access the dma-address in sg table
+ * @range: the hmm range that we build the sg table from. range->hmm_pfns[]
+ * has the pfn numbers of pages that back up this hmm address range.
+ * @st: pointer to the sg table.
+ * @write: whether we write to this range. This decides dma map direction
+ * for system pages. If write we map it bi-diretional; otherwise
+ * DMA_TO_DEVICE
+ *
+ * All the contiguous pfns will be collapsed into one entry in
+ * the scatter gather table. This is for the convenience of
+ * later on operations to bind address range to GPU page table.
+ *
+ * The dma_address in the sg table will later be used by GPU to
+ * access memory. So if the memory is system memory, we need to
+ * do a dma-mapping so it can be accessed by GPU/DMA. If the memory
+ * is GPU local memory (of the GPU who is going to access memory),
+ * we need gpu dpa (device physical address), and there is no need
+ * of dma-mapping.
+ *
+ * FIXME: dma-mapping for peer gpu device to access remote gpu's
+ * memory. Add this when you support p2p
+ *
+ * This function allocates the storage of the sg table. It is
+ * caller's responsibility to free it calling sg_free_table.
+ *
+ * Returns 0 if successful; -ENOMEM if fails to allocate memory
+ */
+static int build_sg(struct xe_device *xe, struct hmm_range *range,
+			     struct sg_table *st, bool write)
+{
+	struct device *dev = xe->drm.dev;
+	struct scatterlist *sg;
+	u64 i, npages;
+
+	sg = NULL;
+	st->nents = 0;
+	npages = npages_in_range(range->start, range->end);
+
+	if (unlikely(sg_alloc_table(st, npages, GFP_KERNEL)))
+		return -ENOMEM;
+
+	for (i = 0; i < npages; i++) {
+		struct page *page;
+		unsigned long addr;
+		struct xe_mem_region *mr;
+
+		page = hmm_pfn_to_page(range->hmm_pfns[i]);
+		if (is_device_private_page(page)) {
+			mr = xe_page_to_mem_region(page);
+			addr = xe_mem_region_pfn_to_dpa(mr, range->hmm_pfns[i]);
+		} else {
+			addr = dma_map_page(dev, page, 0, PAGE_SIZE,
+					write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE);
+		}
+
+		if (sg && (addr == (sg_dma_address(sg) + sg->length))) {
+			sg->length += PAGE_SIZE;
+			sg_dma_len(sg) += PAGE_SIZE;
+			continue;
+		}
+
+		sg =  sg ? sg_next(sg) : st->sgl;
+		sg_dma_address(sg) = addr;
+		sg_dma_len(sg) = PAGE_SIZE;
+		sg->length = PAGE_SIZE;
+		st->nents++;
+	}
+
+	sg_mark_end(sg);
+	return 0;
+}
+
+/**
+ * xe_userptr_populate_range() - Populate physical pages of a virtual
+ * address range
+ *
+ * @uvma: userptr vma which has information of the range to populate.
+ *
+ * This function populate the physical pages of a virtual
+ * address range. The populated physical pages is saved in
+ * userptr's sg table. It is similar to get_user_pages but call
+ * hmm_range_fault.
+ *
+ * This function also read mmu notifier sequence # (
+ * mmu_interval_read_begin), for the purpose of later
+ * comparison (through mmu_interval_read_retry).
+ *
+ * This must be called with mmap read or write lock held.
+ *
+ * This function allocates the storage of the userptr sg table.
+ * It is caller's responsibility to free it calling sg_free_table.
+ *
+ * returns: 0 for succuss; negative error no on failure
+ */
+int xe_userptr_populate_range(struct xe_userptr_vma *uvma)
+{
+	unsigned long timeout =
+		jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
+	unsigned long *pfns, flags = HMM_PFN_REQ_FAULT;
+	struct xe_userptr *userptr;
+	struct xe_vma *vma = &uvma->vma;
+	u64 start = xe_vma_userptr(vma);
+	u64 end = start + xe_vma_size(vma);
+	struct xe_vm *vm = xe_vma_vm(vma);
+	struct hmm_range hmm_range;
+	bool write = !xe_vma_read_only(vma);
+	bool in_kthread = !current->mm;
+	u64 npages;
+	int ret;
+
+	userptr = &uvma->userptr;
+	mmap_assert_locked(userptr->notifier.mm);
+
+	if (vma->gpuva.flags & XE_VMA_DESTROYED)
+		return 0;
+
+	npages = npages_in_range(start, end);
+	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
+	if (unlikely(!pfns))
+		return -ENOMEM;
+
+	if (write)
+		flags |= HMM_PFN_REQ_WRITE;
+
+	if (in_kthread) {
+		if (!mmget_not_zero(userptr->notifier.mm)) {
+			ret = -EFAULT;
+			goto free_pfns;
+		}
+		kthread_use_mm(userptr->notifier.mm);
+	}
+
+	memset64((u64 *)pfns, (u64)flags, npages);
+	hmm_range.hmm_pfns = pfns;
+	hmm_range.notifier = &userptr->notifier;
+	hmm_range.start = start;
+	hmm_range.end = end;
+	hmm_range.pfn_flags_mask = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE;
+	/**
+	 * FIXME:
+	 * Set the the dev_private_owner can prevent hmm_range_fault to fault
+	 * in the device private pages owned by caller. See function
+	 * hmm_vma_handle_pte. In multiple GPU case, this should be set to the
+	 * device owner of the best migration destination. e.g., device0/vm0
+	 * has a page fault, but we have determined the best placement of
+	 * the fault address should be on device1, we should set below to
+	 * device1 instead of device0.
+	 */
+	hmm_range.dev_private_owner = vm->xe;
+
+	while (true) {
+		hmm_range.notifier_seq = mmu_interval_read_begin(&userptr->notifier);
+		ret = hmm_range_fault(&hmm_range);
+		if (time_after(jiffies, timeout))
+			break;
+
+		if (ret == -EBUSY)
+			continue;
+		break;
+	}
+
+	if (in_kthread) {
+		kthread_unuse_mm(userptr->notifier.mm);
+		mmput(userptr->notifier.mm);
+	}
+
+	if (ret)
+		goto free_pfns;
+
+	ret = build_sg(vm->xe, &hmm_range, &userptr->sgt, write);
+	if (ret)
+		goto free_pfns;
+
+	xe_mark_range_accessed(&hmm_range, write);
+	userptr->sg = &userptr->sgt;
+	userptr->notifier_seq = hmm_range.notifier_seq;
+
+free_pfns:
+	kvfree(pfns);
+	return ret;
+}
+
diff --git a/drivers/gpu/drm/xe/xe_hmm.h b/drivers/gpu/drm/xe/xe_hmm.h
new file mode 100644
index 000000000000..fa5ddc11f10b
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_hmm.h
@@ -0,0 +1,10 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2024 Intel Corporation
+ */
+
+#include <linux/types.h>
+
+struct xe_userptr_vma;
+
+int xe_userptr_populate_range(struct xe_userptr_vma *uvma);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 7/8] drm/xe: Introduce a helper to free sg table
  2024-03-19  2:55 [PATCH 0/8] Use hmm_range_fault to populate user page Oak Zeng
                   ` (5 preceding siblings ...)
  2024-03-19  2:55 ` [PATCH 6/8] drm/xe: Helper to populate a userptr or hmmptr Oak Zeng
@ 2024-03-19  2:55 ` Oak Zeng
  2024-03-19  2:55 ` [PATCH 8/8] drm/xe: Use hmm_range_fault to populate user pages Oak Zeng
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Oak Zeng @ 2024-03-19  2:55 UTC (permalink / raw
  To: intel-xe
  Cc: thomas.hellstrom, matthew.brost, brian.welty,
	himal.prasad.ghimiray

Introduce xe_userptr_free_sg helper to dma-unmap all
addresses in userptr's sg table and free sg table.

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Suggested by: Matthew Brost <matthew.brost@intel.com>
---
 drivers/gpu/drm/xe/xe_hmm.c | 59 ++++++++++++++++++++++++++++++++++---
 drivers/gpu/drm/xe/xe_hmm.h | 15 ++++++++++
 2 files changed, 70 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_hmm.c b/drivers/gpu/drm/xe/xe_hmm.c
index 305e3f2e659b..98d85b615b53 100644
--- a/drivers/gpu/drm/xe/xe_hmm.c
+++ b/drivers/gpu/drm/xe/xe_hmm.c
@@ -3,6 +3,7 @@
  * Copyright © 2024 Intel Corporation
  */
 
+#include <linux/scatterlist.h>
 #include <linux/mmu_notifier.h>
 #include <linux/dma-mapping.h>
 #include <linux/memremap.h>
@@ -13,6 +14,16 @@
 #include "xe_svm.h"
 #include "xe_vm.h"
 
+static inline unsigned long append_vram_bit_to_addr(unsigned long addr)
+{
+	return (addr | ADDR_VRAM_BIT);
+}
+
+static inline bool address_is_vram(unsigned long addr)
+{
+	return (addr & ADDR_VRAM_BIT);
+}
+
 static inline u64 npages_in_range(unsigned long start, unsigned long end)
 {
 	return ((end - 1) >> PAGE_SHIFT) - (start >> PAGE_SHIFT) + 1;
@@ -55,9 +66,11 @@ static void xe_mark_range_accessed(struct hmm_range *range, bool write)
  * for system pages. If write we map it bi-diretional; otherwise
  * DMA_TO_DEVICE
  *
- * All the contiguous pfns will be collapsed into one entry in
- * the scatter gather table. This is for the convenience of
- * later on operations to bind address range to GPU page table.
+ * If the pfns are backed by vram, all the contiguous pfns will be
+ * collapsed into one entry in the scatter gather table. This is
+ * for the convenience of later on operations to bind address
+ * range to GPU page table. pfns which are backed by system
+ * memory are not collapsed.
  *
  * The dma_address in the sg table will later be used by GPU to
  * access memory. So if the memory is system memory, we need to
@@ -97,12 +110,14 @@ static int build_sg(struct xe_device *xe, struct hmm_range *range,
 		if (is_device_private_page(page)) {
 			mr = xe_page_to_mem_region(page);
 			addr = xe_mem_region_pfn_to_dpa(mr, range->hmm_pfns[i]);
+			addr = append_vram_bit_to_addr(addr);
 		} else {
 			addr = dma_map_page(dev, page, 0, PAGE_SIZE,
 					write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE);
 		}
 
-		if (sg && (addr == (sg_dma_address(sg) + sg->length))) {
+		if (sg && is_device_private_page(page) &&
+				(addr == (sg_dma_address(sg) + sg->length))) {
 			sg->length += PAGE_SIZE;
 			sg_dma_len(sg) += PAGE_SIZE;
 			continue;
@@ -119,6 +134,39 @@ static int build_sg(struct xe_device *xe, struct hmm_range *range,
 	return 0;
 }
 
+/**
+ * xe_userptr_free_sg() - Free the scatter gather table of userptr
+ *
+ * @uvma: the userptr vma which hold the scatter gather table
+ *
+ * With function xe_userptr_populate_range, we allocate storage of
+ * the userptr sg table. This is a helper function to free this
+ * sg table, and dma unmap the address in the table.
+ */
+void xe_userptr_free_sg(struct xe_userptr_vma *uvma)
+{
+	struct xe_userptr *userptr = &uvma->userptr;
+	struct xe_vma *vma = &uvma->vma;
+	bool write = !xe_vma_read_only(vma);
+	struct xe_vm *vm = xe_vma_vm(vma);
+	struct xe_device *xe = vm->xe;
+	struct device *dev = xe->drm.dev;
+	struct scatterlist *sg;
+	unsigned long addr;
+	int i;
+
+	xe_assert(xe, userptr->sg);
+	for_each_sgtable_sg(userptr->sg, sg, i) {
+		addr = sg_dma_address(sg);
+		if (!address_is_vram(addr))
+			dma_unmap_page(dev, addr, PAGE_SIZE,
+				write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE);
+	}
+
+	sg_free_table(userptr->sg);
+	userptr->sg = NULL;
+}
+
 /**
  * xe_userptr_populate_range() - Populate physical pages of a virtual
  * address range
@@ -163,6 +211,9 @@ int xe_userptr_populate_range(struct xe_userptr_vma *uvma)
 	if (vma->gpuva.flags & XE_VMA_DESTROYED)
 		return 0;
 
+	if (userptr->sg)
+		xe_userptr_free_sg(uvma);
+
 	npages = npages_in_range(start, end);
 	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
 	if (unlikely(!pfns))
diff --git a/drivers/gpu/drm/xe/xe_hmm.h b/drivers/gpu/drm/xe/xe_hmm.h
index fa5ddc11f10b..b1e61d48a1cb 100644
--- a/drivers/gpu/drm/xe/xe_hmm.h
+++ b/drivers/gpu/drm/xe/xe_hmm.h
@@ -7,4 +7,19 @@
 
 struct xe_userptr_vma;
 
+/**
+ * This bit is used during generating of userptr
+ * sg table. If a page is in vram, we append this
+ * bit to the dpa address. This information is
+ * used later to tell whether an address is vram
+ * or system memory.
+ */
+#define ADDR_VRAM_BIT (1<<0)
+
 int xe_userptr_populate_range(struct xe_userptr_vma *uvma);
+void xe_userptr_free_sg(struct xe_userptr_vma *uvma);
+
+static inline unsigned long xe_remove_vram_bit_from_addr(unsigned long addr)
+{
+	return (addr & ~ADDR_VRAM_BIT);
+}
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 8/8] drm/xe: Use hmm_range_fault to populate user pages
  2024-03-19  2:55 [PATCH 0/8] Use hmm_range_fault to populate user page Oak Zeng
                   ` (6 preceding siblings ...)
  2024-03-19  2:55 ` [PATCH 7/8] drm/xe: Introduce a helper to free sg table Oak Zeng
@ 2024-03-19  2:55 ` Oak Zeng
  2024-03-19  4:09 ` ✓ CI.Patch_applied: success for Use hmm_range_fault to populate user page (rev2) Patchwork
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Oak Zeng @ 2024-03-19  2:55 UTC (permalink / raw
  To: intel-xe
  Cc: thomas.hellstrom, matthew.brost, brian.welty,
	himal.prasad.ghimiray

This is an effort to unify hmmptr (aka system allocator)
and userptr code. hmm_range_fault is used to populate
a virtual address range for both hmmptr and userptr,
instead of hmmptr using hmm_range_fault and userptr
using get_user_pages_fast.

This also aligns with AMD gpu driver's behavior. In
long term, we plan to put some common helpers in this
area to drm layer so it can be re-used by different
vendors.

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
---
 drivers/gpu/drm/xe/xe_vm.c | 122 ++++---------------------------------
 1 file changed, 12 insertions(+), 110 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index cbb9b8935c90..eb796f475b65 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -38,6 +38,7 @@
 #include "xe_sync.h"
 #include "xe_trace.h"
 #include "xe_wa.h"
+#include "xe_hmm.h"
 
 static struct drm_gem_object *xe_vm_obj(struct xe_vm *vm)
 {
@@ -65,113 +66,21 @@ int xe_vma_userptr_check_repin(struct xe_userptr_vma *uvma)
 
 int xe_vma_userptr_pin_pages(struct xe_userptr_vma *uvma)
 {
-	struct xe_userptr *userptr = &uvma->userptr;
 	struct xe_vma *vma = &uvma->vma;
 	struct xe_vm *vm = xe_vma_vm(vma);
 	struct xe_device *xe = vm->xe;
-	const unsigned long num_pages = xe_vma_size(vma) >> PAGE_SHIFT;
-	struct page **pages;
-	bool in_kthread = !current->mm;
-	unsigned long notifier_seq;
-	int pinned, ret, i;
-	bool read_only = xe_vma_read_only(vma);
+	struct xe_userptr *userptr;
+	int ret;
 
 	lockdep_assert_held(&vm->lock);
 	xe_assert(xe, xe_vma_is_userptr(vma));
-retry:
-	if (vma->gpuva.flags & XE_VMA_DESTROYED)
-		return 0;
-
-	notifier_seq = mmu_interval_read_begin(&userptr->notifier);
-	if (notifier_seq == userptr->notifier_seq)
-		return 0;
-
-	pages = kvmalloc_array(num_pages, sizeof(*pages), GFP_KERNEL);
-	if (!pages)
-		return -ENOMEM;
-
-	if (userptr->sg) {
-		dma_unmap_sgtable(xe->drm.dev,
-				  userptr->sg,
-				  read_only ? DMA_TO_DEVICE :
-				  DMA_BIDIRECTIONAL, 0);
-		sg_free_table(userptr->sg);
-		userptr->sg = NULL;
-	}
 
-	pinned = ret = 0;
-	if (in_kthread) {
-		if (!mmget_not_zero(userptr->notifier.mm)) {
-			ret = -EFAULT;
-			goto mm_closed;
-		}
-		kthread_use_mm(userptr->notifier.mm);
-	}
-
-	while (pinned < num_pages) {
-		ret = get_user_pages_fast(xe_vma_userptr(vma) +
-					  pinned * PAGE_SIZE,
-					  num_pages - pinned,
-					  read_only ? 0 : FOLL_WRITE,
-					  &pages[pinned]);
-		if (ret < 0)
-			break;
+	userptr = &uvma->userptr;
+	mmap_read_lock(userptr->notifier.mm);
+	ret = xe_userptr_populate_range(uvma);
+	mmap_read_unlock(userptr->notifier.mm);
 
-		pinned += ret;
-		ret = 0;
-	}
-
-	if (in_kthread) {
-		kthread_unuse_mm(userptr->notifier.mm);
-		mmput(userptr->notifier.mm);
-	}
-mm_closed:
-	if (ret)
-		goto out;
-
-	ret = sg_alloc_table_from_pages_segment(&userptr->sgt, pages,
-						pinned, 0,
-						(u64)pinned << PAGE_SHIFT,
-						xe_sg_segment_size(xe->drm.dev),
-						GFP_KERNEL);
-	if (ret) {
-		userptr->sg = NULL;
-		goto out;
-	}
-	userptr->sg = &userptr->sgt;
-
-	ret = dma_map_sgtable(xe->drm.dev, userptr->sg,
-			      read_only ? DMA_TO_DEVICE :
-			      DMA_BIDIRECTIONAL,
-			      DMA_ATTR_SKIP_CPU_SYNC |
-			      DMA_ATTR_NO_KERNEL_MAPPING);
-	if (ret) {
-		sg_free_table(userptr->sg);
-		userptr->sg = NULL;
-		goto out;
-	}
-
-	for (i = 0; i < pinned; ++i) {
-		if (!read_only) {
-			lock_page(pages[i]);
-			set_page_dirty(pages[i]);
-			unlock_page(pages[i]);
-		}
-
-		mark_page_accessed(pages[i]);
-	}
-
-out:
-	release_pages(pages, pinned);
-	kvfree(pages);
-
-	if (!(ret < 0)) {
-		userptr->notifier_seq = notifier_seq;
-		if (xe_vma_userptr_check_repin(uvma) == -EAGAIN)
-			goto retry;
-	}
-
-	return ret < 0 ? ret : 0;
+	return ret;
 }
 
 static bool preempt_fences_waiting(struct xe_vm *vm)
@@ -917,8 +826,6 @@ static struct xe_vma *xe_vma_create(struct xe_vm *vm,
 static void xe_vma_destroy_late(struct xe_vma *vma)
 {
 	struct xe_vm *vm = xe_vma_vm(vma);
-	struct xe_device *xe = vm->xe;
-	bool read_only = xe_vma_read_only(vma);
 
 	if (vma->ufence) {
 		xe_sync_ufence_put(vma->ufence);
@@ -926,16 +833,11 @@ static void xe_vma_destroy_late(struct xe_vma *vma)
 	}
 
 	if (xe_vma_is_userptr(vma)) {
-		struct xe_userptr *userptr = &to_userptr_vma(vma)->userptr;
+		struct xe_userptr_vma *uvma = to_userptr_vma(vma);
+		struct xe_userptr *userptr = &uvma->userptr;
 
-		if (userptr->sg) {
-			dma_unmap_sgtable(xe->drm.dev,
-					  userptr->sg,
-					  read_only ? DMA_TO_DEVICE :
-					  DMA_BIDIRECTIONAL, 0);
-			sg_free_table(userptr->sg);
-			userptr->sg = NULL;
-		}
+		if (userptr->sg)
+			xe_userptr_free_sg(uvma);
 
 		/*
 		 * Since userptr pages are not pinned, we can't remove
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* ✓ CI.Patch_applied: success for Use hmm_range_fault to populate user page (rev2)
  2024-03-19  2:55 [PATCH 0/8] Use hmm_range_fault to populate user page Oak Zeng
                   ` (7 preceding siblings ...)
  2024-03-19  2:55 ` [PATCH 8/8] drm/xe: Use hmm_range_fault to populate user pages Oak Zeng
@ 2024-03-19  4:09 ` Patchwork
  2024-03-19  4:09 ` ✗ CI.checkpatch: warning " Patchwork
  2024-03-19  4:10 ` ✗ CI.KUnit: failure " Patchwork
  10 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2024-03-19  4:09 UTC (permalink / raw
  To: Oak Zeng; +Cc: intel-xe

== Series Details ==

Series: Use hmm_range_fault to populate user page (rev2)
URL   : https://patchwork.freedesktop.org/series/131117/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-tip' with base: ===
Base commit: e8f2ef9ff548 drm-tip: 2024y-03m-19d-00h-29m-37s UTC integration manifest
=== git am output follows ===
.git/rebase-apply/patch:182: new blank line at EOF.
+
warning: 1 line adds whitespace errors.
.git/rebase-apply/patch:257: new blank line at EOF.
+
warning: 1 line adds whitespace errors.
Applying: drm/xe/svm: Remap and provide memmap backing for GPU vram
Applying: drm/xe/svm: Add DRM_XE_SVM kernel config entry
Applying: drm/xe: Helper to get tile from memory region
Applying: drm/xe: Introduce a helper to get dpa from pfn
Applying: drm/xe/svm: Get xe memory region from page
Applying: drm/xe: Helper to populate a userptr or hmmptr
Applying: drm/xe: Introduce a helper to free sg table
Applying: drm/xe: Use hmm_range_fault to populate user pages

^ permalink raw reply	[flat|nested] 21+ messages in thread

* ✗ CI.checkpatch: warning for Use hmm_range_fault to populate user page (rev2)
  2024-03-19  2:55 [PATCH 0/8] Use hmm_range_fault to populate user page Oak Zeng
                   ` (8 preceding siblings ...)
  2024-03-19  4:09 ` ✓ CI.Patch_applied: success for Use hmm_range_fault to populate user page (rev2) Patchwork
@ 2024-03-19  4:09 ` Patchwork
  2024-03-19  4:10 ` ✗ CI.KUnit: failure " Patchwork
  10 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2024-03-19  4:09 UTC (permalink / raw
  To: Oak Zeng; +Cc: intel-xe

== Series Details ==

Series: Use hmm_range_fault to populate user page (rev2)
URL   : https://patchwork.freedesktop.org/series/131117/
State : warning

== Summary ==

+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
a9eb1ac8298ef9f9146567c29fa762d8e9efa1ef
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit e62651e1ba2b2f43b08a1ca66c3be6017b9d27eb
Author: Oak Zeng <oak.zeng@intel.com>
Date:   Mon Mar 18 22:55:11 2024 -0400

    drm/xe: Use hmm_range_fault to populate user pages
    
    This is an effort to unify hmmptr (aka system allocator)
    and userptr code. hmm_range_fault is used to populate
    a virtual address range for both hmmptr and userptr,
    instead of hmmptr using hmm_range_fault and userptr
    using get_user_pages_fast.
    
    This also aligns with AMD gpu driver's behavior. In
    long term, we plan to put some common helpers in this
    area to drm layer so it can be re-used by different
    vendors.
    
    Signed-off-by: Oak Zeng <oak.zeng@intel.com>
+ /mt/dim checkpatch e8f2ef9ff5486b3bbfc599bf89e10e3bdecd96b9 drm-intel
1524f5c3349e drm/xe/svm: Remap and provide memmap backing for GPU vram
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
-:96: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#96: FILE: drivers/gpu/drm/xe/xe_mmio.c:358:
+    struct xe_tile *tile;$

-:97: WARNING:LEADING_SPACE: please, no spaces at the start of a line
#97: FILE: drivers/gpu/drm/xe/xe_mmio.c:359:
+    u8 id;$

-:105: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#105: 
new file mode 100644

-:110: WARNING:SPDX_LICENSE_TAG: Improper SPDX comment style for 'drivers/gpu/drm/xe/xe_svm.h', please use '/*' instead
#110: FILE: drivers/gpu/drm/xe/xe_svm.h:1:
+// SPDX-License-Identifier: MIT

-:110: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#110: FILE: drivers/gpu/drm/xe/xe_svm.h:1:
+// SPDX-License-Identifier: MIT

-:142: CHECK:LINE_SPACING: Please don't use multiple blank lines
#142: FILE: drivers/gpu/drm/xe/xe_svm_devmem.c:12:
+
+

-:193: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#193: FILE: drivers/gpu/drm/xe/xe_svm_devmem.c:63:
+		drm_err(&xe->drm, "Failed to remap tile %d memory, errno %d\n",
+				tile->id, ret);

-:199: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#199: FILE: drivers/gpu/drm/xe/xe_svm_devmem.c:69:
+	drm_info(&xe->drm, "Added tile %d memory [%llx-%llx] to devm, remapped to %pr\n",
+			tile->id, mr->io_start, mr->io_start + mr->usable_size, res);

-:216: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#216: FILE: drivers/gpu/drm/xe/xe_svm_devmem.c:86:
+		devm_release_mem_region(dev, mr->pagemap.range.start,
+			mr->pagemap.range.end - mr->pagemap.range.start +1);

-:216: CHECK:SPACING: spaces preferred around that '+' (ctx:WxV)
#216: FILE: drivers/gpu/drm/xe/xe_svm_devmem.c:86:
+			mr->pagemap.range.end - mr->pagemap.range.start +1);
 			                                                ^

total: 0 errors, 5 warnings, 5 checks, 165 lines checked
e33865eb68f6 drm/xe/svm: Add DRM_XE_SVM kernel config entry
1fdd2cba011d drm/xe: Helper to get tile from memory region
39f0fd7ecfca drm/xe: Introduce a helper to get dpa from pfn
52a1d1f641ec drm/xe/svm: Get xe memory region from page
08aa488367d1 drm/xe: Helper to populate a userptr or hmmptr
Traceback (most recent call last):
  File "scripts/spdxcheck.py", line 6, in <module>
    from ply import lex, yacc
ModuleNotFoundError: No module named 'ply'
-:52: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#52: 
new file mode 100644

-:134: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#134: FILE: drivers/gpu/drm/xe/xe_hmm.c:78:
+static int build_sg(struct xe_device *xe, struct hmm_range *range,
+			     struct sg_table *st, bool write)

-:158: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#158: FILE: drivers/gpu/drm/xe/xe_hmm.c:102:
+			addr = dma_map_page(dev, page, 0, PAGE_SIZE,
+					write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE);

-:246: WARNING:REPEATED_WORD: Possible repeated word: 'the'
#246: FILE: drivers/gpu/drm/xe/xe_hmm.c:190:
+	 * Set the the dev_private_owner can prevent hmm_range_fault to fault

-:294: WARNING:SPDX_LICENSE_TAG: Improper SPDX comment style for 'drivers/gpu/drm/xe/xe_hmm.h', please use '/*' instead
#294: FILE: drivers/gpu/drm/xe/xe_hmm.h:1:
+// SPDX-License-Identifier: MIT

-:294: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#294: FILE: drivers/gpu/drm/xe/xe_hmm.h:1:
+// SPDX-License-Identifier: MIT

total: 0 errors, 4 warnings, 2 checks, 248 lines checked
957bae77ae43 drm/xe: Introduce a helper to free sg table
-:68: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#68: FILE: drivers/gpu/drm/xe/xe_hmm.c:120:
+		if (sg && is_device_private_page(page) &&
+				(addr == (sg_dma_address(sg) + sg->length))) {

-:102: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#102: FILE: drivers/gpu/drm/xe/xe_hmm.c:163:
+			dma_unmap_page(dev, addr, PAGE_SIZE,
+				write ? DMA_BIDIRECTIONAL : DMA_TO_DEVICE);

-:137: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#137: FILE: drivers/gpu/drm/xe/xe_hmm.h:17:
+#define ADDR_VRAM_BIT (1<<0)
                         ^

total: 0 errors, 0 warnings, 3 checks, 119 lines checked
e62651e1ba2b drm/xe: Use hmm_range_fault to populate user pages



^ permalink raw reply	[flat|nested] 21+ messages in thread

* ✗ CI.KUnit: failure for Use hmm_range_fault to populate user page (rev2)
  2024-03-19  2:55 [PATCH 0/8] Use hmm_range_fault to populate user page Oak Zeng
                   ` (9 preceding siblings ...)
  2024-03-19  4:09 ` ✗ CI.checkpatch: warning " Patchwork
@ 2024-03-19  4:10 ` Patchwork
  10 siblings, 0 replies; 21+ messages in thread
From: Patchwork @ 2024-03-19  4:10 UTC (permalink / raw
  To: Oak Zeng; +Cc: intel-xe

== Series Details ==

Series: Use hmm_range_fault to populate user page (rev2)
URL   : https://patchwork.freedesktop.org/series/131117/
State : failure

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:../arch/x86/um/user-offsets.c:17:6: warning: no previous prototype for ‘foo’ [-Wmissing-prototypes]
   17 | void foo(void)
      |      ^~~
In file included from ../arch/um/kernel/asm-offsets.c:1:
../arch/x86/um/shared/sysdep/kernel-offsets.h:9:6: warning: no previous prototype for ‘foo’ [-Wmissing-prototypes]
    9 | void foo(void)
      |      ^~~
../arch/x86/um/fault.c:18:5: warning: no previous prototype for ‘arch_fixup’ [-Wmissing-prototypes]
   18 | int arch_fixup(unsigned long address, struct uml_pt_regs *regs)
      |     ^~~~~~~~~~
../arch/x86/um/bugs_64.c:9:6: warning: no previous prototype for ‘arch_check_bugs’ [-Wmissing-prototypes]
    9 | void arch_check_bugs(void)
      |      ^~~~~~~~~~~~~~~
../arch/x86/um/bugs_64.c:13:6: warning: no previous prototype for ‘arch_examine_signal’ [-Wmissing-prototypes]
   13 | void arch_examine_signal(int sig, struct uml_pt_regs *regs)
      |      ^~~~~~~~~~~~~~~~~~~
../arch/x86/um/os-Linux/registers.c:146:15: warning: no previous prototype for ‘get_thread_reg’ [-Wmissing-prototypes]
  146 | unsigned long get_thread_reg(int reg, jmp_buf *buf)
      |               ^~~~~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:16:5: warning: no previous prototype for ‘__vdso_clock_gettime’ [-Wmissing-prototypes]
   16 | int __vdso_clock_gettime(clockid_t clock, struct __kernel_old_timespec *ts)
      |     ^~~~~~~~~~~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:30:5: warning: no previous prototype for ‘__vdso_gettimeofday’ [-Wmissing-prototypes]
   30 | int __vdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz)
      |     ^~~~~~~~~~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:44:21: warning: no previous prototype for ‘__vdso_time’ [-Wmissing-prototypes]
   44 | __kernel_old_time_t __vdso_time(__kernel_old_time_t *t)
      |                     ^~~~~~~~~~~
../arch/x86/um/vdso/um_vdso.c:57:1: warning: no previous prototype for ‘__vdso_getcpu’ [-Wmissing-prototypes]
   57 | __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
      | ^~~~~~~~~~~~~
../arch/x86/um/os-Linux/mcontext.c:7:6: warning: no previous prototype for ‘get_regs_from_mc’ [-Wmissing-prototypes]
    7 | void get_regs_from_mc(struct uml_pt_regs *regs, mcontext_t *mc)
      |      ^~~~~~~~~~~~~~~~
../arch/um/os-Linux/skas/process.c:107:6: warning: no previous prototype for ‘wait_stub_done’ [-Wmissing-prototypes]
  107 | void wait_stub_done(int pid)
      |      ^~~~~~~~~~~~~~
../arch/um/os-Linux/skas/process.c:683:6: warning: no previous prototype for ‘__switch_mm’ [-Wmissing-prototypes]
  683 | void __switch_mm(struct mm_id *mm_idp)
      |      ^~~~~~~~~~~
../arch/um/kernel/skas/process.c:36:12: warning: no previous prototype for ‘start_uml’ [-Wmissing-prototypes]
   36 | int __init start_uml(void)
      |            ^~~~~~~~~
../arch/um/kernel/skas/mmu.c:17:5: warning: no previous prototype for ‘init_new_context’ [-Wmissing-prototypes]
   17 | int init_new_context(struct task_struct *task, struct mm_struct *mm)
      |     ^~~~~~~~~~~~~~~~
../arch/um/kernel/skas/mmu.c:60:6: warning: no previous prototype for ‘destroy_context’ [-Wmissing-prototypes]
   60 | void destroy_context(struct mm_struct *mm)
      |      ^~~~~~~~~~~~~~~
../arch/um/os-Linux/main.c:187:7: warning: no previous prototype for ‘__wrap_malloc’ [-Wmissing-prototypes]
  187 | void *__wrap_malloc(int size)
      |       ^~~~~~~~~~~~~
../arch/um/os-Linux/main.c:208:7: warning: no previous prototype for ‘__wrap_calloc’ [-Wmissing-prototypes]
  208 | void *__wrap_calloc(int n, int size)
      |       ^~~~~~~~~~~~~
../arch/um/os-Linux/main.c:222:6: warning: no previous prototype for ‘__wrap_free’ [-Wmissing-prototypes]
  222 | void __wrap_free(void *ptr)
      |      ^~~~~~~~~~~
../arch/um/os-Linux/mem.c:28:6: warning: no previous prototype for ‘kasan_map_memory’ [-Wmissing-prototypes]
   28 | void kasan_map_memory(void *start, size_t len)
      |      ^~~~~~~~~~~~~~~~
../arch/um/os-Linux/mem.c:212:13: warning: no previous prototype for ‘check_tmpexec’ [-Wmissing-prototypes]
  212 | void __init check_tmpexec(void)
      |             ^~~~~~~~~~~~~
../arch/x86/um/ptrace_64.c:111:5: warning: no previous prototype for ‘poke_user’ [-Wmissing-prototypes]
  111 | int poke_user(struct task_struct *child, long addr, long data)
      |     ^~~~~~~~~
../arch/x86/um/ptrace_64.c:171:5: warning: no previous prototype for ‘peek_user’ [-Wmissing-prototypes]
  171 | int peek_user(struct task_struct *child, long addr, long data)
      |     ^~~~~~~~~
../arch/um/os-Linux/signal.c:75:6: warning: no previous prototype for ‘sig_handler’ [-Wmissing-prototypes]
   75 | void sig_handler(int sig, struct siginfo *si, mcontext_t *mc)
      |      ^~~~~~~~~~~
../arch/um/os-Linux/signal.c:111:6: warning: no previous prototype for ‘timer_alarm_handler’ [-Wmissing-prototypes]
  111 | void timer_alarm_handler(int sig, struct siginfo *unused_si, mcontext_t *mc)
      |      ^~~~~~~~~~~~~~~~~~~
../arch/x86/um/signal.c:560:6: warning: no previous prototype for ‘sys_rt_sigreturn’ [-Wmissing-prototypes]
  560 | long sys_rt_sigreturn(void)
      |      ^~~~~~~~~~~~~~~~
../arch/um/os-Linux/start_up.c:301:12: warning: no previous prototype for ‘parse_iomem’ [-Wmissing-prototypes]
  301 | int __init parse_iomem(char *str, int *add)
      |            ^~~~~~~~~~~
../arch/x86/um/syscalls_64.c:48:6: warning: no previous prototype for ‘arch_switch_to’ [-Wmissing-prototypes]
   48 | void arch_switch_to(struct task_struct *to)
      |      ^~~~~~~~~~~~~~
../arch/um/kernel/mem.c:202:8: warning: no previous prototype for ‘pgd_alloc’ [-Wmissing-prototypes]
  202 | pgd_t *pgd_alloc(struct mm_struct *mm)
      |        ^~~~~~~~~
../arch/um/kernel/mem.c:215:7: warning: no previous prototype for ‘uml_kmalloc’ [-Wmissing-prototypes]
  215 | void *uml_kmalloc(int size, int flags)
      |       ^~~~~~~~~~~
../arch/um/kernel/process.c:51:5: warning: no previous prototype for ‘pid_to_processor_id’ [-Wmissing-prototypes]
   51 | int pid_to_processor_id(int pid)
      |     ^~~~~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:87:7: warning: no previous prototype for ‘__switch_to’ [-Wmissing-prototypes]
   87 | void *__switch_to(struct task_struct *from, struct task_struct *to)
      |       ^~~~~~~~~~~
../arch/um/kernel/process.c:140:6: warning: no previous prototype for ‘fork_handler’ [-Wmissing-prototypes]
  140 | void fork_handler(void)
      |      ^~~~~~~~~~~~
../arch/um/kernel/process.c:217:6: warning: no previous prototype for ‘arch_cpu_idle’ [-Wmissing-prototypes]
  217 | void arch_cpu_idle(void)
      |      ^~~~~~~~~~~~~
../arch/um/kernel/process.c:253:5: warning: no previous prototype for ‘copy_to_user_proc’ [-Wmissing-prototypes]
  253 | int copy_to_user_proc(void __user *to, void *from, int size)
      |     ^~~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:263:5: warning: no previous prototype for ‘clear_user_proc’ [-Wmissing-prototypes]
  263 | int clear_user_proc(void __user *buf, int size)
      |     ^~~~~~~~~~~~~~~
../arch/um/kernel/process.c:271:6: warning: no previous prototype for ‘set_using_sysemu’ [-Wmissing-prototypes]
  271 | void set_using_sysemu(int value)
      |      ^~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:278:5: warning: no previous prototype for ‘get_using_sysemu’ [-Wmissing-prototypes]
  278 | int get_using_sysemu(void)
      |     ^~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:316:12: warning: no previous prototype for ‘make_proc_sysemu’ [-Wmissing-prototypes]
  316 | int __init make_proc_sysemu(void)
      |            ^~~~~~~~~~~~~~~~
../arch/um/kernel/process.c:348:15: warning: no previous prototype for ‘arch_align_stack’ [-Wmissing-prototypes]
  348 | unsigned long arch_align_stack(unsigned long sp)
      |               ^~~~~~~~~~~~~~~~
../arch/um/kernel/reboot.c:45:6: warning: no previous prototype for ‘machine_restart’ [-Wmissing-prototypes]
   45 | void machine_restart(char * __unused)
      |      ^~~~~~~~~~~~~~~
../arch/um/kernel/reboot.c:51:6: warning: no previous prototype for ‘machine_power_off’ [-Wmissing-prototypes]
   51 | void machine_power_off(void)
      |      ^~~~~~~~~~~~~~~~~
../arch/um/kernel/reboot.c:57:6: warning: no previous prototype for ‘machine_halt’ [-Wmissing-prototypes]
   57 | void machine_halt(void)
      |      ^~~~~~~~~~~~
../arch/um/kernel/tlb.c:579:6: warning: no previous prototype for ‘flush_tlb_mm_range’ [-Wmissing-prototypes]
  579 | void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
      |      ^~~~~~~~~~~~~~~~~~
../arch/um/kernel/tlb.c:594:6: warning: no previous prototype for ‘force_flush_all’ [-Wmissing-prototypes]
  594 | void force_flush_all(void)
      |      ^~~~~~~~~~~~~~~
../arch/um/kernel/um_arch.c:408:19: warning: no previous prototype for ‘read_initrd’ [-Wmissing-prototypes]
  408 | int __init __weak read_initrd(void)
      |                   ^~~~~~~~~~~
../arch/um/kernel/um_arch.c:461:7: warning: no previous prototype for ‘text_poke’ [-Wmissing-prototypes]
  461 | void *text_poke(void *addr, const void *opcode, size_t len)
      |       ^~~~~~~~~
../arch/um/kernel/um_arch.c:473:6: warning: no previous prototype for ‘text_poke_sync’ [-Wmissing-prototypes]
  473 | void text_poke_sync(void)
      |      ^~~~~~~~~~~~~~
../arch/um/kernel/kmsg_dump.c:60:12: warning: no previous prototype for ‘kmsg_dumper_stdout_init’ [-Wmissing-prototypes]
   60 | int __init kmsg_dumper_stdout_init(void)
      |            ^~~~~~~~~~~~~~~~~~~~~~~
../lib/iomap.c:156:5: warning: no previous prototype for ‘ioread64_lo_hi’ [-Wmissing-prototypes]
  156 | u64 ioread64_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:163:5: warning: no previous prototype for ‘ioread64_hi_lo’ [-Wmissing-prototypes]
  163 | u64 ioread64_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~
../lib/iomap.c:170:5: warning: no previous prototype for ‘ioread64be_lo_hi’ [-Wmissing-prototypes]
  170 | u64 ioread64be_lo_hi(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:178:5: warning: no previous prototype for ‘ioread64be_hi_lo’ [-Wmissing-prototypes]
  178 | u64 ioread64be_hi_lo(const void __iomem *addr)
      |     ^~~~~~~~~~~~~~~~
../lib/iomap.c:264:6: warning: no previous prototype for ‘iowrite64_lo_hi’ [-Wmissing-prototypes]
  264 | void iowrite64_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:272:6: warning: no previous prototype for ‘iowrite64_hi_lo’ [-Wmissing-prototypes]
  272 | void iowrite64_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~
../lib/iomap.c:280:6: warning: no previous prototype for ‘iowrite64be_lo_hi’ [-Wmissing-prototypes]
  280 | void iowrite64be_lo_hi(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
../lib/iomap.c:288:6: warning: no previous prototype for ‘iowrite64be_hi_lo’ [-Wmissing-prototypes]
  288 | void iowrite64be_hi_lo(u64 val, void __iomem *addr)
      |      ^~~~~~~~~~~~~~~~~
/usr/bin/ld: drivers/gpu/drm/xe/xe_hmm.o: in function `xe_userptr_populate_range':
xe_hmm.c:(.text+0x233): undefined reference to `hmm_range_fault'
/usr/bin/ld: drivers/gpu/drm/xe/xe_svm_devmem.o: in function `xe_devm_add':
xe_svm_devmem.c:(.text+0x26): undefined reference to `devm_request_free_mem_region'
collect2: error: ld returned 1 exit status
make[3]: *** [../scripts/Makefile.vmlinux:37: vmlinux] Error 1
make[2]: *** [/kernel/Makefile:1162: vmlinux] Error 2
make[1]: *** [/kernel/Makefile:240: __sub-make] Error 2
make: *** [Makefile:240: __sub-make] Error 2

[04:09:38] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[04:09:42] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/8] drm/xe/svm: Add DRM_XE_SVM kernel config entry
  2024-03-19  2:55 ` [PATCH 2/8] drm/xe/svm: Add DRM_XE_SVM kernel config entry Oak Zeng
@ 2024-03-19  9:25   ` Hellstrom, Thomas
  2024-03-19 18:27     ` Zeng, Oak
  0 siblings, 1 reply; 21+ messages in thread
From: Hellstrom, Thomas @ 2024-03-19  9:25 UTC (permalink / raw
  To: intel-xe@lists.freedesktop.org, Zeng,  Oak
  Cc: Brost, Matthew, Welty, Brian, Ghimiray, Himal Prasad

Hi, Oak,

On Mon, 2024-03-18 at 22:55 -0400, Oak Zeng wrote:
> DRM_XE_SVM kernel config entry is added so
> xe svm feature can be configured before kernel
> compilation.

Please use imperative language in commit messages.


> 
> Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> Co-developed-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> Cc: Brian Welty <brian.welty@intel.com>
> ---
>  drivers/gpu/drm/xe/Kconfig   | 21 +++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_tile.c |  4 ++++
>  2 files changed, 25 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
> index 1a556d087e63..e244165459c5 100644
> --- a/drivers/gpu/drm/xe/Kconfig
> +++ b/drivers/gpu/drm/xe/Kconfig
> @@ -83,6 +83,27 @@ config DRM_XE_FORCE_PROBE
>  	  4571.
>  
>  	  Use "!*" to block the probe of the driver for all known
> devices.
> +config DRM_XE_SVM
> +	bool "Enable Shared Virtual Memory support in xe"
> +	depends on DRM_XE
> +	depends on ARCH_ENABLE_MEMORY_HOTPLUG
> +	depends on ARCH_ENABLE_MEMORY_HOTREMOVE
> +	depends on MEMORY_HOTPLUG
> +	depends on MEMORY_HOTREMOVE
> +	depends on ARCH_HAS_PTE_DEVMAP
> +	depends on SPARSEMEM_VMEMMAP
> +	depends on ZONE_DEVICE
> +	depends on DEVICE_PRIVATE
> +	depends on MMU
> +	select HMM_MIRROR
> +	select MMU_NOTIFIER
> +	default y
> +	help
> +	  Choose this option if you want Shared Virtual Memory (SVM)
> +	  support in xe. With SVM, virtual address space is shared
> +	  between CPU and GPU. This means any virtual address such
> +	  as malloc or mmap returns, variables on stack, or global
> +	  memory pointers, can be used for GPU transparently.
>  
>  menu "drm/Xe Debugging"
>  depends on DRM_XE
> diff --git a/drivers/gpu/drm/xe/xe_tile.c
> b/drivers/gpu/drm/xe/xe_tile.c
> index f1c4f9de51df..b52a00a6b5d5 100644
> --- a/drivers/gpu/drm/xe/xe_tile.c
> +++ b/drivers/gpu/drm/xe/xe_tile.c
> @@ -159,7 +159,9 @@ static int tile_ttm_mgr_init(struct xe_tile
> *tile)
>   */
>  int xe_tile_init_noalloc(struct xe_tile *tile)
>  {
> +#if IS_ENABLED(CONFIG_DRM_XE_SVM)
>  	struct xe_device *xe = tile_to_xe(tile);
> +#endif

Avoid using conditional compilation in this way inside functions if
possible, please see below:

>  	int err;
>  
>  	xe_device_mem_access_get(tile_to_xe(tile));
> @@ -177,8 +179,10 @@ int xe_tile_init_noalloc(struct xe_tile *tile)
>  
>  	xe_tile_sysfs_init(tile);
>  
> +#if IS_ENABLED(CONFIG_DRM_XE_SVM)
>  	if (xe->info.has_usm)

Could be:
	if (IS_ENABLED(CONFIG_DRM_XE_SVM) && tile_to_xe(tile)-
>info.has_usm)

/Thomas


>  		xe_devm_add(tile, &tile->mem.vram);
> +#endif
>  err_mem_access:
>  	xe_device_mem_access_put(tile_to_xe(tile));
>  	return err;


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/8] drm/xe: Helper to get tile from memory region
  2024-03-19  2:55 ` [PATCH 3/8] drm/xe: Helper to get tile from memory region Oak Zeng
@ 2024-03-19  9:28   ` Hellstrom, Thomas
  0 siblings, 0 replies; 21+ messages in thread
From: Hellstrom, Thomas @ 2024-03-19  9:28 UTC (permalink / raw
  To: intel-xe@lists.freedesktop.org, Zeng,  Oak
  Cc: Brost, Matthew, Welty, Brian, Ghimiray, Himal Prasad

On Mon, 2024-03-18 at 22:55 -0400, Oak Zeng wrote:
> A simple helper to retrieve tile from memory region

Again, Imperative language and kerneloc.

/Thomas


> 
> v1: move the function to xe_device.h (Matt)
> 
> Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_device.h
> b/drivers/gpu/drm/xe/xe_device.h
> index 14be34d9f543..4735268cf07f 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -176,4 +176,9 @@ void xe_device_snapshot_print(struct xe_device
> *xe, struct drm_printer *p);
>  u64 xe_device_canonicalize_addr(struct xe_device *xe, u64 address);
>  u64 xe_device_uncanonicalize_addr(struct xe_device *xe, u64
> address);
>  
> +static inline struct xe_tile *xe_mem_region_to_tile(struct
> xe_mem_region *mr)
> +{
> +	return container_of(mr, struct xe_tile, mem.vram);
> +}
> +
>  #endif


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/8] drm/xe: Helper to populate a userptr or hmmptr
  2024-03-19  2:55 ` [PATCH 6/8] drm/xe: Helper to populate a userptr or hmmptr Oak Zeng
@ 2024-03-19 10:33   ` Hellstrom, Thomas
  2024-03-19 10:45     ` Hellstrom, Thomas
                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Hellstrom, Thomas @ 2024-03-19 10:33 UTC (permalink / raw
  To: intel-xe@lists.freedesktop.org, Zeng,  Oak
  Cc: Brost, Matthew, Welty, Brian, Ghimiray, Himal Prasad

On Mon, 2024-03-18 at 22:55 -0400, Oak Zeng wrote:
> Add a helper function xe_userptr_populate_range to populate
> a a userptr or hmmptr range. This functions calls hmm_range_fault
> to read CPU page tables and populate all pfns/pages of this
> virtual address range.
> 
> If the populated page is system memory page, dma-mapping is performed
> to get a dma-address which can be used later for GPU to access pages.
> 
> If the populated page is device private page, we calculate the dpa (
> device physical address) of the page.
> 
> The dma-address or dpa is then saved in userptr's sg table. This is
> prepare work to replace the get_user_pages_fast code in userptr code
> path. The helper function will also be used to populate hmmptr later.
> 
> v1: Address review comments:
>     separate a npage_in_range function (Matt)
>     reparameterize function xe_userptr_populate_range function (Matt)
>     move mmu_interval_read_begin() call into while loop (Thomas)
>     s/mark_range_accessed/xe_mark_range_accessed (Thomas)
>     use set_page_dirty_lock (vs set_page_dirty) (Thomas)
>     move a few checking in xe_vma_userptr_pin_pages to hmm.c (Matt)
> 
> Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> Co-developed-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> Signed-off-by: Niranjana Vishwanathapura
> <niranjana.vishwanathapura@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> Cc: Brian Welty <brian.welty@intel.com>
> ---
>  drivers/gpu/drm/xe/Makefile |   1 +
>  drivers/gpu/drm/xe/xe_hmm.c | 231
> ++++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/xe/xe_hmm.h |  10 ++
>  3 files changed, 242 insertions(+)
>  create mode 100644 drivers/gpu/drm/xe/xe_hmm.c
>  create mode 100644 drivers/gpu/drm/xe/xe_hmm.h
> 
> diff --git a/drivers/gpu/drm/xe/Makefile
> b/drivers/gpu/drm/xe/Makefile
> index e2ec6d1375c0..52300de1e86f 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -101,6 +101,7 @@ xe-y += xe_bb.o \
>  	xe_guc_pc.o \
>  	xe_guc_submit.o \
>  	xe_heci_gsc.o \
> +	xe_hmm.o \
>  	xe_hw_engine.o \
>  	xe_hw_engine_class_sysfs.o \
>  	xe_hw_fence.o \
> diff --git a/drivers/gpu/drm/xe/xe_hmm.c
> b/drivers/gpu/drm/xe/xe_hmm.c
> new file mode 100644
> index 000000000000..305e3f2e659b
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_hmm.c
> @@ -0,0 +1,231 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2024 Intel Corporation
> + */
> +
> +#include <linux/mmu_notifier.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/memremap.h>
> +#include <linux/swap.h>
> +#include <linux/hmm.h>
> +#include <linux/mm.h>
> +#include "xe_hmm.h"
> +#include "xe_svm.h"
> +#include "xe_vm.h"
> +
> +static inline u64 npages_in_range(unsigned long start, unsigned long
> end)
> +{
> +	return ((end - 1) >> PAGE_SHIFT) - (start >> PAGE_SHIFT) +
> 1;
> +}
> +
> +/**
> + * xe_mark_range_accessed() - mark a range is accessed, so core mm
> + * have such information for memory eviction or write back to
> + * hard disk
> + *
> + * @range: the range to mark
> + * @write: if write to this range, we mark pages in this range
> + * as dirty
> + */
> +static void xe_mark_range_accessed(struct hmm_range *range, bool
> write)
> +{
> +	struct page *page;
> +	u64 i, npages;
> +
> +	npages = npages_in_range(range->start, range->end);
> +	for (i = 0; i < npages; i++) {
> +		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> +		if (write)
> +			set_page_dirty_lock(page);
> +
> +		mark_page_accessed(page);
> +	}
> +}
> +
> +/**
> + * build_sg() - build a scatter gather table for all the physical
> pages/pfn
> + * in a hmm_range. dma-address is save in sg table and will be used
> to program
> + * GPU page table later.
> + *
> + * @xe: the xe device who will access the dma-address in sg table
> + * @range: the hmm range that we build the sg table from. range-
> >hmm_pfns[]
> + * has the pfn numbers of pages that back up this hmm address range.
> + * @st: pointer to the sg table.
> + * @write: whether we write to this range. This decides dma map
> direction
> + * for system pages. If write we map it bi-diretional; otherwise
> + * DMA_TO_DEVICE
> + *
> + * All the contiguous pfns will be collapsed into one entry in
> + * the scatter gather table. This is for the convenience of
> + * later on operations to bind address range to GPU page table.
> + *
> + * The dma_address in the sg table will later be used by GPU to
> + * access memory. So if the memory is system memory, we need to
> + * do a dma-mapping so it can be accessed by GPU/DMA. If the memory
> + * is GPU local memory (of the GPU who is going to access memory),
> + * we need gpu dpa (device physical address), and there is no need
> + * of dma-mapping.
> + *
> + * FIXME: dma-mapping for peer gpu device to access remote gpu's
> + * memory. Add this when you support p2p
> + *
> + * This function allocates the storage of the sg table. It is
> + * caller's responsibility to free it calling sg_free_table.
> + *
> + * Returns 0 if successful; -ENOMEM if fails to allocate memory
> + */
> +static int build_sg(struct xe_device *xe, struct hmm_range *range,
> +			     struct sg_table *st, bool write)
> +{
> +	struct device *dev = xe->drm.dev;
> +	struct scatterlist *sg;
> +	u64 i, npages;
> +
> +	sg = NULL;
> +	st->nents = 0;
> +	npages = npages_in_range(range->start, range->end);
> +
> +	if (unlikely(sg_alloc_table(st, npages, GFP_KERNEL)))
> +		return -ENOMEM;
> +
> +	for (i = 0; i < npages; i++) {
> +		struct page *page;
> +		unsigned long addr;
> +		struct xe_mem_region *mr;
> +
> +		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> +		if (is_device_private_page(page)) {
> +			mr = xe_page_to_mem_region(page);
> +			addr = xe_mem_region_pfn_to_dpa(mr, range-
> >hmm_pfns[i]);
> +		} else {
> +			addr = dma_map_page(dev, page, 0, PAGE_SIZE,
> +					write ? DMA_BIDIRECTIONAL :
> DMA_TO_DEVICE);
> +		}

The dma_map_page call here essentially stops the iommu from coalescing
the page-table into contigous iommu VAs, and stops us from using huge
page-table-entries for the PPGTT. IMO we should investigate whether we
could build a single scatter-gather using sg_alloc_table_from_pages()
or perhaps look at sg_alloc_append_table_from_pages(), and then use
dma_map_sg() on contiguous chunks of system pages within that sg-table.

/Thomas


> +
> +		if (sg && (addr == (sg_dma_address(sg) + sg-
> >length))) {
> +			sg->length += PAGE_SIZE;
> +			sg_dma_len(sg) += PAGE_SIZE;
> +			continue;
> +		}
> +
> +		sg =  sg ? sg_next(sg) : st->sgl;
> +		sg_dma_address(sg) = addr;
> +		sg_dma_len(sg) = PAGE_SIZE;
> +		sg->length = PAGE_SIZE;
> +		st->nents++;
> +	}
> +
> +	sg_mark_end(sg);
> +	return 0;
> +}
> +
> +/**
> + * xe_userptr_populate_range() - Populate physical pages of a
> virtual
> + * address range
> + *
> + * @uvma: userptr vma which has information of the range to
> populate.
> + *
> + * This function populate the physical pages of a virtual
> + * address range. The populated physical pages is saved in
> + * userptr's sg table. It is similar to get_user_pages but call
> + * hmm_range_fault.
> + *
> + * This function also read mmu notifier sequence # (
> + * mmu_interval_read_begin), for the purpose of later
> + * comparison (through mmu_interval_read_retry).
> + *
> + * This must be called with mmap read or write lock held.
> + *
> + * This function allocates the storage of the userptr sg table.
> + * It is caller's responsibility to free it calling sg_free_table.
> + *
> + * returns: 0 for succuss; negative error no on failure
> + */
> +int xe_userptr_populate_range(struct xe_userptr_vma *uvma)
> +{
> +	unsigned long timeout =
> +		jiffies +
> msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
> +	unsigned long *pfns, flags = HMM_PFN_REQ_FAULT;
> +	struct xe_userptr *userptr;
> +	struct xe_vma *vma = &uvma->vma;
> +	u64 start = xe_vma_userptr(vma);
> +	u64 end = start + xe_vma_size(vma);
> +	struct xe_vm *vm = xe_vma_vm(vma);
> +	struct hmm_range hmm_range;
> +	bool write = !xe_vma_read_only(vma);
> +	bool in_kthread = !current->mm;
> +	u64 npages;
> +	int ret;
> +
> +	userptr = &uvma->userptr;
> +	mmap_assert_locked(userptr->notifier.mm);
> +
> +	if (vma->gpuva.flags & XE_VMA_DESTROYED)
> +		return 0;
> +
> +	npages = npages_in_range(start, end);
> +	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
> +	if (unlikely(!pfns))
> +		return -ENOMEM;
> +
> +	if (write)
> +		flags |= HMM_PFN_REQ_WRITE;
> +
> +	if (in_kthread) {
> +		if (!mmget_not_zero(userptr->notifier.mm)) {
> +			ret = -EFAULT;
> +			goto free_pfns;
> +		}
> +		kthread_use_mm(userptr->notifier.mm);
> +	}
> +
> +	memset64((u64 *)pfns, (u64)flags, npages);
> +	hmm_range.hmm_pfns = pfns;
> +	hmm_range.notifier = &userptr->notifier;
> +	hmm_range.start = start;
> +	hmm_range.end = end;
> +	hmm_range.pfn_flags_mask = HMM_PFN_REQ_FAULT |
> HMM_PFN_REQ_WRITE;
> +	/**
> +	 * FIXME:
> +	 * Set the the dev_private_owner can prevent hmm_range_fault
> to fault
> +	 * in the device private pages owned by caller. See function
> +	 * hmm_vma_handle_pte. In multiple GPU case, this should be
> set to the
> +	 * device owner of the best migration destination. e.g.,
> device0/vm0
> +	 * has a page fault, but we have determined the best
> placement of
> +	 * the fault address should be on device1, we should set
> below to
> +	 * device1 instead of device0.
> +	 */
> +	hmm_range.dev_private_owner = vm->xe;
> +
> +	while (true) {
> +		hmm_range.notifier_seq =
> mmu_interval_read_begin(&userptr->notifier);
> +		ret = hmm_range_fault(&hmm_range);
> +		if (time_after(jiffies, timeout))
> +			break;
> +
> +		if (ret == -EBUSY)
> +			continue;
> +		break;
> +	}
> +
> +	if (in_kthread) {
> +		kthread_unuse_mm(userptr->notifier.mm);
> +		mmput(userptr->notifier.mm);
> +	}
> +
> +	if (ret)
> +		goto free_pfns;
> +
> +	ret = build_sg(vm->xe, &hmm_range, &userptr->sgt, write);
> +	if (ret)
> +		goto free_pfns;
> +
> +	xe_mark_range_accessed(&hmm_range, write);
> +	userptr->sg = &userptr->sgt;
> +	userptr->notifier_seq = hmm_range.notifier_seq;
> +
> +free_pfns:
> +	kvfree(pfns);
> +	return ret;
> +}
> +
> diff --git a/drivers/gpu/drm/xe/xe_hmm.h
> b/drivers/gpu/drm/xe/xe_hmm.h
> new file mode 100644
> index 000000000000..fa5ddc11f10b
> --- /dev/null
> +++ b/drivers/gpu/drm/xe/xe_hmm.h
> @@ -0,0 +1,10 @@
> +// SPDX-License-Identifier: MIT
> +/*
> + * Copyright © 2024 Intel Corporation
> + */
> +
> +#include <linux/types.h>
> +
> +struct xe_userptr_vma;
> +
> +int xe_userptr_populate_range(struct xe_userptr_vma *uvma);


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/8] drm/xe: Helper to populate a userptr or hmmptr
  2024-03-19 10:33   ` Hellstrom, Thomas
@ 2024-03-19 10:45     ` Hellstrom, Thomas
  2024-03-19 20:56       ` Zeng, Oak
  2024-03-19 16:48     ` Matthew Brost
  2024-03-19 20:45     ` Zeng, Oak
  2 siblings, 1 reply; 21+ messages in thread
From: Hellstrom, Thomas @ 2024-03-19 10:45 UTC (permalink / raw
  To: intel-xe@lists.freedesktop.org, Zeng,  Oak
  Cc: Brost, Matthew, Welty, Brian, Ghimiray, Himal Prasad

Some more comments inline:

On Tue, 2024-03-19 at 10:33 +0000, Hellstrom, Thomas wrote:
> On Mon, 2024-03-18 at 22:55 -0400, Oak Zeng wrote:
> > Add a helper function xe_userptr_populate_range to populate
> > a a userptr or hmmptr range. This functions calls hmm_range_fault
> > to read CPU page tables and populate all pfns/pages of this
> > virtual address range.
> > 
> > If the populated page is system memory page, dma-mapping is
> > performed
> > to get a dma-address which can be used later for GPU to access
> > pages.
> > 
> > If the populated page is device private page, we calculate the dpa
> > (
> > device physical address) of the page.
> > 
> > The dma-address or dpa is then saved in userptr's sg table. This is
> > prepare work to replace the get_user_pages_fast code in userptr
> > code
> > path. The helper function will also be used to populate hmmptr
> > later.
> > 
> > v1: Address review comments:
> >     separate a npage_in_range function (Matt)
> >     reparameterize function xe_userptr_populate_range function
> > (Matt)
> >     move mmu_interval_read_begin() call into while loop (Thomas)
> >     s/mark_range_accessed/xe_mark_range_accessed (Thomas)
> >     use set_page_dirty_lock (vs set_page_dirty) (Thomas)
> >     move a few checking in xe_vma_userptr_pin_pages to hmm.c (Matt)
> > 
> > Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> > Co-developed-by: Niranjana Vishwanathapura
> > <niranjana.vishwanathapura@intel.com>
> > Signed-off-by: Niranjana Vishwanathapura
> > <niranjana.vishwanathapura@intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> > Cc: Brian Welty <brian.welty@intel.com>
> > ---
> >  drivers/gpu/drm/xe/Makefile |   1 +
> >  drivers/gpu/drm/xe/xe_hmm.c | 231
> > ++++++++++++++++++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_hmm.h |  10 ++
> >  3 files changed, 242 insertions(+)
> >  create mode 100644 drivers/gpu/drm/xe/xe_hmm.c
> >  create mode 100644 drivers/gpu/drm/xe/xe_hmm.h
> > 
> > diff --git a/drivers/gpu/drm/xe/Makefile
> > b/drivers/gpu/drm/xe/Makefile
> > index e2ec6d1375c0..52300de1e86f 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -101,6 +101,7 @@ xe-y += xe_bb.o \
> >  	xe_guc_pc.o \
> >  	xe_guc_submit.o \
> >  	xe_heci_gsc.o \
> > +	xe_hmm.o \
> >  	xe_hw_engine.o \
> >  	xe_hw_engine_class_sysfs.o \
> >  	xe_hw_fence.o \
> > diff --git a/drivers/gpu/drm/xe/xe_hmm.c
> > b/drivers/gpu/drm/xe/xe_hmm.c
> > new file mode 100644
> > index 000000000000..305e3f2e659b
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_hmm.c
> > @@ -0,0 +1,231 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2024 Intel Corporation
> > + */
> > +
> > +#include <linux/mmu_notifier.h>
> > +#include <linux/dma-mapping.h>
> > +#include <linux/memremap.h>
> > +#include <linux/swap.h>
> > +#include <linux/hmm.h>
> > +#include <linux/mm.h>
> > +#include "xe_hmm.h"
> > +#include "xe_svm.h"
> > +#include "xe_vm.h"
> > +
> > +static inline u64 npages_in_range(unsigned long start, unsigned

Avoid "inline" in C files. The compiler should be able to figure that
out. Also update naming?

> > long
> > end)
> > +{
> > +	return ((end - 1) >> PAGE_SHIFT) - (start >> PAGE_SHIFT) +
> > 1;

(PAGE_ALIGN(end) - PAGE_ALIGN_DOWN(start)) >> PAGE_SHIFT ?

> > +}
> > +
> > +/**
> > + * xe_mark_range_accessed() - mark a range is accessed, so core mm
> > + * have such information for memory eviction or write back to
> > + * hard disk
> > + *
> > + * @range: the range to mark
> > + * @write: if write to this range, we mark pages in this range
> > + * as dirty
> > + */
> > +static void xe_mark_range_accessed(struct hmm_range *range, bool
> > write)
> > +{
> > +	struct page *page;
> > +	u64 i, npages;
> > +
> > +	npages = npages_in_range(range->start, range->end);
> > +	for (i = 0; i < npages; i++) {
> > +		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> > +		if (write)
> > +			set_page_dirty_lock(page);
> > +
> > +		mark_page_accessed(page);
> > +	}
> > +}
> > +
> > +/**
> > + * build_sg() - build a scatter gather table for all the physical
> > pages/pfn
> > + * in a hmm_range. dma-address is save in sg table and will be
> > used
> > to program
> > + * GPU page table later.
> > + *
> > + * @xe: the xe device who will access the dma-address in sg table
> > + * @range: the hmm range that we build the sg table from. range-
> > > hmm_pfns[]
> > + * has the pfn numbers of pages that back up this hmm address
> > range.
> > + * @st: pointer to the sg table.
> > + * @write: whether we write to this range. This decides dma map
> > direction
> > + * for system pages. If write we map it bi-diretional; otherwise
> > + * DMA_TO_DEVICE
> > + *
> > + * All the contiguous pfns will be collapsed into one entry in
> > + * the scatter gather table. This is for the convenience of
> > + * later on operations to bind address range to GPU page table.
> > + *
> > + * The dma_address in the sg table will later be used by GPU to
> > + * access memory. So if the memory is system memory, we need to
> > + * do a dma-mapping so it can be accessed by GPU/DMA. If the
> > memory
> > + * is GPU local memory (of the GPU who is going to access memory),
> > + * we need gpu dpa (device physical address), and there is no need
> > + * of dma-mapping.
> > + *
> > + * FIXME: dma-mapping for peer gpu device to access remote gpu's
> > + * memory. Add this when you support p2p
> > + *
> > + * This function allocates the storage of the sg table. It is
> > + * caller's responsibility to free it calling sg_free_table.
> > + *
> > + * Returns 0 if successful; -ENOMEM if fails to allocate memory
> > + */
> > +static int build_sg(struct xe_device *xe, struct hmm_range *range,
> > +			     struct sg_table *st, bool write)

Also naming here.

> > +{
> > +	struct device *dev = xe->drm.dev;
> > +	struct scatterlist *sg;
> > +	u64 i, npages;
> > +
> > +	sg = NULL;
> > +	st->nents = 0;
> > +	npages = npages_in_range(range->start, range->end);
> > +
> > +	if (unlikely(sg_alloc_table(st, npages, GFP_KERNEL)))
> > +		return -ENOMEM;
> > +
> > +	for (i = 0; i < npages; i++) {
> > +		struct page *page;
> > +		unsigned long addr;
> > +		struct xe_mem_region *mr;
> > +
> > +		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> > +		if (is_device_private_page(page)) {
> > +			mr = xe_page_to_mem_region(page);
> > +			addr = xe_mem_region_pfn_to_dpa(mr, range-
> > > hmm_pfns[i]);
> > +		} else {
> > +			addr = dma_map_page(dev, page, 0,
> > PAGE_SIZE,
> > +					write ? DMA_BIDIRECTIONAL
> > :
> > DMA_TO_DEVICE);
> > +		}
> 
> The dma_map_page call here essentially stops the iommu from
> coalescing
> the page-table into contigous iommu VAs, and stops us from using huge
> page-table-entries for the PPGTT. IMO we should investigate whether
> we
> could build a single scatter-gather using sg_alloc_table_from_pages()
> or perhaps look at sg_alloc_append_table_from_pages(), and then use
> dma_map_sg() on contiguous chunks of system pages within that sg-
> table.
> 
> /Thomas
> 
> 
> > +
> > +		if (sg && (addr == (sg_dma_address(sg) + sg-
> > > length))) {
> > +			sg->length += PAGE_SIZE;
> > +			sg_dma_len(sg) += PAGE_SIZE;
> > +			continue;
> > +		}
> > +
> > +		sg =  sg ? sg_next(sg) : st->sgl;
> > +		sg_dma_address(sg) = addr;
> > +		sg_dma_len(sg) = PAGE_SIZE;
> > +		sg->length = PAGE_SIZE;
> > +		st->nents++;
> > +	}
> > +
> > +	sg_mark_end(sg);
> > +	return 0;
> > +}
> > +
> > +/**
> > + * xe_userptr_populate_range() - Populate physical pages of a
> > virtual
> > + * address range
> > + *
> > + * @uvma: userptr vma which has information of the range to
> > populate.
> > + *
> > + * This function populate the physical pages of a virtual
> > + * address range. The populated physical pages is saved in
> > + * userptr's sg table. It is similar to get_user_pages but call
> > + * hmm_range_fault.
> > + *
> > + * This function also read mmu notifier sequence # (
> > + * mmu_interval_read_begin), for the purpose of later
> > + * comparison (through mmu_interval_read_retry).
> > + *
> > + * This must be called with mmap read or write lock held.
> > + *
> > + * This function allocates the storage of the userptr sg table.
> > + * It is caller's responsibility to free it calling sg_free_table.
> > + *
> > + * returns: 0 for succuss; negative error no on failure
> > + */
> > +int xe_userptr_populate_range(struct xe_userptr_vma *uvma)
> > +{
> > +	unsigned long timeout =
> > +		jiffies +
> > msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
> > +	unsigned long *pfns, flags = HMM_PFN_REQ_FAULT;
> > +	struct xe_userptr *userptr;
> > +	struct xe_vma *vma = &uvma->vma;
> > +	u64 start = xe_vma_userptr(vma);
> > +	u64 end = start + xe_vma_size(vma);
> > +	struct xe_vm *vm = xe_vma_vm(vma);
> > +	struct hmm_range hmm_range;
> > +	bool write = !xe_vma_read_only(vma);
> > +	bool in_kthread = !current->mm;
> > +	u64 npages;
> > +	int ret;
> > +
> > +	userptr = &uvma->userptr;
> > +	mmap_assert_locked(userptr->notifier.mm);
> > +
> > +	if (vma->gpuva.flags & XE_VMA_DESTROYED)
> > +		return 0;
> > +
> > +	npages = npages_in_range(start, end);
> > +	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
> > +	if (unlikely(!pfns))
> > +		return -ENOMEM;
> > +
> > +	if (write)
> > +		flags |= HMM_PFN_REQ_WRITE;
> > +
> > +	if (in_kthread) {
> > +		if (!mmget_not_zero(userptr->notifier.mm)) {
> > +			ret = -EFAULT;
> > +			goto free_pfns;
> > +		}
> > +		kthread_use_mm(userptr->notifier.mm);
> > +	}
> > +
> > +	memset64((u64 *)pfns, (u64)flags, npages);
> > +	hmm_range.hmm_pfns = pfns;
> > +	hmm_range.notifier = &userptr->notifier;
> > +	hmm_range.start = start;
> > +	hmm_range.end = end;
> > +	hmm_range.pfn_flags_mask = HMM_PFN_REQ_FAULT |
> > HMM_PFN_REQ_WRITE;
> > +	/**
> > +	 * FIXME:
> > +	 * Set the the dev_private_owner can prevent
> > hmm_range_fault
> > to fault
> > +	 * in the device private pages owned by caller. See
> > function
> > +	 * hmm_vma_handle_pte. In multiple GPU case, this should
> > be
> > set to the
> > +	 * device owner of the best migration destination. e.g.,
> > device0/vm0
> > +	 * has a page fault, but we have determined the best
> > placement of
> > +	 * the fault address should be on device1, we should set
> > below to
> > +	 * device1 instead of device0.
> > +	 */
> > +	hmm_range.dev_private_owner = vm->xe;
> > +
> > +	while (true) {
> > +		hmm_range.notifier_seq =
> > mmu_interval_read_begin(&userptr->notifier);
> > +		ret = hmm_range_fault(&hmm_range);
> > +		if (time_after(jiffies, timeout))
> > +			break;
> > +
> > +		if (ret == -EBUSY)
> > +			continue;
> > +		break;
> > +	}
> > +
> > +	if (in_kthread) {
> > +		kthread_unuse_mm(userptr->notifier.mm);
> > +		mmput(userptr->notifier.mm);
> > +	}
> > +
> > +	if (ret)
> > +		goto free_pfns;
> > +
> > +	ret = build_sg(vm->xe, &hmm_range, &userptr->sgt, write);
> > +	if (ret)
> > +		goto free_pfns;
> > +
> > +	xe_mark_range_accessed(&hmm_range, write);
> > +	userptr->sg = &userptr->sgt;
> > +	userptr->notifier_seq = hmm_range.notifier_seq;
> > +
> > +free_pfns:
> > +	kvfree(pfns);
> > +	return ret;
> > +}
> > +
> > diff --git a/drivers/gpu/drm/xe/xe_hmm.h
> > b/drivers/gpu/drm/xe/xe_hmm.h
> > new file mode 100644
> > index 000000000000..fa5ddc11f10b
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_hmm.h
> > @@ -0,0 +1,10 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2024 Intel Corporation
> > + */
> > +
> > +#include <linux/types.h>
> > +
> > +struct xe_userptr_vma;
> > +
> > +int xe_userptr_populate_range(struct xe_userptr_vma *uvma);
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/8] drm/xe: Helper to populate a userptr or hmmptr
  2024-03-19 10:33   ` Hellstrom, Thomas
  2024-03-19 10:45     ` Hellstrom, Thomas
@ 2024-03-19 16:48     ` Matthew Brost
  2024-03-19 17:59       ` Zeng, Oak
  2024-03-19 20:45     ` Zeng, Oak
  2 siblings, 1 reply; 21+ messages in thread
From: Matthew Brost @ 2024-03-19 16:48 UTC (permalink / raw
  To: Hellstrom, Thomas
  Cc: intel-xe@lists.freedesktop.org, Zeng,  Oak, Welty, Brian,
	Ghimiray, Himal Prasad

On Tue, Mar 19, 2024 at 04:33:39AM -0600, Hellstrom, Thomas wrote:
> On Mon, 2024-03-18 at 22:55 -0400, Oak Zeng wrote:
> > Add a helper function xe_userptr_populate_range to populate
> > a a userptr or hmmptr range. This functions calls hmm_range_fault
> > to read CPU page tables and populate all pfns/pages of this
> > virtual address range.
> > 
> > If the populated page is system memory page, dma-mapping is performed
> > to get a dma-address which can be used later for GPU to access pages.
> > 
> > If the populated page is device private page, we calculate the dpa (
> > device physical address) of the page.
> > 
> > The dma-address or dpa is then saved in userptr's sg table. This is
> > prepare work to replace the get_user_pages_fast code in userptr code
> > path. The helper function will also be used to populate hmmptr later.
> > 
> > v1: Address review comments:
> >     separate a npage_in_range function (Matt)
> >     reparameterize function xe_userptr_populate_range function (Matt)
> >     move mmu_interval_read_begin() call into while loop (Thomas)
> >     s/mark_range_accessed/xe_mark_range_accessed (Thomas)
> >     use set_page_dirty_lock (vs set_page_dirty) (Thomas)
> >     move a few checking in xe_vma_userptr_pin_pages to hmm.c (Matt)
> > 
> > Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> > Co-developed-by: Niranjana Vishwanathapura
> > <niranjana.vishwanathapura@intel.com>
> > Signed-off-by: Niranjana Vishwanathapura
> > <niranjana.vishwanathapura@intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> > Cc: Brian Welty <brian.welty@intel.com>
> > ---
> >  drivers/gpu/drm/xe/Makefile |   1 +
> >  drivers/gpu/drm/xe/xe_hmm.c | 231
> > ++++++++++++++++++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_hmm.h |  10 ++
> >  3 files changed, 242 insertions(+)
> >  create mode 100644 drivers/gpu/drm/xe/xe_hmm.c
> >  create mode 100644 drivers/gpu/drm/xe/xe_hmm.h
> > 
> > diff --git a/drivers/gpu/drm/xe/Makefile
> > b/drivers/gpu/drm/xe/Makefile
> > index e2ec6d1375c0..52300de1e86f 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -101,6 +101,7 @@ xe-y += xe_bb.o \
> >  	xe_guc_pc.o \
> >  	xe_guc_submit.o \
> >  	xe_heci_gsc.o \
> > +	xe_hmm.o \
> >  	xe_hw_engine.o \
> >  	xe_hw_engine_class_sysfs.o \
> >  	xe_hw_fence.o \
> > diff --git a/drivers/gpu/drm/xe/xe_hmm.c
> > b/drivers/gpu/drm/xe/xe_hmm.c
> > new file mode 100644
> > index 000000000000..305e3f2e659b
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_hmm.c
> > @@ -0,0 +1,231 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2024 Intel Corporation
> > + */
> > +
> > +#include <linux/mmu_notifier.h>
> > +#include <linux/dma-mapping.h>
> > +#include <linux/memremap.h>
> > +#include <linux/swap.h>
> > +#include <linux/hmm.h>
> > +#include <linux/mm.h>
> > +#include "xe_hmm.h"
> > +#include "xe_svm.h"
> > +#include "xe_vm.h"
> > +
> > +static inline u64 npages_in_range(unsigned long start, unsigned long
> > end)
> > +{
> > +	return ((end - 1) >> PAGE_SHIFT) - (start >> PAGE_SHIFT) +
> > 1;
> > +}
> > +
> > +/**
> > + * xe_mark_range_accessed() - mark a range is accessed, so core mm
> > + * have such information for memory eviction or write back to
> > + * hard disk
> > + *
> > + * @range: the range to mark
> > + * @write: if write to this range, we mark pages in this range
> > + * as dirty
> > + */
> > +static void xe_mark_range_accessed(struct hmm_range *range, bool
> > write)
> > +{
> > +	struct page *page;
> > +	u64 i, npages;
> > +
> > +	npages = npages_in_range(range->start, range->end);
> > +	for (i = 0; i < npages; i++) {
> > +		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> > +		if (write)
> > +			set_page_dirty_lock(page);
> > +
> > +		mark_page_accessed(page);
> > +	}
> > +}
> > +
> > +/**
> > + * build_sg() - build a scatter gather table for all the physical
> > pages/pfn
> > + * in a hmm_range. dma-address is save in sg table and will be used
> > to program
> > + * GPU page table later.
> > + *
> > + * @xe: the xe device who will access the dma-address in sg table
> > + * @range: the hmm range that we build the sg table from. range-
> > >hmm_pfns[]
> > + * has the pfn numbers of pages that back up this hmm address range.
> > + * @st: pointer to the sg table.
> > + * @write: whether we write to this range. This decides dma map
> > direction
> > + * for system pages. If write we map it bi-diretional; otherwise
> > + * DMA_TO_DEVICE
> > + *
> > + * All the contiguous pfns will be collapsed into one entry in
> > + * the scatter gather table. This is for the convenience of
> > + * later on operations to bind address range to GPU page table.
> > + *
> > + * The dma_address in the sg table will later be used by GPU to
> > + * access memory. So if the memory is system memory, we need to
> > + * do a dma-mapping so it can be accessed by GPU/DMA. If the memory
> > + * is GPU local memory (of the GPU who is going to access memory),
> > + * we need gpu dpa (device physical address), and there is no need
> > + * of dma-mapping.
> > + *
> > + * FIXME: dma-mapping for peer gpu device to access remote gpu's
> > + * memory. Add this when you support p2p
> > + *
> > + * This function allocates the storage of the sg table. It is
> > + * caller's responsibility to free it calling sg_free_table.
> > + *
> > + * Returns 0 if successful; -ENOMEM if fails to allocate memory
> > + */
> > +static int build_sg(struct xe_device *xe, struct hmm_range *range,
> > +			     struct sg_table *st, bool write)
> > +{
> > +	struct device *dev = xe->drm.dev;
> > +	struct scatterlist *sg;
> > +	u64 i, npages;
> > +
> > +	sg = NULL;
> > +	st->nents = 0;
> > +	npages = npages_in_range(range->start, range->end);
> > +
> > +	if (unlikely(sg_alloc_table(st, npages, GFP_KERNEL)))
> > +		return -ENOMEM;
> > +
> > +	for (i = 0; i < npages; i++) {
> > +		struct page *page;
> > +		unsigned long addr;
> > +		struct xe_mem_region *mr;
> > +
> > +		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> > +		if (is_device_private_page(page)) {
> > +			mr = xe_page_to_mem_region(page);
> > +			addr = xe_mem_region_pfn_to_dpa(mr, range-
> > >hmm_pfns[i]);
> > +		} else {
> > +			addr = dma_map_page(dev, page, 0, PAGE_SIZE,
> > +					write ? DMA_BIDIRECTIONAL :
> > DMA_TO_DEVICE);
> > +		}
> 
> The dma_map_page call here essentially stops the iommu from coalescing
> the page-table into contigous iommu VAs, and stops us from using huge
> page-table-entries for the PPGTT. IMO we should investigate whether we
> could build a single scatter-gather using sg_alloc_table_from_pages()
> or perhaps look at sg_alloc_append_table_from_pages(), and then use
> dma_map_sg() on contiguous chunks of system pages within that sg-table.
> 

Yes, this was my concern as well. I suggested we continue to use
sg_alloc_table_from_pages_segment + dma_map_sgtable with only system
support for now as device private page tables are still unused at the
moment. That suggestion, among, other seems to have been ignored. I'm
referring to my reply in [1], it would be nice to see those suggestions
followed.

Thomas is right about huge page-table-entries in the PPGTT. Add some
printks if you need to see this in action (xe_pt_stage_bind_entry,
xe_pt_hugepte_possible returning true).

Make sure the IOMMU is enabled (intel_iommu=on on kernel cmdline) and run:

xe_vm.large-userptr-binds-16777216 -> This will use 8x 2MB page tables on tip
xe_vm.large-userptr-binds-2147483648 -> This will use 2x 1GB page tables on tip

With your series it will likely use 4k pages for everything.

Matt

[1] https://patchwork.freedesktop.org/patch/582859/?series=131117&rev=1#comment_1064237

> /Thomas
> 
> 
> > +
> > +		if (sg && (addr == (sg_dma_address(sg) + sg-
> > >length))) {
> > +			sg->length += PAGE_SIZE;
> > +			sg_dma_len(sg) += PAGE_SIZE;
> > +			continue;
> > +		}
> > +
> > +		sg =  sg ? sg_next(sg) : st->sgl;
> > +		sg_dma_address(sg) = addr;
> > +		sg_dma_len(sg) = PAGE_SIZE;
> > +		sg->length = PAGE_SIZE;
> > +		st->nents++;
> > +	}
> > +
> > +	sg_mark_end(sg);
> > +	return 0;
> > +}
> > +
> > +/**
> > + * xe_userptr_populate_range() - Populate physical pages of a
> > virtual
> > + * address range
> > + *
> > + * @uvma: userptr vma which has information of the range to
> > populate.
> > + *
> > + * This function populate the physical pages of a virtual
> > + * address range. The populated physical pages is saved in
> > + * userptr's sg table. It is similar to get_user_pages but call
> > + * hmm_range_fault.
> > + *
> > + * This function also read mmu notifier sequence # (
> > + * mmu_interval_read_begin), for the purpose of later
> > + * comparison (through mmu_interval_read_retry).
> > + *
> > + * This must be called with mmap read or write lock held.
> > + *
> > + * This function allocates the storage of the userptr sg table.
> > + * It is caller's responsibility to free it calling sg_free_table.
> > + *
> > + * returns: 0 for succuss; negative error no on failure
> > + */
> > +int xe_userptr_populate_range(struct xe_userptr_vma *uvma)
> > +{
> > +	unsigned long timeout =
> > +		jiffies +
> > msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
> > +	unsigned long *pfns, flags = HMM_PFN_REQ_FAULT;
> > +	struct xe_userptr *userptr;
> > +	struct xe_vma *vma = &uvma->vma;
> > +	u64 start = xe_vma_userptr(vma);
> > +	u64 end = start + xe_vma_size(vma);
> > +	struct xe_vm *vm = xe_vma_vm(vma);
> > +	struct hmm_range hmm_range;
> > +	bool write = !xe_vma_read_only(vma);
> > +	bool in_kthread = !current->mm;
> > +	u64 npages;
> > +	int ret;
> > +
> > +	userptr = &uvma->userptr;
> > +	mmap_assert_locked(userptr->notifier.mm);
> > +
> > +	if (vma->gpuva.flags & XE_VMA_DESTROYED)
> > +		return 0;
> > +
> > +	npages = npages_in_range(start, end);
> > +	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
> > +	if (unlikely(!pfns))
> > +		return -ENOMEM;
> > +
> > +	if (write)
> > +		flags |= HMM_PFN_REQ_WRITE;
> > +
> > +	if (in_kthread) {
> > +		if (!mmget_not_zero(userptr->notifier.mm)) {
> > +			ret = -EFAULT;
> > +			goto free_pfns;
> > +		}
> > +		kthread_use_mm(userptr->notifier.mm);
> > +	}
> > +
> > +	memset64((u64 *)pfns, (u64)flags, npages);
> > +	hmm_range.hmm_pfns = pfns;
> > +	hmm_range.notifier = &userptr->notifier;
> > +	hmm_range.start = start;
> > +	hmm_range.end = end;
> > +	hmm_range.pfn_flags_mask = HMM_PFN_REQ_FAULT |
> > HMM_PFN_REQ_WRITE;
> > +	/**
> > +	 * FIXME:
> > +	 * Set the the dev_private_owner can prevent hmm_range_fault
> > to fault
> > +	 * in the device private pages owned by caller. See function
> > +	 * hmm_vma_handle_pte. In multiple GPU case, this should be
> > set to the
> > +	 * device owner of the best migration destination. e.g.,
> > device0/vm0
> > +	 * has a page fault, but we have determined the best
> > placement of
> > +	 * the fault address should be on device1, we should set
> > below to
> > +	 * device1 instead of device0.
> > +	 */
> > +	hmm_range.dev_private_owner = vm->xe;
> > +
> > +	while (true) {
> > +		hmm_range.notifier_seq =
> > mmu_interval_read_begin(&userptr->notifier);
> > +		ret = hmm_range_fault(&hmm_range);
> > +		if (time_after(jiffies, timeout))
> > +			break;
> > +
> > +		if (ret == -EBUSY)
> > +			continue;
> > +		break;
> > +	}
> > +
> > +	if (in_kthread) {
> > +		kthread_unuse_mm(userptr->notifier.mm);
> > +		mmput(userptr->notifier.mm);
> > +	}
> > +
> > +	if (ret)
> > +		goto free_pfns;
> > +
> > +	ret = build_sg(vm->xe, &hmm_range, &userptr->sgt, write);
> > +	if (ret)
> > +		goto free_pfns;
> > +
> > +	xe_mark_range_accessed(&hmm_range, write);
> > +	userptr->sg = &userptr->sgt;
> > +	userptr->notifier_seq = hmm_range.notifier_seq;
> > +
> > +free_pfns:
> > +	kvfree(pfns);
> > +	return ret;
> > +}
> > +
> > diff --git a/drivers/gpu/drm/xe/xe_hmm.h
> > b/drivers/gpu/drm/xe/xe_hmm.h
> > new file mode 100644
> > index 000000000000..fa5ddc11f10b
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_hmm.h
> > @@ -0,0 +1,10 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2024 Intel Corporation
> > + */
> > +
> > +#include <linux/types.h>
> > +
> > +struct xe_userptr_vma;
> > +
> > +int xe_userptr_populate_range(struct xe_userptr_vma *uvma);
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 6/8] drm/xe: Helper to populate a userptr or hmmptr
  2024-03-19 16:48     ` Matthew Brost
@ 2024-03-19 17:59       ` Zeng, Oak
  0 siblings, 0 replies; 21+ messages in thread
From: Zeng, Oak @ 2024-03-19 17:59 UTC (permalink / raw
  To: Brost, Matthew, Hellstrom, Thomas
  Cc: intel-xe@lists.freedesktop.org, Welty,  Brian,
	Ghimiray, Himal Prasad

Hi Matt,

> -----Original Message-----
> From: Brost, Matthew <matthew.brost@intel.com>
> Sent: Tuesday, March 19, 2024 12:49 PM
> To: Hellstrom, Thomas <thomas.hellstrom@intel.com>
> Cc: intel-xe@lists.freedesktop.org; Zeng, Oak <oak.zeng@intel.com>; Welty,
> Brian <brian.welty@intel.com>; Ghimiray, Himal Prasad
> <himal.prasad.ghimiray@intel.com>
> Subject: Re: [PATCH 6/8] drm/xe: Helper to populate a userptr or hmmptr
> 
> On Tue, Mar 19, 2024 at 04:33:39AM -0600, Hellstrom, Thomas wrote:
> > On Mon, 2024-03-18 at 22:55 -0400, Oak Zeng wrote:
> > > Add a helper function xe_userptr_populate_range to populate
> > > a a userptr or hmmptr range. This functions calls hmm_range_fault
> > > to read CPU page tables and populate all pfns/pages of this
> > > virtual address range.
> > >
> > > If the populated page is system memory page, dma-mapping is performed
> > > to get a dma-address which can be used later for GPU to access pages.
> > >
> > > If the populated page is device private page, we calculate the dpa (
> > > device physical address) of the page.
> > >
> > > The dma-address or dpa is then saved in userptr's sg table. This is
> > > prepare work to replace the get_user_pages_fast code in userptr code
> > > path. The helper function will also be used to populate hmmptr later.
> > >
> > > v1: Address review comments:
> > >     separate a npage_in_range function (Matt)
> > >     reparameterize function xe_userptr_populate_range function (Matt)
> > >     move mmu_interval_read_begin() call into while loop (Thomas)
> > >     s/mark_range_accessed/xe_mark_range_accessed (Thomas)
> > >     use set_page_dirty_lock (vs set_page_dirty) (Thomas)
> > >     move a few checking in xe_vma_userptr_pin_pages to hmm.c (Matt)
> > >
> > > Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> > > Co-developed-by: Niranjana Vishwanathapura
> > > <niranjana.vishwanathapura@intel.com>
> > > Signed-off-by: Niranjana Vishwanathapura
> > > <niranjana.vishwanathapura@intel.com>
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> > > Cc: Brian Welty <brian.welty@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/Makefile |   1 +
> > >  drivers/gpu/drm/xe/xe_hmm.c | 231
> > > ++++++++++++++++++++++++++++++++++++
> > >  drivers/gpu/drm/xe/xe_hmm.h |  10 ++
> > >  3 files changed, 242 insertions(+)
> > >  create mode 100644 drivers/gpu/drm/xe/xe_hmm.c
> > >  create mode 100644 drivers/gpu/drm/xe/xe_hmm.h
> > >
> > > diff --git a/drivers/gpu/drm/xe/Makefile
> > > b/drivers/gpu/drm/xe/Makefile
> > > index e2ec6d1375c0..52300de1e86f 100644
> > > --- a/drivers/gpu/drm/xe/Makefile
> > > +++ b/drivers/gpu/drm/xe/Makefile
> > > @@ -101,6 +101,7 @@ xe-y += xe_bb.o \
> > >  	xe_guc_pc.o \
> > >  	xe_guc_submit.o \
> > >  	xe_heci_gsc.o \
> > > +	xe_hmm.o \
> > >  	xe_hw_engine.o \
> > >  	xe_hw_engine_class_sysfs.o \
> > >  	xe_hw_fence.o \
> > > diff --git a/drivers/gpu/drm/xe/xe_hmm.c
> > > b/drivers/gpu/drm/xe/xe_hmm.c
> > > new file mode 100644
> > > index 000000000000..305e3f2e659b
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/xe/xe_hmm.c
> > > @@ -0,0 +1,231 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright © 2024 Intel Corporation
> > > + */
> > > +
> > > +#include <linux/mmu_notifier.h>
> > > +#include <linux/dma-mapping.h>
> > > +#include <linux/memremap.h>
> > > +#include <linux/swap.h>
> > > +#include <linux/hmm.h>
> > > +#include <linux/mm.h>
> > > +#include "xe_hmm.h"
> > > +#include "xe_svm.h"
> > > +#include "xe_vm.h"
> > > +
> > > +static inline u64 npages_in_range(unsigned long start, unsigned long
> > > end)
> > > +{
> > > +	return ((end - 1) >> PAGE_SHIFT) - (start >> PAGE_SHIFT) +
> > > 1;
> > > +}
> > > +
> > > +/**
> > > + * xe_mark_range_accessed() - mark a range is accessed, so core mm
> > > + * have such information for memory eviction or write back to
> > > + * hard disk
> > > + *
> > > + * @range: the range to mark
> > > + * @write: if write to this range, we mark pages in this range
> > > + * as dirty
> > > + */
> > > +static void xe_mark_range_accessed(struct hmm_range *range, bool
> > > write)
> > > +{
> > > +	struct page *page;
> > > +	u64 i, npages;
> > > +
> > > +	npages = npages_in_range(range->start, range->end);
> > > +	for (i = 0; i < npages; i++) {
> > > +		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> > > +		if (write)
> > > +			set_page_dirty_lock(page);
> > > +
> > > +		mark_page_accessed(page);
> > > +	}
> > > +}
> > > +
> > > +/**
> > > + * build_sg() - build a scatter gather table for all the physical
> > > pages/pfn
> > > + * in a hmm_range. dma-address is save in sg table and will be used
> > > to program
> > > + * GPU page table later.
> > > + *
> > > + * @xe: the xe device who will access the dma-address in sg table
> > > + * @range: the hmm range that we build the sg table from. range-
> > > >hmm_pfns[]
> > > + * has the pfn numbers of pages that back up this hmm address range.
> > > + * @st: pointer to the sg table.
> > > + * @write: whether we write to this range. This decides dma map
> > > direction
> > > + * for system pages. If write we map it bi-diretional; otherwise
> > > + * DMA_TO_DEVICE
> > > + *
> > > + * All the contiguous pfns will be collapsed into one entry in
> > > + * the scatter gather table. This is for the convenience of
> > > + * later on operations to bind address range to GPU page table.
> > > + *
> > > + * The dma_address in the sg table will later be used by GPU to
> > > + * access memory. So if the memory is system memory, we need to
> > > + * do a dma-mapping so it can be accessed by GPU/DMA. If the memory
> > > + * is GPU local memory (of the GPU who is going to access memory),
> > > + * we need gpu dpa (device physical address), and there is no need
> > > + * of dma-mapping.
> > > + *
> > > + * FIXME: dma-mapping for peer gpu device to access remote gpu's
> > > + * memory. Add this when you support p2p
> > > + *
> > > + * This function allocates the storage of the sg table. It is
> > > + * caller's responsibility to free it calling sg_free_table.
> > > + *
> > > + * Returns 0 if successful; -ENOMEM if fails to allocate memory
> > > + */
> > > +static int build_sg(struct xe_device *xe, struct hmm_range *range,
> > > +			     struct sg_table *st, bool write)
> > > +{
> > > +	struct device *dev = xe->drm.dev;
> > > +	struct scatterlist *sg;
> > > +	u64 i, npages;
> > > +
> > > +	sg = NULL;
> > > +	st->nents = 0;
> > > +	npages = npages_in_range(range->start, range->end);
> > > +
> > > +	if (unlikely(sg_alloc_table(st, npages, GFP_KERNEL)))
> > > +		return -ENOMEM;
> > > +
> > > +	for (i = 0; i < npages; i++) {
> > > +		struct page *page;
> > > +		unsigned long addr;
> > > +		struct xe_mem_region *mr;
> > > +
> > > +		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> > > +		if (is_device_private_page(page)) {
> > > +			mr = xe_page_to_mem_region(page);
> > > +			addr = xe_mem_region_pfn_to_dpa(mr, range-
> > > >hmm_pfns[i]);
> > > +		} else {
> > > +			addr = dma_map_page(dev, page, 0, PAGE_SIZE,
> > > +					write ? DMA_BIDIRECTIONAL :
> > > DMA_TO_DEVICE);
> > > +		}
> >
> > The dma_map_page call here essentially stops the iommu from coalescing
> > the page-table into contigous iommu VAs, and stops us from using huge
> > page-table-entries for the PPGTT. IMO we should investigate whether we
> > could build a single scatter-gather using sg_alloc_table_from_pages()
> > or perhaps look at sg_alloc_append_table_from_pages(), and then use
> > dma_map_sg() on contiguous chunks of system pages within that sg-table.
> >
> 
> Yes, this was my concern as well. I suggested we continue to use
> sg_alloc_table_from_pages_segment + dma_map_sgtable with only system
> support for now as device private page tables are still unused at the
> moment. That suggestion, among, other seems to have been ignored. I'm
> referring to my reply in [1], it would be nice to see those suggestions
> followed.
> 
> Thomas is right about huge page-table-entries in the PPGTT. Add some
> printks if you need to see this in action (xe_pt_stage_bind_entry,
> xe_pt_hugepte_possible returning true).
> 
> Make sure the IOMMU is enabled (intel_iommu=on on kernel cmdline) and run:
> 
> xe_vm.large-userptr-binds-16777216 -> This will use 8x 2MB page tables on tip
> xe_vm.large-userptr-binds-2147483648 -> This will use 2x 1GB page tables on tip
> 
> With your series it will likely use 4k pages for everything.
> 
> Matt
> 
> [1]
> https://patchwork.freedesktop.org/patch/582859/?series=131117&rev=1#comm
> ent_1064237

Damn, this comment was indeed overlooked...

Yes, both yours and Thomas' comments make a lot of sense to me. Let me follow up.

Oak

> 
> > /Thomas
> >
> >
> > > +
> > > +		if (sg && (addr == (sg_dma_address(sg) + sg-
> > > >length))) {
> > > +			sg->length += PAGE_SIZE;
> > > +			sg_dma_len(sg) += PAGE_SIZE;
> > > +			continue;
> > > +		}
> > > +
> > > +		sg =  sg ? sg_next(sg) : st->sgl;
> > > +		sg_dma_address(sg) = addr;
> > > +		sg_dma_len(sg) = PAGE_SIZE;
> > > +		sg->length = PAGE_SIZE;
> > > +		st->nents++;
> > > +	}
> > > +
> > > +	sg_mark_end(sg);
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * xe_userptr_populate_range() - Populate physical pages of a
> > > virtual
> > > + * address range
> > > + *
> > > + * @uvma: userptr vma which has information of the range to
> > > populate.
> > > + *
> > > + * This function populate the physical pages of a virtual
> > > + * address range. The populated physical pages is saved in
> > > + * userptr's sg table. It is similar to get_user_pages but call
> > > + * hmm_range_fault.
> > > + *
> > > + * This function also read mmu notifier sequence # (
> > > + * mmu_interval_read_begin), for the purpose of later
> > > + * comparison (through mmu_interval_read_retry).
> > > + *
> > > + * This must be called with mmap read or write lock held.
> > > + *
> > > + * This function allocates the storage of the userptr sg table.
> > > + * It is caller's responsibility to free it calling sg_free_table.
> > > + *
> > > + * returns: 0 for succuss; negative error no on failure
> > > + */
> > > +int xe_userptr_populate_range(struct xe_userptr_vma *uvma)
> > > +{
> > > +	unsigned long timeout =
> > > +		jiffies +
> > > msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
> > > +	unsigned long *pfns, flags = HMM_PFN_REQ_FAULT;
> > > +	struct xe_userptr *userptr;
> > > +	struct xe_vma *vma = &uvma->vma;
> > > +	u64 start = xe_vma_userptr(vma);
> > > +	u64 end = start + xe_vma_size(vma);
> > > +	struct xe_vm *vm = xe_vma_vm(vma);
> > > +	struct hmm_range hmm_range;
> > > +	bool write = !xe_vma_read_only(vma);
> > > +	bool in_kthread = !current->mm;
> > > +	u64 npages;
> > > +	int ret;
> > > +
> > > +	userptr = &uvma->userptr;
> > > +	mmap_assert_locked(userptr->notifier.mm);
> > > +
> > > +	if (vma->gpuva.flags & XE_VMA_DESTROYED)
> > > +		return 0;
> > > +
> > > +	npages = npages_in_range(start, end);
> > > +	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
> > > +	if (unlikely(!pfns))
> > > +		return -ENOMEM;
> > > +
> > > +	if (write)
> > > +		flags |= HMM_PFN_REQ_WRITE;
> > > +
> > > +	if (in_kthread) {
> > > +		if (!mmget_not_zero(userptr->notifier.mm)) {
> > > +			ret = -EFAULT;
> > > +			goto free_pfns;
> > > +		}
> > > +		kthread_use_mm(userptr->notifier.mm);
> > > +	}
> > > +
> > > +	memset64((u64 *)pfns, (u64)flags, npages);
> > > +	hmm_range.hmm_pfns = pfns;
> > > +	hmm_range.notifier = &userptr->notifier;
> > > +	hmm_range.start = start;
> > > +	hmm_range.end = end;
> > > +	hmm_range.pfn_flags_mask = HMM_PFN_REQ_FAULT |
> > > HMM_PFN_REQ_WRITE;
> > > +	/**
> > > +	 * FIXME:
> > > +	 * Set the the dev_private_owner can prevent hmm_range_fault
> > > to fault
> > > +	 * in the device private pages owned by caller. See function
> > > +	 * hmm_vma_handle_pte. In multiple GPU case, this should be
> > > set to the
> > > +	 * device owner of the best migration destination. e.g.,
> > > device0/vm0
> > > +	 * has a page fault, but we have determined the best
> > > placement of
> > > +	 * the fault address should be on device1, we should set
> > > below to
> > > +	 * device1 instead of device0.
> > > +	 */
> > > +	hmm_range.dev_private_owner = vm->xe;
> > > +
> > > +	while (true) {
> > > +		hmm_range.notifier_seq =
> > > mmu_interval_read_begin(&userptr->notifier);
> > > +		ret = hmm_range_fault(&hmm_range);
> > > +		if (time_after(jiffies, timeout))
> > > +			break;
> > > +
> > > +		if (ret == -EBUSY)
> > > +			continue;
> > > +		break;
> > > +	}
> > > +
> > > +	if (in_kthread) {
> > > +		kthread_unuse_mm(userptr->notifier.mm);
> > > +		mmput(userptr->notifier.mm);
> > > +	}
> > > +
> > > +	if (ret)
> > > +		goto free_pfns;
> > > +
> > > +	ret = build_sg(vm->xe, &hmm_range, &userptr->sgt, write);
> > > +	if (ret)
> > > +		goto free_pfns;
> > > +
> > > +	xe_mark_range_accessed(&hmm_range, write);
> > > +	userptr->sg = &userptr->sgt;
> > > +	userptr->notifier_seq = hmm_range.notifier_seq;
> > > +
> > > +free_pfns:
> > > +	kvfree(pfns);
> > > +	return ret;
> > > +}
> > > +
> > > diff --git a/drivers/gpu/drm/xe/xe_hmm.h
> > > b/drivers/gpu/drm/xe/xe_hmm.h
> > > new file mode 100644
> > > index 000000000000..fa5ddc11f10b
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/xe/xe_hmm.h
> > > @@ -0,0 +1,10 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright © 2024 Intel Corporation
> > > + */
> > > +
> > > +#include <linux/types.h>
> > > +
> > > +struct xe_userptr_vma;
> > > +
> > > +int xe_userptr_populate_range(struct xe_userptr_vma *uvma);
> >

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 2/8] drm/xe/svm: Add DRM_XE_SVM kernel config entry
  2024-03-19  9:25   ` Hellstrom, Thomas
@ 2024-03-19 18:27     ` Zeng, Oak
  0 siblings, 0 replies; 21+ messages in thread
From: Zeng, Oak @ 2024-03-19 18:27 UTC (permalink / raw
  To: Hellstrom, Thomas, intel-xe@lists.freedesktop.org
  Cc: Brost, Matthew, Welty, Brian, Ghimiray, Himal Prasad



> -----Original Message-----
> From: Hellstrom, Thomas <thomas.hellstrom@intel.com>
> Sent: Tuesday, March 19, 2024 5:26 AM
> To: intel-xe@lists.freedesktop.org; Zeng, Oak <oak.zeng@intel.com>
> Cc: Brost, Matthew <matthew.brost@intel.com>; Welty, Brian
> <brian.welty@intel.com>; Ghimiray, Himal Prasad
> <himal.prasad.ghimiray@intel.com>
> Subject: Re: [PATCH 2/8] drm/xe/svm: Add DRM_XE_SVM kernel config entry
> 
> Hi, Oak,
> 
> On Mon, 2024-03-18 at 22:55 -0400, Oak Zeng wrote:
> > DRM_XE_SVM kernel config entry is added so
> > xe svm feature can be configured before kernel
> > compilation.
> 
> Please use imperative language in commit messages.
> 
> 
> >
> > Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> > Co-developed-by: Niranjana Vishwanathapura
> > <niranjana.vishwanathapura@intel.com>
> > Signed-off-by: Niranjana Vishwanathapura
> > <niranjana.vishwanathapura@intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> > Cc: Brian Welty <brian.welty@intel.com>
> > ---
> >  drivers/gpu/drm/xe/Kconfig   | 21 +++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_tile.c |  4 ++++
> >  2 files changed, 25 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
> > index 1a556d087e63..e244165459c5 100644
> > --- a/drivers/gpu/drm/xe/Kconfig
> > +++ b/drivers/gpu/drm/xe/Kconfig
> > @@ -83,6 +83,27 @@ config DRM_XE_FORCE_PROBE
> >  	  4571.
> >
> >  	  Use "!*" to block the probe of the driver for all known
> > devices.
> > +config DRM_XE_SVM
> > +	bool "Enable Shared Virtual Memory support in xe"
> > +	depends on DRM_XE
> > +	depends on ARCH_ENABLE_MEMORY_HOTPLUG
> > +	depends on ARCH_ENABLE_MEMORY_HOTREMOVE
> > +	depends on MEMORY_HOTPLUG
> > +	depends on MEMORY_HOTREMOVE
> > +	depends on ARCH_HAS_PTE_DEVMAP
> > +	depends on SPARSEMEM_VMEMMAP
> > +	depends on ZONE_DEVICE
> > +	depends on DEVICE_PRIVATE
> > +	depends on MMU
> > +	select HMM_MIRROR
> > +	select MMU_NOTIFIER
> > +	default y
> > +	help
> > +	  Choose this option if you want Shared Virtual Memory (SVM)
> > +	  support in xe. With SVM, virtual address space is shared
> > +	  between CPU and GPU. This means any virtual address such
> > +	  as malloc or mmap returns, variables on stack, or global
> > +	  memory pointers, can be used for GPU transparently.
> >
> >  menu "drm/Xe Debugging"
> >  depends on DRM_XE
> > diff --git a/drivers/gpu/drm/xe/xe_tile.c
> > b/drivers/gpu/drm/xe/xe_tile.c
> > index f1c4f9de51df..b52a00a6b5d5 100644
> > --- a/drivers/gpu/drm/xe/xe_tile.c
> > +++ b/drivers/gpu/drm/xe/xe_tile.c
> > @@ -159,7 +159,9 @@ static int tile_ttm_mgr_init(struct xe_tile
> > *tile)
> >   */
> >  int xe_tile_init_noalloc(struct xe_tile *tile)
> >  {
> > +#if IS_ENABLED(CONFIG_DRM_XE_SVM)
> >  	struct xe_device *xe = tile_to_xe(tile);
> > +#endif
> 
> Avoid using conditional compilation in this way inside functions if
> possible, please see below:

Below is doable.

For above, do you want something like:

#if IS_ENABLED(CONFIG_DRM_XE_SVM)
     struct xe_device *xe;
#endif

If (IS_ENABLED(CONFIG_DRM_XE_SVM))
	Xe = tile_to_xe(tile);

It looks like we have to use #if in above code... otherwise, the definition of xe will be not used when DRM_XE_SVM is not defined...

Maybe keep the original for above?
> 
> >  	int err;
> >
> >  	xe_device_mem_access_get(tile_to_xe(tile));
> > @@ -177,8 +179,10 @@ int xe_tile_init_noalloc(struct xe_tile *tile)
> >
> >  	xe_tile_sysfs_init(tile);
> >
> > +#if IS_ENABLED(CONFIG_DRM_XE_SVM)
> >  	if (xe->info.has_usm)
> 
> Could be:
> 	if (IS_ENABLED(CONFIG_DRM_XE_SVM) && tile_to_xe(tile)-
> >info.has_usm)

Will fix.

Oak
> 
> /Thomas
> 
> 
> >  		xe_devm_add(tile, &tile->mem.vram);
> > +#endif
> >  err_mem_access:
> >  	xe_device_mem_access_put(tile_to_xe(tile));
> >  	return err;


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 6/8] drm/xe: Helper to populate a userptr or hmmptr
  2024-03-19 10:33   ` Hellstrom, Thomas
  2024-03-19 10:45     ` Hellstrom, Thomas
  2024-03-19 16:48     ` Matthew Brost
@ 2024-03-19 20:45     ` Zeng, Oak
  2 siblings, 0 replies; 21+ messages in thread
From: Zeng, Oak @ 2024-03-19 20:45 UTC (permalink / raw
  To: Hellstrom, Thomas, intel-xe@lists.freedesktop.org
  Cc: Brost, Matthew, Welty, Brian, Ghimiray, Himal Prasad



> -----Original Message-----
> From: Hellstrom, Thomas <thomas.hellstrom@intel.com>
> Sent: Tuesday, March 19, 2024 6:34 AM
> To: intel-xe@lists.freedesktop.org; Zeng, Oak <oak.zeng@intel.com>
> Cc: Brost, Matthew <matthew.brost@intel.com>; Welty, Brian
> <brian.welty@intel.com>; Ghimiray, Himal Prasad
> <himal.prasad.ghimiray@intel.com>
> Subject: Re: [PATCH 6/8] drm/xe: Helper to populate a userptr or hmmptr
> 
> On Mon, 2024-03-18 at 22:55 -0400, Oak Zeng wrote:
> > Add a helper function xe_userptr_populate_range to populate
> > a a userptr or hmmptr range. This functions calls hmm_range_fault
> > to read CPU page tables and populate all pfns/pages of this
> > virtual address range.
> >
> > If the populated page is system memory page, dma-mapping is performed
> > to get a dma-address which can be used later for GPU to access pages.
> >
> > If the populated page is device private page, we calculate the dpa (
> > device physical address) of the page.
> >
> > The dma-address or dpa is then saved in userptr's sg table. This is
> > prepare work to replace the get_user_pages_fast code in userptr code
> > path. The helper function will also be used to populate hmmptr later.
> >
> > v1: Address review comments:
> >     separate a npage_in_range function (Matt)
> >     reparameterize function xe_userptr_populate_range function (Matt)
> >     move mmu_interval_read_begin() call into while loop (Thomas)
> >     s/mark_range_accessed/xe_mark_range_accessed (Thomas)
> >     use set_page_dirty_lock (vs set_page_dirty) (Thomas)
> >     move a few checking in xe_vma_userptr_pin_pages to hmm.c (Matt)
> >
> > Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> > Co-developed-by: Niranjana Vishwanathapura
> > <niranjana.vishwanathapura@intel.com>
> > Signed-off-by: Niranjana Vishwanathapura
> > <niranjana.vishwanathapura@intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> > Cc: Brian Welty <brian.welty@intel.com>
> > ---
> >  drivers/gpu/drm/xe/Makefile |   1 +
> >  drivers/gpu/drm/xe/xe_hmm.c | 231
> > ++++++++++++++++++++++++++++++++++++
> >  drivers/gpu/drm/xe/xe_hmm.h |  10 ++
> >  3 files changed, 242 insertions(+)
> >  create mode 100644 drivers/gpu/drm/xe/xe_hmm.c
> >  create mode 100644 drivers/gpu/drm/xe/xe_hmm.h
> >
> > diff --git a/drivers/gpu/drm/xe/Makefile
> > b/drivers/gpu/drm/xe/Makefile
> > index e2ec6d1375c0..52300de1e86f 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -101,6 +101,7 @@ xe-y += xe_bb.o \
> >  	xe_guc_pc.o \
> >  	xe_guc_submit.o \
> >  	xe_heci_gsc.o \
> > +	xe_hmm.o \
> >  	xe_hw_engine.o \
> >  	xe_hw_engine_class_sysfs.o \
> >  	xe_hw_fence.o \
> > diff --git a/drivers/gpu/drm/xe/xe_hmm.c
> > b/drivers/gpu/drm/xe/xe_hmm.c
> > new file mode 100644
> > index 000000000000..305e3f2e659b
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_hmm.c
> > @@ -0,0 +1,231 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2024 Intel Corporation
> > + */
> > +
> > +#include <linux/mmu_notifier.h>
> > +#include <linux/dma-mapping.h>
> > +#include <linux/memremap.h>
> > +#include <linux/swap.h>
> > +#include <linux/hmm.h>
> > +#include <linux/mm.h>
> > +#include "xe_hmm.h"
> > +#include "xe_svm.h"
> > +#include "xe_vm.h"
> > +
> > +static inline u64 npages_in_range(unsigned long start, unsigned long
> > end)
> > +{
> > +	return ((end - 1) >> PAGE_SHIFT) - (start >> PAGE_SHIFT) +
> > 1;
> > +}
> > +
> > +/**
> > + * xe_mark_range_accessed() - mark a range is accessed, so core mm
> > + * have such information for memory eviction or write back to
> > + * hard disk
> > + *
> > + * @range: the range to mark
> > + * @write: if write to this range, we mark pages in this range
> > + * as dirty
> > + */
> > +static void xe_mark_range_accessed(struct hmm_range *range, bool
> > write)
> > +{
> > +	struct page *page;
> > +	u64 i, npages;
> > +
> > +	npages = npages_in_range(range->start, range->end);
> > +	for (i = 0; i < npages; i++) {
> > +		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> > +		if (write)
> > +			set_page_dirty_lock(page);
> > +
> > +		mark_page_accessed(page);
> > +	}
> > +}
> > +
> > +/**
> > + * build_sg() - build a scatter gather table for all the physical
> > pages/pfn
> > + * in a hmm_range. dma-address is save in sg table and will be used
> > to program
> > + * GPU page table later.
> > + *
> > + * @xe: the xe device who will access the dma-address in sg table
> > + * @range: the hmm range that we build the sg table from. range-
> > >hmm_pfns[]
> > + * has the pfn numbers of pages that back up this hmm address range.
> > + * @st: pointer to the sg table.
> > + * @write: whether we write to this range. This decides dma map
> > direction
> > + * for system pages. If write we map it bi-diretional; otherwise
> > + * DMA_TO_DEVICE
> > + *
> > + * All the contiguous pfns will be collapsed into one entry in
> > + * the scatter gather table. This is for the convenience of
> > + * later on operations to bind address range to GPU page table.
> > + *
> > + * The dma_address in the sg table will later be used by GPU to
> > + * access memory. So if the memory is system memory, we need to
> > + * do a dma-mapping so it can be accessed by GPU/DMA. If the memory
> > + * is GPU local memory (of the GPU who is going to access memory),
> > + * we need gpu dpa (device physical address), and there is no need
> > + * of dma-mapping.
> > + *
> > + * FIXME: dma-mapping for peer gpu device to access remote gpu's
> > + * memory. Add this when you support p2p
> > + *
> > + * This function allocates the storage of the sg table. It is
> > + * caller's responsibility to free it calling sg_free_table.
> > + *
> > + * Returns 0 if successful; -ENOMEM if fails to allocate memory
> > + */
> > +static int build_sg(struct xe_device *xe, struct hmm_range *range,
> > +			     struct sg_table *st, bool write)
> > +{
> > +	struct device *dev = xe->drm.dev;
> > +	struct scatterlist *sg;
> > +	u64 i, npages;
> > +
> > +	sg = NULL;
> > +	st->nents = 0;
> > +	npages = npages_in_range(range->start, range->end);
> > +
> > +	if (unlikely(sg_alloc_table(st, npages, GFP_KERNEL)))
> > +		return -ENOMEM;
> > +
> > +	for (i = 0; i < npages; i++) {
> > +		struct page *page;
> > +		unsigned long addr;
> > +		struct xe_mem_region *mr;
> > +
> > +		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> > +		if (is_device_private_page(page)) {
> > +			mr = xe_page_to_mem_region(page);
> > +			addr = xe_mem_region_pfn_to_dpa(mr, range-
> > >hmm_pfns[i]);
> > +		} else {
> > +			addr = dma_map_page(dev, page, 0, PAGE_SIZE,
> > +					write ? DMA_BIDIRECTIONAL :
> > DMA_TO_DEVICE);
> > +		}
> 
> The dma_map_page call here essentially stops the iommu from coalescing
> the page-table into contigous iommu VAs, and stops us from using huge
> page-table-entries for the PPGTT. IMO we should investigate whether we
> could build a single scatter-gather using sg_alloc_table_from_pages()
> or perhaps look at sg_alloc_append_table_from_pages(), and then use
> dma_map_sg() on contiguous chunks of system pages within that sg-table.


Yes that make sense. Will do what you and Matt suggested.
Oak
> 
> /Thomas
> 
> 
> > +
> > +		if (sg && (addr == (sg_dma_address(sg) + sg-
> > >length))) {
> > +			sg->length += PAGE_SIZE;
> > +			sg_dma_len(sg) += PAGE_SIZE;
> > +			continue;
> > +		}
> > +
> > +		sg =  sg ? sg_next(sg) : st->sgl;
> > +		sg_dma_address(sg) = addr;
> > +		sg_dma_len(sg) = PAGE_SIZE;
> > +		sg->length = PAGE_SIZE;
> > +		st->nents++;
> > +	}
> > +
> > +	sg_mark_end(sg);
> > +	return 0;
> > +}
> > +
> > +/**
> > + * xe_userptr_populate_range() - Populate physical pages of a
> > virtual
> > + * address range
> > + *
> > + * @uvma: userptr vma which has information of the range to
> > populate.
> > + *
> > + * This function populate the physical pages of a virtual
> > + * address range. The populated physical pages is saved in
> > + * userptr's sg table. It is similar to get_user_pages but call
> > + * hmm_range_fault.
> > + *
> > + * This function also read mmu notifier sequence # (
> > + * mmu_interval_read_begin), for the purpose of later
> > + * comparison (through mmu_interval_read_retry).
> > + *
> > + * This must be called with mmap read or write lock held.
> > + *
> > + * This function allocates the storage of the userptr sg table.
> > + * It is caller's responsibility to free it calling sg_free_table.
> > + *
> > + * returns: 0 for succuss; negative error no on failure
> > + */
> > +int xe_userptr_populate_range(struct xe_userptr_vma *uvma)
> > +{
> > +	unsigned long timeout =
> > +		jiffies +
> > msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
> > +	unsigned long *pfns, flags = HMM_PFN_REQ_FAULT;
> > +	struct xe_userptr *userptr;
> > +	struct xe_vma *vma = &uvma->vma;
> > +	u64 start = xe_vma_userptr(vma);
> > +	u64 end = start + xe_vma_size(vma);
> > +	struct xe_vm *vm = xe_vma_vm(vma);
> > +	struct hmm_range hmm_range;
> > +	bool write = !xe_vma_read_only(vma);
> > +	bool in_kthread = !current->mm;
> > +	u64 npages;
> > +	int ret;
> > +
> > +	userptr = &uvma->userptr;
> > +	mmap_assert_locked(userptr->notifier.mm);
> > +
> > +	if (vma->gpuva.flags & XE_VMA_DESTROYED)
> > +		return 0;
> > +
> > +	npages = npages_in_range(start, end);
> > +	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
> > +	if (unlikely(!pfns))
> > +		return -ENOMEM;
> > +
> > +	if (write)
> > +		flags |= HMM_PFN_REQ_WRITE;
> > +
> > +	if (in_kthread) {
> > +		if (!mmget_not_zero(userptr->notifier.mm)) {
> > +			ret = -EFAULT;
> > +			goto free_pfns;
> > +		}
> > +		kthread_use_mm(userptr->notifier.mm);
> > +	}
> > +
> > +	memset64((u64 *)pfns, (u64)flags, npages);
> > +	hmm_range.hmm_pfns = pfns;
> > +	hmm_range.notifier = &userptr->notifier;
> > +	hmm_range.start = start;
> > +	hmm_range.end = end;
> > +	hmm_range.pfn_flags_mask = HMM_PFN_REQ_FAULT |
> > HMM_PFN_REQ_WRITE;
> > +	/**
> > +	 * FIXME:
> > +	 * Set the the dev_private_owner can prevent hmm_range_fault
> > to fault
> > +	 * in the device private pages owned by caller. See function
> > +	 * hmm_vma_handle_pte. In multiple GPU case, this should be
> > set to the
> > +	 * device owner of the best migration destination. e.g.,
> > device0/vm0
> > +	 * has a page fault, but we have determined the best
> > placement of
> > +	 * the fault address should be on device1, we should set
> > below to
> > +	 * device1 instead of device0.
> > +	 */
> > +	hmm_range.dev_private_owner = vm->xe;
> > +
> > +	while (true) {
> > +		hmm_range.notifier_seq =
> > mmu_interval_read_begin(&userptr->notifier);
> > +		ret = hmm_range_fault(&hmm_range);
> > +		if (time_after(jiffies, timeout))
> > +			break;
> > +
> > +		if (ret == -EBUSY)
> > +			continue;
> > +		break;
> > +	}
> > +
> > +	if (in_kthread) {
> > +		kthread_unuse_mm(userptr->notifier.mm);
> > +		mmput(userptr->notifier.mm);
> > +	}
> > +
> > +	if (ret)
> > +		goto free_pfns;
> > +
> > +	ret = build_sg(vm->xe, &hmm_range, &userptr->sgt, write);
> > +	if (ret)
> > +		goto free_pfns;
> > +
> > +	xe_mark_range_accessed(&hmm_range, write);
> > +	userptr->sg = &userptr->sgt;
> > +	userptr->notifier_seq = hmm_range.notifier_seq;
> > +
> > +free_pfns:
> > +	kvfree(pfns);
> > +	return ret;
> > +}
> > +
> > diff --git a/drivers/gpu/drm/xe/xe_hmm.h
> > b/drivers/gpu/drm/xe/xe_hmm.h
> > new file mode 100644
> > index 000000000000..fa5ddc11f10b
> > --- /dev/null
> > +++ b/drivers/gpu/drm/xe/xe_hmm.h
> > @@ -0,0 +1,10 @@
> > +// SPDX-License-Identifier: MIT
> > +/*
> > + * Copyright © 2024 Intel Corporation
> > + */
> > +
> > +#include <linux/types.h>
> > +
> > +struct xe_userptr_vma;
> > +
> > +int xe_userptr_populate_range(struct xe_userptr_vma *uvma);


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 6/8] drm/xe: Helper to populate a userptr or hmmptr
  2024-03-19 10:45     ` Hellstrom, Thomas
@ 2024-03-19 20:56       ` Zeng, Oak
  0 siblings, 0 replies; 21+ messages in thread
From: Zeng, Oak @ 2024-03-19 20:56 UTC (permalink / raw
  To: Hellstrom, Thomas, intel-xe@lists.freedesktop.org
  Cc: Brost, Matthew, Welty, Brian, Ghimiray, Himal Prasad



> -----Original Message-----
> From: Hellstrom, Thomas <thomas.hellstrom@intel.com>
> Sent: Tuesday, March 19, 2024 6:45 AM
> To: intel-xe@lists.freedesktop.org; Zeng, Oak <oak.zeng@intel.com>
> Cc: Brost, Matthew <matthew.brost@intel.com>; Welty, Brian
> <brian.welty@intel.com>; Ghimiray, Himal Prasad
> <himal.prasad.ghimiray@intel.com>
> Subject: Re: [PATCH 6/8] drm/xe: Helper to populate a userptr or hmmptr
> 
> Some more comments inline:
> 
> On Tue, 2024-03-19 at 10:33 +0000, Hellstrom, Thomas wrote:
> > On Mon, 2024-03-18 at 22:55 -0400, Oak Zeng wrote:
> > > Add a helper function xe_userptr_populate_range to populate
> > > a a userptr or hmmptr range. This functions calls hmm_range_fault
> > > to read CPU page tables and populate all pfns/pages of this
> > > virtual address range.
> > >
> > > If the populated page is system memory page, dma-mapping is
> > > performed
> > > to get a dma-address which can be used later for GPU to access
> > > pages.
> > >
> > > If the populated page is device private page, we calculate the dpa
> > > (
> > > device physical address) of the page.
> > >
> > > The dma-address or dpa is then saved in userptr's sg table. This is
> > > prepare work to replace the get_user_pages_fast code in userptr
> > > code
> > > path. The helper function will also be used to populate hmmptr
> > > later.
> > >
> > > v1: Address review comments:
> > >     separate a npage_in_range function (Matt)
> > >     reparameterize function xe_userptr_populate_range function
> > > (Matt)
> > >     move mmu_interval_read_begin() call into while loop (Thomas)
> > >     s/mark_range_accessed/xe_mark_range_accessed (Thomas)
> > >     use set_page_dirty_lock (vs set_page_dirty) (Thomas)
> > >     move a few checking in xe_vma_userptr_pin_pages to hmm.c (Matt)
> > >
> > > Signed-off-by: Oak Zeng <oak.zeng@intel.com>
> > > Co-developed-by: Niranjana Vishwanathapura
> > > <niranjana.vishwanathapura@intel.com>
> > > Signed-off-by: Niranjana Vishwanathapura
> > > <niranjana.vishwanathapura@intel.com>
> > > Cc: Matthew Brost <matthew.brost@intel.com>
> > > Cc: Thomas Hellström <thomas.hellstrom@intel.com>
> > > Cc: Brian Welty <brian.welty@intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/Makefile |   1 +
> > >  drivers/gpu/drm/xe/xe_hmm.c | 231
> > > ++++++++++++++++++++++++++++++++++++
> > >  drivers/gpu/drm/xe/xe_hmm.h |  10 ++
> > >  3 files changed, 242 insertions(+)
> > >  create mode 100644 drivers/gpu/drm/xe/xe_hmm.c
> > >  create mode 100644 drivers/gpu/drm/xe/xe_hmm.h
> > >
> > > diff --git a/drivers/gpu/drm/xe/Makefile
> > > b/drivers/gpu/drm/xe/Makefile
> > > index e2ec6d1375c0..52300de1e86f 100644
> > > --- a/drivers/gpu/drm/xe/Makefile
> > > +++ b/drivers/gpu/drm/xe/Makefile
> > > @@ -101,6 +101,7 @@ xe-y += xe_bb.o \
> > >  	xe_guc_pc.o \
> > >  	xe_guc_submit.o \
> > >  	xe_heci_gsc.o \
> > > +	xe_hmm.o \
> > >  	xe_hw_engine.o \
> > >  	xe_hw_engine_class_sysfs.o \
> > >  	xe_hw_fence.o \
> > > diff --git a/drivers/gpu/drm/xe/xe_hmm.c
> > > b/drivers/gpu/drm/xe/xe_hmm.c
> > > new file mode 100644
> > > index 000000000000..305e3f2e659b
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/xe/xe_hmm.c
> > > @@ -0,0 +1,231 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright © 2024 Intel Corporation
> > > + */
> > > +
> > > +#include <linux/mmu_notifier.h>
> > > +#include <linux/dma-mapping.h>
> > > +#include <linux/memremap.h>
> > > +#include <linux/swap.h>
> > > +#include <linux/hmm.h>
> > > +#include <linux/mm.h>
> > > +#include "xe_hmm.h"
> > > +#include "xe_svm.h"
> > > +#include "xe_vm.h"
> > > +
> > > +static inline u64 npages_in_range(unsigned long start, unsigned
> 
> Avoid "inline" in C files. The compiler should be able to figure that
> out. Also update naming?

Will fix

> 
> > > long
> > > end)
> > > +{
> > > +	return ((end - 1) >> PAGE_SHIFT) - (start >> PAGE_SHIFT) +
> > > 1;
> 
> (PAGE_ALIGN(end) - PAGE_ALIGN_DOWN(start)) >> PAGE_SHIFT ?

Yes, that is better.
> 
> > > +}
> > > +
> > > +/**
> > > + * xe_mark_range_accessed() - mark a range is accessed, so core mm
> > > + * have such information for memory eviction or write back to
> > > + * hard disk
> > > + *
> > > + * @range: the range to mark
> > > + * @write: if write to this range, we mark pages in this range
> > > + * as dirty
> > > + */
> > > +static void xe_mark_range_accessed(struct hmm_range *range, bool
> > > write)
> > > +{
> > > +	struct page *page;
> > > +	u64 i, npages;
> > > +
> > > +	npages = npages_in_range(range->start, range->end);
> > > +	for (i = 0; i < npages; i++) {
> > > +		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> > > +		if (write)
> > > +			set_page_dirty_lock(page);
> > > +
> > > +		mark_page_accessed(page);
> > > +	}
> > > +}
> > > +
> > > +/**
> > > + * build_sg() - build a scatter gather table for all the physical
> > > pages/pfn
> > > + * in a hmm_range. dma-address is save in sg table and will be
> > > used
> > > to program
> > > + * GPU page table later.
> > > + *
> > > + * @xe: the xe device who will access the dma-address in sg table
> > > + * @range: the hmm range that we build the sg table from. range-
> > > > hmm_pfns[]
> > > + * has the pfn numbers of pages that back up this hmm address
> > > range.
> > > + * @st: pointer to the sg table.
> > > + * @write: whether we write to this range. This decides dma map
> > > direction
> > > + * for system pages. If write we map it bi-diretional; otherwise
> > > + * DMA_TO_DEVICE
> > > + *
> > > + * All the contiguous pfns will be collapsed into one entry in
> > > + * the scatter gather table. This is for the convenience of
> > > + * later on operations to bind address range to GPU page table.
> > > + *
> > > + * The dma_address in the sg table will later be used by GPU to
> > > + * access memory. So if the memory is system memory, we need to
> > > + * do a dma-mapping so it can be accessed by GPU/DMA. If the
> > > memory
> > > + * is GPU local memory (of the GPU who is going to access memory),
> > > + * we need gpu dpa (device physical address), and there is no need
> > > + * of dma-mapping.
> > > + *
> > > + * FIXME: dma-mapping for peer gpu device to access remote gpu's
> > > + * memory. Add this when you support p2p
> > > + *
> > > + * This function allocates the storage of the sg table. It is
> > > + * caller's responsibility to free it calling sg_free_table.
> > > + *
> > > + * Returns 0 if successful; -ENOMEM if fails to allocate memory
> > > + */
> > > +static int build_sg(struct xe_device *xe, struct hmm_range *range,
> > > +			     struct sg_table *st, bool write)
> 
> Also naming here.

Will fix
> 
> > > +{
> > > +	struct device *dev = xe->drm.dev;
> > > +	struct scatterlist *sg;
> > > +	u64 i, npages;
> > > +
> > > +	sg = NULL;
> > > +	st->nents = 0;
> > > +	npages = npages_in_range(range->start, range->end);
> > > +
> > > +	if (unlikely(sg_alloc_table(st, npages, GFP_KERNEL)))
> > > +		return -ENOMEM;
> > > +
> > > +	for (i = 0; i < npages; i++) {
> > > +		struct page *page;
> > > +		unsigned long addr;
> > > +		struct xe_mem_region *mr;
> > > +
> > > +		page = hmm_pfn_to_page(range->hmm_pfns[i]);
> > > +		if (is_device_private_page(page)) {
> > > +			mr = xe_page_to_mem_region(page);
> > > +			addr = xe_mem_region_pfn_to_dpa(mr, range-
> > > > hmm_pfns[i]);
> > > +		} else {
> > > +			addr = dma_map_page(dev, page, 0,
> > > PAGE_SIZE,
> > > +					write ? DMA_BIDIRECTIONAL
> > > :
> > > DMA_TO_DEVICE);
> > > +		}
> >
> > The dma_map_page call here essentially stops the iommu from
> > coalescing
> > the page-table into contigous iommu VAs, and stops us from using huge
> > page-table-entries for the PPGTT. IMO we should investigate whether
> > we
> > could build a single scatter-gather using sg_alloc_table_from_pages()
> > or perhaps look at sg_alloc_append_table_from_pages(), and then use
> > dma_map_sg() on contiguous chunks of system pages within that sg-
> > table.
> >
> > /Thomas
> >
> >
> > > +
> > > +		if (sg && (addr == (sg_dma_address(sg) + sg-
> > > > length))) {
> > > +			sg->length += PAGE_SIZE;
> > > +			sg_dma_len(sg) += PAGE_SIZE;
> > > +			continue;
> > > +		}
> > > +
> > > +		sg =  sg ? sg_next(sg) : st->sgl;
> > > +		sg_dma_address(sg) = addr;
> > > +		sg_dma_len(sg) = PAGE_SIZE;
> > > +		sg->length = PAGE_SIZE;
> > > +		st->nents++;
> > > +	}
> > > +
> > > +	sg_mark_end(sg);
> > > +	return 0;
> > > +}
> > > +
> > > +/**
> > > + * xe_userptr_populate_range() - Populate physical pages of a
> > > virtual
> > > + * address range
> > > + *
> > > + * @uvma: userptr vma which has information of the range to
> > > populate.
> > > + *
> > > + * This function populate the physical pages of a virtual
> > > + * address range. The populated physical pages is saved in
> > > + * userptr's sg table. It is similar to get_user_pages but call
> > > + * hmm_range_fault.
> > > + *
> > > + * This function also read mmu notifier sequence # (
> > > + * mmu_interval_read_begin), for the purpose of later
> > > + * comparison (through mmu_interval_read_retry).
> > > + *
> > > + * This must be called with mmap read or write lock held.
> > > + *
> > > + * This function allocates the storage of the userptr sg table.
> > > + * It is caller's responsibility to free it calling sg_free_table.
> > > + *
> > > + * returns: 0 for succuss; negative error no on failure
> > > + */
> > > +int xe_userptr_populate_range(struct xe_userptr_vma *uvma)
> > > +{
> > > +	unsigned long timeout =
> > > +		jiffies +
> > > msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
> > > +	unsigned long *pfns, flags = HMM_PFN_REQ_FAULT;
> > > +	struct xe_userptr *userptr;
> > > +	struct xe_vma *vma = &uvma->vma;
> > > +	u64 start = xe_vma_userptr(vma);
> > > +	u64 end = start + xe_vma_size(vma);
> > > +	struct xe_vm *vm = xe_vma_vm(vma);
> > > +	struct hmm_range hmm_range;
> > > +	bool write = !xe_vma_read_only(vma);
> > > +	bool in_kthread = !current->mm;
> > > +	u64 npages;
> > > +	int ret;
> > > +
> > > +	userptr = &uvma->userptr;
> > > +	mmap_assert_locked(userptr->notifier.mm);
> > > +
> > > +	if (vma->gpuva.flags & XE_VMA_DESTROYED)
> > > +		return 0;
> > > +
> > > +	npages = npages_in_range(start, end);
> > > +	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
> > > +	if (unlikely(!pfns))
> > > +		return -ENOMEM;
> > > +
> > > +	if (write)
> > > +		flags |= HMM_PFN_REQ_WRITE;
> > > +
> > > +	if (in_kthread) {
> > > +		if (!mmget_not_zero(userptr->notifier.mm)) {
> > > +			ret = -EFAULT;
> > > +			goto free_pfns;
> > > +		}
> > > +		kthread_use_mm(userptr->notifier.mm);
> > > +	}
> > > +
> > > +	memset64((u64 *)pfns, (u64)flags, npages);
> > > +	hmm_range.hmm_pfns = pfns;
> > > +	hmm_range.notifier = &userptr->notifier;
> > > +	hmm_range.start = start;
> > > +	hmm_range.end = end;
> > > +	hmm_range.pfn_flags_mask = HMM_PFN_REQ_FAULT |
> > > HMM_PFN_REQ_WRITE;
> > > +	/**
> > > +	 * FIXME:
> > > +	 * Set the the dev_private_owner can prevent
> > > hmm_range_fault
> > > to fault
> > > +	 * in the device private pages owned by caller. See
> > > function
> > > +	 * hmm_vma_handle_pte. In multiple GPU case, this should
> > > be
> > > set to the
> > > +	 * device owner of the best migration destination. e.g.,
> > > device0/vm0
> > > +	 * has a page fault, but we have determined the best
> > > placement of
> > > +	 * the fault address should be on device1, we should set
> > > below to
> > > +	 * device1 instead of device0.
> > > +	 */
> > > +	hmm_range.dev_private_owner = vm->xe;
> > > +
> > > +	while (true) {
> > > +		hmm_range.notifier_seq =
> > > mmu_interval_read_begin(&userptr->notifier);
> > > +		ret = hmm_range_fault(&hmm_range);
> > > +		if (time_after(jiffies, timeout))
> > > +			break;
> > > +
> > > +		if (ret == -EBUSY)
> > > +			continue;
> > > +		break;
> > > +	}
> > > +
> > > +	if (in_kthread) {
> > > +		kthread_unuse_mm(userptr->notifier.mm);
> > > +		mmput(userptr->notifier.mm);
> > > +	}
> > > +
> > > +	if (ret)
> > > +		goto free_pfns;
> > > +
> > > +	ret = build_sg(vm->xe, &hmm_range, &userptr->sgt, write);
> > > +	if (ret)
> > > +		goto free_pfns;
> > > +
> > > +	xe_mark_range_accessed(&hmm_range, write);
> > > +	userptr->sg = &userptr->sgt;
> > > +	userptr->notifier_seq = hmm_range.notifier_seq;
> > > +
> > > +free_pfns:
> > > +	kvfree(pfns);
> > > +	return ret;
> > > +}
> > > +
> > > diff --git a/drivers/gpu/drm/xe/xe_hmm.h
> > > b/drivers/gpu/drm/xe/xe_hmm.h
> > > new file mode 100644
> > > index 000000000000..fa5ddc11f10b
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/xe/xe_hmm.h
> > > @@ -0,0 +1,10 @@
> > > +// SPDX-License-Identifier: MIT
> > > +/*
> > > + * Copyright © 2024 Intel Corporation
> > > + */
> > > +
> > > +#include <linux/types.h>
> > > +
> > > +struct xe_userptr_vma;
> > > +
> > > +int xe_userptr_populate_range(struct xe_userptr_vma *uvma);
> >


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2024-03-19 20:56 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-19  2:55 [PATCH 0/8] Use hmm_range_fault to populate user page Oak Zeng
2024-03-19  2:55 ` [PATCH 1/8] drm/xe/svm: Remap and provide memmap backing for GPU vram Oak Zeng
2024-03-19  2:55 ` [PATCH 2/8] drm/xe/svm: Add DRM_XE_SVM kernel config entry Oak Zeng
2024-03-19  9:25   ` Hellstrom, Thomas
2024-03-19 18:27     ` Zeng, Oak
2024-03-19  2:55 ` [PATCH 3/8] drm/xe: Helper to get tile from memory region Oak Zeng
2024-03-19  9:28   ` Hellstrom, Thomas
2024-03-19  2:55 ` [PATCH 4/8] drm/xe: Introduce a helper to get dpa from pfn Oak Zeng
2024-03-19  2:55 ` [PATCH 5/8] drm/xe/svm: Get xe memory region from page Oak Zeng
2024-03-19  2:55 ` [PATCH 6/8] drm/xe: Helper to populate a userptr or hmmptr Oak Zeng
2024-03-19 10:33   ` Hellstrom, Thomas
2024-03-19 10:45     ` Hellstrom, Thomas
2024-03-19 20:56       ` Zeng, Oak
2024-03-19 16:48     ` Matthew Brost
2024-03-19 17:59       ` Zeng, Oak
2024-03-19 20:45     ` Zeng, Oak
2024-03-19  2:55 ` [PATCH 7/8] drm/xe: Introduce a helper to free sg table Oak Zeng
2024-03-19  2:55 ` [PATCH 8/8] drm/xe: Use hmm_range_fault to populate user pages Oak Zeng
2024-03-19  4:09 ` ✓ CI.Patch_applied: success for Use hmm_range_fault to populate user page (rev2) Patchwork
2024-03-19  4:09 ` ✗ CI.checkpatch: warning " Patchwork
2024-03-19  4:10 ` ✗ CI.KUnit: failure " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.