From: Leon Romanovsky <leon@kernel.org> To: Christoph Hellwig <hch@lst.de>, Robin Murphy <robin.murphy@arm.com>, Marek Szyprowski <m.szyprowski@samsung.com>, Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>, Chaitanya Kulkarni <chaitanyak@nvidia.com> Cc: "Leon Romanovsky" <leonro@nvidia.com>, "Jonathan Corbet" <corbet@lwn.net>, "Jens Axboe" <axboe@kernel.dk>, "Keith Busch" <kbusch@kernel.org>, "Sagi Grimberg" <sagi@grimberg.me>, "Yishai Hadas" <yishaih@nvidia.com>, "Shameer Kolothum" <shameerali.kolothum.thodi@huawei.com>, "Kevin Tian" <kevin.tian@intel.com>, "Alex Williamson" <alex.williamson@redhat.com>, "Jérôme Glisse" <jglisse@redhat.com>, "Andrew Morton" <akpm@linux-foundation.org>, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, kvm@vger.kernel.org, linux-mm@kvack.org, "Bart Van Assche" <bvanassche@acm.org>, "Damien Le Moal" <damien.lemoal@opensource.wdc.com>, "Amir Goldstein" <amir73il@gmail.com>, "josef@toxicpanda.com" <josef@toxicpanda.com>, "Martin K. Petersen" <martin.petersen@oracle.com>, "daniel@iogearbox.net" <daniel@iogearbox.net>, "Dan Williams" <dan.j.williams@intel.com>, "jack@suse.com" <jack@suse.com>, "Zhu Yanjun" <zyjzyj2000@gmail.com> Subject: [RFC 01/16] mm/hmm: let users to tag specific PFNs Date: Tue, 5 Mar 2024 12:15:11 +0200 [thread overview] Message-ID: <a77609c9c9a09214e38b04133e44eee67fe50ab0.1709631413.git.leon@kernel.org> (raw) From: Leon Romanovsky <leonro@nvidia.com> Introduce new sticky flag, which isn't overwritten by HMM range fault. Such flag allows users to tag specific PFNs with extra data in addition to already filled by HMM. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> --- include/linux/hmm.h | 3 +++ mm/hmm.c | 34 +++++++++++++++++++++------------- 2 files changed, 24 insertions(+), 13 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 126a36571667..b90902baa593 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -23,6 +23,7 @@ struct mmu_interval_notifier; * HMM_PFN_WRITE - if the page memory can be written to (requires HMM_PFN_VALID) * HMM_PFN_ERROR - accessing the pfn is impossible and the device should * fail. ie poisoned memory, special pages, no vma, etc + * HMM_PFN_STICKY - Flag preserved on input-to-output transformation * * On input: * 0 - Return the current state of the page, do not fault it. @@ -36,6 +37,8 @@ enum hmm_pfn_flags { HMM_PFN_VALID = 1UL << (BITS_PER_LONG - 1), HMM_PFN_WRITE = 1UL << (BITS_PER_LONG - 2), HMM_PFN_ERROR = 1UL << (BITS_PER_LONG - 3), + /* Sticky lag, carried from Input to Output */ + HMM_PFN_STICKY = 1UL << (BITS_PER_LONG - 7), HMM_PFN_ORDER_SHIFT = (BITS_PER_LONG - 8), /* Input flags */ diff --git a/mm/hmm.c b/mm/hmm.c index 277ddcab4947..9645a72beec0 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -44,8 +44,10 @@ static int hmm_pfns_fill(unsigned long addr, unsigned long end, { unsigned long i = (addr - range->start) >> PAGE_SHIFT; - for (; addr < end; addr += PAGE_SIZE, i++) - range->hmm_pfns[i] = cpu_flags; + for (; addr < end; addr += PAGE_SIZE, i++) { + range->hmm_pfns[i] &= HMM_PFN_STICKY; + range->hmm_pfns[i] |= cpu_flags; + } return 0; } @@ -202,8 +204,10 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr, return hmm_vma_fault(addr, end, required_fault, walk); pfn = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) - hmm_pfns[i] = pfn | cpu_flags; + for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) { + hmm_pfns[i] &= HMM_PFN_STICKY; + hmm_pfns[i] |= pfn | cpu_flags; + } return 0; } #else /* CONFIG_TRANSPARENT_HUGEPAGE */ @@ -236,7 +240,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0); if (required_fault) goto fault; - *hmm_pfn = 0; + *hmm_pfn = *hmm_pfn & HMM_PFN_STICKY; return 0; } @@ -253,14 +257,14 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, cpu_flags = HMM_PFN_VALID; if (is_writable_device_private_entry(entry)) cpu_flags |= HMM_PFN_WRITE; - *hmm_pfn = swp_offset_pfn(entry) | cpu_flags; + *hmm_pfn = (*hmm_pfn & HMM_PFN_STICKY) | swp_offset_pfn(entry) | cpu_flags; return 0; } required_fault = hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0); if (!required_fault) { - *hmm_pfn = 0; + *hmm_pfn = *hmm_pfn & HMM_PFN_STICKY; return 0; } @@ -304,11 +308,11 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, pte_unmap(ptep); return -EFAULT; } - *hmm_pfn = HMM_PFN_ERROR; + *hmm_pfn = (*hmm_pfn & HMM_PFN_STICKY) | HMM_PFN_ERROR; return 0; } - *hmm_pfn = pte_pfn(pte) | cpu_flags; + *hmm_pfn = (*hmm_pfn & HMM_PFN_STICKY) | pte_pfn(pte) | cpu_flags; return 0; fault: @@ -453,8 +457,10 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end, } pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); - for (i = 0; i < npages; ++i, ++pfn) - hmm_pfns[i] = pfn | cpu_flags; + for (i = 0; i < npages; ++i, ++pfn) { + hmm_pfns[i] &= HMM_PFN_STICKY; + hmm_pfns[i] |= pfn | cpu_flags; + } goto out_unlock; } @@ -512,8 +518,10 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, } pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT); - for (; addr < end; addr += PAGE_SIZE, i++, pfn++) - range->hmm_pfns[i] = pfn | cpu_flags; + for (; addr < end; addr += PAGE_SIZE, i++, pfn++) { + range->hmm_pfns[i] &= HMM_PFN_STICKY; + range->hmm_pfns[i] |= pfn | cpu_flags; + } spin_unlock(ptl); return 0; -- 2.44.0
WARNING: multiple messages have this Message-ID (diff)
From: Leon Romanovsky <leon@kernel.org> To: Christoph Hellwig <hch@lst.de>, Robin Murphy <robin.murphy@arm.com>, Marek Szyprowski <m.szyprowski@samsung.com>, Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>, Chaitanya Kulkarni <chaitanyak@nvidia.com> Cc: "Leon Romanovsky" <leonro@nvidia.com>, "Jonathan Corbet" <corbet@lwn.net>, "Jens Axboe" <axboe@kernel.dk>, "Keith Busch" <kbusch@kernel.org>, "Sagi Grimberg" <sagi@grimberg.me>, "Yishai Hadas" <yishaih@nvidia.com>, "Shameer Kolothum" <shameerali.kolothum.thodi@huawei.com>, "Kevin Tian" <kevin.tian@intel.com>, "Alex Williamson" <alex.williamson@redhat.com>, "Jérôme Glisse" <jglisse@redhat.com>, "Andrew Morton" <akpm@linux-foundation.org>, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-rdma@vger.kernel.org, iommu@lists.linux.dev, linux-nvme@lists.infradead.org, kvm@vger.kernel.org, linux-mm@kvack.org, "Bart Van Assche" <bvanassche@acm.org>, "Damien Le Moal" <damien.lemoal@opensource.wdc.com>, "Amir Goldstein" <amir73il@gmail.com>, "josef@toxicpanda.com" <josef@toxicpanda.com>, "Martin K. Petersen" <martin.petersen@oracle.com>, "daniel@iogearbox.net" <daniel@iogearbox.net>, "Dan Williams" <dan.j.williams@intel.com>, "jack@suse.com" <jack@suse.com>, "Zhu Yanjun" <zyjzyj2000@gmail.com> Subject: [RFC 01/16] mm/hmm: let users to tag specific PFNs Date: Tue, 5 Mar 2024 12:22:02 +0200 [thread overview] Message-ID: <a77609c9c9a09214e38b04133e44eee67fe50ab0.1709631413.git.leon@kernel.org> (raw) Message-ID: <20240305102202.q7de1SMFDRfVPj-GGbKTZ6qb9f1bKms-qCTBnko1ryo@z> (raw) In-Reply-To: <cover.1709631800.git.leon@kernel.org> From: Leon Romanovsky <leonro@nvidia.com> Introduce new sticky flag, which isn't overwritten by HMM range fault. Such flag allows users to tag specific PFNs with extra data in addition to already filled by HMM. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> --- include/linux/hmm.h | 3 +++ mm/hmm.c | 34 +++++++++++++++++++++------------- 2 files changed, 24 insertions(+), 13 deletions(-) diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 126a36571667..b90902baa593 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -23,6 +23,7 @@ struct mmu_interval_notifier; * HMM_PFN_WRITE - if the page memory can be written to (requires HMM_PFN_VALID) * HMM_PFN_ERROR - accessing the pfn is impossible and the device should * fail. ie poisoned memory, special pages, no vma, etc + * HMM_PFN_STICKY - Flag preserved on input-to-output transformation * * On input: * 0 - Return the current state of the page, do not fault it. @@ -36,6 +37,8 @@ enum hmm_pfn_flags { HMM_PFN_VALID = 1UL << (BITS_PER_LONG - 1), HMM_PFN_WRITE = 1UL << (BITS_PER_LONG - 2), HMM_PFN_ERROR = 1UL << (BITS_PER_LONG - 3), + /* Sticky lag, carried from Input to Output */ + HMM_PFN_STICKY = 1UL << (BITS_PER_LONG - 7), HMM_PFN_ORDER_SHIFT = (BITS_PER_LONG - 8), /* Input flags */ diff --git a/mm/hmm.c b/mm/hmm.c index 277ddcab4947..9645a72beec0 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -44,8 +44,10 @@ static int hmm_pfns_fill(unsigned long addr, unsigned long end, { unsigned long i = (addr - range->start) >> PAGE_SHIFT; - for (; addr < end; addr += PAGE_SIZE, i++) - range->hmm_pfns[i] = cpu_flags; + for (; addr < end; addr += PAGE_SIZE, i++) { + range->hmm_pfns[i] &= HMM_PFN_STICKY; + range->hmm_pfns[i] |= cpu_flags; + } return 0; } @@ -202,8 +204,10 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr, return hmm_vma_fault(addr, end, required_fault, walk); pfn = pmd_pfn(pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT); - for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) - hmm_pfns[i] = pfn | cpu_flags; + for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) { + hmm_pfns[i] &= HMM_PFN_STICKY; + hmm_pfns[i] |= pfn | cpu_flags; + } return 0; } #else /* CONFIG_TRANSPARENT_HUGEPAGE */ @@ -236,7 +240,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0); if (required_fault) goto fault; - *hmm_pfn = 0; + *hmm_pfn = *hmm_pfn & HMM_PFN_STICKY; return 0; } @@ -253,14 +257,14 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, cpu_flags = HMM_PFN_VALID; if (is_writable_device_private_entry(entry)) cpu_flags |= HMM_PFN_WRITE; - *hmm_pfn = swp_offset_pfn(entry) | cpu_flags; + *hmm_pfn = (*hmm_pfn & HMM_PFN_STICKY) | swp_offset_pfn(entry) | cpu_flags; return 0; } required_fault = hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0); if (!required_fault) { - *hmm_pfn = 0; + *hmm_pfn = *hmm_pfn & HMM_PFN_STICKY; return 0; } @@ -304,11 +308,11 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, pte_unmap(ptep); return -EFAULT; } - *hmm_pfn = HMM_PFN_ERROR; + *hmm_pfn = (*hmm_pfn & HMM_PFN_STICKY) | HMM_PFN_ERROR; return 0; } - *hmm_pfn = pte_pfn(pte) | cpu_flags; + *hmm_pfn = (*hmm_pfn & HMM_PFN_STICKY) | pte_pfn(pte) | cpu_flags; return 0; fault: @@ -453,8 +457,10 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end, } pfn = pud_pfn(pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT); - for (i = 0; i < npages; ++i, ++pfn) - hmm_pfns[i] = pfn | cpu_flags; + for (i = 0; i < npages; ++i, ++pfn) { + hmm_pfns[i] &= HMM_PFN_STICKY; + hmm_pfns[i] |= pfn | cpu_flags; + } goto out_unlock; } @@ -512,8 +518,10 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask, } pfn = pte_pfn(entry) + ((start & ~hmask) >> PAGE_SHIFT); - for (; addr < end; addr += PAGE_SIZE, i++, pfn++) - range->hmm_pfns[i] = pfn | cpu_flags; + for (; addr < end; addr += PAGE_SIZE, i++, pfn++) { + range->hmm_pfns[i] &= HMM_PFN_STICKY; + range->hmm_pfns[i] |= pfn | cpu_flags; + } spin_unlock(ptl); return 0; -- 2.44.0
next prev reply other threads:[~2024-03-05 10:15 UTC|newest] Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top 2024-03-05 10:15 [RFC 00/16] Split IOMMU DMA mapping operation to two steps Leon Romanovsky 2024-03-05 10:15 ` Leon Romanovsky [this message] 2024-03-05 10:15 ` [RFC 02/16] dma-mapping: provide an interface to allocate IOVA Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:15 ` [RFC 03/16] dma-mapping: provide callbacks to link/unlink pages to specific IOVA Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:15 ` [RFC 04/16] iommu/dma: Provide an interface to allow preallocate IOVA Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:15 ` [RFC 05/16] iommu/dma: Prepare map/unmap page functions to receive IOVA Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:15 ` [RFC 06/16] iommu/dma: Implement link/unlink page callbacks Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:15 ` [RFC 07/16] RDMA/umem: Preallocate and cache IOVA for UMEM ODP Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:15 ` [RFC 08/16] RDMA/umem: Store ODP access mask information in PFN Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:15 ` [RFC 09/16] RDMA/core: Separate DMA mapping to caching IOVA and page linkage Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:15 ` [RFC 10/16] RDMA/umem: Prevent UMEM ODP creation with SWIOTLB Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:15 ` [RFC 11/16] vfio/mlx5: Explicitly use number of pages instead of allocated length Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:15 ` [RFC 12/16] vfio/mlx5: Rewrite create mkey flow to allow better code reuse Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:15 ` [RFC 13/16] vfio/mlx5: Explicitly store page list Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:15 ` [RFC 14/16] vfio/mlx5: Convert vfio to use DMA link API Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:15 ` [RFC 15/16] block: add dma_link_range() based API Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:15 ` [RFC 16/16] nvme-pci: use blk_rq_dma_map() for NVMe SGL Leon Romanovsky 2024-03-05 10:22 ` Leon Romanovsky 2024-03-05 10:22 ` [RFC 01/16] mm/hmm: let users to tag specific PFNs Leon Romanovsky 2024-03-05 10:22 ` [RFC 00/16] Split IOMMU DMA mapping operation to two steps Leon Romanovsky 2024-03-05 10:23 ` Leon Romanovsky
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=a77609c9c9a09214e38b04133e44eee67fe50ab0.1709631413.git.leon@kernel.org \ --to=leon@kernel.org \ --cc=akpm@linux-foundation.org \ --cc=alex.williamson@redhat.com \ --cc=amir73il@gmail.com \ --cc=axboe@kernel.dk \ --cc=bvanassche@acm.org \ --cc=chaitanyak@nvidia.com \ --cc=corbet@lwn.net \ --cc=damien.lemoal@opensource.wdc.com \ --cc=dan.j.williams@intel.com \ --cc=daniel@iogearbox.net \ --cc=hch@lst.de \ --cc=iommu@lists.linux.dev \ --cc=jack@suse.com \ --cc=jgg@ziepe.ca \ --cc=jglisse@redhat.com \ --cc=joro@8bytes.org \ --cc=josef@toxicpanda.com \ --cc=kbusch@kernel.org \ --cc=kevin.tian@intel.com \ --cc=kvm@vger.kernel.org \ --cc=leonro@nvidia.com \ --cc=linux-block@vger.kernel.org \ --cc=linux-doc@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=linux-nvme@lists.infradead.org \ --cc=linux-rdma@vger.kernel.org \ --cc=m.szyprowski@samsung.com \ --cc=martin.petersen@oracle.com \ --cc=robin.murphy@arm.com \ --cc=sagi@grimberg.me \ --cc=shameerali.kolothum.thodi@huawei.com \ --cc=will@kernel.org \ --cc=yishaih@nvidia.com \ --cc=zyjzyj2000@gmail.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).