cgroups.vger.kernel.org archive mirror
From: "Huang, Ying" <ying.huang@intel.com>
To: Yuanchu Xie <yuanchu@google.com>
Cc: David Hildenbrand <david@redhat.com>,
	 "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	 Khalid Aziz <khalid.aziz@oracle.com>,
	Henry Huang <henry.hj@antgroup.com>,  Yu Zhao <yuzhao@google.com>,
	 Dan Williams <dan.j.williams@intel.com>,
	 Gregory Price <gregory.price@memverge.com>,
	 Wei Xu <weixugc@google.com>,
	 David Rientjes <rientjes@google.com>,
	 Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	 "Rafael J. Wysocki" <rafael@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	 Michal Hocko <mhocko@kernel.org>,
	 Roman Gushchin <roman.gushchin@linux.dev>,
	 Muchun Song <muchun.song@linux.dev>,
	 Shuah Khan <shuah@kernel.org>,
	 Yosry Ahmed <yosryahmed@google.com>,
	 Matthew Wilcox <willy@infradead.org>,
	 Sudarshan Rajagopalan <quic_sudaraja@quicinc.com>,
	 Kairui Song <kasong@tencent.com>,
	 "Michael S. Tsirkin" <mst@redhat.com>,
	 Vasily Averin <vasily.averin@linux.dev>,
	Nhat Pham <nphamcs@gmail.com>,  Miaohe Lin <linmiaohe@huawei.com>,
	 Qi Zheng <zhengqi.arch@bytedance.com>,
	 Abel Wu <wuyun.abel@bytedance.com>,
	"Vishal Moola (Oracle)" <vishal.moola@gmail.com>,
	 Kefeng Wang <wangkefeng.wang@huawei.com>,
	 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	 cgroups@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: Re: [RFC PATCH v3 1/8] mm: multi-gen LRU: ignore non-leaf pmd_young for force_scan=true
Date: Wed, 10 Apr 2024 14:15:18 +0800
Message-ID: <87plux68w9.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <CAJj2-QEczZzon8AhO32_B=D2MAZG+1YWp0yrgSKQOChjQnN1OA@mail.gmail.com> (Yuanchu Xie's message of "Tue, 9 Apr 2024 15:36:04 -0700")

Yuanchu Xie <yuanchu@google.com> writes:

> On Mon, Apr 8, 2024 at 11:52 PM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Yuanchu Xie <yuanchu@google.com> writes:
>>
>> > When non-leaf pmd accessed bits are available, MGLRU page table walks
>> > can clear the accessed bit and promptly ignore the accessed bit on the
>> > pte because it's on a different node, so the walk does not update the
>> > generation of said page. When the next scan comes around on the right
>> > node, the non-leaf pmd accessed bit might remain cleared and the pte
>> > accessed bits won't be checked. While this is sufficient for
>> > reclaim-driven aging, where the goal is to select a reasonably cold
>> > page, the access can be missed when aging proactively for measuring the
>> > working set size of a node/memcg.
>> >
>> > Since force_scan disables various other optimizations, we check
>> > force_scan to ignore the non-leaf pmd accessed bit.
>> >
>> > Signed-off-by: Yuanchu Xie <yuanchu@google.com>
>> > ---
>> >  mm/vmscan.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> > index 4f9c854ce6cc..1a7c7d537db6 100644
>> > --- a/mm/vmscan.c
>> > +++ b/mm/vmscan.c
>> > @@ -3522,7 +3522,7 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end,
>> >
>> >               walk->mm_stats[MM_NONLEAF_TOTAL]++;
>> >
>> > -             if (should_clear_pmd_young()) {
>> > +             if (!walk->force_scan && should_clear_pmd_young()) {
>> >                       if (!pmd_young(val))
>> >                               continue;
>>
>> Sorry, I don't understand why we need this.  If !pmd_young(val), we
>> don't need to update the generation.  If pmd_young(val), the bloom
>> filter will be ignored if force_scan == true.  Or am I missing something?
> If !pmd_young(val), we still might need to update the generation.
>
> The get_pfn_folio function returns NULL if the folio's nid != node
> under scanning, so the pte accessed bit does not get cleared and the
> generation is not updated. Now the pmd_young flag of this pmd is
> cleared, and if none of the pte's are accessed before another round of
> scanning occurs on the folio's node, the pmd_young check fails and the
> pte accessed bit is skipped.
>
> This is fine for kswapd but can introduce inaccuracies when scanning
> proactively for workingset estimation.

Got it!  Thanks for the detailed explanation.  Could you add these
details to the patch description too?

It's unfortunate, because the PMD young check improves scanning
performance considerably.  It doesn't need to be addressed in this
patchset, but I hope we can find a way to bring it back at some point.
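
To make the scenario from the quoted explanation concrete, here is a
minimal userspace sketch of it.  It is only a toy model, not kernel code:
toy_pte, toy_pmd, toy_access, toy_walk and toy_scenario are hypothetical
names invented for illustration, the nid comparison merely stands in for
get_pfn_folio() returning NULL, and the bloom filter and all locking are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PTES_PER_PMD 4

/* Toy stand-ins for kernel state; none of these exist in mm/vmscan.c. */
struct toy_pte {
	bool young;	/* models the pte accessed bit */
	int nid;	/* node that holds the backing folio */
	int gen;	/* generation last assigned by a walk */
};

struct toy_pmd {
	bool young;	/* models the non-leaf pmd accessed bit */
	struct toy_pte pte[PTES_PER_PMD];
};

/* Model a CPU access: the accessed bit is set at both levels. */
static void toy_access(struct toy_pmd *pmd, size_t i)
{
	assert(i < PTES_PER_PMD);
	pmd->young = true;
	pmd->pte[i].young = true;
}

/* Model one aging walk over a single pmd on behalf of node @nid. */
static void toy_walk(struct toy_pmd *pmd, int nid, int new_gen, bool force_scan)
{
	if (!force_scan) {
		if (!pmd->young)
			return;		/* skip the whole pte table */
		pmd->young = false;	/* test and clear the pmd bit */
	}

	for (size_t i = 0; i < PTES_PER_PMD; i++) {
		struct toy_pte *pte = &pmd->pte[i];

		if (pte->nid != nid)
			continue;	/* stands in for get_pfn_folio() == NULL */
		if (!pte->young)
			continue;
		pte->young = false;
		pte->gen = new_gen;
	}
}

/*
 * A page on node 1 is accessed once, then node 0 is walked, then node 1.
 * Returns the generation the page ends up with (0 means "never updated").
 */
static int toy_scenario(bool force_scan)
{
	struct toy_pmd pmd = { .young = false };

	for (size_t i = 0; i < PTES_PER_PMD; i++)
		pmd.pte[i] = (struct toy_pte){ .young = false, .nid = 0, .gen = 0 };
	pmd.pte[2].nid = 1;

	toy_access(&pmd, 2);
	toy_walk(&pmd, 0, 1, force_scan);	/* clears pmd bit, skips the pte */
	toy_walk(&pmd, 1, 1, force_scan);	/* sees a clean pmd unless forced */

	return pmd.pte[2].gen;
}
```

In this model toy_scenario(false) leaves the page's generation at 0,
because the node 0 walk consumes the pmd accessed bit and the node 1 walk
then skips the pte table, while toy_scenario(true) updates it to 1.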

--
Best Regards,
Huang, Ying


Thread overview: 17+ messages
2024-03-27 21:30 [RFC PATCH v3 0/8] mm: workingset reporting Yuanchu Xie
2024-03-27 21:31 ` [RFC PATCH v3 1/8] mm: multi-gen LRU: ignore non-leaf pmd_young for force_scan=true Yuanchu Xie
2024-04-09  6:50   ` Huang, Ying
2024-04-09 22:36     ` Yuanchu Xie
2024-04-10  6:15       ` Huang, Ying [this message]
2024-03-27 21:31 ` [RFC PATCH v3 2/8] mm: aggregate working set information into histograms Yuanchu Xie
2024-04-09  7:18   ` Huang, Ying
2024-03-27 21:31 ` [RFC PATCH v3 3/8] mm: use refresh interval to rate-limit workingset report aggregation Yuanchu Xie
2024-03-27 21:31 ` [RFC PATCH v3 4/8] mm: report workingset during memory pressure driven scanning Yuanchu Xie
2024-03-27 21:31 ` [RFC PATCH v3 5/8] mm: extend working set reporting to memcgs Yuanchu Xie
2024-03-27 21:31 ` [RFC PATCH v3 6/8] mm: add per-memcg reaccess histogram Yuanchu Xie
2024-03-27 21:31 ` [RFC PATCH v3 7/8] mm: add kernel aging thread for workingset reporting Yuanchu Xie
2024-03-27 21:31 ` [RFC PATCH v3 8/8] mm: test system-wide " Yuanchu Xie
2024-03-29 19:43   ` Muhammad Usama Anjum
2024-03-27 21:44 ` [RFC PATCH v3 0/8] mm: " Gregory Price
2024-03-27 22:53   ` Yuanchu Xie
2024-03-29 17:28     ` Gregory Price
