Linux-mm Archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 4/4] mm: Make most walk page paths with pmd_trans_unstable() to retry
       [not found]       ` <CAHbLzkq-dE4B5k+4KV5YtSJRXf+x61V8iBte6Z=Afbh=_oCJtw@mail.gmail.com>
@ 2023-06-06 19:59         ` Peter Xu
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Xu @ 2023-06-06 19:59 UTC (permalink / raw
  To: Yang Shi
  Cc: linux-kernel, linux-mm, David Hildenbrand, Alistair Popple,
	Andrew Morton, Andrea Arcangeli, Kirill A . Shutemov,
	Johannes Weiner, John Hubbard, Naoya Horiguchi,
	Muhammad Usama Anjum, Hugh Dickins, Mike Rapoport

On Tue, Jun 06, 2023 at 12:12:03PM -0700, Yang Shi wrote:
> > > > @@ -1539,8 +1544,10 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
> > > >                 return err;
> > > >         }
> > > >
> > > > -       if (pmd_trans_unstable(pmdp))
> > > > +       if (pmd_trans_unstable(pmdp)) {
> > > > +               walk->action = ACTION_AGAIN;
> > > >                 return 0;
> > >
> > > Had a quick look at the pagemap code, I agree with your analysis,
> > > "returning 0" may mess up pagemap, retry should be fine. But I'm
> > > wondering whether we should just fill in empty entries. Anyway I don't
> > > have a  strong opinion on this, just a little bit concerned by
> > > potential indefinite retry.
> >
> > Yes, none pte is still an option.  But if we're going to fix this anyway,
> > it seems better to fix it with the accurate new thing that poped up, and
> > it's even less change (just apply walk->action rather than doing random
> > stuff in different call sites).
> >
> > I see that you have worry on deadloop over this, so I hope to discuss
> > altogether here.
> >
> > Unlike normal checks, pmd_trans_unstable() check means something must have
> > changed in the past very short period or it should just never if nothing
> > changed concurrently from under us, so it's not a "if (flag==true)" check
> > which is even more likely to loop.
> >
> > If we see the places that I didn't touch, most of them suggested a retry in
> > one form or another.  So if there's a worry this will also not the first
> > time to do a retry (and for such a "unstable" API, that's really the most
> > natural thing to do which is to retry until it's stable).
> 
> IIUC other than do_anonymous_page() suggests retry (retry page fault),
> others may not, for example:
>   - follow_pmd_mask: return -EBUSY

I assumed a -EBUSY would mean if the caller still needs the page it'll just
need to retry.

It's actually a very rare errno to return in follow page... e.g. some gup
callers may not even be able to handle -EBUSY afaiu, neither does the gup
core (__get_user_pages), afaiu it'll just forwarded upwards.

My bet is it's just so rare and only used with FOLL_SPLIT_PMD for now.  I
had a quick look, at least kvm handles -EBUSY as a real fault (hva_to_pfn,
where it should translate that -EBUSY into a KVM_PFN_ERR_FAULT), I think
it'll crash the hypervisor directly if returned from gup one day (not for
now if never with !FOLL_SPLIT_PMD)..

>   - wp_clean_pmd_entry: actually just retry for pmd_none case, but the
> pagewalk code does handle pmd_none by skipping it, so it basically
> just retry once

This one is very special IMHO, pmd_trans_unstable() should in most cases be
used after someone checked pmd value before walking ptes.  I had a feeling
it's kind of abused, to check whether it's a huge pmd in whatever format..
IMHO it should just use the other pmd_*() helpers but I won't touch it in
this series.

>   - min_core_pte_range: treated as unmapped range by calling
> __mincore_unmapped_range

Correct.

Also pmd_devmap_trans_unstable(), called in 3 call sites:

pmd_devmap_trans_unstable[1418] return pmd_devmap(*pmd) || pmd_trans_unstable(pmd);
    filemap_map_pmd[3423]          if (pmd_devmap_trans_unstable(vmf->pmd)) {
    finish_fault[4390]             if (pmd_devmap_trans_unstable(vmf->pmd))
    handle_pte_fault[4922]         if (pmd_devmap_trans_unstable(vmf->pmd))

They all suggest a retry on the page faults, afaiu.

So indeed not all of them retries, but I doubt those ones that are not and
whether that's the best we should have.  Again, I'll leave that out of this
series.

> 
> Anyway I really don't have a strong opinion on this. I may be just
> over-concerned. I just thought if nobody cares whether the result is
> accurate or not, why do we bother fixing those cases?

Because anyway we're already at it and it's easier and cleaner to do so? :)

I would say if to fix this I think the most important ones are pagemap and
memcg paths so far, but since I'm at it and anyway I checked all of them, I
figured maybe I should just make everywhere do right and in the same
pattern when handling unstable pmd. Especially, what this patch touched are
all using walk_page*() routines (I left special ones in first patches), so
it's quite straightforward IMHO to switch altogether using ACTION_AGAIN.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/4] mm: Fix pmd_trans_unstable() call sites on retry
       [not found] <20230602230552.350731-1-peterx@redhat.com>
       [not found] ` <20230602230552.350731-5-peterx@redhat.com>
@ 2023-06-07 13:49 ` Peter Xu
  2023-06-07 15:45   ` David Hildenbrand
  1 sibling, 1 reply; 6+ messages in thread
From: Peter Xu @ 2023-06-07 13:49 UTC (permalink / raw
  To: linux-kernel, linux-mm
  Cc: David Hildenbrand, Alistair Popple, Andrew Morton,
	Andrea Arcangeli, Kirill A . Shutemov, Johannes Weiner,
	John Hubbard, Naoya Horiguchi, Muhammad Usama Anjum, Hugh Dickins,
	Mike Rapoport

On Fri, Jun 02, 2023 at 07:05:48PM -0400, Peter Xu wrote:
> Please have a look, thanks.

Hello, all,

This one seems to have more or less conflict with Hugh's rework on pmd
collapse.  Please hold off review or merging until I prepare another one
(probably based on Hugh's, after I have a closer read).

Sorry for the noise.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/4] mm: Fix pmd_trans_unstable() call sites on retry
  2023-06-07 13:49 ` [PATCH 0/4] mm: Fix pmd_trans_unstable() call sites on retry Peter Xu
@ 2023-06-07 15:45   ` David Hildenbrand
  2023-06-07 16:21     ` Peter Xu
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2023-06-07 15:45 UTC (permalink / raw
  To: Peter Xu, linux-kernel, linux-mm
  Cc: Alistair Popple, Andrew Morton, Andrea Arcangeli,
	Kirill A . Shutemov, Johannes Weiner, John Hubbard,
	Naoya Horiguchi, Muhammad Usama Anjum, Hugh Dickins,
	Mike Rapoport

On 07.06.23 15:49, Peter Xu wrote:
> On Fri, Jun 02, 2023 at 07:05:48PM -0400, Peter Xu wrote:
>> Please have a look, thanks.
> 
> Hello, all,
> 
> This one seems to have more or less conflict with Hugh's rework on pmd
> collapse.  Please hold off review or merging until I prepare another one
> (probably based on Hugh's, after I have a closer read).
> 
> Sorry for the noise.
> 

[did not have time to look yet]

Are there any fixes buried in there that we'd like to have in earlier? I 
skimmed over the patches and all read like "cleanup" + "consistency", 
correct?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/4] mm: Fix pmd_trans_unstable() call sites on retry
  2023-06-07 15:45   ` David Hildenbrand
@ 2023-06-07 16:21     ` Peter Xu
  2023-06-07 16:39       ` Yang Shi
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Xu @ 2023-06-07 16:21 UTC (permalink / raw
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Alistair Popple, Andrew Morton,
	Andrea Arcangeli, Kirill A . Shutemov, Johannes Weiner,
	John Hubbard, Naoya Horiguchi, Muhammad Usama Anjum, Hugh Dickins,
	Mike Rapoport

On Wed, Jun 07, 2023 at 05:45:28PM +0200, David Hildenbrand wrote:
> On 07.06.23 15:49, Peter Xu wrote:
> > On Fri, Jun 02, 2023 at 07:05:48PM -0400, Peter Xu wrote:
> > > Please have a look, thanks.
> > 
> > Hello, all,
> > 
> > This one seems to have more or less conflict with Hugh's rework on pmd
> > collapse.  Please hold off review or merging until I prepare another one
> > (probably based on Hugh's, after I have a closer read).
> > 
> > Sorry for the noise.
> > 
> 
> [did not have time to look yet]
> 
> Are there any fixes buried in there that we'd like to have in earlier? I
> skimmed over the patches and all read like "cleanup" + "consistency",
> correct?

There are bug fixes when unluckily hitting unstable pmd I think, these ones
worth mentioning:

  - pagemap can be broken, causing read to be shifted over to the next
    (wrong data read)

  - memcg wrong accounting, e.g., moving one task from memcg1 to memcg2, we
    can skip an unstable pmd while it could quickly contain something that
    can belong to memcg1, I think.  This one needs some eyes from memcg
    developers.

I don't rush on having them because these are all theoretical and no bug
report I saw, no reproducer I wrote, only observed by my eyes.

At least the pagemap issue should have been there for 10+ years without
being noticed even if rightfully spot this time.  Meanwhile this seems to
have conflict with Hugh's series which should have been posted earlier - I
still need to check on how that will affect this series, but not yet.

Said that, let me know if any of you hit any (potential) issue with above
or think that we should to move this in earlier.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/4] mm: Fix pmd_trans_unstable() call sites on retry
  2023-06-07 16:21     ` Peter Xu
@ 2023-06-07 16:39       ` Yang Shi
  2023-06-07 18:22         ` Peter Xu
  0 siblings, 1 reply; 6+ messages in thread
From: Yang Shi @ 2023-06-07 16:39 UTC (permalink / raw
  To: Peter Xu
  Cc: David Hildenbrand, linux-kernel, linux-mm, Alistair Popple,
	Andrew Morton, Andrea Arcangeli, Kirill A . Shutemov,
	Johannes Weiner, John Hubbard, Naoya Horiguchi,
	Muhammad Usama Anjum, Hugh Dickins, Mike Rapoport

On Wed, Jun 7, 2023 at 9:21 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Jun 07, 2023 at 05:45:28PM +0200, David Hildenbrand wrote:
> > On 07.06.23 15:49, Peter Xu wrote:
> > > On Fri, Jun 02, 2023 at 07:05:48PM -0400, Peter Xu wrote:
> > > > Please have a look, thanks.
> > >
> > > Hello, all,
> > >
> > > This one seems to have more or less conflict with Hugh's rework on pmd
> > > collapse.  Please hold off review or merging until I prepare another one
> > > (probably based on Hugh's, after I have a closer read).
> > >
> > > Sorry for the noise.
> > >
> >
> > [did not have time to look yet]
> >
> > Are there any fixes buried in there that we'd like to have in earlier? I
> > skimmed over the patches and all read like "cleanup" + "consistency",
> > correct?
>
> There are bug fixes when unluckily hitting unstable pmd I think, these ones
> worth mentioning:
>
>   - pagemap can be broken, causing read to be shifted over to the next
>     (wrong data read)

Yes, it may corrupt the pagemap data. But anyway it seems like nobody
was busted by this one as you said.

>
>   - memcg wrong accounting, e.g., moving one task from memcg1 to memcg2, we
>     can skip an unstable pmd while it could quickly contain something that
>     can belong to memcg1, I think.  This one needs some eyes from memcg
>     developers.

I don't think this is an important thing. There are plenty of other
conditions that could make the accounting inaccurate, for example,
isolating page from LRU fails, force charge, etc. And it seems like
nobody was bothered by this either.

>
> I don't rush on having them because these are all theoretical and no bug
> report I saw, no reproducer I wrote, only observed by my eyes.
>
> At least the pagemap issue should have been there for 10+ years without
> being noticed even if rightfully spot this time.  Meanwhile this seems to
> have conflict with Hugh's series which should have been posted earlier - I
> still need to check on how that will affect this series, but not yet.
>
> Said that, let me know if any of you hit any (potential) issue with above
> or think that we should to move this in earlier.
>
> Thanks,
>
> --
> Peter Xu
>
>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/4] mm: Fix pmd_trans_unstable() call sites on retry
  2023-06-07 16:39       ` Yang Shi
@ 2023-06-07 18:22         ` Peter Xu
  0 siblings, 0 replies; 6+ messages in thread
From: Peter Xu @ 2023-06-07 18:22 UTC (permalink / raw
  To: Yang Shi
  Cc: David Hildenbrand, linux-kernel, linux-mm, Alistair Popple,
	Andrew Morton, Andrea Arcangeli, Kirill A . Shutemov,
	Johannes Weiner, John Hubbard, Naoya Horiguchi,
	Muhammad Usama Anjum, Hugh Dickins, Mike Rapoport

On Wed, Jun 07, 2023 at 09:39:44AM -0700, Yang Shi wrote:
> I don't think this is an important thing. There are plenty of other
> conditions that could make the accounting inaccurate, for example,
> isolating page from LRU fails, force charge, etc. And it seems like
> nobody was bothered by this either.

Yes, I read that a bit more and I agree.  So let me summarize after I read
Hugh's series just now..

With the pre-requisite of the new __pte_map_offset() that Hugh proposed
here:

[PATCH 04/31] mm/pgtable: allow pte_offset_map[_lock]() to fail
https://lore.kernel.org/r/8218ffdc-8be-54e5-0a8-83f5542af283@google.com

We should not need pmd_trans_unstable() anymore as Hugh pointed out, which
I fully agree.  I think Hugh has covered all the issues that this series
wanted to address alongside, namely:

  Patch 1 (mprotect) is covered in:

  [PATCH 18/31] mm/mprotect: delete pmd_none_or_clear_bad_unless_trans_huge()
  https://lore.kernel.org/r/4a834932-9064-9ed7-3cd1-99466f549486@google.com

  No way to move spinlock outside, so one -EAGAIN needed which makes sense.

  Patch 2 (migrate_device) is covered in:

  [PATCH 24/31] mm/migrate_device: allow pte_offset_map_lock() to fail
  https://lore.kernel.org/r/ea51bb69-189c-229b-fc0-9d3e7be5d6b@google.com

  By a direct retry, and more code unified so even better.

  Patch 3 () is covered in:

  [PATCH 19/31] mm/mremap: retry if either pte_offset_map_*lock() fails
  https://lore.kernel.org/r/2d3fbfea-5884-8211-0cc-954afe25ae9c@google.com

  Instead of WARN_ON_ONCE(), it got dropped which looks all good, too.

  Most of patch 4's changes are done in:

  [PATCH 09/31] mm/pagewalkers: ACTION_AGAIN if pte_offset_map_lock() fails
  https://lore.kernel.org/r/6265ac58-6018-a8c6-cf38-69cba698471@google.com

  There're some different handling on memcg changes, where in Hugh's
  series it was put separately here:

  [PATCH 17/31] mm/various: give up if pte_offset_map[_lock]() fails
  https://lore.kernel.org/r/c299eba-4e17-c645-1115-ccd1fd9956bd@google.com

  After double check, I agree with Yang's comment and Hugh's approach,
  that no retry is needed for memcg.

Let's ignore this series, not needed anymore.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-06-07 18:22 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20230602230552.350731-1-peterx@redhat.com>
     [not found] ` <20230602230552.350731-5-peterx@redhat.com>
     [not found]   ` <CAHbLzkp_tzN8SZVeWTKxtMAnFSzUvk2064KFg3quj=raOSHPrA@mail.gmail.com>
     [not found]     ` <ZH41YzZ0DBoF8csH@x1n>
     [not found]       ` <CAHbLzkq-dE4B5k+4KV5YtSJRXf+x61V8iBte6Z=Afbh=_oCJtw@mail.gmail.com>
2023-06-06 19:59         ` [PATCH 4/4] mm: Make most walk page paths with pmd_trans_unstable() to retry Peter Xu
2023-06-07 13:49 ` [PATCH 0/4] mm: Fix pmd_trans_unstable() call sites on retry Peter Xu
2023-06-07 15:45   ` David Hildenbrand
2023-06-07 16:21     ` Peter Xu
2023-06-07 16:39       ` Yang Shi
2023-06-07 18:22         ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).