From: David Hildenbrand <david@redhat.com>
To: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>
Cc: Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Matthew Wilcox <willy@infradead.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: 6.9/BUG: Bad page state in process kswapd0 pfn:d6e840
Date: Wed, 29 May 2024 08:57:48 +0200
Message-ID: <ff29f723-32de-421b-a65e-7b7d2e03162d@redhat.com>
In-Reply-To: <209ff705-fe6e-4d6d-9d08-201afba7d74b@redhat.com>

On 28.05.24 16:24, David Hildenbrand wrote:
> On 28.05.24 at 15:57, David Hildenbrand wrote:
>> On 28.05.24 at 08:05, Mikhail Gavrilov wrote:
>>> On Thu, May 23, 2024 at 12:05 PM Mikhail Gavrilov
>>> <mikhail.v.gavrilov@gmail.com> wrote:
>>>>
>>>> On Thu, May 9, 2024 at 10:50 PM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> The only known workload that causes this is updating a large
>>>> container. Unfortunately, not every container update reproduces the
>>>> problem.
>>>
>>> Is it possible to add more debugging information to make it clearer
>>> what's going on?
>>
>> If we knew who originally allocated that problematic page, that might help.
>> Maybe page_owner could give some hints?
>>
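
(Reminder to self on how page_owner would give us those hints: it needs a
kernel built with CONFIG_PAGE_OWNER=y and booted with the page_owner=on
command-line parameter. With that in place the bad-page splat should, if I
remember correctly, already include the allocation stack for the offending
pfn, and the full per-page records can be dumped via debugfs, roughly:

   # build:   CONFIG_PAGE_OWNER=y
   # boot:    page_owner=on on the kernel command line
   # later:   cat /sys/kernel/debug/page_owner > page_owner.txt

debugfs has to be mounted for that last step.)
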
>>>
>>> BUG: Bad page state in process kcompactd0  pfn:605811
>>> page: refcount:0 mapcount:0 mapping:0000000082d91e3e index:0x1045efc4f
>>> pfn:0x605811
>>> aops:btree_aops ino:1
>>> flags:
>>> 0x17ffffc600020c(referenced|uptodate|workingset|node=0|zone=2|lastcpupid=0x1fffff)
>>> raw: 0017ffffc600020c dead000000000100 dead000000000122 ffff888159075220
>>> raw: 00000001045efc4f 0000000000000000 00000000ffffffff 0000000000000000
>>> page dumped because: non-NULL mapping
>>
>> Seems to be an order-0 page, otherwise we would have another "head: ..." report.
>>
>> It's not an anon/ksm/non-lru migration folio, because we clear the page->mapping
>> field for them manually on the page freeing path. Likely it's a pagecache folio.
>>
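
For context, the reason anon/ksm/non-lru-movable folios need that manual
clearing is that they encode their type in the low bits of ->mapping.
Roughly, quoting include/linux/page-flags.h from memory (double-check the
exact spelling):

   #define PAGE_MAPPING_ANON	0x1
   #define PAGE_MAPPING_MOVABLE	0x2
   #define PAGE_MAPPING_KSM	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
   #define PAGE_MAPPING_FLAGS	(PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)

   /* true if any of the type bits are set, i.e. anon/ksm/movable */
   static __always_inline bool folio_mapping_flags(const struct folio *folio)
   {
           return ((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS) != 0;
   }

A plain pagecache folio has none of these bits set, which is also why the
folio_mapping_flags() check in the diff further down only skips the manual
reset for the typed cases.
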
>> So one option is that something seems to not properly set folio->mapping to
>> NULL. But that problem would then also show up without page migration? Hmm.
>>
>>> Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING WIFI,
>>> BIOS 2611 04/07/2024
>>> Call Trace:
>>>    <TASK>
>>>    dump_stack_lvl+0x84/0xd0
>>>    bad_page.cold+0xbe/0xe0
>>>    ? __pfx_bad_page+0x10/0x10
>>>    ? page_bad_reason+0x9d/0x1f0
>>>    free_unref_page+0x838/0x10e0
>>>    __folio_put+0x1ba/0x2b0
>>>    ? __pfx___folio_put+0x10/0x10
>>>    ? __pfx___might_resched+0x10/0x10
>>
>> I suspect we come via
>>       migrate_pages_batch()->migrate_folio_unmap()->migrate_folio_done().
>>
>> Maybe this is the "Folio was freed from under us. So we are done." path
>> when "folio_ref_count(src) == 1".
>>
>> Alternatively, we might come via
>>       migrate_pages_batch()->migrate_folio_move()->migrate_folio_done().
>>
>> For ordinary migration, move_to_new_folio() will clear src->mapping if
>> the folio was migrated successfully. That's the very first thing that
>> migrate_folio_move() does, so I doubt that is the problem.
>>
>> So I suspect we are in the migrate_folio_unmap() path. But for a !anon
>> folio, who could be freeing it concurrently (and without clearing
>> folio->mapping)? After all, we have to hold the folio lock while migrating.
>>
>> In khugepaged:collapse_file() we manually set folio->mapping = NULL, before
>> dropping the reference.
>>
>> Something to try (to see whether the problem goes away) might be:
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index dd04f578c19c..45e92e14c904 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1124,6 +1124,13 @@ static int migrate_folio_unmap(new_folio_t get_new_folio,
>>                   /* Folio was freed from under us. So we are done. */
>>                   folio_clear_active(src);
>>                   folio_clear_unevictable(src);
>> +               /*
>> +                * Anonymous and movable src->mapping will be cleared by
>> +                * free_pages_prepare(), so don't reset it here; keeping
>> +                * the type encoded there lets checks like PageAnon()
>> +                * keep working until then.
>> +                */
>> +               if (!folio_mapping_flags(src))
>> +                       src->mapping = NULL;
>>                   /* free_pages_prepare() will clear PG_isolated. */
>>                   list_del(&src->lru);
>>                   migrate_folio_done(src, reason);
>>
>> But it does feel weird: who freed the page concurrently and didn't clear
>> folio->mapping ...
>>
>> We don't hold the folio lock of src, though, but have the only reference. So
>> another possible thing might be folio refcount mis-counting: folio_ref_count()
>> == 1 but there are other references (e.g., from the pagecache).
> 
> Hmm, your original report mentions kswapd, so I'm getting the feeling someone
> does one folio_put() too many and we are freeing a pagecache folio that is still
> in the pagecache and, therefore, has folio->mapping set ... bisecting would
> really help.
> 

A little bird just told me that I missed an important piece in the dmesg 
output: "aops:btree_aops ino:1" from dump_mapping():

This is btrfs, i_ino is 1, and we don't have a dentry. Is that 
BTRFS_BTREE_INODE_OBJECTID?
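
From memory, the uapi header defines that object ID as 1, which would match
the ino:1 above, so this looks like the btree/metadata inode rather than a
regular file:

   /* include/uapi/linux/btrfs_tree.h, quoted from memory */
   #define BTRFS_BTREE_INODE_OBJECTID 1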

Summarizing what we know so far:
(1) Freeing an order-0 btrfs folio where folio->mapping
     is still set
(2) Triggered by kswapd and kcompactd; not triggered by other means of
     page freeing so far

Possible theories:
(A) folio->mapping not cleared when freeing the folio. But shouldn't
     this also happen on other freeing paths? Or are we simply lucky to
     never trigger that for that folio?
(B) Messed-up refcounting: freeing a folio that is still in use (and
     therefore still has folio->mapping set); see the rough tripwire
     sketch below
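
One completely untested way to poke at theory (B) might be a tripwire in the
page freeing path, along these lines; this is only a sketch (and racy without
the i_pages lock), but it would tell us whether we are freeing a folio that
the page cache still points at:

   /*
    * Sketch: before freeing, warn if a pagecache-looking folio is still
    * installed in its mapping's xarray. If it is, we lost a reference
    * (theory B) rather than merely missing a folio->mapping clear
    * (theory A).
    */
   if (!folio_mapping_flags(folio) && folio->mapping &&
       xa_load(&folio->mapping->i_pages, folio->index) == folio)
           pr_warn("freeing folio %px still present in the page cache\n",
                   folio);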

I was briefly wondering if large folio splitting could be involved.

CCing btrfs maintainers.

-- 
Cheers,

David / dhildenb



Thread overview: 13+ messages
2024-03-18  9:55 6.9/BUG: Bad page state in process kswapd0 pfn:d6e840 Mikhail Gavrilov
2024-05-08 10:16 ` Mikhail Gavrilov
2024-05-08 17:45   ` David Hildenbrand
2024-05-09 11:59     ` Mikhail Gavrilov
2024-05-09 17:50       ` David Hildenbrand
2024-05-23  7:05         ` Mikhail Gavrilov
2024-05-28  6:05           ` Mikhail Gavrilov
2024-05-28 13:57             ` David Hildenbrand
2024-05-28 14:24               ` David Hildenbrand
2024-05-29  6:57                 ` David Hildenbrand [this message]
2024-05-29 19:00                   ` David Sterba
2024-05-29 22:37                   ` Qu Wenruo
2024-05-30  5:26                     ` Qu Wenruo
