From: Naoya Horiguchi <naoya.horiguchi@linux.dev>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
"Naoya Horiguchi" <nao.horiguchi@gmail.com>,
"Oscar Salvador" <osalvador@suse.de>,
"Muchun Song" <songmuchun@bytedance.com>,
"Mike Kravetz" <mike.kravetz@oracle.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Michal Hocko" <mhocko@suse.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v6 1/2] mm,hwpoison: fix race with hugetlb page allocation
Date: Tue, 17 Aug 2021 02:12:07 +0900
Message-ID: <20210816171207.GA2239284@u2004>
In-Reply-To: <96d4fd8b75e44a6c970e4d9530980f21@intel.com>
On Fri, Aug 13, 2021 at 03:07:20PM +0000, Luck, Tony wrote:
> I'm running the default case from my einj_mem_uc test. That just:
>
> 1) allocates a page using:
>
> mmap(NULL, pagesize, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANON, -1, 0);
>
> 2) fills the page with random data (to make sure it has been allocated, and that the kernel can't
> do KSM tricks to share this physical page with some other user).
>
> 3) injects the error at a 1KB offset within the page.
>
> 4) does a memory read of the poison address.
>
>
> > action_result(pfn, MF_MSG_UNKNOWN, MF_IGNORED);
> > + dump_page(p, "hwpoison unknown page");
> > res = -EBUSY;
> > goto unlock_mutex;
> > }
>
> I added that patch against upstream (v5.14-rc5). Here's the dump. The "pfn" matches the physical address where I injected,
> and it has the hwpoison flag bit that was set early in memory_failure() --- so this is the right page.
>
> [ 79.368212] Memory failure: 0x623889: recovery action for unknown page: Ignored
> [ 79.375525] page:0000000065ad9479 refcount:3 mapcount:1 mapping:00000000a4ac843b index:0x0 pfn:0x623889
> [ 79.384909] memcg:ff40a569f2966000
> [ 79.388313] aops:shmem_aops ino:4c00 dentry name:"dev/zero"
> [ 79.393896] flags: 0x17ffffc088000c(uptodate|dirty|swapbacked|hwpoison|node=0|zone=2|lastcpupid=0x1fffff)
> [ 79.403455] raw: 0017ffffc088000c 0000000000000000 dead000000000122 ff40a569f45a7160
> [ 79.411191] raw: 0000000000000000 0000000000000000 0000000300000000 ff40a569f2966000
> [ 79.418931] page dumped because: hwpoison unknown page
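
Steps 1 to 4 of the test quoted above can be sketched in C. This is a hedged approximation, not the einj_mem_uc source: run_poison_test() and POISON_OFFSET are hypothetical names, and because ACPI EINJ injection is platform-specific, step 3 here optionally uses madvise(MADV_HWPOISON) instead. That is a software injection path which needs CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE, and it does not exercise the same hardware path as EINJ.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* MAP_ANONYMOUS / MADV_HWPOISON on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical: offset of the injected error within the page (step 3). */
#define POISON_OFFSET 1024

/*
 * Hypothetical helper running steps 1-4. If inject is nonzero, step 3 uses
 * madvise(MADV_HWPOISON), a software substitute for EINJ (assumption).
 * Returns the byte read at the poison address, or -1 on setup failure.
 */
int run_poison_test(int inject)
{
	long pagesize = sysconf(_SC_PAGESIZE);
	assert(pagesize > POISON_OFFSET);

	/* Step 1: one shared anonymous page (shmem-backed, as the dump shows). */
	unsigned char *buf = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
				  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Step 2: fill the page with random data, as the test does. */
	for (long i = 0; i < pagesize; i++)
		buf[i] = (unsigned char)rand();

	volatile unsigned char *poison = buf + POISON_OFFSET;

	/* Step 3: optional software injection (see comment above). */
	if (inject) {
#ifdef MADV_HWPOISON
		if (madvise(buf, (size_t)pagesize, MADV_HWPOISON) != 0) {
			munmap(buf, (size_t)pagesize);
			return -1;
		}
#else
		munmap(buf, (size_t)pagesize);
		return -1;
#endif
	}

	/* Step 4: read the poison address; SIGBUS is expected if poisoned. */
	int val = *poison;
	munmap(buf, (size_t)pagesize);
	return val;
}
```

Calling run_poison_test(0) just reads back one random byte; run_poison_test(1) run as root is expected to be killed by SIGBUS at step 4, if the page was successfully poisoned, rather than return.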

Thank you for your help.

This dump indicates that HWPoisonHandlable() returned false because
the PG_lru flag is not set. Before 5.13, get_any_page() retried with
shake_page(), but since 5.13 it no longer does, which seems to me to be
the root cause of the issue. So my suggestion is to call shake_page()
when HWPoisonHandlable() is false.
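
The flags word in the dump can be checked by hand. The sketch below is illustrative only: the PGF_* names, pgf_test(), lru_half_handlable() and phys_to_pfn() are hypothetical, the bit numbers are assumed from enum pageflags in v5.14 on x86_64 with CONFIG_MEMORY_FAILURE=y (they are config-dependent, but they reproduce the dump's own "uptodate|dirty|swapbacked|hwpoison" decoding), and 4 KiB pages are assumed. HWPoisonHandlable() is assumed to be PageLRU() || __PageMovable() in that version; only the PG_lru half is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit numbers (v5.14, x86_64, CONFIG_MEMORY_FAILURE=y). */
enum pgf_bit {
	PGF_UPTODATE   = 2,
	PGF_DIRTY      = 3,
	PGF_LRU        = 4,
	PGF_SWAPBACKED = 19,
	PGF_HWPOISON   = 23,
};

/* Assumed 4 KiB pages (PAGE_SHIFT == 12). */
#define MODEL_PAGE_SHIFT 12

/* True if the given page-flag bit is set in the raw flags word. */
static bool pgf_test(uint64_t flags, enum pgf_bit bit)
{
	assert((int)bit >= 0 && (int)bit < 64); /* avoid undefined shifts */
	return (flags >> bit) & 1;
}

/* Simplified: models only the PageLRU() half of HWPoisonHandlable(). */
static bool lru_half_handlable(uint64_t flags)
{
	return pgf_test(flags, PGF_LRU);
}

/* Page frame number containing a physical address. */
static uint64_t phys_to_pfn(uint64_t phys)
{
	return phys >> MODEL_PAGE_SHIFT;
}
```

Applied to 0x17ffffc088000c, the low bits 0x88000c are bits 2, 3, 19 and 23, while bit 4 (0x10) is clear, so the LRU check fails; and an address 1 KiB into pfn 0x623889 maps back to that pfn.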

Could you check whether the following diff fixes the issue?
I may still come up with a better fix (such as inserting shake_page()
into the other retry paths in get_any_page()), but the diff below is
the minimal one.

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 76cc53b2999a..3e770e4f259e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1146,7 +1146,7 @@ static int __get_hwpoison_page(struct page *page)
 	 * unexpected races caused by taking a page refcount.
 	 */
 	if (!HWPoisonHandlable(head))
-		return 0;
+		return -EBUSY;
 
 	if (PageTransHuge(head)) {
 		/*
@@ -1199,9 +1199,14 @@ static int get_any_page(struct page *p, unsigned long flags)
 			}
 			goto out;
 		} else if (ret == -EBUSY) {
-			/* We raced with freeing huge page to buddy, retry. */
-			if (pass++ < 3)
+			/*
+			 * We raced with (possibly temporary) unhandlable
+			 * page, retry.
+			 */
+			if (pass++ < 3) {
+				shake_page(p, 1);
 				goto try_again;
+			}
 			goto out;
 		}
 	}
Thanks,
Naoya Horiguchi