Linux-EDAC Archive mirror
 help / color / mirror / Atom feed
From: Tong Tiangen <tongtiangen@huawei.com>
To: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	<wangkefeng.wang@huawei.com>,
	Dave Hansen <dave.hansen@linux.intel.com>, <x86@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Tony Luck <tony.luck@intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: <linux-kernel@vger.kernel.org>, <linux-edac@vger.kernel.org>,
	<linux-mm@kvack.org>, Tong Tiangen <tongtiangen@huawei.com>,
	Guohanjun <guohanjun@huawei.com>
Subject: [PATCH -next v3 3/3] x86/mce: set MCE_IN_KERNEL_COPY_MC for DEFAULT_MCE_SAFE exception
Date: Mon, 18 Dec 2023 16:24:00 +0800	[thread overview]
Message-ID: <20231218082400.2694698-4-tongtiangen@huawei.com> (raw)
In-Reply-To: <20231218082400.2694698-1-tongtiangen@huawei.com>

From: Kefeng Wang <wangkefeng.wang@huawei.com>

If an MCE has happened in kernel space and the kernel can recover,
mce.kflags MCE_IN_KERNEL_RECOV will set in error_context().

With the setting of MCE_IN_KERNEL_RECOV, the MCE is handled in
do_machine_check(). But due to lack of MCE_IN_KERNEL_COPY_MC, although the
kernel won't panic, the corrupted page don't be isolated, new one maybe
consume it again, which is not what we expected.

In order to avoid above issue, some hwpoison recover process[1][2][3][4],
memory_failure_queue() is called to cope with such unhandled corrupted
pages, also there are some other already existed MC-safe copy scenarios,
eg, nvdimm, dm-writecache, dax, which don't isolate corrupted pages.

The best way to fix them is set MCE_IN_KERNEL_COPY_MC for MC-Safe Copy,
then let the core do_machine_check() to isolate corrupted page instead
of doing it one-by-one.

EX_TYPE_FAULT_MCE_SAFE is used for the FPU. Here, we do not touch the logic
of FPU. We only modify the logic of EX_TYPE_DEFAULT_MCE_SAFE which is used
in the scenarios described above.

[1] commit d302c2398ba2 ("mm, hwpoison: when copy-on-write hits poison, take page offline")
[2] commit 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage copy-on-write faults")
[3] commit 6b970599e807 ("mm: hwpoison: support recovery from ksm_might_need_to_copy()")
[4] commit 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage copy-on-write faults")

Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
---
 arch/x86/kernel/cpu/mce/severity.c |  4 ++--
 mm/ksm.c                           |  1 -
 mm/memory.c                        | 12 +++---------
 3 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
index df67a7a13034..b4b1d028cbb3 100644
--- a/arch/x86/kernel/cpu/mce/severity.c
+++ b/arch/x86/kernel/cpu/mce/severity.c
@@ -292,11 +292,11 @@ static noinstr int error_context(struct mce *m, struct pt_regs *regs)
 	case EX_TYPE_UACCESS:
 		if (!copy_user)
 			return IN_KERNEL;
+		fallthrough;
+	case EX_TYPE_DEFAULT_MCE_SAFE:
 		m->kflags |= MCE_IN_KERNEL_COPY_MC;
 		fallthrough;
-
 	case EX_TYPE_FAULT_MCE_SAFE:
-	case EX_TYPE_DEFAULT_MCE_SAFE:
 		m->kflags |= MCE_IN_KERNEL_RECOV;
 		return IN_KERNEL_RECOV;
 
diff --git a/mm/ksm.c b/mm/ksm.c
index ae05fb438ac5..01e3a7ef1b9d 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -3075,7 +3075,6 @@ struct folio *ksm_might_need_to_copy(struct folio *folio,
 		if (copy_mc_user_highpage(folio_page(new_folio, 0), page,
 								addr, vma)) {
 			folio_put(new_folio);
-			memory_failure_queue(folio_pfn(folio), 0);
 			return ERR_PTR(-EHWPOISON);
 		}
 		folio_set_dirty(new_folio);
diff --git a/mm/memory.c b/mm/memory.c
index 809746555827..9f0d875b1d3f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2836,10 +2836,8 @@ static inline int __wp_page_copy_user(struct page *dst, struct page *src,
 	unsigned long addr = vmf->address;
 
 	if (likely(src)) {
-		if (copy_mc_user_highpage(dst, src, addr, vma)) {
-			memory_failure_queue(page_to_pfn(src), 0);
+		if (copy_mc_user_highpage(dst, src, addr, vma))
 			return -EHWPOISON;
-		}
 		return 0;
 	}
 
@@ -6168,10 +6166,8 @@ static int copy_user_gigantic_page(struct folio *dst, struct folio *src,
 
 		cond_resched();
 		if (copy_mc_user_highpage(dst_page, src_page,
-					  addr + i*PAGE_SIZE, vma)) {
-			memory_failure_queue(page_to_pfn(src_page), 0);
+					  addr + i*PAGE_SIZE, vma))
 			return -EHWPOISON;
-		}
 	}
 	return 0;
 }
@@ -6187,10 +6183,8 @@ static int copy_subpage(unsigned long addr, int idx, void *arg)
 	struct copy_subpage_arg *copy_arg = arg;
 
 	if (copy_mc_user_highpage(copy_arg->dst + idx, copy_arg->src + idx,
-				  addr, copy_arg->vma)) {
-		memory_failure_queue(page_to_pfn(copy_arg->src + idx), 0);
+				  addr, copy_arg->vma))
 		return -EHWPOISON;
-	}
 	return 0;
 }
 
-- 
2.25.1


      parent reply	other threads:[~2023-12-18  8:24 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-18  8:23 [PATCH -next v3 0/3] minor improvements for x86 mce processing Tong Tiangen
2023-12-18  8:23 ` [PATCH -next v3 1/3] x86/mce: remove redundant fixup type EX_TYPE_COPY Tong Tiangen
2023-12-18  8:23 ` [PATCH -next v3 2/3] x86/mce: rename MCE_IN_KERNEL_COPYIN to MCE_IN_KERNEL_COPY_MC Tong Tiangen
2023-12-18  8:24 ` Tong Tiangen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231218082400.2694698-4-tongtiangen@huawei.com \
    --to=tongtiangen@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=guohanjun@huawei.com \
    --cc=hpa@zytor.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=wangkefeng.wang@huawei.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).