From: Zhiquan Li <zhiquan1.li@intel.com>
To: x86@kernel.org, linux-edac@vger.kernel.org,
	linux-kernel@vger.kernel.org, patches@lists.linux.dev,
	bp@alien8.de, mingo@kernel.org, tony.luck@intel.com,
	naoya.horiguchi@nec.com
Cc: zhiquan1.li@intel.com, Youquan Song <youquan.song@intel.com>
Subject: [PATCH v3] x86/mce: Set PG_hwpoison page flag to avoid the capture kernel panic
Date: Sat, 14 Oct 2023 13:17:54 +0800
Message-ID: <20231014051754.3759099-1-zhiquan1.li@intel.com>

Memory errors don't happen very often, especially fatal ones.  However,
in large-scale deployments such as data centers they still occur.  For
some fatal MCE cases the kernel calls mce_panic() to terminate the
production kernel directly, so there is no opportunity to queue a task
that calls memory_failure(), which would otherwise try to keep the
kernel alive via memory failure handling.

Unfortunately, the capture kernel will panic for the same reason if it
touches the faulty memory again.  As a result, only an incomplete vmcore
is left for sustaining engineers, which makes it very hard for them to
work out what happened.

The main task of the kdump kernel is to provide a "window",
/proc/vmcore, through which the dump program can access old memory.  A
dump program running in userspace determines the "policy": its
configuration decides which pages are dumped, so it reads out the pages
that the sustaining engineer is interested in and excludes the rest.

Likewise, the dump program can exclude the poisoned page to avoid
touching the error page again, provided that the kernel has set the
PG_hwpoison page flag correctly.  The de facto dump program
(makedumpfile) added support for this about a decade ago.  Setting the
PG_hwpoison flag in the production kernel just before it panics is the
only missing step to make everything work.
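
As an illustration of the dump-side policy described above, here is a
minimal userspace sketch of a filter that skips poisoned pages.  It is
not makedumpfile's code: struct sketch_page, SKETCH_PG_HWPOISON and
sketch_select_dumpable() are hypothetical names, and the flag bit
position is an assumption made for the example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the real PG_hwpoison position depends on the kernel build. */
#define SKETCH_PG_HWPOISON (1u << 0)

/* Simplified stand-in for the per-page metadata a dump tool reads from the old kernel. */
struct sketch_page {
	uint64_t pfn;
	uint32_t flags;
};

/*
 * Write the PFNs of all pages that are safe to read into out[], skipping
 * every page whose poison flag is set.  Returns the number of PFNs written.
 * out[] must have room for n entries.
 */
static size_t sketch_select_dumpable(const struct sketch_page *pages,
				     size_t n, uint64_t *out)
{
	size_t kept = 0;

	assert(pages != NULL && out != NULL);

	for (size_t i = 0; i < n; i++) {
		/* Never touch a page the production kernel marked as poisoned. */
		if (pages[i].flags & SKETCH_PG_HWPOISON)
			continue;
		out[kept++] = pages[i].pfn;
	}
	return kept;
}
```

The filter can only work if the flag was set before the crash, which is
the step this patch adds.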

This does not introduce additional overhead in the capture kernel or
conflict with other hwpoison-related code in the production kernel.  It
relies on existing mechanisms as far as possible, so the code changes
are lightweight.
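
The production-kernel step can be modelled in userspace roughly as
follows.  This is a sketch under stated assumptions, not the kernel
code: struct sketch_mce, sketch_poisoned[] and sketch_mark_poison() are
hypothetical, the bounds check stands in for pfn_to_online_page()
returning NULL, and a 4 KiB page size is assumed.  The record pointer is
checked before it is dereferenced, since a caller may pass no record at
all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT   12            /* assumption: 4 KiB pages */
#define SKETCH_STATUS_ADDRV (1ULL << 58)  /* "address valid" bit of the MCi_STATUS register */
#define SKETCH_NR_PAGES     16            /* size of the toy memory map */

/* Hypothetical, trimmed-down machine check record. */
struct sketch_mce {
	uint64_t status;
	uint64_t addr;
};

/* Toy replacement for the per-page PG_hwpoison flag. */
static bool sketch_poisoned[SKETCH_NR_PAGES];

/* Mark the page named by the record as poisoned; returns true if a page was marked. */
static bool sketch_mark_poison(const struct sketch_mce *final)
{
	uint64_t pfn;

	/* No record, or the record carries no valid address: nothing to mark. */
	if (!final || !(final->status & SKETCH_STATUS_ADDRV))
		return false;

	pfn = final->addr >> SKETCH_PAGE_SHIFT;

	/* Stand-in for pfn_to_online_page() returning NULL for an offline PFN. */
	if (pfn >= SKETCH_NR_PAGES)
		return false;

	sketch_poisoned[pfn] = true;
	return true;
}
```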

[ Tony: Changed TestSetPageHWPoison() to SetPageHWPoison() ]
[ mingo: Fixed the comments & changelog ]

Co-developed-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lore.kernel.org/all/20230719211625.298785-1-tony.luck@intel.com/#t

---

V2: https://lore.kernel.org/all/20230914030539.1622477-1-zhiquan1.li@intel.com/

Changes since V2:
- Rebased to v6.6-rc5.
- Explained full scenario in commit message per Boris's suggestion.
- Included Ingo's fixes.
  Link: https://lore.kernel.org/all/ZRsUpM%2FXtPAE50Rm@gmail.com/

V1: https://lore.kernel.org/all/20230127015030.30074-1-tony.luck@intel.com/

Changes since V1:
- Revised the commit message as per Naoya's suggestion.
- Replaced "TODO" comment in code with comments based on mailing list
  discussion on the lack of value in covering other page types.
- Added the tag from Naoya.
  Link: https://lore.kernel.org/all/20230327083739.GA956278@hori.linux.bs1.fc.nec.co.jp/
---
 arch/x86/kernel/cpu/mce/core.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 6f35f724cc14..905e80c776b8 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -233,6 +233,7 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	struct llist_node *pending;
 	struct mce_evt_llist *l;
 	int apei_err = 0;
+	struct page *p;
 
 	/*
 	 * Allow instrumentation around external facilities usage. Not that it
@@ -286,6 +287,17 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	if (!fake_panic) {
 		if (panic_timeout == 0)
 			panic_timeout = mca_cfg.panic_timeout;
+		/*
+		 * Kdump can exclude the HWPoison page to avoid touching the error
+		 * page again, but only if the PG_hwpoison page flag is set.  For
+		 * some fatal MCE cases there is no opportunity to queue a task
+		 * for calling memory_failure(), the flag is never set, and the
+		 * capture kernel panics.  So mark the page as HWPoison before
+		 * calling panic() for MCE.
+		 */
+		p = final ? pfn_to_online_page(final->addr >> PAGE_SHIFT) : NULL;
+		if (p && (final->status & MCI_STATUS_ADDRV))
+			SetPageHWPoison(p);
 		panic(msg);
 	} else
 		pr_emerg(HW_ERR "Fake kernel panic: %s\n", msg);
-- 
2.25.1

