Linux-EDAC Archive mirror
From: "Sironi, Filippo" <sironi@amazon.de>
To: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Subject: Re: [PATCH] x86/MCE: Get microcode revision from cpu_data instead of boot_cpu_data
Date: Thu, 7 Dec 2023 09:34:42 +0000
Message-ID: <0A46F54F-CEF5-42EE-8A95-F442FAD7A05D@amazon.de>
In-Reply-To: <20231206210421.GFZXDh1UQ7L8K/toOM@fat_crate.local>

> > Boris, I just took a quick look and I might be missing something. If cores
> > fail to load the microcode or timeout, we taint the kernel, print an error
> > message, and then bubble up an error to userspace via:
> >
> > load_late_stop_cpus
> > load_late_locked
> > reload_store
> >
> > Right?
> 
> Yap.
> 
> > We would take servers that fail out of production;
> 
> And I'd like to hear about such issues. We added this failure checking
> only recently because something might go wrong and we want to warn. But
> it all updates fine here so kinda hard to test.

In a very large fleet, let's say that we see a handful of DPM when counting
whole processors, which means that, counted per core, the defect rate is
much, much lower.

What we've seen in these cases is that early loading (through the BIOS; I
never actually tried via the hypervisor) is successful while late loading
consistently fails. When it fails, we've seen two cases: 1/ the core still
reports the old microcode version, or 2/ the core reports a bogus microcode
version (0xfffffffe is quite common, at least on Intel).
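
For illustration only, a minimal sketch of how these two cases could be told
apart from userspace after a late load. It assumes that /proc/cpuinfo exposes
a per-CPU "microcode" field in hex; the function names and the "unexpected"
bucket are made up for this example and are not part of the kernel or of the
patch under discussion.

```python
# Hypothetical helper: classify per-core microcode revisions after a late load.
# Assumption: the input looks like /proc/cpuinfo, with "processor" and
# "microcode" lines per CPU (e.g. "microcode\t: 0xd0003a5").

# Revision values treated as bogus; 0xfffffffe is the one quoted above.
BOGUS_REVISIONS = {0xFFFFFFFE}


def parse_microcode_revisions(cpuinfo_text):
    """Return {cpu_number: revision} parsed from /proc/cpuinfo-style text."""
    revisions = {}
    cpu = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU stanzas
        key = key.strip()
        value = value.strip()
        if key == "processor":
            cpu = int(value)
        elif key == "microcode" and cpu is not None:
            revisions[cpu] = int(value, 16)  # base 16 accepts the "0x" prefix
    return revisions


def classify_cores(revisions, old_rev, new_rev):
    """Label each core as 'updated', 'stale', 'bogus' or 'unexpected'."""
    result = {}
    for cpu, rev in revisions.items():
        if rev == new_rev:
            result[cpu] = "updated"
        elif rev == old_rev:
            result[cpu] = "stale"       # case 1/: still the old revision
        elif rev in BOGUS_REVISIONS:
            result[cpu] = "bogus"       # case 2/: e.g. 0xfffffffe
        else:
            result[cpu] = "unexpected"
    return result
```

On a real machine one would feed it open("/proc/cpuinfo").read() together with
the revisions expected before and after the update.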

> My expectation is that if microcode fails loading on a subset of
> machines, the machine would more or less freeze. Depending, ofc, on what
> the microcode is updating...

It's bi-modal. We've seen servers that move along till we take them out of
production as well as servers that fail with an MCE of some sort likely leading
to a CATERR/IERR.

> > however, for others it might be interesting to have the correct
> > information. The patch - with a reworked commit message - might still
> > be useful to a few.
> 
> 
> https://lore.kernel.org/r/20231118193248.1296798-3-yazen.ghannam@amd.com
> 
> 
> :)

:looking:






