From: Andrew Geissler <geissonator@gmail.com>
To: OpenBMC Maillist <openbmc@lists.ozlabs.org>
Subject: Redundant BMC's
Date: Wed, 13 Dec 2023 14:43:37 -0600
Message-ID: <E5183DEB-8B54-45AF-BE0F-6D470937B73D@gmail.com>

Greetings,

We at IBM are looking at implementing a server with redundant BMCs. The idea of
redundant BMCs is that if one fails (software or hardware related), the other
BMC takes over and there is no impact to the owner of the server (enterprise,
high availability market). One BMC is the "Active" BMC and the other is the
"Passive".

At a high level, you have two or more chassis in a single server. Two of those
chassis have BMCs running OpenBMC. The BMCs negotiate on startup which one will
be the Active BMC and which one will be the Passive. Both BMCs have full access
to the server hardware (fans, power supplies, VPD chips, ...) but only one can
access the hardware at a time (via a hardware mux).
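
To make the startup negotiation concrete, here is a minimal sketch of one way
two BMCs could pick roles. Nothing here is a settled design: the `BmcInfo`
fields (`position`, `healthy`) and the tie-break rule are assumptions made up
for illustration, and the sketch does not cover the case where both BMCs are up
but cannot reach each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BmcInfo:
    position: int   # hypothetical slot/chassis position, unique per BMC
    healthy: bool   # hypothetical result of a local self-check


def negotiate(local, peer):
    """Return "Active" or "Passive" for the local BMC.

    peer is None when the other BMC did not answer at startup.
    """
    if peer is None:
        return "Active"
    # If exactly one side is healthy, the healthy side becomes Active.
    if local.healthy != peer.healthy:
        return "Active" if local.healthy else "Passive"
    # Same health on both sides: break the tie on position so that both
    # BMCs reach the same answer independently.
    return "Active" if local.position < peer.position else "Passive"
```

Because the rule is symmetric, running it on each BMC with the arguments
swapped always yields exactly one Active and one Passive.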

The Passive BMC will be running a subset of OpenBMC services. As it will need to
support firmware update, and other basic features, it will have bmcweb running.
But other services like fan or power control would not be running on the
Passive.

The Active BMC will utilize bmcweb aggregation to provide basic information
about the Passive BMC. Server management can only occur via the Active BMC.

As the user changes settings (BIOS, certificates, system policy, ...) via the
Active BMC, we need to ensure we replicate these settings over to the Passive.
We've done a bit of initial exploration into using corosync/pacemaker. It has
some potential but also feels a bit heavy for what we need. The thought is that
a role change where the Passive BMC becomes the Active BMC and the Active
becomes the Passive is mostly driven by our external software managers. There's
potential for some cases where the BMCs themselves drive the role changes, but
most of our use cases are situations where something in the BMC hardware (or its
connections to the server) has failed and the BIOS firmware or Redfish
management client directs the Passive BMC to become the Active.
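
As a rough illustration of an externally driven role change, the sketch below
only accepts a role request from a known external source. The source names
(`bios`, `redfish-client`) and the `RoleManager` class are hypothetical, and the
BMC-driven cases mentioned above are deliberately left out.

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "Active"
    PASSIVE = "Passive"


# Hypothetical names for the external managers allowed to direct a change.
EXTERNAL_SOURCES = {"bios", "redfish-client"}


class RoleManager:
    def __init__(self, role):
        self.role = role

    def request_role(self, new_role, source):
        """Apply a role change; return True only if the role changed."""
        if source not in EXTERNAL_SOURCES:
            return False  # unknown requester: ignore the directive
        if new_role is self.role:
            return False  # already in the requested role
        self.role = new_role
        return True
```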

A roll-our-own data synchronization daemon (utilizing rsync) to monitor for file
changes, with some basic rules on when to sync (immediate, sync points), doesn't
seem all that bad, but there are probably a lot of unknown pitfalls that
something like corosync/pacemaker already handles.
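
To give a feel for the "basic rules" idea, here is a minimal sketch of the
decision logic only: given a changed path, either hand back an rsync command to
run right away or queue the path until the next sync point. The paths, the
`passive-bmc` host name, and the class names are all made up for illustration;
file monitoring, error handling, and actually running rsync are left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class SyncMode(Enum):
    IMMEDIATE = "immediate"    # push as soon as a change is seen
    SYNC_POINT = "sync_point"  # hold until an explicit sync point


@dataclass
class SyncRule:
    path: str        # hypothetical file or directory on the Active BMC
    mode: SyncMode


@dataclass
class SyncPlanner:
    rules: list
    passive: str = "passive-bmc"  # hypothetical host name of the Passive BMC
    pending: set = field(default_factory=set)

    def rsync_cmd(self, path):
        # -a is rsync's archive mode (recursive, keeps permissions and times).
        return ["rsync", "-a", path, f"{self.passive}:{path}"]

    def on_change(self, path):
        """Return the rsync commands to run now for a changed path."""
        for rule in self.rules:
            prefix = rule.path.rstrip("/") + "/"
            if path == rule.path or path.startswith(prefix):
                if rule.mode is SyncMode.IMMEDIATE:
                    return [self.rsync_cmd(path)]
                self.pending.add(path)  # wait for the next sync point
                return []
        return []  # no rule matched: this file is not replicated

    def on_sync_point(self):
        """Return commands for everything queued, then clear the queue."""
        cmds = [self.rsync_cmd(p) for p in sorted(self.pending)]
        self.pending.clear()
        return cmds
```

A daemon built around this would feed `on_change()` from a file watcher and
pass each returned command to something like `subprocess.run()`. The pitfalls
this does not address (partial transfers, a role change mid-sync, conflicting
writes) are the sort of thing corosync/pacemaker may already cover.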

Just throwing this out there in case anyone is also working on this or has any
opinions on direction here.

Thanks,
Andrew
