kexec.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Alexander Graf <graf@amazon.com>
To: <linux-kernel@vger.kernel.org>
Cc: <linux-trace-kernel@vger.kernel.org>, <linux-mm@kvack.org>,
	<devicetree@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<kexec@lists.infradead.org>, <linux-doc@vger.kernel.org>,
	<x86@kernel.org>, Eric Biederman <ebiederm@xmission.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rob Herring" <robh+dt@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	Mark Rutland <mark.rutland@arm.com>,
	"Tom Lendacky" <thomas.lendacky@amd.com>,
	Ashish Kalra <ashish.kalra@amd.com>,
	James Gowans <jgowans@amazon.com>,
	Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>,
	<arnd@arndb.de>, <pbonzini@redhat.com>,
	<madvenka@linux.microsoft.com>,
	Anthony Yznaga <anthony.yznaga@oracle.com>,
	Usama Arif <usama.arif@bytedance.com>,
	David Woodhouse <dwmw@amazon.co.uk>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Subject: [PATCH 00/15] kexec: Allow preservation of ftrace buffers
Date: Wed, 13 Dec 2023 00:04:37 +0000	[thread overview]
Message-ID: <20231213000452.88295-1-graf@amazon.com> (raw)

Kexec today considers itself purely a boot loader: When we enter the new
kernel, any state the previous kernel left behind is irrelevant and the
new kernel reinitializes the system.

However, there are use cases where this mode of operation is not what we
actually want. In virtualization hosts for example, we want to use kexec
to update the host kernel while virtual machine memory stays untouched.
When we add device assignment to the mix, we also need to ensure that
IOMMU and VFIO states are untouched. If we add PCIe peer to peer DMA, we
need to do the same for the PCI subsystem. If we want to kexec while an
SEV-SNP enabled virtual machine is running, we need to preserve the VM
context pages and physical memory. See James' and my Linux Plumbers
Conference 2023 presentation for details:

  https://lpc.events/event/17/contributions/1485/

To start us on the journey to support all the use cases above, this
patch implements basic infrastructure to allow hand over of kernel state
across kexec (Kexec HandOver, aka KHO). As example target, we use ftrace:
With this patch set applied, you can read ftrace records from the
pre-kexec environment in your post-kexec one. This creates a very powerful
debugging and performance analysis tool for kexec. It's also slightly
easier to reason about than full blown VFIO state preservation.

== Alternatives ==

There are alternative approaches to (parts of) the problems above:

  * Memory Pools [1] - preallocated persistent memory region + allocator
  * PRMEM [2] - resizable persistent memory regions with fixed metadata
                pointer on the kernel command line + allocator
  * Pkernfs [3] - preallocated file system for in-kernel data with fixed
                  address location on the kernel command line
  * PKRAM [4] - handover of user space pages using a fixed metadata page
                specified via command line

All of the approaches above fundamentally have the same problem: They
require the administrator to explicitly carve out a physical memory
location because they have no mechanism outside of the kernel command
line to pass data (including memory reservations) between kexec'ing
kernels.

KHO provides that base foundation. We will determine later whether we
still need any of the approaches above for fast bulk memory handover of for
example IOMMU page tables. But IMHO they would all be users of KHO, with
KHO providing the foundational primitive to pass metadata and bulk memory
reservations as well as provide easy versioning for data.

== Documentation ==

If people are happy with the approach in this patch set, I will write up
conclusive documentation including schemas for the metadata as part of its
next iteration. For now, here's a rudimentary overview:

We introduce a metadata file that the kernels pass between each other. How
they pass it is architecture specific. The file's format is a Flattened
Device Tree (fdt) which has a generator and parser already included in
Linux. When the root user enables KHO through /sys/kernel/kho/active, the
kernel invokes callbacks to every driver that supports KHO to serialize
its state. When the actual kexec happens, the fdt is part of the image
set that we boot into. In addition, we keep a "scratch region" available
for kexec: A physically contiguous memory region that is guaranteed to
not have any memory that KHO would preserve.  The new kernel bootstraps
itself using the scratch region and sets all handed over memory as in use.
When drivers initialize that support KHO, they introspect the fdt and
recover their state from it. This includes memory reservations, where the
driver can either discard or claim reservations.

== Limitations ==

I currently only implemented file based kexec. The kernel interfaces
in the patch set are already in place to support user space kexec as well,
but I have not implemented it yet.

== How to Use ==

To use the code, please boot the kernel with the "kho_scratch=" command
line parameter set: "kho_scratch=512M". KHO requires a scratch region.

Make sure to fill ftrace with contents that you want to observe after
kexec.  Then, before you invoke file based "kexec -l", activate KHO:

  # echo 1 > /sys/kernel/kho/active
  # kexec -l Image --initrd=initrd -s
  # kexec -e

The new kernel will boot up and contain the previous kernel's trace
buffers in /sys/kernel/debug/tracing/trace.



Alex

[1] https://lore.kernel.org/all/169645773092.11424.7258549771090599226.stgit@skinsburskii./
[2] https://lore.kernel.org/all/20231016233215.13090-1-madvenka@linux.microsoft.com/
[3] https://lpc.events/event/17/contributions/1485/attachments/1296/2650/jgowans-preserving-across-kexec.pdf
[4] https://lore.kernel.org/kexec/1682554137-13938-1-git-send-email-anthony.yznaga@oracle.com/


Alexander Graf (15):
  mm,memblock: Add support for scratch memory
  memblock: Declare scratch memory as CMA
  kexec: Add Kexec HandOver (KHO) generation helpers
  kexec: Add KHO parsing support
  kexec: Add KHO support to kexec file loads
  arm64: Add KHO support
  x86: Add KHO support
  tracing: Introduce names for ring buffers
  tracing: Introduce names for events
  tracing: Introduce kho serialization
  tracing: Add kho serialization of trace buffers
  tracing: Recover trace buffers from kexec handover
  tracing: Add kho serialization of trace events
  tracing: Recover trace events from kexec handover
  tracing: Add config option for kexec handover

 Documentation/ABI/testing/sysfs-firmware-kho  |   9 +
 Documentation/ABI/testing/sysfs-kernel-kho    |  53 ++
 .../admin-guide/kernel-parameters.txt         |  10 +
 MAINTAINERS                                   |   2 +
 arch/arm64/Kconfig                            |  12 +
 arch/arm64/kernel/setup.c                     |   2 +
 arch/arm64/mm/init.c                          |   8 +
 arch/x86/Kconfig                              |  12 +
 arch/x86/boot/compressed/kaslr.c              |  55 ++
 arch/x86/include/uapi/asm/bootparam.h         |  15 +-
 arch/x86/kernel/e820.c                        |   9 +
 arch/x86/kernel/kexec-bzimage64.c             |  39 ++
 arch/x86/kernel/setup.c                       |  46 ++
 arch/x86/mm/init_32.c                         |   7 +
 arch/x86/mm/init_64.c                         |   7 +
 drivers/of/fdt.c                              |  41 ++
 drivers/of/kexec.c                            |  36 ++
 include/linux/kexec.h                         |  56 ++
 include/linux/memblock.h                      |  19 +
 include/linux/ring_buffer.h                   |   9 +-
 include/linux/trace_events.h                  |   1 +
 include/trace/trace_events.h                  |   2 +
 include/uapi/linux/kexec.h                    |   6 +
 kernel/Makefile                               |   2 +
 kernel/kexec_file.c                           |  41 ++
 kernel/kexec_kho_in.c                         | 298 ++++++++++
 kernel/kexec_kho_out.c                        | 526 ++++++++++++++++++
 kernel/trace/Kconfig                          |  13 +
 kernel/trace/blktrace.c                       |   1 +
 kernel/trace/ring_buffer.c                    | 267 ++++++++-
 kernel/trace/trace.c                          |  76 ++-
 kernel/trace/trace_branch.c                   |   1 +
 kernel/trace/trace_events.c                   |   3 +
 kernel/trace/trace_functions_graph.c          |   4 +-
 kernel/trace/trace_output.c                   | 106 +++-
 kernel/trace/trace_output.h                   |   1 +
 kernel/trace/trace_probe.c                    |   3 +
 kernel/trace/trace_syscalls.c                 |  29 +
 mm/Kconfig                                    |   4 +
 mm/memblock.c                                 |  83 ++-
 40 files changed, 1901 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-kho
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-kho
 create mode 100644 kernel/kexec_kho_in.c
 create mode 100644 kernel/kexec_kho_out.c

-- 
2.40.1




Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

             reply	other threads:[~2023-12-13  0:05 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-13  0:04 Alexander Graf [this message]
2023-12-13  0:04 ` [PATCH 01/15] mm,memblock: Add support for scratch memory Alexander Graf
2023-12-13  0:04 ` [PATCH 02/15] memblock: Declare scratch memory as CMA Alexander Graf
2023-12-13 11:32   ` kernel test robot
2023-12-13  0:04 ` [PATCH 03/15] kexec: Add Kexec HandOver (KHO) generation helpers Alexander Graf
2023-12-13 18:36   ` Stanislav Kinsburskii
2023-12-13 23:36     ` Alexander Graf
2023-12-13  0:04 ` [PATCH 04/15] kexec: Add KHO parsing support Alexander Graf
2023-12-13 18:56   ` Stanislav Kinsburskii
2023-12-13  0:04 ` [PATCH 05/15] kexec: Add KHO support to kexec file loads Alexander Graf
2023-12-13  0:04 ` [PATCH 06/15] arm64: Add KHO support Alexander Graf
2023-12-13 11:22   ` kernel test robot
2023-12-13 13:41   ` kernel test robot
2023-12-14 22:36   ` Rob Herring
2023-12-18 23:01     ` Alexander Graf
2023-12-13  0:04 ` [PATCH 07/15] x86: " Alexander Graf
2023-12-13  0:04 ` [PATCH 08/15] tracing: Introduce names for ring buffers Alexander Graf
2023-12-13  0:15   ` Steven Rostedt
2023-12-13  0:35     ` Alexander Graf
2023-12-13  0:44       ` Steven Rostedt
2023-12-13 11:22   ` kernel test robot
2023-12-13  0:04 ` [PATCH 09/15] tracing: Introduce names for events Alexander Graf
2023-12-13  0:49   ` Steven Rostedt
2023-12-13  0:04 ` [PATCH 10/15] tracing: Introduce kho serialization Alexander Graf
2023-12-13  0:04 ` [PATCH 11/15] tracing: Add kho serialization of trace buffers Alexander Graf
2023-12-13  0:04 ` [PATCH 12/15] tracing: Recover trace buffers from kexec handover Alexander Graf
2023-12-13  0:04 ` [PATCH 13/15] tracing: Add kho serialization of trace events Alexander Graf
2023-12-13  0:04 ` [PATCH 14/15] tracing: Recover trace events from kexec handover Alexander Graf
2023-12-13  0:04 ` [PATCH 15/15] tracing: Add config option for " Alexander Graf
2023-12-14 14:58 ` [PATCH 00/15] kexec: Allow preservation of ftrace buffers Eric W. Biederman
2023-12-14 16:02   ` Alexander Graf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231213000452.88295-1-graf@amazon.com \
    --to=graf@amazon.com \
    --cc=akpm@linux-foundation.org \
    --cc=anthony.yznaga@oracle.com \
    --cc=arnd@arndb.de \
    --cc=ashish.kalra@amd.com \
    --cc=benh@kernel.crashing.org \
    --cc=devicetree@vger.kernel.org \
    --cc=dwmw@amazon.co.uk \
    --cc=ebiederm@xmission.com \
    --cc=hpa@zytor.com \
    --cc=jgowans@amazon.com \
    --cc=kexec@lists.infradead.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=madvenka@linux.microsoft.com \
    --cc=mark.rutland@arm.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=robh+dt@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=skinsburskii@linux.microsoft.com \
    --cc=thomas.lendacky@amd.com \
    --cc=usama.arif@bytedance.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).