QEMU-Devel Archive mirror
From: Stefan Hajnoczi <stefanha@redhat.com>
To: Alex Kalenyuk <akalenyu@redhat.com>, Adam Litke <alitke@redhat.com>
Cc: qemu-devel@nongnu.org, kwolf@redhat.com,
	"Richard W.M. Jones" <rjones@redhat.com>
Subject: qemu-img cache modes with Linux cgroup v1
Date: Mon, 31 Jul 2023 11:40:36 -0400
Message-ID: <20230731154036.GA1258836@fedora>

Hi,
qemu-img -t writeback -T writeback is not designed to run with the Linux
cgroup v1 memory controller because dirtying too much page cache leads
to process termination instead of the usual throttling behavior seen
without cgroups and with cgroup v2:
https://bugzilla.redhat.com/show_bug.cgi?id=2196072

I wanted to share my thoughts on this issue.

cache=none bypasses the host page cache and will not hit the cgroup
memory limit. It's an easy solution to avoid exceeding the cgroup v1
memory limit.

However, not all Linux file systems support O_DIRECT and qemu-img's I/O
pattern may perform worse under cache=none than cache=writeback.

1. Which file systems support O_DIRECT in Linux 6.5?

I searched the Linux source code for file systems that implement
.direct_IO or set FMODE_CAN_ODIRECT. This is not exhaustive and may not
be 100% accurate.

The big-name file systems (ext4, XFS, btrfs, nfs, smb, ceph) support
O_DIRECT. The most obvious omission is tmpfs.

If your users are running file systems that support O_DIRECT, then
qemu-img -t none -T none is an easy solution to the cgroup v1 memory
limit issue.

Supported:
9p
affs
btrfs
ceph
erofs
exfat
ext2
ext4
f2fs
fat
fuse
gfs2
hfs
hfsplus
jfs
minix
nfs
nilfs2
ntfs3
ocfs2
orangefs
overlayfs
reiserfs
smb
udf
xfs
zonefs

Unsupported:
adfs
befs
bfs
cramfs
ecryptfs
efs
freevxfs
hpfs
hugetlbfs
isofs
jffs2
ntfs
omfs
qnx4
qnx6
ramfs
romfs
squashfs
sysv
tmpfs
ubifs
ufs
vboxsf
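
Since the lists above come from a source search and may be inaccurate, a
runtime probe can double-check a particular mount. The following is only
a sketch: supports_o_direct() is a hypothetical helper, not part of
qemu-img, and it assumes that a file system without O_DIRECT support
rejects the open with EINVAL. A successful open does not guarantee that
every later I/O request meets the alignment requirements.

```python
import errno
import os
import tempfile


def supports_o_direct(directory: str) -> bool:
    """Return True if a file in `directory` can be opened with O_DIRECT.

    Hypothetical helper for illustration; Linux only (os.O_DIRECT).
    """
    # Create a scratch file so the probe never touches real data.
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        try:
            direct_fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        except OSError as e:
            # Assumption: lack of O_DIRECT support shows up as EINVAL.
            if e.errno == errno.EINVAL:
                return False
            raise
        os.close(direct_fd)
        return True
    finally:
        # Always remove the scratch file, whatever the outcome.
        os.unlink(path)


if __name__ == "__main__":
    print(supports_o_direct(tempfile.gettempdir()))
```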

2. Is qemu-img performance with O_DIRECT acceptable?

The I/O pattern matters more with O_DIRECT because every I/O request is
sent to the storage device. This means buffer sizes matter more (many
small I/Os have higher overhead than fewer large I/Os). Concurrency can
also help saturate the storage device.
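
To make the buffer size point concrete, here is a small arithmetic
sketch of how many requests reach the device when every buffer becomes
one I/O. The 10 GiB image size and the buffer sizes are illustrative
values I picked, not qemu-img defaults, and request_count() is a
hypothetical helper.

```python
def request_count(total_bytes: int, buffer_bytes: int) -> int:
    """Number of I/O requests needed if each request carries one buffer."""
    if buffer_bytes <= 0:
        raise ValueError("buffer_bytes must be positive")
    # Ceiling division: a partial final buffer still costs one request.
    return -(-total_bytes // buffer_bytes)


KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB

if __name__ == "__main__":
    image_size = 10 * GiB  # illustrative image size
    for buf in (64 * KiB, 1 * MiB, 2 * MiB):
        print(f"{buf // KiB:>5} KiB buffers: "
              f"{request_count(image_size, buf)} requests")
```

Each request carries fixed submission and completion overhead, so the
request count is a rough proxy for that cost. It says nothing about
device throughput, which still needs measurement.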

If you switch to O_DIRECT and encounter performance problems then
qemu-img can be optimized to send I/O patterns with less overhead. This
requires performance analysis.

3. Using buffered I/O because O_DIRECT is not universally supported?

If you can't use O_DIRECT, then qemu-img could be extended to manage its
dirty page cache set carefully. This consists of picking a budget and
writing back to disk when the budget is exhausted. Richard Jones has
shared links covering posix_fadvise(2) and sync_file_range(2):
https://lkml.iu.edu/hypermail/linux/kernel/1005.2/01845.html
https://lkml.iu.edu/hypermail/linux/kernel/1005.2/01953.html
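
As a rough sketch of the budget idea (not a proposed qemu-img patch):
the copy loop below counts bytes written since the last flush and, once
a budget is exceeded, writes them back and asks the kernel to drop them
from the page cache. Python's os module does not expose
sync_file_range(2), so this uses os.fdatasync() as a coarser stand-in
followed by posix_fadvise(POSIX_FADV_DONTNEED). The function name, the
64 MiB budget and the 1 MiB chunk size are assumptions for illustration.

```python
import os


def copy_with_dirty_budget(src_path: str, dst_path: str,
                           budget: int = 64 * 1024 * 1024,
                           chunk: int = 1024 * 1024) -> int:
    """Buffered copy that bounds dirty page cache to roughly `budget` bytes.

    Returns the number of mid-copy flushes. Hypothetical helper.
    """
    flushes = 0
    dirty = 0          # bytes written since the last flush
    window_start = 0   # file offset where the current dirty window begins
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst_fd = dst.fileno()
        while True:
            data = src.read(chunk)
            if not data:
                break
            dst.write(data)
            dirty += len(data)
            if dirty >= budget:
                # Push Python's userspace buffer to the kernel first.
                dst.flush()
                # Stand-in for sync_file_range(2): write back dirty pages.
                os.fdatasync(dst_fd)
                # Clean pages can now be dropped from the page cache.
                os.posix_fadvise(dst_fd, window_start, dirty,
                                 os.POSIX_FADV_DONTNEED)
                window_start += dirty
                dirty = 0
                flushes += 1
        # Final flush so the tail of the file reaches the disk too.
        dst.flush()
        os.fdatasync(dst_fd)
    return flushes
```

fdatasync() waits for the whole file, so it stalls the writer more than
a ranged sync_file_range() call would. A real implementation would
probably want the ranged call and some overlap between writeback and
new writes.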

We can discuss qemu-img code changes and performance analysis more if
you decide to take that direction.

Hope this helps!

Stefan
