KVM Archive mirror
From: Benjamin Coddington <bcodding@redhat.com>
To: Thomas Glanzmann <thomas@glanzmann.de>
Cc: kvm@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: I/O stalls when merging qcow2 snapshots on nfs
Date: Mon, 06 May 2024 07:25:34 -0400	[thread overview]
Message-ID: <CC139243-7C48-4416-BE71-3C7B651F00FC@redhat.com> (raw)
In-Reply-To: <ZjdtmFu92mSlaHZ2@glanzmann.de>

On 5 May 2024, at 7:29, Thomas Glanzmann wrote:

> Hello,
> I often take snapshots in order to move kvm VMs from one nfs share to
> another while they're running or to take backups. Sometimes I have very
> large VMs (1.1 TB) which take a very long time (40 minutes - 2 hours) to
> backup or move. They also write between 20 - 60 GB of data while being
> backed up or moved. Once the backup or move is done the dirty snapshot
> data needs to be merged to the parent disk. While doing this I often
> experience I/O stalls within the VMs in the range of 1 - 20 seconds.
> Sometimes worse. But I have some very latency sensitive VMs which crash
> or misbehave after I/O stalls of 15 seconds. So I would like to know if
> there is some tuning I can do to make these I/O stalls shorter.
>
> - I already tried to set vm.dirty_expire_centisecs=100 which appears to
>   make it better, but not under 15 seconds. Ideally, I/O stalls would be
>   no longer than 1 second.
>
> This is how you can reproduce the issue:
>
> - NFS Server:
> mkdir /ssd
> apt install -y nfs-kernel-server
> echo '/ssd 0.0.0.0/0.0.0.0(rw,no_root_squash,no_subtree_check,sync)' > /etc/exports
> exportfs -ra
>
> - NFS Client / KVM Host:
> mount server:/ssd /mnt
> # Put a VM on /mnt and start it.
> # Create a snapshot:
> virsh snapshot-create-as --domain testy guest-state1 --diskspec vda,file=/mnt/overlay.qcow2 --disk-only --atomic --no-metadata
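
For completeness, I assume the stalls show up during the merge step, which
would be something like the following (domain and disk names taken from the
snapshot command above; the --bandwidth throttle in MiB/s is only a sketch
of a possible mitigation, not a tested value):

```shell
# Merge the overlay back into the base image while the guest is
# running, then pivot the guest back onto the base image.
# --bandwidth limits the commit rate (MiB/s) to reduce contention
# with guest I/O.
virsh blockcommit testy vda --active --verbose --pivot --bandwidth 100
```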

What NFS version ends up getting mounted here?  You might eliminate some
head-of-line blocking issues with the "nconnect=16" mount option to open
additional TCP connections.
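
As a quick sketch (paths taken from your reproduction steps; nconnect
requires an NFS v3 or v4.x mount and a client kernel of roughly 5.3 or
newer):

```shell
# Show which NFS version and mount options were actually negotiated.
nfsstat -m

# Remount with 16 TCP connections to the server.
umount /mnt
mount -o nconnect=16 server:/ssd /mnt
```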

My view of what could be happening is that the IO from your guest's process
is contending with the IO from your 'virsh blockcommit' process, and we
don't currently have a great way to classify and queue IO from various
sources in various ways.
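
Related to the vm.dirty_expire_centisecs tuning you already tried, another
writeback-side knob is to cap the dirty page cache by bytes so that flushes
happen in smaller bursts; the values below are illustrative only, not tested
recommendations:

```shell
# Cap total dirty page cache at 256 MiB and start background
# writeback at 64 MiB.  Setting the *_bytes sysctls overrides
# the corresponding *_ratio sysctls.
sysctl -w vm.dirty_bytes=268435456
sysctl -w vm.dirty_background_bytes=67108864
```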

Ben



Thread overview: 4+ messages
2024-05-05 11:29 I/O stalls when merging qcow2 snapshots on nfs Thomas Glanzmann
2024-05-06 11:25 ` Benjamin Coddington [this message]
2024-05-06 17:21   ` Thomas Glanzmann
2024-05-06 13:47 ` Trond Myklebust
