Linux-Block Archive mirror
From: Holger Kiehl <Holger.Kiehl@dwd.de>
To: linux-kernel <linux-kernel@vger.kernel.org>,
	 linux-raid <linux-raid@vger.kernel.org>,
	linux-block@vger.kernel.org,  Jens Axboe <axboe@kernel.dk>,
	linux-ext4@vger.kernel.org,  Theodore Ts'o <tytso@mit.edu>
Subject: Massive slowdown in kernels as of 6.x
Date: Mon, 6 May 2024 11:31:37 +0000 (GMT)	[thread overview]
Message-ID: <1ebabc15-51a8-59f3-c813-4e65e897a373@diagnostix.dwd.de> (raw)

Hello,

on a 4-socket file server distributing ~90 million files with
~130TiB of data daily, I see a massive slowdown of I/O operations
after some time (sometimes in less than a day). This slowdown
only started as of kernel 6.x and does not happen with 5.15.x.
I have so far tried the 6.0.9, 6.1.27 and 6.6.30 kernels, and they
all show the same slowdown after some time. If the load is taken
away from the server and it is nearly idle, the slowdown persists
and only recovers by itself after some hours. During such a slow,
idle phase I attached strace to an rsync process that was uploading
some small files to the server and could see that the slowdown was
in the rename() system call; all other system calls (read(),
write(), newfstatat(), openat(), fchmod(), etc.) were not affected:

   rename(".27095571.iXVMMT", "27095571")  = 0 <18.305817>
   rename(".272629ef.22gv2x", "272629ef")  = 0 <18.325222>
   rename(".275fbacf.UBj6J5", "275fbacf")  = 0 <18.317571>
   rename(".277ab7da.K5y144", "277ab7da")  = 0 <18.312568>
   rename(".27873039.ZQ4Lum", "27873039")  = 0 <18.310120>
   rename(".27ebf01f.t1FKeU", "27ebf01f")  = 0 <18.376816>
   rename(".27f97e6a.kJqqfL", "27f97e6a")  = 0 <18.290618>
   rename(".28078cd9.rV7JdN", "28078cd9")  = 0 <18.315415>
   rename(".28105bb4.gljiDk", "28105bb4")  = 0 <18.325392>
   rename(".282209b1.Cy3Wt2", "282209b1")  = 0 <30.188303>
   rename(".28888272.aUCxRj", "28888272")  = 0 <18.263236>
   rename(".288d8408.XjfGbH", "288d8408")  = 0 <18.312444>
   rename(".2897f455.hm3FG6", "2897f455")  = 0 <18.281729>
   rename(".28d7d7e8.pzMMF6", "28d7d7e8")  = 0 <18.281402>
   rename(".28d9a820.KQuaM0", "28d9a820")  = 0 <32.620562>
   rename(".294ae845.8Y6vYR", "294ae845")  = 0 <18.289532>
   rename(".294fee3f.eccu2p", "294fee3f")  = 0 <18.260564>
   rename(".29581b50.zPTjTh", "29581b50")  = 0 <18.314536>
   rename(".2975d45f.l5FUYX", "2975d45f")  = 0 <18.293864>
   rename(".29b3770a.tlNMvb", "29b3770a")  = 0 <0.000062>
   rename(".29c5e6ee.EexCwZ", "29c5e6ee")  = 0 <18.268144>
   rename(".29d23183.sLqxpd", "29d23183")  = 0 <18.344478>
   rename(".29d4f65.oyjRWj", "29d4f65")    = 0 <18.553610>
   rename(".29dcfab1.Y47Z1B", "29dcfab1")  = 0 <18.339336>
   rename(".29f26c7c.KNZXEe", "29f26c7c")  = 0 <18.372242>
   rename(".2a09907b.SXIgev", "2a09907b")  = 0 <18.317119>
   rename(".2a0c499c.8DiCsM", "2a0c499c")  = 0 <18.380393>
   rename(".2a64b7e8.FPnsB3", "2a64b7e8")  = 0 <18.372004>
   rename(".2a6765c9.t7Z0hj", "2a6765c9")  = 0 <18.296044>
   rename(".2a83d78f.UJVoMu", "2a83d78f")  = 0 <18.380678>
   rename(".2a94e724.AorYof", "2a94e724")  = 0 <18.360716>
   rename(".2a9ea651.EWpBHM", "2a9ea651")  = 0 <18.327733>
   rename(".2a9f1679.xDYq9Q", "2a9f1679")  = 0 <18.312850>
   rename(".2ab0a134.2GWgmr", "2ab0a134")  = 0 <18.326181>
   rename(".2aebf110.pGkILq", "2aebf110")  = 0 <0.000188>
   rename(".2af10031.7Sl5g6", "2af10031")  = 0 <18.342683>
   rename(".2b095066.MCauJX", "2b095066")  = 0 <18.375003>
   rename(".2b217bfd.HauJjr", "2b217bfd")  = 0 <18.427703>
   rename(".2b336a06.w5NN0p", "2b336a06")  = 0 <18.378774>
   rename(".2b40b422.i2v0E6", "2b40b422")  = 0 <14.727797>
   rename(".2b568d13.9zmRRX", "2b568d13")  = 0 <0.000056>
   rename(".2b5ccc66.AFd86P", "2b5ccc66")  = 0 <0.000063>
   rename(".2b7d0a43.qWyxge", "2b7d0a43")  = 0 <0.000046>
   rename(".2b7f968a.QAqOCb", "2b7f968a")  = 0 <0.000041>
   rename(".2ba6dddf.ynNTvi", "2ba6dddf")  = 0 <0.000039>
   rename(".2bce23ab.tliDkg", "2bce23ab")  = 0 <0.000040>
   rename(".2c19e144.CvHPV5", "2c19e144")  = 0 <0.000060>
   rename(".2c7c0651.8x1kQy", "2c7c0651")  = 0 <0.000057>
   rename(".2ca1a6b7.QwujH4", "2ca1a6b7")  = 0 <0.000396>
   rename(".2cc71683.7n9EYA", "2cc71683")  = 0 <0.000045>
   rename(".2cebde90.ZiGcTa", "2cebde90")  = 0 <0.000042>
   rename(".2d057cb4.5PGOIP", "2d057cb4")  = 0 <0.000042>
   rename(".2d29b4a7.A8hfwg", "2d29b4a7")  = 0 <0.000043>

So during the slow phase most renames took ~18 seconds, and as the
phase ends, the renames become very fast again.
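As an aside, the per-call latencies that strace -T reports can be
reproduced from user space with a small script. This is only an
illustrative sketch in a scratch directory; the file names mirror
rsync's hidden temp-file pattern, not the actual server paths:

```python
import os
import tempfile
import time

def timed_rename(src, dst):
    """Time a single rename() call, mimicking what strace -T reports."""
    t0 = time.monotonic()
    os.rename(src, dst)
    return time.monotonic() - t0

# rsync uploads to a hidden temp name (".name.XXXXXX") and then
# renames it into place; reproduce that pattern in a scratch dir.
d = tempfile.mkdtemp()
src = os.path.join(d, ".27095571.iXVMMT")
open(src, "w").close()
elapsed = timed_rename(src, os.path.join(d, "27095571"))
print(f"rename took {elapsed:.6f}s")
```

Run periodically against the affected filesystem, something like this
would show whether the slow phase is active without needing strace.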

I tried changing the priority of the process with renice and also
switched between different I/O schedulers for the block device,
but neither had any effect.
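For reference, the scheduler in effect for each block device can be
read back from sysfs; a minimal sketch (assuming standard Linux sysfs
layout) that collects them all:

```python
import glob

def io_schedulers():
    """Return {device: scheduler line} read from sysfs; the active
    scheduler is the bracketed entry, e.g. "[none] mq-deadline kyber"."""
    result = {}
    for path in glob.glob("/sys/block/*/queue/scheduler"):
        dev = path.split("/")[3]
        try:
            with open(path) as f:
                result[dev] = f.read().strip()
        except OSError:
            pass  # device went away or sysfs entry not readable
    return result

print(io_schedulers())
```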

I could not find anything in the logs or dmesg when this happens.

Any idea what could be the cause of this slowdown?

What else can I do to better locate in which part of the kernel
the I/O is stuck?
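One unprivileged way to spot tasks stuck in uninterruptible sleep
(the state that a sysrq-w dump reports) is to scan /proc. A minimal
sketch, assuming standard Linux procfs stat formatting:

```python
import os

def blocked_tasks():
    """List (pid, comm) of tasks in uninterruptible (D) sleep --
    typically tasks blocked on I/O. This is the same set of tasks
    that `echo w > /proc/sysrq-trigger` dumps, but readable
    without root."""
    stuck = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                data = f.read()
        except OSError:
            continue  # task exited while scanning
        # comm is parenthesized and may contain spaces, so parse
        # around the last ")" rather than splitting on whitespace.
        comm = data[data.index("(") + 1 : data.rindex(")")]
        state = data[data.rindex(")") + 2]
        if state == "D":
            stuck.append((int(pid), comm))
    return stuck

print(blocked_tasks())
```

With a stuck PID in hand, /proc/<PID>/stack (readable as root) shows
the kernel-side call chain the task is blocked in.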

The system has 1.5TiB of memory, and the filesystem is ext4 on an
MD RAID10 of 10 NVMe drives (Intel P4610):

   cat /proc/mdstat
   Personalities : [raid10]
   md0 : active raid10 nvme1n1[2] nvme4n1[4] nvme5n1[5] nvme3n1[3] nvme9n1[9] nvme8n1[8] nvme7n1[7] nvme6n1[6] nvme2n1[1] nvme0n1[0]
         7813406720 blocks super 1.2 512K chunks 2 near-copies [10/10] [UUUUUUUUUU]
         bitmap: 28/59 pages [112KB], 65536KB chunk

Mounted as follows:

   /dev/md0 on /u2 type ext4 (rw,nodev,noatime,commit=600,stripe=640)

The following cron entry is used to trim the device:

   25 */2 * * * root /usr/sbin/fstrim -v /u2 >> /tmp/u2.trim 2>&1

A check of the RAID was also performed, with no issues:

   [Sun May  5 13:52:01 2024] md: data-check of RAID array md0
   [Sun May  5 14:54:25 2024] md: md0: data-check done.
   cat /sys/block/md0/md/mismatch_cnt
   0

The CPUs are four Intel Xeon Platinum 8268, and the server is a Dell PowerEdge R940.

Additional information, including the kernel config, has been
uploaded to https://download.dwd.de/pub/afd/test/kernel_problem

Regards,
Holger

Thread overview: 6+ messages
2024-05-06 11:31 Holger Kiehl [this message]
2024-05-06 11:47 ` Massive slowdown in kernels as of 6.x Paul Menzel
2024-05-06 12:01 ` Dr. David Alan Gilbert
2024-05-06 12:02 ` Hannes Reinecke
2024-05-06 12:23 ` Carlos Carvalho
2024-05-06 13:32 ` Carlos Carvalho
