From: Guoqing Jiang <jgq516@gmail.com>
To: linux-raid@vger.kernel.org
Subject: Re: Intermittent stalling of all MD IO, Debian buster (4.19.0-16)
Date: Fri, 18 Jun 2021 13:35:08 +0800
Message-ID: <33236a83-a14d-a9e0-5384-91aa007858dc@gmail.com>
In-Reply-To: <20210616150549.ojm3nvdamkmqb6ev@bitfolk.com>

Hi Andy,

On 6/16/21 11:05 PM, Andy Smith wrote:
> Hi Guoqing,
>
> Thanks for looking at this.
>
> On Wed, Jun 16, 2021 at 11:57:33AM +0800, Guoqing Jiang wrote:
>> The above looks like the bio for sb write was throttled by wbt, which caused
>> the first calltrace.
>> I am wondering if there  were intensive IOs happened to the
>> underlying device of md5, which triggered wbt to throttle sb
>> write, or can you access the underlying device directly?
> Next time it occurs I can check if I am able to read from the SSDs
> that make up the MD device, if that information would be helpful.
>
> I have never been able to replicate the problem in a test
> environment so it is likely that it needs to be under heavy load for
> it to happen.

I guess so, and a reliable reproducer would definitely help us analyze
the root cause.

>> And there was a report [1] for raid5 which may related to wbt throttle as
>> well, not sure if the
>> change [2] could help or not.
>>
>> [1]. https://lore.kernel.org/linux-raid/d3fced3f-6c2b-5ffa-fd24-b24ec6e7d4be@xmyslivec.cz/
>> [2]. https://lore.kernel.org/linux-raid/cb0f312e-55dc-cdc4-5d2e-b9b415de617f@gmail.com/
> All of my MD arrays tend to be RAID-1 or RAID-10, two devices, no
> journal, internal bitmap. I see the reporter of this problem was
> using RAID-6 with an external write journal. I can still build a
> kernel with this patch and try it out, if you think it could possibly
> help.

Yes, because both issues have wbt-related call traces, even though the
RAID levels differ.
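
If it helps when the stall next occurs, here is a minimal sketch for
recording the wbt latency target of the member devices. It only reads
the wbt_lat_usec attribute under /sys/block/<dev>/queue/, where a value
of 0 means wbt is disabled for that queue. The helper name and the
device names "sda" and "sdb" are placeholders I made up, not taken from
your setup, and I have not run this against your hosts.

```python
from pathlib import Path


def read_wbt_lat_usec(devices, sysfs_root="/sys/block"):
    """Return {device: wbt_lat_usec as int, or None if unavailable}."""
    result = {}
    for dev in devices:
        attr = Path(sysfs_root) / dev / "queue" / "wbt_lat_usec"
        try:
            result[dev] = int(attr.read_text().strip())
        except (OSError, ValueError):
            # Attribute missing (e.g. wbt not enabled in this kernel)
            # or its content could not be parsed as an integer.
            result[dev] = None
    return result


if __name__ == "__main__":
    # Placeholder names for the SSDs that make up the MD device.
    for dev, lat in read_wbt_lat_usec(["sda", "sdb"]).items():
        print(dev, lat)
```

A non-zero value on the member devices would at least confirm that wbt
is active on the queues underneath the array.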

> The long time between incidents obviously makes things
> extra challenging.
>
> The next step I have taken is to put the buster-backports kernel
> package (5.10.24-1~bpo10+1) on two test servers, and will also boot
> the production hosts into this if they should experience the problem
> again.

Good luck :).

Thanks,
Guoqing
