From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [BUG] generic/475 recovery failure(s)
Date: Tue, 15 Jun 2021 14:39:16 +1000
Message-ID: <20210615043916.GV664593@dread.disaster.area>
In-Reply-To: <20210614234145.GU664593@dread.disaster.area>

On Tue, Jun 15, 2021 at 09:41:45AM +1000, Dave Chinner wrote:
> On Mon, Jun 14, 2021 at 08:56:00AM -0400, Brian Foster wrote:
> > On Mon, Jun 14, 2021 at 08:32:58AM -0400, Brian Foster wrote:
> > > On Sat, Jun 12, 2021 at 08:33:32AM +1000, Dave Chinner wrote:
> > > > On Fri, Jun 11, 2021 at 03:02:39PM -0400, Brian Foster wrote:
> > > > > On Thu, Jun 10, 2021 at 11:14:32AM -0400, Brian Foster wrote:
> > ...
> > > 
> > > I've also now been hitting a deadlock issue more frequently with the
> > > same test. I'll provide more on that in a separate mail..
> > > 
> > 
> > So I originally thought this one was shutdown related (there still may
> > be a shutdown hang, I was quickly working around other issues to give
> > priority to the corruption issue), but what I'm seeing actually looks
> > like a log reservation deadlock. More specifically, it looks like a
> > stuck CIL push and everything else backed up behind it.
> > 
> > I suspect this is related to the same patch (it does show 4 concurrent
> > CIL push tasks, fwiw), but I'm not 100% certain atm. I'll repeat some
> > tests on the prior commit to try and confirm or rule that out. In the
> > meantime, dmesg with blocked task output is attached.
> > 
> > [49989.354537] sysrq: Show Blocked State
> > [49989.362009] task:kworker/u161:4  state:D stack:    0 pid:105326 ppid:     2 flags:0x00004000
> > [49989.370464] Workqueue: xfs-cil/dm-5 xlog_cil_push_work [xfs]
> > [49989.376278] Call Trace:
> > [49989.378744]  __schedule+0x38b/0xc50
> > [49989.382250]  ? lock_release+0x1cd/0x2a0
> > [49989.386097]  schedule+0x5b/0xc0
> > [49989.389251]  xlog_wait_on_iclog+0xeb/0x100 [xfs]
> > [49989.393932]  ? wake_up_q+0xa0/0xa0
> > [49989.397353]  xlog_cil_push_work+0x5cb/0x630 [xfs]
> > [49989.402123]  ? sched_clock_cpu+0xc/0xb0
> > [49989.405971]  ? lock_acquire+0x15d/0x380
> > [49989.409823]  ? lock_release+0x1cd/0x2a0
> > [49989.413662]  ? lock_acquire+0x15d/0x380
> > [49989.417502]  ? lock_release+0x1cd/0x2a0
> > [49989.421353]  ? finish_task_switch.isra.0+0xa0/0x2c0
> > [49989.426238]  process_one_work+0x26e/0x560
> > [49989.430271]  worker_thread+0x52/0x3b0
> > [49989.433942]  ? process_one_work+0x560/0x560
> > [49989.438138]  kthread+0x12c/0x150
> > [49989.441380]  ? __kthread_bind_mask+0x60/0x60
> 
> Only way we can get stuck here is that the ctx->start_lsn !=
> commit_lsn and the previous iclog is still in WANT_SYNC/SYNC state.
> This implies the iclog has an elevated reference count and so while
> it has been released, IO wasn't issued on it because refcount > 0.
> 
> The only way I can see this happening is start_lsn != commit_lsn
> on the same iclog, or we have a bug in the iclog reference counting
> and it's never reaching zero on the previous iclog.
> 
> Ok, I think I've reproduced this - took about 300 iterations of
> g/019, but I've got a system stuck like this but all the other CIL
> pushes in progress are stuck here:
> 
> [ 2458.915655] INFO: task kworker/u8:4:31656 blocked for more than 123 seconds.
> [ 2458.917629]       Not tainted 5.13.0-rc6-dgc+ #205
> [ 2458.919011] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2458.921205] task:kworker/u8:4    state:D stack:12160 pid:31656 ppid:     2 flags:0x00004000
> [ 2458.923538] Workqueue: xfs-cil/vdb xlog_cil_push_work
> [ 2458.925000] Call Trace:
> [ 2458.925763]  __schedule+0x30b/0x9f0
> [ 2458.926811]  schedule+0x68/0xe0
> [ 2458.927757]  xlog_state_get_iclog_space+0x1b8/0x2d0
> [ 2458.929122]  ? wake_up_q+0xa0/0xa0
> [ 2458.930142]  xlog_write+0x7b/0x650
> [ 2458.931145]  ? _raw_spin_unlock_irq+0xe/0x30
> [ 2458.932405]  ? __wait_for_common+0x133/0x160
> [ 2458.933647]  xlog_cil_push_work+0x5eb/0xa10
> [ 2458.934854]  ? wake_up_q+0xa0/0xa0
> [ 2458.935881]  ? xfs_swap_extents+0x920/0x920
> [ 2458.937126]  process_one_work+0x1ab/0x390
> [ 2458.938284]  worker_thread+0x56/0x3d0
> [ 2458.939355]  ? rescuer_thread+0x3c0/0x3c0
> [ 2458.940557]  kthread+0x14d/0x170
> [ 2458.941530]  ? __kthread_bind_mask+0x70/0x70
> [ 2458.942779]  ret_from_fork+0x1f/0x30
> 
> Which indicates that there are no iclogs in XLOG_STATE_ACTIVE (i.e.
> the "noiclogs" state, and that definitely correlates with an iclog
> stuck in an WANT_SYNC state....
> 
> Now I know I can reproduce it, I'll dig into it.

I have a trace on this one and I think I know what it is. It
definitely is related to overlapping push work; the problem is
incorrect use of commit_iclog->ic_prev here:

        /*
         * If the checkpoint spans multiple iclogs, wait for all previous
         * iclogs to complete before we submit the commit_iclog. In this case,
         * the commit_iclog write needs to issue a pre-flush so that the
         * ordering is correctly preserved down to stable storage.
         */
        spin_lock(&log->l_icloglock);
        if (ctx->start_lsn != commit_lsn) {
                xlog_wait_on_iclog(commit_iclog->ic_prev);
                spin_lock(&log->l_icloglock);
                commit_iclog->ic_flags |= XLOG_ICL_NEED_FLUSH;
        }
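
For reference, the only thing that wakes a sleeper in xlog_wait_on_iclog()
is the iclog it was handed cycling back to ACTIVE/DIRTY via IO completion
processing. Roughly, as a paraphrased sketch of the mainline 5.13-era
function (not a verbatim copy of the branch under test):

        /*
         * Paraphrased for illustration - the detail that matters is the
         * wait condition: the caller sleeps on ic_force_wait until IO
         * completion has moved this iclog back to ACTIVE/DIRTY. Called
         * with l_icloglock held, which it drops.
         */
        int
        xlog_wait_on_iclog(
                struct xlog_in_core     *iclog)
                        __releases(iclog->ic_log->l_icloglock)
        {
                struct xlog             *log = iclog->ic_log;

                if (!XLOG_FORCED_SHUTDOWN(log) &&
                    iclog->ic_state != XLOG_STATE_ACTIVE &&
                    iclog->ic_state != XLOG_STATE_DIRTY) {
                        xlog_wait(&iclog->ic_force_wait, &log->l_icloglock);
                } else {
                        spin_unlock(&log->l_icloglock);
                }

                if (XLOG_FORCED_SHUTDOWN(log))
                        return -EIO;
                return 0;
        }

So if we hand it the wrong iclog, then on a healthy log the only wakeup
comes from ordered completion processing reaching that iclog - which, as
explained below, is exactly what is blocked.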

On the surface, this looks fine - wait on the previous iclog if the
commit lsn indicates that this checkpoint is spread over more than
one iclogbuf. But what is happening is this:

T1					T2
xlog_write(ctx->start_lsn)
.....
xlog_write(&commit_iclog, &commit_lsn)
.....
<preempt>
					xlog_write() {
					  get iclog
					  loop {
					    fill iclog
					    release iclog
					      iclog gets written
					    get iclog
					      <runs out of ACTIVE iclogs>
					  }
<scheduled again>
spin_lock(&log->l_icloglock);
if (ctx->start_lsn != commit_lsn) {
  xlog_wait_on_iclog(commit_iclog->ic_prev);
    ic_prev points at iclog in DONE_SYNC state
    waits on iclog
    <hang>


The problem here is that commit_iclog->ic_prev is *not* the previous
iclog that was written to disk - it's already been written,
completed and recycled back into active state. Then T2 picks it up,
writes to it, releases it again and more IO is done on it. It now
ends up getting as far as DONE_SYNC in IO completion, and then the
iclog ordering loop goes no further because it's waiting on the
WANT_SYNC iclog that is at the head of the iclogbuf loop that hasn't
had its IO issued yet.

So, yeah, an iclog stuck in WANT_SYNC state is the problem here
because we're waiting on a future iclog rather than a past iclog
in xlog_cil_push_work()....
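
To make the iclog lifecycle problem concrete, here's a throwaway
userspace model - not XFS code, the struct and field names just mimic
the kernel ones - of a four-buffer iclog ring being recycled underneath
a stale "previous iclog" pointer:

/*
 * Toy model of the race - not XFS code. A fixed ring of "iclogs" is
 * reused by a second writer while the first writer still holds a stale
 * pointer to the buffer that preceded its commit record.
 */
#include <stdio.h>

#define NUM_ICLOGS	4

struct iclog {
	int		lsn;	/* last LSN written into this buffer */
	struct iclog	*prev;	/* previous buffer in the ring */
};

int main(void)
{
	struct iclog	ring[NUM_ICLOGS];
	struct iclog	*commit_iclog;
	int		commit_lsn, lsn = 0, i;

	/* circular ring, like log->l_iclog */
	for (i = 0; i < NUM_ICLOGS; i++)
		ring[i].prev = &ring[(i + NUM_ICLOGS - 1) % NUM_ICLOGS];

	/* T1: checkpoint spans two iclogs, commit record lands in ring[1] */
	ring[0].lsn = ++lsn;			/* start record  */
	ring[1].lsn = ++lsn;			/* commit record */
	commit_iclog = &ring[1];
	commit_lsn = commit_iclog->lsn;

	/*
	 * T2: runs before T1 gets back to the icloglock, fills the rest
	 * of the ring and wraps onto T1's old start-record buffer, then
	 * blocks waiting for a free iclog.
	 */
	for (i = 0; i < NUM_ICLOGS - 1; i++)
		ring[(2 + i) % NUM_ICLOGS].lsn = ++lsn;

	/*
	 * T1 resumes and dereferences its saved ->prev (ic_prev in the
	 * kernel): it now holds a later LSN than the commit record, so
	 * waiting on it means waiting on a *future* iclog.
	 */
	printf("commit_lsn %d, ->prev now holds lsn %d: waiting on a %s iclog\n",
	       commit_lsn, commit_iclog->prev->lsn,
	       commit_iclog->prev->lsn > commit_lsn ? "future" : "past");
	return 0;
}

The real window obviously involves iclog reference counts and IO
completion rather than simple overwrites, but the pointer aliasing is
the same: by the time we look at ic_prev under the icloglock, the ring
has moved on.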

I'll write up a patch to fix this and send it along with the tracing
patch I used to debug this.

Cheers,

Dave.

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

-- 
Dave Chinner
david@fromorbit.com

Thread overview: 12+ messages
2021-06-10 15:14 [BUG] generic/475 recovery failure(s) Brian Foster
2021-06-11 19:02 ` Brian Foster
2021-06-11 22:33   ` Dave Chinner
     [not found]     ` <YMdMehWQoBJC9l0W@bfoster>
2021-06-14 12:56       ` Brian Foster
2021-06-14 23:41         ` Dave Chinner
2021-06-15  4:39           ` Dave Chinner [this message]
2021-06-16  7:05     ` Dave Chinner
2021-06-16 20:33       ` Brian Foster
2021-06-16 21:05         ` Darrick J. Wong
2021-06-16 22:54           ` Dave Chinner
2021-06-17  1:28             ` Darrick J. Wong
2021-06-17 12:52           ` Brian Foster
