Linux-XFS Archive mirror
 help / color / mirror / Atom feed
From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>,
	Ojaswin Mujoo <ojaswin@linux.ibm.com>
Subject: Re: [PATCHv2 1/1] xfs: Add cond_resched in xfs_bunmapi_range loop
Date: Tue, 30 Apr 2024 10:34:24 +0530	[thread overview]
Message-ID: <87le4vxwyv.fsf@gmail.com> (raw)
In-Reply-To: <20240429151146.GU360919@frogsfrogsfrogs>

"Darrick J. Wong" <djwong@kernel.org> writes:

> On Mon, Apr 29, 2024 at 02:14:46PM +0530, Ritesh Harjani wrote:
>> "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> writes:
>> 
>> > An async dio write to a sparse file can generate a lot of extents
>> > and when we unlink this file (using rm), the kernel can be busy in umapping
>> > and freeing those extents as part of transaction processing.
>> > Add cond_resched() in xfs_bunmapi_range() to avoid soft lockups
>> > messages like these. Here is a call trace of such a soft lockup.
>> >
>> > watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435]
>> > CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S  L   6.9.0-rc5-0-default #1
>> > Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker
>> > NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290
>> > LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290
>> > Call Trace:
>> >   xfs_alloc_get_rec+0x54/0x1b0 (unreliable)
>> >   xfs_alloc_compute_aligned+0x5c/0x144
>> >   xfs_alloc_ag_vextent_size+0x238/0x8d4
>> >   xfs_alloc_fix_freelist+0x540/0x694
>> >   xfs_free_extent_fix_freelist+0x84/0xe0
>> >   __xfs_free_extent+0x74/0x1ec
>> >   xfs_extent_free_finish_item+0xcc/0x214
>> >   xfs_defer_finish_one+0x194/0x388
>> >   xfs_defer_finish_noroll+0x1b4/0x5c8
>> >   xfs_defer_finish+0x2c/0xc4
>> >   xfs_bunmapi_range+0xa4/0x100
>> >   xfs_itruncate_extents_flags+0x1b8/0x2f4
>> >   xfs_inactive_truncate+0xe0/0x124
>> >   xfs_inactive+0x30c/0x3e0
>> >   xfs_inodegc_worker+0x140/0x234
>> >   process_scheduled_works+0x240/0x57c
>> >   worker_thread+0x198/0x468
>> >   kthread+0x138/0x140
>> >   start_kernel_thread+0x14/0x18
>> >
>> 
>> My v1 patch had cond_resched() in xfs_defer_finish_noroll, since I was
>> suspecting that it's a common point where we loop for many other
>> operations. And initially Dave also suggested for the same [1].
>> But I was not totally convinced given the only problematic path I
>> had till now was in unmapping extents. So this patch keeps the
>> cond_resched() in xfs_bunmapi_range() loop.
>> 
>> [1]: https://lore.kernel.org/all/ZZ8OaNnp6b%2FPJzsb@dread.disaster.area/
>> 
>> However, I was able to reproduce a problem with reflink remapping path
>> both on Power (with 64k bs) and on x86 (with preempt=none and with KASAN
>> enabled). I actually noticed while I was doing regression testing of
>> some of the iomap changes with KASAN enabled. The issue was seen with
>> generic/175 for both on Power and x86.
>> 
>> Do you think we should keep the cond_resched() inside
>> xfs_defer_finish_noroll() loop like we had in v1 [2]. If yes, then I can rebase
>> v1 on the latest upstream tree and also update the commit msg with both
>> call stacks.
>
> I think there ought to be one in xfs_defer_finish_noroll (or even
> xfs_trans_roll) when we're between dirty transactions, since long
> running transaction chains can indeed stall.  That'll hopefully solve
> the problem for bunmapi and long running online repair operations.
>
> But that said, the main FICLONE loop continually creates new transaction
> chains, so you probably need one there too.

We need not right, as long as we have one in xfs_defer_finish_noroll(),
because we commit everything before returning from
xfs_reflink_remap_extent(). And we do have XFS_TRANS_PERM_LOG_RES set,
so xfs_defer_finish_noroll() should always be called. 
(Apologies, if I am not making sense. I new to XFS code)

I gave generic/175 an overnight run with cond_resched() call inside
xfs_defer_finish_noroll() loop. I couldn't reproduce the soft lockup bug.

So does it sound ok to have cond_resched() within
xfs_defer_finish_noroll(). I can submit the new version then.

This should solve the soft lockup problems in both unmapping of extents
causing large transaction chains and also reflink remapping path.

> Hence the grumblings about cond_resched whackamole.
> Thoughts?

Yes, I agree that life will be easy if we don't have to play with the
hacks of putting such calls within tight loops. But until we have such
infrastructure ready, I was hoping we could find a correct fix for such
problems for current upstream and old stable kernels.


Thanks for the review and suggestions!

-ritesh

      reply	other threads:[~2024-04-30  5:04 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-25  8:35 [PATCHv2 0/1] xfs: soft lockups while unmapping large no. of extents Ritesh Harjani (IBM)
2024-04-25  8:35 ` [PATCHv2 1/1] xfs: Add cond_resched in xfs_bunmapi_range loop Ritesh Harjani (IBM)
2024-04-25 12:18   ` Christoph Hellwig
2024-04-29  8:44   ` Ritesh Harjani
2024-04-29 15:11     ` Darrick J. Wong
2024-04-30  5:04       ` Ritesh Harjani [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87le4vxwyv.fsf@gmail.com \
    --to=ritesh.list@gmail.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=ojaswin@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).