From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: linux-xfs@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
Christoph Hellwig <hch@infradead.org>
Cc: "Darrick J . Wong" <djwong@kernel.org>,
Ojaswin Mujoo <ojaswin@linux.ibm.com>
Subject: Re: [PATCHv2 1/1] xfs: Add cond_resched in xfs_bunmapi_range loop
Date: Mon, 29 Apr 2024 14:14:46 +0530
Message-ID: <87sez4y2v5.fsf@gmail.com>
In-Reply-To: <f7d3db235a2c7e16681a323a99bb0ce50a92296a.1714033516.git.ritesh.list@gmail.com>
"Ritesh Harjani (IBM)" <ritesh.list@gmail.com> writes:
> An async dio write to a sparse file can generate a lot of extents,
> and when we unlink this file (using rm), the kernel can stay busy
> unmapping and freeing those extents as part of transaction processing.
> Add a cond_resched() in the xfs_bunmapi_range() loop to avoid soft
> lockup messages like the one below. Here is a call trace of such a
> soft lockup.
>
> watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435]
> CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S L 6.9.0-rc5-0-default #1
> Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker
> NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290
> LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290
> Call Trace:
> xfs_alloc_get_rec+0x54/0x1b0 (unreliable)
> xfs_alloc_compute_aligned+0x5c/0x144
> xfs_alloc_ag_vextent_size+0x238/0x8d4
> xfs_alloc_fix_freelist+0x540/0x694
> xfs_free_extent_fix_freelist+0x84/0xe0
> __xfs_free_extent+0x74/0x1ec
> xfs_extent_free_finish_item+0xcc/0x214
> xfs_defer_finish_one+0x194/0x388
> xfs_defer_finish_noroll+0x1b4/0x5c8
> xfs_defer_finish+0x2c/0xc4
> xfs_bunmapi_range+0xa4/0x100
> xfs_itruncate_extents_flags+0x1b8/0x2f4
> xfs_inactive_truncate+0xe0/0x124
> xfs_inactive+0x30c/0x3e0
> xfs_inodegc_worker+0x140/0x234
> process_scheduled_works+0x240/0x57c
> worker_thread+0x198/0x468
> kthread+0x138/0x140
> start_kernel_thread+0x14/0x18
>
My v1 patch had the cond_resched() in xfs_defer_finish_noroll(), since
I suspected it is a common point where we loop for many other
operations, and Dave initially suggested the same [1]. But I was not
fully convinced, given that the only problematic path I had seen until
then was unmapping extents. So this patch keeps the cond_resched() in
the xfs_bunmapi_range() loop.
[1]: https://lore.kernel.org/all/ZZ8OaNnp6b%2FPJzsb@dread.disaster.area/
However, I was able to reproduce a similar problem in the reflink
remapping path, both on Power (with 64k blocksize) and on x86 (with
preempt=none and KASAN enabled). I noticed it while doing regression
testing of some of the iomap changes with KASAN enabled. The issue was
seen with generic/175 on both Power and x86.
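(For anyone trying to reproduce: that is the stock "./check generic/175"
run from an fstests checkout, with TEST_DEV/SCRATCH_DEV pointing at XFS
filesystems configured as above.)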
Do you think we should instead keep the cond_resched() inside the
xfs_defer_finish_noroll() loop, like we had in v1 [2]? If yes, then I
can rebase v1 on the latest upstream tree and also update the commit
message with both call stacks.
[2]: https://lore.kernel.org/all/0bfaf740a2d10cc846616ae05963491316850c52.1713674899.git.ritesh.list@gmail.com/
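To make the alternative concrete, here is a rough sketch of the v1
placement (paraphrased from [2]; treat the context lines as approximate,
since they may have changed on the latest tree):

--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ xfs_defer_finish_noroll @@
 	/* Until we run out of pending work to finish... */
 	while (!list_empty(&dop_pending) || !list_empty(&(*tp)->t_dfops)) {
+		/* Yield the CPU between batches of deferred ops. */
+		cond_resched();
 		...
 	}

That would cover every deferred-op path (unmap, extent free, rmap,
reflink remap) in one place, at the cost of rescheduling on paths that
have not yet shown a problem.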
<call stack on Power>
======================
run fstests generic/175 at 2024-02-02 04:40:21
<...>
[ C17] watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
CPU: 17 PID: 7679 Comm: xfs_io Kdump: loaded Tainted: G X 6.4.0-150600.5-default #1
NIP [c008000005e3ec94] xfs_rmapbt_diff_two_keys+0x54/0xe0 [xfs]
LR [c008000005e08798] xfs_btree_get_leaf_keys+0x110/0x1e0 [xfs]
Call Trace:
0xc000000014107c00 (unreliable)
__xfs_btree_updkeys+0x8c/0x2c0 [xfs]
xfs_btree_update_keys+0x150/0x170 [xfs]
xfs_btree_lshift+0x534/0x660 [xfs]
xfs_btree_make_block_unfull+0x19c/0x240 [xfs]
xfs_btree_insrec+0x4e4/0x630 [xfs]
xfs_btree_insert+0x104/0x2d0 [xfs]
xfs_rmap_insert+0xc4/0x260 [xfs]
xfs_rmap_map_shared+0x228/0x630 [xfs]
xfs_rmap_finish_one+0x2d4/0x350 [xfs]
xfs_rmap_update_finish_item+0x44/0xc0 [xfs]
xfs_defer_finish_noroll+0x2e4/0x740 [xfs]
__xfs_trans_commit+0x1f4/0x400 [xfs]
xfs_reflink_remap_extent+0x2d8/0x650 [xfs]
xfs_reflink_remap_blocks+0x154/0x320 [xfs]
xfs_file_remap_range+0x138/0x3a0 [xfs]
do_clone_file_range+0x11c/0x2f0
vfs_clone_file_range+0x60/0x1c0
ioctl_file_clone+0x78/0x140
sys_ioctl+0x934/0x1270
system_call_exception+0x158/0x320
system_call_vectored_common+0x15c/0x2ec
<call stack on x86 with KASAN>
===============================
watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [xfs_io:3438095]
CPU: 6 PID: 3438095 Comm: xfs_io Not tainted 6.9.0-rc5-xfstests-perf-00008-g4e2752e99f55 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.o4
RIP: 0010:_raw_spin_unlock_irqrestore+0x3c/0x60
Code: 10 48 89 fb 48 83 c7 18 e8 31 7c 10 fd 48 89 df e8 79 f5 10 fd f7 c5 00 02 00 00 74 06 e8 0c 6
Call Trace:
<IRQ>
? watchdog_timer_fn+0x2dc/0x3a0
? __pfx_watchdog_timer_fn+0x10/0x10
? __hrtimer_run_queues+0x4a1/0x870
? __pfx___hrtimer_run_queues+0x10/0x10
? kvm_clock_get_cycles+0x18/0x30
? ktime_get_update_offsets_now+0xc6/0x2f0
? hrtimer_interrupt+0x2b8/0x7a0
? __sysvec_apic_timer_interrupt+0xca/0x390
? sysvec_apic_timer_interrupt+0x65/0x80
</IRQ>
<TASK>
? asm_sysvec_apic_timer_interrupt+0x1a/0x20
? _raw_spin_unlock_irqrestore+0x34/0x60
? _raw_spin_unlock_irqrestore+0x3c/0x60
? _raw_spin_unlock_irqrestore+0x34/0x60
get_partial_node.part.0+0x1af/0x340
___slab_alloc+0xc07/0x1250
? do_vfs_ioctl+0xe5c/0x1660
? __x64_sys_ioctl+0xd5/0x1b0
? __alloc_object+0x39/0x660
? __pfx___might_resched+0x10/0x10
? __alloc_object+0x39/0x660
? kmem_cache_alloc+0x3cd/0x410
? should_failslab+0xe/0x20
kmem_cache_alloc+0x3cd/0x410
__alloc_object+0x39/0x660
__create_object+0x22/0x90
kmem_cache_alloc+0x324/0x410
xfs_bui_init+0x1b/0x150
xfs_bmap_update_create_intent+0x48/0x110
? __pfx_xfs_bmap_update_create_intent+0x10/0x10
xfs_defer_create_intent+0xcc/0x1b0
xfs_defer_create_intents+0x8f/0x230
xfs_defer_finish_noroll+0x1c0/0x1160
? xfs_inode_item_precommit+0x2c1/0x880
? __create_object+0x5e/0x90
? __pfx_xfs_defer_finish_noroll+0x10/0x10
? xfs_trans_run_precommits+0x126/0x200
__xfs_trans_commit+0x767/0xbe0
? inode_maybe_inc_iversion+0xe2/0x150
? __pfx___xfs_trans_commit+0x10/0x10
xfs_reflink_remap_extent+0x654/0xd40
? __pfx_xfs_reflink_remap_extent+0x10/0x10
? __pfx_down_read_nested+0x10/0x10
xfs_reflink_remap_blocks+0x21a/0x850
? __pfx_xfs_reflink_remap_blocks+0x10/0x10
? _raw_spin_unlock+0x23/0x40
? xfs_reflink_remap_prep+0x47d/0x900
xfs_file_remap_range+0x296/0xb40
? __pfx_xfs_file_remap_range+0x10/0x10
? __pfx_lock_acquire+0x10/0x10
? __pfx___might_resched+0x10/0x10
vfs_clone_file_range+0x260/0xc20
ioctl_file_clone+0x49/0xb0
do_vfs_ioctl+0xe5c/0x1660
? __pfx_do_vfs_ioctl+0x10/0x10
? trace_irq_enable.constprop.0+0xd2/0x110
? kasan_quarantine_put+0x7e/0x1d0
? do_sys_openat2+0x120/0x170
? lock_acquire+0x43b/0x4f0
? __pfx_lock_release+0x10/0x10
? __pfx_do_sys_openat2+0x10/0x10
? __do_sys_newfstatat+0x94/0xe0
? __fget_files+0x1ce/0x330
__x64_sys_ioctl+0xd5/0x1b0
do_syscall_64+0x6a/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7ffff7d1a94f
</TASK>
-ritesh
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> cc: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> ---
> fs/xfs/libxfs/xfs_bmap.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 656c95a22f2e..44d5381bc66f 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -6354,6 +6354,7 @@ xfs_bunmapi_range(
> error = xfs_defer_finish(tpp);
> if (error)
> goto out;
> + cond_resched();
> }
> out:
> return error;
> --
> 2.44.0