From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: linux-xfs@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@infradead.org>
Cc: "Darrick J . Wong" <djwong@kernel.org>,
	Ojaswin Mujoo <ojaswin@linux.ibm.com>
Subject: Re: [PATCHv2 1/1] xfs: Add cond_resched in xfs_bunmapi_range loop
Date: Mon, 29 Apr 2024 14:14:46 +0530	[thread overview]
Message-ID: <87sez4y2v5.fsf@gmail.com> (raw)
In-Reply-To: <f7d3db235a2c7e16681a323a99bb0ce50a92296a.1714033516.git.ritesh.list@gmail.com>

"Ritesh Harjani (IBM)" <ritesh.list@gmail.com> writes:

> An async dio write to a sparse file can generate a lot of extents,
> and when we unlink this file (using rm), the kernel can be busy
> unmapping and freeing those extents as part of transaction processing.
> Add cond_resched() in xfs_bunmapi_range() to avoid soft lockup
> messages like these. Here is a call trace of such a soft lockup.
>
> watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435]
> CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S  L   6.9.0-rc5-0-default #1
> Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker
> NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290
> LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290
> Call Trace:
>   xfs_alloc_get_rec+0x54/0x1b0 (unreliable)
>   xfs_alloc_compute_aligned+0x5c/0x144
>   xfs_alloc_ag_vextent_size+0x238/0x8d4
>   xfs_alloc_fix_freelist+0x540/0x694
>   xfs_free_extent_fix_freelist+0x84/0xe0
>   __xfs_free_extent+0x74/0x1ec
>   xfs_extent_free_finish_item+0xcc/0x214
>   xfs_defer_finish_one+0x194/0x388
>   xfs_defer_finish_noroll+0x1b4/0x5c8
>   xfs_defer_finish+0x2c/0xc4
>   xfs_bunmapi_range+0xa4/0x100
>   xfs_itruncate_extents_flags+0x1b8/0x2f4
>   xfs_inactive_truncate+0xe0/0x124
>   xfs_inactive+0x30c/0x3e0
>   xfs_inodegc_worker+0x140/0x234
>   process_scheduled_works+0x240/0x57c
>   worker_thread+0x198/0x468
>   kthread+0x138/0x140
>   start_kernel_thread+0x14/0x18
>

My v1 patch had the cond_resched() in xfs_defer_finish_noroll(), since I
suspected it is a common point where we loop for many other operations,
and Dave initially suggested the same [1]. But I was not fully convinced,
given that the only problematic path I had seen so far was the unmapping
of extents. So this patch keeps the cond_resched() in the
xfs_bunmapi_range() loop.

[1]: https://lore.kernel.org/all/ZZ8OaNnp6b%2FPJzsb@dread.disaster.area/

However, I was also able to reproduce a similar soft lockup in the
reflink remapping path, both on Power (with 64k blocksize) and on x86
(with preempt=none and KASAN enabled). I noticed it while doing
regression testing of some of the iomap changes with KASAN enabled. The
issue was seen with generic/175 on both Power and x86.

Do you think we should keep the cond_resched() inside the
xfs_defer_finish_noroll() loop, as we had in v1 [2]? If yes, I can
rebase v1 on the latest upstream tree and update the commit message
with both call stacks.

[2]: https://lore.kernel.org/all/0bfaf740a2d10cc846616ae05963491316850c52.1713674899.git.ritesh.list@gmail.com/
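
For context, the sketch below shows roughly what that placement looks
like against the upstream loop in fs/xfs/libxfs/xfs_defer.c. It is only
an illustration (elided details marked with "..."), not the exact v1
hunk; please see [2] for the actual patch:

  int
  xfs_defer_finish_noroll(
          struct xfs_trans        **tp)
  {
          ...
          /* Until we run out of pending work to finish... */
          while (!list_empty(&dop_pending) || !list_empty(&(*tp)->t_dfops)) {
                  ... finish one deferred work item, roll the transaction ...

                  /*
                   * Long dfops chains (large unmaps, rmap/refcount updates
                   * during reflink remap, ...) can keep this loop busy for a
                   * long time on non-preemptible kernels, so give the
                   * scheduler a chance once per iteration.
                   */
                  cond_resched();
          }
          ...
  }

The appeal of this placement is that a single cond_resched() covers
every long dfops chain (unmap, reflink remap, etc.), instead of
annotating each caller's loop separately.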


<call stack on Power>
======================
 run fstests generic/175 at 2024-02-02 04:40:21
 <...>
 watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
 CPU: 17 PID: 7679 Comm: xfs_io Kdump: loaded Tainted: G               X     6.4.0-150600.5-default #1
 NIP [c008000005e3ec94] xfs_rmapbt_diff_two_keys+0x54/0xe0 [xfs]
 LR [c008000005e08798] xfs_btree_get_leaf_keys+0x110/0x1e0 [xfs]
 Call Trace:
  0xc000000014107c00 (unreliable)
  __xfs_btree_updkeys+0x8c/0x2c0 [xfs]
  xfs_btree_update_keys+0x150/0x170 [xfs]
  xfs_btree_lshift+0x534/0x660 [xfs]
  xfs_btree_make_block_unfull+0x19c/0x240 [xfs]
  xfs_btree_insrec+0x4e4/0x630 [xfs]
  xfs_btree_insert+0x104/0x2d0 [xfs]
  xfs_rmap_insert+0xc4/0x260 [xfs]
  xfs_rmap_map_shared+0x228/0x630 [xfs]
  xfs_rmap_finish_one+0x2d4/0x350 [xfs]
  xfs_rmap_update_finish_item+0x44/0xc0 [xfs]
  xfs_defer_finish_noroll+0x2e4/0x740 [xfs]
  __xfs_trans_commit+0x1f4/0x400 [xfs]
  xfs_reflink_remap_extent+0x2d8/0x650 [xfs]
  xfs_reflink_remap_blocks+0x154/0x320 [xfs]
  xfs_file_remap_range+0x138/0x3a0 [xfs]
  do_clone_file_range+0x11c/0x2f0
  vfs_clone_file_range+0x60/0x1c0
  ioctl_file_clone+0x78/0x140
  sys_ioctl+0x934/0x1270
  system_call_exception+0x158/0x320
  system_call_vectored_common+0x15c/0x2ec


<call stack on x86 with KASAN>
===============================
 watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [xfs_io:3438095]
 CPU: 6 PID: 3438095 Comm: xfs_io Not tainted 6.9.0-rc5-xfstests-perf-00008-g4e2752e99f55 #1
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.o4
 RIP: 0010:_raw_spin_unlock_irqrestore+0x3c/0x60
 Code: 10 48 89 fb 48 83 c7 18 e8 31 7c 10 fd 48 89 df e8 79 f5 10 fd f7 c5 00 02 00 00 74 06 e8 0c 6
 Call Trace:
  <IRQ>
  ? watchdog_timer_fn+0x2dc/0x3a0
  ? __pfx_watchdog_timer_fn+0x10/0x10
  ? __hrtimer_run_queues+0x4a1/0x870
  ? __pfx___hrtimer_run_queues+0x10/0x10
  ? kvm_clock_get_cycles+0x18/0x30
  ? ktime_get_update_offsets_now+0xc6/0x2f0
  ? hrtimer_interrupt+0x2b8/0x7a0
  ? __sysvec_apic_timer_interrupt+0xca/0x390
  ? sysvec_apic_timer_interrupt+0x65/0x80
  </IRQ>
  <TASK>
  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
  ? _raw_spin_unlock_irqrestore+0x34/0x60
  ? _raw_spin_unlock_irqrestore+0x3c/0x60
  ? _raw_spin_unlock_irqrestore+0x34/0x60
  get_partial_node.part.0+0x1af/0x340
  ___slab_alloc+0xc07/0x1250
  ? do_vfs_ioctl+0xe5c/0x1660
  ? __x64_sys_ioctl+0xd5/0x1b0
 ? __alloc_object+0x39/0x660
 ? __pfx___might_resched+0x10/0x10
 ? __alloc_object+0x39/0x660
 ? kmem_cache_alloc+0x3cd/0x410
 ? should_failslab+0xe/0x20
 kmem_cache_alloc+0x3cd/0x410
 __alloc_object+0x39/0x660
 __create_object+0x22/0x90
 kmem_cache_alloc+0x324/0x410
 xfs_bui_init+0x1b/0x150
 xfs_bmap_update_create_intent+0x48/0x110
 ? __pfx_xfs_bmap_update_create_intent+0x10/0x10
 xfs_defer_create_intent+0xcc/0x1b0
 xfs_defer_create_intents+0x8f/0x230
 xfs_defer_finish_noroll+0x1c0/0x1160
 ? xfs_inode_item_precommit+0x2c1/0x880
 ? __create_object+0x5e/0x90
 ? __pfx_xfs_defer_finish_noroll+0x10/0x10
 ? xfs_trans_run_precommits+0x126/0x200
 __xfs_trans_commit+0x767/0xbe0
 ? inode_maybe_inc_iversion+0xe2/0x150
 ? __pfx___xfs_trans_commit+0x10/0x10
 xfs_reflink_remap_extent+0x654/0xd40
 ? __pfx_xfs_reflink_remap_extent+0x10/0x10
 ? __pfx_down_read_nested+0x10/0x10
 xfs_reflink_remap_blocks+0x21a/0x850
 ? __pfx_xfs_reflink_remap_blocks+0x10/0x10
 ? _raw_spin_unlock+0x23/0x40
 ? xfs_reflink_remap_prep+0x47d/0x900
 xfs_file_remap_range+0x296/0xb40
 ? __pfx_xfs_file_remap_range+0x10/0x10
 ? __pfx_lock_acquire+0x10/0x10
 ? __pfx___might_resched+0x10/0x10
 vfs_clone_file_range+0x260/0xc20
 ioctl_file_clone+0x49/0xb0
 do_vfs_ioctl+0xe5c/0x1660
 ? __pfx_do_vfs_ioctl+0x10/0x10
 ? trace_irq_enable.constprop.0+0xd2/0x110
 ? kasan_quarantine_put+0x7e/0x1d0
 ? do_sys_openat2+0x120/0x170
 ? lock_acquire+0x43b/0x4f0
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_do_sys_openat2+0x10/0x10
 ? __do_sys_newfstatat+0x94/0xe0
 ? __fget_files+0x1ce/0x330
 __x64_sys_ioctl+0xd5/0x1b0
 do_syscall_64+0x6a/0x140
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7ffff7d1a94f
 </TASK>


-ritesh



> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> cc: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> ---
>  fs/xfs/libxfs/xfs_bmap.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 656c95a22f2e..44d5381bc66f 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -6354,6 +6354,7 @@ xfs_bunmapi_range(
>  		error = xfs_defer_finish(tpp);
>  		if (error)
>  			goto out;
> +		cond_resched();
>  	}
>  out:
>  	return error;
> --
> 2.44.0

Thread overview: 6+ messages
2024-04-25  8:35 [PATCHv2 0/1] xfs: soft lockups while unmapping large no. of extents Ritesh Harjani (IBM)
2024-04-25  8:35 ` [PATCHv2 1/1] xfs: Add cond_resched in xfs_bunmapi_range loop Ritesh Harjani (IBM)
2024-04-25 12:18   ` Christoph Hellwig
2024-04-29  8:44   ` Ritesh Harjani [this message]
2024-04-29 15:11     ` Darrick J. Wong
2024-04-30  5:04       ` Ritesh Harjani
