Linux-XFS Archive mirror
From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, "Darrick J . Wong" <djwong@kernel.org>,
	Ojaswin Mujoo <ojaswin@linux.ibm.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH 1/1] xfs: Add cond_resched to xfs_defer_finish_noroll
Date: Mon, 22 Apr 2024 20:40:03 +0530	[thread overview]
Message-ID: <87ttjto2lw.fsf@gmail.com> (raw)
In-Reply-To: <ZiWjbWrD60W/0s/F@dread.disaster.area>

Dave Chinner <david@fromorbit.com> writes:

> On Sun, Apr 21, 2024 at 01:19:44PM +0530, Ritesh Harjani (IBM) wrote:
>> An async dio write to a sparse file can generate a lot of extents,
>> and when we unlink this file (using rm), the kernel can be busy
>> unmapping and freeing those extents as part of transaction processing.
>> Add cond_resched() in xfs_defer_finish_noroll() to avoid soft lockup
>> messages. Here is a call trace of such a soft lockup.
>> 
>> watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [rm:81335]
>> CPU: 1 PID: 81335 Comm: rm Kdump: loaded Tainted: G             L X    5.14.21-150500.53-default
>
> Can you reproduce this on a current TOT kernel? 5.14 is pretty old,
> and this stack trace:
>

Yes, I was able to reproduce this on the upstream kernel too -

    watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435]
    CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S  L   6.9.0-rc5-0-default #1
    Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker
    NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290
    LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290
    Call Trace:
      xfs_alloc_get_rec+0x54/0x1b0 (unreliable)
      xfs_alloc_compute_aligned+0x5c/0x144
      xfs_alloc_ag_vextent_size+0x238/0x8d4
      xfs_alloc_fix_freelist+0x540/0x694
      xfs_free_extent_fix_freelist+0x84/0xe0
      __xfs_free_extent+0x74/0x1ec
      xfs_extent_free_finish_item+0xcc/0x214
      xfs_defer_finish_one+0x194/0x388
      xfs_defer_finish_noroll+0x1b4/0x5c8
      xfs_defer_finish+0x2c/0xc4
      xfs_bunmapi_range+0xa4/0x100
      xfs_itruncate_extents_flags+0x1b8/0x2f4
      xfs_inactive_truncate+0xe0/0x124
      xfs_inactive+0x30c/0x3e0
      xfs_inodegc_worker+0x140/0x234
      process_scheduled_works+0x240/0x57c
      worker_thread+0x198/0x468
      kthread+0x138/0x140
      start_kernel_thread+0x14/0x18


>> NIP [c00800001b174768] xfs_extent_busy_trim+0xc0/0x2a0 [xfs]
>> LR [c00800001b1746f4] xfs_extent_busy_trim+0x4c/0x2a0 [xfs]
>> Call Trace:
>>  0xc0000000a8268340 (unreliable)
>>  xfs_alloc_compute_aligned+0x5c/0x150 [xfs]
>>  xfs_alloc_ag_vextent_size+0x1dc/0x8c0 [xfs]
>>  xfs_alloc_ag_vextent+0x17c/0x1c0 [xfs]
>>  xfs_alloc_fix_freelist+0x274/0x4b0 [xfs]
>>  xfs_free_extent_fix_freelist+0x84/0xe0 [xfs]
>>  __xfs_free_extent+0xa0/0x240 [xfs]
>>  xfs_trans_free_extent+0x6c/0x140 [xfs]
>>  xfs_defer_finish_noroll+0x2b0/0x650 [xfs]
>>  xfs_inactive_truncate+0xe8/0x140 [xfs]
>>  xfs_fs_destroy_inode+0xdc/0x320 [xfs]
>>  destroy_inode+0x6c/0xc0
>
> .... doesn't exist anymore.
>
> xfs_inactive_truncate() is now done from a
> background inodegc thread, not directly in destroy_inode().
>
> I also suspect that any sort of cond_resched() should be in the top
> layer loop in xfs_bunmapi_range(), not hidden deep in the defer
> code. The problem is the number of extents being processed without
> yielding, not the time spent processing each individual deferred
> work chain to free the extent. Hence the explicit rescheduling
> should be at the top level loop where it can be easily explained and
> understood, not hidden deep inside the defer chain mechanism....

Yes, sure. I will submit a v2 with this diff then (after I verify the
fix once on the setup, just for my sanity).

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 656c95a22f2e..44d5381bc66f 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -6354,6 +6354,7 @@ xfs_bunmapi_range(
                error = xfs_defer_finish(tpp);
                if (error)
                        goto out;
+               cond_resched();
        }
 out:
        return error;


-ritesh


Thread overview: 4+ messages
2024-04-21  7:49 [PATCH 0/1] xfs: soft lockups for unmapping large no. of extents Ritesh Harjani (IBM)
2024-04-21  7:49 ` [PATCH 1/1] xfs: Add cond_resched to xfs_defer_finish_noroll Ritesh Harjani (IBM)
2024-04-21 23:38   ` Dave Chinner
2024-04-22 15:10     ` Ritesh Harjani [this message]
