From: Wengang Wang <wen.gang.wang@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] xfs: make sure sb_fdblocks is non-negative
Date: Mon, 13 May 2024 17:06:46 +0000 [thread overview]
Message-ID: <AEBF87C7-89D5-47B7-A01E-B5C165D56D8C@oracle.com> (raw)
In-Reply-To: <Zj7HLZ5Mp5SjhvrH@dread.disaster.area>
Hi Dave,
Please see inlines,
> On May 10, 2024, at 6:17 PM, Dave Chinner <david@fromorbit.com> wrote:
>
> On Fri, May 10, 2024 at 05:34:26PM -0700, Wengang Wang wrote:
>> When writing the super block to disk (in xfs_log_sb), sb_fdblocks is fetched
>> from m_fdblocks without any lock. As m_fdblocks can experience a positive ->
>> negative -> positive change when the FS approaches fullness (see
>> xfs_mod_fdblocks), there is a chance that sb_fdblocks is negative, and because
>> sb_fdblocks is of type unsigned long long, it reads as a huge number. An
>> sb_fdblocks bigger than sb_dblocks is a problem during log recovery:
>> xfs_validate_sb_write() complains.
>>
>> Fix:
>> As sb_fdblocks will be re-calculated during mount when lazysbcount is enabled,
>> we just need to make xfs_validate_sb_write() happy -- make sure sb_fdblocks is
>> not negative.
>
> Ok, I have no problems with the change being made, but I'm unclear
> on why we care if these values get logged as large positive numbers?
>
> The comment above this code explains that these counts are known to
> be inaccurate and so are not trusted. i.e. they will be corrected
> post-log recovery if they are recovered from the log:
>
> * Lazy sb counters don't update the in-core superblock so do that now.
> * If this is at unmount, the counters will be exactly correct, but at
> * any other time they will only be ballpark correct because of
> * reservations that have been taken out percpu counters. If we have an
> * unclean shutdown, this will be corrected by log recovery rebuilding
> * the counters from the AGF block counts.
>
The thing is that we have a metadump; looking at fdblocks in superblock 0, it is good:
$ xfs_db -c "sb 0" -c "p" cust.img |egrep "dblocks|ifree|icount"
dblocks = 26214400
icount = 512
ifree = 337
fdblocks = 25997100
And when looking at the log, we have the following:
$ egrep -a "fdblocks|icount|ifree" cust.log |tail
sb_fdblocks 37
sb_icount 1056
sb_ifree 87
sb_fdblocks 37
sb_icount 1056
sb_ifree 87
sb_fdblocks 37
sb_icount 1056
sb_ifree 87
sb_fdblocks 18446744073709551604
# cust.log is output of my script which tries to parse the log buffer.
18446744073709551604ULL == 0xfffffffffffffff4, i.e. -12 interpreted as a signed 64-bit value
With upstream kernel (6.7.0-rc3), when I tried to mount (log recover) the metadump,
I got the following in dmesg:
[ 52.927796] XFS (loop0): SB summary counter sanity check failed
[ 52.928889] XFS (loop0): Metadata corruption detected at xfs_sb_write_verify+0x60/0x110 [xfs], xfs_sb block 0x0
[ 52.930890] XFS (loop0): Unmount and run xfs_repair
[ 52.931797] XFS (loop0): First 128 bytes of corrupted metadata buffer:
[ 52.932954] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 01 90 00 00 XFSB............
[ 52.934333] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 52.935733] 00000020: c9 c1 ed ae 84 ed 46 b9 a1 f0 09 57 4a a9 98 42 ......F....WJ..B
[ 52.937120] 00000030: 00 00 00 00 01 00 00 06 00 00 00 00 00 00 00 80 ................
[ 52.938515] 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82 ................
[ 52.939919] 00000050: 00 00 00 01 00 64 00 00 00 00 00 04 00 00 00 00 .....d..........
[ 52.941293] 00000060: 00 00 64 00 b4 a5 02 00 02 00 00 08 00 00 00 00 ..d.............
[ 52.942661] 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 17 00 00 19 ................
[ 52.944046] XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x38b/0x3a0 [xfs] (fs/xfs/xfs_buf.c:1559). Shutting down filesystem.
[ 52.946710] XFS (loop0): Please unmount the filesystem and rectify the problem(s)
[ 52.948099] XFS (loop0): log mount/recovery failed: error -117
[ 52.949810] XFS (loop0): log mount failed
Looking at corresponding code:
xfs_validate_sb_write(
	struct xfs_mount	*mp,
	struct xfs_buf		*bp,
	struct xfs_sb		*sbp)
{
	/*
	 * Carry out additional sb summary counter sanity checks when we write
	 * the superblock. We skip this in the read validator because there
	 * could be newer superblocks in the log and if the values are garbage
	 * even after replay we'll recalculate them at the end of log mount.
	 *
	 * mkfs has traditionally written zeroed counters to inprogress and
	 * secondary superblocks, so allow this usage to continue because
	 * we never read counters from such superblocks.
	 */
	if (xfs_buf_daddr(bp) == XFS_SB_DADDR && !sbp->sb_inprogress &&
	    (sbp->sb_fdblocks > sbp->sb_dblocks ||
	     !xfs_verify_icount(mp, sbp->sb_icount) ||
	     sbp->sb_ifree > sbp->sb_icount)) {
		xfs_warn(mp, "SB summary counter sanity check failed");
		return -EFSCORRUPTED;
	}
From the dmesg output and the code, we know the check failed either on sb_ifree vs sb_icount or on sb_fdblocks vs sb_dblocks.
Looking at the superblock dump and the log dump, we know ifree and icount are good; what's bad is sb_fdblocks, and that sb_fdblocks comes from the log.
# I verified that sb_fdblocks is 0xfffffffffffffff4 with a UEK debug kernel (though not 6.7.0-rc3)
So the bogus sb_fdblocks is copied from the log into the in-core superblock and then trips the check on the xfs_log_sb() -> xfs_validate_sb_write() path, even though it would be re-calculated from the AGs afterwards anyway.
The fix aims to make xfs_validate_sb_write() happy.
Thanks,
Wengang
> IOWs journal recovery doesn't actually care what these values are,
> so what actually goes wrong if this sum returns a negative value?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
2024-05-11 0:34 [PATCH] xfs: make sure sb_fdblocks is non-negative Wengang Wang
2024-05-11 1:17 ` Dave Chinner
2024-05-13 17:06 ` Wengang Wang [this message]
2024-05-31 15:44 ` Wengang Wang
2024-06-02 23:37 ` Dave Chinner
2024-06-03 17:23 ` Wengang Wang
2024-06-03 18:43 ` Darrick J. Wong
2024-06-04 4:06 ` Dave Chinner
2024-05-31 15:52 ` Darrick J. Wong
2024-05-31 18:03 ` Wengang Wang