[PATCH 0/5] xfs: fix discontiguous metadata block recovery

All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/5] xfs: fix discontiguous metadata block recovery
@ 2024-03-19  2:15 Dave Chinner
  2024-03-19  2:15 ` [PATCH 1/5] xfs: buffer log item type mismatches are corruption Dave Chinner
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Dave Chinner @ 2024-03-19  2:15 UTC (permalink / raw
  To: linux-xfs

Recently Zorro tripped over a failure with 64kB directory blocks on
s390x via generic/648. Recovery was reporting failures like this:

 XFS (loop3): Mounting V5 Filesystem c1954438-a18d-4b4a-ad32-0e29c40713ed
 XFS (loop3): Starting recovery (logdev: internal)
 XFS (loop3): Bad dir block magic!
 XFS: Assertion failed: 0, file: fs/xfs/xfs_buf_item_recover.c, line: 414
 ....

Or it was succeeding and later operations were detecting directory
block corruption during idrectory operations such as:

 XFS (loop3): Metadata corruption detected at __xfs_dir3_data_check+0x372/0x6c0 [xfs], xfs_dir3_block block 0x1020
 XFS (loop3): Unmount and run xfs_repair
 XFS (loop3): First 128 bytes of corrupted metadata buffer:
 00000000: 58 44 42 33 00 00 00 00 00 00 00 00 00 00 10 20  XDB3...........
 ....

Futher triage and diagnosis pointed to the fact that the test was
generating a discontiguous (multi-extent) directory block and that
directory block was not being recovered correctly when it was
encountered.

Zorro captured a trace, and what we saw in the trace was a specific
pattern of buffer log items being processed through every phase of
recovery:

 xfs_log_recover_buf_not_cancel: dev 7:0 daddr 0x2c2ce0, bbcount 0x10, flags 0x5000, size 2, map_size 2
 xfs_log_recover_item_recover: dev 7:0 tid 0xce3ce480 lsn 0x300014178, pass 1, item 0x8ea70fc0, item type XFS_LI_BUF item region count/total 2/2
 xfs_log_recover_buf_not_cancel: dev 7:0 daddr 0x331fb0, bbcount 0x58, flags 0x5000, size 2, map_size 11
 xfs_log_recover_item_recover: dev 7:0 tid 0xce3ce480 lsn 0x300014178, pass 1, item 0x8f36c040, item type XFS_LI_BUF item region count/total 2/2

The item addresses, tid and LSN change, but the order of the two
buf log items does not.

These are both "flags 0x5000" which means both log items are
XFS_BLFT_DIR_BLOCK_BUF types, and they are both partial directory
block buffers, and they are discontiguous. They also have different
types of log items both before and after them, so it is likely these
are two extents within the same compound buffer.

The way we log compound buffers is that we create a buf log format
item for each extent in the buffer, and then we log each range as a
separate buf log format item. IOWs, to recovery these fragments of
the directory block appear just like complete regular buffers that
need to be recovered.

Hence what we see above is the first buffer (daddr 0x2c2ce0, bbcount
0x10) is the first extent in the directory block that contains the
header and magic number, so it recovers and verifies just fine. The
second buffer is the tail of the directory block, and it does not
contain a magic number because it starts mid-way through the
directory block. Hence the magic number verification fails and the
buffer is not recovered.

Compound buffers were logged this way so that they didn't require a
change of log format to recover. Prior to compound buffers, the
directory code kept it's own dabuf structure to map multiple extents
in a single directory block, and they got logged as separate buffer
log format items as well.

So the problem isn't directly related to the conversion of dabufs to
compound buffers - the problem is related to the buffer recovery
verification code not knowing that directory buffer fragments are
valid recovery targets.

Hence the fixes in this patchset are to log recovery, and do not
change runtime behaviour at all. The first thing we do is change the
buffer recovery code to consider a type mismatch between the BLF and
the buffer contents as a fatal error instead of a warning. If we
just warn and continue, the recovered metadata may still be corrupt
and so we should just abort with -EFSCORRUPTED when this occurs.
That addresses the silent recovery success followed by runtime
detection of directory corruption issue that was encountered.

We then need to untangle the buffer recovery code a bit. Inode
buffer, dquot buffer and regular buffer recovery are all a bit
different, but they are tightly intertwined. neither dquot nor inode
buffer recovery need discontiguous buffer recovery detection, and
they also have different constraints so separate them out. We also
always recover inode and dquot buffers, so we don't need check
magic numbers or decode internal lsns to determine if they should be
recovered.

With that done, we can then add code to the general buffer recovery
to detect partial block recovery situations. We check the BLF type
to determine if it is a directory buffer, and add a path for
recovery of partial directory block items. This allows recovery of
regions of directory blocks that do not start at offset 0 in the
directory block. This fixes the initial "bad dir block magic" issue
reported, and results in correct recovery of discontiguous directory
blocks.

IOWs, this appears to be a log recovery problem and not a runtime
issue. I think the fix will be to allow directory blocks to fail the
magic number check if and only if the buffer length does not match
the directory block size (i.e. it is a partial directory block
fragment being recovered).

This passes repeated looping over '-g enospc -g recoveryloop' on
64kb directory block size configurations, so the change to recovery
hasn't caused any obvious regressions in fixing this issue.

Thoughts?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/5] xfs: buffer log item type mismatches are corruption
  2024-03-19  2:15 [PATCH 0/5] xfs: fix discontiguous metadata block recovery Dave Chinner
@ 2024-03-19  2:15 ` Dave Chinner
  2024-03-19  7:23   ` Christoph Hellwig
  2024-03-19 18:16   ` Darrick J. Wong
  2024-03-19  2:15 ` [PATCH 2/5] xfs: separate out inode buffer recovery a bit more Dave Chinner
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: Dave Chinner @ 2024-03-19  2:15 UTC (permalink / raw
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We detect when a buffer log format type and the magic number in the
buffer do not match. We issue a warning, but do not return an error
nor do we write back the recovered buffer. If no further recover
action is performed on that buffer, then recovery has left the
buffer in an inconsistent (out of date) state on disk. i.e. the
structure is corrupt on disk.

If this mismatch occurs, return a -EFSCORRUPTED error and cause
recovery to abort instead of letting recovery corrupt the filesystem
and continue onwards.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item_recover.c | 51 +++++++++++++++++------------------
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index d74bf7bb7794..dba57ee6fa6d 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -207,7 +207,7 @@ xlog_recover_buf_commit_pass1(
  *	the first 32 bits of the buffer (most blocks),
  *	inside a struct xfs_da_blkinfo at the start of the buffer.
  */
-static void
+static int
 xlog_recover_validate_buf_type(
 	struct xfs_mount		*mp,
 	struct xfs_buf			*bp,
@@ -407,11 +407,12 @@ xlog_recover_validate_buf_type(
 	 * skipped.
 	 */
 	if (current_lsn == NULLCOMMITLSN)
-		return;
+		return 0;;
 
 	if (warnmsg) {
 		xfs_warn(mp, warnmsg);
-		ASSERT(0);
+		xfs_buf_corruption_error(bp, __this_address);
+		return -EFSCORRUPTED;
 	}
 
 	/*
@@ -425,14 +426,11 @@ xlog_recover_validate_buf_type(
 	 * the buffer. Therefore, initialize a bli purely to carry the LSN to
 	 * the verifier.
 	 */
-	if (bp->b_ops) {
-		struct xfs_buf_log_item	*bip;
-
-		bp->b_flags |= _XBF_LOGRECOVERY;
-		xfs_buf_item_init(bp, mp);
-		bip = bp->b_log_item;
-		bip->bli_item.li_lsn = current_lsn;
-	}
+	ASSERT(bp->b_ops);
+	bp->b_flags |= _XBF_LOGRECOVERY;
+	xfs_buf_item_init(bp, mp);
+	bp->b_log_item->bli_item.li_lsn = current_lsn;
+	return 0;
 }
 
 /*
@@ -441,7 +439,7 @@ xlog_recover_validate_buf_type(
  * given buffer.  The bitmap in the buf log format structure indicates
  * where to place the logged data.
  */
-STATIC void
+static int
 xlog_recover_do_reg_buffer(
 	struct xfs_mount		*mp,
 	struct xlog_recover_item	*item,
@@ -523,20 +521,20 @@ xlog_recover_do_reg_buffer(
 	/* Shouldn't be any more regions */
 	ASSERT(i == item->ri_total);
 
-	xlog_recover_validate_buf_type(mp, bp, buf_f, current_lsn);
+	return xlog_recover_validate_buf_type(mp, bp, buf_f, current_lsn);
 }
 
 /*
- * Perform a dquot buffer recovery.
+ * Test if this dquot buffer item should be recovered.
  * Simple algorithm: if we have found a QUOTAOFF log item of the same type
  * (ie. USR or GRP), then just toss this buffer away; don't recover it.
  * Else, treat it as a regular buffer and do recovery.
  *
- * Return false if the buffer was tossed and true if we recovered the buffer to
- * indicate to the caller if the buffer needs writing.
+ * Return false if the buffer should be tossed and true if the buffer needs
+ * to be recovered.
  */
-STATIC bool
-xlog_recover_do_dquot_buffer(
+static bool
+xlog_recover_this_dquot_buffer(
 	struct xfs_mount		*mp,
 	struct xlog			*log,
 	struct xlog_recover_item	*item,
@@ -565,8 +563,6 @@ xlog_recover_do_dquot_buffer(
 	 */
 	if (log->l_quotaoffs_flag & type)
 		return false;
-
-	xlog_recover_do_reg_buffer(mp, item, bp, buf_f, NULLCOMMITLSN);
 	return true;
 }
 
@@ -952,18 +948,19 @@ xlog_recover_buf_commit_pass2(
 
 	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
 		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
-		if (error)
-			goto out_release;
 	} else if (buf_f->blf_flags &
 		  (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
-		bool	dirty;
-
-		dirty = xlog_recover_do_dquot_buffer(mp, log, item, bp, buf_f);
-		if (!dirty)
+		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
 			goto out_release;
+
+		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
+				NULLCOMMITLSN);
 	} else {
-		xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
+		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
+				current_lsn);
 	}
+	if (error)
+		goto out_release;
 
 	/*
 	 * Perform delayed write on the buffer.  Asynchronous writes will be
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/5] xfs: separate out inode buffer recovery a bit more
  2024-03-19  2:15 [PATCH 0/5] xfs: fix discontiguous metadata block recovery Dave Chinner
  2024-03-19  2:15 ` [PATCH 1/5] xfs: buffer log item type mismatches are corruption Dave Chinner
@ 2024-03-19  2:15 ` Dave Chinner
  2024-03-19  7:26   ` Christoph Hellwig
  2024-03-19 18:40   ` Darrick J. Wong
  2024-03-19  2:15 ` [PATCH 3/5] xfs: recover dquot buffers unconditionally Dave Chinner
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: Dave Chinner @ 2024-03-19  2:15 UTC (permalink / raw
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

It really is a unique snowflake, so peal off from normal buffer
recovery earlier and shuffle all the unique bits into the inode
buffer recovery function.

Also, it looks like the handling of mismatched inode cluster buffer
sizes is wrong - we have to write the recovered buffer -before- we
mark it stale as we're not supposed to write stale buffers. I don't
think we check that anywhere in the buffer IO path, but lets do it
the right way anyway.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item_recover.c | 99 ++++++++++++++++++++++-------------
 1 file changed, 63 insertions(+), 36 deletions(-)

diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index dba57ee6fa6d..f994a303ad0a 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -229,7 +229,7 @@ xlog_recover_validate_buf_type(
 	 * just avoid the verification stage for non-crc filesystems
 	 */
 	if (!xfs_has_crc(mp))
-		return;
+		return 0;
 
 	magic32 = be32_to_cpu(*(__be32 *)bp->b_addr);
 	magic16 = be16_to_cpu(*(__be16*)bp->b_addr);
@@ -407,7 +407,7 @@ xlog_recover_validate_buf_type(
 	 * skipped.
 	 */
 	if (current_lsn == NULLCOMMITLSN)
-		return 0;;
+		return 0;
 
 	if (warnmsg) {
 		xfs_warn(mp, warnmsg);
@@ -567,18 +567,22 @@ xlog_recover_this_dquot_buffer(
 }
 
 /*
- * Perform recovery for a buffer full of inodes.  In these buffers, the only
- * data which should be recovered is that which corresponds to the
- * di_next_unlinked pointers in the on disk inode structures.  The rest of the
- * data for the inodes is always logged through the inodes themselves rather
- * than the inode buffer and is recovered in xlog_recover_inode_pass2().
+ * Perform recovery for a buffer full of inodes. We don't have inode cluster
+ * buffer specific LSNs, so we always recover inode buffers if they contain
+ * inodes.
+ *
+ * In these buffers, the only inode data which should be recovered is that which
+ * corresponds to the di_next_unlinked pointers in the on disk inode structures.
+ * The rest of the data for the inodes is always logged through the inodes
+ * themselves rather than the inode buffer and is recovered in
+ * xlog_recover_inode_pass2().
  *
  * The only time when buffers full of inodes are fully recovered is when the
- * buffer is full of newly allocated inodes.  In this case the buffer will
- * not be marked as an inode buffer and so will be sent to
- * xlog_recover_do_reg_buffer() below during recovery.
+ * buffer is full of newly allocated inodes.  In this case the buffer will not
+ * be marked as an inode buffer and so xlog_recover_do_reg_buffer() will be used
+ * instead.
  */
-STATIC int
+static int
 xlog_recover_do_inode_buffer(
 	struct xfs_mount		*mp,
 	struct xlog_recover_item	*item,
@@ -598,6 +602,13 @@ xlog_recover_do_inode_buffer(
 
 	trace_xfs_log_recover_buf_inode_buf(mp->m_log, buf_f);
 
+	/*
+	 * If the magic number doesn't match, something has gone wrong. Don't
+	 * recover the buffer.
+	 */
+	if (cpu_to_be16(XFS_DINODE_MAGIC) != *((__be16 *)bp->b_addr))
+		return -EFSCORRUPTED;
+
 	/*
 	 * Post recovery validation only works properly on CRC enabled
 	 * filesystems.
@@ -677,6 +688,31 @@ xlog_recover_do_inode_buffer(
 
 	}
 
+	/*
+	 * Make sure that only inode buffers with good sizes remain valid after
+	 * recovering this buffer item.
+	 *
+	 * The kernel moves inodes in buffers of 1 block or inode_cluster_size
+	 * bytes, whichever is bigger.  The inode buffers in the log can be a
+	 * different size if the log was generated by an older kernel using
+	 * unclustered inode buffers or a newer kernel running with a different
+	 * inode cluster size.  Regardless, if the inode buffer size isn't
+	 * max(blocksize, inode_cluster_size) for *our* value of
+	 * inode_cluster_size, then we need to keep the buffer out of the buffer
+	 * cache so that the buffer won't overlap with future reads of those
+	 * inodes.
+	 *
+	 * To acheive this, we write the buffer ito recover the inodes then mark
+	 * it stale so that it won't be found on overlapping buffer lookups and
+	 * caller knows not to queue it for delayed write.
+	 */
+	if (BBTOB(bp->b_length) != M_IGEO(mp)->inode_cluster_size) {
+		int error;
+
+		error = xfs_bwrite(bp);
+		xfs_buf_stale(bp);
+		return error;
+	}
 	return 0;
 }
 
@@ -840,7 +876,6 @@ xlog_recover_get_buf_lsn(
 	magic16 = be16_to_cpu(*(__be16 *)blk);
 	switch (magic16) {
 	case XFS_DQUOT_MAGIC:
-	case XFS_DINODE_MAGIC:
 		goto recover_immediately;
 	default:
 		break;
@@ -910,6 +945,17 @@ xlog_recover_buf_commit_pass2(
 	if (error)
 		return error;
 
+	/*
+	 * Inode buffer recovery is quite unique, so go out separate ways here
+	 * to simplify the rest of the code.
+	 */
+	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
+		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
+		if (error || (bp->b_flags & XBF_STALE))
+			goto out_release;
+		goto out_write;
+	}
+
 	/*
 	 * Recover the buffer only if we get an LSN from it and it's less than
 	 * the lsn of the transaction we are replaying.
@@ -946,9 +992,7 @@ xlog_recover_buf_commit_pass2(
 		goto out_release;
 	}
 
-	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
-		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
-	} else if (buf_f->blf_flags &
+	if (buf_f->blf_flags &
 		  (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
 		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
 			goto out_release;
@@ -965,28 +1009,11 @@ xlog_recover_buf_commit_pass2(
 	/*
 	 * Perform delayed write on the buffer.  Asynchronous writes will be
 	 * slower when taking into account all the buffers to be flushed.
-	 *
-	 * Also make sure that only inode buffers with good sizes stay in
-	 * the buffer cache.  The kernel moves inodes in buffers of 1 block
-	 * or inode_cluster_size bytes, whichever is bigger.  The inode
-	 * buffers in the log can be a different size if the log was generated
-	 * by an older kernel using unclustered inode buffers or a newer kernel
-	 * running with a different inode cluster size.  Regardless, if
-	 * the inode buffer size isn't max(blocksize, inode_cluster_size)
-	 * for *our* value of inode_cluster_size, then we need to keep
-	 * the buffer out of the buffer cache so that the buffer won't
-	 * overlap with future reads of those inodes.
 	 */
-	if (XFS_DINODE_MAGIC ==
-	    be16_to_cpu(*((__be16 *)xfs_buf_offset(bp, 0))) &&
-	    (BBTOB(bp->b_length) != M_IGEO(log->l_mp)->inode_cluster_size)) {
-		xfs_buf_stale(bp);
-		error = xfs_bwrite(bp);
-	} else {
-		ASSERT(bp->b_mount == mp);
-		bp->b_flags |= _XBF_LOGRECOVERY;
-		xfs_buf_delwri_queue(bp, buffer_list);
-	}
+out_write:
+	ASSERT(bp->b_mount == mp);
+	bp->b_flags |= _XBF_LOGRECOVERY;
+	xfs_buf_delwri_queue(bp, buffer_list);
 
 out_release:
 	xfs_buf_relse(bp);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/5] xfs: recover dquot buffers unconditionally
  2024-03-19  2:15 [PATCH 0/5] xfs: fix discontiguous metadata block recovery Dave Chinner
  2024-03-19  2:15 ` [PATCH 1/5] xfs: buffer log item type mismatches are corruption Dave Chinner
  2024-03-19  2:15 ` [PATCH 2/5] xfs: separate out inode buffer recovery a bit more Dave Chinner
@ 2024-03-19  2:15 ` Dave Chinner
  2024-03-19 18:49   ` Darrick J. Wong
  2024-03-19 21:46   ` Christoph Hellwig
  2024-03-19  2:15 ` [PATCH 4/5] xfs: detect partial buffer recovery operations Dave Chinner
  2024-03-19  2:15 ` [PATCH 5/5] xfs: consistently use struct xlog in buffer item recovery Dave Chinner
  4 siblings, 2 replies; 17+ messages in thread
From: Dave Chinner @ 2024-03-19  2:15 UTC (permalink / raw
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Dquot buffers are only logged when the dquot cluster is allocated.
Recovery of them is always done and not conditional on an LSN found
in the buffer because there aren't dquots stamped in the buffer when
the initialisation is replayed after allocation.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_log_format.h |   6 ++
 fs/xfs/xfs_buf_item_recover.c  | 129 +++++++++++++++++----------------
 2 files changed, 72 insertions(+), 63 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 16872972e1e9..5ac0c3066930 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -508,6 +508,9 @@ struct xfs_log_dinode {
 #define XFS_BLF_PDQUOT_BUF	(1<<3)
 #define	XFS_BLF_GDQUOT_BUF	(1<<4)
 
+#define XFS_BLF_DQUOT_BUF	\
+	(XFS_BLF_UDQUOT_BUF | XFS_BLF_PDQUOT_BUF | XFS_BLF_GDQUOT_BUF)
+
 /*
  * This is the structure used to lay out a buf log item in the log.  The data
  * map describes which 128 byte chunks of the buffer have been logged.
@@ -571,6 +574,9 @@ enum xfs_blft {
 	XFS_BLFT_MAX_BUF = (1 << XFS_BLFT_BITS),
 };
 
+#define XFS_BLFT_DQUOT_BUF	\
+	(XFS_BLFT_UDQUOT_BUF | XFS_BLFT_PDQUOT_BUF | XFS_BLFT_GDQUOT_BUF)
+
 static inline void
 xfs_blft_to_flags(struct xfs_buf_log_format *blf, enum xfs_blft type)
 {
diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index f994a303ad0a..90740fcf2fbe 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -450,8 +450,6 @@ xlog_recover_do_reg_buffer(
 	int			i;
 	int			bit;
 	int			nbits;
-	xfs_failaddr_t		fa;
-	const size_t		size_disk_dquot = sizeof(struct xfs_disk_dquot);
 
 	trace_xfs_log_recover_buf_reg_buf(mp->m_log, buf_f);
 
@@ -481,39 +479,10 @@ xlog_recover_do_reg_buffer(
 		if (item->ri_buf[i].i_len < (nbits << XFS_BLF_SHIFT))
 			nbits = item->ri_buf[i].i_len >> XFS_BLF_SHIFT;
 
-		/*
-		 * Do a sanity check if this is a dquot buffer. Just checking
-		 * the first dquot in the buffer should do. XXXThis is
-		 * probably a good thing to do for other buf types also.
-		 */
-		fa = NULL;
-		if (buf_f->blf_flags &
-		   (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
-			if (item->ri_buf[i].i_addr == NULL) {
-				xfs_alert(mp,
-					"XFS: NULL dquot in %s.", __func__);
-				goto next;
-			}
-			if (item->ri_buf[i].i_len < size_disk_dquot) {
-				xfs_alert(mp,
-					"XFS: dquot too small (%d) in %s.",
-					item->ri_buf[i].i_len, __func__);
-				goto next;
-			}
-			fa = xfs_dquot_verify(mp, item->ri_buf[i].i_addr, -1);
-			if (fa) {
-				xfs_alert(mp,
-	"dquot corrupt at %pS trying to replay into block 0x%llx",
-					fa, xfs_buf_daddr(bp));
-				goto next;
-			}
-		}
-
 		memcpy(xfs_buf_offset(bp,
 			(uint)bit << XFS_BLF_SHIFT),	/* dest */
 			item->ri_buf[i].i_addr,		/* source */
 			nbits<<XFS_BLF_SHIFT);		/* length */
- next:
 		i++;
 		bit += nbits;
 	}
@@ -566,6 +535,56 @@ xlog_recover_this_dquot_buffer(
 	return true;
 }
 
+/*
+ * Do a sanity check of each region in the item to ensure we are actually
+ * recovering a dquot buffer item.
+ */
+static int
+xlog_recover_verify_dquot_buf_item(
+	struct xfs_mount		*mp,
+	struct xlog_recover_item	*item,
+	struct xfs_buf			*bp,
+	struct xfs_buf_log_format	*buf_f)
+{
+	xfs_failaddr_t			fa;
+	int				i;
+
+	switch (xfs_blft_from_flags(buf_f)) {
+	case XFS_BLFT_UDQUOT_BUF:
+	case XFS_BLFT_PDQUOT_BUF:
+	case XFS_BLFT_GDQUOT_BUF:
+		break;
+	default:
+		xfs_alert(mp,
+			"XFS: dquot buffer log format type mismatch in %s.",
+			__func__);
+		xfs_buf_corruption_error(bp, __this_address);
+		return -EFSCORRUPTED;
+	}
+
+	for (i = 1; i < item->ri_total; i++) {
+		if (item->ri_buf[i].i_addr == NULL) {
+			xfs_alert(mp,
+				"XFS: NULL dquot in %s.", __func__);
+			return -EFSCORRUPTED;
+		}
+		if (item->ri_buf[i].i_len < sizeof(struct xfs_disk_dquot)) {
+			xfs_alert(mp,
+				"XFS: dquot too small (%d) in %s.",
+				item->ri_buf[i].i_len, __func__);
+			return -EFSCORRUPTED;
+		}
+		fa = xfs_dquot_verify(mp, item->ri_buf[i].i_addr, -1);
+		if (fa) {
+			xfs_alert(mp,
+"dquot corrupt at %pS trying to replay into block 0x%llx",
+				fa, xfs_buf_daddr(bp));
+			return -EFSCORRUPTED;
+		}
+	}
+	return 0;
+}
+
 /*
  * Perform recovery for a buffer full of inodes. We don't have inode cluster
  * buffer specific LSNs, so we always recover inode buffers if they contain
@@ -743,7 +762,6 @@ xlog_recover_get_buf_lsn(
 	struct xfs_buf_log_format *buf_f)
 {
 	uint32_t		magic32;
-	uint16_t		magic16;
 	uint16_t		magicda;
 	void			*blk = bp->b_addr;
 	uuid_t			*uuid;
@@ -862,27 +880,7 @@ xlog_recover_get_buf_lsn(
 		return lsn;
 	}
 
-	/*
-	 * We do individual object checks on dquot and inode buffers as they
-	 * have their own individual LSN records. Also, we could have a stale
-	 * buffer here, so we have to at least recognise these buffer types.
-	 *
-	 * A notd complexity here is inode unlinked list processing - it logs
-	 * the inode directly in the buffer, but we don't know which inodes have
-	 * been modified, and there is no global buffer LSN. Hence we need to
-	 * recover all inode buffer types immediately. This problem will be
-	 * fixed by logical logging of the unlinked list modifications.
-	 */
-	magic16 = be16_to_cpu(*(__be16 *)blk);
-	switch (magic16) {
-	case XFS_DQUOT_MAGIC:
-		goto recover_immediately;
-	default:
-		break;
-	}
-
 	/* unknown buffer contents, recover immediately */
-
 recover_immediately:
 	return (xfs_lsn_t)-1;
 
@@ -956,6 +954,21 @@ xlog_recover_buf_commit_pass2(
 		goto out_write;
 	}
 
+	if (buf_f->blf_flags & XFS_BLF_DQUOT_BUF) {
+		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
+			goto out_release;
+
+		error = xlog_recover_verify_dquot_buf_item(mp, item, bp, buf_f);
+		if (error)
+			goto out_release;
+
+		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
+				NULLCOMMITLSN);
+		if (error)
+			goto out_release;
+		goto out_write;
+	}
+
 	/*
 	 * Recover the buffer only if we get an LSN from it and it's less than
 	 * the lsn of the transaction we are replaying.
@@ -992,17 +1005,7 @@ xlog_recover_buf_commit_pass2(
 		goto out_release;
 	}
 
-	if (buf_f->blf_flags &
-		  (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
-		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
-			goto out_release;
-
-		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
-				NULLCOMMITLSN);
-	} else {
-		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
-				current_lsn);
-	}
+	error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
 	if (error)
 		goto out_release;
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/5] xfs: detect partial buffer recovery operations
  2024-03-19  2:15 [PATCH 0/5] xfs: fix discontiguous metadata block recovery Dave Chinner
                   ` (2 preceding siblings ...)
  2024-03-19  2:15 ` [PATCH 3/5] xfs: recover dquot buffers unconditionally Dave Chinner
@ 2024-03-19  2:15 ` Dave Chinner
  2024-03-19 20:39   ` Darrick J. Wong
  2024-03-19 22:54   ` Christoph Hellwig
  2024-03-19  2:15 ` [PATCH 5/5] xfs: consistently use struct xlog in buffer item recovery Dave Chinner
  4 siblings, 2 replies; 17+ messages in thread
From: Dave Chinner @ 2024-03-19  2:15 UTC (permalink / raw
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

When a compound buffer is logged (e.g. fragmented large directory
block) we record it in the log as a series of separate buffer log
format items in the journal. These get recovered individually, and
because they are for non-contiguous extent ranges, we cannot use
buffer addresses to detect that the buffer format items are from the
same directory block.

Further, we cannot use LSN checks to determine if the partial
block buffers should be recovered - apart from the first buffer we
don't have a header with an LSN in it to check.

Finally, we cannot add a verifier to a partial block buffer because,
again, it will fail the verifier checks and report corruption. We
already skip this step due to bad magic number detection, but we
should be able to do better here.

The one thing we can rely on, though, is that each buffer format
item is written consecutively in the journal. They are built at
commit time into a single log iovec and chained into the iclog write
log vector chain as an unbroken sequence. Hence all the parts of a
compound buffer should be consecutive buf log format items in the
transaction being recovered.

Unfortunately, we don't have the information available in recovery
to do a full compound buffer instantiation for recovery. We only
have the fragments that contained modifications in the journal, and
so there may be missing fragments that are still clean and hence are
not in the journal. Hence we cannot use journal state to rebuild the
compound buffer entirely and hence recover it as a complete entity
and run a verifier over it before writeback.

Hence the first thing we need to do is detect such partial buffer
recovery situations and track whether we need to skip all the
partial buffers due to the LSN check in the initial header fragment
read from disk.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item_recover.c | 178 +++++++++++++++++++++++++++-------
 1 file changed, 143 insertions(+), 35 deletions(-)

diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index 90740fcf2fbe..9225baa62755 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -434,18 +434,17 @@ xlog_recover_validate_buf_type(
 }
 
 /*
- * Perform a 'normal' buffer recovery.  Each logged region of the
+ * Perform recovery of the logged regions. Each logged region of the
  * buffer should be copied over the corresponding region in the
  * given buffer.  The bitmap in the buf log format structure indicates
  * where to place the logged data.
  */
-static int
-xlog_recover_do_reg_buffer(
+static void
+xlog_recover_buffer(
 	struct xfs_mount		*mp,
 	struct xlog_recover_item	*item,
 	struct xfs_buf			*bp,
-	struct xfs_buf_log_format	*buf_f,
-	xfs_lsn_t			current_lsn)
+	struct xfs_buf_log_format	*buf_f)
 {
 	int			i;
 	int			bit;
@@ -489,7 +488,17 @@ xlog_recover_do_reg_buffer(
 
 	/* Shouldn't be any more regions */
 	ASSERT(i == item->ri_total);
+}
 
+static int
+xlog_recover_do_reg_buffer(
+	struct xfs_mount		*mp,
+	struct xlog_recover_item	*item,
+	struct xfs_buf			*bp,
+	struct xfs_buf_log_format	*buf_f,
+	xfs_lsn_t			current_lsn)
+{
+	xlog_recover_buffer(mp, item, bp, buf_f);
 	return xlog_recover_validate_buf_type(mp, bp, buf_f, current_lsn);
 }
 
@@ -735,6 +744,67 @@ xlog_recover_do_inode_buffer(
 	return 0;
 }
 
+static bool
+xlog_recovery_is_dir_buf(
+	struct xfs_buf_log_format	*buf_f)
+{
+	switch (xfs_blft_from_flags(buf_f)) {
+	case XFS_BLFT_DIR_BLOCK_BUF:
+	case XFS_BLFT_DIR_DATA_BUF:
+	case XFS_BLFT_DIR_FREE_BUF:
+	case XFS_BLFT_DIR_LEAF1_BUF:
+	case XFS_BLFT_DIR_LEAFN_BUF:
+	case XFS_BLFT_DA_NODE_BUF:
+		return true;
+	default:
+		break;
+	}
+	return false;
+}
+
+/*
+ * Partial dabuf recovery.
+ *
+ * There are two main cases here - a buffer that contains the dabuf header and
+ * hence can be magic number and LSN checked, and then everything else.
+ *
+ * We can determine if the former should be replayed or not via LSN checks, but
+ * we cannot do that with the latter, so the only choice we have here is to
+ * always recover the changes regardless of whether this means metadata on disk
+ * will go backwards in time. This, at least, means that the changes in each
+ * checkpoint are applied consistently to the dabuf and we don't do really
+ * stupid things like skip the header fragment replay and then replay all the
+ * other changes to the dabuf block.
+ *
+ * While this is not ideal, finishing log recovery should them replay all the
+ * remaining changes across this buffer and so bring it back to being consistent
+ * on disk at the completion of recovery. Hence this "going backwards in time"
+ * situation will only be relevant to failed journal replay situations. These
+ * are rare and will require xfs_repair to be run, anyway, so the inconsistency
+ * that results will be corrected before the filesystem goes back into service,
+ * anyway.
+ *
+ * Important: This partial fragment recovery relies on log recovery purging the
+ * buffer cache after completion of this recovery phase. These partial buffers
+ * are never used at runtime (discontiguous buffers will be used instead), so
+ * they must be removed from the buffer cache to prevent them from causing
+ * overlapping range lookup failures for the entire dabuf range.
+ */
+static void
+xlog_recover_do_partial_dabuf(
+	struct xfs_mount		*mp,
+	struct xlog_recover_item	*item,
+	struct xfs_buf			*bp,
+	struct xfs_buf_log_format	*buf_f)
+{
+	/*
+	 * Always recover without verification or write verifiers. Use delwri
+	 * and rely on post pass2 recovery cache purge to clean these out of
+	 * memory.
+	 */
+	xlog_recover_buffer(mp, item, bp, buf_f);
+}
+
 /*
  * V5 filesystems know the age of the buffer on disk being recovered. We can
  * have newer objects on disk than we are replaying, and so for these cases we
@@ -886,6 +956,54 @@ xlog_recover_get_buf_lsn(
 
 }
 
+/*
+ * Recover the buffer only if we get an LSN from it and it's less than the lsn
+ * of the transaction we are replaying.
+ *
+ * Note that we have to be extremely careful of readahead here.  Readahead does
+ * not attach verfiers to the buffers so if we don't actually do any replay
+ * after readahead because of the LSN we found in the buffer if more recent than
+ * that current transaction then we need to attach the verifier directly.
+ * Failure to do so can lead to future recovery actions (e.g. EFI and unlinked
+ * list recovery) can operate on the buffers and they won't get the verifier
+ * attached. This can lead to blocks on disk having the correct content but a
+ * stale CRC.
+ *
+ * It is safe to assume these clean buffers are currently up to date.  If the
+ * buffer is dirtied by a later transaction being replayed, then the verifier
+ * will be reset to match whatever recover turns that buffer into.
+ *
+ * Return true if the buffer needs to be recovered, false if it doesn't.
+ */
+static bool
+xlog_recover_this_buffer(
+	struct xfs_mount		*mp,
+	struct xfs_buf			*bp,
+	struct xfs_buf_log_format	*buf_f,
+	xfs_lsn_t			current_lsn)
+{
+	xfs_lsn_t			lsn;
+
+	lsn = xlog_recover_get_buf_lsn(mp, bp, buf_f);
+	if (!lsn)
+		return true;
+	if (lsn == -1)
+		return true;
+	if (XFS_LSN_CMP(lsn, current_lsn) < 0)
+		return true;
+
+	trace_xfs_log_recover_buf_skip(mp->m_log, buf_f);
+	xlog_recover_validate_buf_type(mp, bp, buf_f, NULLCOMMITLSN);
+
+	/*
+	 * We're skipping replay of this buffer log item due to the log
+	 * item LSN being behind the ondisk buffer.  Verify the buffer
+	 * contents since we aren't going to run the write verifier.
+	 */
+	if (bp->b_ops)
+		bp->b_ops->verify_read(bp);
+	return false;
+}
 /*
  * This routine replays a modification made to a buffer at runtime.
  * There are actually two types of buffer, regular and inode, which
@@ -920,7 +1038,6 @@ xlog_recover_buf_commit_pass2(
 	struct xfs_mount		*mp = log->l_mp;
 	struct xfs_buf			*bp;
 	int				error;
-	xfs_lsn_t			lsn;
 
 	/*
 	 * In this pass we only want to recover all the buffers which have
@@ -962,7 +1079,8 @@ xlog_recover_buf_commit_pass2(
 		if (error)
 			goto out_release;
 
-		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
+		xlog_recover_buffer(mp, item, bp, buf_f);
+		error = xlog_recover_validate_buf_type(mp, bp, buf_f,
 				NULLCOMMITLSN);
 		if (error)
 			goto out_release;
@@ -970,41 +1088,31 @@ xlog_recover_buf_commit_pass2(
 	}
 
 	/*
-	 * Recover the buffer only if we get an LSN from it and it's less than
-	 * the lsn of the transaction we are replaying.
-	 *
-	 * Note that we have to be extremely careful of readahead here.
-	 * Readahead does not attach verfiers to the buffers so if we don't
-	 * actually do any replay after readahead because of the LSN we found
-	 * in the buffer if more recent than that current transaction then we
-	 * need to attach the verifier directly. Failure to do so can lead to
-	 * future recovery actions (e.g. EFI and unlinked list recovery) can
-	 * operate on the buffers and they won't get the verifier attached. This
-	 * can lead to blocks on disk having the correct content but a stale
-	 * CRC.
-	 *
-	 * It is safe to assume these clean buffers are currently up to date.
-	 * If the buffer is dirtied by a later transaction being replayed, then
-	 * the verifier will be reset to match whatever recover turns that
-	 * buffer into.
+	 * Directory buffers can be larger than a single filesystem block and
+	 * if they are they can be fragmented. There are lots of concerns about
+	 * recovering these, so push them out of line where the concerns can be
+	 * documented clearly.
 	 */
-	lsn = xlog_recover_get_buf_lsn(mp, bp, buf_f);
-	if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
-		trace_xfs_log_recover_buf_skip(log, buf_f);
-		xlog_recover_validate_buf_type(mp, bp, buf_f, NULLCOMMITLSN);
+	if (xlog_recovery_is_dir_buf(buf_f) &&
+	    mp->m_dir_geo->blksize != BBTOB(buf_f->blf_len)) {
+		xlog_recover_do_partial_dabuf(mp, item, bp, buf_f);
+		goto out_write;
+	}
 
+	/*
+	 * Whole buffer recovery, dependent on the LSN in the on-disk structure.
+	 */
+	if (!xlog_recover_this_buffer(mp, bp, buf_f, current_lsn)) {
 		/*
-		 * We're skipping replay of this buffer log item due to the log
-		 * item LSN being behind the ondisk buffer.  Verify the buffer
-		 * contents since we aren't going to run the write verifier.
+		 * We may have verified this buffer even though we aren't
+		 * recovering it. Return the verifier error for early detection
+		 * of recovery inconsistencies.
 		 */
-		if (bp->b_ops) {
-			bp->b_ops->verify_read(bp);
-			error = bp->b_error;
-		}
+		error = bp->b_error;
 		goto out_release;
 	}
 
+
 	error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
 	if (error)
 		goto out_release;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/5] xfs: consistently use struct xlog in buffer item recovery
  2024-03-19  2:15 [PATCH 0/5] xfs: fix discontiguous metadata block recovery Dave Chinner
                   ` (3 preceding siblings ...)
  2024-03-19  2:15 ` [PATCH 4/5] xfs: detect partial buffer recovery operations Dave Chinner
@ 2024-03-19  2:15 ` Dave Chinner
  2024-03-19 20:40   ` Darrick J. Wong
  2024-03-19 21:48   ` Christoph Hellwig
  4 siblings, 2 replies; 17+ messages in thread
From: Dave Chinner @ 2024-03-19  2:15 UTC (permalink / raw
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Rather than passing a mix of xfs_mount and xlog (and sometimes both)
through the call chain of buffer log item recovery, just pass
the struct xlog and pull the xfs_mount from that only at the leaf
functions where it is needed. This makes it all just a little
cleaner.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf_item_recover.c | 94 +++++++++++++++++------------------
 1 file changed, 47 insertions(+), 47 deletions(-)

diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index 9225baa62755..edd03b08c969 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -209,7 +209,7 @@ xlog_recover_buf_commit_pass1(
  */
 static int
 xlog_recover_validate_buf_type(
-	struct xfs_mount		*mp,
+	struct xlog			*log,
 	struct xfs_buf			*bp,
 	struct xfs_buf_log_format	*buf_f,
 	xfs_lsn_t			current_lsn)
@@ -228,7 +228,7 @@ xlog_recover_validate_buf_type(
 	 * inconsistent state resulting in verification failures. Hence for now
 	 * just avoid the verification stage for non-crc filesystems
 	 */
-	if (!xfs_has_crc(mp))
+	if (!xfs_has_crc(log->l_mp))
 		return 0;
 
 	magic32 = be32_to_cpu(*(__be32 *)bp->b_addr);
@@ -396,7 +396,7 @@ xlog_recover_validate_buf_type(
 		break;
 #endif /* CONFIG_XFS_RT */
 	default:
-		xfs_warn(mp, "Unknown buffer type %d!",
+		xfs_warn(log->l_mp, "Unknown buffer type %d!",
 			 xfs_blft_from_flags(buf_f));
 		break;
 	}
@@ -410,7 +410,7 @@ xlog_recover_validate_buf_type(
 		return 0;
 
 	if (warnmsg) {
-		xfs_warn(mp, warnmsg);
+		xfs_warn(log->l_mp, warnmsg);
 		xfs_buf_corruption_error(bp, __this_address);
 		return -EFSCORRUPTED;
 	}
@@ -428,7 +428,7 @@ xlog_recover_validate_buf_type(
 	 */
 	ASSERT(bp->b_ops);
 	bp->b_flags |= _XBF_LOGRECOVERY;
-	xfs_buf_item_init(bp, mp);
+	xfs_buf_item_init(bp, log->l_mp);
 	bp->b_log_item->bli_item.li_lsn = current_lsn;
 	return 0;
 }
@@ -441,7 +441,7 @@ xlog_recover_validate_buf_type(
  */
 static void
 xlog_recover_buffer(
-	struct xfs_mount		*mp,
+	struct xlog			*log,
 	struct xlog_recover_item	*item,
 	struct xfs_buf			*bp,
 	struct xfs_buf_log_format	*buf_f)
@@ -450,7 +450,7 @@ xlog_recover_buffer(
 	int			bit;
 	int			nbits;
 
-	trace_xfs_log_recover_buf_reg_buf(mp->m_log, buf_f);
+	trace_xfs_log_recover_buf_reg_buf(log, buf_f);
 
 	bit = 0;
 	i = 1;  /* 0 is the buf format structure */
@@ -492,14 +492,14 @@ xlog_recover_buffer(
 
 static int
 xlog_recover_do_reg_buffer(
-	struct xfs_mount		*mp,
+	struct xlog			*log,
 	struct xlog_recover_item	*item,
 	struct xfs_buf			*bp,
 	struct xfs_buf_log_format	*buf_f,
 	xfs_lsn_t			current_lsn)
 {
-	xlog_recover_buffer(mp, item, bp, buf_f);
-	return xlog_recover_validate_buf_type(mp, bp, buf_f, current_lsn);
+	xlog_recover_buffer(log, item, bp, buf_f);
+	return xlog_recover_validate_buf_type(log, bp, buf_f, current_lsn);
 }
 
 /*
@@ -513,7 +513,6 @@ xlog_recover_do_reg_buffer(
  */
 static bool
 xlog_recover_this_dquot_buffer(
-	struct xfs_mount		*mp,
 	struct xlog			*log,
 	struct xlog_recover_item	*item,
 	struct xfs_buf			*bp,
@@ -526,7 +525,7 @@ xlog_recover_this_dquot_buffer(
 	/*
 	 * Filesystems are required to send in quota flags at mount time.
 	 */
-	if (!mp->m_qflags)
+	if (!log->l_mp->m_qflags)
 		return false;
 
 	type = 0;
@@ -550,7 +549,7 @@ xlog_recover_this_dquot_buffer(
  */
 static int
 xlog_recover_verify_dquot_buf_item(
-	struct xfs_mount		*mp,
+	struct xlog			*log,
 	struct xlog_recover_item	*item,
 	struct xfs_buf			*bp,
 	struct xfs_buf_log_format	*buf_f)
@@ -564,7 +563,7 @@ xlog_recover_verify_dquot_buf_item(
 	case XFS_BLFT_GDQUOT_BUF:
 		break;
 	default:
-		xfs_alert(mp,
+		xfs_alert(log->l_mp,
 			"XFS: dquot buffer log format type mismatch in %s.",
 			__func__);
 		xfs_buf_corruption_error(bp, __this_address);
@@ -573,19 +572,19 @@ xlog_recover_verify_dquot_buf_item(
 
 	for (i = 1; i < item->ri_total; i++) {
 		if (item->ri_buf[i].i_addr == NULL) {
-			xfs_alert(mp,
+			xfs_alert(log->l_mp,
 				"XFS: NULL dquot in %s.", __func__);
 			return -EFSCORRUPTED;
 		}
 		if (item->ri_buf[i].i_len < sizeof(struct xfs_disk_dquot)) {
-			xfs_alert(mp,
+			xfs_alert(log->l_mp,
 				"XFS: dquot too small (%d) in %s.",
 				item->ri_buf[i].i_len, __func__);
 			return -EFSCORRUPTED;
 		}
-		fa = xfs_dquot_verify(mp, item->ri_buf[i].i_addr, -1);
+		fa = xfs_dquot_verify(log->l_mp, item->ri_buf[i].i_addr, -1);
 		if (fa) {
-			xfs_alert(mp,
+			xfs_alert(log->l_mp,
 "dquot corrupt at %pS trying to replay into block 0x%llx",
 				fa, xfs_buf_daddr(bp));
 			return -EFSCORRUPTED;
@@ -612,11 +611,12 @@ xlog_recover_verify_dquot_buf_item(
  */
 static int
 xlog_recover_do_inode_buffer(
-	struct xfs_mount		*mp,
+	struct xlog			*log,
 	struct xlog_recover_item	*item,
 	struct xfs_buf			*bp,
 	struct xfs_buf_log_format	*buf_f)
 {
+	struct xfs_sb			*sbp = &log->l_mp->m_sb;
 	int				i;
 	int				item_index = 0;
 	int				bit = 0;
@@ -628,7 +628,7 @@ xlog_recover_do_inode_buffer(
 	xfs_agino_t			*logged_nextp;
 	xfs_agino_t			*buffer_nextp;
 
-	trace_xfs_log_recover_buf_inode_buf(mp->m_log, buf_f);
+	trace_xfs_log_recover_buf_inode_buf(log, buf_f);
 
 	/*
 	 * If the magic number doesn't match, something has gone wrong. Don't
@@ -641,12 +641,12 @@ xlog_recover_do_inode_buffer(
 	 * Post recovery validation only works properly on CRC enabled
 	 * filesystems.
 	 */
-	if (xfs_has_crc(mp))
+	if (xfs_has_crc(log->l_mp))
 		bp->b_ops = &xfs_inode_buf_ops;
 
-	inodes_per_buf = BBTOB(bp->b_length) >> mp->m_sb.sb_inodelog;
+	inodes_per_buf = BBTOB(bp->b_length) >> sbp->sb_inodelog;
 	for (i = 0; i < inodes_per_buf; i++) {
-		next_unlinked_offset = (i * mp->m_sb.sb_inodesize) +
+		next_unlinked_offset = (i * sbp->sb_inodesize) +
 			offsetof(struct xfs_dinode, di_next_unlinked);
 
 		while (next_unlinked_offset >=
@@ -695,8 +695,8 @@ xlog_recover_do_inode_buffer(
 		 */
 		logged_nextp = item->ri_buf[item_index].i_addr +
 				next_unlinked_offset - reg_buf_offset;
-		if (XFS_IS_CORRUPT(mp, *logged_nextp == 0)) {
-			xfs_alert(mp,
+		if (XFS_IS_CORRUPT(log->l_mp, *logged_nextp == 0)) {
+			xfs_alert(log->l_mp,
 		"Bad inode buffer log record (ptr = "PTR_FMT", bp = "PTR_FMT"). "
 		"Trying to replay bad (0) inode di_next_unlinked field.",
 				item, bp);
@@ -711,8 +711,8 @@ xlog_recover_do_inode_buffer(
 		 * have to leave the inode in a consistent state for whoever
 		 * reads it next....
 		 */
-		xfs_dinode_calc_crc(mp,
-				xfs_buf_offset(bp, i * mp->m_sb.sb_inodesize));
+		xfs_dinode_calc_crc(log->l_mp,
+				xfs_buf_offset(bp, i * sbp->sb_inodesize));
 
 	}
 
@@ -734,7 +734,7 @@ xlog_recover_do_inode_buffer(
 	 * it stale so that it won't be found on overlapping buffer lookups and
 	 * caller knows not to queue it for delayed write.
 	 */
-	if (BBTOB(bp->b_length) != M_IGEO(mp)->inode_cluster_size) {
+	if (BBTOB(bp->b_length) != M_IGEO(log->l_mp)->inode_cluster_size) {
 		int error;
 
 		error = xfs_bwrite(bp);
@@ -792,7 +792,7 @@ xlog_recovery_is_dir_buf(
  */
 static void
 xlog_recover_do_partial_dabuf(
-	struct xfs_mount		*mp,
+	struct xlog			*log,
 	struct xlog_recover_item	*item,
 	struct xfs_buf			*bp,
 	struct xfs_buf_log_format	*buf_f)
@@ -802,7 +802,7 @@ xlog_recover_do_partial_dabuf(
 	 * and rely on post pass2 recovery cache purge to clean these out of
 	 * memory.
 	 */
-	xlog_recover_buffer(mp, item, bp, buf_f);
+	xlog_recover_buffer(log, item, bp, buf_f);
 }
 
 /*
@@ -827,7 +827,7 @@ xlog_recover_do_partial_dabuf(
  */
 static xfs_lsn_t
 xlog_recover_get_buf_lsn(
-	struct xfs_mount	*mp,
+	struct xlog		*log,
 	struct xfs_buf		*bp,
 	struct xfs_buf_log_format *buf_f)
 {
@@ -839,7 +839,7 @@ xlog_recover_get_buf_lsn(
 	uint16_t		blft;
 
 	/* v4 filesystems always recover immediately */
-	if (!xfs_has_crc(mp))
+	if (!xfs_has_crc(log->l_mp))
 		goto recover_immediately;
 
 	/*
@@ -916,7 +916,7 @@ xlog_recover_get_buf_lsn(
 		 * the relevant UUID in the superblock.
 		 */
 		lsn = be64_to_cpu(((struct xfs_dsb *)blk)->sb_lsn);
-		if (xfs_has_metauuid(mp))
+		if (xfs_has_metauuid(log->l_mp))
 			uuid = &((struct xfs_dsb *)blk)->sb_meta_uuid;
 		else
 			uuid = &((struct xfs_dsb *)blk)->sb_uuid;
@@ -926,7 +926,7 @@ xlog_recover_get_buf_lsn(
 	}
 
 	if (lsn != (xfs_lsn_t)-1) {
-		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
+		if (!uuid_equal(&log->l_mp->m_sb.sb_meta_uuid, uuid))
 			goto recover_immediately;
 		return lsn;
 	}
@@ -945,7 +945,7 @@ xlog_recover_get_buf_lsn(
 	}
 
 	if (lsn != (xfs_lsn_t)-1) {
-		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
+		if (!uuid_equal(&log->l_mp->m_sb.sb_meta_uuid, uuid))
 			goto recover_immediately;
 		return lsn;
 	}
@@ -977,14 +977,14 @@ xlog_recover_get_buf_lsn(
  */
 static bool
 xlog_recover_this_buffer(
-	struct xfs_mount		*mp,
+	struct xlog			*log,
 	struct xfs_buf			*bp,
 	struct xfs_buf_log_format	*buf_f,
 	xfs_lsn_t			current_lsn)
 {
 	xfs_lsn_t			lsn;
 
-	lsn = xlog_recover_get_buf_lsn(mp, bp, buf_f);
+	lsn = xlog_recover_get_buf_lsn(log, bp, buf_f);
 	if (!lsn)
 		return true;
 	if (lsn == -1)
@@ -992,8 +992,8 @@ xlog_recover_this_buffer(
 	if (XFS_LSN_CMP(lsn, current_lsn) < 0)
 		return true;
 
-	trace_xfs_log_recover_buf_skip(mp->m_log, buf_f);
-	xlog_recover_validate_buf_type(mp, bp, buf_f, NULLCOMMITLSN);
+	trace_xfs_log_recover_buf_skip(log, buf_f);
+	xlog_recover_validate_buf_type(log, bp, buf_f, NULLCOMMITLSN);
 
 	/*
 	 * We're skipping replay of this buffer log item due to the log
@@ -1065,22 +1065,22 @@ xlog_recover_buf_commit_pass2(
 	 * to simplify the rest of the code.
 	 */
 	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
-		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
+		error = xlog_recover_do_inode_buffer(log, item, bp, buf_f);
 		if (error || (bp->b_flags & XBF_STALE))
 			goto out_release;
 		goto out_write;
 	}
 
 	if (buf_f->blf_flags & XFS_BLF_DQUOT_BUF) {
-		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
+		if (!xlog_recover_this_dquot_buffer(log, item, bp, buf_f))
 			goto out_release;
 
-		error = xlog_recover_verify_dquot_buf_item(mp, item, bp, buf_f);
+		error = xlog_recover_verify_dquot_buf_item(log, item, bp, buf_f);
 		if (error)
 			goto out_release;
 
-		xlog_recover_buffer(mp, item, bp, buf_f);
-		error = xlog_recover_validate_buf_type(mp, bp, buf_f,
+		xlog_recover_buffer(log, item, bp, buf_f);
+		error = xlog_recover_validate_buf_type(log, bp, buf_f,
 				NULLCOMMITLSN);
 		if (error)
 			goto out_release;
@@ -1095,14 +1095,14 @@ xlog_recover_buf_commit_pass2(
 	 */
 	if (xlog_recovery_is_dir_buf(buf_f) &&
 	    mp->m_dir_geo->blksize != BBTOB(buf_f->blf_len)) {
-		xlog_recover_do_partial_dabuf(mp, item, bp, buf_f);
+		xlog_recover_do_partial_dabuf(log, item, bp, buf_f);
 		goto out_write;
 	}
 
 	/*
 	 * Whole buffer recovery, dependent on the LSN in the on-disk structure.
 	 */
-	if (!xlog_recover_this_buffer(mp, bp, buf_f, current_lsn)) {
+	if (!xlog_recover_this_buffer(log, bp, buf_f, current_lsn)) {
 		/*
 		 * We may have verified this buffer even though we aren't
 		 * recovering it. Return the verifier error for early detection
@@ -1113,7 +1113,7 @@ xlog_recover_buf_commit_pass2(
 	}
 
 
-	error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
+	error = xlog_recover_do_reg_buffer(log, item, bp, buf_f, current_lsn);
 	if (error)
 		goto out_release;
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] xfs: buffer log item type mismatches are corruption
  2024-03-19  2:15 ` [PATCH 1/5] xfs: buffer log item type mismatches are corruption Dave Chinner
@ 2024-03-19  7:23   ` Christoph Hellwig
  2024-03-19 18:16   ` Darrick J. Wong
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-03-19  7:23 UTC (permalink / raw
  To: Dave Chinner; +Cc: linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/5] xfs: separate out inode buffer recovery a bit more
  2024-03-19  2:15 ` [PATCH 2/5] xfs: separate out inode buffer recovery a bit more Dave Chinner
@ 2024-03-19  7:26   ` Christoph Hellwig
  2024-03-19 18:40   ` Darrick J. Wong
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-03-19  7:26 UTC (permalink / raw
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Mar 19, 2024 at 01:15:21PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> It really is a unique snowflake, so peal off from normal buffer
> recovery earlier and shuffle all the unique bits into the inode
> buffer recovery function.
> 
> Also, it looks like the handling of mismatched inode cluster buffer
> sizes is wrong - we have to write the recovered buffer -before- we
> mark it stale as we're not supposed to write stale buffers. I don't
> think we check that anywhere in the buffer IO path, but lets do it
> the right way anyway.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_buf_item_recover.c | 99 ++++++++++++++++++++++-------------
>  1 file changed, 63 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index dba57ee6fa6d..f994a303ad0a 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -229,7 +229,7 @@ xlog_recover_validate_buf_type(
>  	 * just avoid the verification stage for non-crc filesystems
>  	 */
>  	if (!xfs_has_crc(mp))
> -		return;
> +		return 0;
>  
>  	magic32 = be32_to_cpu(*(__be32 *)bp->b_addr);
>  	magic16 = be16_to_cpu(*(__be16*)bp->b_addr);
> @@ -407,7 +407,7 @@ xlog_recover_validate_buf_type(
>  	 * skipped.
>  	 */
>  	if (current_lsn == NULLCOMMITLSN)
> -		return 0;;
> +		return 0;

Looks like these two should be in the previous patch.

Otherwise this looks good:


Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] xfs: buffer log item type mismatches are corruption
  2024-03-19  2:15 ` [PATCH 1/5] xfs: buffer log item type mismatches are corruption Dave Chinner
  2024-03-19  7:23   ` Christoph Hellwig
@ 2024-03-19 18:16   ` Darrick J. Wong
  1 sibling, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2024-03-19 18:16 UTC (permalink / raw
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Mar 19, 2024 at 01:15:20PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We detect when a buffer log format type and the magic number in the
> buffer do not match. We issue a warning, but do not return an error
> nor do we write back the recovered buffer. If no further recover
> action is performed on that buffer, then recovery has left the
> buffer in an inconsistent (out of date) state on disk. i.e. the
> structure is corrupt on disk.
> 
> If this mismatch occurs, return a -EFSCORRUPTED error and cause
> recovery to abort instead of letting recovery corrupt the filesystem
> and continue onwards.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_buf_item_recover.c | 51 +++++++++++++++++------------------
>  1 file changed, 24 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index d74bf7bb7794..dba57ee6fa6d 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -207,7 +207,7 @@ xlog_recover_buf_commit_pass1(
>   *	the first 32 bits of the buffer (most blocks),
>   *	inside a struct xfs_da_blkinfo at the start of the buffer.
>   */
> -static void
> +static int
>  xlog_recover_validate_buf_type(
>  	struct xfs_mount		*mp,
>  	struct xfs_buf			*bp,
> @@ -407,11 +407,12 @@ xlog_recover_validate_buf_type(
>  	 * skipped.
>  	 */
>  	if (current_lsn == NULLCOMMITLSN)
> -		return;
> +		return 0;;

Unnecessary double-semicolon.

With that fixed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


>  
>  	if (warnmsg) {
>  		xfs_warn(mp, warnmsg);
> -		ASSERT(0);
> +		xfs_buf_corruption_error(bp, __this_address);
> +		return -EFSCORRUPTED;
>  	}
>  
>  	/*
> @@ -425,14 +426,11 @@ xlog_recover_validate_buf_type(
>  	 * the buffer. Therefore, initialize a bli purely to carry the LSN to
>  	 * the verifier.
>  	 */
> -	if (bp->b_ops) {
> -		struct xfs_buf_log_item	*bip;
> -
> -		bp->b_flags |= _XBF_LOGRECOVERY;
> -		xfs_buf_item_init(bp, mp);
> -		bip = bp->b_log_item;
> -		bip->bli_item.li_lsn = current_lsn;
> -	}
> +	ASSERT(bp->b_ops);
> +	bp->b_flags |= _XBF_LOGRECOVERY;
> +	xfs_buf_item_init(bp, mp);
> +	bp->b_log_item->bli_item.li_lsn = current_lsn;
> +	return 0;
>  }
>  
>  /*
> @@ -441,7 +439,7 @@ xlog_recover_validate_buf_type(
>   * given buffer.  The bitmap in the buf log format structure indicates
>   * where to place the logged data.
>   */
> -STATIC void
> +static int
>  xlog_recover_do_reg_buffer(
>  	struct xfs_mount		*mp,
>  	struct xlog_recover_item	*item,
> @@ -523,20 +521,20 @@ xlog_recover_do_reg_buffer(
>  	/* Shouldn't be any more regions */
>  	ASSERT(i == item->ri_total);
>  
> -	xlog_recover_validate_buf_type(mp, bp, buf_f, current_lsn);
> +	return xlog_recover_validate_buf_type(mp, bp, buf_f, current_lsn);
>  }
>  
>  /*
> - * Perform a dquot buffer recovery.
> + * Test if this dquot buffer item should be recovered.
>   * Simple algorithm: if we have found a QUOTAOFF log item of the same type
>   * (ie. USR or GRP), then just toss this buffer away; don't recover it.
>   * Else, treat it as a regular buffer and do recovery.
>   *
> - * Return false if the buffer was tossed and true if we recovered the buffer to
> - * indicate to the caller if the buffer needs writing.
> + * Return false if the buffer should be tossed and true if the buffer needs
> + * to be recovered.
>   */
> -STATIC bool
> -xlog_recover_do_dquot_buffer(
> +static bool
> +xlog_recover_this_dquot_buffer(
>  	struct xfs_mount		*mp,
>  	struct xlog			*log,
>  	struct xlog_recover_item	*item,
> @@ -565,8 +563,6 @@ xlog_recover_do_dquot_buffer(
>  	 */
>  	if (log->l_quotaoffs_flag & type)
>  		return false;
> -
> -	xlog_recover_do_reg_buffer(mp, item, bp, buf_f, NULLCOMMITLSN);
>  	return true;
>  }
>  
> @@ -952,18 +948,19 @@ xlog_recover_buf_commit_pass2(
>  
>  	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
>  		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
> -		if (error)
> -			goto out_release;
>  	} else if (buf_f->blf_flags &
>  		  (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
> -		bool	dirty;
> -
> -		dirty = xlog_recover_do_dquot_buffer(mp, log, item, bp, buf_f);
> -		if (!dirty)
> +		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
>  			goto out_release;
> +
> +		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
> +				NULLCOMMITLSN);
>  	} else {
> -		xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
> +		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
> +				current_lsn);
>  	}
> +	if (error)
> +		goto out_release;
>  
>  	/*
>  	 * Perform delayed write on the buffer.  Asynchronous writes will be
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/5] xfs: separate out inode buffer recovery a bit more
  2024-03-19  2:15 ` [PATCH 2/5] xfs: separate out inode buffer recovery a bit more Dave Chinner
  2024-03-19  7:26   ` Christoph Hellwig
@ 2024-03-19 18:40   ` Darrick J. Wong
  1 sibling, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2024-03-19 18:40 UTC (permalink / raw
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Mar 19, 2024 at 01:15:21PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> It really is a unique snowflake, so peal off from normal buffer
> recovery earlier and shuffle all the unique bits into the inode
> buffer recovery function.
> 
> Also, it looks like the handling of mismatched inode cluster buffer
> sizes is wrong - we have to write the recovered buffer -before- we
> mark it stale as we're not supposed to write stale buffers. I don't
> think we check that anywhere in the buffer IO path, but lets do it
> the right way anyway.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good to me,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_buf_item_recover.c | 99 ++++++++++++++++++++++-------------
>  1 file changed, 63 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index dba57ee6fa6d..f994a303ad0a 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -229,7 +229,7 @@ xlog_recover_validate_buf_type(
>  	 * just avoid the verification stage for non-crc filesystems
>  	 */
>  	if (!xfs_has_crc(mp))
> -		return;
> +		return 0;
>  
>  	magic32 = be32_to_cpu(*(__be32 *)bp->b_addr);
>  	magic16 = be16_to_cpu(*(__be16*)bp->b_addr);
> @@ -407,7 +407,7 @@ xlog_recover_validate_buf_type(
>  	 * skipped.
>  	 */
>  	if (current_lsn == NULLCOMMITLSN)
> -		return 0;;
> +		return 0;
>  
>  	if (warnmsg) {
>  		xfs_warn(mp, warnmsg);
> @@ -567,18 +567,22 @@ xlog_recover_this_dquot_buffer(
>  }
>  
>  /*
> - * Perform recovery for a buffer full of inodes.  In these buffers, the only
> - * data which should be recovered is that which corresponds to the
> - * di_next_unlinked pointers in the on disk inode structures.  The rest of the
> - * data for the inodes is always logged through the inodes themselves rather
> - * than the inode buffer and is recovered in xlog_recover_inode_pass2().
> + * Perform recovery for a buffer full of inodes. We don't have inode cluster
> + * buffer specific LSNs, so we always recover inode buffers if they contain
> + * inodes.
> + *
> + * In these buffers, the only inode data which should be recovered is that which
> + * corresponds to the di_next_unlinked pointers in the on disk inode structures.
> + * The rest of the data for the inodes is always logged through the inodes
> + * themselves rather than the inode buffer and is recovered in
> + * xlog_recover_inode_pass2().
>   *
>   * The only time when buffers full of inodes are fully recovered is when the
> - * buffer is full of newly allocated inodes.  In this case the buffer will
> - * not be marked as an inode buffer and so will be sent to
> - * xlog_recover_do_reg_buffer() below during recovery.
> + * buffer is full of newly allocated inodes.  In this case the buffer will not
> + * be marked as an inode buffer and so xlog_recover_do_reg_buffer() will be used
> + * instead.
>   */
> -STATIC int
> +static int
>  xlog_recover_do_inode_buffer(
>  	struct xfs_mount		*mp,
>  	struct xlog_recover_item	*item,
> @@ -598,6 +602,13 @@ xlog_recover_do_inode_buffer(
>  
>  	trace_xfs_log_recover_buf_inode_buf(mp->m_log, buf_f);
>  
> +	/*
> +	 * If the magic number doesn't match, something has gone wrong. Don't
> +	 * recover the buffer.
> +	 */
> +	if (cpu_to_be16(XFS_DINODE_MAGIC) != *((__be16 *)bp->b_addr))
> +		return -EFSCORRUPTED;
> +
>  	/*
>  	 * Post recovery validation only works properly on CRC enabled
>  	 * filesystems.
> @@ -677,6 +688,31 @@ xlog_recover_do_inode_buffer(
>  
>  	}
>  
> +	/*
> +	 * Make sure that only inode buffers with good sizes remain valid after
> +	 * recovering this buffer item.
> +	 *
> +	 * The kernel moves inodes in buffers of 1 block or inode_cluster_size
> +	 * bytes, whichever is bigger.  The inode buffers in the log can be a
> +	 * different size if the log was generated by an older kernel using
> +	 * unclustered inode buffers or a newer kernel running with a different
> +	 * inode cluster size.  Regardless, if the inode buffer size isn't
> +	 * max(blocksize, inode_cluster_size) for *our* value of
> +	 * inode_cluster_size, then we need to keep the buffer out of the buffer
> +	 * cache so that the buffer won't overlap with future reads of those
> +	 * inodes.
> +	 *
> +	 * To acheive this, we write the buffer ito recover the inodes then mark
> +	 * it stale so that it won't be found on overlapping buffer lookups and
> +	 * caller knows not to queue it for delayed write.
> +	 */
> +	if (BBTOB(bp->b_length) != M_IGEO(mp)->inode_cluster_size) {
> +		int error;
> +
> +		error = xfs_bwrite(bp);
> +		xfs_buf_stale(bp);
> +		return error;
> +	}
>  	return 0;
>  }
>  
> @@ -840,7 +876,6 @@ xlog_recover_get_buf_lsn(
>  	magic16 = be16_to_cpu(*(__be16 *)blk);
>  	switch (magic16) {
>  	case XFS_DQUOT_MAGIC:
> -	case XFS_DINODE_MAGIC:
>  		goto recover_immediately;
>  	default:
>  		break;
> @@ -910,6 +945,17 @@ xlog_recover_buf_commit_pass2(
>  	if (error)
>  		return error;
>  
> +	/*
> +	 * Inode buffer recovery is quite unique, so go out separate ways here
> +	 * to simplify the rest of the code.
> +	 */
> +	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
> +		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
> +		if (error || (bp->b_flags & XBF_STALE))
> +			goto out_release;
> +		goto out_write;
> +	}
> +
>  	/*
>  	 * Recover the buffer only if we get an LSN from it and it's less than
>  	 * the lsn of the transaction we are replaying.
> @@ -946,9 +992,7 @@ xlog_recover_buf_commit_pass2(
>  		goto out_release;
>  	}
>  
> -	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
> -		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
> -	} else if (buf_f->blf_flags &
> +	if (buf_f->blf_flags &
>  		  (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
>  		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
>  			goto out_release;
> @@ -965,28 +1009,11 @@ xlog_recover_buf_commit_pass2(
>  	/*
>  	 * Perform delayed write on the buffer.  Asynchronous writes will be
>  	 * slower when taking into account all the buffers to be flushed.
> -	 *
> -	 * Also make sure that only inode buffers with good sizes stay in
> -	 * the buffer cache.  The kernel moves inodes in buffers of 1 block
> -	 * or inode_cluster_size bytes, whichever is bigger.  The inode
> -	 * buffers in the log can be a different size if the log was generated
> -	 * by an older kernel using unclustered inode buffers or a newer kernel
> -	 * running with a different inode cluster size.  Regardless, if
> -	 * the inode buffer size isn't max(blocksize, inode_cluster_size)
> -	 * for *our* value of inode_cluster_size, then we need to keep
> -	 * the buffer out of the buffer cache so that the buffer won't
> -	 * overlap with future reads of those inodes.
>  	 */
> -	if (XFS_DINODE_MAGIC ==
> -	    be16_to_cpu(*((__be16 *)xfs_buf_offset(bp, 0))) &&
> -	    (BBTOB(bp->b_length) != M_IGEO(log->l_mp)->inode_cluster_size)) {
> -		xfs_buf_stale(bp);
> -		error = xfs_bwrite(bp);
> -	} else {
> -		ASSERT(bp->b_mount == mp);
> -		bp->b_flags |= _XBF_LOGRECOVERY;
> -		xfs_buf_delwri_queue(bp, buffer_list);
> -	}
> +out_write:
> +	ASSERT(bp->b_mount == mp);
> +	bp->b_flags |= _XBF_LOGRECOVERY;
> +	xfs_buf_delwri_queue(bp, buffer_list);
>  
>  out_release:
>  	xfs_buf_relse(bp);
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] xfs: recover dquot buffers unconditionally
  2024-03-19  2:15 ` [PATCH 3/5] xfs: recover dquot buffers unconditionally Dave Chinner
@ 2024-03-19 18:49   ` Darrick J. Wong
  2024-03-19 21:46   ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2024-03-19 18:49 UTC (permalink / raw
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Mar 19, 2024 at 01:15:22PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Dquot buffers are only logged when the dquot cluster is allocated.
> Recovery of them is always done and not conditional on an LSN found
> in the buffer because there aren't dquots stamped in the buffer when
> the initialisation is replayed after allocation.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks sane,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_log_format.h |   6 ++
>  fs/xfs/xfs_buf_item_recover.c  | 129 +++++++++++++++++----------------
>  2 files changed, 72 insertions(+), 63 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
> index 16872972e1e9..5ac0c3066930 100644
> --- a/fs/xfs/libxfs/xfs_log_format.h
> +++ b/fs/xfs/libxfs/xfs_log_format.h
> @@ -508,6 +508,9 @@ struct xfs_log_dinode {
>  #define XFS_BLF_PDQUOT_BUF	(1<<3)
>  #define	XFS_BLF_GDQUOT_BUF	(1<<4)
>  
> +#define XFS_BLF_DQUOT_BUF	\
> +	(XFS_BLF_UDQUOT_BUF | XFS_BLF_PDQUOT_BUF | XFS_BLF_GDQUOT_BUF)
> +
>  /*
>   * This is the structure used to lay out a buf log item in the log.  The data
>   * map describes which 128 byte chunks of the buffer have been logged.
> @@ -571,6 +574,9 @@ enum xfs_blft {
>  	XFS_BLFT_MAX_BUF = (1 << XFS_BLFT_BITS),
>  };
>  
> +#define XFS_BLFT_DQUOT_BUF	\
> +	(XFS_BLFT_UDQUOT_BUF | XFS_BLFT_PDQUOT_BUF | XFS_BLFT_GDQUOT_BUF)
> +
>  static inline void
>  xfs_blft_to_flags(struct xfs_buf_log_format *blf, enum xfs_blft type)
>  {
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index f994a303ad0a..90740fcf2fbe 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -450,8 +450,6 @@ xlog_recover_do_reg_buffer(
>  	int			i;
>  	int			bit;
>  	int			nbits;
> -	xfs_failaddr_t		fa;
> -	const size_t		size_disk_dquot = sizeof(struct xfs_disk_dquot);
>  
>  	trace_xfs_log_recover_buf_reg_buf(mp->m_log, buf_f);
>  
> @@ -481,39 +479,10 @@ xlog_recover_do_reg_buffer(
>  		if (item->ri_buf[i].i_len < (nbits << XFS_BLF_SHIFT))
>  			nbits = item->ri_buf[i].i_len >> XFS_BLF_SHIFT;
>  
> -		/*
> -		 * Do a sanity check if this is a dquot buffer. Just checking
> -		 * the first dquot in the buffer should do. XXXThis is
> -		 * probably a good thing to do for other buf types also.
> -		 */
> -		fa = NULL;
> -		if (buf_f->blf_flags &
> -		   (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
> -			if (item->ri_buf[i].i_addr == NULL) {
> -				xfs_alert(mp,
> -					"XFS: NULL dquot in %s.", __func__);
> -				goto next;
> -			}
> -			if (item->ri_buf[i].i_len < size_disk_dquot) {
> -				xfs_alert(mp,
> -					"XFS: dquot too small (%d) in %s.",
> -					item->ri_buf[i].i_len, __func__);
> -				goto next;
> -			}
> -			fa = xfs_dquot_verify(mp, item->ri_buf[i].i_addr, -1);
> -			if (fa) {
> -				xfs_alert(mp,
> -	"dquot corrupt at %pS trying to replay into block 0x%llx",
> -					fa, xfs_buf_daddr(bp));
> -				goto next;
> -			}
> -		}
> -
>  		memcpy(xfs_buf_offset(bp,
>  			(uint)bit << XFS_BLF_SHIFT),	/* dest */
>  			item->ri_buf[i].i_addr,		/* source */
>  			nbits<<XFS_BLF_SHIFT);		/* length */
> - next:
>  		i++;
>  		bit += nbits;
>  	}
> @@ -566,6 +535,56 @@ xlog_recover_this_dquot_buffer(
>  	return true;
>  }
>  
> +/*
> + * Do a sanity check of each region in the item to ensure we are actually
> + * recovering a dquot buffer item.
> + */
> +static int
> +xlog_recover_verify_dquot_buf_item(
> +	struct xfs_mount		*mp,
> +	struct xlog_recover_item	*item,
> +	struct xfs_buf			*bp,
> +	struct xfs_buf_log_format	*buf_f)
> +{
> +	xfs_failaddr_t			fa;
> +	int				i;
> +
> +	switch (xfs_blft_from_flags(buf_f)) {
> +	case XFS_BLFT_UDQUOT_BUF:
> +	case XFS_BLFT_PDQUOT_BUF:
> +	case XFS_BLFT_GDQUOT_BUF:
> +		break;
> +	default:
> +		xfs_alert(mp,
> +			"XFS: dquot buffer log format type mismatch in %s.",
> +			__func__);
> +		xfs_buf_corruption_error(bp, __this_address);
> +		return -EFSCORRUPTED;
> +	}
> +
> +	for (i = 1; i < item->ri_total; i++) {
> +		if (item->ri_buf[i].i_addr == NULL) {
> +			xfs_alert(mp,
> +				"XFS: NULL dquot in %s.", __func__);
> +			return -EFSCORRUPTED;
> +		}
> +		if (item->ri_buf[i].i_len < sizeof(struct xfs_disk_dquot)) {
> +			xfs_alert(mp,
> +				"XFS: dquot too small (%d) in %s.",
> +				item->ri_buf[i].i_len, __func__);
> +			return -EFSCORRUPTED;
> +		}
> +		fa = xfs_dquot_verify(mp, item->ri_buf[i].i_addr, -1);
> +		if (fa) {
> +			xfs_alert(mp,
> +"dquot corrupt at %pS trying to replay into block 0x%llx",
> +				fa, xfs_buf_daddr(bp));
> +			return -EFSCORRUPTED;
> +		}
> +	}
> +	return 0;
> +}
> +
>  /*
>   * Perform recovery for a buffer full of inodes. We don't have inode cluster
>   * buffer specific LSNs, so we always recover inode buffers if they contain
> @@ -743,7 +762,6 @@ xlog_recover_get_buf_lsn(
>  	struct xfs_buf_log_format *buf_f)
>  {
>  	uint32_t		magic32;
> -	uint16_t		magic16;
>  	uint16_t		magicda;
>  	void			*blk = bp->b_addr;
>  	uuid_t			*uuid;
> @@ -862,27 +880,7 @@ xlog_recover_get_buf_lsn(
>  		return lsn;
>  	}
>  
> -	/*
> -	 * We do individual object checks on dquot and inode buffers as they
> -	 * have their own individual LSN records. Also, we could have a stale
> -	 * buffer here, so we have to at least recognise these buffer types.
> -	 *
> -	 * A notd complexity here is inode unlinked list processing - it logs
> -	 * the inode directly in the buffer, but we don't know which inodes have
> -	 * been modified, and there is no global buffer LSN. Hence we need to
> -	 * recover all inode buffer types immediately. This problem will be
> -	 * fixed by logical logging of the unlinked list modifications.
> -	 */
> -	magic16 = be16_to_cpu(*(__be16 *)blk);
> -	switch (magic16) {
> -	case XFS_DQUOT_MAGIC:
> -		goto recover_immediately;
> -	default:
> -		break;
> -	}
> -
>  	/* unknown buffer contents, recover immediately */
> -
>  recover_immediately:
>  	return (xfs_lsn_t)-1;
>  
> @@ -956,6 +954,21 @@ xlog_recover_buf_commit_pass2(
>  		goto out_write;
>  	}
>  
> +	if (buf_f->blf_flags & XFS_BLF_DQUOT_BUF) {
> +		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
> +			goto out_release;
> +
> +		error = xlog_recover_verify_dquot_buf_item(mp, item, bp, buf_f);
> +		if (error)
> +			goto out_release;
> +
> +		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
> +				NULLCOMMITLSN);
> +		if (error)
> +			goto out_release;
> +		goto out_write;
> +	}
> +
>  	/*
>  	 * Recover the buffer only if we get an LSN from it and it's less than
>  	 * the lsn of the transaction we are replaying.
> @@ -992,17 +1005,7 @@ xlog_recover_buf_commit_pass2(
>  		goto out_release;
>  	}
>  
> -	if (buf_f->blf_flags &
> -		  (XFS_BLF_UDQUOT_BUF|XFS_BLF_PDQUOT_BUF|XFS_BLF_GDQUOT_BUF)) {
> -		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
> -			goto out_release;
> -
> -		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
> -				NULLCOMMITLSN);
> -	} else {
> -		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
> -				current_lsn);
> -	}
> +	error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
>  	if (error)
>  		goto out_release;
>  
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/5] xfs: detect partial buffer recovery operations
  2024-03-19  2:15 ` [PATCH 4/5] xfs: detect partial buffer recovery operations Dave Chinner
@ 2024-03-19 20:39   ` Darrick J. Wong
  2024-03-19 22:54   ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2024-03-19 20:39 UTC (permalink / raw
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Mar 19, 2024 at 01:15:23PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When a compound buffer is logged (e.g. fragmented large directory
> block) we record it in the log as a series of separate buffer log
> format items in the journal. These get recovered individually, and
> because they are for non-contiguous extent ranges, we cannot use
> buffer addresses to detect that the buffer format items are from the
> same directory block.
> 
> Further, we cannot use LSN checks to determine if the partial
> block buffers should be recovered - apart from the first buffer we
> don't have a header with an LSN in it to check.
> 
> Finally, we cannot add a verifier to a partial block buffer because,
> again, it will fail the verifier checks and report corruption. We
> already skip this step due to bad magic number detection, but we
> should be able to do better here.
> 
> The one thing we can rely on, though, is that each buffer format
> item is written consecutively in the journal. They are built at
> commit time into a single log iovec and chained into the iclog write
> log vector chain as an unbroken sequence. Hence all the parts of a
> compound buffer should be consecutive buf log format items in the
> transaction being recovered.
> 
> Unfortunately, we don't have the information available in recovery
> to do a full compound buffer instantiation for recovery. We only
> have the fragments that contained modifications in the journal, and
> so there may be missing fragments that are still clean and hence are
> not in the journal. Hence we cannot use journal state to rebuild the
> compound buffer entirely and hence recover it as a complete entity
> and run a verifier over it before writeback.
> 
> Hence the first thing we need to do is detect such partial buffer
> recovery situations and track whether we need to skip all the
> partial buffers due to the LSN check in the initial header fragment
> read from disk.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_buf_item_recover.c | 178 +++++++++++++++++++++++++++-------
>  1 file changed, 143 insertions(+), 35 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index 90740fcf2fbe..9225baa62755 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -434,18 +434,17 @@ xlog_recover_validate_buf_type(
>  }
>  
>  /*
> - * Perform a 'normal' buffer recovery.  Each logged region of the
> + * Perform recovery of the logged regions. Each logged region of the
>   * buffer should be copied over the corresponding region in the
>   * given buffer.  The bitmap in the buf log format structure indicates
>   * where to place the logged data.
>   */
> -static int
> -xlog_recover_do_reg_buffer(
> +static void
> +xlog_recover_buffer(
>  	struct xfs_mount		*mp,
>  	struct xlog_recover_item	*item,
>  	struct xfs_buf			*bp,
> -	struct xfs_buf_log_format	*buf_f,
> -	xfs_lsn_t			current_lsn)
> +	struct xfs_buf_log_format	*buf_f)
>  {
>  	int			i;
>  	int			bit;
> @@ -489,7 +488,17 @@ xlog_recover_do_reg_buffer(
>  
>  	/* Shouldn't be any more regions */
>  	ASSERT(i == item->ri_total);
> +}
>  
> +static int
> +xlog_recover_do_reg_buffer(
> +	struct xfs_mount		*mp,
> +	struct xlog_recover_item	*item,
> +	struct xfs_buf			*bp,
> +	struct xfs_buf_log_format	*buf_f,
> +	xfs_lsn_t			current_lsn)
> +{
> +	xlog_recover_buffer(mp, item, bp, buf_f);
>  	return xlog_recover_validate_buf_type(mp, bp, buf_f, current_lsn);
>  }
>  
> @@ -735,6 +744,67 @@ xlog_recover_do_inode_buffer(
>  	return 0;
>  }
>  
> +static bool
> +xlog_recovery_is_dir_buf(
> +	struct xfs_buf_log_format	*buf_f)
> +{
> +	switch (xfs_blft_from_flags(buf_f)) {

/me wonders if the argument to xlog_recovery_is_dir_buf and
xfs_blft_from_flags ought to be switched to (const xfs_buf_log_format *).

> +	case XFS_BLFT_DIR_BLOCK_BUF:
> +	case XFS_BLFT_DIR_DATA_BUF:
> +	case XFS_BLFT_DIR_FREE_BUF:
> +	case XFS_BLFT_DIR_LEAF1_BUF:
> +	case XFS_BLFT_DIR_LEAFN_BUF:
> +	case XFS_BLFT_DA_NODE_BUF:
> +		return true;
> +	default:
> +		break;
> +	}
> +	return false;
> +}
> +
> +/*
> + * Partial dabuf recovery.
> + *
> + * There are two main cases here - a buffer that contains the dabuf header and
> + * hence can be magic number and LSN checked, and then everything else.
> + *
> + * We can determine if the former should be replayed or not via LSN checks, but
> + * we cannot do that with the latter, so the only choice we have here is to
> + * always recover the changes regardless of whether this means metadata on disk
> + * will go backwards in time. This, at least, means that the changes in each
> + * checkpoint are applied consistently to the dabuf and we don't do really
> + * stupid things like skip the header fragment replay and then replay all the
> + * other changes to the dabuf block.
> + *
> + * While this is not ideal, finishing log recovery should them replay all the
> + * remaining changes across this buffer and so bring it back to being consistent
> + * on disk at the completion of recovery. Hence this "going backwards in time"
> + * situation will only be relevant to failed journal replay situations. These
> + * are rare and will require xfs_repair to be run, anyway, so the inconsistency
> + * that results will be corrected before the filesystem goes back into service,
> + * anyway.

Does this means that all fragments of logged discontig dabufs will be
replayed fully?  IOWs, no skipping ahead based on LSNs like we do for
everything else (except the rtbitmap/rtsummary blocks)?

> + *
> + * Important: This partial fragment recovery relies on log recovery purging the
> + * buffer cache after completion of this recovery phase. These partial buffers
> + * are never used at runtime (discontiguous buffers will be used instead), so
> + * they must be removed from the buffer cache to prevent them from causing
> + * overlapping range lookup failures for the entire dabuf range.

Should the comment about the xfs_buftarg_drain call in
xfs_log_mount_finish gain a comment about needing to do this for
consistency of discontig directory blocks?

Also, do we have an easy way to fault-inject discontig directory blocks?

> + */
> +static void
> +xlog_recover_do_partial_dabuf(
> +	struct xfs_mount		*mp,
> +	struct xlog_recover_item	*item,
> +	struct xfs_buf			*bp,
> +	struct xfs_buf_log_format	*buf_f)
> +{
> +	/*
> +	 * Always recover without verification or write verifiers. Use delwri
> +	 * and rely on post pass2 recovery cache purge to clean these out of
> +	 * memory.
> +	 */
> +	xlog_recover_buffer(mp, item, bp, buf_f);
> +}
> +
>  /*
>   * V5 filesystems know the age of the buffer on disk being recovered. We can
>   * have newer objects on disk than we are replaying, and so for these cases we
> @@ -886,6 +956,54 @@ xlog_recover_get_buf_lsn(
>  
>  }
>  
> +/*
> + * Recover the buffer only if we get an LSN from it and it's less than the lsn
> + * of the transaction we are replaying.
> + *
> + * Note that we have to be extremely careful of readahead here.  Readahead does
> + * not attach verfiers to the buffers so if we don't actually do any replay

                 verifiers

> + * after readahead because of the LSN we found in the buffer if more recent than

                                                                is more

> + * that current transaction then we need to attach the verifier directly.
> + * Failure to do so can lead to future recovery actions (e.g. EFI and unlinked
> + * list recovery) can operate on the buffers and they won't get the verifier

This sentence reads awkwardly to me.  How about

"Failure to do so can lead to future recovery actions (e.g. EFI and
unlinked list recovery) operating on the buffers without attaching a
verifier."

Also, does 'unlinked' refer to unlinked inode processing?

> + * attached. This can lead to blocks on disk having the correct content but a
> + * stale CRC.
> + *
> + * It is safe to assume these clean buffers are currently up to date.  If the
> + * buffer is dirtied by a later transaction being replayed, then the verifier
> + * will be reset to match whatever recover turns that buffer into.
> + *
> + * Return true if the buffer needs to be recovered, false if it doesn't.
> + */
> +static bool
> +xlog_recover_this_buffer(

I kinda wish this function returned either 0 for "recovery not needed",
1 for "recovery needed", or a negative errno for corruption.  Then the
call site looks a little more regular:

	error = xlog_recover_this_buffer(...);
	if (error)
		goto out_release;

Except for the 'return 1' behavior.  If you were thinking about that
already then go ahead and change it; if not, then leave it.

> +	struct xfs_mount		*mp,
> +	struct xfs_buf			*bp,
> +	struct xfs_buf_log_format	*buf_f,
> +	xfs_lsn_t			current_lsn)
> +{
> +	xfs_lsn_t			lsn;
> +
> +	lsn = xlog_recover_get_buf_lsn(mp, bp, buf_f);
> +	if (!lsn)
> +		return true;
> +	if (lsn == -1)
> +		return true;
> +	if (XFS_LSN_CMP(lsn, current_lsn) < 0)
> +		return true;
> +
> +	trace_xfs_log_recover_buf_skip(mp->m_log, buf_f);
> +	xlog_recover_validate_buf_type(mp, bp, buf_f, NULLCOMMITLSN);
> +
> +	/*
> +	 * We're skipping replay of this buffer log item due to the log
> +	 * item LSN being behind the ondisk buffer.  Verify the buffer
> +	 * contents since we aren't going to run the write verifier.
> +	 */
> +	if (bp->b_ops)
> +		bp->b_ops->verify_read(bp);
> +	return false;
> +}
>  /*
>   * This routine replays a modification made to a buffer at runtime.
>   * There are actually two types of buffer, regular and inode, which
> @@ -920,7 +1038,6 @@ xlog_recover_buf_commit_pass2(
>  	struct xfs_mount		*mp = log->l_mp;
>  	struct xfs_buf			*bp;
>  	int				error;
> -	xfs_lsn_t			lsn;
>  
>  	/*
>  	 * In this pass we only want to recover all the buffers which have
> @@ -962,7 +1079,8 @@ xlog_recover_buf_commit_pass2(
>  		if (error)
>  			goto out_release;
>  
> -		error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f,
> +		xlog_recover_buffer(mp, item, bp, buf_f);
> +		error = xlog_recover_validate_buf_type(mp, bp, buf_f,
>  				NULLCOMMITLSN);
>  		if (error)
>  			goto out_release;
> @@ -970,41 +1088,31 @@ xlog_recover_buf_commit_pass2(
>  	}
>  
>  	/*
> -	 * Recover the buffer only if we get an LSN from it and it's less than
> -	 * the lsn of the transaction we are replaying.
> -	 *
> -	 * Note that we have to be extremely careful of readahead here.
> -	 * Readahead does not attach verfiers to the buffers so if we don't
> -	 * actually do any replay after readahead because of the LSN we found
> -	 * in the buffer if more recent than that current transaction then we
> -	 * need to attach the verifier directly. Failure to do so can lead to
> -	 * future recovery actions (e.g. EFI and unlinked list recovery) can
> -	 * operate on the buffers and they won't get the verifier attached. This
> -	 * can lead to blocks on disk having the correct content but a stale
> -	 * CRC.
> -	 *
> -	 * It is safe to assume these clean buffers are currently up to date.
> -	 * If the buffer is dirtied by a later transaction being replayed, then
> -	 * the verifier will be reset to match whatever recover turns that
> -	 * buffer into.
> +	 * Directory buffers can be larger than a single filesystem block and
> +	 * if they are they can be fragmented. There are lots of concerns about
> +	 * recovering these, so push them out of line where the concerns can be
> +	 * documented clearly.
>  	 */
> -	lsn = xlog_recover_get_buf_lsn(mp, bp, buf_f);
> -	if (lsn && lsn != -1 && XFS_LSN_CMP(lsn, current_lsn) >= 0) {
> -		trace_xfs_log_recover_buf_skip(log, buf_f);
> -		xlog_recover_validate_buf_type(mp, bp, buf_f, NULLCOMMITLSN);
> +	if (xlog_recovery_is_dir_buf(buf_f) &&
> +	    mp->m_dir_geo->blksize != BBTOB(buf_f->blf_len)) {
> +		xlog_recover_do_partial_dabuf(mp, item, bp, buf_f);
> +		goto out_write;
> +	}
>  
> +	/*
> +	 * Whole buffer recovery, dependent on the LSN in the on-disk structure.
> +	 */
> +	if (!xlog_recover_this_buffer(mp, bp, buf_f, current_lsn)) {
>  		/*
> -		 * We're skipping replay of this buffer log item due to the log
> -		 * item LSN being behind the ondisk buffer.  Verify the buffer
> -		 * contents since we aren't going to run the write verifier.
> +		 * We may have verified this buffer even though we aren't
> +		 * recovering it. Return the verifier error for early detection
> +		 * of recovery inconsistencies.
>  		 */
> -		if (bp->b_ops) {
> -			bp->b_ops->verify_read(bp);
> -			error = bp->b_error;
> -		}
> +		error = bp->b_error;
>  		goto out_release;
>  	}
>  
> +

Extra blank line.

--D

>  	error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
>  	if (error)
>  		goto out_release;
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] xfs: consistently use struct xlog in buffer item recovery
  2024-03-19  2:15 ` [PATCH 5/5] xfs: consistently use struct xlog in buffer item recovery Dave Chinner
@ 2024-03-19 20:40   ` Darrick J. Wong
  2024-03-19 21:48   ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2024-03-19 20:40 UTC (permalink / raw
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Mar 19, 2024 at 01:15:24PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Rather than passing a mix of xfs_mount and xlog (and sometimes both)
> through the call chain of buffer log item recovery, just pass
> the struct xlog and pull the xfs_mount from that only at the leaf
> functions where it is needed. This makes it all just a little
> cleaner.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good to me,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_buf_item_recover.c | 94 +++++++++++++++++------------------
>  1 file changed, 47 insertions(+), 47 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
> index 9225baa62755..edd03b08c969 100644
> --- a/fs/xfs/xfs_buf_item_recover.c
> +++ b/fs/xfs/xfs_buf_item_recover.c
> @@ -209,7 +209,7 @@ xlog_recover_buf_commit_pass1(
>   */
>  static int
>  xlog_recover_validate_buf_type(
> -	struct xfs_mount		*mp,
> +	struct xlog			*log,
>  	struct xfs_buf			*bp,
>  	struct xfs_buf_log_format	*buf_f,
>  	xfs_lsn_t			current_lsn)
> @@ -228,7 +228,7 @@ xlog_recover_validate_buf_type(
>  	 * inconsistent state resulting in verification failures. Hence for now
>  	 * just avoid the verification stage for non-crc filesystems
>  	 */
> -	if (!xfs_has_crc(mp))
> +	if (!xfs_has_crc(log->l_mp))
>  		return 0;
>  
>  	magic32 = be32_to_cpu(*(__be32 *)bp->b_addr);
> @@ -396,7 +396,7 @@ xlog_recover_validate_buf_type(
>  		break;
>  #endif /* CONFIG_XFS_RT */
>  	default:
> -		xfs_warn(mp, "Unknown buffer type %d!",
> +		xfs_warn(log->l_mp, "Unknown buffer type %d!",
>  			 xfs_blft_from_flags(buf_f));
>  		break;
>  	}
> @@ -410,7 +410,7 @@ xlog_recover_validate_buf_type(
>  		return 0;
>  
>  	if (warnmsg) {
> -		xfs_warn(mp, warnmsg);
> +		xfs_warn(log->l_mp, warnmsg);
>  		xfs_buf_corruption_error(bp, __this_address);
>  		return -EFSCORRUPTED;
>  	}
> @@ -428,7 +428,7 @@ xlog_recover_validate_buf_type(
>  	 */
>  	ASSERT(bp->b_ops);
>  	bp->b_flags |= _XBF_LOGRECOVERY;
> -	xfs_buf_item_init(bp, mp);
> +	xfs_buf_item_init(bp, log->l_mp);
>  	bp->b_log_item->bli_item.li_lsn = current_lsn;
>  	return 0;
>  }
> @@ -441,7 +441,7 @@ xlog_recover_validate_buf_type(
>   */
>  static void
>  xlog_recover_buffer(
> -	struct xfs_mount		*mp,
> +	struct xlog			*log,
>  	struct xlog_recover_item	*item,
>  	struct xfs_buf			*bp,
>  	struct xfs_buf_log_format	*buf_f)
> @@ -450,7 +450,7 @@ xlog_recover_buffer(
>  	int			bit;
>  	int			nbits;
>  
> -	trace_xfs_log_recover_buf_reg_buf(mp->m_log, buf_f);
> +	trace_xfs_log_recover_buf_reg_buf(log, buf_f);
>  
>  	bit = 0;
>  	i = 1;  /* 0 is the buf format structure */
> @@ -492,14 +492,14 @@ xlog_recover_buffer(
>  
>  static int
>  xlog_recover_do_reg_buffer(
> -	struct xfs_mount		*mp,
> +	struct xlog			*log,
>  	struct xlog_recover_item	*item,
>  	struct xfs_buf			*bp,
>  	struct xfs_buf_log_format	*buf_f,
>  	xfs_lsn_t			current_lsn)
>  {
> -	xlog_recover_buffer(mp, item, bp, buf_f);
> -	return xlog_recover_validate_buf_type(mp, bp, buf_f, current_lsn);
> +	xlog_recover_buffer(log, item, bp, buf_f);
> +	return xlog_recover_validate_buf_type(log, bp, buf_f, current_lsn);
>  }
>  
>  /*
> @@ -513,7 +513,6 @@ xlog_recover_do_reg_buffer(
>   */
>  static bool
>  xlog_recover_this_dquot_buffer(
> -	struct xfs_mount		*mp,
>  	struct xlog			*log,
>  	struct xlog_recover_item	*item,
>  	struct xfs_buf			*bp,
> @@ -526,7 +525,7 @@ xlog_recover_this_dquot_buffer(
>  	/*
>  	 * Filesystems are required to send in quota flags at mount time.
>  	 */
> -	if (!mp->m_qflags)
> +	if (!log->l_mp->m_qflags)
>  		return false;
>  
>  	type = 0;
> @@ -550,7 +549,7 @@ xlog_recover_this_dquot_buffer(
>   */
>  static int
>  xlog_recover_verify_dquot_buf_item(
> -	struct xfs_mount		*mp,
> +	struct xlog			*log,
>  	struct xlog_recover_item	*item,
>  	struct xfs_buf			*bp,
>  	struct xfs_buf_log_format	*buf_f)
> @@ -564,7 +563,7 @@ xlog_recover_verify_dquot_buf_item(
>  	case XFS_BLFT_GDQUOT_BUF:
>  		break;
>  	default:
> -		xfs_alert(mp,
> +		xfs_alert(log->l_mp,
>  			"XFS: dquot buffer log format type mismatch in %s.",
>  			__func__);
>  		xfs_buf_corruption_error(bp, __this_address);
> @@ -573,19 +572,19 @@ xlog_recover_verify_dquot_buf_item(
>  
>  	for (i = 1; i < item->ri_total; i++) {
>  		if (item->ri_buf[i].i_addr == NULL) {
> -			xfs_alert(mp,
> +			xfs_alert(log->l_mp,
>  				"XFS: NULL dquot in %s.", __func__);
>  			return -EFSCORRUPTED;
>  		}
>  		if (item->ri_buf[i].i_len < sizeof(struct xfs_disk_dquot)) {
> -			xfs_alert(mp,
> +			xfs_alert(log->l_mp,
>  				"XFS: dquot too small (%d) in %s.",
>  				item->ri_buf[i].i_len, __func__);
>  			return -EFSCORRUPTED;
>  		}
> -		fa = xfs_dquot_verify(mp, item->ri_buf[i].i_addr, -1);
> +		fa = xfs_dquot_verify(log->l_mp, item->ri_buf[i].i_addr, -1);
>  		if (fa) {
> -			xfs_alert(mp,
> +			xfs_alert(log->l_mp,
>  "dquot corrupt at %pS trying to replay into block 0x%llx",
>  				fa, xfs_buf_daddr(bp));
>  			return -EFSCORRUPTED;
> @@ -612,11 +611,12 @@ xlog_recover_verify_dquot_buf_item(
>   */
>  static int
>  xlog_recover_do_inode_buffer(
> -	struct xfs_mount		*mp,
> +	struct xlog			*log,
>  	struct xlog_recover_item	*item,
>  	struct xfs_buf			*bp,
>  	struct xfs_buf_log_format	*buf_f)
>  {
> +	struct xfs_sb			*sbp = &log->l_mp->m_sb;
>  	int				i;
>  	int				item_index = 0;
>  	int				bit = 0;
> @@ -628,7 +628,7 @@ xlog_recover_do_inode_buffer(
>  	xfs_agino_t			*logged_nextp;
>  	xfs_agino_t			*buffer_nextp;
>  
> -	trace_xfs_log_recover_buf_inode_buf(mp->m_log, buf_f);
> +	trace_xfs_log_recover_buf_inode_buf(log, buf_f);
>  
>  	/*
>  	 * If the magic number doesn't match, something has gone wrong. Don't
> @@ -641,12 +641,12 @@ xlog_recover_do_inode_buffer(
>  	 * Post recovery validation only works properly on CRC enabled
>  	 * filesystems.
>  	 */
> -	if (xfs_has_crc(mp))
> +	if (xfs_has_crc(log->l_mp))
>  		bp->b_ops = &xfs_inode_buf_ops;
>  
> -	inodes_per_buf = BBTOB(bp->b_length) >> mp->m_sb.sb_inodelog;
> +	inodes_per_buf = BBTOB(bp->b_length) >> sbp->sb_inodelog;
>  	for (i = 0; i < inodes_per_buf; i++) {
> -		next_unlinked_offset = (i * mp->m_sb.sb_inodesize) +
> +		next_unlinked_offset = (i * sbp->sb_inodesize) +
>  			offsetof(struct xfs_dinode, di_next_unlinked);
>  
>  		while (next_unlinked_offset >=
> @@ -695,8 +695,8 @@ xlog_recover_do_inode_buffer(
>  		 */
>  		logged_nextp = item->ri_buf[item_index].i_addr +
>  				next_unlinked_offset - reg_buf_offset;
> -		if (XFS_IS_CORRUPT(mp, *logged_nextp == 0)) {
> -			xfs_alert(mp,
> +		if (XFS_IS_CORRUPT(log->l_mp, *logged_nextp == 0)) {
> +			xfs_alert(log->l_mp,
>  		"Bad inode buffer log record (ptr = "PTR_FMT", bp = "PTR_FMT"). "
>  		"Trying to replay bad (0) inode di_next_unlinked field.",
>  				item, bp);
> @@ -711,8 +711,8 @@ xlog_recover_do_inode_buffer(
>  		 * have to leave the inode in a consistent state for whoever
>  		 * reads it next....
>  		 */
> -		xfs_dinode_calc_crc(mp,
> -				xfs_buf_offset(bp, i * mp->m_sb.sb_inodesize));
> +		xfs_dinode_calc_crc(log->l_mp,
> +				xfs_buf_offset(bp, i * sbp->sb_inodesize));
>  
>  	}
>  
> @@ -734,7 +734,7 @@ xlog_recover_do_inode_buffer(
>  	 * it stale so that it won't be found on overlapping buffer lookups and
>  	 * caller knows not to queue it for delayed write.
>  	 */
> -	if (BBTOB(bp->b_length) != M_IGEO(mp)->inode_cluster_size) {
> +	if (BBTOB(bp->b_length) != M_IGEO(log->l_mp)->inode_cluster_size) {
>  		int error;
>  
>  		error = xfs_bwrite(bp);
> @@ -792,7 +792,7 @@ xlog_recovery_is_dir_buf(
>   */
>  static void
>  xlog_recover_do_partial_dabuf(
> -	struct xfs_mount		*mp,
> +	struct xlog			*log,
>  	struct xlog_recover_item	*item,
>  	struct xfs_buf			*bp,
>  	struct xfs_buf_log_format	*buf_f)
> @@ -802,7 +802,7 @@ xlog_recover_do_partial_dabuf(
>  	 * and rely on post pass2 recovery cache purge to clean these out of
>  	 * memory.
>  	 */
> -	xlog_recover_buffer(mp, item, bp, buf_f);
> +	xlog_recover_buffer(log, item, bp, buf_f);
>  }
>  
>  /*
> @@ -827,7 +827,7 @@ xlog_recover_do_partial_dabuf(
>   */
>  static xfs_lsn_t
>  xlog_recover_get_buf_lsn(
> -	struct xfs_mount	*mp,
> +	struct xlog		*log,
>  	struct xfs_buf		*bp,
>  	struct xfs_buf_log_format *buf_f)
>  {
> @@ -839,7 +839,7 @@ xlog_recover_get_buf_lsn(
>  	uint16_t		blft;
>  
>  	/* v4 filesystems always recover immediately */
> -	if (!xfs_has_crc(mp))
> +	if (!xfs_has_crc(log->l_mp))
>  		goto recover_immediately;
>  
>  	/*
> @@ -916,7 +916,7 @@ xlog_recover_get_buf_lsn(
>  		 * the relevant UUID in the superblock.
>  		 */
>  		lsn = be64_to_cpu(((struct xfs_dsb *)blk)->sb_lsn);
> -		if (xfs_has_metauuid(mp))
> +		if (xfs_has_metauuid(log->l_mp))
>  			uuid = &((struct xfs_dsb *)blk)->sb_meta_uuid;
>  		else
>  			uuid = &((struct xfs_dsb *)blk)->sb_uuid;
> @@ -926,7 +926,7 @@ xlog_recover_get_buf_lsn(
>  	}
>  
>  	if (lsn != (xfs_lsn_t)-1) {
> -		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
> +		if (!uuid_equal(&log->l_mp->m_sb.sb_meta_uuid, uuid))
>  			goto recover_immediately;
>  		return lsn;
>  	}
> @@ -945,7 +945,7 @@ xlog_recover_get_buf_lsn(
>  	}
>  
>  	if (lsn != (xfs_lsn_t)-1) {
> -		if (!uuid_equal(&mp->m_sb.sb_meta_uuid, uuid))
> +		if (!uuid_equal(&log->l_mp->m_sb.sb_meta_uuid, uuid))
>  			goto recover_immediately;
>  		return lsn;
>  	}
> @@ -977,14 +977,14 @@ xlog_recover_get_buf_lsn(
>   */
>  static bool
>  xlog_recover_this_buffer(
> -	struct xfs_mount		*mp,
> +	struct xlog			*log,
>  	struct xfs_buf			*bp,
>  	struct xfs_buf_log_format	*buf_f,
>  	xfs_lsn_t			current_lsn)
>  {
>  	xfs_lsn_t			lsn;
>  
> -	lsn = xlog_recover_get_buf_lsn(mp, bp, buf_f);
> +	lsn = xlog_recover_get_buf_lsn(log, bp, buf_f);
>  	if (!lsn)
>  		return true;
>  	if (lsn == -1)
> @@ -992,8 +992,8 @@ xlog_recover_this_buffer(
>  	if (XFS_LSN_CMP(lsn, current_lsn) < 0)
>  		return true;
>  
> -	trace_xfs_log_recover_buf_skip(mp->m_log, buf_f);
> -	xlog_recover_validate_buf_type(mp, bp, buf_f, NULLCOMMITLSN);
> +	trace_xfs_log_recover_buf_skip(log, buf_f);
> +	xlog_recover_validate_buf_type(log, bp, buf_f, NULLCOMMITLSN);
>  
>  	/*
>  	 * We're skipping replay of this buffer log item due to the log
> @@ -1065,22 +1065,22 @@ xlog_recover_buf_commit_pass2(
>  	 * to simplify the rest of the code.
>  	 */
>  	if (buf_f->blf_flags & XFS_BLF_INODE_BUF) {
> -		error = xlog_recover_do_inode_buffer(mp, item, bp, buf_f);
> +		error = xlog_recover_do_inode_buffer(log, item, bp, buf_f);
>  		if (error || (bp->b_flags & XBF_STALE))
>  			goto out_release;
>  		goto out_write;
>  	}
>  
>  	if (buf_f->blf_flags & XFS_BLF_DQUOT_BUF) {
> -		if (!xlog_recover_this_dquot_buffer(mp, log, item, bp, buf_f))
> +		if (!xlog_recover_this_dquot_buffer(log, item, bp, buf_f))
>  			goto out_release;
>  
> -		error = xlog_recover_verify_dquot_buf_item(mp, item, bp, buf_f);
> +		error = xlog_recover_verify_dquot_buf_item(log, item, bp, buf_f);
>  		if (error)
>  			goto out_release;
>  
> -		xlog_recover_buffer(mp, item, bp, buf_f);
> -		error = xlog_recover_validate_buf_type(mp, bp, buf_f,
> +		xlog_recover_buffer(log, item, bp, buf_f);
> +		error = xlog_recover_validate_buf_type(log, bp, buf_f,
>  				NULLCOMMITLSN);
>  		if (error)
>  			goto out_release;
> @@ -1095,14 +1095,14 @@ xlog_recover_buf_commit_pass2(
>  	 */
>  	if (xlog_recovery_is_dir_buf(buf_f) &&
>  	    mp->m_dir_geo->blksize != BBTOB(buf_f->blf_len)) {
> -		xlog_recover_do_partial_dabuf(mp, item, bp, buf_f);
> +		xlog_recover_do_partial_dabuf(log, item, bp, buf_f);
>  		goto out_write;
>  	}
>  
>  	/*
>  	 * Whole buffer recovery, dependent on the LSN in the on-disk structure.
>  	 */
> -	if (!xlog_recover_this_buffer(mp, bp, buf_f, current_lsn)) {
> +	if (!xlog_recover_this_buffer(log, bp, buf_f, current_lsn)) {
>  		/*
>  		 * We may have verified this buffer even though we aren't
>  		 * recovering it. Return the verifier error for early detection
> @@ -1113,7 +1113,7 @@ xlog_recover_buf_commit_pass2(
>  	}
>  
>  
> -	error = xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn);
> +	error = xlog_recover_do_reg_buffer(log, item, bp, buf_f, current_lsn);
>  	if (error)
>  		goto out_release;
>  
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] xfs: recover dquot buffers unconditionally
  2024-03-19  2:15 ` [PATCH 3/5] xfs: recover dquot buffers unconditionally Dave Chinner
  2024-03-19 18:49   ` Darrick J. Wong
@ 2024-03-19 21:46   ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-03-19 21:46 UTC (permalink / raw
  To: Dave Chinner; +Cc: linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] xfs: consistently use struct xlog in buffer item recovery
  2024-03-19  2:15 ` [PATCH 5/5] xfs: consistently use struct xlog in buffer item recovery Dave Chinner
  2024-03-19 20:40   ` Darrick J. Wong
@ 2024-03-19 21:48   ` Christoph Hellwig
  1 sibling, 0 replies; 17+ messages in thread
From: Christoph Hellwig @ 2024-03-19 21:48 UTC (permalink / raw
  To: Dave Chinner; +Cc: linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/5] xfs: detect partial buffer recovery operations
  2024-03-19  2:15 ` [PATCH 4/5] xfs: detect partial buffer recovery operations Dave Chinner
  2024-03-19 20:39   ` Darrick J. Wong
@ 2024-03-19 22:54   ` Christoph Hellwig
  2024-03-19 23:14     ` Darrick J. Wong
  1 sibling, 1 reply; 17+ messages in thread
From: Christoph Hellwig @ 2024-03-19 22:54 UTC (permalink / raw
  To: Dave Chinner; +Cc: linux-xfs

> +static bool
> +xlog_recovery_is_dir_buf(
> +	struct xfs_buf_log_format	*buf_f)
> +{
> +	switch (xfs_blft_from_flags(buf_f)) {
> +	case XFS_BLFT_DIR_BLOCK_BUF:
> +	case XFS_BLFT_DIR_DATA_BUF:
> +	case XFS_BLFT_DIR_FREE_BUF:
> +	case XFS_BLFT_DIR_LEAF1_BUF:
> +	case XFS_BLFT_DIR_LEAFN_BUF:
> +	case XFS_BLFT_DA_NODE_BUF:

XFS_BLFT_DA_NODE_BUF can also be a non-directory buffer.  Maybe this
should be named something like xlog_recover_maybe_is_partial_dabuf?

> +		error = bp->b_error;
>  		goto out_release;
>  	}
>  
> +

This adds a spurious new line.

Otherwise this looks good to me, but the lack over verifiation for these
multi-buffer recoveries really scares me..


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/5] xfs: detect partial buffer recovery operations
  2024-03-19 22:54   ` Christoph Hellwig
@ 2024-03-19 23:14     ` Darrick J. Wong
  0 siblings, 0 replies; 17+ messages in thread
From: Darrick J. Wong @ 2024-03-19 23:14 UTC (permalink / raw
  To: Christoph Hellwig; +Cc: Dave Chinner, linux-xfs

On Tue, Mar 19, 2024 at 03:54:02PM -0700, Christoph Hellwig wrote:
> > +static bool
> > +xlog_recovery_is_dir_buf(
> > +	struct xfs_buf_log_format	*buf_f)
> > +{
> > +	switch (xfs_blft_from_flags(buf_f)) {
> > +	case XFS_BLFT_DIR_BLOCK_BUF:
> > +	case XFS_BLFT_DIR_DATA_BUF:
> > +	case XFS_BLFT_DIR_FREE_BUF:
> > +	case XFS_BLFT_DIR_LEAF1_BUF:
> > +	case XFS_BLFT_DIR_LEAFN_BUF:
> > +	case XFS_BLFT_DA_NODE_BUF:
> 
> XFS_BLFT_DA_NODE_BUF can also be a non-directory buffer.  Maybe this
> should be named something like xlog_recover_maybe_is_partial_dabuf?
> 
> > +		error = bp->b_error;
> >  		goto out_release;
> >  	}
> >  
> > +
> 
> This adds a spurious new line.
> 
> Otherwise this looks good to me, but the lack over verifiation for these
> multi-buffer recoveries really scares me..

I wondered if there was a good way to reconstruct the discontiguous
buffer, but the only things I could think of relied on guessing from the
subsequent buffer log items.  Does anyone have a better idea?

--D

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-03-19 23:14 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-19  2:15 [PATCH 0/5] xfs: fix discontiguous metadata block recovery Dave Chinner
2024-03-19  2:15 ` [PATCH 1/5] xfs: buffer log item type mismatches are corruption Dave Chinner
2024-03-19  7:23   ` Christoph Hellwig
2024-03-19 18:16   ` Darrick J. Wong
2024-03-19  2:15 ` [PATCH 2/5] xfs: separate out inode buffer recovery a bit more Dave Chinner
2024-03-19  7:26   ` Christoph Hellwig
2024-03-19 18:40   ` Darrick J. Wong
2024-03-19  2:15 ` [PATCH 3/5] xfs: recover dquot buffers unconditionally Dave Chinner
2024-03-19 18:49   ` Darrick J. Wong
2024-03-19 21:46   ` Christoph Hellwig
2024-03-19  2:15 ` [PATCH 4/5] xfs: detect partial buffer recovery operations Dave Chinner
2024-03-19 20:39   ` Darrick J. Wong
2024-03-19 22:54   ` Christoph Hellwig
2024-03-19 23:14     ` Darrick J. Wong
2024-03-19  2:15 ` [PATCH 5/5] xfs: consistently use struct xlog in buffer item recovery Dave Chinner
2024-03-19 20:40   ` Darrick J. Wong
2024-03-19 21:48   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.