Date: Thu, 14 Mar 2024 12:13:11 -0500
From: Bill O'Donnell
To: "Darrick J. Wong"
Cc: cem@kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/3] xfs_repair: rebuild block mappings from rmapbt data
References: <171029434322.2065697.15834513610979167624.stgit@frogsfrogsfrogs> <171029434369.2065697.1871117227419755749.stgit@frogsfrogsfrogs>
In-Reply-To: <171029434369.2065697.1871117227419755749.stgit@frogsfrogsfrogs>
X-Mailing-List: linux-xfs@vger.kernel.org

On Tue, Mar 12, 2024 at 07:14:40PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong
>
> Use rmap records to rebuild corrupt inode forks instead of zapping
> the whole inode if we think the rmap data is reasonably sane.
> > Signed-off-by: Darrick J. Wong Reviewed-by: Bill O'Donnell > --- > include/xfs_trans.h | 2 > libxfs/libxfs_api_defs.h | 15 + > libxfs/trans.c | 48 +++ > repair/Makefile | 2 > repair/agbtree.c | 2 > repair/bmap_repair.c | 749 ++++++++++++++++++++++++++++++++++++++++++++++ > repair/bmap_repair.h | 13 + > repair/bulkload.c | 205 ++++++++++++- > repair/bulkload.h | 24 + > repair/dinode.c | 54 +++ > repair/rmap.c | 2 > repair/rmap.h | 1 > 12 files changed, 1106 insertions(+), 11 deletions(-) > create mode 100644 repair/bmap_repair.c > create mode 100644 repair/bmap_repair.h > > > diff --git a/include/xfs_trans.h b/include/xfs_trans.h > index ab298ccfe556..ac82c3bc480a 100644 > --- a/include/xfs_trans.h > +++ b/include/xfs_trans.h > @@ -98,6 +98,8 @@ int libxfs_trans_alloc_rollable(struct xfs_mount *mp, uint blocks, > int libxfs_trans_alloc_empty(struct xfs_mount *mp, struct xfs_trans **tpp); > int libxfs_trans_commit(struct xfs_trans *); > void libxfs_trans_cancel(struct xfs_trans *); > +int libxfs_trans_reserve_more(struct xfs_trans *tp, uint blocks, > + uint rtextents); > > /* cancel dfops associated with a transaction */ > void xfs_defer_cancel(struct xfs_trans *); > diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h > index 28960317ab6b..769733ec2ee3 100644 > --- a/libxfs/libxfs_api_defs.h > +++ b/libxfs/libxfs_api_defs.h > @@ -32,7 +32,7 @@ > #define xfs_alloc_fix_freelist libxfs_alloc_fix_freelist > #define xfs_alloc_min_freelist libxfs_alloc_min_freelist > #define xfs_alloc_read_agf libxfs_alloc_read_agf > -#define xfs_alloc_vextent libxfs_alloc_vextent > +#define xfs_alloc_vextent_start_ag libxfs_alloc_vextent_start_ag > > #define xfs_ascii_ci_hashname libxfs_ascii_ci_hashname > > @@ -44,11 +44,18 @@ > #define xfs_attr_shortform_verify libxfs_attr_shortform_verify > > #define __xfs_bmap_add_free __libxfs_bmap_add_free > +#define xfs_bmap_validate_extent libxfs_bmap_validate_extent > #define xfs_bmapi_read libxfs_bmapi_read > +#define 
xfs_bmapi_remap libxfs_bmapi_remap > #define xfs_bmapi_write libxfs_bmapi_write > #define xfs_bmap_last_offset libxfs_bmap_last_offset > +#define xfs_bmbt_calc_size libxfs_bmbt_calc_size > +#define xfs_bmbt_commit_staged_btree libxfs_bmbt_commit_staged_btree > +#define xfs_bmbt_disk_get_startoff libxfs_bmbt_disk_get_startoff > +#define xfs_bmbt_disk_set_all libxfs_bmbt_disk_set_all > #define xfs_bmbt_maxlevels_ondisk libxfs_bmbt_maxlevels_ondisk > #define xfs_bmbt_maxrecs libxfs_bmbt_maxrecs > +#define xfs_bmbt_stage_cursor libxfs_bmbt_stage_cursor > #define xfs_bmdr_maxrecs libxfs_bmdr_maxrecs > > #define xfs_btree_bload libxfs_btree_bload > @@ -117,6 +124,7 @@ > > #define xfs_finobt_calc_reserves libxfs_finobt_calc_reserves > #define xfs_free_extent libxfs_free_extent > +#define xfs_free_extent_later libxfs_free_extent_later > #define xfs_free_perag libxfs_free_perag > #define xfs_fs_geometry libxfs_fs_geometry > #define xfs_highbit32 libxfs_highbit32 > @@ -127,7 +135,10 @@ > #define xfs_ialloc_read_agi libxfs_ialloc_read_agi > #define xfs_idata_realloc libxfs_idata_realloc > #define xfs_idestroy_fork libxfs_idestroy_fork > +#define xfs_iext_first libxfs_iext_first > +#define xfs_iext_insert_raw libxfs_iext_insert_raw > #define xfs_iext_lookup_extent libxfs_iext_lookup_extent > +#define xfs_iext_next libxfs_iext_next > #define xfs_ifork_zap_attr libxfs_ifork_zap_attr > #define xfs_imap_to_bp libxfs_imap_to_bp > #define xfs_initialize_perag libxfs_initialize_perag > @@ -174,10 +185,12 @@ > #define xfs_rmapbt_stage_cursor libxfs_rmapbt_stage_cursor > #define xfs_rmap_compare libxfs_rmap_compare > #define xfs_rmap_get_rec libxfs_rmap_get_rec > +#define xfs_rmap_ino_bmbt_owner libxfs_rmap_ino_bmbt_owner > #define xfs_rmap_irec_offset_pack libxfs_rmap_irec_offset_pack > #define xfs_rmap_irec_offset_unpack libxfs_rmap_irec_offset_unpack > #define xfs_rmap_lookup_le libxfs_rmap_lookup_le > #define xfs_rmap_lookup_le_range libxfs_rmap_lookup_le_range > +#define 
xfs_rmap_query_all libxfs_rmap_query_all > #define xfs_rmap_query_range libxfs_rmap_query_range > > #define xfs_rtbitmap_getword libxfs_rtbitmap_getword > diff --git a/libxfs/trans.c b/libxfs/trans.c > index bd1186b24e62..8143a6a99f62 100644 > --- a/libxfs/trans.c > +++ b/libxfs/trans.c > @@ -1143,3 +1143,51 @@ libxfs_trans_alloc_inode( > *tpp = tp; > return 0; > } > + > +/* > + * Try to reserve more blocks for a transaction. The single use case we > + * support is for offline repair -- use a transaction to gather data without > + * fear of btree cycle deadlocks; calculate how many blocks we really need > + * from that data; and only then start modifying data. This can fail due to > + * ENOSPC, so we have to be able to cancel the transaction. > + */ > +int > +libxfs_trans_reserve_more( > + struct xfs_trans *tp, > + uint blocks, > + uint rtextents) > +{ > + int error = 0; > + > + ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY)); > + > + /* > + * Attempt to reserve the needed disk blocks by decrementing > + * the number needed from the number available. This will > + * fail if the count would go below zero. > + */ > + if (blocks > 0) { > + if (tp->t_mountp->m_sb.sb_fdblocks < blocks) > + return -ENOSPC; > + tp->t_blk_res += blocks; > + } > + > + /* > + * Attempt to reserve the needed realtime extents by decrementing > + * the number needed from the number available. This will > + * fail if the count would go below zero. 
> + */ > + if (rtextents > 0) { > + if (tp->t_mountp->m_sb.sb_rextents < rtextents) { > + error = -ENOSPC; > + goto out_blocks; > + } > + } > + > + return 0; > +out_blocks: > + if (blocks > 0) > + tp->t_blk_res -= blocks; > + > + return error; > +} > diff --git a/repair/Makefile b/repair/Makefile > index 2c40e59a30fc..e5014deb0ce8 100644 > --- a/repair/Makefile > +++ b/repair/Makefile > @@ -16,6 +16,7 @@ HFILES = \ > avl.h \ > bulkload.h \ > bmap.h \ > + bmap_repair.h \ > btree.h \ > da_util.h \ > dinode.h \ > @@ -41,6 +42,7 @@ CFILES = \ > avl.c \ > bulkload.c \ > bmap.c \ > + bmap_repair.c \ > btree.c \ > da_util.c \ > dino_chunks.c \ > diff --git a/repair/agbtree.c b/repair/agbtree.c > index c6f0512fe7de..38f3f7b8feac 100644 > --- a/repair/agbtree.c > +++ b/repair/agbtree.c > @@ -22,7 +22,7 @@ init_rebuild( > { > memset(btr, 0, sizeof(struct bt_rebuild)); > > - bulkload_init_ag(&btr->newbt, sc, oinfo); > + bulkload_init_ag(&btr->newbt, sc, oinfo, NULLFSBLOCK); > btr->bload.max_dirty = XFS_B_TO_FSBT(sc->mp, 256U << 10); /* 256K */ > bulkload_estimate_ag_slack(sc, &btr->bload, est_agfreeblocks); > } > diff --git a/repair/bmap_repair.c b/repair/bmap_repair.c > new file mode 100644 > index 000000000000..7705980621c1 > --- /dev/null > +++ b/repair/bmap_repair.c > @@ -0,0 +1,749 @@ > +// SPDX-License-Identifier: GPL-2.0-or-later > +/* > + * Copyright (c) 2019-2024 Oracle. All Rights Reserved. > + * Author: Darrick J. Wong > + */ > +#include > +#include "btree.h" > +#include "err_protos.h" > +#include "libxlog.h" > +#include "incore.h" > +#include "globals.h" > +#include "dinode.h" > +#include "slab.h" > +#include "rmap.h" > +#include "bulkload.h" > +#include "bmap_repair.h" > + > +#define min_t(type, x, y) ( ((type)(x)) > ((type)(y)) ? 
((type)(y)) : ((type)(x)) ) > + > +/* > + * Inode Fork Block Mapping (BMBT) Repair > + * ====================================== > + * > + * Gather all the rmap records for the inode and fork we're fixing, reset the > + * incore fork, then recreate the btree. > + */ > +struct xrep_bmap { > + /* List of new bmap records. */ > + struct xfs_slab *bmap_records; > + struct xfs_slab_cursor *bmap_cursor; > + > + /* New fork. */ > + struct bulkload new_fork_info; > + struct xfs_btree_bload bmap_bload; > + > + struct repair_ctx *sc; > + > + /* How many blocks did we find allocated to this file? */ > + xfs_rfsblock_t nblocks; > + > + /* How many bmbt blocks did we find for this fork? */ > + xfs_rfsblock_t old_bmbt_block_count; > + > + /* Which fork are we fixing? */ > + int whichfork; > +}; > + > +/* Remember this reverse-mapping as a series of bmap records. */ > +STATIC int > +xrep_bmap_from_rmap( > + struct xrep_bmap *rb, > + xfs_fileoff_t startoff, > + xfs_fsblock_t startblock, > + xfs_filblks_t blockcount, > + bool unwritten) > +{ > + struct xfs_bmbt_rec rbe; > + struct xfs_bmbt_irec irec; > + int error = 0; > + > + irec.br_startoff = startoff; > + irec.br_startblock = startblock; > + irec.br_state = unwritten ? XFS_EXT_UNWRITTEN : XFS_EXT_NORM; > + > + do { > + xfs_failaddr_t fa; > + > + irec.br_blockcount = min_t(xfs_filblks_t, blockcount, > + XFS_MAX_BMBT_EXTLEN); > + > + fa = libxfs_bmap_validate_extent(rb->sc->ip, rb->whichfork, > + &irec); > + if (fa) > + return -EFSCORRUPTED; > + > + libxfs_bmbt_disk_set_all(&rbe, &irec); > + > + error = slab_add(rb->bmap_records, &rbe); > + if (error) > + return error; > + > + irec.br_startblock += irec.br_blockcount; > + irec.br_startoff += irec.br_blockcount; > + blockcount -= irec.br_blockcount; > + } while (blockcount > 0); > + > + return 0; > +} > + > +/* Check for any obvious errors or conflicts in the file mapping. 
*/ > +STATIC int > +xrep_bmap_check_fork_rmap( > + struct xrep_bmap *rb, > + struct xfs_btree_cur *cur, > + const struct xfs_rmap_irec *rec) > +{ > + struct repair_ctx *sc = rb->sc; > + > + /* > + * Data extents for rt files are never stored on the data device, but > + * everything else (xattrs, bmbt blocks) can be. > + */ > + if (XFS_IS_REALTIME_INODE(sc->ip) && > + !(rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK))) > + return EFSCORRUPTED; > + > + /* Check that this is within the AG. */ > + if (!xfs_verify_agbext(cur->bc_ag.pag, rec->rm_startblock, > + rec->rm_blockcount)) > + return EFSCORRUPTED; > + > + /* No contradictory flags. */ > + if ((rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK)) && > + (rec->rm_flags & XFS_RMAP_UNWRITTEN)) > + return EFSCORRUPTED; > + > + /* Check the file offset range. */ > + if (!(rec->rm_flags & XFS_RMAP_BMBT_BLOCK) && > + !xfs_verify_fileext(sc->mp, rec->rm_offset, rec->rm_blockcount)) > + return EFSCORRUPTED; > + > + return 0; > +} > + > +/* Record extents that belong to this inode's fork. */ > +STATIC int > +xrep_bmap_walk_rmap( > + struct xfs_btree_cur *cur, > + const struct xfs_rmap_irec *rec, > + void *priv) > +{ > + struct xrep_bmap *rb = priv; > + struct xfs_mount *mp = cur->bc_mp; > + xfs_fsblock_t fsbno; > + int error; > + > + /* Skip extents which are not owned by this inode and fork. */ > + if (rec->rm_owner != rb->sc->ip->i_ino) > + return 0; > + > + error = xrep_bmap_check_fork_rmap(rb, cur, rec); > + if (error) > + return error; > + > + /* > + * Record all blocks allocated to this file even if the extent isn't > + * for the fork we're rebuilding so that we can reset di_nblocks later. > + */ > + rb->nblocks += rec->rm_blockcount; > + > + /* If this rmap isn't for the fork we want, we're done. 
*/ > + if (rb->whichfork == XFS_DATA_FORK && > + (rec->rm_flags & XFS_RMAP_ATTR_FORK)) > + return 0; > + if (rb->whichfork == XFS_ATTR_FORK && > + !(rec->rm_flags & XFS_RMAP_ATTR_FORK)) > + return 0; > + > + fsbno = XFS_AGB_TO_FSB(mp, cur->bc_ag.pag->pag_agno, > + rec->rm_startblock); > + > + if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) { > + rb->old_bmbt_block_count += rec->rm_blockcount; > + return 0; > + } > + > + return xrep_bmap_from_rmap(rb, rec->rm_offset, fsbno, > + rec->rm_blockcount, > + rec->rm_flags & XFS_RMAP_UNWRITTEN); > +} > + > +/* Compare two bmap extents. */ > +static int > +xrep_bmap_extent_cmp( > + const void *a, > + const void *b) > +{ > + xfs_fileoff_t ao; > + xfs_fileoff_t bo; > + > + ao = libxfs_bmbt_disk_get_startoff((struct xfs_bmbt_rec *)a); > + bo = libxfs_bmbt_disk_get_startoff((struct xfs_bmbt_rec *)b); > + > + if (ao > bo) > + return 1; > + else if (ao < bo) > + return -1; > + return 0; > +} > + > +/* Scan one AG for reverse mappings that we can turn into extent maps. */ > +STATIC int > +xrep_bmap_scan_ag( > + struct xrep_bmap *rb, > + struct xfs_perag *pag) > +{ > + struct repair_ctx *sc = rb->sc; > + struct xfs_mount *mp = sc->mp; > + struct xfs_buf *agf_bp = NULL; > + struct xfs_btree_cur *cur; > + int error; > + > + error = -libxfs_alloc_read_agf(pag, sc->tp, 0, &agf_bp); > + if (error) > + return error; > + if (!agf_bp) > + return ENOMEM; > + cur = libxfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, pag); > + error = -libxfs_rmap_query_all(cur, xrep_bmap_walk_rmap, rb); > + libxfs_btree_del_cursor(cur, error); > + libxfs_trans_brelse(sc->tp, agf_bp); > + return error; > +} > + > +/* > + * Collect block mappings for this fork of this inode and decide if we have > + * enough space to rebuild. Caller is responsible for cleaning up the list if > + * anything goes wrong. 
> + */ > +STATIC int > +xrep_bmap_find_mappings( > + struct xrep_bmap *rb) > +{ > + struct xfs_perag *pag; > + xfs_agnumber_t agno; > + int error; > + > + /* Iterate the rmaps for extents. */ > + for_each_perag(rb->sc->mp, agno, pag) { > + error = xrep_bmap_scan_ag(rb, pag); > + if (error) { > + libxfs_perag_put(pag); > + return error; > + } > + } > + > + return 0; > +} > + > +/* Retrieve bmap data for bulk load. */ > +STATIC int > +xrep_bmap_get_records( > + struct xfs_btree_cur *cur, > + unsigned int idx, > + struct xfs_btree_block *block, > + unsigned int nr_wanted, > + void *priv) > +{ > + struct xfs_bmbt_rec *rec; > + struct xfs_bmbt_irec *irec = &cur->bc_rec.b; > + struct xrep_bmap *rb = priv; > + union xfs_btree_rec *block_rec; > + unsigned int loaded; > + > + for (loaded = 0; loaded < nr_wanted; loaded++, idx++) { > + rec = pop_slab_cursor(rb->bmap_cursor); > + libxfs_bmbt_disk_get_all(rec, irec); > + > + block_rec = libxfs_btree_rec_addr(cur, idx, block); > + cur->bc_ops->init_rec_from_cur(cur, block_rec); > + } > + > + return loaded; > +} > + > +/* Feed one of the new btree blocks to the bulk loader. */ > +STATIC int > +xrep_bmap_claim_block( > + struct xfs_btree_cur *cur, > + union xfs_btree_ptr *ptr, > + void *priv) > +{ > + struct xrep_bmap *rb = priv; > + > + return bulkload_claim_block(cur, &rb->new_fork_info, ptr); > +} > + > +/* Figure out how much space we need to create the incore btree root block. */ > +STATIC size_t > +xrep_bmap_iroot_size( > + struct xfs_btree_cur *cur, > + unsigned int level, > + unsigned int nr_this_level, > + void *priv) > +{ > + ASSERT(level > 0); > + > + return XFS_BMAP_BROOT_SPACE_CALC(cur->bc_mp, nr_this_level); > +} > + > +/* Update the inode counters. 
*/ > +STATIC int > +xrep_bmap_reset_counters( > + struct xrep_bmap *rb) > +{ > + struct repair_ctx *sc = rb->sc; > + struct xbtree_ifakeroot *ifake = &rb->new_fork_info.ifake; > + int64_t delta; > + > + /* > + * Update the inode block counts to reflect the extents we found in the > + * rmapbt. > + */ > + delta = ifake->if_blocks - rb->old_bmbt_block_count; > + sc->ip->i_nblocks = rb->nblocks + delta; > + libxfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE); > + > + /* Quotas don't exist so we're done. */ > + return 0; > +} > + > +/* > + * Ensure that the inode being repaired is ready to handle a certain number of > + * extents, or return EFSCORRUPTED. Caller must hold the ILOCK of the inode > + * being repaired and have joined it to the scrub transaction. > + */ > +static int > +xrep_ino_ensure_extent_count( > + struct repair_ctx *sc, > + int whichfork, > + xfs_extnum_t nextents) > +{ > + xfs_extnum_t max_extents; > + bool large_extcount; > + > + large_extcount = xfs_inode_has_large_extent_counts(sc->ip); > + max_extents = xfs_iext_max_nextents(large_extcount, whichfork); > + if (nextents <= max_extents) > + return 0; > + if (large_extcount) > + return EFSCORRUPTED; > + if (!xfs_has_large_extent_counts(sc->mp)) > + return EFSCORRUPTED; > + > + max_extents = xfs_iext_max_nextents(true, whichfork); > + if (nextents > max_extents) > + return EFSCORRUPTED; > + > + sc->ip->i_diflags2 |= XFS_DIFLAG2_NREXT64; > + libxfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE); > + return 0; > +} > + > +/* > + * Create a new iext tree and load it with block mappings. If the inode is > + * in extents format, that's all we need to do to commit the new mappings. > + * If it is in btree format, this takes care of preloading the incore tree. 
> + */ > +STATIC int > +xrep_bmap_extents_load( > + struct xrep_bmap *rb, > + struct xfs_btree_cur *bmap_cur, > + uint64_t nextents) > +{ > + struct xfs_iext_cursor icur; > + struct xbtree_ifakeroot *ifake = &rb->new_fork_info.ifake; > + struct xfs_ifork *ifp = ifake->if_fork; > + unsigned int i; > + int error; > + > + ASSERT(ifp->if_bytes == 0); > + > + error = init_slab_cursor(rb->bmap_records, xrep_bmap_extent_cmp, > + &rb->bmap_cursor); > + if (error) > + return error; > + > + /* Add all the mappings to the incore extent tree. */ > + libxfs_iext_first(ifp, &icur); > + for (i = 0; i < nextents; i++) { > + struct xfs_bmbt_rec *rec; > + > + rec = pop_slab_cursor(rb->bmap_cursor); > + libxfs_bmbt_disk_get_all(rec, &bmap_cur->bc_rec.b); > + libxfs_iext_insert_raw(ifp, &icur, &bmap_cur->bc_rec.b); > + ifp->if_nextents++; > + libxfs_iext_next(ifp, &icur); > + } > + free_slab_cursor(&rb->bmap_cursor); > + > + return xrep_ino_ensure_extent_count(rb->sc, rb->whichfork, > + ifp->if_nextents); > +} > + > +/* > + * Reserve new btree blocks, bulk load the bmap records into the ondisk btree, > + * and load the incore extent tree. > + */ > +STATIC int > +xrep_bmap_btree_load( > + struct xrep_bmap *rb, > + struct xfs_btree_cur *bmap_cur, > + uint64_t nextents) > +{ > + struct repair_ctx *sc = rb->sc; > + int error; > + > + rb->bmap_bload.get_records = xrep_bmap_get_records; > + rb->bmap_bload.claim_block = xrep_bmap_claim_block; > + rb->bmap_bload.iroot_size = xrep_bmap_iroot_size; > + rb->bmap_bload.max_dirty = XFS_B_TO_FSBT(sc->mp, 256U << 10); /* 256K */ > + > + /* > + * Always make the btree as small as possible, since we might need the > + * space to rebuild the space metadata btrees in later phases. > + */ > + rb->bmap_bload.leaf_slack = 0; > + rb->bmap_bload.node_slack = 0; > + > + /* Compute how many blocks we'll need. 
*/ > + error = -libxfs_btree_bload_compute_geometry(bmap_cur, &rb->bmap_bload, > + nextents); > + if (error) > + return error; > + > + /* > + * Guess how many blocks we're going to need to rebuild an entire bmap > + * from the number of extents we found, and pump up our transaction to > + * have sufficient block reservation. > + */ > + error = -libxfs_trans_reserve_more(sc->tp, rb->bmap_bload.nr_blocks, 0); > + if (error) > + return error; > + > + /* Reserve the space we'll need for the new btree. */ > + error = bulkload_alloc_file_blocks(&rb->new_fork_info, > + rb->bmap_bload.nr_blocks); > + if (error) > + return error; > + > + /* Add all observed bmap records. */ > + error = init_slab_cursor(rb->bmap_records, xrep_bmap_extent_cmp, > + &rb->bmap_cursor); > + if (error) > + return error; > + error = -libxfs_btree_bload(bmap_cur, &rb->bmap_bload, rb); > + free_slab_cursor(&rb->bmap_cursor); > + if (error) > + return error; > + > + /* > + * Load the new bmap records into the new incore extent tree to > + * preserve delalloc reservations for regular files. The directory > + * code loads the extent tree during xfs_dir_open and assumes > + * thereafter that it remains loaded, so we must not violate that > + * assumption. > + */ > + return xrep_bmap_extents_load(rb, bmap_cur, nextents); > +} > + > +/* > + * Use the collected bmap information to stage a new bmap fork. If this is > + * successful we'll return with the new fork information logged to the repair > + * transaction but not yet committed. > + */ > +STATIC int > +xrep_bmap_build_new_fork( > + struct xrep_bmap *rb) > +{ > + struct xfs_owner_info oinfo; > + struct repair_ctx *sc = rb->sc; > + struct xfs_btree_cur *bmap_cur; > + struct xbtree_ifakeroot *ifake = &rb->new_fork_info.ifake; > + uint64_t nextents; > + int error; > + > + /* > + * Sort the bmap extents by startblock to avoid btree splits when we > + * rebuild the bmbt btree. 
> + */ > + qsort_slab(rb->bmap_records, xrep_bmap_extent_cmp); > + > + /* > + * Prepare to construct the new fork by initializing the new btree > + * structure and creating a fake ifork in the ifakeroot structure. > + */ > + libxfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, rb->whichfork); > + bulkload_init_inode(&rb->new_fork_info, sc, rb->whichfork, &oinfo); > + bmap_cur = libxfs_bmbt_stage_cursor(sc->mp, sc->ip, ifake); > + > + /* > + * Figure out the size and format of the new fork, then fill it with > + * all the bmap records we've found. Join the inode to the transaction > + * so that we can roll the transaction while holding the inode locked. > + */ > + libxfs_trans_ijoin(sc->tp, sc->ip, 0); > + nextents = slab_count(rb->bmap_records); > + if (nextents <= XFS_IFORK_MAXEXT(sc->ip, rb->whichfork)) { > + ifake->if_fork->if_format = XFS_DINODE_FMT_EXTENTS; > + error = xrep_bmap_extents_load(rb, bmap_cur, nextents); > + } else { > + ifake->if_fork->if_format = XFS_DINODE_FMT_BTREE; > + error = xrep_bmap_btree_load(rb, bmap_cur, nextents); > + } > + if (error) > + goto err_cur; > + > + /* > + * Install the new fork in the inode. After this point the old mapping > + * data are no longer accessible and the new tree is live. We delete > + * the cursor immediately after committing the staged root because the > + * staged fork might be in extents format. > + */ > + libxfs_bmbt_commit_staged_btree(bmap_cur, sc->tp, rb->whichfork); > + libxfs_btree_del_cursor(bmap_cur, 0); > + > + /* Reset the inode counters now that we've changed the fork. */ > + error = xrep_bmap_reset_counters(rb); > + if (error) > + goto err_newbt; > + > + /* Dispose of any unused blocks and the accounting information. 
*/ > + error = bulkload_commit(&rb->new_fork_info); > + if (error) > + return error; > + > + return -libxfs_trans_roll_inode(&sc->tp, sc->ip); > +err_cur: > + if (bmap_cur) > + libxfs_btree_del_cursor(bmap_cur, error); > +err_newbt: > + bulkload_cancel(&rb->new_fork_info); > + return error; > +} > + > +/* Check for garbage inputs. Returns ECANCELED if there's nothing to do. */ > +STATIC int > +xrep_bmap_check_inputs( > + struct repair_ctx *sc, > + int whichfork) > +{ > + struct xfs_ifork *ifp = xfs_ifork_ptr(sc->ip, whichfork); > + > + ASSERT(whichfork == XFS_DATA_FORK || whichfork == XFS_ATTR_FORK); > + > + if (!xfs_has_rmapbt(sc->mp)) > + return EOPNOTSUPP; > + > + /* No fork means nothing to rebuild. */ > + if (!ifp) > + return ECANCELED; > + > + /* > + * We only know how to repair extent mappings, which is to say that we > + * only support extents and btree fork format. Repairs to a local > + * format fork require a higher level repair function, so we do not > + * have any work to do here. > + */ > + switch (ifp->if_format) { > + case XFS_DINODE_FMT_DEV: > + case XFS_DINODE_FMT_LOCAL: > + case XFS_DINODE_FMT_UUID: > + return ECANCELED; > + case XFS_DINODE_FMT_EXTENTS: > + case XFS_DINODE_FMT_BTREE: > + break; > + default: > + return EFSCORRUPTED; > + } > + > + if (whichfork == XFS_ATTR_FORK) > + return 0; > + > + /* Only files, symlinks, and directories get to have data forks. */ > + switch (VFS_I(sc->ip)->i_mode & S_IFMT) { > + case S_IFREG: > + case S_IFDIR: > + case S_IFLNK: > + /* ok */ > + break; > + default: > + return EINVAL; > + } > + > + /* Don't know how to rebuild realtime data forks. */ > + if (XFS_IS_REALTIME_INODE(sc->ip)) > + return EOPNOTSUPP; > + > + return 0; > +} > + > +/* Repair an inode fork. 
*/ > +STATIC int > +xrep_bmap( > + struct repair_ctx *sc, > + int whichfork) > +{ > + struct xrep_bmap *rb; > + int error = 0; > + > + error = xrep_bmap_check_inputs(sc, whichfork); > + if (error == ECANCELED) > + return 0; > + if (error) > + return error; > + > + rb = kmem_zalloc(sizeof(struct xrep_bmap), KM_NOFS | KM_MAYFAIL); > + if (!rb) > + return ENOMEM; > + rb->sc = sc; > + rb->whichfork = whichfork; > + > + /* Set up some storage */ > + error = init_slab(&rb->bmap_records, sizeof(struct xfs_bmbt_rec)); > + if (error) > + goto out_rb; > + > + /* Collect all reverse mappings for this fork's extents. */ > + error = xrep_bmap_find_mappings(rb); > + if (error) > + goto out_bitmap; > + > + /* Rebuild the bmap information. */ > + error = xrep_bmap_build_new_fork(rb); > + > + /* > + * We don't need to free the old bmbt blocks because we're rebuilding > + * all the space metadata later. > + */ > + > +out_bitmap: > + free_slab(&rb->bmap_records); > +out_rb: > + kmem_free(rb); > + return error; > +} > + > +/* Rebuild some inode's bmap. */ > +int > +rebuild_bmap( > + struct xfs_mount *mp, > + xfs_ino_t ino, > + int whichfork, > + unsigned long nr_extents, > + struct xfs_buf **ino_bpp, > + struct xfs_dinode **dinop, > + int *dirty) > +{ > + struct repair_ctx sc = { > + .mp = mp, > + }; > + const struct xfs_buf_ops *bp_ops; > + unsigned long boffset; > + unsigned long long resblks; > + xfs_daddr_t bp_bn; > + int bp_length; > + int error, err2; > + > + bp_bn = xfs_buf_daddr(*ino_bpp); > + bp_length = (*ino_bpp)->b_length; > + bp_ops = (*ino_bpp)->b_ops; > + boffset = (char *)(*dinop) - (char *)(*ino_bpp)->b_addr; > + > + /* > + * Bail out if the inode didn't think it had extents. Otherwise, zap > + * it back to a zero-extents fork so that we can rebuild it. 
> +	 */
> +	switch (whichfork) {
> +	case XFS_DATA_FORK:
> +		if ((*dinop)->di_nextents == 0)
> +			return 0;
> +		(*dinop)->di_format = XFS_DINODE_FMT_EXTENTS;
> +		(*dinop)->di_nextents = 0;
> +		libxfs_dinode_calc_crc(mp, *dinop);
> +		*dirty = 1;
> +		break;
> +	case XFS_ATTR_FORK:
> +		if ((*dinop)->di_anextents == 0)
> +			return 0;
> +		(*dinop)->di_aformat = XFS_DINODE_FMT_EXTENTS;
> +		(*dinop)->di_anextents = 0;
> +		libxfs_dinode_calc_crc(mp, *dinop);
> +		*dirty = 1;
> +		break;
> +	default:
> +		return EINVAL;
> +	}
> +
> +	resblks = libxfs_bmbt_calc_size(mp, nr_extents);
> +	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks, 0,
> +			0, &sc.tp);
> +	if (error)
> +		return error;
> +
> +	/*
> +	 * Repair magic: the caller passed us the inode cluster buffer for the
> +	 * inode. The _iget call grabs the buffer to load the incore inode, so
> +	 * the buffer must be attached to the transaction to avoid recursing
> +	 * the buffer lock.
> +	 *
> +	 * Unfortunately, the _iget call drops the buffer once the inode is
> +	 * loaded, so if we've made any changes we have to log the buffer, hold
> +	 * it, and roll the transaction. This persists the caller's changes
> +	 * and maintains our ownership of the cluster buffer.
> +	 */
> +	libxfs_trans_bjoin(sc.tp, *ino_bpp);
> +	if (*dirty) {
> +		unsigned int end = BBTOB((*ino_bpp)->b_length) - 1;
> +
> +		libxfs_trans_log_buf(sc.tp, *ino_bpp, 0, end);
> +		*dirty = 0;
> +
> +		libxfs_trans_bhold(sc.tp, *ino_bpp);
> +		error = -libxfs_trans_roll(&sc.tp);
> +		libxfs_trans_bjoin(sc.tp, *ino_bpp);
> +		if (error)
> +			goto out_cancel;
> +	}
> +
> +	/* Grab the inode and fix the bmbt. */
> +	error = -libxfs_iget(mp, sc.tp, ino, 0, &sc.ip);
> +	if (error)
> +		goto out_cancel;
> +	error = xrep_bmap(&sc, whichfork);
> +	if (error)
> +		libxfs_trans_cancel(sc.tp);
> +	else
> +		error = -libxfs_trans_commit(sc.tp);
> +
> +	/*
> +	 * Rebuilding the inode fork rolled the transaction, so we need to
> +	 * re-grab the inode cluster buffer and dinode pointer for the caller.
> +	 */
> +	err2 = -libxfs_imap_to_bp(mp, NULL, &sc.ip->i_imap, ino_bpp);
> +	if (err2)
> +		do_error(
> + _("Unable to re-grab inode cluster buffer after failed repair of inode %llu, error %d.\n"),
> +				(unsigned long long)ino, err2);
> +	*dinop = xfs_buf_offset(*ino_bpp, sc.ip->i_imap.im_boffset);
> +	libxfs_irele(sc.ip);
> +
> +	return error;
> +
> +out_cancel:
> +	libxfs_trans_cancel(sc.tp);
> +
> +	/*
> +	 * Try to regrab the old buffer so we have something to return to the
> +	 * caller.
> +	 */
> +	err2 = -libxfs_trans_read_buf(mp, NULL, mp->m_ddev_targp, bp_bn,
> +			bp_length, 0, ino_bpp, bp_ops);
> +	if (err2)
> +		do_error(
> + _("Unable to re-grab inode cluster buffer after failed repair of inode %llu, error %d.\n"),
> +				(unsigned long long)ino, err2);
> +	*dinop = xfs_buf_offset(*ino_bpp, boffset);
> +	return error;
> +}
> diff --git a/repair/bmap_repair.h b/repair/bmap_repair.h
> new file mode 100644
> index 000000000000..6d55359490a0
> --- /dev/null
> +++ b/repair/bmap_repair.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (c) 2019-2024 Oracle. All Rights Reserved.
> + * Author: Darrick J. Wong
> + */
> +#ifndef REBUILD_H_
> +#define REBUILD_H_
> +
> +int rebuild_bmap(struct xfs_mount *mp, xfs_ino_t ino, int whichfork,
> +		unsigned long nr_extents, struct xfs_buf **ino_bpp,
> +		struct xfs_dinode **dinop, int *dirty);
> +
> +#endif /* REBUILD_H_ */
> diff --git a/repair/bulkload.c b/repair/bulkload.c
> index 18158c397f56..a97839f549dd 100644
> --- a/repair/bulkload.c
> +++ b/repair/bulkload.c
> @@ -14,14 +14,29 @@ void
>  bulkload_init_ag(
>  	struct bulkload *bkl,
>  	struct repair_ctx *sc,
> -	const struct xfs_owner_info *oinfo)
> +	const struct xfs_owner_info *oinfo,
> +	xfs_fsblock_t alloc_hint)
>  {
>  	memset(bkl, 0, sizeof(struct bulkload));
>  	bkl->sc = sc;
>  	bkl->oinfo = *oinfo; /* structure copy */
> +	bkl->alloc_hint = alloc_hint;
>  	INIT_LIST_HEAD(&bkl->resv_list);
>  }
>
> +/* Initialize accounting resources for staging a new inode fork btree. */
> +void
> +bulkload_init_inode(
> +	struct bulkload *bkl,
> +	struct repair_ctx *sc,
> +	int whichfork,
> +	const struct xfs_owner_info *oinfo)
> +{
> +	bulkload_init_ag(bkl, sc, oinfo, XFS_INO_TO_FSB(sc->mp, sc->ip->i_ino));
> +	bkl->ifake.if_fork = kmem_cache_zalloc(xfs_ifork_cache, 0);
> +	bkl->ifake.if_fork_size = xfs_inode_fork_size(sc->ip, whichfork);
> +}
> +
>  /* Designate specific blocks to be used to build our new btree. */
>  static int
>  bulkload_add_blocks(
> @@ -71,17 +86,199 @@ bulkload_add_extent(
>  	return bulkload_add_blocks(bkl, pag, &args);
>  }
>
> +/* Don't let our allocation hint take us beyond EOFS */
> +static inline void
> +bulkload_validate_file_alloc_hint(
> +	struct bulkload *bkl)
> +{
> +	struct repair_ctx *sc = bkl->sc;
> +
> +	if (libxfs_verify_fsbno(sc->mp, bkl->alloc_hint))
> +		return;
> +
> +	bkl->alloc_hint = XFS_AGB_TO_FSB(sc->mp, 0, XFS_AGFL_BLOCK(sc->mp) + 1);
> +}
> +
> +/* Allocate disk space for our new file-based btree. */
> +int
> +bulkload_alloc_file_blocks(
> +	struct bulkload *bkl,
> +	uint64_t nr_blocks)
> +{
> +	struct repair_ctx *sc = bkl->sc;
> +	struct xfs_mount *mp = sc->mp;
> +	int error = 0;
> +
> +	while (nr_blocks > 0) {
> +		struct xfs_alloc_arg args = {
> +			.tp = sc->tp,
> +			.mp = mp,
> +			.oinfo = bkl->oinfo,
> +			.minlen = 1,
> +			.maxlen = nr_blocks,
> +			.prod = 1,
> +			.resv = XFS_AG_RESV_NONE,
> +		};
> +		struct xfs_perag *pag;
> +		xfs_agnumber_t agno;
> +
> +		bulkload_validate_file_alloc_hint(bkl);
> +
> +		error = -libxfs_alloc_vextent_start_ag(&args, bkl->alloc_hint);
> +		if (error)
> +			return error;
> +		if (args.fsbno == NULLFSBLOCK)
> +			return ENOSPC;
> +
> +		agno = XFS_FSB_TO_AGNO(mp, args.fsbno);
> +
> +		pag = libxfs_perag_get(mp, agno);
> +		if (!pag) {
> +			ASSERT(0);
> +			return -EFSCORRUPTED;
> +		}
> +
> +		error = bulkload_add_blocks(bkl, pag, &args);
> +		libxfs_perag_put(pag);
> +		if (error)
> +			return error;
> +
> +		nr_blocks -= args.len;
> +		bkl->alloc_hint = args.fsbno + args.len;
> +
> +		error = -libxfs_defer_finish(&sc->tp);
> +		if (error)
> +			return error;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Free the unused part of a space extent that was reserved for a new ondisk
> + * structure. Returns the number of EFIs logged or a negative errno.
> + */
> +static inline int
> +bulkload_free_extent(
> +	struct bulkload *bkl,
> +	struct bulkload_resv *resv,
> +	bool btree_committed)
> +{
> +	struct repair_ctx *sc = bkl->sc;
> +	xfs_agblock_t free_agbno = resv->agbno;
> +	xfs_extlen_t free_aglen = resv->len;
> +	xfs_fsblock_t fsbno;
> +	int error;
> +
> +	if (!btree_committed || resv->used == 0) {
> +		/*
> +		 * If we're not committing a new btree or we didn't use the
> +		 * space reservation, free the entire space extent.
> +		 */
> +		goto free;
> +	}
> +
> +	/*
> +	 * We used space and committed the btree. Remove the written blocks
> +	 * from the reservation and possibly log a new EFI to free any unused
> +	 * reservation space.
> +	 */
> +	free_agbno += resv->used;
> +	free_aglen -= resv->used;
> +
> +	if (free_aglen == 0)
> +		return 0;
> +
> +free:
> +	/*
> +	 * Use EFIs to free the reservations. We don't need to use EFIs here
> +	 * like the kernel, but we'll do it to keep the code matched.
> +	 */
> +	fsbno = XFS_AGB_TO_FSB(sc->mp, resv->pag->pag_agno, free_agbno);
> +	error = -libxfs_free_extent_later(sc->tp, fsbno, free_aglen,
> +			&bkl->oinfo, XFS_AG_RESV_NONE, true);
> +	if (error)
> +		return error;
> +
> +	return 1;
> +}
> +
>  /* Free all the accounting info and disk space we reserved for a new btree. */
> -void
> -bulkload_commit(
> -	struct bulkload *bkl)
> +static int
> +bulkload_free(
> +	struct bulkload *bkl,
> +	bool btree_committed)
>  {
> +	struct repair_ctx *sc = bkl->sc;
>  	struct bulkload_resv *resv, *n;
> +	unsigned int freed = 0;
> +	int error = 0;
>
>  	list_for_each_entry_safe(resv, n, &bkl->resv_list, list) {
> +		int ret;
> +
> +		ret = bulkload_free_extent(bkl, resv, btree_committed);
>  		list_del(&resv->list);
> +		libxfs_perag_put(resv->pag);
>  		kfree(resv);
> +
> +		if (ret < 0) {
> +			error = ret;
> +			goto junkit;
> +		}
> +
> +		freed += ret;
> +		if (freed >= XREP_MAX_ITRUNCATE_EFIS) {
> +			error = -libxfs_defer_finish(&sc->tp);
> +			if (error)
> +				goto junkit;
> +			freed = 0;
> +		}
>  	}
> +
> +	if (freed)
> +		error = -libxfs_defer_finish(&sc->tp);
> +junkit:
> +	/*
> +	 * If we still have reservations attached to @newbt, cleanup must have
> +	 * failed and the filesystem is about to go down. Clean up the incore
> +	 * reservations.
> +	 */
> +	list_for_each_entry_safe(resv, n, &bkl->resv_list, list) {
> +		list_del(&resv->list);
> +		libxfs_perag_put(resv->pag);
> +		kfree(resv);
> +	}
> +
> +	if (sc->ip) {
> +		kmem_cache_free(xfs_ifork_cache, bkl->ifake.if_fork);
> +		bkl->ifake.if_fork = NULL;
> +	}
> +
> +	return error;
> +}
> +
> +/*
> + * Free all the accounting info and unused disk space allocations after
> + * committing a new btree.
> + */
> +int
> +bulkload_commit(
> +	struct bulkload *bkl)
> +{
> +	return bulkload_free(bkl, true);
> +}
> +
> +/*
> + * Free all the accounting info and all of the disk space we reserved for a new
> + * btree that we're not going to commit. We want to try to roll things back
> + * cleanly for things like ENOSPC midway through allocation.
> + */
> +void
> +bulkload_cancel(
> +	struct bulkload *bkl)
> +{
> +	bulkload_free(bkl, false);
>  }
>
>  /* Feed one of the reserved btree blocks to the bulk loader. */
> diff --git a/repair/bulkload.h b/repair/bulkload.h
> index f4790e3b3de6..a88aafaa678a 100644
> --- a/repair/bulkload.h
> +++ b/repair/bulkload.h
> @@ -8,9 +8,17 @@
>
>  extern int bload_leaf_slack;
>  extern int bload_node_slack;
> +/*
> + * This is the maximum number of deferred extent freeing item extents (EFIs)
> + * that we'll attach to a transaction without rolling the transaction to avoid
> + * overrunning a tr_itruncate reservation.
> + */
> +#define XREP_MAX_ITRUNCATE_EFIS (128)
>
>  struct repair_ctx {
>  	struct xfs_mount *mp;
> +	struct xfs_inode *ip;
> +	struct xfs_trans *tp;
>  };
>
>  struct bulkload_resv {
> @@ -36,7 +44,10 @@ struct bulkload {
>  	struct list_head resv_list;
>
>  	/* Fake root for new btree. */
> -	struct xbtree_afakeroot afake;
> +	union {
> +		struct xbtree_afakeroot afake;
> +		struct xbtree_ifakeroot ifake;
> +	};
>
>  	/* rmap owner of these blocks */
>  	struct xfs_owner_info oinfo;
> @@ -44,6 +55,9 @@ struct bulkload {
>  	/* The last reservation we allocated from. */
>  	struct bulkload_resv *last_resv;
>
> +	/* Hint as to where we should allocate blocks. */
> +	xfs_fsblock_t alloc_hint;
> +
>  	/* Number of blocks reserved via resv_list. */
>  	unsigned int nr_reserved;
>  };
> @@ -52,12 +66,16 @@ struct bulkload {
>  	list_for_each_entry_safe((resv), (n), &(bkl)->resv_list, list)
>
>  void bulkload_init_ag(struct bulkload *bkl, struct repair_ctx *sc,
> -		const struct xfs_owner_info *oinfo);
> +		const struct xfs_owner_info *oinfo, xfs_fsblock_t alloc_hint);
> +void bulkload_init_inode(struct bulkload *bkl, struct repair_ctx *sc,
> +		int whichfork, const struct xfs_owner_info *oinfo);
>  int bulkload_claim_block(struct xfs_btree_cur *cur, struct bulkload *bkl,
>  		union xfs_btree_ptr *ptr);
>  int bulkload_add_extent(struct bulkload *bkl, struct xfs_perag *pag,
>  		xfs_agblock_t agbno, xfs_extlen_t len);
> -void bulkload_commit(struct bulkload *bkl);
> +int bulkload_alloc_file_blocks(struct bulkload *bkl, uint64_t nr_blocks);
> +void bulkload_cancel(struct bulkload *bkl);
> +int bulkload_commit(struct bulkload *bkl);
>  void bulkload_estimate_ag_slack(struct repair_ctx *sc,
>  		struct xfs_btree_bload *bload, unsigned int free);
>
> diff --git a/repair/dinode.c b/repair/dinode.c
> index a18af3ff7772..b8f5bf4e550e 100644
> --- a/repair/dinode.c
> +++ b/repair/dinode.c
> @@ -20,6 +20,7 @@
>  #include "threads.h"
>  #include "slab.h"
>  #include "rmap.h"
> +#include "bmap_repair.h"
>
>  /*
>   * gettext lookups for translations of strings use mutexes internally to
> @@ -1909,7 +1910,9 @@ process_inode_data_fork(
>  	xfs_ino_t lino = XFS_AGINO_TO_INO(mp, agno, ino);
>  	int err = 0;
>  	xfs_extnum_t nex, max_nex;
> +	int try_rebuild = -1; /* don't know yet */
>
> +retry:
>  	/*
>  	 * extent count on disk is only valid for positive values. The kernel
>  	 * uses negative values in memory. hence if we see negative numbers
> @@ -1938,11 +1941,15 @@ process_inode_data_fork(
>  		*totblocks = 0;
>  		break;
>  	case XFS_DINODE_FMT_EXTENTS:
> +		if (!rmapbt_suspect && try_rebuild == -1)
> +			try_rebuild = 1;
>  		err = process_exinode(mp, agno, ino, dino, type, dirty,
>  			totblocks, nextents, dblkmap, XFS_DATA_FORK,
>  			check_dups);
>  		break;
>  	case XFS_DINODE_FMT_BTREE:
> +		if (!rmapbt_suspect && try_rebuild == -1)
> +			try_rebuild = 1;
>  		err = process_btinode(mp, agno, ino, dino, type, dirty,
>  			totblocks, nextents, dblkmap, XFS_DATA_FORK,
>  			check_dups);
> @@ -1958,8 +1965,28 @@ process_inode_data_fork(
>  	if (err) {
>  		do_warn(_("bad data fork in inode %" PRIu64 "\n"), lino);
>  		if (!no_modify) {
> +			if (try_rebuild == 1) {
> +				do_warn(
> +_("rebuilding inode %"PRIu64" data fork\n"),
> +					lino);
> +				try_rebuild = 0;
> +				err = rebuild_bmap(mp, lino, XFS_DATA_FORK,
> +						be32_to_cpu(dino->di_nextents),
> +						ino_bpp, dinop, dirty);
> +				dino = *dinop;
> +				if (!err)
> +					goto retry;
> +				do_warn(
> +_("inode %"PRIu64" data fork rebuild failed, error %d, clearing\n"),
> +					lino, err);
> +			}
>  			clear_dinode(mp, dino, lino);
>  			*dirty += 1;
> +			ASSERT(*dirty > 0);
> +		} else if (try_rebuild == 1) {
> +			do_warn(
> +_("would have tried to rebuild inode %"PRIu64" data fork\n"),
> +				lino);
>  		}
>  		return 1;
>  	}
> @@ -2025,7 +2052,9 @@ process_inode_attr_fork(
>  	struct blkmap *ablkmap = NULL;
>  	int repair = 0;
>  	int err;
> +	int try_rebuild = -1; /* don't know yet */
>
> +retry:
>  	if (!dino->di_forkoff) {
>  		*anextents = 0;
>  		if (dino->di_aformat != XFS_DINODE_FMT_EXTENTS) {
> @@ -2052,6 +2081,8 @@ process_inode_attr_fork(
>  		err = process_lclinode(mp, agno, ino, dino, XFS_ATTR_FORK);
>  		break;
>  	case XFS_DINODE_FMT_EXTENTS:
> +		if (!rmapbt_suspect && try_rebuild == -1)
> +			try_rebuild = 1;
>  		ablkmap = blkmap_alloc(*anextents, XFS_ATTR_FORK);
>  		*anextents = 0;
>  		err = process_exinode(mp, agno, ino, dino, type, dirty,
> @@ -2059,6 +2090,8 @@ process_inode_attr_fork(
>  				XFS_ATTR_FORK, check_dups);
>  		break;
>  	case XFS_DINODE_FMT_BTREE:
> +		if (!rmapbt_suspect && try_rebuild == -1)
> +			try_rebuild = 1;
>  		ablkmap = blkmap_alloc(*anextents, XFS_ATTR_FORK);
>  		*anextents = 0;
>  		err = process_btinode(mp, agno, ino, dino, type, dirty,
> @@ -2084,10 +2117,29 @@ process_inode_attr_fork(
>  		do_warn(_("bad attribute fork in inode %" PRIu64 "\n"), lino);
>
>  		if (!no_modify) {
> +			if (try_rebuild == 1) {
> +				do_warn(
> +_("rebuilding inode %"PRIu64" attr fork\n"),
> +					lino);
> +				try_rebuild = 0;
> +				err = rebuild_bmap(mp, lino, XFS_ATTR_FORK,
> +						be16_to_cpu(dino->di_anextents),
> +						ino_bpp, dinop, dirty);
> +				dino = *dinop;
> +				if (!err)
> +					goto retry;
> +				do_warn(
> +_("inode %"PRIu64" attr fork rebuild failed, error %d"),
> +					lino, err);
> +			}
>  			do_warn(_(", clearing attr fork\n"));
>  			*dirty += clear_dinode_attr(mp, dino, lino);
>  			ASSERT(*dirty > 0);
> -		} else {
> +		} else if (try_rebuild) {
> +			do_warn(
> +_("would have tried to rebuild inode %"PRIu64" attr fork or cleared it\n"),
> +				lino);
> +		} else {
>  			do_warn(_(", would clear attr fork\n"));
>  		}
>
> diff --git a/repair/rmap.c b/repair/rmap.c
> index 6bb77e082492..a2291c7b3b01 100644
> --- a/repair/rmap.c
> +++ b/repair/rmap.c
> @@ -33,7 +33,7 @@ struct xfs_ag_rmap {
>  };
>
>  static struct xfs_ag_rmap *ag_rmaps;
> -static bool rmapbt_suspect;
> +bool rmapbt_suspect;
>  static bool refcbt_suspect;
>
>  static inline int rmap_compare(const void *a, const void *b)
> diff --git a/repair/rmap.h b/repair/rmap.h
> index 6004e9f68b63..1dad2f5890a4 100644
> --- a/repair/rmap.h
> +++ b/repair/rmap.h
> @@ -7,6 +7,7 @@
>  #define RMAP_H_
>
>  extern bool collect_rmaps;
> +extern bool rmapbt_suspect;
>
>  extern bool rmap_needs_work(struct xfs_mount *);
>
>
>
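
For readers following the bulkload_free() hunk quoted above: the loop
counts how many EFIs it has queued and finishes the deferred work each
time the count reaches XREP_MAX_ITRUNCATE_EFIS, then once more for any
remainder. Below is a minimal standalone sketch of that batching pattern
only. struct fake_tx, fake_defer_finish() and free_all() are hypothetical
stand-ins invented for illustration; they are not xfsprogs or libxfs
interfaces, and no real transaction or disk space is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Stands in for XREP_MAX_ITRUNCATE_EFIS from the quoted patch. */
#define MAX_BATCH 128

/* Hypothetical counter standing in for a transaction; not an xfsprogs type. */
struct fake_tx {
	unsigned int pending;	/* deferred items queued but not yet finished */
	unsigned int rolls;	/* how many times deferred work was finished */
};

/* Hypothetical stand-in for libxfs_defer_finish(): finish everything queued. */
static int fake_defer_finish(struct fake_tx *tx)
{
	tx->pending = 0;
	tx->rolls++;
	return 0;
}

/*
 * Walk nr reservations. logged[i] is 1 if freeing reservation i queued a
 * deferred item and 0 if there was nothing left to free. Finish deferred
 * work whenever MAX_BATCH items have accumulated, and once more at the end
 * if anything is still queued. Returns 0 or the first error seen.
 */
static int free_all(struct fake_tx *tx, const int *logged, size_t nr)
{
	unsigned int freed = 0;
	size_t i;
	int error;

	for (i = 0; i < nr; i++) {
		assert(logged[i] == 0 || logged[i] == 1);

		tx->pending += logged[i];
		freed += logged[i];
		if (freed >= MAX_BATCH) {
			error = fake_defer_finish(tx);
			if (error)
				return error;
			freed = 0;
		}
	}

	if (freed)
		return fake_defer_finish(tx);
	return 0;
}
```

With 300 queued items and a batch limit of 128, the sketch finishes
deferred work three times: after item 128, after item 256, and once for
the last 44.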