From: Jared Van Bortel <jared.e.vb@gmail.com>
To: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: system drive corruption, btrfs check failure
Date: Fri, 29 Mar 2024 13:30:31 -0400
Message-ID: <CALsQ4_x-5+W_7NQR68nTiCM9aptigGf6+HD=jLftrxgXTOLyRA@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2330 bytes --]
Hi,
Yesterday I ran `pacman -Syu` to update my Arch Linux installation. I
saw a lot of complaints from ldconfig, and programs started crashing.
Thinking it was related to having only 7GiB of free space available, I
tried deleting some large files and reinstalling the affected
packages. I saw no clear improvement from this, and eventually decided
to shut my computer down.
I booted memtest, and it completed a full pass without errors. I then
booted a live USB and ran `btrfs check --readonly /dev/nvme0n1p2`,
which reported a long list of errors. I now suspect the filesystem is
beyond repair.
Basic information (RAID1 metadata, single data):
```
Label: 'System' uuid: 76721faa-8c32-4e70-8a9e-859dece0aec1
Total devices 2 FS bytes used 2.18TiB
devid 1 size 422.63GiB used 422.63GiB path /dev/nvme0n1p2
devid 2 size 1.82TiB used 1.82TiB path /dev/nvme1n1
```
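For reference, a minimal sketch of what those numbers imply. It assumes
the per-device "used" column reports space already allocated to chunks
(which is how `btrfs filesystem show` defines it) and uses the rounded
values printed above, so the result is approximate. The variable names
are only illustrative.

```python
# Rounded values copied from the `btrfs filesystem show` output above.
GIB = 2**30
TIB = 2**40

devices = {
    "/dev/nvme0n1p2": {"size": 422.63 * GIB, "used": 422.63 * GIB},
    "/dev/nvme1n1": {"size": 1.82 * TIB, "used": 1.82 * TIB},
}

# Raw capacity across both devices, and raw space not yet allocated to chunks.
total_size = sum(d["size"] for d in devices.values())
unallocated = sum(d["size"] - d["used"] for d in devices.values())

print(f"raw capacity: {total_size / TIB:.2f} TiB")
print(f"unallocated:  {unallocated / GIB:.2f} GiB")
```

In other words, both devices appear fully allocated, so any remaining
free space lives inside existing chunks.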
The installed kernel is linux-zen 6.6.10 with a few patches. The live
USB I'm using has the Arch Linux 6.4.7-arch1-1 kernel. The full
`btrfs check` log and the smartctl output for both drives are attached.
There are three main errors. One:
```
ref mismatch on [1248293634048 16384] extent item 1, found 0
tree extent[1248293634048, 16384] parent 2368656916480 has no tree block found
incorrect global backref count on 1248293634048 found 1 wanted 0
backpointer mismatch on [1248293634048 16384]
owner ref check failed [1248293634048 16384]
```
Two:
```
ref mismatch on [1261902016512 4096] extent item 2, found 1
data extent[1261902016512, 4096] bytenr mimsmatch, extent item bytenr
1261902016512 file item bytenr 0
data extent[1261902016512, 4096] referencer count mismatch (parent
2369673248768) wanted 1 have 0
backpointer mismatch on [1261902016512 4096]
```
Three:
```
block group 1342751899648 has wrong amount of free space, free space
cache has 34193408 block group has 42893312
failed to load free space cache for block group 1342751899648
```
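As a quick check on the third error, the gap between the two free-space
figures can be computed directly. This is only a sketch of the
arithmetic using the two byte counts from the output above. The
4096-byte sector size is an assumption, taken from the 4096-byte data
extent in the second error, and the variable names are illustrative.

```python
# Byte counts copied from the `btrfs check` output above.
cache_free = 34193408        # free space according to the free space cache
block_group_free = 42893312  # free space according to the block group

# How much free space the cache fails to account for.
diff = block_group_free - cache_free

SECTOR_SIZE = 4096  # assumption: 4 KiB sectors

print(diff)                 # 8699904 bytes
print(diff / 2**20)         # roughly 8.3 MiB
print(diff // SECTOR_SIZE)  # 2124 sectors
print(diff % SECTOR_SIZE)   # 0, so the gap is a whole number of sectors
```

So the cache under-reports free space in that block group by about
8.3 MiB.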
And this warning:
```
[4/7] checking fs roots
warning line 3916
```
I bought some replacement disks that I can install alongside the old
ones, but I don't have a recent backup of the full FS. The filesystem
still seems to mount read-only without issue.
What's the best way to recover the data that's left? And is there any
clue here as to what went wrong? I'm really not sure. If this is a
drive failure, it seems premature.
Thanks,
Jared
[-- Attachment #2: smart_nvme0n1.txt --]
[-- Type: text/plain, Size: 3425 bytes --]
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.4.7-arch1-1] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: SAMSUNG MZ1WV480HCGL-000MV
Serial Number: S1Y0NY0HC02614
Firmware Version: BXU87M9Q
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Controller ID: 5197
NVMe Version: <1.2
Number of Namespaces: 1
Namespace 1 Size/Capacity: 480,103,981,056 [480 GB]
Namespace 1 Utilization: 470,449,700,864 [470 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 000000 0000002538
Local Time is: Fri Mar 29 17:08:41 2024 UTC
Firmware Updates (0x07): 3 Slots, Slot 1 R/O
Optional Admin Commands (0x0006): Format Frmw_DL
Optional NVM Commands (0x0014): DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x01): S/H_per_NS
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 8.00W - - 0 0 0 0 30 30
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 33 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 1%
Data Units Read: 93,578,386 [47.9 TB]
Data Units Written: 190,894,684 [97.7 TB]
Host Read Commands: 946,743,056
Host Write Commands: 2,347,043,730
Controller Busy Time: 227,928
Power Cycles: 477
Power On Hours: 33,750
Unsafe Shutdowns: 109
Media and Data Integrity Errors: 10
Error Information Log Entries: 620
Warning Comp. Temperature Time: 87
Critical Comp. Temperature Time: 49
Error Information (NVMe Log 0x01, 16 of 64 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 620 0 0x001d 0x4004 0x000 0 0 -
1 619 0 0x001c 0x4212 0x028 0 255 -
2 618 0 0x0002 0x4004 0x000 0 0 -
3 617 0 0x001d 0x4004 0x000 0 0 -
4 616 0 0x001c 0x4212 0x028 0 255 -
5 615 0 0x0002 0x4004 0x000 0 0 -
6 614 0 0x001d 0x4004 0x000 0 0 -
7 613 0 0x001c 0x4212 0x028 0 255 -
8 612 0 0x0002 0x4004 0x000 0 0 -
9 611 0 0x001d 0x4004 0x000 0 0 -
10 610 0 0x001c 0x4212 0x028 0 255 -
11 609 0 0x0002 0x4004 0x000 0 0 -
12 608 0 0x001d 0x4004 0x000 0 0 -
13 607 0 0x001c 0x4212 0x028 0 255 -
14 606 0 0x0002 0x4004 0x000 0 0 -
15 605 0 0xa01a 0x4004 0x000 0 1 -
... (48 entries not read)
[-- Attachment #3: check.log.gz --]
[-- Type: application/x-gzip, Size: 66526 bytes --]
[-- Attachment #4: smart_nvme1n1.txt --]
[-- Type: text/plain, Size: 2850 bytes --]
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.4.7-arch1-1] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: PCIe SSD
Serial Number: 21051220002754
Firmware Version: ECFM53.0
PCI Vendor/Subsystem ID: 0x1987
IEEE OUI Identifier: 0x6479a7
Total NVM Capacity: 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,000,398,934,016 [2.00 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 6479a7 4d10203702
Local Time is: Fri Mar 29 17:09:07 2024 UTC
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d): Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x08): Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 75 Celsius
Critical Comp. Temp. Threshold: 80 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 7.13W - - 0 0 0 0 0 0
1 + 5.29W - - 1 1 1 1 0 0
2 + 4.36W - - 2 2 2 2 0 0
3 - 0.0490W - - 3 3 3 3 2000 2000
4 - 0.0018W - - 4 4 4 4 25000 25000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 2
1 - 4096 0 1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 29 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 20%
Data Units Read: 155,463,764 [79.5 TB]
Data Units Written: 246,041,613 [125 TB]
Host Read Commands: 936,230,375
Host Write Commands: 1,683,141,962
Controller Busy Time: 5,320
Power Cycles: 338
Power On Hours: 20,034
Unsafe Shutdowns: 77
Media and Data Integrity Errors: 0
Error Information Log Entries: 661
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, 16 of 63 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 661 0 0x5003 0x4004 0x028 0 0 -