Linux-BTRFS Archive mirror
From: Jared Van Bortel <jared.e.vb@gmail.com>
To: Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: system drive corruption, btrfs check failure
Date: Fri, 29 Mar 2024 13:30:31 -0400
Message-ID: <CALsQ4_x-5+W_7NQR68nTiCM9aptigGf6+HD=jLftrxgXTOLyRA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2330 bytes --]

Hi,

Yesterday I ran `pacman -Syu` to update my Arch Linux installation. I
saw a lot of complaints from ldconfig, and programs started crashing.
Thinking it was related to having only 7GiB of free space available, I
tried deleting some large files and reinstalling the affected
packages. I saw no clear improvement from this, and eventually decided
to shut my computer down.

I booted memtest, and it completed a full pass without errors. I then
booted a live USB and ran `btrfs check --readonly /dev/nvme0n1p2`. It
reported a long list of errors, which makes me think the filesystem is
most likely beyond repair.

Basic information (RAID1 metadata, single data):
```
Label: 'System'  uuid: 76721faa-8c32-4e70-8a9e-859dece0aec1
Total devices 2 FS bytes used 2.18TiB
devid    1 size 422.63GiB used 422.63GiB path /dev/nvme0n1p2
devid    2 size 1.82TiB used 1.82TiB path /dev/nvme1n1
```
The installed kernel is linux-zen 6.6.10 with a few patches. The live
USB I'm using has the Arch Linux 6.4.7-arch1-1 kernel. The full
`btrfs check` log and the smartctl output are attached.

The errors fall into three main groups. One:
```
ref mismatch on [1248293634048 16384] extent item 1, found 0
tree extent[1248293634048, 16384] parent 2368656916480 has no tree block found
incorrect global backref count on 1248293634048 found 1 wanted 0
backpointer mismatch on [1248293634048 16384]
owner ref check failed [1248293634048 16384]
```

Two:
```
ref mismatch on [1261902016512 4096] extent item 2, found 1
data extent[1261902016512, 4096] bytenr mimsmatch, extent item bytenr 1261902016512 file item bytenr 0
data extent[1261902016512, 4096] referencer count mismatch (parent 2369673248768) wanted 1 have 0
backpointer mismatch on [1261902016512 4096]
```

Three:
```
block group 1342751899648 has wrong amount of free space, free space cache has 34193408 block group has 42893312
failed to load free space cache for block group 1342751899648
```
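For scale, the two free-space figures in the third error can be compared directly. This is a quick sketch of that arithmetic; the 4 KiB sector size is an assumption inferred from the 4096-byte data extents above, not something the log itself states.

```python
# Free-space figures copied from the "wrong amount of free space" error above.
CACHE_FREE = 34193408        # bytes recorded in the on-disk free space cache
BLOCK_GROUP_FREE = 42893312  # bytes btrfs check computed for block group 1342751899648
SECTOR = 4096                # assumed sector size (matches the 4096-byte data extents)

diff = BLOCK_GROUP_FREE - CACHE_FREE
print(f"difference: {diff} bytes = {diff / 2**20:.2f} MiB = {diff // SECTOR} sectors")
```

So the cache disagrees with the computed value by roughly 8.3 MiB, a whole number of 4 KiB sectors.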

And this warning:
```
[4/7] checking fs roots
warning line 3916
```

I bought some replacement disks that I can install alongside the old
ones, but I don't have a recent backup of the full filesystem. It still
seems to mount read-only without issue.

What's the best way to recover the data that's left? And is there any
clue here as to what went wrong? I'm really not sure; if this is a
drive failure, it seems premature.

Thanks,
Jared

[-- Attachment #2: smart_nvme0n1.txt --]
[-- Type: text/plain, Size: 3425 bytes --]

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.4.7-arch1-1] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SAMSUNG MZ1WV480HCGL-000MV
Serial Number:                      S1Y0NY0HC02614
Firmware Version:                   BXU87M9Q
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Controller ID:                      5197
NVMe Version:                       <1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          480,103,981,056 [480 GB]
Namespace 1 Utilization:            470,449,700,864 [470 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            000000 0000002538
Local Time is:                      Fri Mar 29 17:08:41 2024 UTC
Firmware Updates (0x07):            3 Slots, Slot 1 R/O
Optional Admin Commands (0x0006):   Format Frmw_DL
Optional NVM Commands (0x0014):     DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x01):         S/H_per_NS

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.00W       -        -    0  0  0  0       30      30

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        33 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    1%
Data Units Read:                    93,578,386 [47.9 TB]
Data Units Written:                 190,894,684 [97.7 TB]
Host Read Commands:                 946,743,056
Host Write Commands:                2,347,043,730
Controller Busy Time:               227,928
Power Cycles:                       477
Power On Hours:                     33,750
Unsafe Shutdowns:                   109
Media and Data Integrity Errors:    10
Error Information Log Entries:      620
Warning  Comp. Temperature Time:    87
Critical Comp. Temperature Time:    49
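
NVMe reports the Data Units Read/Written counters in thousands of 512-byte units (one unit = 512,000 bytes). That is how smartctl arrives at the bracketed TB figures. A minimal check of that conversion, using the nvme0n1 counters above (the helper name is just for illustration):

```python
# NVMe "Data Units" counters are reported in units of 1000 * 512 bytes.
BYTES_PER_DATA_UNIT = 1000 * 512

def data_units_to_tb(units: int) -> float:
    """Convert an NVMe Data Units counter to decimal terabytes (10**12 bytes)."""
    return units * BYTES_PER_DATA_UNIT / 10**12

# Counters copied from the nvme0n1 SMART/Health log above.
print(f"read:    {data_units_to_tb(93_578_386):.1f} TB")
print(f"written: {data_units_to_tb(190_894_684):.1f} TB")
```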

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        620     0  0x001d  0x4004  0x000            0     0     -
  1        619     0  0x001c  0x4212  0x028            0   255     -
  2        618     0  0x0002  0x4004  0x000            0     0     -
  3        617     0  0x001d  0x4004  0x000            0     0     -
  4        616     0  0x001c  0x4212  0x028            0   255     -
  5        615     0  0x0002  0x4004  0x000            0     0     -
  6        614     0  0x001d  0x4004  0x000            0     0     -
  7        613     0  0x001c  0x4212  0x028            0   255     -
  8        612     0  0x0002  0x4004  0x000            0     0     -
  9        611     0  0x001d  0x4004  0x000            0     0     -
 10        610     0  0x001c  0x4212  0x028            0   255     -
 11        609     0  0x0002  0x4004  0x000            0     0     -
 12        608     0  0x001d  0x4004  0x000            0     0     -
 13        607     0  0x001c  0x4212  0x028            0   255     -
 14        606     0  0x0002  0x4004  0x000            0     0     -
 15        605     0  0xa01a  0x4004  0x000            0     1     -
... (48 entries not read)
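
The Status column above is the raw 16-bit status word from each NVMe error log entry. Below is a small decoder sketch. It assumes the NVMe base specification layout: bit 0 is the phase tag, bits 8:1 hold the Status Code, bits 11:9 the Status Code Type, bit 14 More, and bit 15 Do Not Retry. The name table is a hypothetical minimal one that covers only the two values seen in these logs.

```python
# Decode the raw "Status" word printed in smartctl's NVMe error log.
# Assumed layout (NVMe base spec): bit 0 = phase tag, bits 8:1 = Status Code (SC),
# bits 11:9 = Status Code Type (SCT), bits 13:12 = Command Retry Delay,
# bit 14 = More, bit 15 = Do Not Retry.
KNOWN = {
    (0, 0x02): "Generic: Invalid Field in Command",
    (1, 0x09): "Command Specific: Invalid Log Page",
}

def decode_status(word: int) -> dict:
    sc = (word >> 1) & 0xFF   # drop the phase tag, keep 8 bits of status code
    sct = (word >> 9) & 0x7   # 3-bit status code type
    return {
        "sct": sct,
        "sc": sc,
        "more": bool(word & (1 << 14)),
        "dnr": bool(word & (1 << 15)),
        "meaning": KNOWN.get((sct, sc), "not in this small table"),
    }

for raw in (0x4004, 0x4212):
    print(hex(raw), decode_status(raw))
```

On that reading, these entries are rejected admin commands rather than media errors. The drive counts media errors separately in the "Media and Data Integrity Errors" field of the SMART/Health log.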


[-- Attachment #3: check.log.gz --]
[-- Type: application/x-gzip, Size: 66526 bytes --]

[-- Attachment #4: smart_nvme1n1.txt --]
[-- Type: text/plain, Size: 2850 bytes --]

=== START OF INFORMATION SECTION ===
Model Number:                       PCIe SSD
Serial Number:                      21051220002754
Firmware Version:                   ECFM53.0
PCI Vendor/Subsystem ID:            0x1987
IEEE OUI Identifier:                0x6479a7
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            6479a7 4d10203702
Local Time is:                      Fri Mar 29 17:09:07 2024 UTC
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d):     Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x08):         Telmtry_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     75 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.13W       -        -    0  0  0  0        0       0
 1 +     5.29W       -        -    1  1  1  1        0       0
 2 +     4.36W       -        -    2  2  2  2        0       0
 3 -   0.0490W       -        -    3  3  3  3     2000    2000
 4 -   0.0018W       -        -    4  4  4  4    25000   25000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        29 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    20%
Data Units Read:                    155,463,764 [79.5 TB]
Data Units Written:                 246,041,613 [125 TB]
Host Read Commands:                 936,230,375
Host Write Commands:                1,683,141,962
Controller Busy Time:               5,320
Power Cycles:                       338
Power On Hours:                     20,034
Unsafe Shutdowns:                   77
Media and Data Integrity Errors:    0
Error Information Log Entries:      661
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 63 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        661     0  0x5003  0x4004  0x028            0     0     -


Thread overview: 4+ messages
2024-03-29 17:30 Jared Van Bortel [this message]
2024-03-29 23:42 ` system drive corruption, btrfs check failure Qu Wenruo
2024-05-19  2:17   ` Jared Van Bortel
2024-05-19  3:34     ` Qu Wenruo
