* Recover from "couldn't read tree root"?
@ 2021-06-20 20:37 Nathan Dehnel
2021-06-20 21:09 ` Chris Murphy
2021-06-20 21:19 ` Chris Murphy
0 siblings, 2 replies; 8+ messages in thread
From: Nathan Dehnel @ 2021-06-20 20:37 UTC
To: Btrfs BTRFS
A machine failed to boot, so I tried to mount its root partition from
systemrescuecd, which failed:
[ 5404.240019] BTRFS info (device bcache3): disk space caching is enabled
[ 5404.240022] BTRFS info (device bcache3): has skinny extents
[ 5404.243195] BTRFS error (device bcache3): parent transid verify
failed on 3004631449600 wanted 1420882 found 1420435
[ 5404.243279] BTRFS error (device bcache3): parent transid verify
failed on 3004631449600 wanted 1420882 found 1420435
[ 5404.243362] BTRFS error (device bcache3): parent transid verify
failed on 3004631449600 wanted 1420882 found 1420435
[ 5404.243432] BTRFS error (device bcache3): parent transid verify
failed on 3004631449600 wanted 1420882 found 1420435
[ 5404.243435] BTRFS warning (device bcache3): couldn't read tree root
[ 5404.244114] BTRFS error (device bcache3): open_ctree failed
btrfs rescue super-recover -v /dev/bcache0 returned this:
parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
Ignoring transid failure
ERROR: could not setup extent tree
Failed to recover bad superblocks
uname -a:
Linux sysrescue 5.10.34-1-lts #1 SMP Sun, 02 May 2021 12:41:09 +0000
x86_64 GNU/Linux
btrfs --version:
btrfs-progs v5.10.1
btrfs fi show:
Label: none uuid: 76189222-b60d-4402-a7ff-141f057e8574
Total devices 10 FS bytes used 1.50TiB
devid 1 size 931.51GiB used 311.03GiB path /dev/bcache3
devid 2 size 931.51GiB used 311.00GiB path /dev/bcache2
devid 3 size 931.51GiB used 311.00GiB path /dev/bcache1
devid 4 size 931.51GiB used 311.00GiB path /dev/bcache0
devid 5 size 931.51GiB used 311.00GiB path /dev/bcache4
devid 6 size 931.51GiB used 311.00GiB path /dev/bcache8
devid 7 size 931.51GiB used 311.00GiB path /dev/bcache6
devid 8 size 931.51GiB used 311.03GiB path /dev/bcache9
devid 9 size 931.51GiB used 311.03GiB path /dev/bcache7
devid 10 size 931.51GiB used 311.03GiB path /dev/bcache5
Is this filesystem recoverable?
(Sorry, re-sending because I forgot to add a subject)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Recover from "couldn't read tree root"?
2021-06-20 20:37 Recover from "couldn't read tree root"? Nathan Dehnel
@ 2021-06-20 21:09 ` Chris Murphy
2021-06-20 21:31 ` Nathan Dehnel
2021-06-20 21:19 ` Chris Murphy
1 sibling, 1 reply; 8+ messages in thread
From: Chris Murphy @ 2021-06-20 21:09 UTC
To: Nathan Dehnel; +Cc: Btrfs BTRFS
On Sun, Jun 20, 2021 at 2:38 PM Nathan Dehnel <ncdehnel@gmail.com> wrote:
>
> A machine failed to boot, so I tried to mount its root partition from
> systemrescuecd, which failed:
>
> [ 5404.240019] BTRFS info (device bcache3): disk space caching is enabled
> [ 5404.240022] BTRFS info (device bcache3): has skinny extents
> [ 5404.243195] BTRFS error (device bcache3): parent transid verify
> failed on 3004631449600 wanted 1420882 found 1420435
> [ 5404.243279] BTRFS error (device bcache3): parent transid verify
> failed on 3004631449600 wanted 1420882 found 1420435
> [ 5404.243362] BTRFS error (device bcache3): parent transid verify
> failed on 3004631449600 wanted 1420882 found 1420435
> [ 5404.243432] BTRFS error (device bcache3): parent transid verify
> failed on 3004631449600 wanted 1420882 found 1420435
> [ 5404.243435] BTRFS warning (device bcache3): couldn't read tree root
> [ 5404.244114] BTRFS error (device bcache3): open_ctree failed
>
> btrfs rescue super-recover -v /dev/bcache0 returned this:
>
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> Ignoring transid failure
> ERROR: could not setup extent tree
> Failed to recover bad superblocks
>
> uname -a:
>
> Linux sysrescue 5.10.34-1-lts #1 SMP Sun, 02 May 2021 12:41:09 +0000
> x86_64 GNU/Linux
>
> btrfs --version:
>
> btrfs-progs v5.10.1
>
> btrfs fi show:
>
> Label: none uuid: 76189222-b60d-4402-a7ff-141f057e8574
> Total devices 10 FS bytes used 1.50TiB
> devid 1 size 931.51GiB used 311.03GiB path /dev/bcache3
> devid 2 size 931.51GiB used 311.00GiB path /dev/bcache2
> devid 3 size 931.51GiB used 311.00GiB path /dev/bcache1
> devid 4 size 931.51GiB used 311.00GiB path /dev/bcache0
> devid 5 size 931.51GiB used 311.00GiB path /dev/bcache4
> devid 6 size 931.51GiB used 311.00GiB path /dev/bcache8
> devid 7 size 931.51GiB used 311.00GiB path /dev/bcache6
> devid 8 size 931.51GiB used 311.03GiB path /dev/bcache9
> devid 9 size 931.51GiB used 311.03GiB path /dev/bcache7
> devid 10 size 931.51GiB used 311.03GiB path /dev/bcache5
>
> Is this filesystem recoverable?
> (Sorry, re-sending because I forgot to add a subject)
Definitely don't write any irreversible changes, such as a repair
attempt, to anything until you understand what went wrong, or it'll
make recovery harder or impossible.
Was bcache in writeback or writethrough mode?
What's the configuration? Can you supply something like
lsblk -o NAME,FSTYPE,SIZE,FSUSE%,MOUNTPOINT,UUID,MIN-IO,SCHED,DISC-GRAN,MODEL
--
Chris Murphy
* Re: Recover from "couldn't read tree root"?
2021-06-20 20:37 Recover from "couldn't read tree root"? Nathan Dehnel
2021-06-20 21:09 ` Chris Murphy
@ 2021-06-20 21:19 ` Chris Murphy
2021-06-20 21:48 ` Nathan Dehnel
1 sibling, 1 reply; 8+ messages in thread
From: Chris Murphy @ 2021-06-20 21:19 UTC
To: Nathan Dehnel; +Cc: Btrfs BTRFS
On Sun, Jun 20, 2021 at 2:38 PM Nathan Dehnel <ncdehnel@gmail.com> wrote:
>
> A machine failed to boot, so I tried to mount its root partition from
> systemrescuecd, which failed:
>
> [ 5404.240019] BTRFS info (device bcache3): disk space caching is enabled
> [ 5404.240022] BTRFS info (device bcache3): has skinny extents
> [ 5404.243195] BTRFS error (device bcache3): parent transid verify
> failed on 3004631449600 wanted 1420882 found 1420435
> [ 5404.243279] BTRFS error (device bcache3): parent transid verify
> failed on 3004631449600 wanted 1420882 found 1420435
> [ 5404.243362] BTRFS error (device bcache3): parent transid verify
> failed on 3004631449600 wanted 1420882 found 1420435
> [ 5404.243432] BTRFS error (device bcache3): parent transid verify
> failed on 3004631449600 wanted 1420882 found 1420435
> [ 5404.243435] BTRFS warning (device bcache3): couldn't read tree root
> [ 5404.244114] BTRFS error (device bcache3): open_ctree failed
This is generally bad, and means some lower layer did something wrong,
such as getting write order incorrect, i.e. failing to properly honor
flush/fua. Recovery can be difficult and take a while.
https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#parent_transid_verify_failed
I suggest searching logs since the last time this file system was
working, because the above error is indicating a problem that's
already happened and what we need to know is what happened, if
possible. Something like this:
journalctl --since=-5d -k -o short-monotonic --no-hostname | grep
"Linux version\| ata\|bcache\|Btrfs\|BTRFS\|] hd\| scsi\| sd\| sdhci\|
mmc\| nvme\| usb\| vd"
> btrfs rescue super-recover -v /dev/bcache0 returned this:
>
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> Ignoring transid failure
> ERROR: could not setup extent tree
> Failed to recover bad superblocks
OK, something is really wrong if you're not able to see a single
superblock on any of the bcache devices. Every member device has 3
superblocks, given the sizes you've provided. For there to not be a
single one is a spectacular failure, as if the bcache cache device
isn't returning correct information for any of them. So I'm gonna
guess a single shared SSD, which is a single point of failure, and
it's spitting out garbage or zeros. But I'm not even close to a bcache
expert so you might want to ask bcache developers how to figure out
what state bcache is in and whether and how to safely decouple it from
the backing drives so that you can engage in recovery attempts.
If bcache mode is write through, there's a chance the backing drives
have valid btrfs metadata, and it's just that on read the SSD is
returning bogus information.
--
Chris Murphy
* Re: Recover from "couldn't read tree root"?
2021-06-20 21:09 ` Chris Murphy
@ 2021-06-20 21:31 ` Nathan Dehnel
2021-06-20 22:19 ` Chris Murphy
2021-06-20 22:53 ` Chris Murphy
0 siblings, 2 replies; 8+ messages in thread
From: Nathan Dehnel @ 2021-06-20 21:31 UTC
To: Chris Murphy; +Cc: Btrfs BTRFS
>Was bcache in write back or write through mode?
Writeback.
>What's the configuration?
NAME FSTYPE SIZE FSUSE% MOUNTPOINT
UUID MIN-IO SCHED DISC-GRAN
MODEL
loop0 squashfs 655.6M 100% /run/archiso/sfs/airootfs
512 mq-deadline 0B
sda 238.5G
512 mq-deadline 512B
C300-CTFDDAC256MAG
├─sda1 2M
512 mq-deadline 512B
├─sda2 linux_raid_member 512M
325a2f12-18b8-27f7-2f81-f554a9b0fccc 512 mq-deadline 512B
│ └─md126 vfat 511.9M
EF35-0411 512 512B
└─sda3 linux_raid_member 16G
93ed641f-394b-2122-7525-b3311aaac6b8 512 mq-deadline 512B
└─md125 swap 16G
9ea84fb7-8bd7-4a0e-91fe-398790643066 1048576 512B
sdb 232.9G
512 mq-deadline 512B
Samsung_SSD_850_EVO_250GB
├─sdb1 2M
512 mq-deadline 512B
├─sdb2 linux_raid_member 512M
325a2f12-18b8-27f7-2f81-f554a9b0fccc 512 mq-deadline 512B
│ └─md126 vfat 511.9M
EF35-0411 512 512B
└─sdb3 linux_raid_member 16G
93ed641f-394b-2122-7525-b3311aaac6b8 512 mq-deadline 512B
└─md125 swap 16G
9ea84fb7-8bd7-4a0e-91fe-398790643066 1048576 512B
sdc btrfs 931.5G
12bcde5c-b3ae-4fa6-8e17-0a4b564f1ba1 512 mq-deadline 0B
WDC_WD1002FAEX-00Z3A0
└─sdc1 bcache 931.5G
f34b26ea-8229-4f3f-bdc5-29c5fe16eaae 512 mq-deadline 0B
└─bcache0 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
sdd btrfs 931.5G
12bcde5c-b3ae-4fa6-8e17-0a4b564f1ba1 512 mq-deadline 0B
WDC_WD1002FAEX-00Z3A0
└─sdd1 bcache 931.5G
beb25260-1b36-473f-93c4-7ef016a62f44 512 mq-deadline 0B
└─bcache1 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
sde btrfs 931.5G
12bcde5c-b3ae-4fa6-8e17-0a4b564f1ba1 4096 mq-deadline 0B
WDC_WD1003FZEX-00MK2A0
└─sde1 bcache 931.5G
21b55c83-c951-4e4f-affc-0b9bf54c8783 4096 mq-deadline 0B
└─bcache2 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
sdf btrfs 931.5G
12bcde5c-b3ae-4fa6-8e17-0a4b564f1ba1 512 mq-deadline 0B
WDC_WD1002FAEX-00Z3A0
└─sdf1 bcache 931.5G
d4d2b9d6-077d-4328-b2cd-14f6db259955 512 mq-deadline 0B
└─bcache3 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
sdg btrfs 931.5G
12bcde5c-b3ae-4fa6-8e17-0a4b564f1ba1 512 mq-deadline 0B
ST1000NM0011
└─sdg1 bcache 931.5G
a8513a01-c6be-4bec-b3f9-a5797225d304 512 mq-deadline 0B
└─bcache4 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
sdh 931.5G
512 mq-deadline 0B
WDC_WD1002FAEX-00Z3A0
└─sdh1 bcache 931.5G
ffeacab7-ff42-453c-b012-58b119236fa5 512 mq-deadline 0B
└─bcache5 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
sdi btrfs 931.5G
12bcde5c-b3ae-4fa6-8e17-0a4b564f1ba1 512 mq-deadline 0B
WDC_WD1002FAEX-00Y9A0
└─sdi1 bcache 931.5G
f3f4d706-7d73-4b48-a5b3-9802fc0de978 512 mq-deadline 0B
└─bcache6 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
sdj btrfs 931.5G
12bcde5c-b3ae-4fa6-8e17-0a4b564f1ba1 4096 mq-deadline 0B
WDC_WD1003FZEX-00MK2A0
└─sdj1 bcache 931.5G
64d10dda-4ac2-44d4-941a-362ccb5ddbba 4096 mq-deadline 0B
└─bcache7 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
sdk btrfs 931.5G
12bcde5c-b3ae-4fa6-8e17-0a4b564f1ba1 512 mq-deadline 0B
WDC_WD1002FAEX-00Y9A0
└─sdk1 bcache 931.5G
c3ddc718-f700-4360-82c9-7db76114e3f6 512 mq-deadline 0B
└─bcache8 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
sdl btrfs 931.5G
12bcde5c-b3ae-4fa6-8e17-0a4b564f1ba1 512 mq-deadline 0B
WDC_WD1002FAEX-00Z3A0
└─sdl1 bcache 931.5G
2bf5ac80-cdf6-4c0c-9434-bcdc4626abff 512 mq-deadline 0B
└─bcache9 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
sdm iso9660 14.9G
2021-05-08-11-22-02-00 512 mq-deadline 0B
USB_2.0_FD
├─sdm1 iso9660 717M 100% /run/archiso/bootmnt
2021-05-08-11-22-02-00 512 mq-deadline 0B
└─sdm2 vfat 1.4M
0A52-44A0 512 mq-deadline 0B
nvme0n1 linux_raid_member 13.4G
4703551c-4570-b6c8-7dda-991b93b99c9a 512 none 512B
INTEL MEMPEK1W016GA
└─md127 bcache 13.4G
dfda7dc0-07a4-40bf-b5b8-e3458c181ce4 16384 512B
├─bcache0 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache1 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache2 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache3 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache4 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache5 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache6 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache7 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache8 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
└─bcache9 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
nvme1n1 linux_raid_member 13.4G
4703551c-4570-b6c8-7dda-991b93b99c9a 512 none 512B
INTEL MEMPEK1W016GA
└─md127 bcache 13.4G
dfda7dc0-07a4-40bf-b5b8-e3458c181ce4 16384 512B
├─bcache0 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache1 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache2 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache3 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache4 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache5 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache6 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache7 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
├─bcache8 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
└─bcache9 btrfs 931.5G
76189222-b60d-4402-a7ff-141f057e8574 512 512B
On Sun, Jun 20, 2021 at 9:09 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Sun, Jun 20, 2021 at 2:38 PM Nathan Dehnel <ncdehnel@gmail.com> wrote:
> >
> > A machine failed to boot, so I tried to mount its root partition from
> > systemrescuecd, which failed:
> >
> > [ 5404.240019] BTRFS info (device bcache3): disk space caching is enabled
> > [ 5404.240022] BTRFS info (device bcache3): has skinny extents
> > [ 5404.243195] BTRFS error (device bcache3): parent transid verify
> > failed on 3004631449600 wanted 1420882 found 1420435
> > [ 5404.243279] BTRFS error (device bcache3): parent transid verify
> > failed on 3004631449600 wanted 1420882 found 1420435
> > [ 5404.243362] BTRFS error (device bcache3): parent transid verify
> > failed on 3004631449600 wanted 1420882 found 1420435
> > [ 5404.243432] BTRFS error (device bcache3): parent transid verify
> > failed on 3004631449600 wanted 1420882 found 1420435
> > [ 5404.243435] BTRFS warning (device bcache3): couldn't read tree root
> > [ 5404.244114] BTRFS error (device bcache3): open_ctree failed
> >
> > btrfs rescue super-recover -v /dev/bcache0 returned this:
> >
> > parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> > parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> > parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> > parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> > parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> > Ignoring transid failure
> > ERROR: could not setup extent tree
> > Failed to recover bad superblocks
> >
> > uname -a:
> >
> > Linux sysrescue 5.10.34-1-lts #1 SMP Sun, 02 May 2021 12:41:09 +0000
> > x86_64 GNU/Linux
> >
> > btrfs --version:
> >
> > btrfs-progs v5.10.1
> >
> > btrfs fi show:
> >
> > Label: none uuid: 76189222-b60d-4402-a7ff-141f057e8574
> > Total devices 10 FS bytes used 1.50TiB
> > devid 1 size 931.51GiB used 311.03GiB path /dev/bcache3
> > devid 2 size 931.51GiB used 311.00GiB path /dev/bcache2
> > devid 3 size 931.51GiB used 311.00GiB path /dev/bcache1
> > devid 4 size 931.51GiB used 311.00GiB path /dev/bcache0
> > devid 5 size 931.51GiB used 311.00GiB path /dev/bcache4
> > devid 6 size 931.51GiB used 311.00GiB path /dev/bcache8
> > devid 7 size 931.51GiB used 311.00GiB path /dev/bcache6
> > devid 8 size 931.51GiB used 311.03GiB path /dev/bcache9
> > devid 9 size 931.51GiB used 311.03GiB path /dev/bcache7
> > devid 10 size 931.51GiB used 311.03GiB path /dev/bcache5
> >
> > Is this filesystem recoverable?
> > (Sorry, re-sending because I forgot to add a subject)
>
> Definitely don't write any irreversible changes, such as a repair
> attempt, to anything until you understand what went wrong, or it'll
> make recovery harder or impossible.
>
> Was bcache in writeback or writethrough mode?
>
> What's the configuration? Can you supply something like
>
> lsblk -o NAME,FSTYPE,SIZE,FSUSE%,MOUNTPOINT,UUID,MIN-IO,SCHED,DISC-GRAN,MODEL
>
>
>
> --
> Chris Murphy
* Re: Recover from "couldn't read tree root"?
2021-06-20 21:19 ` Chris Murphy
@ 2021-06-20 21:48 ` Nathan Dehnel
0 siblings, 0 replies; 8+ messages in thread
From: Nathan Dehnel @ 2021-06-20 21:48 UTC
To: Chris Murphy; +Cc: Btrfs BTRFS
>I suggest searching logs since the last time this file system was
working, because the above error is indicating a problem that's
already happened and what we need to know is what happened, if
possible. Something like this:
>journalctl --since=-5d -k -o short-monotonic --no-hostname | grep
"Linux version\| ata\|bcache\|Btrfs\|BTRFS\|] hd\| scsi\| sd\| sdhci\|
mmc\| nvme\| usb\| vd"
Unfortunately I put my journal logs in a different subvolume so they
wouldn't bloat my snapshots, so they weren't included in my backups.
>So I'm gonna guess a single shared SSD, which is a single point of failure, and
it's spitting out garbage or zeros.
It's 2 SSDs in mdraid RAID10.
>But I'm not even close to a bcache expert so you might want to ask bcache developers how to figure out
what state bcache is in and whether and how to safely decouple it from
the backing drives so that you can engage in recovery attempts.
They didn't respond the last couple of times I've asked a question on
their irc or mailing list.
On Sun, Jun 20, 2021 at 9:19 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Sun, Jun 20, 2021 at 2:38 PM Nathan Dehnel <ncdehnel@gmail.com> wrote:
> >
> > A machine failed to boot, so I tried to mount its root partition from
> > systemrescuecd, which failed:
> >
> > [ 5404.240019] BTRFS info (device bcache3): disk space caching is enabled
> > [ 5404.240022] BTRFS info (device bcache3): has skinny extents
> > [ 5404.243195] BTRFS error (device bcache3): parent transid verify
> > failed on 3004631449600 wanted 1420882 found 1420435
> > [ 5404.243279] BTRFS error (device bcache3): parent transid verify
> > failed on 3004631449600 wanted 1420882 found 1420435
> > [ 5404.243362] BTRFS error (device bcache3): parent transid verify
> > failed on 3004631449600 wanted 1420882 found 1420435
> > [ 5404.243432] BTRFS error (device bcache3): parent transid verify
> > failed on 3004631449600 wanted 1420882 found 1420435
> > [ 5404.243435] BTRFS warning (device bcache3): couldn't read tree root
> > [ 5404.244114] BTRFS error (device bcache3): open_ctree failed
>
> This is generally bad, and means some lower layer did something wrong,
> such as getting write order incorrect, i.e. failing to properly honor
> flush/fua. Recovery can be difficult and take a while.
> https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#parent_transid_verify_failed
>
> I suggest searching logs since the last time this file system was
> working, because the above error is indicating a problem that's
> already happened and what we need to know is what happened, if
> possible. Something like this:
>
> journalctl --since=-5d -k -o short-monotonic --no-hostname | grep
> "Linux version\| ata\|bcache\|Btrfs\|BTRFS\|] hd\| scsi\| sd\| sdhci\|
> mmc\| nvme\| usb\| vd"
>
>
>
> > btrfs rescue super-recover -v /dev/bcache0 returned this:
> >
> > parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> > parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> > parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> > parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> > parent transid verify failed on 3004631449600 wanted 1420882 found 1420435
> > Ignoring transid failure
> > ERROR: could not setup extent tree
> > Failed to recover bad superblocks
>
> OK something is really wrong if you're not able to see a single
> superblock on any of the bcache devices. Every member device has 3
> super blocks, given the sizes you've provided. For there to not be a
> single one is a spectacular failure as if the bcache cache device
> isn't returning correct information for any of them. So I'm gonna
> guess a single shared SSD, which is a single point of failure, and
> it's spitting out garbage or zeros. But I'm not even close to a bcache
> expert so you might want to ask bcache developers how to figure out
> what state bcache is in and whether and how to safely decouple it from
> the backing drives so that you can engage in recovery attempts.
>
> If bcache mode is write through, there's a chance the backing drives
> have valid btrfs metadata, and it's just that on read the SSD is
> returning bogus information.
>
>
>
>
>
> --
> Chris Murphy
* Re: Recover from "couldn't read tree root"?
2021-06-20 21:31 ` Nathan Dehnel
@ 2021-06-20 22:19 ` Chris Murphy
2021-06-20 22:53 ` Chris Murphy
1 sibling, 0 replies; 8+ messages in thread
From: Chris Murphy @ 2021-06-20 22:19 UTC
To: Nathan Dehnel; +Cc: Chris Murphy, Btrfs BTRFS
The two Intel MEMPEK1W016GA's are in raid10, but you aren't really
protected unless the drive reports a discrete read error. Only in that
case would the md driver know to use the mirror copy. While it
certainly should sooner report a read error than return zeros or
garbage, this is the situation we're in with SSDs. Is that what's
happening? *shrug* Needs more investigation. But it's at either the
bcache or mdadm level, near as I can tell.
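To see whether md itself thinks anything is wrong, a first
non-destructive step might look something like this (a sketch;
md127 and the two NVMe member names are assumed from the lsblk
output above):

```shell
# Read-only inspection of the raid10 cache array; nothing here writes.
cat /proc/mdstat
mdadm --detail /dev/md127

# Compare per-member metadata: differing event counts or a bad
# checksum can point at the stale or corrupt member.
mdadm --examine /dev/nvme0n1 /dev/nvme1n1
```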
Was there a crash or power failure while using this array by any chance?
Chris Murphy
* Re: Recover from "couldn't read tree root"?
2021-06-20 21:31 ` Nathan Dehnel
2021-06-20 22:19 ` Chris Murphy
@ 2021-06-20 22:53 ` Chris Murphy
2021-06-22 3:26 ` Nathan Dehnel
1 sibling, 1 reply; 8+ messages in thread
From: Chris Murphy @ 2021-06-20 22:53 UTC
To: Nathan Dehnel; +Cc: Chris Murphy, Btrfs BTRFS
On Sun, Jun 20, 2021 at 3:31 PM Nathan Dehnel <ncdehnel@gmail.com> wrote:
>
> >Was bcache in write back or write through mode?
> Writeback.
OK, that's bad in this configuration because it means all the writes go
to the SSD and could be there for minutes, hours, days, or longer.
That means it's even possible the current supers are only on the SSDs,
as well as other critical btrfs metadata.
My best guess now is to assume one of the SSDs is bad and spewing
garbage or zeros. Assemble the array degraded with just one SSD, and
see if you can mount. If not, then it's the other SSD you
need to assemble degraded. There's a way to set a drive manually as
faulty so it won't assemble; I also thought of using sysfs but on my
own system, /sys/block/nvme0n1/device/delete does not exist like it
does for SATA SSDs.
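As a sketch of that degraded assembly (untested; device names are
assumed from the lsblk output, and nothing should be using the array
when you stop it):

```shell
# Stop the currently assembled cache array.
mdadm --stop /dev/md127

# Assemble degraded with only the first NVMe member; --run starts the
# raid10 array even though a member is missing.
mdadm --assemble --run /dev/md127 /dev/nvme0n1

# If the btrfs still won't mount, swap members and try the other SSD.
mdadm --stop /dev/md127
mdadm --assemble --run /dev/md127 /dev/nvme1n1
```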
Next you have to wrestle with this dilemma. If you pick the bad SSD,
you don't want bcache flushing anything from it to your HDDs or it'll
just corrupt them, right? If you pick the good SSD, you actually do
want bcache to flush it all to the drives, so they're in a good state
and you can optionally decouple the SSD entirely so that you're left
with just the individual drives again.
I think you might want to use 'blockdev --setro' on all the block
devices, SSD and HDD, to prevent any changes. You might get some
complaints from bcache if it can't write to HDDs or even to the SSDs,
so that might look like you've picked the bad SSD. But the real test
is if you can mount the btrfs. Try that with 'mount -o
ro,nologreplay,usebackuproot' and if you can at least get that far and
do some basic navigation, that's probably the good SSD. If you still
get mount failure, it's probably the bad one.
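Putting those two steps together, roughly (a sketch; the device list
is taken from the lsblk output above, and setting a whole disk
read-only may not propagate through every layer, so double-check
each device):

```shell
# Mark the HDDs, the md cache array, and the bcache devices read-only.
for dev in /dev/sd[c-l] /dev/md127 /dev/bcache{0..9}; do
    blockdev --setro "$dev"
done

# Cautious read-only mount: skip log replay, and fall back to an older
# tree root if the newest one can't be read.
mount -o ro,nologreplay,usebackuproot /dev/bcache0 /mnt
```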
If you get a successful ro mount, I'd take advantage of it and back up
anything important. Just get it out now. And then you can try it all
again with everything read-write, but with the bad SSD still disabled
and the md array assembled degraded with the good SSD, and see if you
can mount read-write again. You need to be read-write at the block device
layer to get bcache to flush SSD state to the drives, which I think is
done by setting the mode to writethrough and then waiting until
bcache/state is clean. HDDs need to be writable but btrfs doesn't need
to be mounted for this.
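The flush step might look like this (a sketch using the standard
bcache sysfs attributes; the bcache0..bcache9 device names are taken
from the lsblk output above):

```shell
# Switch every bcache device to writethrough so dirty data drains to
# the backing HDDs.
for i in $(seq 0 9); do
    echo writethrough > /sys/block/bcache$i/bcache/cache_mode
done

# Wait until each device reports a clean cache state.
for i in $(seq 0 9); do
    until grep -q clean /sys/block/bcache$i/bcache/state; do
        sleep 5
    done
done
```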
The other possibility is that there is some bad data on both SSDs, in
which case it fails and chances are the btrfs is toast.
--
Chris Murphy
* Re: Recover from "couldn't read tree root"?
2021-06-20 22:53 ` Chris Murphy
@ 2021-06-22 3:26 ` Nathan Dehnel
0 siblings, 0 replies; 8+ messages in thread
From: Nathan Dehnel @ 2021-06-22 3:26 UTC
To: Chris Murphy; +Cc: Btrfs BTRFS
I couldn't figure out how to salvage the filesystem, so I wiped it. Oh well.
On Sun, Jun 20, 2021 at 5:53 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> On Sun, Jun 20, 2021 at 3:31 PM Nathan Dehnel <ncdehnel@gmail.com> wrote:
> >
> > >Was bcache in write back or write through mode?
> > Writeback.
>
> Ok that's bad in this configuration because it means all the writes go
> to the SSD and could be there for minutes, hours, days, or longer.
> That means it's even possible the current supers are only on the SSDs,
> as well as other critical btrfs metadata.
>
> My best guess now is to assume one of the drives is bad and spewing
> garbage or zeros. And assemble the array degraded with just one SSD
> drive, and see if you can mount. If not, then it's the other SSD you
> need to assemble degraded. There's a way to set a drive manually as
> faulty so it won't assemble; I also thought of using sysfs but on my
> own system, /sys/block/nvme0n1/device/delete does not exist like it
> does for SATA SSDs.
>
> Next you have to wrestle with this dilemma. If you pick the bad SSD,
> you don't want bcache flushing anything from it to your HDDs or it'll
> just corrupt them, right? If you pick the good SSD, you actually do
> want bcache to flush it all to the drives, so they're in a good state
> and you can optionally decouple the SSD entirely so that you're left
> with just the individual drives again.
>
> I think you might want to use 'blockdev --setro' on all the block
> devices, SSD and HDD, to prevent any changes. You might get some
> complaints from bcache if it can't write to HDDs or even to the SSDs,
> so that might look like you've picked the bad SSD. But the real test
> is if you can mount the btrfs. Try that with 'mount -o
> ro,nologreplay,usebackuproot' and if you can at least get that far and
> do some basic navigation, that's probably the good SSD. If you still
> get mount failure, it's probably the bad one.
>
> If you get a successful ro mount, I'd take advantage of it and back up
> anything important. Just get it out now. And then you can try it all
> again with everything read-write, but with the bad SSD still disabled
> and the md array assembled degraded with the good SSD, and see if you
> can mount read-write again. You need to be read-write at the block device
> layer to get bcache to flush SSD state to the drives, which I think is
> done by setting the mode to writethrough and then waiting until
> bcache/state is clean. HDDs need to be writable but btrfs doesn't need
> to be mounted for this.
>
> The other possibility is that there is some bad data on both SSDs, in
> which case it fails and chances are the btrfs is toast.
>
>
> --
> Chris Murphy
end of thread, other threads:[~2021-06-22 3:26 UTC | newest]
Thread overview: 8+ messages
-- links below jump to the message on this page --
2021-06-20 20:37 Recover from "couldn't read tree root"? Nathan Dehnel
2021-06-20 21:09 ` Chris Murphy
2021-06-20 21:31 ` Nathan Dehnel
2021-06-20 22:19 ` Chris Murphy
2021-06-20 22:53 ` Chris Murphy
2021-06-22 3:26 ` Nathan Dehnel
2021-06-20 21:19 ` Chris Murphy
2021-06-20 21:48 ` Nathan Dehnel