Re: [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default

asahi.lists.linux.dev archive mirror
 help / color / mirror / Atom feed

From: Hector Martin <marcan@marcan.st>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>, Neal Gompa <neal@gompa.dev>,
	linux-btrfs@vger.kernel.org
Cc: David Sterba <dsterba@suse.cz>,
	Davide Cavalca <davide@cavalca.name>,
	Sven Peter <sven@svenpeter.dev>, "axboe@fb.com" <axboe@fb.com>,
	Asahi Linux <asahi@lists.linux.dev>
Subject: Re: [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default
Date: Tue, 21 Mar 2023 19:07:29 +0900	[thread overview]
Message-ID: <8a52f55e-9b91-c6da-f2f6-eb8ccb87093d@marcan.st> (raw)
In-Reply-To: <b4329200-650e-f46e-505a-e5248f187be6@gmx.com>

On 21/03/2023 12.21, Qu Wenruo wrote:
> 
> 
> On 2023/3/21 10:03, Neal Gompa wrote:
>> We have had working subpage support in Btrfs for many cycles now.
>> Generally, we do not want people creating filesystems by default
>> with non-4k sectorsizes since it creates portability problems.
> 
> My biggest concern for now is, I haven't yet done any subpage testing 
> for a while.
> 
> The bottle neck is the lack of computing power.
> 
> I only have one decent (RK3588 based SBC, Rock5B) board available for 
> testing, but it's taken by my daily fstests runs, and it's still using 
> 4K page size for the aarc64 VM.
> 
> Although I have an backup SBC (the same Rock5B), it's reserved mostly 
> for upstreaming and testing for the RK3588 SoC.
> 
> Personally speaking, this change would bring a lot of more testing 
> feedback from Asahi guys, thus would be pretty awesome.

Note that we already have a bunch of people running Fedora Asahi, and as
far as I know those images are created on 4K systems and then used on
16K systems, so this should be already getting real-world testing (and
will get a lot more once we get to official announcement/release)
regardless of the default.

IOW, this change is mostly about people creating secondary btrfs
filesystems on Asahi directly, which is relatively niche in comparison.
So if you have a subpage bug it's going to hit Asahi users whether this
change happens or not :)

> 
> But on the other hand, I really don't want any big bug screwing up early 
> adopters, and further damage the reputation of btrfs.
> 
> 
> Would the Asahi guys gives us some early test results?
> (AKA, full fstests runs with 16K page size and 4K sectorsize).

Gave it a shot. Tested with options:

export TEST_DEV=/dev/nvme0n1p6
export TEST_DIR=/mnt/test
#export SCRATCH_DEV=/dev/nvme0n1p7
export SCRATCH_MNT=/mnt/scratch
export SCRATCH_DEV_POOL="/dev/nvme0n1p7 /dev/nvme0n1p8 /dev/nvme0n1p9
/dev/nvme0n1p10 /dev/nvme0n1p11"
export MKFS_OPTIONS="-s 4096"
export FSTYP=btrfs

btrfs/012 is broken, the converted FS fails to mount with:

[  784.588172] BTRFS warning (device nvme0n1p7): v1 space cache is not
supported for page size 16384 with sectorsize 4096
[  784.588199] BTRFS error (device nvme0n1p7): open_ctree failed

btrfs/131 and 136 have the same issue.

btrfs/106 has a size mismatch in the output, but I get the feeling
that's just a bad test that assumes 4K somewhere?

btrfs/140 is the first one that looks bad. Looks like the corruption
correction didn't work for some reason.

... and then btrfs/142 which is similar actually managed to log a bunch
of errors on our NVMe controller. Nice.

[ 1240.000104] nvme-apple 393cc0000.nvme: RTKit: syslog message:
host_ans2.c:1564: cmd parsing error for tag 12 fast decode err 0x1
[ 1240.000767] nvme0n1: Read(0x2) @ LBA 322753843, 0 blocks, Invalid
Field in Command (sct 0x0 / sc 0x2)
[ 1240.000771] nvme-apple 393cc0000.nvme: RTKit: syslog message:
host_ans2.c:1469: tag 12, completed with err BAD_CMD-
[ 1240.000775] operation not supported error, dev nvme0n1, sector
2582030751 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2

Looks like it tried to read 0 blocks? I'm pretty sure that's not a valid
block device operation... and then that test hangs because the
_btrfs_direct_read_on_mirror() common function is completely broken, as
it infinite loops if the exec'd command fails (which it does here, with
ENOENT). Cc sven/axboe/asahi: I'm pretty sure we should catch
upper-layer badness like this before sending it to the controller.

Excluding that one and moving on, 143 is broken the same way, as are
btrfs/265, 266, 267, 269.

213 failed to balance with ENOSPC.

btrfs/246 has an output discrepancy, I don't know what's up with that.

generic/251 then failed too, with dmesg logs about failing to trim block
groups.

generic/520 failed with an EBUSY on umount, but I suspect this might be
some desktop/systemd stuff trying to use the mount?

generic/563 suggests there may be cgroup accounting issues

generic/619 seems known to be pathologically slow on arm64, so I killed
it (https://www.spinics.net/lists/linux-btrfs/msg131553.html).

generic/708 failed but that pointed to a known kernel bug that we don't
have the fix for yet (this is running on 6.2 + asahi patches).

Run output and some select dmesg sections:
https://gist.github.com/marcan/822a34266bcaf4f4cffaa6a198b4c616

Let me know if you need anything else.

> 
> If nothing wrong happened, I am very happy to this patch.
> 
> Thanks,
> Qu
>>
>> Signed-off-by: Neal Gompa <neal@gompa.dev>
>> ---
>>   Documentation/Subpage.rst    |  9 +++++----
>>   Documentation/mkfs.btrfs.rst | 11 +++++++----
>>   mkfs/main.c                  |  2 +-
>>   3 files changed, 13 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/Subpage.rst b/Documentation/Subpage.rst
>> index 21a495d5..d7e9b241 100644
>> --- a/Documentation/Subpage.rst
>> +++ b/Documentation/Subpage.rst
>> @@ -9,10 +9,11 @@ to the exactly same size of the block and page. On x86_64 this is typically
>>   pages, like 64KiB on 64bit ARM or PowerPC. This means filesystems created
>>   with 64KiB sector size cannot be mounted on a system with 4KiB page size.
>>   
>> -While with subpage support, systems with 64KiB page size can create (still needs
>> -"-s 4k" option for mkfs.btrfs) and mount filesystems with 4KiB sectorsize,
>> -allowing us to push 4KiB sectorsize as default sectorsize for all platforms in the
>> -near future.
>> +Since v6.3, filesystems are created with a 4KiB sectorsize by default,
>> +though it remains possible to create filesystems with other page sizes
>> +(such as 64KiB with the "-s 64k" option for mkfs.btrfs). This ensures that
>> +new filesystems are compatible across other architecture variants using
>> +larger page sizes.
>>   
>>   Requirements, limitations
>>   -------------------------
>> diff --git a/Documentation/mkfs.btrfs.rst b/Documentation/mkfs.btrfs.rst
>> index ba7227b3..af0b9c03 100644
>> --- a/Documentation/mkfs.btrfs.rst
>> +++ b/Documentation/mkfs.btrfs.rst
>> @@ -116,10 +116,13 @@ OPTIONS
>>   -s|--sectorsize <size>
>>           Specify the sectorsize, the minimum data block allocation unit.
>>   
>> -        The default value is the page size and is autodetected. If the sectorsize
>> -        differs from the page size, the created filesystem may not be mountable by the
>> -        running kernel. Therefore it is not recommended to use this option unless you
>> -        are going to mount it on a system with the appropriate page size.
>> +        The default value is 4KiB (4096). If larger page sizes (such as 64KiB [16384])
>> +        are used, the created filesystem will only mount on systems with a running kernel
>> +        running on a matching page size. Therefore it is not recommended to use this option
>> +        unless you are going to mount it on a system with the appropriate page size.
>> +
>> +        .. note::
>> +                Versions up to 6.3 set the sectorsize matching to the page size.
>>   
>>   -L|--label <string>
>>           Specify a label for the filesystem. The *string* should be less than 256
>> diff --git a/mkfs/main.c b/mkfs/main.c
>> index f5e34cbd..5e1834d7 100644
>> --- a/mkfs/main.c
>> +++ b/mkfs/main.c
>> @@ -1207,7 +1207,7 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
>>   	}
>>   
>>   	if (!sectorsize)
>> -		sectorsize = (u32)sysconf(_SC_PAGESIZE);
>> +		sectorsize = (u32)SZ_4K;
>>   	if (btrfs_check_sectorsize(sectorsize))
>>   		goto error;
>>   
> 


- Hector

next      parent reply	other threads:[~2023-03-21 10:07 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20230321020320.2555362-1-neal@gompa.dev>
     [not found] ` <b4329200-650e-f46e-505a-e5248f187be6@gmx.com>
2023-03-21 10:07   ` Hector Martin [this message]
2023-03-21 11:02     ` [RFC PATCH] btrfs-progs: mkfs: Enforce 4k sectorsize by default Qu Wenruo
2023-03-22  5:19     ` Qu Wenruo
2023-03-22 14:07     ` Su Yue
2023-03-25 14:05 Hector Martin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8a52f55e-9b91-c6da-f2f6-eb8ccb87093d@marcan.st \
    --to=marcan@marcan.st \
    --cc=asahi@lists.linux.dev \
    --cc=axboe@fb.com \
    --cc=davide@cavalca.name \
    --cc=dsterba@suse.cz \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=neal@gompa.dev \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=sven@svenpeter.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).