From: Andreas Dilger <adilger@dilger.ca>
To: David Sterba <dsterba@suse.cz>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>,
Sweet Tea Dorminy <sweettea-kernel@dorminy.me>,
Qu Wenruo <wqu@suse.com>, Jonathan Corbet <corbet@lwn.net>,
Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
David Sterba <dsterba@suse.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
linux-doc@vger.kernel.org, linux-btrfs@vger.kernel.org,
linux-fsdevel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v2 5/5] btrfs: fiemap: return extent physical size
Date: Tue, 9 Apr 2024 13:31:18 -0600 [thread overview]
Message-ID: <1868FA47-1303-4E14-BF35-4F89701A3572@dilger.ca> (raw)
In-Reply-To: <20240409185247.GJ3492@twin.jikos.cz>
[-- Attachment #1: Type: text/plain, Size: 5431 bytes --]
On Apr 9, 2024, at 12:52 PM, David Sterba <dsterba@suse.cz> wrote:
>
> On Wed, Apr 03, 2024 at 05:49:42PM +1030, Qu Wenruo wrote:
>>
>>
>> 在 2024/4/3 16:32, Sweet Tea Dorminy 写道:
>>>>> This means, we will emit a entry that uses the end to the physical
>>>>> extent end.
>>>>>
>>>>> Considering a file layout like this:
>>>>>
>>>>> item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>>>>> generation 7 type 1 (regular)
>>>>> extent data disk byte 13631488 nr 65536
>>>>> extent data offset 0 nr 4096 ram 65536
>>>>> extent compression 0 (none)
>>>>> item 7 key (257 EXTENT_DATA 4096) itemoff 15763 itemsize 53
>>>>> generation 8 type 1 (regular)
>>>>> extent data disk byte 13697024 nr 4096
>>>>> extent data offset 0 nr 4096 ram 4096
>>>>> extent compression 0 (none)
>>>>> item 8 key (257 EXTENT_DATA 8192) itemoff 15710 itemsize 53
>>>>> generation 7 type 1 (regular)
>>>>> extent data disk byte 13631488 nr 65536
>>>>> extent data offset 8192 nr 57344 ram 65536
>>>>> extent compression 0 (none)
>>>>>
>>>>> For fiemap, we would got something like this:
>>>>>
>>>>> fileoff 0, logical len 4k, phy 13631488, phy len 64K
>>>>> fileoff 4k, logical len 4k, phy 13697024, phy len 4k
>>>>> fileoff 8k, logical len 56k, phy 13631488 + 8k, phylen 56k
>>>>>
>>>>> [HOW TO CALCULATE WASTED SPACE IN USER SPACE]
>>>>> My concern is on the first entry. It indicates that we have wasted
>>>>> 60K (phy len is 64K, while logical len is only 4K)
>>>>>
>>>>> But that information is not correct, as in reality we only wasted 4K,
>>>>> the remaining 56K is still referred by file range [8K, 64K).
>>>>>
>>>>> Do you mean that user space program should maintain a mapping of each
>>>>> utilized physical range, and when handling the reported file range
>>>>> [8K, 64K), the user space program should find that the physical range
>>>>> covers with one existing extent, and do calculation correctly?
>>>>
>>>> My goal is to give an unprivileged interface for tools like compsize
>>>> to figure out how much space is used by a particular set of files.
>>>> They report the total disk space referenced by the provided list of
>>>> files, currently by doing a tree search (CAP_SYS_ADMIN) for all the
>>>> extents pertaining to the requested files and deduplicating extents
>>>> based on disk_bytenr.
>>>>
>>>> It seems simplest to me for userspace for the kernel to emit the
>>>> entire extent for each part of it referenced in a file, and let
>>>> userspace deal with deduplicating extents. This is also most similar
>>>> to the existing tree-search based interface. Reporting whole extents
>>>> gives more flexibility for userspace to figure out how to report
>>>> bookend extents, or shared extents, or ...
>>>>
>>>> It does seem a little weird where if you request with fiemap only e.g.
>>>> 4k-16k range in that example file you'll get reported all 68k
>>>> involved, but I can't figure out a way to fix that without having the
>>>> kernel keep track of used parts of the extents as part of reporting,
>>>> which sounds expensive.
>>>>
>>>> You're right that I'm being inconsistent, taking off extent_offset
>>>> from the reported disk size when that isn't what I should be doing, so
>>>> I fixed that in v3.
>>>
>>> Ah, I think I grasp a point I'd missed before.
>>> - Without setting disk_bytenr to the actual start of the data on disk,
>>> there's no way to find the location of the actual data on disk within
>>> the extent from fiemap alone
>>
>> Yes, that's my point.
>>
>>> - But reporting disk_bytenr + offset, to get actual start of data on
>>> disk, means we need to report a physical size to figure out the end of
>>> the extent and we can't know the beginning.
>>
>> disk_bytenr + offset + disk_num_bytes, and with the existing things like
>> length (aka, num_bytes), filepos (aka, key.offset) flags
>> (compression/hole/preallocated etc), we have everything we need to know
>> for regular extents.
>>
>> For compressed extents, we also need ram_bytes.
>>
>> If you ask me, I'd say put all the extra members into fiemap entry if we
>> have the space...
>>
>> It would be u64 * 4 if we go 1:1 on the file extent items, otherwise we
>> may cheap on offset and ram_bytes (u32 is enough for btrfs at least), in
>> that case it would be u64 * 2 + u32 * 2.
>>
>> But I'm also 100% sure, the extra members would not be welcomed by other
>> filesystems either.
>
> That's probably right, too many btrfs-specific information in the
> generic FIEMAP, but we may also do our own enhanced fiemap ioctl that
> would provide all the information you suggest and we'd be free to put
> the compression information there too.
I read this thread when it was first posted, but I don't understand what
these extra fields actually mean? Definitely adding the logical/physical
length makes sense for compressed extents, but I didn't see any clear
explanation of what these other fields actually mean?
I'm extrapolating something like btrfs has aggregated compressed chunks
that have multiple independent/disjoint blocks within a chunk, and you
are trying to get the exact offset within the compression byte stream
for the start of each block in the chunk?
Cheers, Andreas
[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]
prev parent reply other threads:[~2024-04-09 19:31 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-28 1:22 [PATCH v2 0/5] fiemap extension for more physical information Sweet Tea Dorminy
2024-03-28 1:22 ` [PATCH v2 1/5] fs: fiemap: add physical_length field to extents Sweet Tea Dorminy
2024-03-28 1:22 ` [PATCH v2 2/5] fs: fiemap: update fiemap_fill_next_extent() signature Sweet Tea Dorminy
2024-03-28 1:22 ` [PATCH v2 3/5] fs: fiemap: add new COMPRESSED extent state Sweet Tea Dorminy
2024-03-28 1:22 ` [PATCH v2 4/5] btrfs: fiemap: emit new COMPRESSED fiemap state Sweet Tea Dorminy
2024-03-28 1:22 ` [PATCH v2 5/5] btrfs: fiemap: return extent physical size Sweet Tea Dorminy
2024-03-31 9:03 ` Qu Wenruo
2024-04-03 5:32 ` Sweet Tea Dorminy
2024-04-03 5:52 ` Qu Wenruo
2024-04-03 7:18 ` Sweet Tea Dorminy
2024-04-03 6:02 ` Sweet Tea Dorminy
2024-04-03 7:19 ` Qu Wenruo
2024-04-09 18:52 ` David Sterba
2024-04-09 19:31 ` Andreas Dilger [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1868FA47-1303-4E14-BF35-4F89701A3572@dilger.ca \
--to=adilger@dilger.ca \
--cc=brauner@kernel.org \
--cc=clm@fb.com \
--cc=corbet@lwn.net \
--cc=dsterba@suse.com \
--cc=dsterba@suse.cz \
--cc=jack@suse.cz \
--cc=josef@toxicpanda.com \
--cc=kernel-team@meta.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=quwenruo.btrfs@gmx.com \
--cc=sweettea-kernel@dorminy.me \
--cc=viro@zeniv.linux.org.uk \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).