Linux-BTRFS Archive mirror
 help / color / mirror / Atom feed
From: Andreas Dilger <adilger@dilger.ca>
To: David Sterba <dsterba@suse.cz>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>,
	Sweet Tea Dorminy <sweettea-kernel@dorminy.me>,
	Qu Wenruo <wqu@suse.com>, Jonathan Corbet <corbet@lwn.net>,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	linux-doc@vger.kernel.org, linux-btrfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH v2 5/5] btrfs: fiemap: return extent physical size
Date: Tue, 9 Apr 2024 13:31:18 -0600	[thread overview]
Message-ID: <1868FA47-1303-4E14-BF35-4F89701A3572@dilger.ca> (raw)
In-Reply-To: <20240409185247.GJ3492@twin.jikos.cz>

[-- Attachment #1: Type: text/plain, Size: 5431 bytes --]

On Apr 9, 2024, at 12:52 PM, David Sterba <dsterba@suse.cz> wrote:
> 
> On Wed, Apr 03, 2024 at 05:49:42PM +1030, Qu Wenruo wrote:
>> 
>> 
>> 在 2024/4/3 16:32, Sweet Tea Dorminy 写道:
>>>>> This means, we will emit a entry that uses the end to the physical
>>>>> extent end.
>>>>> 
>>>>> Considering a file layout like this:
>>>>> 
>>>>>      item 6 key (257 EXTENT_DATA 0) itemoff 15816 itemsize 53
>>>>>          generation 7 type 1 (regular)
>>>>>          extent data disk byte 13631488 nr 65536
>>>>>          extent data offset 0 nr 4096 ram 65536
>>>>>          extent compression 0 (none)
>>>>>      item 7 key (257 EXTENT_DATA 4096) itemoff 15763 itemsize 53
>>>>>          generation 8 type 1 (regular)
>>>>>          extent data disk byte 13697024 nr 4096
>>>>>          extent data offset 0 nr 4096 ram 4096
>>>>>          extent compression 0 (none)
>>>>>      item 8 key (257 EXTENT_DATA 8192) itemoff 15710 itemsize 53
>>>>>          generation 7 type 1 (regular)
>>>>>          extent data disk byte 13631488 nr 65536
>>>>>          extent data offset 8192 nr 57344 ram 65536
>>>>>          extent compression 0 (none)
>>>>> 
>>>>> For fiemap, we would got something like this:
>>>>> 
>>>>> fileoff 0, logical len 4k, phy 13631488, phy len 64K
>>>>> fileoff 4k, logical len 4k, phy 13697024, phy len 4k
>>>>> fileoff 8k, logical len 56k, phy 13631488 + 8k, phylen 56k
>>>>> 
>>>>> [HOW TO CALCULATE WASTED SPACE IN USER SPACE]
>>>>> My concern is on the first entry. It indicates that we have wasted
>>>>> 60K (phy len is 64K, while logical len is only 4K)
>>>>> 
>>>>> But that information is not correct, as in reality we only wasted 4K,
>>>>> the remaining 56K is still referred by file range [8K, 64K).
>>>>> 
>>>>> Do you mean that user space program should maintain a mapping of each
>>>>> utilized physical range, and when handling the reported file range
>>>>> [8K, 64K), the user space program should find that the physical range
>>>>> covers with one existing extent, and do calculation correctly?
>>>> 
>>>> My goal is to give an unprivileged interface for tools like compsize
>>>> to figure out how much space is used by a particular set of files.
>>>> They report the total disk space referenced by the provided list of
>>>> files, currently by doing a tree search (CAP_SYS_ADMIN) for all the
>>>> extents pertaining to the requested files and deduplicating extents
>>>> based on disk_bytenr.
>>>> 
>>>> It seems simplest to me for userspace for the kernel to emit the
>>>> entire extent for each part of it referenced in a file, and let
>>>> userspace deal with deduplicating extents. This is also most similar
>>>> to the existing tree-search based interface. Reporting whole extents
>>>> gives more flexibility for userspace to figure out how to report
>>>> bookend extents, or shared extents, or ...
>>>> 
>>>> It does seem a little weird where if you request with fiemap only e.g.
>>>> 4k-16k range in that example file you'll get reported all 68k
>>>> involved, but I can't figure out a way to fix that without having the
>>>> kernel keep track of used parts of the extents as part of reporting,
>>>> which sounds expensive.
>>>> 
>>>> You're right that I'm being inconsistent, taking off extent_offset
>>>> from the reported disk size when that isn't what I should be doing, so
>>>> I fixed that in v3.
>>> 
>>> Ah, I think I grasp a point I'd missed before.
>>> - Without setting disk_bytenr to the actual start of the data on disk,
>>> there's no way to find the location of the actual data on disk within
>>> the extent from fiemap alone
>> 
>> Yes, that's my point.
>> 
>>> - But reporting disk_bytenr + offset, to get actual start of data on
>>> disk, means we need to report a physical size to figure out the end of
>>> the extent and we can't know the beginning.
>> 
>> disk_bytenr + offset + disk_num_bytes, and with the existing things like
>> length (aka, num_bytes), filepos (aka, key.offset) flags
>> (compression/hole/preallocated etc), we have everything we need to know
>> for regular extents.
>> 
>> For compressed extents, we also need ram_bytes.
>> 
>> If you ask me, I'd say put all the extra members into fiemap entry if we
>> have the space...
>> 
>> It would be u64 * 4 if we go 1:1 on the file extent items, otherwise we
>> may cheap on offset and ram_bytes (u32 is enough for btrfs at least), in
>> that case it would be u64 * 2 + u32 * 2.
>> 
>> But I'm also 100% sure, the extra members would not be welcomed by other
>> filesystems either.
> 
> That's probably right, too many btrfs-specific information in the
> generic FIEMAP, but we may also do our own enhanced fiemap ioctl that
> would provide all the information you suggest and we'd be free to put
> the compression information there too.

I read this thread when it was first posted, but I don't understand what
these extra fields actually mean?  Definitely adding the logical/physical
length makes sense for compressed extents, but I didn't see any clear
explanation of what these other fields actually mean?

I'm extrapolating something like btrfs has aggregated compressed chunks
that have multiple independent/disjoint blocks within a chunk, and you
are trying to get the exact offset within the compression byte stream
for the start of each block in the chunk?

Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

      reply	other threads:[~2024-04-09 19:31 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-28  1:22 [PATCH v2 0/5] fiemap extension for more physical information Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 1/5] fs: fiemap: add physical_length field to extents Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 2/5] fs: fiemap: update fiemap_fill_next_extent() signature Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 3/5] fs: fiemap: add new COMPRESSED extent state Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 4/5] btrfs: fiemap: emit new COMPRESSED fiemap state Sweet Tea Dorminy
2024-03-28  1:22 ` [PATCH v2 5/5] btrfs: fiemap: return extent physical size Sweet Tea Dorminy
2024-03-31  9:03   ` Qu Wenruo
2024-04-03  5:32     ` Sweet Tea Dorminy
2024-04-03  5:52       ` Qu Wenruo
2024-04-03  7:18         ` Sweet Tea Dorminy
2024-04-03  6:02       ` Sweet Tea Dorminy
2024-04-03  7:19         ` Qu Wenruo
2024-04-09 18:52           ` David Sterba
2024-04-09 19:31             ` Andreas Dilger [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1868FA47-1303-4E14-BF35-4F89701A3572@dilger.ca \
    --to=adilger@dilger.ca \
    --cc=brauner@kernel.org \
    --cc=clm@fb.com \
    --cc=corbet@lwn.net \
    --cc=dsterba@suse.com \
    --cc=dsterba@suse.cz \
    --cc=jack@suse.cz \
    --cc=josef@toxicpanda.com \
    --cc=kernel-team@meta.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=quwenruo.btrfs@gmx.com \
    --cc=sweettea-kernel@dorminy.me \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).