Linux-XFS Archive mirror
 help / color / mirror / Atom feed
From: Wengang Wang <wen.gang.wang@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] xfs: allow changing extsize on file
Date: Mon, 13 May 2024 17:17:45 +0000	[thread overview]
Message-ID: <E0BFF9B4-C167-4534-A3F2-2C8BE8B9BCB5@oracle.com> (raw)
In-Reply-To: <Zj7QRSyiXTJ6Kbli@dread.disaster.area>

Thanks Dave for replying. I will think about it.

Wengang

> On May 10, 2024, at 6:56 PM, Dave Chinner <david@fromorbit.com> wrote:
> 
> On Wed, May 08, 2024 at 10:03:43AM -0700, Wengang Wang wrote:
>> Hi Dave, this is more a question than a patch.
>> 
>> We are current disallowing the change of extsize on files/dirs if the file/dir
>> have blocks allocated. That's not that friendly to users. Say somehow the
>> extsize was set very huge (1GiB), in the following cases, it's not that
> 
> The first problem is ensuring that "say somehow extsize was set very
> huge" doesn't happen in the first place. Then all the other problems
> just don't happen.
> 
>> convenient:
>> case 1: the file now extends very little. -- 1GiB extsize leads a waste of
>>        almost 1GiB.
>> case 2: when CoW happens, 1GiB is preallocated. 1GiB is now too big for the
>>        IO pattern, so the huge preallocting and then reclaiming is not necessary
>>        and that cost extra time especially when the system if fragmented.
>> 
>> In above cases, changing extsize smaller is needed.
>> 
>> In theory, the exthint is a hint for future allocation,
> 
> It's not that simple because future allocation is influenced by past
> allocation. e.g. What happens if the new extent size hint is not
> aligned with the old one and we now have two different extent
> alignments in the file?
> 
> What happens if an admin sees this when trying to triage some
> other problem and doesn't know that the extent size hint has been
> changed? They'll think there is a bug in the filesystem allocator
> and report it.
> 
> What do we do with that report now? Do we waste hours trying to
> reproduce it and fail, maybe never learning that the an extent
> size hint change caused the issue? i.e. how do we determine that the
> issue is a real allocation alignment bug versus it simply being a
> result of "application did something whacky with extent size hints"?
> 
> Hence allowing extent size hints to change dynamically basically
> makes it impossible to trust that the current extent size hint
> defines the alignment for all the extents in the file. And at that
> point, we completely lose the ability to triage allocation alignment
> issues without an exact reproducer from the reporter...
> 
> Now, just disabling extent size hints avoids this issue (i.e. allow
> return to zero if extents already exist) because there's now no
> alignment restriction at all and nobody is going to care. However,
> this creates new issues.
> 
> e.g it opens up the possibility that applications will scan existing
> files for extent size hints set on them and be able to -override the
> admin set alignment hints- used to create the data set.
> 
> The admin may have set inheritable extent size hints to ensure
> allocation alignment to underlying storage because the applications
> don't know about optimal storage alignments (e.g. for PMD alignment
> on DAX storage). We don't want applications to be able to disable
> these hints because the precise reason they are set is to optimise
> storage alignment for better application performance....
> 
> IOWs, there are good reasons for not allowing extent size hints to
> be overrridden by applications just by clearing/changing the inode
> extent size field...
> 
>> I can't connect it
>> to the blocks which are already allocated to the file/dir.
>> So the only reason why we disallow that is that there might be some problems if
>> we allow it.  Well, can we fix the real problem(s) rather than disallowing
>> extsize changing?
> 
> The only reliable way to change extent size hints so allocation
> alignment always matches the new extent size hint is to physically
> realign the data in the file to the new extent size hint. i.e. do it
> through xfs_fsr to "defrag" the file according to the new extent
> size hint. Then when we swap the old and new data extents, we also
> set the new extent size hint that matches the new data extents.
> 
> This extent size hint change is then enabled through a completely
> different interface which is not one applications will use in
> general operation. Hence it becomes an explicit admin operation,
> enabling users to rectify the rare problems you document above
> without compromising the existing behaviour of extent size hints for
> everyone else.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


  reply	other threads:[~2024-05-13 17:17 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-08 17:03 [PATCH] xfs: allow changing extsize on file Wengang Wang
2024-05-11  1:56 ` Dave Chinner
2024-05-13 17:17   ` Wengang Wang [this message]
2024-05-20  4:58 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=E0BFF9B4-C167-4534-A3F2-2C8BE8B9BCB5@oracle.com \
    --to=wen.gang.wang@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).