Linux-Fsdevel Archive mirror
 help / color / mirror / Atom feed
From: Zhang Yi <yi.zhang@huaweicloud.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
	djwong@kernel.org, hch@infradead.org, brauner@kernel.org,
	chandanbabu@kernel.org, jack@suse.cz, yi.zhang@huawei.com,
	chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: Re: [PATCH v3 3/3] xfs: correct the zeroing truncate range
Date: Thu, 23 May 2024 10:00:02 +0800	[thread overview]
Message-ID: <92cec08a-7fe7-1cc7-7a39-7f3d5fbc087b@huaweicloud.com> (raw)
In-Reply-To: <Zk6XqIcO+7+VPn35@dread.disaster.area>

On 2024/5/23 9:11, Dave Chinner wrote:
> On Wed, May 22, 2024 at 09:57:13AM +0800, Zhang Yi wrote:
>> On 2024/5/21 10:38, Dave Chinner wrote:
>>> We can do all this with a single writeback operation if we are a
>>> little bit smarter about the order of operations we perform and we
>>> are a little bit smarter in iomap about zeroing dirty pages in the
>>> page cache:
>>>
>>> 	1. change iomap_zero_range() to do the right thing with
>>> 	dirty unwritten and cow extents (the patch I've been working
>>> 	on).
>>>
>>> 	2. pass the range to be zeroed into iomap_truncate_page()
>>> 	(the fundamental change being made here).
>>>
>>> 	3. zero the required range *through the page cache*
>>> 	(iomap_zero_range() already does this).
>>>
>>> 	4. write back the XFS inode from ip->i_disk_size to the end
>>> 	of the range zeroed by iomap_truncate_page()
>>> 	(xfs_setattr_size() already does this).
>>>
>>> 	5. i_size_write(newsize);
>>>
>>> 	6. invalidate_inode_pages2_range(newsize, -1) to trash all
>>> 	the page cache beyond the new EOF without doing any zeroing
>>> 	as we've already done all the zeroing needed to the page
>>> 	cache through iomap_truncate_page().
>>>
>>>
>>> The patch I'm working on for step 1 is below. It still needs to be
>>> extended to handle the cow case, but I'm unclear on how to exercise
>>> that case so I haven't written the code to do it. The rest of it is
>>> just rearranging the code that we already use just to get the order
>>> of operations right. The only notable change in behaviour is using
>>> invalidate_inode_pages2_range() instead of truncate_pagecache(),
>>> because we don't want the EOF page to be dirtied again once we've
>>> already written zeroes to disk....
>>>
>>
>> Indeed, this sounds like the best solution. Since Darrick recommended
>> that we could fix the stale data exposure on realtime inode issue by
>> convert the tail extent to unwritten, I suppose we could do this after
>> fixing the problem.
> 
> We also need to fix the truncate issue for the upcoming forced
> alignment feature (for atomic writes), and in that case we are
> required to write zeroes to the entire tail extent. i.e. forced
> alignment does not allow partial unwritten extent conversion of
> the EOF extent.
> 

Yes, right. I noticed that feature also needs to fix.

> Hence I think we want to fix the problem by zeroing the entire EOF
> extent first, then optimise the large rtextsize case to use
> unwritten extents if that tail zeroing proves to be a performance
> issue.
> 
> I say "if" because the large rtextsize case will still need to write
> zeroes for the fsb that spans EOF. Adding conversion of the rest of
> the extent to unwritten may well be more expensive (in terms of both
> CPU and IO requirements for the transactional metadata updates) than
> just submitting a slightly larger IO containing real zeroes and
> leaving it as a written extent....
> 

Yeah, if the rtextsize if not large (in most cases), I'm pretty sure
that writing zeros would better. If the rtextsize is large enough, I
think it deserves a performance test.

Thanks,
Yi.


  reply	other threads:[~2024-05-23  2:00 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-17 11:13 [PATCH v3 0/3] iomap/xfs: fix stale data exposure when truncating realtime inodes Zhang Yi
2024-05-17 11:13 ` [PATCH v3 1/3] iomap: pass blocksize to iomap_truncate_page() Zhang Yi
2024-05-17 17:29   ` Darrick J. Wong
2024-05-18  2:01     ` Zhang Yi
2024-05-17 11:13 ` [PATCH v3 2/3] fsdax: pass blocksize to dax_truncate_page() Zhang Yi
2024-05-17 11:13 ` [PATCH v3 3/3] xfs: correct the zeroing truncate range Zhang Yi
2024-05-17 17:59   ` Darrick J. Wong
2024-05-18  6:35     ` Zhang Yi
2024-05-18 19:26       ` Darrick J. Wong
2024-05-20  6:56         ` Zhang Yi
2024-05-20  7:11           ` Zhang Yi
2024-05-20 18:37           ` Darrick J. Wong
2024-05-21 13:45             ` Zhang Yi
2024-05-21  2:38   ` Dave Chinner
2024-05-22  1:57     ` Zhang Yi
2024-05-23  1:11       ` Dave Chinner
2024-05-23  2:00         ` Zhang Yi [this message]
2024-05-22  3:00     ` Darrick J. Wong
2024-05-23  1:14       ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=92cec08a-7fe7-1cc7-7a39-7f3d5fbc087b@huaweicloud.com \
    --to=yi.zhang@huaweicloud.com \
    --cc=brauner@kernel.org \
    --cc=chandanbabu@kernel.org \
    --cc=chengzhihao1@huawei.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=yi.zhang@huawei.com \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).