From: "Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>
To: Matthew Wilcox <willy@infradead.org>,
david@fromorbit.com, djwong@kernel.org, hch@lst.de
Cc: Keith Busch <kbusch@kernel.org>,
mcgrof@kernel.org, akpm@linux-foundation.org, brauner@kernel.org,
chandan.babu@oracle.com, gost.dev@samsung.com, hare@suse.de,
john.g.garry@oracle.com, linux-block@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-xfs@vger.kernel.org, p.raghav@samsung.com,
ritesh.list@gmail.com, ziy@nvidia.com
Subject: Re: [RFC] iomap: use huge zero folio in iomap_dio_zero
Date: Wed, 15 May 2024 15:59:43 +0000 [thread overview]
Message-ID: <20240515155943.2uaa23nvddmgtkul@quentin> (raw)
In-Reply-To: <ZkQ0Pj26H81HxQ_4@casper.infradead.org>
> so unless submit_bio() can handle the fallback to "create a new bio
> full of zeroes and resubmit it to the device" if the original fails,
> we're a little mismatched. I'm not really familiar with either part of
> this code, so I don't have much in the way of bright ideas. Perhaps
> we go back to the "allocate a large folio at filesystem mount" plan.
So one thing that became clear after yesterday's discussion was that we
should **not** use a PMD-sized page for sub-block zeroing: on some
architectures it would tie up a lot of memory just to zero out a 64k FS
block (e.g. on arm64 with 64k base pages, a PMD-sized folio is 512 MiB).
So Chinner proposed using the iomap_init function to allocate a
large zero folio that could be used in iomap_dio_zero().
The general agreement was that a 64k large folio is enough for now. We
could always increase it and optimize it in the future when required.
This is a rough prototype of what it might look like:
diff --git a/fs/internal.h b/fs/internal.h
index 7ca738904e34..dad5734b2f75 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -35,6 +35,12 @@ static inline void bdev_cache_init(void)
int __block_write_begin_int(struct folio *folio, loff_t pos, unsigned len,
get_block_t *get_block, const struct iomap *iomap);
+/*
+ * iomap/buffered-io.c
+ */
+
+extern struct folio *zero_fsb_folio;
+
/*
* char_dev.c
*/
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 4e8e41c8b3c0..48235765df7a 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -42,6 +42,7 @@ struct iomap_folio_state {
};
static struct bio_set iomap_ioend_bioset;
+struct folio *zero_fsb_folio;
static inline bool ifs_is_fully_uptodate(struct folio *folio,
struct iomap_folio_state *ifs)
@@ -1985,8 +1986,15 @@ iomap_writepages(struct address_space *mapping, struct writeback_control *wbc,
}
EXPORT_SYMBOL_GPL(iomap_writepages);
+
static int __init iomap_init(void)
{
+ void *addr = kzalloc(16 * PAGE_SIZE, GFP_KERNEL);
+
+ if (!addr)
+ return -ENOMEM;
+
+ zero_fsb_folio = virt_to_folio(addr);
return bioset_init(&iomap_ioend_bioset, 4 * (PAGE_SIZE / SECTOR_SIZE),
offsetof(struct iomap_ioend, io_bio),
BIOSET_NEED_BVECS);
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index f3b43d223a46..59a65c3ccf13 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -236,17 +236,23 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
loff_t pos, unsigned len)
{
struct inode *inode = file_inode(dio->iocb->ki_filp);
- struct page *page = ZERO_PAGE(0);
struct bio *bio;
- bio = iomap_dio_alloc_bio(iter, dio, 1, REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
+	/*
+	 * The zero folio is 16 * PAGE_SIZE (64k with 4k base pages).
+	 */
+	WARN_ON_ONCE(len > (16 * PAGE_SIZE));
+
+ bio = iomap_dio_alloc_bio(iter, dio, BIO_MAX_VECS,
+ REQ_OP_WRITE | REQ_SYNC | REQ_IDLE);
fscrypt_set_bio_crypt_ctx(bio, inode, pos >> inode->i_blkbits,
GFP_KERNEL);
+
bio->bi_iter.bi_sector = iomap_sector(&iter->iomap, pos);
bio->bi_private = dio;
bio->bi_end_io = iomap_dio_bio_end_io;
- __bio_add_page(bio, page, len, 0);
+ bio_add_folio_nofail(bio, zero_fsb_folio, len, 0);
iomap_dio_submit_bio(iter, dio, bio, pos);
}