From: Justin Tobler <jltobler@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 5/9] reftable/block: move ownership of block reader into `struct table_iter`
Date: Tue, 2 Apr 2024 23:52:58 -0500 [thread overview]
Message-ID: <hwfkdfilniy46usnc3vnksaphdxboi5bxep4ek7aj2qxfhu332@6ym7dnn35k7z> (raw)
In-Reply-To: <f10882a0840a77f2569cf891374b70d1e84ceb4b.1711519925.git.ps@pks.im>
On 24/03/27 07:37AM, Patrick Steinhardt wrote:
> The table iterator allows the caller to iterate through all records in a
> reftable table. To do so it iterates through all blocks of the desired
> type one by one, where for each block it creates a new block iterator
> and yields all its entries.
>
> One of the things that is somewhat confusing in this context is who owns
> the block reader that is being used to read the blocks and pass them to
> the block iterator. Intuitively, as the table iterator is responsible
> for iterating through the blocks, one would assume that this iterator is
> also responsible for managing the lifecycle of the reader. And while it
> somewhat is, the block reader is ultimately stored inside of the block
> iterator.
>
> Refactor the code such that the block reader is instead fully managed by
> the table iterator. Instead of passing the reader to the block iterator,
> we now only end up passing the block data to it. Despite clearing up the
> lifecycle of the reader, it will also allow for better reuse of the
> reader in subsequent patches.
>
> The following benchmark prints a single matching ref out of 1 million
> refs. Before:
>
> HEAP SUMMARY:
> in use at exit: 13,603 bytes in 125 blocks
> total heap usage: 6,607 allocs, 6,482 frees, 509,635 bytes allocated
>
> After:
>
> HEAP SUMMARY:
> in use at exit: 13,603 bytes in 125 blocks
> total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated
>
> Note that while there are more allocation and free calls now, the
> overall number of bytes allocated is significantly lower. The number of
> allocations will be reduced significantly by the next patch though.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
...
> @@ -340,14 +344,14 @@ void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
> int block_iter_next(struct block_iter *it, struct reftable_record *rec)
> {
> struct string_view in = {
> - .buf = it->br->block.data + it->next_off,
> - .len = it->br->block_len - it->next_off,
> + .buf = (unsigned char *) it->block + it->next_off,
Would it be best to use the `uint8_t *` type instead of `unsigned char *`
to match `string_view.buf`? Not sure if it matters in this case.
> + .len = it->block_len - it->next_off,
> };
...
> diff --git a/reftable/block.h b/reftable/block.h
> index 601a1e0e89..b41efa5042 100644
> --- a/reftable/block.h
> +++ b/reftable/block.h
> @@ -84,16 +84,18 @@ int block_reader_init(struct block_reader *br, struct reftable_block *bl,
> void block_reader_release(struct block_reader *br);
>
> /* Returns the block type (eg. 'r' for refs) */
> -uint8_t block_reader_type(struct block_reader *r);
> +uint8_t block_reader_type(const struct block_reader *r);
>
> /* Decodes the first key in the block */
> -int block_reader_first_key(struct block_reader *br, struct strbuf *key);
> +int block_reader_first_key(const struct block_reader *br, struct strbuf *key);
>
> /* Iterate over entries in a block */
> struct block_iter {
> /* offset within the block of the next entry to read. */
> uint32_t next_off;
> - struct block_reader *br;
> + const unsigned char *block;
Same question here. Would it be better to use `uint8_t *`? Or does it not
really matter?
> + size_t block_len;
> + int hash_size;
>
> /* key for last entry we read. */
> struct strbuf last_key;
> @@ -106,17 +108,22 @@ struct block_iter {
> }
>
> /* Position `it` at start of the block */
> -void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
> +void block_iter_seek_start(struct block_iter *it, const struct block_reader *br);
>
> /* Position `it` to the `want` key in the block */
> -int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
> +int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
> struct strbuf *want);
>
> -void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
> +void block_iter_copy_from(struct block_iter *dest, const struct block_iter *src);
>
> /* return < 0 for error, 0 for OK, > 0 for EOF. */
> int block_iter_next(struct block_iter *it, struct reftable_record *rec);
>
> +/*
> + * Reset the block iterator to pristine state without releasing its memory.
> + */
Do we want to make the comment a single line to match the other adjacent
examples?
> +void block_iter_reset(struct block_iter *it);
> +
> /* deallocate memory for `it`. The block reader and its block is left intact. */
> void block_iter_close(struct block_iter *it);
>
...
-Justin
next prev parent reply other threads:[~2024-04-03 4:53 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-27 6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
2024-03-27 6:36 ` [PATCH 1/9] reftable/block: rename `block_reader_start()` Patrick Steinhardt
2024-03-27 6:37 ` [PATCH 2/9] reftable/block: merge `block_iter_seek()` and `block_reader_seek()` Patrick Steinhardt
2024-03-27 6:37 ` [PATCH 3/9] reftable/block: better grouping of functions Patrick Steinhardt
2024-03-27 6:37 ` [PATCH 4/9] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
2024-04-03 13:16 ` Karthik Nayak
2024-04-08 12:10 ` Patrick Steinhardt
2024-03-27 6:37 ` [PATCH 5/9] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
2024-04-03 4:52 ` Justin Tobler [this message]
2024-04-03 13:10 ` Patrick Steinhardt
2024-03-27 6:37 ` [PATCH 6/9] reftable/reader: iterate to next block in place Patrick Steinhardt
2024-03-27 6:37 ` [PATCH 7/9] reftable/block: reuse uncompressed blocks Patrick Steinhardt
2024-03-27 6:37 ` [PATCH 8/9] reftable/block: open-code call to `uncompress2()` Patrick Steinhardt
2024-03-27 6:37 ` [PATCH 9/9] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
2024-04-03 13:33 ` [PATCH 0/9] reftable: optimize table and block iterators Karthik Nayak
2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
2024-04-08 12:16 ` [PATCH v2 01/10] reftable/block: rename `block_reader_start()` Patrick Steinhardt
2024-04-08 12:16 ` [PATCH v2 02/10] reftable/block: merge `block_iter_seek()` and `block_reader_seek()` Patrick Steinhardt
2024-04-08 12:16 ` [PATCH v2 03/10] reftable/block: better grouping of functions Patrick Steinhardt
2024-04-08 12:16 ` [PATCH v2 04/10] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
2024-04-08 12:16 ` [PATCH v2 05/10] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
2024-04-08 12:16 ` [PATCH v2 06/10] reftable/reader: iterate to next block in place Patrick Steinhardt
2024-04-08 12:16 ` [PATCH v2 07/10] reftable/block: reuse uncompressed blocks Patrick Steinhardt
2024-04-08 12:16 ` [PATCH v2 08/10] reftable/block: open-code call to `uncompress2()` Patrick Steinhardt
2024-04-08 12:17 ` [PATCH v2 09/10] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
2024-04-10 10:15 ` Karthik Nayak
2024-04-08 12:17 ` [PATCH v2 10/10] reftable/block: avoid copying block iterators on seek Patrick Steinhardt
2024-04-09 1:29 ` Justin Tobler
2024-04-09 3:18 ` Patrick Steinhardt
2024-04-09 1:32 ` [PATCH v2 00/10] reftable: optimize table and block iterators Justin Tobler
2024-04-10 11:35 ` Karthik Nayak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=hwfkdfilniy46usnc3vnksaphdxboi5bxep4ek7aj2qxfhu332@6ym7dnn35k7z \
--to=jltobler@gmail.com \
--cc=git@vger.kernel.org \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).