Git Mailing List Archive mirror
From: Justin Tobler <jltobler@gmail.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v2 7/7] reftable/block: avoid decoding keys when searching restart points
Date: Tue, 2 Apr 2024 11:47:16 -0500
Message-ID: <eiyd2nmwxjaetkux4prwm6adcx7z77ry3wc62art6gnfklvgmw@hox32vwuu5sj>
In-Reply-To: <e751b3c536ace78f975b7d2553c22dbf6845a8d4.1711361340.git.ps@pks.im>

On 24/03/25 11:11AM, Patrick Steinhardt wrote:
> When searching over restart points in a block we decode the key of each
> of the records, which results in a memory allocation. This is quite
> pointless though given that records at restart points will never use
> prefix compression and thus store their keys verbatim in the block.
> 
> Refactor the code so that we can avoid decoding the keys, which saves us
> some allocations.

Out of curiosity, do you have any benchmarks around this change and
would that be something we would want to add to the commit message?

-Justin

> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  reftable/block.c | 29 +++++++++++++++++++----------
>  1 file changed, 19 insertions(+), 10 deletions(-)
> 
> diff --git a/reftable/block.c b/reftable/block.c
> index ca80a05e21..8bb4e43cec 100644
> --- a/reftable/block.c
> +++ b/reftable/block.c
> @@ -287,23 +287,32 @@ static int restart_needle_less(size_t idx, void *_args)
>  		.buf = args->reader->block.data + off,
>  		.len = args->reader->block_len - off,
>  	};
> -	struct strbuf kth_restart_key = STRBUF_INIT;
> -	uint8_t unused_extra;
> -	int result, n;
> +	uint64_t prefix_len, suffix_len;
> +	uint8_t extra;
> +	int n;
>  
>  	/*
> -	 * TODO: The restart key is verbatim in the block, so we can in theory
> -	 * avoid decoding the key and thus save some allocations.
> +	 * Records at restart points are stored without prefix compression, so
> +	 * there is no need to fully decode the record key here. This removes
> +	 * the need for allocating memory.
>  	 */
> -	n = reftable_decode_key(&kth_restart_key, &unused_extra, in);
> -	if (n < 0) {
> +	n = reftable_decode_keylen(in, &prefix_len, &suffix_len, &extra);
> +	if (n < 0 || prefix_len) {
>  		args->error = 1;
>  		return -1;
>  	}
>  
> -	result = strbuf_cmp(&args->needle, &kth_restart_key);
> -	strbuf_release(&kth_restart_key);
> -	return result < 0;
> +	string_view_consume(&in, n);
> +	if (suffix_len > in.len) {
> +		args->error = 1;
> +		return -1;
> +	}
> +
> +	n = memcmp(args->needle.buf, in.buf,
> +		   args->needle.len < suffix_len ? args->needle.len : suffix_len);
> +	if (n)
> +		return n < 0;
> +	return args->needle.len < suffix_len;
>  }
>  
>  void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
> -- 
> 2.44.GIT
> 
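For readers following the thread, the allocation-free comparison at the
end of the patched restart_needle_less() can be looked at in isolation.
The following is only a minimal sketch under stated assumptions: the
helper name needle_less_than_key() and its plain pointer/length
parameters are hypothetical and not part of reftable. It assumes the key
bytes are stored verbatim (prefix length of zero), as the commit message
says is the case for records at restart points.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not reftable API: returns 1 if the needle sorts
 * strictly before the key and 0 otherwise. The key is compared in place,
 * so nothing is copied or allocated.
 */
static int needle_less_than_key(const char *needle, size_t needle_len,
				const void *key, size_t key_len)
{
	/* Compare only the bytes both strings have in common. */
	size_t min_len = needle_len < key_len ? needle_len : key_len;
	int cmp = memcmp(needle, key, min_len);

	if (cmp)
		return cmp < 0;
	/* Common bytes are equal, so the shorter string sorts first. */
	return needle_len < key_len;
}

/* Small self-check covering the ordering cases. */
static void self_check(void)
{
	assert(needle_less_than_key("refs/heads/a", 12, "refs/heads/b", 12) == 1);
	assert(needle_less_than_key("refs/heads/b", 12, "refs/heads/a", 12) == 0);
	assert(needle_less_than_key("refs/heads", 10, "refs/heads/main", 15) == 1);
	assert(needle_less_than_key("refs/heads/main", 15, "refs/heads", 10) == 0);
	assert(needle_less_than_key("refs/heads/a", 12, "refs/heads/a", 12) == 0);
}
```

The length tie-break is what makes a needle that is a strict prefix of
the key compare as smaller, matching the final
`return args->needle.len < suffix_len;` in the patch.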


