Git Mailing List Archive mirror
* [PATCH] reftable/block: fix binary search over restart counter
@ 2024-03-07 15:26 Patrick Steinhardt
  2024-03-07 16:24 ` Patrick Steinhardt
  2024-03-07 20:35 ` [PATCH v2 0/2] " Patrick Steinhardt
  0 siblings, 2 replies; 9+ messages in thread
From: Patrick Steinhardt @ 2024-03-07 15:26 UTC
  To: git

Records store their keys prefix-compressed. As many records will share a
common prefix (e.g. "refs/heads/"), this can end up saving quite a bit
of disk space. The downside of this is that it is not possible to just
seek into the middle of a block and consume the corresponding record
because it may depend on prefixes read from preceding records.

To help with this use case, the reftable format writes every n'th record
without using prefix compression, which is called a "restart". The list
of restarts is stored at the end of each block so that a reader can
figure out entry points at which to read a full record without having to
read all preceding records.

This allows us to do a binary search over the records in a block when
searching for a particular key by iterating through the restarts until
we have found the section in which our record must be located. From
there on we perform a linear search to locate the desired record.
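The restart-based seek can be sketched as follows. This is a simplified model rather than the actual reftable code: `binsearch`, `seek_start_restart` and the argument struct are invented names, and real restarts store byte offsets into the block rather than the keys themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Smallest index in [0, sz) for which f(idx) is true, or sz if f is
 * false everywhere. f must be monotonic: false for a prefix of the
 * indices, then true for the rest. */
static size_t binsearch(size_t sz, int (*f)(size_t idx, void *args), void *args)
{
	size_t lo = 0, hi = sz;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (f(mid, args))
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}

struct seek_args {
	const char **restart_keys; /* full, uncompressed keys at the restarts */
	const char *wanted;        /* key we are seeking */
};

/* True when the restart key at idx sorts strictly after the search key. */
static int restart_key_greater(size_t idx, void *p)
{
	struct seek_args *args = p;
	return strcmp(args->wanted, args->restart_keys[idx]) < 0;
}

/* The linear scan must start at the last restart whose key is not
 * greater than the search key. */
static size_t seek_start_restart(struct seek_args *args, size_t nrestarts)
{
	size_t i = binsearch(nrestarts, restart_key_greater, args);
	return i > 0 ? i - 1 : 0;
}
```

From the returned restart the reader decodes records sequentially, since only restart records are stored without prefix compression.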

This mechanism is broken though. In `block_reader_seek()` we call
`binsearch()` over the count of restarts in the current block. The
function we pass to compare records with each other computes the key at
the current index and then compares it to our search key by calling
`strbuf_cmp()`, returning its result directly. But `binsearch()` expects
us to return a truish value that indicates whether the key at the current
index is smaller than the searched-for key. And unless our key exactly
matches the value at the restart counter, we always end up returning a
truish value.

The consequence is that `binsearch()` essentially always returns 0,
indicating to us that we must start searching right at the beginning of
the block. This works by chance because we now always do a linear scan
from the start of the block, and thus we would still end up finding the
desired record. But needless to say, this makes the optimization quite
useless.
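The failure mode can be reproduced in miniature (invented names; this is not the reftable code itself). A callback that returns the raw comparison result is truthy for both "smaller" and "greater", so the search collapses to index 0 whenever the key does not exactly match a probed element:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Smallest index in [0, sz) for which f(idx) is true, or sz if none. */
static size_t binsearch(size_t sz, int (*f)(size_t idx, void *args), void *args)
{
	size_t lo = 0, hi = sz;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (f(mid, args))
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}

static const char *elements[] = { "a", "c", "e", "g" };

/* Buggy: any nonzero comparison result counts as "true", so the
 * predicate looks satisfied at almost every probed index and the
 * search converges on index 0. */
static int buggy_less(size_t idx, void *key)
{
	return strcmp((const char *)key, elements[idx]);
}

/* Fixed: a proper boolean predicate. */
static int fixed_less(size_t idx, void *key)
{
	return strcmp((const char *)key, elements[idx]) < 0;
}
```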

Fix this bug by returning whether the current key is smaller than the
searched key. As the current behaviour was correct, it is not possible to
write a test. Furthermore, it is not really possible to demonstrate
in a benchmark that this fix speeds up seeking records.

This may cause the reader to question whether this binary search makes
sense in the first place if it doesn't even help with performance. But
it would end up helping if we were to read a reftable with a much larger
block size. Blocks can be up to 16MB in size, in which case it will
become much more important to avoid the linear scan. We are not yet
ready to read or write such larger blocks though, so we have to live
without a benchmark demonstrating this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reftable/block.c b/reftable/block.c
index 72eb73b380..3d7a7022e7 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -302,7 +302,7 @@ static int restart_key_less(size_t idx, void *args)
 
 	result = strbuf_cmp(&a->key, &rkey);
 	strbuf_release(&rkey);
-	return result;
+	return result <= 0;
 }
 
 void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
-- 
2.44.0



* Re: [PATCH] reftable/block: fix binary search over restart counter
  2024-03-07 15:26 [PATCH] reftable/block: fix binary search over restart counter Patrick Steinhardt
@ 2024-03-07 16:24 ` Patrick Steinhardt
  2024-03-07 20:35 ` [PATCH v2 0/2] " Patrick Steinhardt
  1 sibling, 0 replies; 9+ messages in thread
From: Patrick Steinhardt @ 2024-03-07 16:24 UTC
  To: git

On Thu, Mar 07, 2024 at 04:26:38PM +0100, Patrick Steinhardt wrote:
> Records store their keys prefix-compressed. As many records will share a
> common prefix (e.g. "refs/heads/"), this can end up saving quite a bit
> of disk space. The downside of this is that it is not possible to just
> seek into the middle of a block and consume the corresponding record
> because it may depend on prefixes read from preceding records.
> 
> To help with this use case, the reftable format writes every n'th record
> without using prefix compression, which is called a "restart". The list
> of restarts is stored at the end of each block so that a reader can
> figure out entry points at which to read a full record without having to
> read all preceding records.
> 
> This allows us to do a binary search over the records in a block when
> searching for a particular key by iterating through the restarts until
> we have found the section in which our record must be located. From
> there on we perform a linear search to locate the desired record.
> 
> This mechanism is broken though. In `block_reader_seek()` we call
> `binsearch()` over the count of restarts in the current block. The
> function we pass to compare records with each other computes the key at
> the current index and then compares it to our search key by calling
> `strbuf_cmp()`, returning its result directly. But `binsearch()` expects
> us to return a truish value that indicates whether the current index is
> smaller than the searched-for key. And unless our key exactly matches
> the value at the restart counter we always end up returning a truish
> value.
> 
> The consequence is that `binsearch()` essentially always returns 0,
> indicating to us that we must start searching right at the beginning of
> the block. This works by chance because we now always do a linear scan
> from the start of the block, and thus we would still end up finding the
> desired record. But needless to say, this makes the optimization quite
> useless.
> 
> Fix this bug by returning whether the current key is smaller than the
> searched key. As the current behaviour was correct it is not possible to
> write a test. Furthermore it is also not really possible to demonstrate
> in a benchmark that this fix speeds up seeking records.
> 
> This may cause the reader to question whether this binary search makes
> sense in the first place if it doesn't even help with performance. But
> it would end up helping if we were to read a reftable with a much larger
> block size. Blocks can be up to 16MB in size, in which case it will
> become much more important to avoid the linear scan. We are not yet
> ready to read or write such larger blocks though, so we have to live
> without a benchmark demonstrating this.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  reftable/block.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/reftable/block.c b/reftable/block.c
> index 72eb73b380..3d7a7022e7 100644
> --- a/reftable/block.c
> +++ b/reftable/block.c
> @@ -302,7 +302,7 @@ static int restart_key_less(size_t idx, void *args)
>  
>  	result = strbuf_cmp(&a->key, &rkey);
>  	strbuf_release(&rkey);
> -	return result;
> +	return result <= 0;
>  }
>  
>  void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
> -- 
> 2.44.0
> 

Hum. I noticed that this causes a memory leak in our CI. I'll need to
investigate.

Patrick


* [PATCH v2 0/2] reftable/block: fix binary search over restart counter
  2024-03-07 15:26 [PATCH] reftable/block: fix binary search over restart counter Patrick Steinhardt
  2024-03-07 16:24 ` Patrick Steinhardt
@ 2024-03-07 20:35 ` Patrick Steinhardt
  2024-03-07 20:35   ` [PATCH v2 1/2] reftable/record: fix memory leak when decoding object records Patrick Steinhardt
  2024-03-07 20:36   ` [PATCH v2 2/2] reftable/block: fix binary search over restart counter Patrick Steinhardt
  1 sibling, 2 replies; 9+ messages in thread
From: Patrick Steinhardt @ 2024-03-07 20:35 UTC
  To: git

Hi,

this is the second version of my patch series that fixes the binary
search over block restart counters. I've added another commit on top that
fixes a memory leak that was uncovered by the fix.

Patrick

Patrick Steinhardt (2):
  reftable/record: fix memory leak when decoding object records
  reftable/block: fix binary search over restart counter

 reftable/block.c  | 2 +-
 reftable/record.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)


base-commit: 43072b4ca132437f21975ac6acc6b72dc22fd398
-- 
2.43.1



* [PATCH v2 1/2] reftable/record: fix memory leak when decoding object records
  2024-03-07 20:35 ` [PATCH v2 0/2] " Patrick Steinhardt
@ 2024-03-07 20:35   ` Patrick Steinhardt
  2024-03-07 20:36   ` [PATCH v2 2/2] reftable/block: fix binary search over restart counter Patrick Steinhardt
  1 sibling, 0 replies; 9+ messages in thread
From: Patrick Steinhardt @ 2024-03-07 20:35 UTC
  To: git

When decoding records it is customary to reuse a `struct
reftable_ref_record` across calls. Thus, it may happen that the record
already holds some allocated memory. When decoding ref and log records
we handle this by releasing or reallocating held memory. But we fail to
do this for object records, which causes us to leak memory.

Fix this memory leak by releasing object records before we decode into
them. We may eventually want to reuse memory instead to avoid needless
reallocations. But for now, let's just plug the leak and be done.
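The shape of the fix can be illustrated with a hypothetical record type (the names below are invented for illustration, not reftable's actual structures): a decode function that may be handed a reused record releases held memory before allocating anew.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct obj_record {
	char *hash_prefix;
	size_t hash_prefix_len;
};

static void obj_record_release(struct obj_record *r)
{
	free(r->hash_prefix);
	r->hash_prefix = NULL;
	r->hash_prefix_len = 0;
}

/* Decode a key into the record. Releasing up front means a record
 * reused across calls does not leak its previous allocation. */
static int obj_record_decode(struct obj_record *r, const char *key, size_t len)
{
	obj_record_release(r);

	r->hash_prefix = malloc(len);
	if (!r->hash_prefix)
		return -1;
	memcpy(r->hash_prefix, key, len);
	r->hash_prefix_len = len;
	return 0;
}
```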

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/record.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/reftable/record.c b/reftable/record.c
index d6bb42e887..9c31feb35c 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -569,6 +569,8 @@ static int reftable_obj_record_decode(void *rec, struct strbuf key,
 	uint64_t last;
 	int j;
 
+	reftable_obj_record_release(r);
+
 	REFTABLE_ALLOC_ARRAY(r->hash_prefix, key.len);
 	memcpy(r->hash_prefix, key.buf, key.len);
 	r->hash_prefix_len = key.len;
-- 
2.43.1



* [PATCH v2 2/2] reftable/block: fix binary search over restart counter
  2024-03-07 20:35 ` [PATCH v2 0/2] " Patrick Steinhardt
  2024-03-07 20:35   ` [PATCH v2 1/2] reftable/record: fix memory leak when decoding object records Patrick Steinhardt
@ 2024-03-07 20:36   ` Patrick Steinhardt
  2024-03-07 23:29     ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Patrick Steinhardt @ 2024-03-07 20:36 UTC
  To: git

Records store their keys prefix-compressed. As many records will share a
common prefix (e.g. "refs/heads/"), this can end up saving quite a bit
of disk space. The downside of this is that it is not possible to just
seek into the middle of a block and consume the corresponding record
because it may depend on prefixes read from preceding records.

To help with this use case, the reftable format writes every n'th record
without using prefix compression, which is called a "restart". The list
of restarts is stored at the end of each block so that a reader can
figure out entry points at which to read a full record without having to
read all preceding records.

This allows us to do a binary search over the records in a block when
searching for a particular key by iterating through the restarts until
we have found the section in which our record must be located. From
there on we perform a linear search to locate the desired record.

This mechanism is broken though. In `block_reader_seek()` we call
`binsearch()` over the count of restarts in the current block. The
function we pass to compare records with each other computes the key at
the current index and then compares it to our search key by calling
`strbuf_cmp()`, returning its result directly. But `binsearch()` expects
us to return a truish value that indicates whether the key at the current
index is smaller than the searched-for key. And unless our key exactly
matches the value at the restart counter, we always end up returning a
truish value.

The consequence is that `binsearch()` essentially always returns 0,
indicating to us that we must start searching right at the beginning of
the block. This works by chance because we now always do a linear scan
from the start of the block, and thus we would still end up finding the
desired record. But needless to say, this makes the optimization quite
useless.

Fix this bug by returning whether the current key is smaller than the
searched key. As the current behaviour was correct, it is not possible to
write a test. Furthermore, it is not really possible to demonstrate
in a benchmark that this fix speeds up seeking records.

This may cause the reader to question whether this binary search makes
sense in the first place if it doesn't even help with performance. But
it would end up helping if we were to read a reftable with a much larger
block size. Blocks can be up to 16MB in size, in which case it will
become much more important to avoid the linear scan. We are not yet
ready to read or write such larger blocks though, so we have to live
without a benchmark demonstrating this.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reftable/block.c b/reftable/block.c
index 72eb73b380..1663030386 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -302,7 +302,7 @@ static int restart_key_less(size_t idx, void *args)
 
 	result = strbuf_cmp(&a->key, &rkey);
 	strbuf_release(&rkey);
-	return result;
+	return result < 0;
 }
 
 void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
-- 
2.43.1



* Re: [PATCH v2 2/2] reftable/block: fix binary search over restart counter
  2024-03-07 20:36   ` [PATCH v2 2/2] reftable/block: fix binary search over restart counter Patrick Steinhardt
@ 2024-03-07 23:29     ` Junio C Hamano
  2024-03-08  0:40       ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2024-03-07 23:29 UTC
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> The consequence is that `binsearch()` essentially always returns 0,
> indicating to us that we must start searching right at the beginning of
> the block. This works by chance because we now always do a linear scan
> from the start of the block, and thus we would still end up finding the
> desired record. But needless to say, this makes the optimization quite
> useless.
>
> Fix this bug by returning whether the current key is smaller than the
> searched key. As the current behaviour was correct it is not possible to
> write a test. Furthermore it is also not really possible to demonstrate
> in a benchmark that this fix speeds up seeking records.

This is an amusing bug.  

I wonder if we inherited it from the original implementation---this
was imported from jgit, right?

Thanks for a detailed write-up.  

The "it is a fix, but the breakage is well hidden and cannot be
observed just by checking for correctness" aspect of the bug
deserves the unusually large "number of paragraphs explaining the
change divided by number of changed lines" ratio ;-).

Applied.

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  reftable/block.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/reftable/block.c b/reftable/block.c
> index 72eb73b380..1663030386 100644
> --- a/reftable/block.c
> +++ b/reftable/block.c
> @@ -302,7 +302,7 @@ static int restart_key_less(size_t idx, void *args)
>  
>  	result = strbuf_cmp(&a->key, &rkey);
>  	strbuf_release(&rkey);
> -	return result;
> +	return result < 0;
>  }
>  
>  void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)


* Re: [PATCH v2 2/2] reftable/block: fix binary search over restart counter
  2024-03-07 23:29     ` Junio C Hamano
@ 2024-03-08  0:40       ` Junio C Hamano
  2024-03-11  5:18         ` Patrick Steinhardt
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2024-03-08  0:40 UTC
  To: Patrick Steinhardt; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>> The consequence is that `binsearch()` essentially always returns 0,
>> indicating to us that we must start searching right at the beginning of
>> the block. This works by chance because we now always do a linear scan
>> from the start of the block, and thus we would still end up finding the
>> desired record. But needless to say, this makes the optimization quite
>> useless.

>> Fix this bug by returning whether the current key is smaller than the
>> searched key. As the current behaviour was correct it is not possible to
>> write a test. Furthermore it is also not really possible to demonstrate
>> in a benchmark that this fix speeds up seeking records.
>
> This is an amusing bug.  

Having said all that.

I have to wonder if the real culprit is the custom implementation of
binsearch() that reftable/basics.c has, not this particular comparison
callback.  It places an unusual expectation on the comparison
function, unlike bsearch(3) whose compar(a,b) is expected to return
an answer with the same sign as "a - b".

I just checked the binary search loops we have in the core part of
the system, like the one in hash-lookup.c (which takes advantage of
the random and uniform nature of hashed values to converge faster
than log2) and ones in builtin/pack-objects.c (both of which are
absolute bog-standard).  Luckily, we do not use such an unusual
convention (well, we avoid overhead of compar callbacks to begin
with, so it is a bit of apples-to-oranges comparison).
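The two conventions can be put side by side in a small sketch (`first_true` and `not_less` are invented names mirroring the predicate style, while `bsearch(3)` is the stdlib routine):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* bsearch(3) convention: compar(a, b) returns a value with the same
 * sign as "a - b". */
static int cmp_strp(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Predicate convention: return the smallest index in [0, sz) for
 * which pred is true, or sz if it is true nowhere. */
static size_t first_true(size_t sz, int (*pred)(size_t idx, void *args), void *args)
{
	size_t lo = 0, hi = sz;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (pred(mid, args))
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}

static const char *haystack[] = { "a", "c", "e", "g" };

/* True once the probed element is >= the key: monotonic and boolean. */
static int not_less(size_t idx, void *key)
{
	return strcmp(haystack[idx], (const char *)key) >= 0;
}
```

Under the comparator convention every sign of the return value is meaningful; under the predicate convention only zero versus nonzero is, which is exactly the distinction the buggy callback missed.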



* Re: [PATCH v2 2/2] reftable/block: fix binary search over restart counter
  2024-03-08  0:40       ` Junio C Hamano
@ 2024-03-11  5:18         ` Patrick Steinhardt
  2024-03-11 17:33           ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick Steinhardt @ 2024-03-11  5:18 UTC
  To: Junio C Hamano; +Cc: git

On Thu, Mar 07, 2024 at 04:40:46PM -0800, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Patrick Steinhardt <ps@pks.im> writes:
> >
> >> The consequence is that `binsearch()` essentially always returns 0,
> >> indicating to us that we must start searching right at the beginning of
> >> the block. This works by chance because we now always do a linear scan
> >> from the start of the block, and thus we would still end up finding the
> >> desired record. But needless to say, this makes the optimization quite
> >> useless.
> 
> >> Fix this bug by returning whether the current key is smaller than the
> >> searched key. As the current behaviour was correct it is not possible to
> >> write a test. Furthermore it is also not really possible to demonstrate
> >> in a benchmark that this fix speeds up seeking records.
> >
> > This is an amusing bug.  
> 
> Having said all that.
> 
> I have to wonder if it is the custom implementation of binsearch()
> the reftable/basic.c file has, not this particular comparison
> callback.  It makes an unusual expectation on the comparison
> function, unlike bsearch(3) whose compar(a,b) is expected to return
> an answer with the same sign as "a - b".
> 
> I just checked the binary search loops we have in the core part of
> the system, like the one in hash-lookup.c (which takes advantage of
> the random and uniform nature of hashed values to converge faster
> than log2) and ones in builtin/pack-objects.c (both of which are
> absolute bog-standard).  Luckily, we do not use such an unusual
> convention (well, we avoid overhead of compar callbacks to begin
> with, so it is a bit of apples-to-oranges comparison).

Very true, this behaviour caught me by surprise as well, and I do think
it's quite easy to get wrong. Now I would've understood if `binsearch()`
were able to handle and forward errors to the caller by passing -1. And
I almost thought that was the case because `restart_key_less()` can
indeed fail, and it would return a negative value if so. But that error
return code is then not taken as an indicator of failure, but instead
will cause us to treat the current value as smaller than the comparison
key.

But we do know to bubble the error up via the passed-in args by setting
`args->error = -1`. Funny thing though: I just now noticed that we check
for `args.error` _before_ we call `binsearch()`. Oops.
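The control-flow problem can be shown in a tiny sketch (all names invented): an error flag that the comparison callback sets during the search can never be seen by a check that runs before the search.

```c
#include <assert.h>
#include <stddef.h>

struct search_args {
	int error;
};

/* A comparison predicate that always fails, reporting the failure
 * through the args struct because the return value cannot carry it. */
static int failing_pred(size_t idx, void *p)
{
	(void)idx;
	((struct search_args *)p)->error = -1;
	return 0;
}

static size_t first_true(size_t sz, int (*pred)(size_t idx, void *args), void *args)
{
	size_t lo = 0, hi = sz;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (pred(mid, args))
			hi = mid;
		else
			lo = mid + 1;
	}
	return lo;
}

/* Broken ordering: the check runs before the search, so it always
 * sees a clean flag. */
static int seek_checking_too_early(size_t sz)
{
	struct search_args args = { 0 };
	if (args.error)
		return -1;
	first_true(sz, failing_pred, &args);
	return 0;
}

/* Correct ordering: check after the predicate has had a chance to run. */
static int seek_checking_after(size_t sz)
{
	struct search_args args = { 0 };
	first_true(sz, failing_pred, &args);
	if (args.error)
		return -1;
	return 0;
}
```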

I will send a follow-up patch that addresses these issues.

Patrick


* Re: [PATCH v2 2/2] reftable/block: fix binary search over restart counter
  2024-03-11  5:18         ` Patrick Steinhardt
@ 2024-03-11 17:33           ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2024-03-11 17:33 UTC
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> But we do know to bubble the error up via the passed-in args by setting
> `args->error = -1`. Funny thing though: I just now noticed that we check
> for `args.error` _before_ we call `binsearch()`. Oops.
>
> I will send a follow-up patch that addresses these issues.

Thanks, that is doubly amusing ;-)

