From: Patrick Steinhardt <ps@pks.im>
To: Justin Tobler via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>,
Justin Tobler <jltobler@gmail.com>
Subject: Re: [PATCH v3 2/3] reftable/stack: use geometric table compaction
Date: Tue, 2 Apr 2024 09:23:50 +0200 [thread overview]
Message-ID: <Zguyhhyot0ZphACv@tanuki> (raw)
In-Reply-To: <7e62c2286ae9c63368f1508f065b33422ca24bc8.1711685809.git.gitgitgadget@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 12351 bytes --]
On Fri, Mar 29, 2024 at 04:16:48AM +0000, Justin Tobler via GitGitGadget wrote:
> From: Justin Tobler <jltobler@gmail.com>
>
> To reduce the number of on-disk reftables, compaction is performed.
> Contiguous tables with the same binary log value of size are grouped
> into segments. The segment that has both the lowest binary log value and
> contains more than one table is set as the starting point when
> identifying the compaction segment.
>
> Since segments containing a single table are not initially considered
> for compaction, if the table appended to the list does not match the
> previous table log value, no compaction occurs for the new table. It is
> therefore possible for unbounded growth of the table list. This can be
> demonstrated by repeating the following sequence:
>
> git branch -f foo
> git branch -d foo
>
> Each operation results in a new table being written with no compaction
> occurring until a separate operation produces a table matching the
> previous table log value.
>
> Instead, to avoid unbounded growth of the table list, the compaction
> strategy is updated to ensure tables follow a geometric sequence after
> each operation by individually evaluating each table in reverse index
> order. This strategy results in a much simpler and more robust algorithm
> compared to the previous one while also maintaining a minimal ordered
> set of tables on-disk.
>
> When creating 10 thousand references, the new strategy has no
> performance impact:
>
> Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
> Time (mean ± σ): 26.516 s ± 0.047 s [User: 17.864 s, System: 8.491 s]
> Range (min … max): 26.447 s … 26.569 s 10 runs
>
> Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
> Time (mean ± σ): 26.417 s ± 0.028 s [User: 17.738 s, System: 8.500 s]
> Range (min … max): 26.366 s … 26.444 s 10 runs
>
> Summary
> update-ref: create refs sequentially (revision = HEAD) ran
> 1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)
>
> Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
> tables and are therefore updated to specify the correct new table count.
> Since compaction is more aggressive in ensuring tables maintain a
> geometric sequence, the expected table count is reduced in these tests.
> In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
> removed because the function is no longer needed. Also, the
> `test_suggest_compaction_segment()` test is updated to better showcase
> and reflect the new geometric compaction behavior.
>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
> reftable/stack.c | 120 ++++++++++++++++++-------------------
> reftable/stack.h | 3 -
> reftable/stack_test.c | 67 +++++----------------
> t/t0610-reftable-basics.sh | 45 +++++++++-----
> 4 files changed, 103 insertions(+), 132 deletions(-)
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index 07262beaaf7..e7b9a1de5a4 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -1202,75 +1202,73 @@ static int segment_size(struct segment *s)
> return s->end - s->start;
> }
>
> -int fastlog2(uint64_t sz)
> -{
> - int l = 0;
> - if (sz == 0)
> - return 0;
> - for (; sz; sz /= 2) {
> - l++;
> - }
> - return l - 1;
> -}
> -
> -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n)
> -{
> - struct segment *segs = reftable_calloc(n, sizeof(*segs));
> - struct segment cur = { 0 };
> - size_t next = 0, i;
> -
> - if (n == 0) {
> - *seglen = 0;
> - return segs;
> - }
> - for (i = 0; i < n; i++) {
> - int log = fastlog2(sizes[i]);
> - if (cur.log != log && cur.bytes > 0) {
> - struct segment fresh = {
> - .start = i,
> - };
> -
> - segs[next++] = cur;
> - cur = fresh;
> - }
> -
> - cur.log = log;
> - cur.end = i + 1;
> - cur.bytes += sizes[i];
> - }
> - segs[next++] = cur;
> - *seglen = next;
> - return segs;
> -}
> -
> struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
> {
> - struct segment min_seg = {
> - .log = 64,
> - };
> - struct segment *segs;
> - size_t seglen = 0, i;
> -
> - segs = sizes_to_segments(&seglen, sizes, n);
> - for (i = 0; i < seglen; i++) {
> - if (segment_size(&segs[i]) == 1)
> - continue;
> + struct segment seg = { 0 };
> + uint64_t bytes;
> + size_t i;
>
> - if (segs[i].log < min_seg.log)
> - min_seg = segs[i];
> - }
> + /*
> + * If there are no tables or only a single one then we don't have to
> + * compact anything. The sequence is geometric by definition already.
> + */
> + if (n <= 1)
> + return seg;
>
> - while (min_seg.start > 0) {
> - size_t prev = min_seg.start - 1;
> - if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
> + /*
> + * Find the ending table of the compaction segment needed to restore the
> + * geometric sequence.
> + *
> + * To do so, we iterate backwards starting from the most recent table
> + * until a valid segment end is found. If the preceding table is smaller
> + * than the current table multiplied by the geometric factor (2), the
> + * current table is set as the compaction segment end.
> + *
> + * Tables after the ending point are not added to the byte count because
> + * they are already valid members of the geometric sequence. Due to the
> + * properties of a geometric sequence, it is not possible for the sum of
> + * these tables to exceed the value of the ending point table.
> + *
> + * Example table size sequence requiring no compaction:
> + * 64, 32, 16, 8, 4, 2, 1
> + *
> + * Example compaction segment end set to table with size 3:
> + * 64, 32, 16, 8, 4, 3, 1
> + */
> + for (i = n - 1; i > 0; i--) {
> + if (sizes[i - 1] < sizes[i] * 2) {
> + seg.end = i + 1;
> + bytes = sizes[i];
> break;
> + }
> + }
>
> - min_seg.start = prev;
> - min_seg.bytes += sizes[prev];
> + /*
> + * Find the starting table of the compaction segment by iterating
> + * through the remaining tables and keeping track of the accumulated
> + * size of all tables seen from the segment end table. The previous
> + * table is compared to the accumulated size because the tables from the
> + * segment end are merged backwards recursively.
> + *
> + * Note that we keep iterating even after we have found the first
> + * starting point. This is because there may be tables in the stack
> + * preceding that first starting point which violate the geometric
> + * sequence.
> + *
> + * Example compaction segment start set to table with size 32:
> + * 128, 32, 16, 8, 4, 3, 1
> + */
> + for (; i > 0; i--) {
> + uint64_t curr = bytes;
> + bytes += sizes[i - 1];
> +
> + if (sizes[i - 1] < curr * 2) {
> + seg.start = i - 1;
> + seg.bytes = bytes;
> + }
> }
>
> - reftable_free(segs);
> - return min_seg;
> + return seg;
> }
>
> static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
> diff --git a/reftable/stack.h b/reftable/stack.h
> index d919455669e..656f896cc28 100644
> --- a/reftable/stack.h
> +++ b/reftable/stack.h
> @@ -33,12 +33,9 @@ int read_lines(const char *filename, char ***lines);
>
> struct segment {
> size_t start, end;
> - int log;
> uint64_t bytes;
> };
>
> -int fastlog2(uint64_t sz);
> -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n);
> struct segment suggest_compaction_segment(uint64_t *sizes, size_t n);
>
> #endif
> diff --git a/reftable/stack_test.c b/reftable/stack_test.c
> index 7336757cf53..21541742fe5 100644
> --- a/reftable/stack_test.c
> +++ b/reftable/stack_test.c
> @@ -717,59 +717,13 @@ static void test_reftable_stack_hash_id(void)
> clear_dir(dir);
> }
>
> -static void test_log2(void)
> -{
> - EXPECT(1 == fastlog2(3));
> - EXPECT(2 == fastlog2(4));
> - EXPECT(2 == fastlog2(5));
> -}
> -
> -static void test_sizes_to_segments(void)
> -{
> - uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
> - /* .................0 1 2 3 4 5 */
> -
> - size_t seglen = 0;
> - struct segment *segs =
> - sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
> - EXPECT(segs[2].log == 3);
> - EXPECT(segs[2].start == 5);
> - EXPECT(segs[2].end == 6);
> -
> - EXPECT(segs[1].log == 2);
> - EXPECT(segs[1].start == 2);
> - EXPECT(segs[1].end == 5);
> - reftable_free(segs);
> -}
> -
> -static void test_sizes_to_segments_empty(void)
> -{
> - size_t seglen = 0;
> - struct segment *segs = sizes_to_segments(&seglen, NULL, 0);
> - EXPECT(seglen == 0);
> - reftable_free(segs);
> -}
> -
> -static void test_sizes_to_segments_all_equal(void)
> -{
> - uint64_t sizes[] = { 5, 5 };
> - size_t seglen = 0;
> - struct segment *segs =
> - sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
> - EXPECT(seglen == 1);
> - EXPECT(segs[0].start == 0);
> - EXPECT(segs[0].end == 2);
> - reftable_free(segs);
> -}
> -
> static void test_suggest_compaction_segment(void)
> {
> - uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
> - /* .................0 1 2 3 4 5 6 */
> + uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
> struct segment min =
> suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
> - EXPECT(min.start == 2);
> - EXPECT(min.end == 7);
> + EXPECT(min.start == 1);
> + EXPECT(min.end == 10);
> }
>
> static void test_suggest_compaction_segment_nothing(void)
> @@ -880,6 +834,17 @@ static void test_empty_add(void)
> reftable_stack_destroy(st2);
> }
>
> +static int fastlog2(uint64_t sz)
> +{
> + int l = 0;
> + if (sz == 0)
> + return 0;
> + for (; sz; sz /= 2) {
> + l++;
> + }
Nit: we could drop the curly braces while at it.
> + return l - 1;
> +}
> +
> static void test_reftable_stack_auto_compaction(void)
> {
> struct reftable_write_options cfg = { 0 };
> @@ -1068,7 +1033,6 @@ static void test_reftable_stack_compaction_concurrent_clean(void)
> int stack_test_main(int argc, const char *argv[])
> {
> RUN_TEST(test_empty_add);
> - RUN_TEST(test_log2);
> RUN_TEST(test_names_equal);
> RUN_TEST(test_parse_names);
> RUN_TEST(test_read_file);
> @@ -1088,9 +1052,6 @@ int stack_test_main(int argc, const char *argv[])
> RUN_TEST(test_reftable_stack_update_index_check);
> RUN_TEST(test_reftable_stack_uptodate);
> RUN_TEST(test_reftable_stack_validate_refname);
> - RUN_TEST(test_sizes_to_segments);
> - RUN_TEST(test_sizes_to_segments_all_equal);
> - RUN_TEST(test_sizes_to_segments_empty);
> RUN_TEST(test_suggest_compaction_segment);
> RUN_TEST(test_suggest_compaction_segment_nothing);
> return 0;
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index 434044078ed..b95626549e7 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -293,7 +293,7 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
> test_line_count = 1 repo/.git/reftable/tables.list &&
>
> test_commit -C repo --no-tag A &&
> - test_line_count = 2 repo/.git/reftable/tables.list &&
> + test_line_count = 1 repo/.git/reftable/tables.list &&
>
> test_commit -C repo --no-tag B &&
> test_line_count = 1 repo/.git/reftable/tables.list
> @@ -308,12 +308,24 @@ test_expect_success 'ref transaction: environment variable disables auto-compact
> do
> GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo update-ref branch-$i HEAD || return 1
> done &&
> - test_line_count = 23 repo/.git/reftable/tables.list &&
> + test_line_count = 22 repo/.git/reftable/tables.list &&
>
> git -C repo update-ref foo HEAD &&
> test_line_count = 1 repo/.git/reftable/tables.list
> '
>
> +test_expect_success 'ref transaction: alternating table sizes are compacted' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + test_commit -C repo A &&
> + for i in $(test_seq 20)
Nit: we could reduce the number of iterations here so that we don't have
to spawn 40 Git commands. Using something like 10 or even 5 iterations
should be sufficient to demonstrate the problem, no?
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-04-02 7:23 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-05 20:03 [PATCH] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-03-06 12:30 ` Patrick Steinhardt
2024-03-06 12:37 ` Patrick Steinhardt
2024-03-21 22:48 ` Justin Tobler
2024-03-21 22:40 ` [PATCH v2 0/3] " Justin Tobler via GitGitGadget
2024-03-21 22:40 ` [PATCH v2 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-03-22 1:25 ` Patrick Steinhardt
2024-03-21 22:40 ` [PATCH v2 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-03-22 1:25 ` Patrick Steinhardt
2024-03-27 13:24 ` Karthik Nayak
2024-03-21 22:40 ` [PATCH v2 3/3] reftable/segment: make segment end inclusive Justin Tobler via GitGitGadget
2024-03-22 1:25 ` [PATCH v2 0/3] reftable/stack: use geometric table compaction Patrick Steinhardt
2024-04-03 10:13 ` Han-Wen Nienhuys
2024-04-03 10:18 ` Patrick Steinhardt
2024-04-03 15:14 ` Justin Tobler
2024-04-03 16:40 ` Junio C Hamano
2024-03-29 4:16 ` [PATCH v3 " Justin Tobler via GitGitGadget
2024-03-29 4:16 ` [PATCH v3 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-03-29 18:25 ` Junio C Hamano
2024-03-29 21:56 ` Junio C Hamano
2024-04-02 7:23 ` Patrick Steinhardt
2024-04-02 17:23 ` Junio C Hamano
2024-03-29 4:16 ` [PATCH v3 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-02 7:23 ` Patrick Steinhardt [this message]
2024-03-29 4:16 ` [PATCH v3 3/3] reftable/stack: make segment end inclusive Justin Tobler via GitGitGadget
2024-03-29 18:36 ` Junio C Hamano
2024-04-02 7:23 ` Patrick Steinhardt
2024-04-03 0:20 ` [PATCH v4 0/2] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-03 0:20 ` [PATCH v4 1/2] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-04-03 0:20 ` [PATCH v4 2/2] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-03 4:47 ` [PATCH v4 0/2] " Patrick Steinhardt
2024-04-03 11:12 ` Karthik Nayak
2024-04-03 16:56 ` Junio C Hamano
2024-04-04 18:29 ` [PATCH v5 0/3] " Justin Tobler via GitGitGadget
2024-04-04 18:29 ` [PATCH v5 1/3] reftable/stack: allow disabling of auto-compaction Justin Tobler via GitGitGadget
2024-04-08 6:12 ` Patrick Steinhardt
2024-04-04 18:29 ` [PATCH v5 2/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-04-08 6:12 ` Patrick Steinhardt
2024-04-08 16:18 ` Junio C Hamano
2024-04-04 18:29 ` [PATCH v5 3/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-08 6:12 ` [PATCH v5 0/3] " Patrick Steinhardt
2024-04-08 16:17 ` Justin Tobler
2024-04-08 16:16 ` [PATCH v6 " Justin Tobler via GitGitGadget
2024-04-08 16:16 ` [PATCH v6 1/3] reftable/stack: expose option to disable auto-compaction Justin Tobler via GitGitGadget
2024-04-08 16:16 ` [PATCH v6 2/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-04-08 16:16 ` [PATCH v6 3/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-08 16:20 ` [PATCH v6 0/3] " Patrick Steinhardt
2024-04-08 19:12 ` Junio C Hamano
2024-04-03 19:12 ` [PATCH v2 " Junio C Hamano
2024-04-03 19:30 ` Patrick Steinhardt
2024-04-04 5:34 ` Patrick Steinhardt
2024-04-04 18:28 ` Justin Tobler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zguyhhyot0ZphACv@tanuki \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=jltobler@gmail.com \
--cc=karthik.188@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).