From: Patrick Steinhardt <ps@pks.im>
To: Justin Tobler via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Justin Tobler <jltobler@gmail.com>
Subject: Re: [PATCH v2 2/3] reftable/stack: use geometric table compaction
Date: Fri, 22 Mar 2024 02:25:19 +0100 [thread overview]
Message-ID: <Zfzd_yxeXWWTJdyP@tanuki> (raw)
In-Reply-To: <def7008452303f71c1fa469609bc199c629a19ec.1711060820.git.gitgitgadget@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 16012 bytes --]
On Thu, Mar 21, 2024 at 10:40:18PM +0000, Justin Tobler via GitGitGadget wrote:
> From: Justin Tobler <jltobler@gmail.com>
>
> To reduce the number of on-disk reftables, compaction is performed.
> Contiguous tables with the same binary log value of size are grouped
> into segments. The segment that has both the lowest binary log value and
> contains more than one table is set as the starting point when
> identifying the compaction segment.
>
> Since segments containing a single table are not initially considered
> for compaction, if the table appended to the list does not match the
> previous table log value, no compaction occurs for the new table. It is
> therefore possible for unbounded growth of the table list. This can be
> demonstrated by repeating the following sequence:
>
> git branch -f foo
> git branch -d foo
>
> Each operation results in a new table being written with no compaction
> occurring until a separate operation produces a table matching the
> previous table log value.
>
> Instead, to avoid unbounded growth of the table list, the compaction
> strategy is updated to ensure tables follow a geometric sequence after
> each operation. This is done by walking the table list in reverse index
> order to identify the compaction segment start and end. The compaction
> segment end is found by identifying the first table which has a
> preceding table size less than twice the current table. Next, the
> compaction segment start is found iterating through the remaining tables
> in the list checking if the previous table size is less than twice the
> cumulative of tables from the segment end. This ensures the correct
> segment start is found and that the newly compacted table does not
> violate the geometric sequence.
I don't think we need to go into so much detail how exactly the
algorithm is working -- these kind of comments should ideally exist in
the code. What would be more interesting to explain here is _why_ we
chose the new algorithm over the old one instead of just trying to fix
the issue.
Other than that this patch LGTM.
Patrick
> When creating 10 thousand references, the new strategy has no
> performance impact:
>
> Benchmark 1: update-ref: create refs sequentially (revision = HEAD~)
> Time (mean ± σ): 26.516 s ± 0.047 s [User: 17.864 s, System: 8.491 s]
> Range (min … max): 26.447 s … 26.569 s 10 runs
>
> Benchmark 2: update-ref: create refs sequentially (revision = HEAD)
> Time (mean ± σ): 26.417 s ± 0.028 s [User: 17.738 s, System: 8.500 s]
> Range (min … max): 26.366 s … 26.444 s 10 runs
>
> Summary
> update-ref: create refs sequentially (revision = HEAD) ran
> 1.00 ± 0.00 times faster than update-ref: create refs sequentially (revision = HEAD~)
>
> Some tests in `t0610-reftable-basics.sh` assert the on-disk state of
> tables and are therefore updated to specify the correct new table count.
> Since compaction is more aggressive in ensuring tables maintain a
> geometric sequence, the expected table count is reduced in these tests.
> In `reftable/stack_test.c` tests related to `sizes_to_segments()` are
> removed because the function is no longer needed. Also, the
> `test_suggest_compaction_segment()` test is updated to better showcase
> and reflect the new geometric compaction behavior.
>
> Signed-off-by: Justin Tobler <jltobler@gmail.com>
> ---
> reftable/stack.c | 109 ++++++++++++++++---------------------
> reftable/stack.h | 3 -
> reftable/stack_test.c | 66 +++++-----------------
> t/t0610-reftable-basics.sh | 43 ++++++++++-----
> 4 files changed, 91 insertions(+), 130 deletions(-)
>
> diff --git a/reftable/stack.c b/reftable/stack.c
> index 2370d93d13b..ef55dc75cde 100644
> --- a/reftable/stack.c
> +++ b/reftable/stack.c
> @@ -1214,75 +1214,62 @@ static int segment_size(struct segment *s)
> return s->end - s->start;
> }
>
> -int fastlog2(uint64_t sz)
> -{
> - int l = 0;
> - if (sz == 0)
> - return 0;
> - for (; sz; sz /= 2) {
> - l++;
> - }
> - return l - 1;
> -}
> -
> -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n)
> -{
> - struct segment *segs = reftable_calloc(n, sizeof(*segs));
> - struct segment cur = { 0 };
> - size_t next = 0, i;
> -
> - if (n == 0) {
> - *seglen = 0;
> - return segs;
> - }
> - for (i = 0; i < n; i++) {
> - int log = fastlog2(sizes[i]);
> - if (cur.log != log && cur.bytes > 0) {
> - struct segment fresh = {
> - .start = i,
> - };
> -
> - segs[next++] = cur;
> - cur = fresh;
> - }
> -
> - cur.log = log;
> - cur.end = i + 1;
> - cur.bytes += sizes[i];
> - }
> - segs[next++] = cur;
> - *seglen = next;
> - return segs;
> -}
> -
> struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
> {
> - struct segment min_seg = {
> - .log = 64,
> - };
> - struct segment *segs;
> - size_t seglen = 0, i;
> -
> - segs = sizes_to_segments(&seglen, sizes, n);
> - for (i = 0; i < seglen; i++) {
> - if (segment_size(&segs[i]) == 1)
> - continue;
> + struct segment seg = { 0 };
> + uint64_t bytes;
> + size_t i;
>
> - if (segs[i].log < min_seg.log)
> - min_seg = segs[i];
> - }
> + /*
> + * If there are no tables or only a single one then we don't have to
> + * compact anything. The sequence is geometric by definition already.
> + */
> + if (n <= 1)
> + return seg;
>
> - while (min_seg.start > 0) {
> - size_t prev = min_seg.start - 1;
> - if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev]))
> + /*
> + * Find the ending table of the compaction segment needed to restore the
> + * geometric sequence.
> + *
> + * To do so, we iterate backwards starting from the most recent table
> + * until a valid segment end is found. If the preceding table is smaller
> + * than the current table multiplied by the geometric factor (2), the
> + * current table is set as the compaction segment end.
> + *
> + * Tables after the ending point are not added to the byte count because
> + * they are already valid members of the geometric sequence. Due to the
> + * properties of a geometric sequence, it is not possible for the sum of
> + * these tables to exceed the value of the ending point table.
> + */
> + for (i = n - 1; i > 0; i--) {
> + if (sizes[i - 1] < sizes[i] * 2) {
> + seg.end = i + 1;
> + bytes = sizes[i];
> break;
> + }
> + }
> +
> + /*
> + * Find the starting table of the compaction segment by iterating
> + * through the remaining tables and keeping track of the accumulated
> + * size of all tables seen from the segment end table.
> + *
> + * Note that we keep iterating even after we have found the first
> + * starting point. This is because there may be tables in the stack
> + * preceding that first starting point which violate the geometric
> + * sequence.
> + */
> + for (; i > 0; i--) {
> + uint64_t curr = bytes;
> + bytes += sizes[i - 1];
>
> - min_seg.start = prev;
> - min_seg.bytes += sizes[prev];
> + if (sizes[i - 1] < curr * 2) {
> + seg.start = i - 1;
> + seg.bytes = bytes;
> + }
> }
>
> - reftable_free(segs);
> - return min_seg;
> + return seg;
> }
>
> static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
> diff --git a/reftable/stack.h b/reftable/stack.h
> index d919455669e..656f896cc28 100644
> --- a/reftable/stack.h
> +++ b/reftable/stack.h
> @@ -33,12 +33,9 @@ int read_lines(const char *filename, char ***lines);
>
> struct segment {
> size_t start, end;
> - int log;
> uint64_t bytes;
> };
>
> -int fastlog2(uint64_t sz);
> -struct segment *sizes_to_segments(size_t *seglen, uint64_t *sizes, size_t n);
> struct segment suggest_compaction_segment(uint64_t *sizes, size_t n);
>
> #endif
> diff --git a/reftable/stack_test.c b/reftable/stack_test.c
> index 509f4866236..e5f6ff5c9e4 100644
> --- a/reftable/stack_test.c
> +++ b/reftable/stack_test.c
> @@ -720,59 +720,14 @@ static void test_reftable_stack_hash_id(void)
> clear_dir(dir);
> }
>
> -static void test_log2(void)
> -{
> - EXPECT(1 == fastlog2(3));
> - EXPECT(2 == fastlog2(4));
> - EXPECT(2 == fastlog2(5));
> -}
> -
> -static void test_sizes_to_segments(void)
> -{
> - uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
> - /* .................0 1 2 3 4 5 */
> -
> - size_t seglen = 0;
> - struct segment *segs =
> - sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
> - EXPECT(segs[2].log == 3);
> - EXPECT(segs[2].start == 5);
> - EXPECT(segs[2].end == 6);
> -
> - EXPECT(segs[1].log == 2);
> - EXPECT(segs[1].start == 2);
> - EXPECT(segs[1].end == 5);
> - reftable_free(segs);
> -}
> -
> -static void test_sizes_to_segments_empty(void)
> -{
> - size_t seglen = 0;
> - struct segment *segs = sizes_to_segments(&seglen, NULL, 0);
> - EXPECT(seglen == 0);
> - reftable_free(segs);
> -}
> -
> -static void test_sizes_to_segments_all_equal(void)
> -{
> - uint64_t sizes[] = { 5, 5 };
> - size_t seglen = 0;
> - struct segment *segs =
> - sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
> - EXPECT(seglen == 1);
> - EXPECT(segs[0].start == 0);
> - EXPECT(segs[0].end == 2);
> - reftable_free(segs);
> -}
> -
> static void test_suggest_compaction_segment(void)
> {
> - uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
> + uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
> /* .................0 1 2 3 4 5 6 */
> struct segment min =
> suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
> - EXPECT(min.start == 2);
> - EXPECT(min.end == 7);
> + EXPECT(min.start == 1);
> + EXPECT(min.end == 10);
> }
>
> static void test_suggest_compaction_segment_nothing(void)
> @@ -884,6 +839,17 @@ static void test_empty_add(void)
> reftable_stack_destroy(st2);
> }
>
> +static int fastlog2(uint64_t sz)
> +{
> + int l = 0;
> + if (sz == 0)
> + return 0;
> + for (; sz; sz /= 2) {
> + l++;
> + }
> + return l - 1;
> +}
> +
> static void test_reftable_stack_auto_compaction(void)
> {
> struct reftable_write_options cfg = { 0 };
> @@ -1072,7 +1038,6 @@ static void test_reftable_stack_compaction_concurrent_clean(void)
> int stack_test_main(int argc, const char *argv[])
> {
> RUN_TEST(test_empty_add);
> - RUN_TEST(test_log2);
> RUN_TEST(test_names_equal);
> RUN_TEST(test_parse_names);
> RUN_TEST(test_read_file);
> @@ -1092,9 +1057,6 @@ int stack_test_main(int argc, const char *argv[])
> RUN_TEST(test_reftable_stack_update_index_check);
> RUN_TEST(test_reftable_stack_uptodate);
> RUN_TEST(test_reftable_stack_validate_refname);
> - RUN_TEST(test_sizes_to_segments);
> - RUN_TEST(test_sizes_to_segments_all_equal);
> - RUN_TEST(test_sizes_to_segments_empty);
> RUN_TEST(test_suggest_compaction_segment);
> RUN_TEST(test_suggest_compaction_segment_nothing);
> return 0;
> diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
> index 686781192eb..e6c3f94d874 100755
> --- a/t/t0610-reftable-basics.sh
> +++ b/t/t0610-reftable-basics.sh
> @@ -293,12 +293,24 @@ test_expect_success 'ref transaction: writes cause auto-compaction' '
> test_line_count = 1 repo/.git/reftable/tables.list &&
>
> test_commit -C repo --no-tag A &&
> - test_line_count = 2 repo/.git/reftable/tables.list &&
> + test_line_count = 1 repo/.git/reftable/tables.list &&
>
> test_commit -C repo --no-tag B &&
> test_line_count = 1 repo/.git/reftable/tables.list
> '
>
> +test_expect_success 'ref transaction: alternating table sizes are compacted' '
> + test_when_finished "rm -rf repo" &&
> + git init repo &&
> + test_commit -C repo A &&
> + for i in $(test_seq 20)
> + do
> + git -C repo branch -f foo &&
> + git -C repo branch -d foo || return 1
> + done &&
> + test_line_count = 2 repo/.git/reftable/tables.list
> +'
> +
> check_fsync_events () {
> local trace="$1" &&
> shift &&
> @@ -324,7 +336,7 @@ test_expect_success 'ref transaction: writes are synced' '
> git -C repo -c core.fsync=reference \
> -c core.fsyncMethod=fsync update-ref refs/heads/branch HEAD &&
> check_fsync_events trace2.txt <<-EOF
> - "name":"hardware-flush","count":2
> + "name":"hardware-flush","count":4
> EOF
> '
>
> @@ -346,8 +358,8 @@ test_expect_success 'pack-refs: compacts tables' '
>
> test_commit -C repo A &&
> ls -1 repo/.git/reftable >table-files &&
> - test_line_count = 4 table-files &&
> - test_line_count = 3 repo/.git/reftable/tables.list &&
> + test_line_count = 3 table-files &&
> + test_line_count = 2 repo/.git/reftable/tables.list &&
>
> git -C repo pack-refs &&
> ls -1 repo/.git/reftable >table-files &&
> @@ -379,7 +391,7 @@ do
> umask $umask &&
> git init --shared=true repo &&
> test_commit -C repo A &&
> - test_line_count = 3 repo/.git/reftable/tables.list
> + test_line_count = 2 repo/.git/reftable/tables.list
> ) &&
> git -C repo pack-refs &&
> test_expect_perms "-rw-rw-r--" repo/.git/reftable/tables.list &&
> @@ -747,12 +759,13 @@ test_expect_success 'worktree: pack-refs in main repo packs main refs' '
> test_when_finished "rm -rf repo worktree" &&
> git init repo &&
> test_commit -C repo A &&
> - git -C repo worktree add ../worktree &&
> + GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree &&
> + GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD &&
>
> - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
> - test_line_count = 4 repo/.git/reftable/tables.list &&
> + test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
> + test_line_count = 3 repo/.git/reftable/tables.list &&
> git -C repo pack-refs &&
> - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
> + test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
> test_line_count = 1 repo/.git/reftable/tables.list
> '
>
> @@ -760,19 +773,21 @@ test_expect_success 'worktree: pack-refs in worktree packs worktree refs' '
> test_when_finished "rm -rf repo worktree" &&
> git init repo &&
> test_commit -C repo A &&
> - git -C repo worktree add ../worktree &&
> + GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C repo worktree add ../worktree &&
> + GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/worktree/per-worktree HEAD &&
>
> - test_line_count = 3 repo/.git/worktrees/worktree/reftable/tables.list &&
> - test_line_count = 4 repo/.git/reftable/tables.list &&
> + test_line_count = 4 repo/.git/worktrees/worktree/reftable/tables.list &&
> + test_line_count = 3 repo/.git/reftable/tables.list &&
> git -C worktree pack-refs &&
> test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
> - test_line_count = 4 repo/.git/reftable/tables.list
> + test_line_count = 3 repo/.git/reftable/tables.list
> '
>
> test_expect_success 'worktree: creating shared ref updates main stack' '
> test_when_finished "rm -rf repo worktree" &&
> git init repo &&
> test_commit -C repo A &&
> + test_commit -C repo B &&
>
> git -C repo worktree add ../worktree &&
> git -C repo pack-refs &&
> @@ -780,7 +795,7 @@ test_expect_success 'worktree: creating shared ref updates main stack' '
> test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
> test_line_count = 1 repo/.git/reftable/tables.list &&
>
> - git -C worktree update-ref refs/heads/shared HEAD &&
> + GIT_TEST_REFTABLE_NO_AUTOCOMPACTION=true git -C worktree update-ref refs/heads/shared HEAD &&
> test_line_count = 1 repo/.git/worktrees/worktree/reftable/tables.list &&
> test_line_count = 2 repo/.git/reftable/tables.list
> '
> --
> gitgitgadget
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2024-03-22 1:25 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-05 20:03 [PATCH] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-03-06 12:30 ` Patrick Steinhardt
2024-03-06 12:37 ` Patrick Steinhardt
2024-03-21 22:48 ` Justin Tobler
2024-03-21 22:40 ` [PATCH v2 0/3] " Justin Tobler via GitGitGadget
2024-03-21 22:40 ` [PATCH v2 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-03-22 1:25 ` Patrick Steinhardt
2024-03-21 22:40 ` [PATCH v2 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-03-22 1:25 ` Patrick Steinhardt [this message]
2024-03-27 13:24 ` Karthik Nayak
2024-03-21 22:40 ` [PATCH v2 3/3] reftable/segment: make segment end inclusive Justin Tobler via GitGitGadget
2024-03-22 1:25 ` [PATCH v2 0/3] reftable/stack: use geometric table compaction Patrick Steinhardt
2024-04-03 10:13 ` Han-Wen Nienhuys
2024-04-03 10:18 ` Patrick Steinhardt
2024-04-03 15:14 ` Justin Tobler
2024-04-03 16:40 ` Junio C Hamano
2024-03-29 4:16 ` [PATCH v3 " Justin Tobler via GitGitGadget
2024-03-29 4:16 ` [PATCH v3 1/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-03-29 18:25 ` Junio C Hamano
2024-03-29 21:56 ` Junio C Hamano
2024-04-02 7:23 ` Patrick Steinhardt
2024-04-02 17:23 ` Junio C Hamano
2024-03-29 4:16 ` [PATCH v3 2/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-02 7:23 ` Patrick Steinhardt
2024-03-29 4:16 ` [PATCH v3 3/3] reftable/stack: make segment end inclusive Justin Tobler via GitGitGadget
2024-03-29 18:36 ` Junio C Hamano
2024-04-02 7:23 ` Patrick Steinhardt
2024-04-03 0:20 ` [PATCH v4 0/2] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-03 0:20 ` [PATCH v4 1/2] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-04-03 0:20 ` [PATCH v4 2/2] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-03 4:47 ` [PATCH v4 0/2] " Patrick Steinhardt
2024-04-03 11:12 ` Karthik Nayak
2024-04-03 16:56 ` Junio C Hamano
2024-04-04 18:29 ` [PATCH v5 0/3] " Justin Tobler via GitGitGadget
2024-04-04 18:29 ` [PATCH v5 1/3] reftable/stack: allow disabling of auto-compaction Justin Tobler via GitGitGadget
2024-04-08 6:12 ` Patrick Steinhardt
2024-04-04 18:29 ` [PATCH v5 2/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-04-08 6:12 ` Patrick Steinhardt
2024-04-08 16:18 ` Junio C Hamano
2024-04-04 18:29 ` [PATCH v5 3/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-08 6:12 ` [PATCH v5 0/3] " Patrick Steinhardt
2024-04-08 16:17 ` Justin Tobler
2024-04-08 16:16 ` [PATCH v6 " Justin Tobler via GitGitGadget
2024-04-08 16:16 ` [PATCH v6 1/3] reftable/stack: expose option to disable auto-compaction Justin Tobler via GitGitGadget
2024-04-08 16:16 ` [PATCH v6 2/3] reftable/stack: add env to disable autocompaction Justin Tobler via GitGitGadget
2024-04-08 16:16 ` [PATCH v6 3/3] reftable/stack: use geometric table compaction Justin Tobler via GitGitGadget
2024-04-08 16:20 ` [PATCH v6 0/3] " Patrick Steinhardt
2024-04-08 19:12 ` Junio C Hamano
2024-04-03 19:12 ` [PATCH v2 " Junio C Hamano
2024-04-03 19:30 ` Patrick Steinhardt
2024-04-04 5:34 ` Patrick Steinhardt
2024-04-04 18:28 ` Justin Tobler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zfzd_yxeXWWTJdyP@tanuki \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=jltobler@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).