From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>,
Derrick Stolee <derrickstolee@github.com>,
Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/2] pack-bitmap: boundary-based bitmap traversal
Date: Fri, 5 May 2023 13:27:30 -0400 [thread overview]
Message-ID: <cover.1683307620.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1682380788.git.me@ttaylorr.com>
Here is a reroll of my series to implement a boundary-based bitmap
traversal algorithm that I worked on towards the end of 2021 with Peff.
The algorithm is unchanged from the last version, but the implementation
has been made much more straightforward, thanks to a very helpful
suggestion from Stolee.
Instead of hackily trying to write down all of the UNINTERESTING commits
between the tips and boundary within limit_list(), we can just perform a
commit-only walk, which will give us the set of commits that we need.
Thanks in advance for your review.
Taylor Blau (2):
pack-bitmap.c: extract `fill_in_bitmap()`
pack-bitmap.c: use commit boundary during bitmap traversal
pack-bitmap.c | 249 +++++++++++++++++++++++++++++++++++++-------------
1 file changed, 188 insertions(+), 61 deletions(-)
Range-diff against v1:
1: a643678c0f < -: ---------- revision: support tracking uninteresting commits
2: 7624d3b5ba ! 1: a3a1522439 pack-bitmap.c: extract `fill_in_bitmap()`
@@ pack-bitmap.c: static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
+ struct include_data incdata;
+ struct bitmap_show_data show_data;
+
-+ if (base == NULL)
++ if (!base)
+ base = bitmap_new();
+
+ incdata.bitmap_git = bitmap_git;
@@ pack-bitmap.c: static int add_commit_to_bitmap(struct bitmap_index *bitmap_git,
+ revs->include_check_data = &incdata;
+
+ if (prepare_revision_walk(revs))
-+ die("revision walk setup failed");
++ die(_("revision walk setup failed"));
+
+ show_data.bitmap_git = bitmap_git;
+ show_data.base = base;
+
-+ traverse_commit_list_filtered(revs, show_commit, show_object,
-+ &show_data, NULL);
++ traverse_commit_list(revs, show_commit, show_object, &show_data);
+
+ revs->include_check = NULL;
+ revs->include_check_obj = NULL;
3: 91ed8c70f2 ! 2: 1993af00cb pack-bitmap.c: use commit boundary during bitmap traversal
@@ Commit message
different (and hopefully faster) traversal to compute revision walks.
Consider a set of positive and negative tips (which we'll refer to with
- their standard bitmap parlance by, "wants", and "haves"). In order to
+ their standard bitmap parlance by "wants", and "haves"). In order to
figure out what objects exist between the tips, the existing traversal
- in prepare_bitmap_walk() looks something like:
+ in `prepare_bitmap_walk()` does something like:
1. Consider if we can even compute the set of objects with bitmaps,
and fall back to the usual traversal if we cannot. For example,
@@ Commit message
respectively.
3. Fall back to the ordinary object traversal if either (a) there are
- no haves, (b) none of the haves are in the bitmapped pack or MIDX,
- or (c) there are no wants.
+ more than zero haves, none of which are in the bitmapped pack or
+ MIDX, or (b) there are no wants.
4. Construct a reachability bitmap for the "haves" side by walking
from the revision tips down to any existing bitmaps, OR-ing in any
@@ Commit message
And is more-or-less equivalent to using the *old* algorithm with this
invocation:
- $ git rev-list --objects --boundary $WANTS --not $HAVES |
- perl -lne 'print $1 if /^-(.*)/' |
- xargs git rev-list --objects --use-bitmap-index $WANTS --not
+ $ git rev-list --objects --use-bitmap-index $WANTS --not \
+ $(git rev-list --objects --boundary $WANTS --not $HAVES |
+ perl -lne 'print $1 if /^-(.*)/')
The new result performs significantly better in many cases, particularly
when the distance from the boundary commit(s) to an existing bitmap is
@@ Commit message
# (Computing objects unique to 'master' among all branches, not
# using bitmaps).
$ time git rev-list --count --objects master --not --exclude=master --branches
- 54
+ 43
- real 0m1.012s
- user 0m0.796s
- sys 0m0.214s
+ real 0m0.346s
+ user 0m0.231s
+ sys 0m0.115s
# (Computing the same uniqueness query using the old bitmap
# traversal algorithm.)
$ time git rev-list --use-bitmap-index --count --objects master --not --exclude=master --branches
- 42
+ 41
- real 0m29.571s
- user 0m28.404s
- sys 0m1.164s
+ real 0m20.007s
+ user 0m19.045s
+ sys 0m0.960s
# (Computing the same uniqueness query using the new bitmap
# traversal algorithm.)
- $ time git.compile rev-list --count --objects --use-bitmap-index master --not --exclude=master --branches
- 54
+ $ time git.compile rev-list --use-bitmap-index --count --objects master --not --exclude=master --branches
+ 41
- real 0m2.279s
- user 0m2.023s
- sys 0m0.256s
+ real 0m1.748s
+ user 0m1.428s
+ sys 0m0.320s
The new algorithm is still slower than not using bitmaps at all, but it
- is nearly a 13-fold improvement over the existing traversal.
+ is nearly a 11-fold improvement over the existing traversal.
In a more realistic setting (using my local copy of git.git), I can
- observe a similar speed-up:
+ observe a similar (if more modest) speed-up:
$ ours="$(git branch --show-current)"
argv="--count --objects $ours --not --exclude=$ours --branches"
@@ Commit message
-n 'existing bitmap traversal' "git rev-list --use-bitmap-index $argv" \
-n 'this commit' "git.compile rev-list --use-bitmap-index $argv"
Benchmark 1: no bitmaps
- Time (mean ± σ): 15.1 ms ± 4.1 ms [User: 8.1 ms, System: 6.9 ms]
- Range (min … max): 7.4 ms … 21.8 ms 131 runs
+ Time (mean ± σ): 5.8 ms ± 0.2 ms [User: 3.3 ms, System: 2.4 ms]
+ Range (min … max): 5.4 ms … 7.0 ms 409 runs
Benchmark 2: existing bitmap traversal
- Time (mean ± σ): 82.6 ms ± 9.2 ms [User: 63.6 ms, System: 19.0 ms]
- Range (min … max): 73.8 ms … 105.4 ms 28 runs
+ Time (mean ± σ): 65.3 ms ± 0.6 ms [User: 49.3 ms, System: 15.8 ms]
+ Range (min … max): 64.1 ms … 66.9 ms 45 runs
Benchmark 3: this commit
- Time (mean ± σ): 19.8 ms ± 3.1 ms [User: 13.0 ms, System: 6.8 ms]
- Range (min … max): 17.7 ms … 38.6 ms 158 runs
+ Time (mean ± σ): 19.8 ms ± 0.3 ms [User: 12.9 ms, System: 6.8 ms]
+ Range (min … max): 19.1 ms … 20.8 ms 150 runs
Summary
'no bitmaps' ran
- 1.31 ± 0.41 times faster than 'this commit'
- 5.49 ± 1.62 times faster than 'existing bitmap traversal'
+ 3.43 ± 0.14 times faster than 'this commit'
+ 11.34 ± 0.45 times faster than 'existing bitmap traversal'
Here, the new algorithm is also still slower than not using bitmaps, but
- represents a 4-fold improvement over the existing traversal in a more
- modest example.
+ represents a more than 3-fold improvement over the existing traversal in
+ a more modest example.
Since this algorithm was originally written (nearly a year and a half
ago, at the time of writing), the bitmap lookup table shipped, making
@@ Commit message
fewer "summary" bitmaps (which would also help with the above).
Helped-by: Jeff King <peff@peff.net>
+ Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
## pack-bitmap.c ##
-@@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_git,
- struct include_data incdata;
- struct bitmap_show_data show_data;
-
-- if (base == NULL)
-+ if (!base)
- base = bitmap_new();
-
- incdata.bitmap_git = bitmap_git;
@@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_git,
return base;
}
@@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_
+ return bitmap_get(bitmap, pos);
+}
+
-+static void show_boundary_commit(struct commit *commit, void *data)
++struct bitmap_boundary_cb {
++ struct bitmap_index *bitmap_git;
++ struct bitmap *base;
++
++ struct object_array boundary;
++};
++
++static void show_boundary_commit(struct commit *commit, void *_data)
+{
-+ struct object_array *boundary = data;
-+ if (!(commit->object.flags & BOUNDARY))
-+ return;
++ struct bitmap_boundary_cb *data = _data;
+
-+ add_object_array(&commit->object, "", boundary);
++ if (commit->object.flags & BOUNDARY)
++ add_object_array(&commit->object, "", &data->boundary);
++
++ if (commit->object.flags & UNINTERESTING) {
++ if (obj_in_bitmap(data->bitmap_git, &commit->object,
++ data->base))
++ return;
++
++ add_commit_to_bitmap(data->bitmap_git, &data->base, commit);
++ }
+}
+
+static void show_boundary_object(struct object *object,
@@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_
+ struct rev_info *revs,
+ struct object_list *roots)
+{
-+ struct bitmap *base = NULL;
-+ struct object_array boundary = OBJECT_ARRAY_INIT;
-+ int any_missing = 0;
++ struct bitmap_boundary_cb cb = { .bitmap_git = bitmap_git };
+ unsigned int i;
-+ int tmp_blobs, tmp_trees, tmp_tags;
++ unsigned int tmp_blobs, tmp_trees, tmp_tags;
++ int any_missing = 0;
+
+ revs->ignore_missing_links = 1;
-+ revs->collect_uninteresting = 1;
+
+ /*
+ * OR in any existing reachability bitmaps among `roots` into `base`.
@@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_
+ roots = roots->next;
+
+ if (object->type == OBJ_COMMIT &&
-+ !obj_in_bitmap(bitmap_git, object, base) &&
-+ add_commit_to_bitmap(bitmap_git, &base,
++ !obj_in_bitmap(bitmap_git, object, cb.base) &&
++ add_commit_to_bitmap(bitmap_git, &cb.base,
+ (struct commit *)object)) {
+ object->flags |= SEEN;
+ continue;
@@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_
+ revs->tag_objects = 0;
+
+ /*
-+ * We didn't have complete coverage of the roots. First OR in any
-+ * bitmaps that are UNINTERESTING between the tips and boundary.
++ * We didn't have complete coverage of the roots. First setup a
++ * revision walk to (a) OR in any bitmaps that are UNINTERESTING
++ * between the tips and boundary, and (b) record the boundary.
+ */
+ trace2_region_enter("pack-bitmap", "boundary-prepare", the_repository);
+ if (prepare_revision_walk(revs))
+ die("revision walk setup failed");
+ trace2_region_leave("pack-bitmap", "boundary-prepare", the_repository);
+
-+ trace2_region_enter("pack-bitmap", "boundary-load-bitmaps", the_repository);
-+ for (i = 0; i < revs->uninteresting_commits.nr; i++) {
-+ struct object *obj = revs->uninteresting_commits.objects[i].item;
-+ if (obj->type != OBJ_COMMIT)
-+ BUG("unexpected non-commit %s marked uninteresting",
-+ oid_to_hex(&obj->oid));
-+
-+ if (obj_in_bitmap(bitmap_git, obj, base))
-+ continue;
-+
-+ add_commit_to_bitmap(bitmap_git, &base, (struct commit *)obj);
-+ }
-+ trace2_region_leave("pack-bitmap", "boundary-load-bitmaps", the_repository);
-+
-+ /*
-+ * Then add the boundary commit(s) as fill-in traversal tips.
-+ */
+ trace2_region_enter("pack-bitmap", "boundary-traverse", the_repository);
+ revs->boundary = 1;
+ traverse_commit_list_filtered(revs,
+ show_boundary_commit,
+ show_boundary_object,
-+ &boundary, NULL);
++ &cb, NULL);
+ revs->boundary = 0;
+ revs->blob_objects = tmp_blobs;
+ revs->tree_objects = tmp_trees;
@@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_
+ clear_object_flags(UNINTERESTING);
+ trace2_region_leave("pack-bitmap", "boundary-traverse", the_repository);
+
++ /*
++ * Then add the boundary commit(s) as fill-in traversal tips.
++ */
+ trace2_region_enter("pack-bitmap", "boundary-fill-in", the_repository);
-+ if (boundary.nr) {
++ if (cb.boundary.nr) {
+ struct object *obj;
+ int needs_walk = 0;
+
-+ for (i = 0; i < boundary.nr; i++) {
-+ obj = boundary.objects[i].item;
++ for (i = 0; i < cb.boundary.nr; i++) {
++ obj = cb.boundary.objects[i].item;
+
-+ if (obj_in_bitmap(bitmap_git, obj, base)) {
++ if (obj_in_bitmap(bitmap_git, obj, cb.base)) {
+ obj->flags |= SEEN;
+ } else {
+ add_pending_object(revs, obj, "");
@@ pack-bitmap.c: static struct bitmap *fill_in_bitmap(struct bitmap_index *bitmap_
+ }
+
+ if (needs_walk)
-+ base = fill_in_bitmap(bitmap_git, revs, base, NULL);
++ cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL);
+ }
+ trace2_region_leave("pack-bitmap", "boundary-fill-in", the_repository);
+
+
+cleanup:
+ revs->ignore_missing_links = 0;
-+ revs->collect_uninteresting = 0;
+
-+ return base;
++ return cb.base;
+}
+
static struct bitmap *find_objects(struct bitmap_index *bitmap_git,
@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
if (!wants)
goto cleanup;
@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
- if (load_bitmap(bitmap_git) < 0)
+ if (load_bitmap(revs->repo, bitmap_git) < 0)
goto cleanup;
- object_array_clear(&revs->pending);
--
2.40.1.478.gcab26587e8
next prev parent reply other threads:[~2023-05-05 17:27 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-25 0:00 [PATCH 0/3] pack-bitmap: boundary-based bitmap traversal Taylor Blau
2023-04-25 0:00 ` [PATCH 1/3] revision: support tracking uninteresting commits Taylor Blau
2023-04-25 18:15 ` Derrick Stolee
2023-05-03 21:48 ` Taylor Blau
2023-05-04 13:46 ` Derrick Stolee
2023-05-03 22:08 ` Taylor Blau
2023-05-04 13:59 ` Derrick Stolee
2023-05-05 17:30 ` Taylor Blau
2023-04-25 18:22 ` Junio C Hamano
2023-04-25 18:48 ` Taylor Blau
2023-04-25 0:00 ` [PATCH 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-04-25 18:32 ` Junio C Hamano
2023-04-25 18:51 ` [PATCH 2/3] pack-bitmap.c: extract `fill_in_bitmap()`t Taylor Blau
2023-04-25 0:00 ` [PATCH 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-04-25 18:52 ` Junio C Hamano
2023-05-02 21:31 ` Felipe Contreras
2023-05-03 21:42 ` Taylor Blau
2023-05-03 21:58 ` Junio C Hamano
2023-04-25 18:53 ` Derrick Stolee
2023-04-25 18:02 ` [PATCH 0/3] pack-bitmap: boundary-based " Junio C Hamano
2023-04-25 18:30 ` Derrick Stolee
2023-04-25 18:57 ` Taylor Blau
2023-04-25 19:52 ` Derrick Stolee
2023-05-03 21:43 ` Taylor Blau
2023-04-25 18:06 ` Derrick Stolee
2023-04-25 19:01 ` Taylor Blau
2023-04-25 20:27 ` Derrick Stolee
2023-05-01 22:08 ` Junio C Hamano
2023-05-02 23:52 ` Taylor Blau
2023-05-05 17:27 ` Taylor Blau [this message]
2023-05-05 17:27 ` [PATCH v2 1/2] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-05-05 17:27 ` [PATCH v2 2/2] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-05-05 18:13 ` Derrick Stolee
2023-05-05 18:43 ` Taylor Blau
2023-05-05 17:33 ` [PATCH v2 0/2] pack-bitmap: boundary-based " Junio C Hamano
2023-05-05 17:59 ` Derrick Stolee
2023-05-05 18:46 ` [PATCH v2 0/2] pack-bitmap: boundary-based bitmap traversalt Taylor Blau
2023-05-05 20:45 ` Derrick Stolee
2023-05-08 17:38 ` [PATCH v3 0/3] pack-bitmap: boundary-based bitmap traversal Taylor Blau
2023-05-08 17:38 ` [PATCH v3 1/3] object: add object_array initializer helper function Taylor Blau
2023-05-08 17:38 ` [PATCH v3 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-05-08 17:38 ` [PATCH v3 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-05-08 20:53 ` Derrick Stolee
2023-05-08 22:12 ` Taylor Blau
2023-05-10 22:55 ` [PATCH v3 0/3] pack-bitmap: boundary-based " Junio C Hamano
2023-05-10 23:10 ` Taylor Blau
2023-05-11 15:23 ` Derrick Stolee
2023-06-08 16:25 ` [PATCH v4 " Taylor Blau
2023-06-08 16:25 ` [PATCH v4 1/3] object: add object_array initializer helper function Taylor Blau
2023-06-08 16:25 ` [PATCH v4 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-06-08 16:25 ` [PATCH v4 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1683307620.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).