From: Derrick Stolee <derrickstolee@github.com>
To: Taylor Blau <me@ttaylorr.com>, git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 3/3] pack-bitmap.c: use commit boundary during bitmap traversal
Date: Tue, 25 Apr 2023 14:53:46 -0400 [thread overview]
Message-ID: <a143150d-7cf5-c697-0e48-0f7af1c6de8f@github.com> (raw)
In-Reply-To: <91ed8c70f22dd2c47c60d650323579fc42cc7f2d.1682380788.git.me@ttaylorr.com>
On 4/24/2023 8:00 PM, Taylor Blau wrote:
> When reachability bitmap coverage exists in a repository, Git will use a
> different (and hopefully faster) traversal to compute revision walks.
Before getting into the code...
> Consider a set of positive and negative tips (which we'll refer to with
> their standard bitmap parlance by, "wants", and "haves"). In order to
> figure out what objects exist between the tips, the existing traversal
> in prepare_bitmap_walk() looks something like:
> 4. Construct a reachability bitmap for the "haves" side by walking
> from the revision tips down to any existing bitmaps, OR-ing in any
> bitmaps as they are found.
>
> 5. Then do the same for the "wants" side, stopping at any objects that
> appear in the "haves" bitmap.
One important thing about this older algorithm is that it will avoid
including trees and blobs that are reachable from the "haves" that
are not reachable from the boundary between "haves" and "wants". The
most obvious case is that someone force-pushed a branch after rewording
the commit messages but keeping the trees and blobs identical.
This case is rare, and usually the objects that would be repeated are
low enough in count and size (with deltas) that it's not a big deal, but
it is worth bringing up. I didn't see a reference to this side of the
difference in your message.
> One of the benefits, however, is that even if the walk is slower, bitmap
> traversals are guaranteed to provide an *exact* answer. Unlike the
> traditional object traversal algorithm, which can over-count the results
> by not opening trees for older commits, the bitmap walk builds an exact
> reachability bitmap for either side, meaning the results are never
> over-counted.
So the new algorithm is a new, third state of "not overcounting things
reachable beyond the commit-walk boundary" and "overcounting things
reachable from the wants..haves commit region".
> But producing non-exact results is OK for our traversal here (both in
> the bitmap case and not), as long as the results are over-counted, not
> under.
True, our response will not create a pack-file that cannot be applied
locally, but it might be larger than required.
For these reasons, I'm surprised that this patch completely replaces
the old algorithm for the new one. I would prefer to see a config
option that enables this new algorithm. It would be safer to deploy
that way, too.
> The new result performs significantly better in many cases, particularly
> when the distance from the boundary commit(s) to an existing bitmap is
> shorter than the distance from (all of) the have tips to the nearest
> bitmapped commit.
Aside: we may even gain benefits by adjusting the commits selected for
bitmaps by trying for this data shape. One way to attempt this is to
not focus on refs but on history common to "most refs" using first-
parent history as a good heuristic. If a commit appears in a lot of
first-parent histories, then it is more likely to be helpful to these
kinds of walks.
> Note that when using the old bitmap traversal algorithm, the results can
> be *slower* than without bitmaps! Under the new algorithm, the result is
> computed faster with bitmaps than without (at the cost of over-counting
> the true number of objects in a similar fashion as the non-bitmap
> traversal):
>
> # (Computing objects unique to 'master' among all branches, not
> # using bitmaps).
> $ time git rev-list --count --objects master --not --exclude=master --branches
> 54
I like how you included the output of --count here. It might be interesting
to demonstrate the different forms of overcounting in a carefully constructed
test case, which would help us be sure that we are using the desired
algorithm during a test case.
> real 0m1.012s (no bitmaps)
> real 0m29.571s (bitmaps, old)
> real 0m2.279s (bitmaps, new)
> The new algorithm is still slower than not using bitmaps at all, but it
> is nearly a 13-fold improvement over the existing traversal.
> In a more realistic setting (using my local copy of git.git), I can
> observe a similar speed-up:
...
> Here, the new algorithm is also still slower than not using bitmaps, but
> represents a 4-fold improvement over the existing traversal in a more
> modest example.
Interesting that the new bitmap algorithm is worse in all presented cases.
How "atypical" do we need to be in order for bitmaps to outperform the
tree-parsing algorithm? Do we need to try "--branches --not master~1000"
to replicate a stale single-branch clone getting every latest commit?
(I'll leave code review to a separate reply.)
Thanks,
-Stolee
next prev parent reply other threads:[~2023-04-25 18:55 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-25 0:00 [PATCH 0/3] pack-bitmap: boundary-based bitmap traversal Taylor Blau
2023-04-25 0:00 ` [PATCH 1/3] revision: support tracking uninteresting commits Taylor Blau
2023-04-25 18:15 ` Derrick Stolee
2023-05-03 21:48 ` Taylor Blau
2023-05-04 13:46 ` Derrick Stolee
2023-05-03 22:08 ` Taylor Blau
2023-05-04 13:59 ` Derrick Stolee
2023-05-05 17:30 ` Taylor Blau
2023-04-25 18:22 ` Junio C Hamano
2023-04-25 18:48 ` Taylor Blau
2023-04-25 0:00 ` [PATCH 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-04-25 18:32 ` Junio C Hamano
2023-04-25 18:51 ` [PATCH 2/3] pack-bitmap.c: extract `fill_in_bitmap()`t Taylor Blau
2023-04-25 0:00 ` [PATCH 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-04-25 18:52 ` Junio C Hamano
2023-05-02 21:31 ` Felipe Contreras
2023-05-03 21:42 ` Taylor Blau
2023-05-03 21:58 ` Junio C Hamano
2023-04-25 18:53 ` Derrick Stolee [this message]
2023-04-25 18:02 ` [PATCH 0/3] pack-bitmap: boundary-based " Junio C Hamano
2023-04-25 18:30 ` Derrick Stolee
2023-04-25 18:57 ` Taylor Blau
2023-04-25 19:52 ` Derrick Stolee
2023-05-03 21:43 ` Taylor Blau
2023-04-25 18:06 ` Derrick Stolee
2023-04-25 19:01 ` Taylor Blau
2023-04-25 20:27 ` Derrick Stolee
2023-05-01 22:08 ` Junio C Hamano
2023-05-02 23:52 ` Taylor Blau
2023-05-05 17:27 ` [PATCH v2 0/2] " Taylor Blau
2023-05-05 17:27 ` [PATCH v2 1/2] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-05-05 17:27 ` [PATCH v2 2/2] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-05-05 18:13 ` Derrick Stolee
2023-05-05 18:43 ` Taylor Blau
2023-05-05 17:33 ` [PATCH v2 0/2] pack-bitmap: boundary-based " Junio C Hamano
2023-05-05 17:59 ` Derrick Stolee
2023-05-05 18:46 ` [PATCH v2 0/2] pack-bitmap: boundary-based bitmap traversalt Taylor Blau
2023-05-05 20:45 ` Derrick Stolee
2023-05-08 17:38 ` [PATCH v3 0/3] pack-bitmap: boundary-based bitmap traversal Taylor Blau
2023-05-08 17:38 ` [PATCH v3 1/3] object: add object_array initializer helper function Taylor Blau
2023-05-08 17:38 ` [PATCH v3 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-05-08 17:38 ` [PATCH v3 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-05-08 20:53 ` Derrick Stolee
2023-05-08 22:12 ` Taylor Blau
2023-05-10 22:55 ` [PATCH v3 0/3] pack-bitmap: boundary-based " Junio C Hamano
2023-05-10 23:10 ` Taylor Blau
2023-05-11 15:23 ` Derrick Stolee
2023-06-08 16:25 ` [PATCH v4 " Taylor Blau
2023-06-08 16:25 ` [PATCH v4 1/3] object: add object_array initializer helper function Taylor Blau
2023-06-08 16:25 ` [PATCH v4 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-06-08 16:25 ` [PATCH v4 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a143150d-7cf5-c697-0e48-0f7af1c6de8f@github.com \
--to=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).