From: Derrick Stolee <derrickstolee@github.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 0/3] pack-bitmap: boundary-based bitmap traversal
Date: Tue, 25 Apr 2023 16:27:21 -0400 [thread overview]
Message-ID: <aa162f0f-022d-7216-51c9-c53884362628@github.com> (raw)
In-Reply-To: <ZEgjiAtK8O0+dbht@nand.local>
On 4/25/2023 3:01 PM, Taylor Blau wrote:
> On Tue, Apr 25, 2023 at 02:06:25PM -0400, Derrick Stolee wrote:
>> For my curiosity, and since you already have a test environment set up,
>> could you redo the "without bitmaps" case with pack.useSparse true and
>> false? When the option was created and defaulted to true, we never
>> really explored comparing it to the bitmap case. In fact, I assumed the
>> bitmap case was faster in important cases like this (where there is a
>> substantial difference in object counts), but your data is surprising me
>> that the sparse algorithm is outperforming bitmaps, even with this new
>> algorithm.
>>
>> The main question I'd like to ask is: is pack.useSparse contributing
>> in an important way to the performance here?
>
> I don't know enough about pack.useSparse to say with certainty, but
> trying again on the same repository (which is reasonably well-packed at
> the moment), they appear about the same:
Thanks for running that. I apologize, but I didn't sync in my head
that you're purposefully using 'git rev-list' to do the walk and the
config only makes a difference for 'git pack-objects'. The suspiciously
close timings is indeed too suspicious. Thus, the performance numbers
you are considering are not identical to what would happen in 'git
pack-objects' by default.
So, I created my own example using git.git, which included this input:
refs/remotes/origin/master
refs/remotes/origin/maint
refs/remotes/origin/next
refs/remotes/origin/seen
^refs/remotes/origin/master~1000
And there is a clear preference for pack.useSparse=false in this
case:
Benchmark 1: GIT_TRACE2_PERF="/home/stolee/_git/git/git-review/trace-true.txt" ~/_git/git/git-review/git -c pack.useSparse=true -c pack.useBitmaps=false pack-objects --stdout --revs <in >/dev/null
Time (mean ± σ): 2.857 s ± 0.031 s [User: 6.850 s, System: 0.378 s]
Range (min … max): 2.814 s … 2.922 s 10 runs
Benchmark 2: GIT_TRACE2_PERF="/home/stolee/_git/git/git-review/trace-false.txt" ~/_git/git/git-review/git -c pack.useSparse=false -c pack.useBitmaps=false pack-objects --stdout --revs <in >/dev/null
Time (mean ± σ): 2.682 s ± 0.029 s [User: 6.678 s, System: 0.388 s]
Range (min … max): 2.653 s … 2.753 s 10 runs
and I used the perf traces to extract that the enumerate objects
phase took ~550ms for the sparse walk and ~375ms for the non-sparse
version.
But using this same input, I'm able to construct an example where
your modified bitmap walk is faster than the non-sparse object walk:
Benchmark 1: GIT_TRACE2_PERF="/home/stolee/_git/git/git-review/trace-true.txt" ~/_git/git/git-review/git -c pack.useSparse=false -c pack.useBitmaps=true pack-objects --stdout --revs <in >/dev/null
Time (mean ± σ): 2.045 s ± 0.039 s [User: 6.464 s, System: 0.334 s]
Range (min … max): 1.966 s … 2.089 s 10 runs
Benchmark 2: GIT_TRACE2_PERF="/home/stolee/_git/git/git-review/trace-false.txt" ~/_git/git/git-review/git -c pack.useSparse=false -c pack.useBitmaps=false pack-objects --stdout --revs <in >/dev/null
Time (mean ± σ): 2.138 s ± 0.034 s [User: 6.070 s, System: 0.333 s]
Range (min … max): 2.058 s … 2.182 s 10 runs
(Here, the enumerate objects phase takes ~230ms on average.)
I can easily swap this by changing my negative ref to
^revs/remotes/origin/master
Benchmark 1: GIT_TRACE2_PERF="/home/stolee/_git/git/git-review/trace-true.txt" ~/_git/git/git-review/git -c pack.useSparse=false -c pack.useBitmaps=true pack-objects --stdout --revs <in >/dev/null
Time (mean ± σ): 521.0 ms ± 28.8 ms [User: 1190.9 ms, System: 103.6 ms]
Range (min … max): 496.5 ms … 589.8 ms 10 runs
Benchmark 2: GIT_TRACE2_PERF="/home/stolee/_git/git/git-review/trace-false.txt" ~/_git/git/git-review/git -c pack.useSparse=false -c pack.useBitmaps=false pack-objects --stdout --revs <in >/dev/null
Time (mean ± σ): 498.5 ms ± 13.2 ms [User: 1089.1 ms, System: 123.3 ms]
Range (min … max): 471.3 ms … 513.6 ms 10 runs
which gives us reason to think about "small fetches" don't need
bitmaps, but "big fetches" do. But that can be done separately
from this patch.
Context about the sparse walk:
The sparse walk uses the names of the paths with respect to the root
trees as a way to walk only the "new" objects without walking the
entire object set at the boundary. This makes a big difference in
repos with many files at HEAD, but the risk is that those path names
become significant overhead if there are enough new changes spread
across a large fraction of the tree. For clients pushing the work
of a single developer, the sparse algorithm outperforms the default
walk.
I wrote about this years ago in [1] and almost brought it up earlier
as a description of the commit boundary (called the frontier in [1]),
but the sparse object walk is helpful to visualize.
[1] https://devblogs.microsoft.com/devops/exploring-new-frontiers-for-git-push-performance/
Thanks,
-Stolee
next prev parent reply other threads:[~2023-04-25 20:27 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-25 0:00 [PATCH 0/3] pack-bitmap: boundary-based bitmap traversal Taylor Blau
2023-04-25 0:00 ` [PATCH 1/3] revision: support tracking uninteresting commits Taylor Blau
2023-04-25 18:15 ` Derrick Stolee
2023-05-03 21:48 ` Taylor Blau
2023-05-04 13:46 ` Derrick Stolee
2023-05-03 22:08 ` Taylor Blau
2023-05-04 13:59 ` Derrick Stolee
2023-05-05 17:30 ` Taylor Blau
2023-04-25 18:22 ` Junio C Hamano
2023-04-25 18:48 ` Taylor Blau
2023-04-25 0:00 ` [PATCH 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-04-25 18:32 ` Junio C Hamano
2023-04-25 18:51 ` [PATCH 2/3] pack-bitmap.c: extract `fill_in_bitmap()`t Taylor Blau
2023-04-25 0:00 ` [PATCH 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-04-25 18:52 ` Junio C Hamano
2023-05-02 21:31 ` Felipe Contreras
2023-05-03 21:42 ` Taylor Blau
2023-05-03 21:58 ` Junio C Hamano
2023-04-25 18:53 ` Derrick Stolee
2023-04-25 18:02 ` [PATCH 0/3] pack-bitmap: boundary-based " Junio C Hamano
2023-04-25 18:30 ` Derrick Stolee
2023-04-25 18:57 ` Taylor Blau
2023-04-25 19:52 ` Derrick Stolee
2023-05-03 21:43 ` Taylor Blau
2023-04-25 18:06 ` Derrick Stolee
2023-04-25 19:01 ` Taylor Blau
2023-04-25 20:27 ` Derrick Stolee [this message]
2023-05-01 22:08 ` Junio C Hamano
2023-05-02 23:52 ` Taylor Blau
2023-05-05 17:27 ` [PATCH v2 0/2] " Taylor Blau
2023-05-05 17:27 ` [PATCH v2 1/2] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-05-05 17:27 ` [PATCH v2 2/2] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-05-05 18:13 ` Derrick Stolee
2023-05-05 18:43 ` Taylor Blau
2023-05-05 17:33 ` [PATCH v2 0/2] pack-bitmap: boundary-based " Junio C Hamano
2023-05-05 17:59 ` Derrick Stolee
2023-05-05 18:46 ` [PATCH v2 0/2] pack-bitmap: boundary-based bitmap traversalt Taylor Blau
2023-05-05 20:45 ` Derrick Stolee
2023-05-08 17:38 ` [PATCH v3 0/3] pack-bitmap: boundary-based bitmap traversal Taylor Blau
2023-05-08 17:38 ` [PATCH v3 1/3] object: add object_array initializer helper function Taylor Blau
2023-05-08 17:38 ` [PATCH v3 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-05-08 17:38 ` [PATCH v3 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-05-08 20:53 ` Derrick Stolee
2023-05-08 22:12 ` Taylor Blau
2023-05-10 22:55 ` [PATCH v3 0/3] pack-bitmap: boundary-based " Junio C Hamano
2023-05-10 23:10 ` Taylor Blau
2023-05-11 15:23 ` Derrick Stolee
2023-06-08 16:25 ` [PATCH v4 " Taylor Blau
2023-06-08 16:25 ` [PATCH v4 1/3] object: add object_array initializer helper function Taylor Blau
2023-06-08 16:25 ` [PATCH v4 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-06-08 16:25 ` [PATCH v4 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aa162f0f-022d-7216-51c9-c53884362628@github.com \
--to=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).