Git Mailing List Archive mirror
 help / color / mirror / code / Atom feed
* [RFC PATCH] Not computing changed path filter for root commits
@ 2023-09-11 22:31 Jonathan Tan
  2023-09-15 18:25 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Jonathan Tan @ 2023-09-11 22:31 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, szeder.dev, me, derrickstolee

This is following the discussion about adding a new changed path filter
version due to the current implementation of murmur3 not matching the
algorithm. [1]

SZEDER Gábor suggested [2] that we change the revision walk to read
changed path filters also for root commits, but I don't think that's
possible - we have to tie reading changed path filters to when we read
trees, and right now, we don't seem to read trees when evaluating root
commits (rev_compare_tree() in revision.c is in the only code path that
uses changed path filters, and it itself is only called per-parent and
thus not called for root commits). The alternative is to not generate
changed path filters for root commits (or what I did in this patch,
which is to generate an all-1 filter), which seems reasonable to me.

Is this a good idea? If yes, there is the follow-up question of how to
report it in traces. I don't know off-hand if it's better to reuse the
"large" statistic, since what we output is the same (all 1s), or if we
should make a new one. Presumably what we need to weigh is the clarity
versus the migration costs, but I don't know how to weigh it.

If people are generally in agreement, I can send an updated patch that
does not goto into an "else" block :) and also with updated tests (t0095
needs to check a non-root commit, and t4216 needs to be updated with
whatever statistics we decide to use).

[1] https://lore.kernel.org/git/20230826150610.GA1928@szeder.dev/
[2] https://lore.kernel.org/git/20230830200218.GA5147@szeder.dev/
---
 bloom.c              |  3 ++-
 t/t0095-bloom.sh     |  2 +-
 t/t4216-log-bloom.sh | 12 +++---------
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/bloom.c b/bloom.c
index aef6b5fea2..b21130b236 100644
--- a/bloom.c
+++ b/bloom.c
@@ -226,7 +226,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 	if (c->parents)
 		diff_tree_oid(&c->parents->item->object.oid, &c->object.oid, "", &diffopt);
 	else
-		diff_tree_oid(NULL, &c->object.oid, "", &diffopt);
+		goto large;
 	diffcore_std(&diffopt);
 
 	if (diff_queued_diff.nr <= settings->max_changed_paths) {
@@ -292,6 +292,7 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 	} else {
 		for (i = 0; i < diff_queued_diff.nr; i++)
 			diff_free_filepair(diff_queued_diff.queue[i]);
+large:
 		init_truncated_large_filter(filter);
 
 		if (computed)
diff --git a/t/t0095-bloom.sh b/t/t0095-bloom.sh
index b567383eb8..02a0b41026 100755
--- a/t/t0095-bloom.sh
+++ b/t/t0095-bloom.sh
@@ -74,7 +74,7 @@ test_expect_success !SANITIZE_LEAK 'get bloom filters for commit with no changes
 	git commit --allow-empty -m "c0" &&
 	cat >expect <<-\EOF &&
 	Filter_Length:1
-	Filter_Data:00|
+	Filter_Data:ff|
 	EOF
 	test-tool bloom get_filter_for_commit "$(git rev-parse HEAD)" >actual &&
 	test_cmp expect actual
diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index fa9d32facf..d14fe93fb1 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -283,7 +283,6 @@ test_expect_success 'correctly report changes over limit' '
 			git commit-graph write --reachable --changed-paths &&
 		test_max_changed_paths 11 trace-update &&
 		test_filter_computed 2 trace-update &&
-		test_filter_trunc_large 0 trace-update &&
 
 		for path in $(git ls-tree -r --name-only HEAD)
 		do
@@ -306,9 +305,7 @@ test_expect_success 'correctly report commits with no changed paths' '
 		GIT_TRACE2_EVENT="$(pwd)/trace.event" \
 			git commit-graph write --reachable --changed-paths &&
 		test_filter_computed 1 trace.event &&
-		test_filter_not_computed 0 trace.event &&
-		test_filter_trunc_empty 1 trace.event &&
-		test_filter_trunc_large 0 trace.event
+		test_filter_not_computed 0 trace.event
 	)
 '
 
@@ -363,8 +360,7 @@ test_expect_success '--max-new-filters overrides configuration' '
 				--max-new-filters=1 &&
 		test_filter_computed 1 trace.event &&
 		test_filter_not_computed 1 trace.event &&
-		test_filter_trunc_empty 0 trace.event &&
-		test_filter_trunc_large 0 trace.event
+		test_filter_trunc_empty 0 trace.event
 	)
 '
 
@@ -386,9 +382,7 @@ test_expect_success 'Bloom generation backfills empty commits' '
 				git commit-graph write --reachable \
 					--changed-paths --max-new-filters=2 &&
 			test_filter_computed 2 trace.event &&
-			test_filter_not_computed 4 trace.event &&
-			test_filter_trunc_empty 2 trace.event &&
-			test_filter_trunc_large 0 trace.event || return 1
+			test_filter_not_computed 4 trace.event || return 1
 		done &&
 
 		# Finally, make sure that once all commits have filters, that
-- 
2.42.0.283.g2d96d420d3-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] Not computing changed path filter for root commits
  2023-09-11 22:31 [RFC PATCH] Not computing changed path filter for root commits Jonathan Tan
@ 2023-09-15 18:25 ` Junio C Hamano
  2023-09-15 20:29 ` SZEDER Gábor
  2023-09-19 18:21 ` Taylor Blau
  2 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2023-09-15 18:25 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, szeder.dev, me, derrickstolee

Jonathan Tan <jonathantanmy@google.com> writes:

> This is following the discussion about adding a new changed path filter
> version due to the current implementation of murmur3 not matching the
> algorithm. [1]
>
> SZEDER Gábor suggested [2] that we change the revision walk to read
> changed path filters also for root commits, but I don't think that's
> possible - we have to tie reading changed path filters to when we read
> trees, and right now, we don't seem to read trees when evaluating root
> commits (rev_compare_tree() in revision.c is in the only code path that
> uses changed path filters, and it itself is only called per-parent and
> thus not called for root commits). The alternative is to not generate
> changed path filters for root commits (or what I did in this patch,
> which is to generate an all-1 filter), which seems reasonable to me.

I know this is a very silly question, but if the filter is not read
for root commits at runtime, does it matter if a filter is created
for them beforehand (or not)?  They will not be read whether if they
exist or not, no?  One observation in the thread [2] appears in was:

    In several of the above test cases test_bloom_filters_used is invoked
    in a repository with only a root commit, so they don't check that
    the output is the same with and without Bloom filters.

i.e. the check would be ineffective with the current system that we
know does not use the filter for a root commit even if it existed.
But would it be an improvement to add a filter to a root commit and
test with the filter enabled and disabled to compare the results, if
we know the filter is not used anyway?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] Not computing changed path filter for root commits
  2023-09-11 22:31 [RFC PATCH] Not computing changed path filter for root commits Jonathan Tan
  2023-09-15 18:25 ` Junio C Hamano
@ 2023-09-15 20:29 ` SZEDER Gábor
  2023-09-19 18:19   ` Taylor Blau
  2023-09-19 18:21 ` Taylor Blau
  2 siblings, 1 reply; 5+ messages in thread
From: SZEDER Gábor @ 2023-09-15 20:29 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, me, derrickstolee

On Mon, Sep 11, 2023 at 03:31:56PM -0700, Jonathan Tan wrote:
> SZEDER Gábor suggested [2] that we change the revision walk to read
> changed path filters also for root commits, but I don't think that's
> possible - we have to tie reading changed path filters to when we read
> trees, and right now, we don't seem to read trees when evaluating root
> commits (rev_compare_tree() in revision.c is in the only code path that
> uses changed path filters, and it itself is only called per-parent and
> thus not called for root commits).

When encountering a root commit during a pathspec-limited revision
walk we call rev_same_tree_as_empty() instead of rev_compare_tree().
All that's missing there is checking the Bloom filter and accounting
for false positives.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] Not computing changed path filter for root commits
  2023-09-15 20:29 ` SZEDER Gábor
@ 2023-09-19 18:19   ` Taylor Blau
  0 siblings, 0 replies; 5+ messages in thread
From: Taylor Blau @ 2023-09-19 18:19 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Jonathan Tan, git

On Fri, Sep 15, 2023 at 10:29:12PM +0200, SZEDER Gábor wrote:
> On Mon, Sep 11, 2023 at 03:31:56PM -0700, Jonathan Tan wrote:
> > SZEDER Gábor suggested [2] that we change the revision walk to read
> > changed path filters also for root commits, but I don't think that's
> > possible - we have to tie reading changed path filters to when we read
> > trees, and right now, we don't seem to read trees when evaluating root
> > commits (rev_compare_tree() in revision.c is in the only code path that
> > uses changed path filters, and it itself is only called per-parent and
> > thus not called for root commits).
>
> When encountering a root commit during a pathspec-limited revision
> walk we call rev_same_tree_as_empty() instead of rev_compare_tree().
> All that's missing there is checking the Bloom filter and accounting
> for false positives.

I think that we'd want something like this, though I would definitely
appreciate a second set of eyes since I am not 100% confident in my
set of changes here:

--- 8< ---
diff --git a/revision.c b/revision.c
index 2f4c53ea20..1d36df49e2 100644
--- a/revision.c
+++ b/revision.c
@@ -837,14 +837,24 @@ static int rev_compare_tree(struct rev_info *revs,
 static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit)
 {
 	struct tree *t1 = repo_get_commit_tree(the_repository, commit);
+	int bloom_ret = 1;

 	if (!t1)
 		return 0;

+	if (revs->bloom_keys_nr) {
+		bloom_ret = check_maybe_different_in_bloom_filter(revs, commit);
+		if (!bloom_ret)
+			return 1;
+	}
+
 	tree_difference = REV_TREE_SAME;
 	revs->pruning.flags.has_changes = 0;
 	diff_tree_oid(NULL, &t1->object.oid, "", &revs->pruning);

+	if (bloom_ret == 1 && tree_difference == REV_TREE_SAME)
+		count_bloom_filter_false_positive++;
+
 	return tree_difference == REV_TREE_SAME;
 }

diff --git a/t/t4216-log-bloom.sh b/t/t4216-log-bloom.sh
index fa9d32facf..3a45cb997b 100755
--- a/t/t4216-log-bloom.sh
+++ b/t/t4216-log-bloom.sh
@@ -162,7 +162,7 @@ test_expect_success 'setup - add commit-graph to the chain with Bloom filters' '

 test_bloom_filters_used_when_some_filters_are_missing () {
 	log_args=$1
-	bloom_trace_prefix="statistics:{\"filter_not_present\":3,\"maybe\":6,\"definitely_not\":9"
+	bloom_trace_prefix="statistics:{\"filter_not_present\":3,\"maybe\":6,\"definitely_not\":10"
 	setup "$log_args" &&
 	grep -q "$bloom_trace_prefix" "$TRASH_DIRECTORY/trace.perf" &&
 	test_cmp log_wo_bloom log_w_bloom
--- >8 ---

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] Not computing changed path filter for root commits
  2023-09-11 22:31 [RFC PATCH] Not computing changed path filter for root commits Jonathan Tan
  2023-09-15 18:25 ` Junio C Hamano
  2023-09-15 20:29 ` SZEDER Gábor
@ 2023-09-19 18:21 ` Taylor Blau
  2 siblings, 0 replies; 5+ messages in thread
From: Taylor Blau @ 2023-09-19 18:21 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, szeder.dev, derrickstolee

On Mon, Sep 11, 2023 at 03:31:56PM -0700, Jonathan Tan wrote:
> SZEDER Gábor suggested [2] that we change the revision walk to read
> changed path filters also for root commits, but I don't think that's
> possible - we have to tie reading changed path filters to when we read
> trees, and right now, we don't seem to read trees when evaluating root
> commits (rev_compare_tree() in revision.c is in the only code path that
> uses changed path filters, and it itself is only called per-parent and
> thus not called for root commits). The alternative is to not generate
> changed path filters for root commits (or what I did in this patch,
> which is to generate an all-1 filter), which seems reasonable to me.

I think between the two, the all-1's filter is the more sensible choice,
since not computing a filter is typically reserved for blowing past the
`commitGraph.maxNewFilters` setting.

But, I agree with Gábor down-thread that we could instead teach
`rev_same_tree_as_empty()` to be aware of Bloom filters, which I think
would accomplish our goal of reading Bloom filters at the root commit
while not having to tweak their generation.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-09-19 18:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-11 22:31 [RFC PATCH] Not computing changed path filter for root commits Jonathan Tan
2023-09-15 18:25 ` Junio C Hamano
2023-09-15 20:29 ` SZEDER Gábor
2023-09-19 18:19   ` Taylor Blau
2023-09-19 18:21 ` Taylor Blau

Code repositories for project(s) associated with this public inbox

	https://80x24.org/pub/scm/git/git.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).