Git Mailing List Archive mirror
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>,
	Junio C Hamano <gitster@pobox.com>, Taylor Blau <me@ttaylorr.com>
Subject: [PATCH v7 0/7] Changed path filter hash fix and version bump
Date: Tue,  1 Aug 2023 11:41:35 -0700	[thread overview]
Message-ID: <cover.1690912539.git.jonathantanmy@google.com> (raw)
In-Reply-To: <cover.1684790529.git.jonathantanmy@google.com>

I've fixed the bug that Taylor described. It was an issue where the
presence of Bloom filters can be indicated both by the presence of the
chunk and the presence of a bloom_filter_settings, and I've fixed it by
avoiding setting the chunk_bloom_data if we're not using Bloom filters
due to an incompatible version. In the future, we might want to refactor
the code so that there is only one way to indicate whether the Bloom
filters are present.

I've also added sign-offs and changed the indentation of the tests, as
remarked by Taylor [1] [2].

[1] https://lore.kernel.org/git/ZMGruenDbAo22aWV@nand.local/
[2] https://lore.kernel.org/git/ZMGsJTxBXZ94lhMU@nand.local/

Taylor also suggested copying forward Bloom filters whenever possible
in this patch set [3], but also that we could work on this outside this
series [4]. I did not implement this in this series.

[3] https://lore.kernel.org/git/ZMKvsObx+uaKA8zF@nand.local/
[4] https://lore.kernel.org/git/ZMlKMmAs3wKULAOd@nand.local/

Jonathan Tan (4):
  gitformat-commit-graph: describe version 2 of BDAT
  t4216: test changed path filters with high bit paths
  repo-settings: introduce commitgraph.changedPathsVersion
  commit-graph: new filter ver. that fixes murmur3

Taylor Blau (3):
  t/helper/test-read-graph.c: extract `dump_graph_info()`
  bloom.h: make `load_bloom_filter_from_graph()` public
  t/helper/test-read-graph: implement `bloom-filters` mode

 Documentation/config/commitgraph.txt     |  26 ++++-
 Documentation/gitformat-commit-graph.txt |   9 +-
 bloom.c                                  |  75 +++++++++++-
 bloom.h                                  |  13 ++-
 commit-graph.c                           |  35 ++++--
 oss-fuzz/fuzz-commit-graph.c             |   2 +-
 repo-settings.c                          |   6 +-
 repository.h                             |   2 +-
 t/helper/test-bloom.c                    |   9 +-
 t/helper/test-read-graph.c               |  67 ++++++++---
 t/t0095-bloom.sh                         |   8 ++
 t/t4216-log-bloom.sh                     | 139 +++++++++++++++++++++++
 12 files changed, 351 insertions(+), 40 deletions(-)

Range-diff against v6:
1:  3ce6090a4d = 1:  3ce6090a4d gitformat-commit-graph: describe version 2 of BDAT
2:  1955734d1f ! 2:  fc6346c039 t/helper/test-read-graph.c: extract `dump_graph_info()`
    @@ Commit message
         main routine into a separate function.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    +    Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
      ## t/helper/test-read-graph.c ##
     @@
3:  4cf7c2f634 ! 3:  f144dc4b15 bloom.h: make `load_bloom_filter_from_graph()` public
    @@ Commit message
         for manual inspection (to be used during tests).
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    +    Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
      ## bloom.c ##
     @@ bloom.c: static inline unsigned char get_bitmask(uint32_t pos)
4:  47b55758e6 ! 4:  2ade832a23 t/helper/test-read-graph: implement `bloom-filters` mode
    @@ Commit message
         hexadecimal contents of the Bloom filter(s) contained in a commit-graph.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    +    Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
     
      ## t/helper/test-read-graph.c ##
     @@ t/helper/test-read-graph.c: static void dump_graph_info(struct commit_graph *graph)
5:  5276e6a90e ! 5:  74863c11e5 t4216: test changed path filters with high bit paths
    @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm
     +'
     +
     +test_expect_success 'setup check value of version 1 changed-path' '
    -+	(cd highbit1 &&
    ++	(
    ++		cd highbit1 &&
     +		echo "52a9" >expect &&
     +		get_first_changed_path_filter >actual &&
    -+		test_cmp expect actual)
    ++		test_cmp expect actual
    ++	)
     +'
     +
     +# expect will not match actual if char is unsigned by default. Write the test
    @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm
     +'
     +
     +test_expect_success 'version 1 changed-path used when version 1 requested' '
    -+	(cd highbit1 &&
    -+		test_bloom_filters_used "-- $CENT")
    ++	(
    ++		cd highbit1 &&
    ++		test_bloom_filters_used "-- $CENT"
    ++	)
     +'
     +
      test_done
6:  dc3f6d2d4f = 6:  60f4faeff9 repo-settings: introduce commitgraph.changedPathsVersion
7:  6e2d797406 ! 7:  68258cfd04 commit-graph: new filter ver. that fixes murmur3
    @@ commit-graph.c: static int graph_read_oid_lookup(const unsigned char *chunk_star
     +	struct graph_read_bloom_data_context *c = data;
     +	struct commit_graph *g = c->g;
      	uint32_t hash_version;
    - 	g->chunk_bloom_data = chunk_start;
    +-	g->chunk_bloom_data = chunk_start;
      	hash_version = get_be32(chunk_start);
      
     -	if (hash_version != 1)
    @@ commit-graph.c: static int graph_read_oid_lookup(const unsigned char *chunk_star
     + 		return 0;
     +	}
      
    ++	g->chunk_bloom_data = chunk_start;
      	g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
      	g->bloom_filter_settings->hash_version = hash_version;
    + 	g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4);
     @@ commit-graph.c: struct commit_graph *parse_commit_graph(struct repo_settings *s,
      	}
      
    @@ t/t0095-bloom.sh: test_expect_success 'compute unseeded murmur3 hash for test st
      	Hashes:0x5615800c|0x5b966560|0x61174ab4|0x66983008|0x6c19155c|0x7199fab0|0x771ae004|
     
      ## t/t4216-log-bloom.sh ##
    -@@ t/t4216-log-bloom.sh: test_expect_success 'set up repo with high bit path, version 1 changed-path' '
    - 	git -C highbit1 commit-graph write --reachable --changed-paths
    - '
    - 
    --test_expect_success 'setup check value of version 1 changed-path' '
    -+test_expect_success 'check value of version 1 changed-path' '
    - 	(cd highbit1 &&
    - 		echo "52a9" >expect &&
    - 		get_first_changed_path_filter >actual &&
     @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when version 1 requested' '
    - 		test_bloom_filters_used "-- $CENT")
    + 	)
      '
      
     +test_expect_success 'version 1 changed-path not used when version 2 requested' '
    -+	(cd highbit1 &&
    ++	(
    ++		cd highbit1 &&
     +		git config --add commitgraph.changedPathsVersion 2 &&
    -+		test_bloom_filters_not_used "-- $CENT")
    ++		test_bloom_filters_not_used "-- $CENT"
    ++	)
     +'
     +
     +test_expect_success 'version 1 changed-path used when autodetect requested' '
    -+	(cd highbit1 &&
    ++	(
    ++		cd highbit1 &&
     +		git config --add commitgraph.changedPathsVersion -1 &&
    -+		test_bloom_filters_used "-- $CENT")
    ++		test_bloom_filters_used "-- $CENT"
    ++	)
     +'
     +
     +test_expect_success 'when writing another commit graph, preserve existing version 1 of changed-path' '
     +	test_commit -C highbit1 c1double "$CENT$CENT" &&
     +	git -C highbit1 commit-graph write --reachable --changed-paths &&
    -+	(cd highbit1 &&
    ++	(
    ++		cd highbit1 &&
     +		git config --add commitgraph.changedPathsVersion -1 &&
     +		echo "options: bloom(1,10,7) read_generation_data" >expect &&
     +		test-tool read-graph >full &&
     +		grep options full >actual &&
    -+		test_cmp expect actual)
    ++		test_cmp expect actual
    ++	)
     +'
     +
     +test_expect_success 'set up repo with high bit path, version 2 changed-path' '
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +'
     +
     +test_expect_success 'check value of version 2 changed-path' '
    -+	(cd highbit2 &&
    ++	(
    ++		cd highbit2 &&
     +		echo "c01f" >expect &&
     +		get_first_changed_path_filter >actual &&
    -+		test_cmp expect actual)
    ++		test_cmp expect actual
    ++	)
     +'
     +
     +test_expect_success 'version 2 changed-path used when version 2 requested' '
    -+	(cd highbit2 &&
    -+		test_bloom_filters_used "-- $CENT")
    ++	(
    ++		cd highbit2 &&
    ++		test_bloom_filters_used "-- $CENT"
    ++	)
     +'
     +
     +test_expect_success 'version 2 changed-path not used when version 1 requested' '
    -+	(cd highbit2 &&
    ++	(
    ++		cd highbit2 &&
     +		git config --add commitgraph.changedPathsVersion 1 &&
    -+		test_bloom_filters_not_used "-- $CENT")
    ++		test_bloom_filters_not_used "-- $CENT"
    ++	)
     +'
     +
     +test_expect_success 'version 2 changed-path used when autodetect requested' '
    -+	(cd highbit2 &&
    ++	(
    ++		cd highbit2 &&
     +		git config --add commitgraph.changedPathsVersion -1 &&
    -+		test_bloom_filters_used "-- $CENT")
    ++		test_bloom_filters_used "-- $CENT"
    ++	)
     +'
     +
     +test_expect_success 'when writing another commit graph, preserve existing version 2 of changed-path' '
     +	test_commit -C highbit2 c2double "$CENT$CENT" &&
     +	git -C highbit2 commit-graph write --reachable --changed-paths &&
    -+	(cd highbit2 &&
    ++	(
    ++		cd highbit2 &&
     +		git config --add commitgraph.changedPathsVersion -1 &&
     +		echo "options: bloom(2,10,7) read_generation_data" >expect &&
     +		test-tool read-graph >full &&
     +		grep options full >actual &&
    -+		test_cmp expect actual)
    ++		test_cmp expect actual
    ++	)
    ++'
    ++
    ++test_expect_success 'when writing commit graph, do not reuse changed-path of another version' '
    ++	git init doublewrite &&
    ++	test_commit -C doublewrite c "$CENT" &&
    ++	git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
    ++	git -C doublewrite commit-graph write --reachable --changed-paths &&
    ++	git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
    ++	git -C doublewrite commit-graph write --reachable --changed-paths &&
    ++	(
    ++		cd doublewrite &&
    ++		echo "c01f" >expect &&
    ++		get_first_changed_path_filter >actual &&
    ++		test_cmp expect actual
    ++	)
     +'
     +
      test_done
-- 
2.41.0.585.gd2178a4bd4-goog


  parent reply	other threads:[~2023-08-01 18:42 UTC|newest]

Thread overview: 116+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-22 21:48 [PATCH 0/2] Changed path filter hash fix and version bump Jonathan Tan
2023-05-22 21:48 ` [PATCH 1/2] t4216: test wrong bloom filter version rejection Jonathan Tan
2023-05-22 21:48 ` [PATCH 2/2] commit-graph: fix murmur3, bump filter ver. to 2 Jonathan Tan
2023-05-23 13:00   ` Derrick Stolee
2023-05-23 23:00     ` Jonathan Tan
2023-05-23 23:51     ` Junio C Hamano
2023-05-24 21:26       ` Jonathan Tan
2023-05-26 13:19         ` Derrick Stolee
2023-05-30 17:26           ` Jonathan Tan
2023-05-23  4:42 ` [PATCH 0/2] Changed path filter hash fix and version bump Junio C Hamano
2023-05-31 23:12 ` [PATCH v2 0/3] " Jonathan Tan
2023-05-31 23:12   ` [PATCH v2 1/3] t4216: test changed path filters with high bit paths Jonathan Tan
2023-05-31 23:12   ` [PATCH v2 2/3] repo-settings: introduce commitgraph.changedPathsVersion Jonathan Tan
2023-05-31 23:12   ` [PATCH v2 3/3] commit-graph: new filter ver. that fixes murmur3 Jonathan Tan
2023-06-03  1:01   ` [PATCH v2 0/3] Changed path filter hash fix and version bump Junio C Hamano
2023-06-03  2:24     ` Junio C Hamano
2023-06-07 16:30       ` Jonathan Tan
2023-06-07 21:37         ` Jonathan Tan
2023-06-08 19:21 ` [PATCH v3 0/4] " Jonathan Tan
2023-06-08 19:21   ` [PATCH v3 1/4] gitformat-commit-graph: describe version 2 of BDAT Jonathan Tan
2023-06-08 19:52     ` Ramsay Jones
2023-06-12 21:26       ` Junio C Hamano
2023-06-08 19:21   ` [PATCH v3 2/4] t4216: test changed path filters with high bit paths Jonathan Tan
2023-06-08 19:21   ` [PATCH v3 3/4] repo-settings: introduce commitgraph.changedPathsVersion Jonathan Tan
2023-06-08 19:21   ` [PATCH v3 4/4] commit-graph: new filter ver. that fixes murmur3 Jonathan Tan
2023-06-08 19:50   ` [PATCH v3 0/4] Changed path filter hash fix and version bump Ramsay Jones
2023-06-09  0:08     ` Jonathan Tan
2023-06-12 21:31     ` Junio C Hamano
2023-06-13 17:16       ` Jonathan Tan
2023-06-13 17:29         ` [PATCH] CodingGuidelines: use octal escapes, not hex Jonathan Tan
2023-06-13 18:16           ` Eric Sunshine
2023-06-13 18:43             ` Jonathan Tan
2023-06-13 19:15               ` Eric Sunshine
2023-06-13 19:29                 ` Junio C Hamano
2023-06-13 19:16         ` [PATCH v3 0/4] Changed path filter hash fix and version bump Junio C Hamano
2023-06-13 17:39 ` [PATCH v4 " Jonathan Tan
2023-06-13 17:39   ` [PATCH v4 1/4] gitformat-commit-graph: describe version 2 of BDAT Jonathan Tan
2023-06-13 21:58     ` Junio C Hamano
2023-06-20 13:22       ` Derrick Stolee
2023-06-21 12:08       ` Taylor Blau
2023-06-22 22:26         ` Jonathan Tan
2023-06-23 13:05           ` Derrick Stolee
2023-06-13 17:39   ` [PATCH v4 2/4] t4216: test changed path filters with high bit paths Jonathan Tan
2023-06-13 17:39   ` [PATCH v4 3/4] repo-settings: introduce commitgraph.changedPathsVersion Jonathan Tan
2023-06-20 13:28     ` Derrick Stolee
2023-06-21 12:14     ` Taylor Blau
2023-06-13 17:39   ` [PATCH v4 4/4] commit-graph: new filter ver. that fixes murmur3 Jonathan Tan
2023-06-20 13:39     ` Derrick Stolee
2023-06-20 18:37       ` Junio C Hamano
2023-06-13 19:21   ` [PATCH v4 0/4] Changed path filter hash fix and version bump Junio C Hamano
2023-06-20 13:43     ` Derrick Stolee
2023-06-20 21:56       ` Jonathan Tan
2023-06-21 12:19       ` Taylor Blau
2023-06-21 17:53         ` Derrick Stolee
2023-06-22 22:27           ` Jonathan Tan
2023-06-23 13:18             ` Derrick Stolee
2023-07-13 21:42 ` [PATCH v5 " Jonathan Tan
2023-07-13 21:42   ` [PATCH v5 1/4] gitformat-commit-graph: describe version 2 of BDAT Jonathan Tan
2023-07-19 17:25     ` Taylor Blau
2023-07-20 20:20       ` Jonathan Tan
2023-07-21  1:38         ` Taylor Blau
2023-07-13 21:42   ` [PATCH v5 2/4] t4216: test changed path filters with high bit paths Jonathan Tan
2023-07-13 22:50     ` Junio C Hamano
2023-07-19 17:27       ` Taylor Blau
2023-07-19 17:55         ` [PATCH 0/4] commit-graph: avoid looking at Bloom filter data directly Taylor Blau
2023-07-19 17:55           ` [PATCH 1/4] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2023-07-19 17:55           ` [PATCH 2/4] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2023-07-19 17:55           ` [PATCH 3/4] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2023-07-19 17:55           ` [PATCH 4/4] fixup! t4216: test changed path filters with high bit paths Taylor Blau
2023-07-19 19:24           ` [PATCH 0/4] commit-graph: avoid looking at Bloom filter data directly Junio C Hamano
2023-07-20 20:22           ` Jonathan Tan
2023-07-13 21:42   ` [PATCH v5 3/4] repo-settings: introduce commitgraph.changedPathsVersion Jonathan Tan
2023-07-19 18:10     ` Taylor Blau
2023-07-20 20:42       ` Jonathan Tan
2023-07-20 21:02         ` Taylor Blau
2023-07-13 21:42   ` [PATCH v5 4/4] commit-graph: new filter ver. that fixes murmur3 Jonathan Tan
2023-07-19 18:24     ` Taylor Blau
2023-07-20 21:27       ` Jonathan Tan
2023-07-26 23:32         ` Taylor Blau
2023-07-13 22:16   ` [PATCH v5 0/4] Changed path filter hash fix and version bump Junio C Hamano
2023-07-13 22:59     ` Junio C Hamano
2023-07-14 18:48     ` Jonathan Tan
2023-07-20 21:46 ` [PATCH v6 0/7] " Jonathan Tan
2023-07-20 21:46   ` [PATCH v6 1/7] gitformat-commit-graph: describe version 2 of BDAT Jonathan Tan
2023-07-20 21:46   ` [PATCH v6 2/7] t/helper/test-read-graph.c: extract `dump_graph_info()` Jonathan Tan
2023-07-26 23:26     ` Taylor Blau
2023-07-20 21:46   ` [PATCH v6 3/7] bloom.h: make `load_bloom_filter_from_graph()` public Jonathan Tan
2023-07-20 21:46   ` [PATCH v6 4/7] t/helper/test-read-graph: implement `bloom-filters` mode Jonathan Tan
2023-07-20 21:46   ` [PATCH v6 5/7] t4216: test changed path filters with high bit paths Jonathan Tan
2023-07-26 23:28     ` Taylor Blau
2023-07-20 21:46   ` [PATCH v6 6/7] repo-settings: introduce commitgraph.changedPathsVersion Jonathan Tan
2023-07-20 21:46   ` [PATCH v6 7/7] commit-graph: new filter ver. that fixes murmur3 Jonathan Tan
2023-07-25 20:52   ` [PATCH v6 0/7] Changed path filter hash fix and version bump Junio C Hamano
2023-07-26 20:39   ` Junio C Hamano
2023-07-27  0:17     ` Taylor Blau
2023-07-27  0:49       ` Junio C Hamano
2023-07-27 17:39         ` Jonathan Tan
2023-07-27 17:56           ` Taylor Blau
2023-07-27 20:53             ` Jonathan Tan
2023-08-01 18:08               ` Taylor Blau
2023-08-01 18:52                 ` Jonathan Tan
2023-08-03  0:01                 ` Taylor Blau
2023-08-03 13:18                   ` Derrick Stolee
2023-08-03 18:45                     ` Taylor Blau
2023-07-27 18:44         ` Junio C Hamano
2023-08-01 18:41 ` Jonathan Tan [this message]
2023-08-01 18:41   ` [PATCH v7 1/7] gitformat-commit-graph: describe version 2 of BDAT Jonathan Tan
2023-08-01 18:41   ` [PATCH v7 2/7] t/helper/test-read-graph.c: extract `dump_graph_info()` Jonathan Tan
2023-08-01 18:41   ` [PATCH v7 3/7] bloom.h: make `load_bloom_filter_from_graph()` public Jonathan Tan
2023-08-01 18:41   ` [PATCH v7 4/7] t/helper/test-read-graph: implement `bloom-filters` mode Jonathan Tan
2023-08-01 18:41   ` [PATCH v7 5/7] t4216: test changed path filters with high bit paths Jonathan Tan
2023-08-01 18:41   ` [PATCH v7 6/7] repo-settings: introduce commitgraph.changedPathsVersion Jonathan Tan
2023-08-01 18:41   ` [PATCH v7 7/7] commit-graph: new filter ver. that fixes murmur3 Jonathan Tan
2023-08-01 18:44   ` [PATCH v7 0/7] Changed path filter hash fix and version bump Junio C Hamano
2023-08-01 20:58     ` Taylor Blau
2023-08-01 21:07       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1690912539.git.jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).